118 changes: 113 additions & 5 deletions docs/semantic-search.md
@@ -99,10 +99,13 @@ All settings are fields on `BasicMemoryConfig` and can be set via environment va
| Config Field | Env Var | Default | Description |
|---|---|---|---|
| `semantic_search_enabled` | `BASIC_MEMORY_SEMANTIC_SEARCH_ENABLED` | Auto (`true` when semantic deps are available) | Enable semantic search. Required before vector/hybrid modes work. |
| `semantic_embedding_provider` | `BASIC_MEMORY_SEMANTIC_EMBEDDING_PROVIDER` | `"fastembed"` | Embedding provider: `"fastembed"` (local) or `"openai"` (API). |
| `semantic_embedding_provider` | `BASIC_MEMORY_SEMANTIC_EMBEDDING_PROVIDER` | `"fastembed"` | Embedding provider: `"fastembed"` (local), `"openai"` (API), or `"litellm"` (multi-provider API). |
| `semantic_embedding_model` | `BASIC_MEMORY_SEMANTIC_EMBEDDING_MODEL` | `"bge-small-en-v1.5"` | Model identifier. Auto-adjusted per provider if left at default. |
| `semantic_embedding_dimensions` | `BASIC_MEMORY_SEMANTIC_EMBEDDING_DIMENSIONS` | Auto-detected | Vector dimensions. 384 for FastEmbed, 1536 for OpenAI. Override only if using a non-default model. |
| `semantic_embedding_batch_size` | `BASIC_MEMORY_SEMANTIC_EMBEDDING_BATCH_SIZE` | `64` | Number of texts to embed per batch. |
| `semantic_embedding_dimensions` | `BASIC_MEMORY_SEMANTIC_EMBEDDING_DIMENSIONS` | Provider default | Vector dimensions. 384 for FastEmbed, 1536 for OpenAI/LiteLLM OpenAI. Required when using a non-default LiteLLM model. |
| `semantic_embedding_forward_dimensions` | `BASIC_MEMORY_SEMANTIC_EMBEDDING_FORWARD_DIMENSIONS` | Auto | LiteLLM-only override for whether configured dimensions are sent as a provider-side output-size request. |
| `semantic_embedding_batch_size` | `BASIC_MEMORY_SEMANTIC_EMBEDDING_BATCH_SIZE` | `2` | Number of texts to embed per batch. |
| `semantic_embedding_document_input_type` | `BASIC_MEMORY_SEMANTIC_EMBEDDING_DOCUMENT_INPUT_TYPE` | Auto for known LiteLLM models | Optional LiteLLM `input_type` for indexed documents/passages. |
| `semantic_embedding_query_input_type` | `BASIC_MEMORY_SEMANTIC_EMBEDDING_QUERY_INPUT_TYPE` | Auto for known LiteLLM models | Optional LiteLLM `input_type` for search queries. |
| `semantic_vector_k` | `BASIC_MEMORY_SEMANTIC_VECTOR_K` | `100` | Candidate count for vector nearest-neighbour retrieval. Higher values improve recall at the cost of latency. |
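
To make the interplay between provider, model, and dimensions in the table concrete, here is a minimal sketch of how the three values might be resolved from environment variables. It is not Basic Memory's actual loader: `SemanticSettings` and `load_semantic_settings` are hypothetical names, and only the LiteLLM model adjustment shown elsewhere in this diff is modelled.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

ENV_PREFIX = "BASIC_MEMORY_SEMANTIC_EMBEDDING_"
FASTEMBED_DEFAULT_MODEL = "bge-small-en-v1.5"            # default from the table
LITELLM_DEFAULT_MODEL = "openai/text-embedding-3-small"  # LiteLLM default per the docs
DEFAULT_DIMENSIONS = {"fastembed": 384, "openai": 1536}   # provider defaults from the table


@dataclass(frozen=True)
class SemanticSettings:
    """Hypothetical container; the real config is BasicMemoryConfig."""

    provider: str
    model: str
    dimensions: int


def load_semantic_settings(env: Optional[Mapping[str, str]] = None) -> SemanticSettings:
    """Resolve provider, model, and dimensions the way the table describes."""
    env = os.environ if env is None else env
    provider = env.get(ENV_PREFIX + "PROVIDER", "fastembed").strip().lower()
    model = env.get(ENV_PREFIX + "MODEL", FASTEMBED_DEFAULT_MODEL)
    raw_dimensions = env.get(ENV_PREFIX + "DIMENSIONS")

    # A model left at the FastEmbed default is swapped for the LiteLLM default.
    if provider == "litellm" and model == FASTEMBED_DEFAULT_MODEL:
        model = LITELLM_DEFAULT_MODEL

    if raw_dimensions:
        dimensions = int(raw_dimensions)
    elif provider in DEFAULT_DIMENSIONS:
        dimensions = DEFAULT_DIMENSIONS[provider]
    elif provider == "litellm" and model == LITELLM_DEFAULT_MODEL:
        dimensions = 1536
    else:
        # Non-default LiteLLM models have no known size, so fail early.
        raise ValueError(
            f"Set {ENV_PREFIX}DIMENSIONS for provider {provider!r}, model {model!r}."
        )
    return SemanticSettings(provider, model, dimensions)
```

Passing a plain dict instead of `os.environ` keeps the sketch easy to try out, for example `load_semantic_settings({"BASIC_MEMORY_SEMANTIC_EMBEDDING_PROVIDER": "litellm"})`.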

## Embedding Providers
@@ -135,7 +138,111 @@ export BASIC_MEMORY_SEMANTIC_EMBEDDING_PROVIDER=openai
export OPENAI_API_KEY=sk-...
```

When switching from FastEmbed to OpenAI (or vice versa), you must rebuild embeddings since the vector dimensions differ:
### LiteLLM

Uses the LiteLLM SDK to call embedding models from providers such as OpenAI, Cohere, Azure, Bedrock, NVIDIA NIM, and other LiteLLM-supported backends. Requires the provider's API credentials.

```bash
export BASIC_MEMORY_SEMANTIC_SEARCH_ENABLED=true
export BASIC_MEMORY_SEMANTIC_EMBEDDING_PROVIDER=litellm
export BASIC_MEMORY_SEMANTIC_EMBEDDING_MODEL=cohere/embed-english-v3.0
export BASIC_MEMORY_SEMANTIC_EMBEDDING_DIMENSIONS=1024
export COHERE_API_KEY=...
```

Basic Memory creates vector tables before the first embedding call, so non-default LiteLLM models must set `BASIC_MEMORY_SEMANTIC_EMBEDDING_DIMENSIONS`. The LiteLLM OpenAI default (`openai/text-embedding-3-small`) uses 1536 dimensions automatically.

For fixed-size LiteLLM models, dimensions are used as Basic Memory's local vector schema and
validation size. Basic Memory automatically sends dimensions as a provider-side output-size
request for `text-embedding-3` model strings, where LiteLLM/OpenAI support reduced output
dimensions. If an Azure/OpenAI deployment uses an arbitrary LiteLLM model string such as
`azure/<deployment-name>` and the underlying model supports reduced dimensions, set
`BASIC_MEMORY_SEMANTIC_EMBEDDING_FORWARD_DIMENSIONS=true`.

Some retrieval models are asymmetric: indexed passages and search queries must be embedded with different provider parameters. Basic Memory automatically sets LiteLLM `input_type` for known asymmetric model families:

- Cohere v3: documents use `search_document`, queries use `search_query`
- NVIDIA NIM retrieval models: documents use `passage`, queries use `query`

For other asymmetric LiteLLM models, set the input types explicitly:

```bash
export BASIC_MEMORY_SEMANTIC_EMBEDDING_DOCUMENT_INPUT_TYPE=passage
export BASIC_MEMORY_SEMANTIC_EMBEDDING_QUERY_INPUT_TYPE=query
```

#### Live LiteLLM Validation

Provider APIs differ in subtle ways: some accept `dimensions`, some require separate
document/query roles, and some route through deployment aliases that do not reveal the
underlying model name. Before adding or changing LiteLLM model support, run the opt-in live
evaluation harness:

```bash
export OPENAI_API_KEY=sk-...
export COHERE_API_KEY=...
just test-litellm-live
```

The built-in live cases cover:

| Case | Required key | What it validates |
|---|---|---|
| `openai/text-embedding-3-small` | `OPENAI_API_KEY` | Standard LiteLLM OpenAI embedding calls and normalized 1536-dimensional output. |
| `cohere/embed-english-v3.0` | `COHERE_API_KEY` | Cohere v3 asymmetric `search_document` / `search_query` handling and fixed 1024-dimensional output. |

The harness embeds two documents and one query, checks vector dimensions and normalization,
then verifies the authentication query ranks the authentication document above the distractor.
It prints a table with per-model scores, norms, latency, role settings, and dimension-forwarding
mode.

To validate provider aliases or additional LiteLLM backends, save custom JSON cases:

```bash
export AZURE_API_KEY=...
export AZURE_API_BASE=https://example.openai.azure.com
export AZURE_API_VERSION=2024-02-01

cat > /tmp/litellm-azure-cases.json <<'JSON'
[
  {
    "name": "azure-text-embedding-3-small-512",
    "model": "azure/<deployment-name>",
    "dimensions": 512,
    "api_key_env": "AZURE_API_KEY",
    "forward_dimensions": true
  }
]
JSON

just test-litellm-live --cases-file /tmp/litellm-azure-cases.json
```

NVIDIA NIM retrieval models can be checked the same way:

```bash
export NVIDIA_NIM_API_KEY=...

cat > /tmp/litellm-nvidia-cases.json <<'JSON'
[
  {
    "name": "nvidia-embed-qa-4",
    "model": "nvidia_nim/nvidia/embed-qa-4",
    "dimensions": 1024,
    "api_key_env": "NVIDIA_NIM_API_KEY",
    "document_input_type": "passage",
    "query_input_type": "query"
  }
]
JSON

just test-litellm-live --cases-file /tmp/litellm-nvidia-cases.json
```

For repeatable local runs, put the same JSON array in a file and pass
`just test-litellm-live --cases-file path/to/litellm-cases.json`.

When switching providers, models, dimensions, or LiteLLM document/query input types, rebuild embeddings:

```bash
bm reindex --embeddings
@@ -203,9 +310,10 @@ bm reindex -p my-project

- **Upgrade note**: Migration now performs a one-time automatic embedding backfill on upgrade.
- **Manual enable case**: If you explicitly had `semantic_search_enabled=false` and then turn it on
- **Provider change**: After switching between `fastembed` and `openai`
- **Provider change**: After switching between `fastembed`, `openai`, and `litellm`
- **Model change**: After changing `semantic_embedding_model`
- **Dimension change**: After changing `semantic_embedding_dimensions`
- **LiteLLM role change**: After changing `semantic_embedding_document_input_type` or `semantic_embedding_query_input_type`
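
One way to notice these changes mechanically is to fingerprint the settings that affect stored vectors and compare the result against a value saved at the last reindex. The sketch below is a hypothetical helper, not something Basic Memory is known to ship; the function names and the idea of a stored fingerprint are assumptions.

```python
import hashlib
import json
from typing import Any, Mapping, Optional

# Settings named in the list above as reasons to rebuild embeddings.
REINDEX_TRIGGER_FIELDS = (
    "semantic_embedding_provider",
    "semantic_embedding_model",
    "semantic_embedding_dimensions",
    "semantic_embedding_document_input_type",
    "semantic_embedding_query_input_type",
)


def embedding_fingerprint(config: Mapping[str, Any]) -> str:
    """Hash only the fields that change what a stored vector means."""
    relevant = {name: config.get(name) for name in REINDEX_TRIGGER_FIELDS}
    payload = json.dumps(relevant, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def needs_reindex(stored_fingerprint: Optional[str], config: Mapping[str, Any]) -> bool:
    """True when no fingerprint was stored or a trigger field has changed."""
    return stored_fingerprint != embedding_fingerprint(config)
```

Fields such as `semantic_embedding_batch_size` are deliberately left out of the hash because they affect throughput, not the vectors themselves.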

The reindex command shows progress with embedded/skipped/error counts:

4 changes: 4 additions & 0 deletions justfile
@@ -115,6 +115,10 @@ test-semantic-report:
    BASIC_MEMORY_ENV=test BASIC_MEMORY_BENCHMARK_OUTPUT=.benchmarks/semantic-quality.jsonl uv run pytest -p pytest_mock -v -s --no-cov -m semantic test-int/semantic/
    uv run python test-int/semantic/report.py .benchmarks/semantic-quality.jsonl

# Run opt-in live LiteLLM provider checks against configured external APIs
test-litellm-live *args:
    BASIC_MEMORY_ENV=test BASIC_MEMORY_RUN_LITELLM_INTEGRATION=1 PYTHONPATH=test-int:src uv run python -m semantic.litellm_live_harness {{args}}
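
The recipe above runs `semantic.litellm_live_harness`, whose source is not part of this diff. According to the docs change, the harness checks vector dimensions and normalization and then verifies that the query ranks the relevant document above a distractor. A minimal sketch of those three checks, using hand-written vectors in place of real provider responses (`check_case` and its tolerance are assumptions, not the harness's API):

```python
import math
from typing import Dict, Sequence


def l2_norm(vector: Sequence[float]) -> float:
    return math.sqrt(sum(x * x for x in vector))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (l2_norm(a) * l2_norm(b))


def check_case(
    relevant_doc: Sequence[float],
    distractor_doc: Sequence[float],
    query: Sequence[float],
    expected_dimensions: int,
    tolerance: float = 1e-3,
) -> Dict[str, float]:
    """Run the dimension, normalization, and ranking checks for one case."""
    for vector in (relevant_doc, distractor_doc, query):
        if len(vector) != expected_dimensions:
            raise ValueError(f"expected {expected_dimensions} dims, got {len(vector)}")
        if abs(l2_norm(vector) - 1.0) > tolerance:
            raise ValueError("vector is not unit-normalized")

    scores = {
        "relevant": cosine(query, relevant_doc),
        "distractor": cosine(query, distractor_doc),
    }
    if scores["relevant"] <= scores["distractor"]:
        raise AssertionError("query did not rank the relevant document first")
    return scores


if __name__ == "__main__":
    # Toy 3-dimensional vectors standing in for provider output.
    print(check_case([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.8, 0.6, 0.0], 3))
```

A real run would obtain the three vectors from the configured provider and would also record latency and role settings, which this sketch leaves out.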

# Run semantic benchmarks (Postgres combos only)
test-semantic-postgres:
    BASIC_MEMORY_ENV=test uv run pytest -p pytest_mock -v --no-cov -m semantic -k postgres test-int/semantic/
2 changes: 2 additions & 0 deletions pyproject.toml
@@ -48,6 +48,7 @@ dependencies = [
"fastembed>=0.7.4",
"sqlite-vec>=0.1.6",
"openai>=1.100.2",
"litellm>=1.60.0,<2.0.0",
"logfire>=4.19.0",
"psutil>=5.9.0",
]
@@ -84,6 +85,7 @@ markers = [
"windows: Windows-specific tests (deselect with '-m \"not windows\"')",
"smoke: Fast end-to-end smoke tests for MCP flows",
"semantic: Tests requiring semantic dependencies (fastembed, sqlite-vec, openai)",
"live: Tests that call external provider APIs and require explicit opt-in",
]

[tool.ruff]
28 changes: 27 additions & 1 deletion src/basic_memory/config.py
@@ -245,7 +245,19 @@ def __init__(self, **data: Any) -> None: ...
)
semantic_embedding_dimensions: int | None = Field(
default=None,
description="Embedding vector dimensions. Auto-detected from provider if not set (384 for FastEmbed, 1536 for OpenAI).",
description=(
"Embedding vector dimensions. Uses provider defaults when unset "
"(384 for FastEmbed, 1536 for OpenAI and LiteLLM OpenAI default); "
"required for custom LiteLLM models."
),
)
semantic_embedding_forward_dimensions: bool | None = Field(
default=None,
description=(
"LiteLLM-only override for sending semantic_embedding_dimensions as a "
"provider-side output-size request parameter. When unset, Basic Memory "
"auto-detects model strings such as text-embedding-3."
),
)
# Trigger: full local rebuilds spend most of their time waiting behind shared
# embed flushes, not constructing vectors themselves.
@@ -263,6 +275,20 @@ def __init__(self, **data: Any) -> None: ...
description="Maximum number of concurrent provider requests for batched embedding generation when the active provider supports request-level concurrency.",
gt=0,
)
semantic_embedding_document_input_type: str | None = Field(
default=None,
description=(
"Optional LiteLLM input_type for indexed document/passages. "
"Use with asymmetric embedding models such as Cohere or NVIDIA retrieval models."
),
)
semantic_embedding_query_input_type: str | None = Field(
default=None,
description=(
"Optional LiteLLM input_type for search queries. "
"Use with asymmetric embedding models such as Cohere or NVIDIA retrieval models."
),
)
semantic_embedding_sync_batch_size: int = Field(
default=2,
description="Batch size for vector sync orchestration flushes.",
37 changes: 35 additions & 2 deletions src/basic_memory/repository/embedding_provider_factory.py
@@ -10,8 +10,11 @@
str,
str,
int | None,
bool | None,
int,
int,
str | None,
str | None,
str,
int | None,
int | None,
@@ -86,8 +89,11 @@ def _provider_cache_key(app_config: BasicMemoryConfig) -> ProviderCacheKey:
app_config.semantic_embedding_provider.strip().lower(),
app_config.semantic_embedding_model,
app_config.semantic_embedding_dimensions,
app_config.semantic_embedding_forward_dimensions,
app_config.semantic_embedding_batch_size,
app_config.semantic_embedding_request_concurrency,
app_config.semantic_embedding_document_input_type,
app_config.semantic_embedding_query_input_type,
_resolve_cache_dir(app_config),
resolved_threads,
resolved_parallel,
@@ -103,8 +109,11 @@ def reset_embedding_provider_cache() -> None:
def create_embedding_provider(app_config: BasicMemoryConfig) -> EmbeddingProvider:
"""Create an embedding provider based on semantic config.

When semantic_embedding_dimensions is set in config, it overrides
the provider's default dimensions (384 for FastEmbed, 1536 for OpenAI).
When semantic_embedding_dimensions is set in config, it overrides the
provider's default dimensions (384 for FastEmbed, 1536 for OpenAI and
the LiteLLM OpenAI default). Custom LiteLLM models require an explicit
dimension because the vector table schema is created before the first
embedding response is available.
"""
cache_key = _provider_cache_key(app_config)
with _EMBEDDING_PROVIDER_CACHE_LOCK:
@@ -151,6 +160,30 @@ def create_embedding_provider(app_config: BasicMemoryConfig) -> EmbeddingProvide
            request_concurrency=app_config.semantic_embedding_request_concurrency,
            **extra_kwargs,
        )
    elif provider_name == "litellm":
        from basic_memory.repository.litellm_provider import LiteLLMEmbeddingProvider

        model_name = app_config.semantic_embedding_model or "openai/text-embedding-3-small"
        if model_name == "bge-small-en-v1.5":
            model_name = "openai/text-embedding-3-small"
        if (
            app_config.semantic_embedding_dimensions is None
            and model_name != "openai/text-embedding-3-small"
        ):
            raise ValueError(
                "semantic_embedding_dimensions must be set when "
                "semantic_embedding_provider='litellm' uses a non-default model. "
                f"Configured model: {model_name!r}."
            )
        provider = LiteLLMEmbeddingProvider(
            model_name=model_name,
            batch_size=app_config.semantic_embedding_batch_size,
            request_concurrency=app_config.semantic_embedding_request_concurrency,
            document_input_type=app_config.semantic_embedding_document_input_type,
            query_input_type=app_config.semantic_embedding_query_input_type,
            forward_dimensions=app_config.semantic_embedding_forward_dimensions,
            **extra_kwargs,
        )
    else:
        raise ValueError(f"Unsupported semantic embedding provider: {provider_name}")
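
The factory passes `forward_dimensions`, `document_input_type`, and `query_input_type` through to `LiteLLMEmbeddingProvider`, whose body is not shown in this part of the diff. The sketch below illustrates how the auto-detection described in the docs might behave: an explicit setting wins, otherwise the model string decides. The function names and the exact string-matching rules are assumptions for illustration only.

```python
from typing import Optional, Tuple


def resolve_forward_dimensions(model_name: str, override: Optional[bool]) -> bool:
    """Decide whether to send `dimensions` as a provider-side output-size request."""
    if override is not None:
        # An explicit True/False from config always wins.
        return override
    # Docs: auto-detected for `text-embedding-3` model strings.
    return "text-embedding-3" in model_name


def resolve_input_types(
    model_name: str,
    document_override: Optional[str] = None,
    query_override: Optional[str] = None,
) -> Tuple[Optional[str], Optional[str]]:
    """Return (document_input_type, query_input_type) for a LiteLLM model string."""
    name = model_name.lower()
    if name.startswith("cohere/") and "-v3" in name:
        # Cohere v3 roles per the docs; the prefix match is a guess.
        auto = ("search_document", "search_query")
    elif name.startswith("nvidia_nim/"):
        # NVIDIA NIM retrieval roles per the docs; the prefix match is a guess.
        auto = ("passage", "query")
    else:
        auto = (None, None)
    return (document_override or auto[0], query_override or auto[1])
```

Under these assumptions an alias such as `azure/<deployment-name>` resolves to `False` for dimension forwarding unless the override is set, which matches the reason the docs give for `BASIC_MEMORY_SEMANTIC_EMBEDDING_FORWARD_DIMENSIONS=true`.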
