diff --git a/docs/litellm-provider.md b/docs/litellm-provider.md
index 4625a80ed..e1164b3a0 100644
--- a/docs/litellm-provider.md
+++ b/docs/litellm-provider.md
@@ -8,6 +8,14 @@ Cohere, Bedrock, NVIDIA NIM, and other LiteLLM-supported embedding providers.
 Use this page when you want to try a non-default embedding model, validate a
 provider, or tune LiteLLM-specific settings.

+> **Experimental — advanced users only.** The LiteLLM provider is experimental and
+> intended for users who are comfortable operating remote embedding backends. It makes
+> paid, networked API calls, requires per-model dimension and input-role configuration,
+> and reindexing a real corpus can be slow and spend provider quota (see
+> [Reindexing with a remote provider](#reindexing-with-a-remote-provider)). For most
+> users, the default local **FastEmbed** provider is the recommended choice. Use LiteLLM
+> only if you know what you're doing.
+
 ## Quick Start

 The default LiteLLM model is OpenAI `text-embedding-3-small` through the LiteLLM
@@ -101,6 +109,44 @@ those changes:
 bm reindex --embeddings
 ```

+## Reindexing with a remote provider
+
+Embedding a real corpus through a network API is far slower than local FastEmbed, and
+the defaults are tuned for the local case. Two things to know before you run a full
+reindex.
+
+**Raise the sync batch size.** `semantic_embedding_sync_batch_size` defaults to `2`, and
+it — not `semantic_embedding_batch_size` — governs throughput on the sync pipeline. With
+the default, a full reindex can take tens of seconds *per note* against a remote provider.
+Raising both to a larger value turns a multi-minute (or longer) reindex into well under a
+minute for the same corpus:
+
+```bash
+export BASIC_MEMORY_SEMANTIC_EMBEDDING_SYNC_BATCH_SIZE=32
+export BASIC_MEMORY_SEMANTIC_EMBEDDING_BATCH_SIZE=64
+```
+
+Stay within the provider's per-request size and rate limits — Cohere v3, for example,
+accepts up to 96 inputs per embedding request.
+
+**Changing dimensions requires recreating the vector table.** Basic Memory dimensions the
+vector table on first index and refuses to mix sizes. Switching to a model with a
+different dimension (for example FastEmbed 384 → OpenAI 1536 → Cohere 1024) makes a plain
+`bm reindex` raise an `Embedding dimension mismatch` error. Recreate the table with a full
+rebuild — files are the source of truth, so this re-indexes from disk and re-embeds
+everything:
+
+```bash
+bm reset --reindex
+```
+
+To trial a provider without disturbing your existing index, point Basic Memory at a
+throwaway config + database instead:
+
+```bash
+export BASIC_MEMORY_CONFIG_DIR=/tmp/bm-litellm-trial
+```
+
 ## Provider Setup Examples

 LiteLLM reads provider credentials from the environment. These are the examples
diff --git a/docs/semantic-search.md b/docs/semantic-search.md
index 30fb09d7d..e5665c226 100644
--- a/docs/semantic-search.md
+++ b/docs/semantic-search.md
@@ -99,7 +99,7 @@ All settings are fields on `BasicMemoryConfig` and can be set via environment va
 | Config Field | Env Var | Default | Description |
 |---|---|---|---|
 | `semantic_search_enabled` | `BASIC_MEMORY_SEMANTIC_SEARCH_ENABLED` | Auto (`true` when semantic deps are available) | Enable semantic search. Required before vector/hybrid modes work. |
-| `semantic_embedding_provider` | `BASIC_MEMORY_SEMANTIC_EMBEDDING_PROVIDER` | `"fastembed"` | Embedding provider: `"fastembed"` (local), `"openai"` (API), or `"litellm"` (multi-provider API). |
+| `semantic_embedding_provider` | `BASIC_MEMORY_SEMANTIC_EMBEDDING_PROVIDER` | `"fastembed"` | Embedding provider: `"fastembed"` (local), `"openai"` (API), or `"litellm"` (multi-provider API, **experimental** — advanced users only). |
 | `semantic_embedding_model` | `BASIC_MEMORY_SEMANTIC_EMBEDDING_MODEL` | `"bge-small-en-v1.5"` | Model identifier. Auto-adjusted per provider if left at default. |
 | `semantic_embedding_dimensions` | `BASIC_MEMORY_SEMANTIC_EMBEDDING_DIMENSIONS` | Provider default | Vector dimensions. 384 for FastEmbed, 1536 for OpenAI/LiteLLM OpenAI. Required when using a non-default LiteLLM model. |
 | `semantic_embedding_forward_dimensions` | `BASIC_MEMORY_SEMANTIC_EMBEDDING_FORWARD_DIMENSIONS` | Auto | LiteLLM-only override for whether configured dimensions are sent as a provider-side output-size request. |
@@ -140,6 +140,8 @@ export OPENAI_API_KEY=sk-...

 ### LiteLLM

+> **Experimental — advanced users only.** The LiteLLM provider is experimental and aimed at users comfortable operating remote embedding backends: paid API calls, per-model dimension and input-role configuration, and slower reindexing of large corpora. For most users, FastEmbed (local, default) is recommended. See [LiteLLM Provider](litellm-provider.md) for the caveats and tuning.
+
 Uses the LiteLLM SDK to call embedding models from providers such as OpenAI, Cohere, Azure, Bedrock, NVIDIA NIM, and other LiteLLM-supported backends. Requires the provider's API credentials.

 For the full option reference, provider setup examples, and live validation harness, see [LiteLLM Provider](litellm-provider.md).