Merged
46 changes: 46 additions & 0 deletions docs/litellm-provider.md
@@ -8,6 +8,14 @@ Cohere, Bedrock, NVIDIA NIM, and other LiteLLM-supported embedding providers.
Use this page when you want to try a non-default embedding model, validate a provider,
or tune LiteLLM-specific settings.

> **Experimental — advanced users only.** The LiteLLM provider is experimental and
> intended for users who are comfortable operating remote embedding backends. It makes
> paid, networked API calls and requires per-model dimension and input-role configuration.
> Reindexing a real corpus can also be slow and consume provider quota (see
> [Reindexing with a remote provider](#reindexing-with-a-remote-provider)). For most
> users, the default local **FastEmbed** provider is the recommended choice. Use LiteLLM
> only if you know what you're doing.

## Quick Start

The default LiteLLM model is OpenAI `text-embedding-3-small` through the LiteLLM
@@ -101,6 +109,44 @@ those changes:
bm reindex --embeddings
```

## Reindexing with a remote provider

Embedding a real corpus through a network API is far slower than local FastEmbed, and
the defaults are tuned for the local case. Two things to know before you run a full
reindex.

**Raise the sync batch size.** `semantic_embedding_sync_batch_size` defaults to `2`, and
it, not `semantic_embedding_batch_size`, governs throughput on the sync pipeline. With
the default, a full reindex can take tens of seconds *per note* against a remote provider.
Raising both settings, as in the example below, can turn a multi-minute (or longer)
reindex into well under a minute for the same corpus:

```bash
export BASIC_MEMORY_SEMANTIC_EMBEDDING_SYNC_BATCH_SIZE=32
export BASIC_MEMORY_SEMANTIC_EMBEDDING_BATCH_SIZE=64
```

Stay within the provider's per-request size and rate limits — Cohere v3, for example,
accepts up to 96 inputs per embedding request.
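
As a rough illustration of staying under a provider cap, the sketch below clamps a desired batch size to the 96-input limit quoted above and estimates how many requests a corpus would need. It assumes one batch maps to one embedding request, which this page implies but does not state outright; `desired_batch`, `provider_max_inputs`, and `chunks` are hypothetical values chosen for the example.

```bash
# Hypothetical inputs; only the BASIC_MEMORY_* variable name comes from the docs above.
provider_max_inputs=96   # Cohere v3 per-request cap quoted above
desired_batch=128        # illustrative value, deliberately above the cap
chunks=1000              # illustrative corpus size, in chunks to embed

# Clamp the batch size so a single request never exceeds the provider cap.
if [ "$desired_batch" -gt "$provider_max_inputs" ]; then
  batch="$provider_max_inputs"
else
  batch="$desired_batch"
fi

export BASIC_MEMORY_SEMANTIC_EMBEDDING_BATCH_SIZE="$batch"

# Ceiling division: rough number of requests, assuming one batch per request.
requests=$(( (chunks + batch - 1) / batch ))
echo "batch=$batch requests=$requests"
```

The request count is only an estimate for judging quota use and rate-limit pressure before a full reindex.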

**Changing dimensions requires recreating the vector table.** Basic Memory fixes the
vector table's dimension on first index and refuses to mix sizes. Switching to a model with
a different dimension (for example, from FastEmbed at 384 to OpenAI at 1536 or Cohere at
1024) makes a plain `bm reindex` raise an `Embedding dimension mismatch` error. Recreate
the table with a full rebuild. Files are the source of truth, so this re-indexes from disk
and re-embeds everything:

```bash
bm reset --reindex
```
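
A switch to a 1024-dimension Cohere model might be configured along the following lines before running the rebuild. This is a sketch, not a verified recipe: the variable names are taken from the settings table in `docs/semantic-search.md`, while the model string is an assumption based on LiteLLM's `provider/model` naming and should be checked against the provider setup examples.

```bash
# Variable names come from the semantic-search settings table.
export BASIC_MEMORY_SEMANTIC_EMBEDDING_PROVIDER=litellm
# Assumed model identifier (LiteLLM-style "provider/model"); verify before use.
export BASIC_MEMORY_SEMANTIC_EMBEDDING_MODEL=cohere/embed-english-v3.0
# Cohere size quoted above; must match the model's actual output dimension.
export BASIC_MEMORY_SEMANTIC_EMBEDDING_DIMENSIONS=1024

# The existing table was built at a different size, so follow up with the full rebuild:
#   bm reset --reindex
```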

To trial a provider without disturbing your existing index, point Basic Memory at a
throwaway config directory and database instead:

```bash
export BASIC_MEMORY_CONFIG_DIR=/tmp/bm-litellm-trial
```
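
If a fixed path under `/tmp` might collide with an earlier trial, one option is to create a uniquely named directory first. The sketch below assumes only that `BASIC_MEMORY_CONFIG_DIR` accepts any writable directory; `trial_dir` is a hypothetical variable name.

```bash
# Create a fresh, uniquely named directory so the trial never reuses old state.
trial_dir="$(mktemp -d "${TMPDIR:-/tmp}/bm-litellm-trial.XXXXXX")"
export BASIC_MEMORY_CONFIG_DIR="$trial_dir"
echo "Trial config dir: $BASIC_MEMORY_CONFIG_DIR"

# When finished, unset the variable and delete the directory:
#   unset BASIC_MEMORY_CONFIG_DIR && rm -rf "$trial_dir"
```

Unsetting the variable afterwards returns Basic Memory to the original config and index.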

## Provider Setup Examples

LiteLLM reads provider credentials from the environment. These are the examples
4 changes: 3 additions & 1 deletion docs/semantic-search.md
@@ -99,7 +99,7 @@ All settings are fields on `BasicMemoryConfig` and can be set via environment va
| Config Field | Env Var | Default | Description |
|---|---|---|---|
| `semantic_search_enabled` | `BASIC_MEMORY_SEMANTIC_SEARCH_ENABLED` | Auto (`true` when semantic deps are available) | Enable semantic search. Required before vector/hybrid modes work. |
| `semantic_embedding_provider` | `BASIC_MEMORY_SEMANTIC_EMBEDDING_PROVIDER` | `"fastembed"` | Embedding provider: `"fastembed"` (local), `"openai"` (API), or `"litellm"` (multi-provider API). |
| `semantic_embedding_provider` | `BASIC_MEMORY_SEMANTIC_EMBEDDING_PROVIDER` | `"fastembed"` | Embedding provider: `"fastembed"` (local), `"openai"` (API), or `"litellm"` (multi-provider API, **experimental** — advanced users only). |
| `semantic_embedding_model` | `BASIC_MEMORY_SEMANTIC_EMBEDDING_MODEL` | `"bge-small-en-v1.5"` | Model identifier. Auto-adjusted per provider if left at default. |
| `semantic_embedding_dimensions` | `BASIC_MEMORY_SEMANTIC_EMBEDDING_DIMENSIONS` | Provider default | Vector dimensions. 384 for FastEmbed, 1536 for OpenAI/LiteLLM OpenAI. Required when using a non-default LiteLLM model. |
| `semantic_embedding_forward_dimensions` | `BASIC_MEMORY_SEMANTIC_EMBEDDING_FORWARD_DIMENSIONS` | Auto | LiteLLM-only override for whether configured dimensions are sent as a provider-side output-size request. |
@@ -140,6 +140,8 @@ export OPENAI_API_KEY=sk-...

### LiteLLM

> **Experimental — advanced users only.** The LiteLLM provider is experimental and aimed at users comfortable operating remote embedding backends: paid API calls, per-model dimension and input-role configuration, and slower reindexing of large corpora. For most users, FastEmbed (local, default) is recommended. See [LiteLLM Provider](litellm-provider.md) for the caveats and tuning.

Uses the LiteLLM SDK to call embedding models from providers such as OpenAI, Cohere, Azure, Bedrock, NVIDIA NIM, and other LiteLLM-supported backends. Requires the provider's API credentials.
For the full option reference, provider setup examples, and live validation harness, see [LiteLLM Provider](litellm-provider.md).
