
docs(core): mark LiteLLM provider experimental #899

Merged: phernandez merged 1 commit into main from docs/litellm-experimental on Jun 8, 2026

@phernandez

Member

Summary

Marks the LiteLLM embedding provider (#809) as experimental / advanced-users-only in the docs, and documents two operational findings surfaced during a live OpenAI + Cohere smoke test against a real notes corpus.

What changed

docs/litellm-provider.md

  • Experimental banner at the top — calls out paid/networked API calls, per-model dimension and input-role configuration, and slow reindexing; points to local FastEmbed as the recommended default.
  • New Reindexing with a remote provider section:
    • semantic_embedding_sync_batch_size (default 2), not semantic_embedding_batch_size, is the throughput lever — raise both before reindexing a real corpus, or a full reindex runs tens of seconds per note.
    • Switching embedding dimensions requires recreating the vector table via bm reset --reindex (a plain bm reindex raises Embedding dimension mismatch); BASIC_MEMORY_CONFIG_DIR lets you trial a provider in isolation.
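
The reindexing notes above can be sketched as a sandboxed shell setup. This is a minimal sketch under stated assumptions, not the documented procedure: the environment variable names for the two batch sizes are guessed from the config keys with a `BASIC_MEMORY_` prefix, and the pairing of the example values 32 and 64 with each setting is also an assumption. Only `BASIC_MEMORY_CONFIG_DIR`, the `/tmp/bm-litellm-trial` path, and the `bm` commands in the comments come from this PR.

```shell
# Point Basic Memory at a throwaway config directory so the real index is untouched.
export BASIC_MEMORY_CONFIG_DIR=/tmp/bm-litellm-trial
mkdir -p "$BASIC_MEMORY_CONFIG_DIR"

# ASSUMPTION: these variable names mirror the config keys and are not confirmed here.
# ASSUMPTION: 32 and 64 are the example values from the docs; which setting gets which
# value is a guess. Adjust both to stay within your provider's rate limits.
export BASIC_MEMORY_SEMANTIC_EMBEDDING_SYNC_BATCH_SIZE=32
export BASIC_MEMORY_SEMANTIC_EMBEDDING_BATCH_SIZE=64

# Provider credentials and model settings would be set here (omitted).

# Next step, run by hand once the provider variables are set:
#   bm reindex            (fresh sandbox)
#   bm reset --reindex    (existing index whose embedding dimension changed)
echo "sandbox: $BASIC_MEMORY_CONFIG_DIR"
```

Leaving the `bm` commands as comments keeps the block safe to paste; the sync batch size default of 2 is the setting most likely to dominate reindex time with a remote provider.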

docs/semantic-search.md

  • Experimental callout in the LiteLLM section.
  • Inline (experimental — advanced users only) in the provider config reference table.

Validation

Findings are grounded in a live smoke test (isolated BASIC_MEMORY_CONFIG_DIR sandboxes, real keys):

  • OpenAI text-embedding-3-small → float[1536] table, hybrid search returned the exact-right specs at score 1.0, vector scores non-collapsed.
  • Cohere embed-english-v3.0 → float[1024] table, asymmetric search_document/search_query input types auto-applied (confirmed in runtime log), 16/16 entities embedded, hybrid + vector sane.
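
The two table widths above (1536 and 1024) are exactly the case where a plain reindex is not enough. As an illustration only, the rule can be written as a small shell helper; `reindex_command_for` is a hypothetical function invented for this sketch and is not part of the `bm` CLI. It only prints which command the docs would call for and runs nothing.

```shell
# Hypothetical helper (not part of bm): print the reindex command implied by
# the old and new embedding dimensions.
reindex_command_for() {
  old_dim="$1"
  new_dim="$2"
  if [ "$old_dim" -eq "$new_dim" ]; then
    # Same dimension: the existing vector table can be reused.
    echo "bm reindex"
  else
    # Different dimension: the vector table must be recreated, otherwise
    # a plain reindex raises "Embedding dimension mismatch".
    echo "bm reset --reindex"
  fi
}

reindex_command_for 1536 1024   # prints: bm reset --reindex
reindex_command_for 1536 1536   # prints: bm reindex
```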

Docs-only change; no code touched.

🤖 Generated with Claude Code

…exing

Add an experimental / advanced-users-only banner to the LiteLLM provider page
and the semantic-search provider reference. Capture two operational findings
from live OpenAI + Cohere smoke testing:

- semantic_embedding_sync_batch_size (default 2) — not batch_size — is the
  throughput lever for remote providers; raise both before reindexing a real
  corpus, or a full reindex runs tens of seconds per note.
- switching embedding dimensions requires recreating the vector table via
  `bm reset --reindex` (a plain reindex raises "Embedding dimension mismatch");
  use BASIC_MEMORY_CONFIG_DIR to trial a provider without touching the main index.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: phernandez <paul@basicmachines.co>
@github-actions

github-actions Bot commented Jun 7, 2026

Contributor

Claude finished @phernandez's task in 1m 18s.


PR Review: docs(core): mark LiteLLM provider experimental

Tasks:

  • Gather context and read changed files
  • Review against team checklist
  • Post final review

Checklist Results

Code Quality & Standards — N/A (docs-only PR)

Testing & Documentation

Security & Performance

  • No secrets or credentials embedded
  • Performance guidance (batch size tuning) is concrete and correct

Compatibility — N/A (docs-only)


Review

This is a clean, well-scoped docs PR. The content is grounded, the warnings are appropriately prominent, and the two operational findings (sync batch size vs embed batch size, dimension mismatch requiring bm reset --reindex) are genuinely useful and not obvious from the code alone.

Positive observations:

  • The experimental banner is placed correctly at the top of litellm-provider.md — users can't miss it before reaching any Quick Start section.
  • The distinction between semantic_embedding_sync_batch_size (sync pipeline throughput) vs semantic_embedding_batch_size (per-request chunks) is subtle but important. The prose calls it out clearly.
  • The BASIC_MEMORY_CONFIG_DIR=/tmp/bm-litellm-trial isolation tip is a nice touch for users evaluating providers without risking their real index.
  • The bm reset --reindex vs bm reindex distinction for dimension changes is exactly the kind of gotcha that prevents user frustration.

Minor issues / suggestions:

  1. semantic-search.md config table is missing two entries (semantic_embedding_sync_batch_size, semantic_embedding_request_concurrency). The new reindexing section in litellm-provider.md correctly points at sync_batch_size, but users reading semantic-search.md's config reference won't see that lever listed. This is a pre-existing gap the PR could have opportunistically closed, but it's not a blocker.

  2. BASIC_MEMORY_CONFIG_DIR isolation tip (docs/litellm-provider.md:143-148): The snippet only shows the export, not what comes next (configure provider env vars + bm reindex). A one-liner comment like # then set your provider vars and run: bm reindex --embeddings would complete the mental model for readers skimming code blocks. Very minor.

  3. Example values (32/64) are in a bash block without qualification (docs/litellm-provider.md:124-127): These look like canonical recommendations. A brief note like # adjust to stay within your provider's rate limits inline would clarify they're starting points, not magic numbers. The prose already mentions provider limits once, so this is a style nit only.


Summary

Docs are accurate, well-structured, and address real operational friction. The checklist items that apply are all satisfied. The minor gaps noted above are non-blocking and can be addressed in follow-ups or in this PR if the author prefers.

Verdict: ✅ Approved — ready to merge as-is.

@phernandez phernandez merged commit 07cb7a6 into main Jun 8, 2026
25 checks passed
@phernandez phernandez deleted the docs/litellm-experimental branch June 8, 2026 01:13