A small CLI chat agent whose entire long-term memory lives in one
.sqlrite file on disk. Every turn the agent:
- Embeds the user's message.
- Hybrid-searches its memory: top-k vector KNN (HNSW) over past
messages and summaries, plus keyword recall over the structured
factstable. - Injects the recalled context into the system prompt.
- Sends prompt + recent turns to an LLM and prints the reply.
- Writes the new turn back to the same SQLRite database.
Close the process, reopen it days later — your assistant still knows your dog's name. No Postgres, no Pinecone, no Redis. One file.
Why this example? Single-file embedded storage is the right architecture for a local agent, and SQLRite's HNSW vector index + structured SQL gives you semantic and deterministic recall from one store. This is a place where the database genuinely fits the workload, not just a demo.
flowchart LR
User[/"User input"/] --> Embed["Embedder<br/>(hash / OpenAI / local)"]
Embed --> Recall["Memory.recall()"]
Recall -->|vector KNN| Msgs[("messages<br/>HNSW(embedding)")]
Recall -->|vector KNN| Sums[("summaries<br/>HNSW(embedding)")]
Recall -->|keyword| Facts[("facts<br/>SQL")]
Recall --> Prompt["Prompt assembly<br/>(system + recent turns)"]
Prompt --> LLM["LLM<br/>(Anthropic / echo)"]
LLM --> Reply[/"Assistant reply"/]
Reply --> Writeback["Memory.log_message()"]
Writeback --> Msgs
User -.-> Writeback
Three tables, one file:
| Table | Purpose | Indexed how |
|---|---|---|
messages |
Every user / assistant turn, plus its 384-dim embedding. | HNSW on embedding |
summaries |
Periodic rollups for old context that's too long to inline. | HNSW on embedding |
facts |
Structured (subject, predicate, object) triples extracted from user turns. |
Plain SQL — keyword recall |
# 1. Clone the rust_sqlite repo (this example ships inside it).
git clone https://github.com/joaoh82/rust_sqlite
cd rust_sqlite/examples/python-agent
# 2. Create a virtualenv and install the example.
python3 -m venv .venv && source .venv/bin/activate
pip install -e .
# 3. (Optional) Install an LLM provider extra. Without one you get
# the offline "echo" agent — the recall pipeline still runs.
pip install 'sqlrite-agent[anthropic]' # default LLM
# pip install 'sqlrite-agent[openai]' # use OpenAI embeddings too
# pip install 'sqlrite-agent[local-embeddings]' # 384-dim sentence-transformerThe sqlrite Python wheel comes from PyPI automatically (pinned to the
0.10.x release that introduced VECTOR(N) + HNSW indexes).
# Zero config — runs against a fresh in-memory hash embedder and the
# offline echo "LLM". You see the recall pipeline work end-to-end
# without an API key.
python -m sqlrite_agent
# With Anthropic — set ANTHROPIC_API_KEY and run.
export ANTHROPIC_API_KEY=sk-ant-...
python -m sqlrite_agent
# Pick where the DB lives (default: ~/.sqlrite-agent.sqlrite).
python -m sqlrite_agent --db ./my-agent.sqlrite
# Multiple parallel conversations in one DB.
python -m sqlrite_agent --conversation work
python -m sqlrite_agent --conversation personal
# Force a specific embedder.
python -m sqlrite_agent --embedder local # sentence-transformers
python -m sqlrite_agent --embedder openai # text-embedding-3-smallWhile the REPL is running, anything starting with / is a command:
| Command | What it does |
|---|---|
/help |
Show all commands. |
/stats |
Counts of messages, summaries, facts. |
/facts |
List every extracted fact. |
/recent |
Last 10 turns in chronological order. |
/recall <query> |
Show what would be recalled for a query, without replying. |
/summarize |
Summarize the last 20 turns into a single summaries row. |
/quit |
Exit. Ctrl-D also works. |
Run this top-to-bottom to see persistent memory survive a process
restart. Uses the zero-key default (hash embedder + echo chat).
# Session 1 — drop some facts, then quit.
$ python -m sqlrite_agent --db agent.sqlrite
sqlrite-agent 0.1.0 — db=agent.sqlrite, ...
loaded memory: 0 messages, 0 summaries, 0 facts.
you> My dog's name is Mochi.
agent> [echo agent ...]
you> Mochi loves carrots more than treats.
agent> [echo agent ...]
you> I live in Lisbon, Portugal.
agent> [echo agent ...]
you> /facts
user.dog.name = Mochi
user.location = Lisbon, Portugal
you> /quit
# Session 2 — fresh process, same DB.
$ python -m sqlrite_agent --db agent.sqlrite
sqlrite-agent 0.1.0 — db=agent.sqlrite, ...
loaded memory: 6 messages, 0 summaries, 2 facts.
you> What's my dog's name?
[recalled: 1 facts, 0 summaries, 4 messages]
agent> [echo agent ... — but the recall block above includes
user.dog.name = Mochi]With ANTHROPIC_API_KEY set, the second turn answers "Mochi" instead
of the canned echo because the LLM sees the recalled fact in its
system prompt.
The memory file is plain SQLRite — open it from anywhere:
$ cargo install sqlrite-engine # or grab a binary from GitHub Releases
$ sqlrite agent.sqlrite
SQLRite v0.10.0
sqlrite> SELECT role, content FROM messages ORDER BY id LIMIT 5;
sqlrite> SELECT subject, predicate, object FROM facts;
sqlrite> SELECT id, content
...> FROM messages
...> ORDER BY vec_distance_cosine(embedding, (SELECT embedding FROM messages WHERE id = 1))
...> LIMIT 3;This is the demo's whole point: the agent's memory is just SQL. You can query it, back it up, copy it between machines, or load it into the Node / Go / WASM SDK without converting anything.
Memory.recall(query) runs three searches in parallel and merges the
results. Pseudocode:
embedding = embedder.embed(query)
keywords = query_keywords(query) # filtered to content words
vector_messages = SELECT ... FROM messages
ORDER BY vec_distance_cosine(embedding, ?)
LIMIT k_messages
vector_summaries = SELECT ... FROM summaries
ORDER BY vec_distance_cosine(embedding, ?)
LIMIT k_summaries
lexical_messages = SELECT ... FROM messages
WHERE fts_match(content, ?)
ORDER BY bm25_score(content, ?) DESC
LIMIT k_messages -- Phase 8, BM25 over the inverted index
facts = SELECT * FROM facts
WHERE subject LIKE ...
OR predicate LIKE ...
OR object LIKE ...
LIMIT k_factsThe vector and lexical message lists are merged in Python (dedupe by
id, vector ranking primary) — that's the simplest correct shape for
hybrid retrieval: vector finds conceptual neighbors even with zero
lexical overlap, and BM25 surfaces exact-term matches the vector
embedding might rank too low. See docs/fts.md
for the BM25 surface and examples/hybrid-retrieval/
for an example that fuses both into a single ORDER BY arithmetic.
| Provider | Dependencies | API key | Real semantic recall | First-run friction |
|---|---|---|---|---|
hash (default) |
None — stdlib only | No | Bag-of-words approximation only. Good enough for the demo, mediocre for real RAG. | Zero. |
openai |
openai package |
OPENAI_API_KEY |
Excellent. text-embedding-3-small constrained to 384 dims. |
~30s install. |
local |
sentence-transformers (~500 MB with torch) |
No | Excellent. all-MiniLM-L6-v2, fully offline. |
~5 min install. |
Swap with --embedder hash | openai | local. The dimension is fixed
at 384 to match VECTOR(384) in the schema; if you need a different
dim, change DEFAULT_DIM in sqlrite_agent/db.py and start with a
fresh DB.
This is an example, not a production agent. Things v1 punts on:
- Memory eviction. No automatic rolling-window or summarize-and-evict
loop yet — run
/summarizemanually when the conversation grows unwieldy. - Fact extraction. Six hand-written regex patterns. A real agent
would call the LLM to extract facts so it catches phrasings the
regex misses. Easy upgrade: wrap an LLM call in
facts.extract_facts. - Single-query hybrid. The agent merges vector hits and BM25 hits
in Python. The engine also supports a single SQL query that fuses
both into one
ORDER BYarithmetic (0.5 * bm25_score(...) + 0.5 * (1.0 - vec_distance_cosine(...))) — seeexamples/hybrid-retrieval/. The merge approach handles conceptual queries with no token overlap; the single-query approach is tighter when you always want BM25 to pre-filter. Pick per workload. - Concurrency. The agent assumes single-user single-process. The
engine supports concurrent reads + a single writer via fs2 advisory
locks; running two
sqlrite-agentprocesses against the same DB works, but they won't see each other's in-flight writes until commit.
# Tests run fully offline with the hash embedder and echo chat — no
# API keys, no network.
pip install -e '.[dev]'
pytestexamples/python-agent/
├── pyproject.toml # package metadata + pinned sqlrite dep
├── README.md # this file
├── sqlrite_agent/
│ ├── __init__.py
│ ├── __main__.py # python -m sqlrite_agent → cli.main()
│ ├── agent.py # turn loop, prompt assembly, summarization
│ ├── chat.py # LLM provider abstraction (Anthropic / Echo)
│ ├── cli.py # interactive REPL + slash commands
│ ├── db.py # schema, migrations, all SQL
│ ├── embeddings.py # Embedder abstraction (hash / OpenAI / local)
│ ├── facts.py # regex-based fact extractor
│ ├── memory.py # hybrid recall over messages + summaries + facts
│ └── sqlutil.py # safe SQL-literal inlining
└── tests/ # offline pytest suite (31 tests)
The agent binds only to the SQLRite Python SDK's documented public
surface (sqlrite.connect, Connection, Cursor). It does not reach
into internals.
MIT — same as the rest of the rust_sqlite repo.