Name	Name	Last commit message	Last commit date
parent directory ..
sqlrite_agent	sqlrite_agent
tests	tests
.gitignore	.gitignore
README.md	README.md
pyproject.toml	pyproject.toml

sqlrite-agent — a Python LLM agent with persistent memory in SQLRite

A small CLI chat agent whose entire long-term memory lives in one .sqlrite file on disk. Every turn the agent:

Embeds the user's message.
Hybrid-searches its memory: top-k vector KNN (HNSW) over past messages and summaries, plus keyword recall over the structured facts table.
Injects the recalled context into the system prompt.
Sends prompt + recent turns to an LLM and prints the reply.
Writes the new turn back to the same SQLRite database.

Close the process, reopen it days later — your assistant still knows your dog's name. No Postgres, no Pinecone, no Redis. One file.

Why this example? Single-file embedded storage is the right architecture for a local agent, and SQLRite's HNSW vector index + structured SQL gives you semantic and deterministic recall from one store. This is a place where the database genuinely fits the workload, not just a demo.

Architecture

flowchart LR
    User[/"User input"/] --> Embed["Embedder<br/>(hash / OpenAI / local)"]
    Embed --> Recall["Memory.recall()"]
    Recall -->|vector KNN| Msgs[("messages<br/>HNSW(embedding)")]
    Recall -->|vector KNN| Sums[("summaries<br/>HNSW(embedding)")]
    Recall -->|keyword| Facts[("facts<br/>SQL")]
    Recall --> Prompt["Prompt assembly<br/>(system + recent turns)"]
    Prompt --> LLM["LLM<br/>(Anthropic / echo)"]
    LLM --> Reply[/"Assistant reply"/]
    Reply --> Writeback["Memory.log_message()"]
    Writeback --> Msgs
    User -.-> Writeback

Three tables, one file:

Table	Purpose	Indexed how
`messages`	Every user / assistant turn, plus its 384-dim embedding.	HNSW on `embedding`
`summaries`	Periodic rollups for old context that's too long to inline.	HNSW on `embedding`
`facts`	Structured `(subject, predicate, object)` triples extracted from user turns.	Plain SQL — keyword recall

Install

# 1. Clone the rust_sqlite repo (this example ships inside it).
git clone https://github.com/joaoh82/rust_sqlite
cd rust_sqlite/examples/python-agent

# 2. Create a virtualenv and install the example.
python3 -m venv .venv && source .venv/bin/activate
pip install -e .

# 3. (Optional) Install an LLM provider extra. Without one you get
#    the offline "echo" agent — the recall pipeline still runs.
pip install 'sqlrite-agent[anthropic]'         # default LLM
# pip install 'sqlrite-agent[openai]'           # use OpenAI embeddings too
# pip install 'sqlrite-agent[local-embeddings]' # 384-dim sentence-transformer

The sqlrite Python wheel comes from PyPI automatically (pinned to the 0.10.x release that introduced VECTOR(N) + HNSW indexes).

Run

# Zero config — runs against a fresh in-memory hash embedder and the
# offline echo "LLM". You see the recall pipeline work end-to-end
# without an API key.
python -m sqlrite_agent

# With Anthropic — set ANTHROPIC_API_KEY and run.
export ANTHROPIC_API_KEY=sk-ant-...
python -m sqlrite_agent

# Pick where the DB lives (default: ~/.sqlrite-agent.sqlrite).
python -m sqlrite_agent --db ./my-agent.sqlrite

# Multiple parallel conversations in one DB.
python -m sqlrite_agent --conversation work
python -m sqlrite_agent --conversation personal

# Force a specific embedder.
python -m sqlrite_agent --embedder local       # sentence-transformers
python -m sqlrite_agent --embedder openai      # text-embedding-3-small

CLI commands

While the REPL is running, anything starting with / is a command:

Command	What it does
`/help`	Show all commands.
`/stats`	Counts of messages, summaries, facts.
`/facts`	List every extracted fact.
`/recent`	Last 10 turns in chronological order.
`/recall <query>`	Show what would be recalled for a query, without replying.
`/summarize`	Summarize the last 20 turns into a single `summaries` row.
`/quit`	Exit. `Ctrl-D` also works.

60-second demo script

Run this top-to-bottom to see persistent memory survive a process restart. Uses the zero-key default (hash embedder + echo chat).

# Session 1 — drop some facts, then quit.
$ python -m sqlrite_agent --db agent.sqlrite
sqlrite-agent 0.1.0 — db=agent.sqlrite, ...
  loaded memory: 0 messages, 0 summaries, 0 facts.

you> My dog's name is Mochi.
agent> [echo agent ...]

you> Mochi loves carrots more than treats.
agent> [echo agent ...]

you> I live in Lisbon, Portugal.
agent> [echo agent ...]

you> /facts
  user.dog.name = Mochi
  user.location = Lisbon, Portugal

you> /quit

# Session 2 — fresh process, same DB.
$ python -m sqlrite_agent --db agent.sqlrite
sqlrite-agent 0.1.0 — db=agent.sqlrite, ...
  loaded memory: 6 messages, 0 summaries, 2 facts.

you> What's my dog's name?
  [recalled: 1 facts, 0 summaries, 4 messages]
agent> [echo agent ... — but the recall block above includes
                          user.dog.name = Mochi]

With ANTHROPIC_API_KEY set, the second turn answers "Mochi" instead of the canned echo because the LLM sees the recalled fact in its system prompt.

Open the DB yourself with the SQLRite REPL

The memory file is plain SQLRite — open it from anywhere:

$ cargo install sqlrite-engine   # or grab a binary from GitHub Releases
$ sqlrite agent.sqlrite
SQLRite v0.10.0
sqlrite> SELECT role, content FROM messages ORDER BY id LIMIT 5;
sqlrite> SELECT subject, predicate, object FROM facts;
sqlrite> SELECT id, content
   ...>   FROM messages
   ...>   ORDER BY vec_distance_cosine(embedding, (SELECT embedding FROM messages WHERE id = 1))
   ...>   LIMIT 3;

This is the demo's whole point: the agent's memory is just SQL. You can query it, back it up, copy it between machines, or load it into the Node / Go / WASM SDK without converting anything.

How recall works

Memory.recall(query) runs three searches in parallel and merges the results. Pseudocode:

embedding = embedder.embed(query)
keywords  = query_keywords(query)        # filtered to content words

vector_messages   = SELECT ... FROM messages
                     ORDER BY vec_distance_cosine(embedding, ?)
                     LIMIT k_messages

vector_summaries  = SELECT ... FROM summaries
                     ORDER BY vec_distance_cosine(embedding, ?)
                     LIMIT k_summaries

lexical_messages  = SELECT ... FROM messages
                     WHERE fts_match(content, ?)
                     ORDER BY bm25_score(content, ?) DESC
                     LIMIT k_messages           -- Phase 8, BM25 over the inverted index

facts             = SELECT * FROM facts
                     WHERE subject LIKE ...
                        OR predicate LIKE ...
                        OR object LIKE ...
                     LIMIT k_facts

The vector and lexical message lists are merged in Python (dedupe by id, vector ranking primary) — that's the simplest correct shape for hybrid retrieval: vector finds conceptual neighbors even with zero lexical overlap, and BM25 surfaces exact-term matches the vector embedding might rank too low. See docs/fts.md for the BM25 surface and examples/hybrid-retrieval/ for an example that fuses both into a single ORDER BY arithmetic.

Embedding-provider tradeoffs

Provider	Dependencies	API key	Real semantic recall	First-run friction
`hash` (default)	None — stdlib only	No	Bag-of-words approximation only. Good enough for the demo, mediocre for real RAG.	Zero.
`openai`	`openai` package	`OPENAI_API_KEY`	Excellent. `text-embedding-3-small` constrained to 384 dims.	~30s install.
`local`	`sentence-transformers` (~500 MB with torch)	No	Excellent. `all-MiniLM-L6-v2`, fully offline.	~5 min install.

Swap with --embedder hash | openai | local. The dimension is fixed at 384 to match VECTOR(384) in the schema; if you need a different dim, change DEFAULT_DIM in sqlrite_agent/db.py and start with a fresh DB.

Known simplifications

This is an example, not a production agent. Things v1 punts on:

Memory eviction. No automatic rolling-window or summarize-and-evict loop yet — run /summarize manually when the conversation grows unwieldy.
Fact extraction. Six hand-written regex patterns. A real agent would call the LLM to extract facts so it catches phrasings the regex misses. Easy upgrade: wrap an LLM call in facts.extract_facts.
Single-query hybrid. The agent merges vector hits and BM25 hits in Python. The engine also supports a single SQL query that fuses both into one ORDER BY arithmetic (0.5 * bm25_score(...) + 0.5 * (1.0 - vec_distance_cosine(...))) — see examples/hybrid-retrieval/. The merge approach handles conceptual queries with no token overlap; the single-query approach is tighter when you always want BM25 to pre-filter. Pick per workload.
Concurrency. The agent assumes single-user single-process. The engine supports concurrent reads + a single writer via fs2 advisory locks; running two sqlrite-agent processes against the same DB works, but they won't see each other's in-flight writes until commit.

Development

# Tests run fully offline with the hash embedder and echo chat — no
# API keys, no network.
pip install -e '.[dev]'
pytest

Layout

examples/python-agent/
├── pyproject.toml             # package metadata + pinned sqlrite dep
├── README.md                  # this file
├── sqlrite_agent/
│   ├── __init__.py
│   ├── __main__.py            # python -m sqlrite_agent → cli.main()
│   ├── agent.py               # turn loop, prompt assembly, summarization
│   ├── chat.py                # LLM provider abstraction (Anthropic / Echo)
│   ├── cli.py                 # interactive REPL + slash commands
│   ├── db.py                  # schema, migrations, all SQL
│   ├── embeddings.py          # Embedder abstraction (hash / OpenAI / local)
│   ├── facts.py               # regex-based fact extractor
│   ├── memory.py              # hybrid recall over messages + summaries + facts
│   └── sqlutil.py             # safe SQL-literal inlining
└── tests/                     # offline pytest suite (31 tests)

The agent binds only to the SQLRite Python SDK's documented public surface (sqlrite.connect, Connection, Cursor). It does not reach into internals.

License

MIT — same as the rest of the rust_sqlite repo.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

sqlrite-agent — a Python LLM agent with persistent memory in SQLRite

Architecture

Install

Run

CLI commands

60-second demo script

Open the DB yourself with the SQLRite REPL

How recall works

Embedding-provider tradeoffs

Known simplifications

Development

Layout

License

FilesExpand file tree

python-agent

Directory actions

More options

Directory actions

More options

Latest commit

History

python-agent

Folders and files

parent directory

README.md

sqlrite-agent — a Python LLM agent with persistent memory in SQLRite

Architecture

Install

Run

CLI commands

60-second demo script

Open the DB yourself with the SQLRite REPL

How recall works

Embedding-provider tradeoffs

Known simplifications

Development

Layout

License