feat(core): add observation category filter to search by phernandez · Pull Request #908 · basicmachines-co/basic-memory

phernandez · 2026-06-07T22:54:01Z

Summary

Adds a categories filter to search so observation results can be scoped to an exact category, e.g. search_notes(..., entity_types=["observation"], categories=["requirement"]) returns only observations whose category is exactly requirement — not every row mentioning the word.

Previously entity_types=["observation"] only narrowed to the observation row type, so a text search for "requirement" still surfaced [decision] observations whose body merely mentions the word. This mirrors the existing entity_types/note_types filter end-to-end through schema → service → repository (FTS, vector, and hybrid modes).

Also fixes the misleading category:observation token that the search_notes docstrings advertised but that was never implemented in code (there is no token parsing anywhere) — those lines now show the real categories=[...] usage.

What changed

schemas/search.py — SearchQuery.categories: Optional[List[str]], included in no_criteria().
services/search_service.py — threaded through _PreparedSearchQuery, _prepare_query, has_criteria, _prepared_has_filters, _search_repository, _count_repository.
repository/search_repository.py (Protocol) + search_repository_base.py — added to abstract search/count, _dispatch_retrieval_mode, _search_vector_only (incl. the filter_requested re-scan), and _search_hybrid.
repository/sqlite_search_repository.py & postgres_search_repository.py — added a parameterized search_index.category IN (:category_0, ...) condition in _build_fts_query_parts, threaded through search/count.
mcp/tools/search.py — new categories param (with category alias, coerce_list), wired into SearchQuery and the search_all_projects fan-out; misleading category: docstring lines fixed.
api/v2/routers/search_router.py — categories included in the has_filters span signal. No router signature change needed (it passes SearchQuery.model_dump()).

Testing (commands + results)

Integration-first, both backends.

uv run pytest tests/repository/test_search_repository.py tests/repository/test_hybrid_fusion.py tests/repository/test_vector_threshold.py tests/repository/test_vector_pagination.py tests/repository/test_semantic_search_base.py tests/services/test_search_service.py tests/mcp/test_tool_search.py -q → 189 passed (SQLite)
BASIC_MEMORY_TEST_POSTGRES=1 uv run pytest tests/repository/test_postgres_search_repository.py tests/services/test_search_service.py tests/mcp/test_tool_search.py -q → 127 passed (Postgres, testcontainers)
uv run ruff check . → clean on changed files; uv run ruff format --check → clean
uv run ty check src tests test-int → All checks passed!

New tests:

test_search_categories_exact_match (SQLite) / test_postgres_search_categories_exact_match (Postgres): a [decision] observation that mentions "requirement" is excluded when categories=["requirement"]; standalone filter + count() agree; multi-category union works.
test_search_categories_filter / test_search_categories_only_is_not_no_criteria (service): propagation through _prepare_query/has_criteria, count agrees with search, no_criteria() correctness.
test_search_with_categories_filter (MCP tool): end-to-end via search_notes (FTS + exact-count path), decision-scoped search does not leak requirement rows.

Risk

Low. Purely additive optional parameter; existing callers (context_service, routers, clients) are unaffected since they use keyword args and categories defaults to None. No schema/migration change — the category column is already indexed.

Closes #430

🤖 Generated with Claude Code

Add a `categories` filter that scopes observation results to exact category matches, mirroring the existing entity_types/note_types filter end-to-end. Previously `entity_types=["observation"]` only narrowed to the observation row type, so a text search for "requirement" still returned [decision] observations whose body merely mentions the word. The new `categories` parameter adds a parameterized `search_index.category IN (...)` predicate so only rows whose indexed category matches exactly survive (in FTS, vector, and hybrid modes). The search_notes docstrings advertised a 'category:observation' token that was never implemented (no token parsing in code); those misleading lines are removed and replaced with the real `categories=[...]` usage. Plumbing: - SearchQuery gains `categories` (included in no_criteria()). - SearchService threads it through _PreparedSearchQuery, has_criteria, _prepared_has_filters, _search_repository, and _count_repository. - SearchRepository protocol + base abstract search/count, plus _search_vector_only and _search_hybrid (filter re-scan path) carry the new param. - SQLite and Postgres _build_fts_query_parts add the parameterized IN condition. - search_notes MCP tool exposes `categories` (with `category` alias) and wires it into SearchQuery and the search_all_projects fan-out. Tests (integration-first, both backends): - tests/repository/test_search_repository.py + test_postgres_search_repository.py: exact-category match excludes a [decision] obs that mentions "requirement". - tests/services/test_search_service.py: categories propagates through _prepare_query/has_criteria and count agrees with search. - tests/mcp/test_tool_search.py: end-to-end via the MCP tool (FTS + count path). Closes #430 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: phernandez <paul@basicmachines.co>

github-actions · 2026-06-07T22:56:46Z

Claude finished @phernandez's task in 2m 46s —— View job

PR Review: feat(core): add observation category filter to search

Tasks

Gather context and understand the request
Read changed source files
Read changed test files
Review against team checklist
Post comprehensive review

Code Quality & Standards

Item	Status
Follows Basic Memory coding conventions (CLAUDE.md)	✅
Python 3.12+ type annotations and async patterns	✅
SQLAlchemy 2.0 best practices	✅
FastAPI and Typer conventions followed	✅
100-character line length maintained	✅
No commented-out code blocks	✅

Testing & Documentation

Item	Status
Unit tests for new functions/methods	✅
Integration tests for new MCP tools	✅ SQLite + Postgres
Test coverage for edge cases	✅ exact match, standalone filter, multi-category union, no_criteria check
100% test coverage maintained	✅
Documentation updated (docstrings, examples)	✅ Thorough inline docs
CLAUDE.md updated	N/A (no convention changes)

Basic Memory Architecture

Item	Status
MCP tools follow atomic, composable design	✅
No Alembic migration needed	✅ `category` column already indexed
Preserves local-first architecture	✅
Knowledge graph consistency	✅
Parameterized SQL (defense-in-depth)	✅ well-commented

Security & Performance

Item	Status
No hardcoded secrets	✅
Input validation (SQL injection defense)	✅ fully parameterized placeholders
Proper error handling	✅
Performance	✅ no new indexes, purely additive

Compatibility

Item	Status
Windows-compatible file paths	N/A
No emojis/unicode in console/log output	✅

Findings

🔴 Bug: `categories` filter silently returns zero results when `entity_types` is omitted in MCP tool

File: src/basic_memory/mcp/tools/search.py:1006-1007

When a caller passes categories=["requirement"] to search_notes without explicitly setting entity_types=["observation"], the tool hits this code path:

# no_criteria() is False here (categories is set) — correct
if search_query.no_criteria():  ...

# But entity_types is still empty, so the default kicks in:
if not search_query.entity_types:
    search_query.entity_types = [SearchItemType("entity")]  # <-- overrides intent

The resulting SQL becomes WHERE type = 'entity' AND category IN ('requirement'), which always returns zero rows because only observations have non-null categories. The user gets an empty result set with no error or warning.

The no-criteria rejection message explicitly lists categories as a valid standalone criterion:

"Please provide at least one of: `query`, `metadata_filters`, `tags`, `status`, `note_types`, `entity_types`, `categories`, or `after_date`."

...which means users following the hint will hit this silent failure.

Suggested fix:

if not search_query.entity_types:
    # Trigger: categories filter is set without an explicit entity_types.
    # Why: only observations have non-null categories; defaulting to entity would
    #      always produce zero results and mislead the caller.
    # Outcome: implicitly scope to observations when categories are requested.
    if search_query.categories:
        search_query.entity_types = [SearchItemType("observation")]
    else:
        search_query.entity_types = [SearchItemType("entity")]

Fix this →

🟡 Missing test: `search_notes(categories=[...])` without `entity_types` in MCP tool

tests/mcp/test_tool_search.py:374 always pairs categories with entity_types=["observation"]. There is no test covering the standalone-categories path through the MCP tool. If the fix above is applied, a test like this should be added:

response = await search_notes(
    project=test_project.name,
    query="requirement",
    search_type="text",
    categories=["requirement"],  # no entity_types — should still return observations
    output_format="json",
)
assert all(r["category"] == "requirement" for r in response["results"])

🟢 Well-implemented: parameterized category filter

The search_index.category IN (:category_0, ...) pattern in both sqlite_search_repository.py:804-810 and postgres_search_repository.py is exactly right — consistent with how search_item_types and note_types are handled. The "defense-in-depth" comment is appreciated.

🟢 Well-implemented: `no_criteria()` guard

schemas/search.py:90 correctly adds categories_is_empty to no_criteria(), and test_search_categories_only_is_not_no_criteria tests the boundary case. This prevents the tool from short-circuiting a categories-only search.

🟢 Threaded cleanly through all retrieval modes

categories propagates through FTS, vector-only (_search_vector_only), and hybrid (_search_hybrid) paths in search_repository_base.py. The filter_requested re-scan in _search_vector_only:1978-2003 correctly includes categories, meaning vector results are intersected with the exact-category filter before returning.

🟢 Docstring fixes are accurate

The removal of the category:observation token from the docstrings (which was never implemented) is a clean correction. The new categories=["requirement"] examples in the tool docstring accurately describe the feature.

Summary

The implementation is solid end-to-end — correct SQL, consistent threading through all retrieval modes, good test coverage on SQLite + Postgres, and clean docstrings. One meaningful behavioral bug to address before merging: search_notes(categories=["requirement"]) without entity_types silently returns zero results due to the default entity_types=["entity"] override. The fix is a 3-line conditional at search.py:1006.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2df4e737fd

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-07T22:56:59Z

+                if categories:
+                    search_query.categories = categories


Don't force entity results when filtering categories

When an MCP caller supplies categories without also supplying entity_types, this assignment makes SearchQuery.no_criteria() pass, but the later default still adds entity_types=[entity]. The repository then searches with both type = 'entity' and category IN (...); because entity rows have NULL categories, category-only searches such as search_notes(project="p", categories=["requirement"]) return no results even though the schema/service/repository now support categories as a standalone criterion and the no-criteria message advertises categories as sufficient. Infer observation or skip the entity default when a category filter is present.

Useful? React with 👍 / 👎.

search_notes defaults entity_types to 'entity' when unset (issue #31), but categories only exist on observations — so search_notes(categories=[...]) without entity_types ANDed the category against entity rows (NULL category) and returned nothing. When categories is supplied without an explicit entity_types, default to observations instead. Adds a regression test for the categories-only path (Codex review on #908). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: phernandez <paul@basicmachines.co>

github-actions · 2026-06-07T23:22:41Z

Claude finished @phernandez's task in 1m 29s —— View job

PR Review: feat(core): add observation category filter to search

Tasks

Gather context and understand the request
Read the fix commit (6dd61e5) — verify the categories/entity_types default behavior
Check if a missing test was added for standalone categories path
Review full checklist against changed files
Post comprehensive review

Code Quality & Standards

Item	Status
Follows Basic Memory coding conventions (CLAUDE.md)	✅
Python 3.12+ type annotations and async patterns	✅
SQLAlchemy 2.0 best practices	✅
FastAPI and Typer conventions followed	✅
100-character line length maintained	✅
No commented-out code blocks	✅

Testing & Documentation

Item	Status
Unit tests for new functions/methods	✅
Integration tests (SQLite + Postgres)	✅
Edge case: exact match, standalone filter, multi-category union	✅
Regression test: categories without entity_types	✅ Added in fix commit
100% test coverage maintained	✅
Docstrings fixed (removed never-implemented `category:` token)	✅
CLAUDE.md updated	N/A

Basic Memory Architecture

Item	Status
MCP tools follow atomic, composable design	✅
No Alembic migration needed	✅ `category` column already indexed
Threaded through all retrieval modes (FTS, vector, hybrid)	✅
Knowledge graph consistency preserved	✅
Parameterized SQL (defense-in-depth)	✅

Security & Performance

Item	Status
No hardcoded secrets	✅
SQL injection defense	✅ fully parameterized placeholders with `category_{idx}`
Proper error handling	✅
No unnecessary indexes	✅ purely additive

Compatibility

Item	Status
No emojis/unicode in console/log output	✅

Findings

✅ Bug fixed: `categories` without `entity_types` no longer silently returns zero results

The 🔴 bug flagged in the prior review (and independently by Codex) has been addressed in commit 6dd61e5. The fix at search.py:1006-1017 is correct:

if not search_query.entity_types:
    # Trigger: a category filter was supplied without an explicit entity_types.
    # Why: categories only exist on observations — defaulting to "entity"
    #      (whose rows have NULL category) would AND a category filter against
    #      entity rows and return nothing, defeating a category-only search.
    # Outcome: scope the implicit default to observations so
    #          search_notes(categories=[...]) returns the matching bullets.
    if search_query.categories:
        search_query.entity_types = [SearchItemType("observation")]
    else:
        search_query.entity_types = [SearchItemType("entity")]

The Trigger/Why/Outcome comment style matches CLAUDE.md's literate programming convention exactly.

✅ Regression test added

tests/mcp/test_tool_search.py:425 — test_search_categories_without_entity_types_returns_observations covers the previously untested path: search_notes(categories=["requirement"]) without entity_types, asserting:

Results are non-empty
All results have type == "observation"
All results have category == "requirement"

This precisely tests the behavioral boundary introduced by the fix.

✅ SQL filter implementation is correct

sqlite_search_repository.py:798-810 and the Postgres counterpart both use search_index.category IN (:category_0, ...) with a per-index param map — consistent with how search_item_types and note_types are handled, and immune to injection.

✅ `no_criteria()` guard is correct

schemas/search.py:90 — categories_is_empty = not self.categories is included in the guard, and the service-layer test test_search_categories_only_is_not_no_criteria validates this boundary.

✅ Docstring accuracy

The removed category:observation token (which was never implemented) and new categories=["requirement"] examples are accurate and match the actual behavior. No documentation drift.

Summary

Both commits together deliver a clean, complete feature: additive categories filter threaded end-to-end through schema → service → repository (FTS, vector, and hybrid modes) on SQLite and Postgres, with the implicit-default bug squashed and regression-tested. Ready to merge.

… param Adding the categories param to search_notes changed its stable signature and added a category_filter_count telemetry attribute; update test_tool_contracts.py and test_tool_telemetry.py expectations accordingly. These suites weren't run in the original PR check. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: phernandez <paul@basicmachines.co>

github-actions · 2026-06-08T00:17:55Z

Claude finished @phernandez's task in 1m 27s —— View job

PR Review: feat(core): add observation category filter to search

Tasks

Gather context and understand the request
Review new commit (ca24c05) — tool contract + telemetry for categories param
Read changed source files and verify implementation
Review against team checklist
Post comprehensive review

Code Quality & Standards

Item	Status
Follows Basic Memory coding conventions (CLAUDE.md)	✅
Python 3.12+ type annotations and async patterns	✅
SQLAlchemy 2.0 best practices	✅
FastAPI and Typer conventions followed	✅
100-character line length maintained	✅
No commented-out code blocks	✅

Testing & Documentation

Item	Status
Unit tests for new functions/methods	✅
Integration tests (SQLite + Postgres)	✅
Edge cases: exact match, standalone filter, multi-category union	✅
Regression test: `categories` without `entity_types`	✅ `test_search_categories_without_entity_types_returns_observations`
Tool contract test updated	✅ `ca24c05` — `"categories"` added to `EXPECTED_TOOL_SIGNATURES["search_notes"]`
Telemetry test updated	✅ `ca24c05` — `"category_filter_count": 0` added to expected span attrs
100% test coverage maintained	✅
Docstrings fixed (removed never-implemented `category:` token)	✅
CLAUDE.md updated	N/A

Basic Memory Architecture

Item	Status
MCP tools follow atomic, composable design	✅
No Alembic migration needed	✅ `category` column already indexed
Threaded through all retrieval modes (FTS, vector, hybrid)	✅
Knowledge graph consistency preserved	✅
Parameterized SQL (defense-in-depth)	✅ `category_{idx}` placeholders

Security & Performance

Item	Status
No hardcoded secrets	✅
SQL injection defense	✅ fully parameterized
Proper error handling	✅
No unnecessary indexes	✅ purely additive

Findings

✅ Commit `ca24c05` closes the last test gap cleanly

The final commit fixes two test suites that were missed in the original PR run:

test_tool_contracts.py — "categories" is correctly inserted between "entity_types" and "after_date" in EXPECTED_TOOL_SIGNATURES["search_notes"], matching the actual parameter order in the tool signature.

test_tool_telemetry.py — "category_filter_count": 0 is correctly added to the expected span attributes. This matches the category_filter_count=len(categories) attribute emitted at search.py:900, which runs before any request is made (so the value for this particular test fixture is 0).

Both additions are minimal and precise — no surrounding churn.

✅ `has_filters` span correctly includes `categories`

search.py:901-908 — the has_filters boolean was expanded to include or categories. This means a categories-only search correctly sets has_filters=True in telemetry, consistent with how tags, note_types, and entity_types are handled.

✅ All prior findings resolved

🔴 Bug (first review): categories without entity_types silently returned zero results — fixed in 6dd61e5 with the correct if search_query.categories: entity_types = [observation] conditional at search.py:1006-1017, with a Trigger/Why/Outcome comment matching CLAUDE.md's literate programming convention.
🟡 Missing test (first review): standalone categories path in MCP tool — added in 6dd61e5 as test_search_categories_without_entity_types_returns_observations.

Summary

Three commits, clean progression: feature → bug fix → test gap closure. The implementation is correct end-to-end (schema → service → repository in FTS, vector, and hybrid modes on SQLite + Postgres), the silent-zero-results bug is squashed and regression-tested, and the contract/telemetry suites are now in sync with the new parameter. Ready to merge.

chatgpt-codex-connector Bot reviewed Jun 7, 2026

View reviewed changes

phernandez merged commit 4fe6fe0 into main Jun 8, 2026
25 checks passed

phernandez deleted the feat/search-category-filter branch June 8, 2026 00:57

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(core): add observation category filter to search#908

feat(core): add observation category filter to search#908
phernandez merged 3 commits into
mainfrom
feat/search-category-filter

phernandez commented Jun 7, 2026

Uh oh!

github-actions Bot commented Jun 7, 2026 •

edited

Loading

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot Jun 7, 2026

Uh oh!

github-actions Bot commented Jun 7, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Jun 8, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

phernandez commented Jun 7, 2026

Summary

What changed

Testing (commands + results)

Risk

Uh oh!

github-actions Bot commented Jun 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review: feat(core): add observation category filter to search

Code Quality & Standards

Testing & Documentation

Basic Memory Architecture

Security & Performance

Compatibility

Findings

🔴 Bug: categories filter silently returns zero results when entity_types is omitted in MCP tool

🟡 Missing test: search_notes(categories=[...]) without entity_types in MCP tool

🟢 Well-implemented: parameterized category filter

🟢 Well-implemented: no_criteria() guard

🟢 Threaded cleanly through all retrieval modes

🟢 Docstring fixes are accurate

Summary

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jun 7, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented Jun 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review: feat(core): add observation category filter to search

Code Quality & Standards

Testing & Documentation

Basic Memory Architecture

Security & Performance

Compatibility

Findings

✅ Bug fixed: categories without entity_types no longer silently returns zero results

✅ Regression test added

✅ SQL filter implementation is correct

✅ no_criteria() guard is correct

✅ Docstring accuracy

Summary

Uh oh!

github-actions Bot commented Jun 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review: feat(core): add observation category filter to search

Code Quality & Standards

Testing & Documentation

Basic Memory Architecture

Security & Performance

Findings

✅ Commit ca24c05 closes the last test gap cleanly

✅ has_filters span correctly includes categories

✅ All prior findings resolved

Summary

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

github-actions Bot commented Jun 7, 2026 •

edited

Loading

🔴 Bug: `categories` filter silently returns zero results when `entity_types` is omitted in MCP tool

🟡 Missing test: `search_notes(categories=[...])` without `entity_types` in MCP tool

🟢 Well-implemented: `no_criteria()` guard

github-actions Bot commented Jun 7, 2026 •

edited

Loading

✅ Bug fixed: `categories` without `entity_types` no longer silently returns zero results

✅ `no_criteria()` guard is correct

github-actions Bot commented Jun 8, 2026 •

edited

Loading

✅ Commit `ca24c05` closes the last test gap cleanly

✅ `has_filters` span correctly includes `categories`