Skip to content

feat(core): add observation category filter to search#908

Merged
phernandez merged 3 commits into
mainfrom
feat/search-category-filter
Jun 8, 2026
Merged

feat(core): add observation category filter to search#908
phernandez merged 3 commits into
mainfrom
feat/search-category-filter

Conversation

@phernandez

Copy link
Copy Markdown
Member

Summary

Adds a categories filter to search so observation results can be scoped to an exact category, e.g. search_notes(..., entity_types=["observation"], categories=["requirement"]) returns only observations whose category is exactly requirement — not every row mentioning the word.

Previously entity_types=["observation"] only narrowed to the observation row type, so a text search for "requirement" still surfaced [decision] observations whose body merely mentions the word. This mirrors the existing entity_types/note_types filter end-to-end through schema → service → repository (FTS, vector, and hybrid modes).

Also fixes the misleading category:observation token that the search_notes docstrings advertised but that was never implemented in code (there is no token parsing anywhere) — those lines now show the real categories=[...] usage.

What changed

  • schemas/search.pySearchQuery.categories: Optional[List[str]], included in no_criteria().
  • services/search_service.py — threaded through _PreparedSearchQuery, _prepare_query, has_criteria, _prepared_has_filters, _search_repository, _count_repository.
  • repository/search_repository.py (Protocol) + search_repository_base.py — added to abstract search/count, _dispatch_retrieval_mode, _search_vector_only (incl. the filter_requested re-scan), and _search_hybrid.
  • repository/sqlite_search_repository.py & postgres_search_repository.py — added a parameterized search_index.category IN (:category_0, ...) condition in _build_fts_query_parts, threaded through search/count.
  • mcp/tools/search.py — new categories param (with category alias, coerce_list), wired into SearchQuery and the search_all_projects fan-out; misleading category: docstring lines fixed.
  • api/v2/routers/search_router.pycategories included in the has_filters span signal. No router signature change needed (it passes SearchQuery.model_dump()).

Testing (commands + results)

Integration-first, both backends.

  • uv run pytest tests/repository/test_search_repository.py tests/repository/test_hybrid_fusion.py tests/repository/test_vector_threshold.py tests/repository/test_vector_pagination.py tests/repository/test_semantic_search_base.py tests/services/test_search_service.py tests/mcp/test_tool_search.py -q189 passed (SQLite)
  • BASIC_MEMORY_TEST_POSTGRES=1 uv run pytest tests/repository/test_postgres_search_repository.py tests/services/test_search_service.py tests/mcp/test_tool_search.py -q127 passed (Postgres, testcontainers)
  • uv run ruff check . → clean on changed files; uv run ruff format --check → clean
  • uv run ty check src tests test-intAll checks passed!

New tests:

  • test_search_categories_exact_match (SQLite) / test_postgres_search_categories_exact_match (Postgres): a [decision] observation that mentions "requirement" is excluded when categories=["requirement"]; standalone filter + count() agree; multi-category union works.
  • test_search_categories_filter / test_search_categories_only_is_not_no_criteria (service): propagation through _prepare_query/has_criteria, count agrees with search, no_criteria() correctness.
  • test_search_with_categories_filter (MCP tool): end-to-end via search_notes (FTS + exact-count path), decision-scoped search does not leak requirement rows.

Risk

Low. Purely additive optional parameter; existing callers (context_service, routers, clients) are unaffected since they use keyword args and categories defaults to None. No schema/migration change — the category column is already indexed.

Closes #430

🤖 Generated with Claude Code

Add a `categories` filter that scopes observation results to exact category
matches, mirroring the existing entity_types/note_types filter end-to-end.

Previously `entity_types=["observation"]` only narrowed to the observation row
type, so a text search for "requirement" still returned [decision] observations
whose body merely mentions the word. The new `categories` parameter adds a
parameterized `search_index.category IN (...)` predicate so only rows whose
indexed category matches exactly survive (in FTS, vector, and hybrid modes).

The search_notes docstrings advertised a 'category:observation' token that was
never implemented (no token parsing in code); those misleading lines are removed
and replaced with the real `categories=[...]` usage.

Plumbing:
- SearchQuery gains `categories` (included in no_criteria()).
- SearchService threads it through _PreparedSearchQuery, has_criteria,
  _prepared_has_filters, _search_repository, and _count_repository.
- SearchRepository protocol + base abstract search/count, plus _search_vector_only
  and _search_hybrid (filter re-scan path) carry the new param.
- SQLite and Postgres _build_fts_query_parts add the parameterized IN condition.
- search_notes MCP tool exposes `categories` (with `category` alias) and wires it
  into SearchQuery and the search_all_projects fan-out.

Tests (integration-first, both backends):
- tests/repository/test_search_repository.py + test_postgres_search_repository.py:
  exact-category match excludes a [decision] obs that mentions "requirement".
- tests/services/test_search_service.py: categories propagates through
  _prepare_query/has_criteria and count agrees with search.
- tests/mcp/test_tool_search.py: end-to-end via the MCP tool (FTS + count path).

Closes #430

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: phernandez <paul@basicmachines.co>
@github-actions

github-actions Bot commented Jun 7, 2026

Copy link
Copy Markdown
Contributor

Claude finished @phernandez's task in 2m 46s —— View job


PR Review: feat(core): add observation category filter to search

Tasks

  • Gather context and understand the request
  • Read changed source files
  • Read changed test files
  • Review against team checklist
  • Post comprehensive review

Code Quality & Standards

Item Status
Follows Basic Memory coding conventions (CLAUDE.md)
Python 3.12+ type annotations and async patterns
SQLAlchemy 2.0 best practices
FastAPI and Typer conventions followed
100-character line length maintained
No commented-out code blocks

Testing & Documentation

Item Status
Unit tests for new functions/methods
Integration tests for new MCP tools ✅ SQLite + Postgres
Test coverage for edge cases ✅ exact match, standalone filter, multi-category union, no_criteria check
100% test coverage maintained
Documentation updated (docstrings, examples) ✅ Thorough inline docs
CLAUDE.md updated N/A (no convention changes)

Basic Memory Architecture

Item Status
MCP tools follow atomic, composable design
No Alembic migration needed category column already indexed
Preserves local-first architecture
Knowledge graph consistency
Parameterized SQL (defense-in-depth) ✅ well-commented

Security & Performance

Item Status
No hardcoded secrets
Input validation (SQL injection defense) ✅ fully parameterized placeholders
Proper error handling
Performance ✅ no new indexes, purely additive

Compatibility

Item Status
Windows-compatible file paths N/A
No emojis/unicode in console/log output

Findings

🔴 Bug: categories filter silently returns zero results when entity_types is omitted in MCP tool

File: src/basic_memory/mcp/tools/search.py:1006-1007

When a caller passes categories=["requirement"] to search_notes without explicitly setting entity_types=["observation"], the tool hits this code path:

# no_criteria() is False here (categories is set) — correct
if search_query.no_criteria():  ...

# But entity_types is still empty, so the default kicks in:
if not search_query.entity_types:
    search_query.entity_types = [SearchItemType("entity")]  # <-- overrides intent

The resulting SQL becomes WHERE type = 'entity' AND category IN ('requirement'), which always returns zero rows because only observations have non-null categories. The user gets an empty result set with no error or warning.

The no-criteria rejection message explicitly lists categories as a valid standalone criterion:

"Please provide at least one of: `query`, `metadata_filters`, `tags`, `status`, `note_types`, `entity_types`, `categories`, or `after_date`."

...which means users following the hint will hit this silent failure.

Suggested fix:

if not search_query.entity_types:
    # Trigger: categories filter is set without an explicit entity_types.
    # Why: only observations have non-null categories; defaulting to entity would
    #      always produce zero results and mislead the caller.
    # Outcome: implicitly scope to observations when categories are requested.
    if search_query.categories:
        search_query.entity_types = [SearchItemType("observation")]
    else:
        search_query.entity_types = [SearchItemType("entity")]

Fix this →


🟡 Missing test: search_notes(categories=[...]) without entity_types in MCP tool

tests/mcp/test_tool_search.py:374 always pairs categories with entity_types=["observation"]. There is no test covering the standalone-categories path through the MCP tool. If the fix above is applied, a test like this should be added:

response = await search_notes(
    project=test_project.name,
    query="requirement",
    search_type="text",
    categories=["requirement"],  # no entity_types — should still return observations
    output_format="json",
)
assert all(r["category"] == "requirement" for r in response["results"])

🟢 Well-implemented: parameterized category filter

The search_index.category IN (:category_0, ...) pattern in both sqlite_search_repository.py:804-810 and postgres_search_repository.py is exactly right — consistent with how search_item_types and note_types are handled. The "defense-in-depth" comment is appreciated.


🟢 Well-implemented: no_criteria() guard

schemas/search.py:90 correctly adds categories_is_empty to no_criteria(), and test_search_categories_only_is_not_no_criteria tests the boundary case. This prevents the tool from short-circuiting a categories-only search.


🟢 Threaded cleanly through all retrieval modes

categories propagates through FTS, vector-only (_search_vector_only), and hybrid (_search_hybrid) paths in search_repository_base.py. The filter_requested re-scan in _search_vector_only:1978-2003 correctly includes categories, meaning vector results are intersected with the exact-category filter before returning.


🟢 Docstring fixes are accurate

The removal of the category:observation token from the docstrings (which was never implemented) is a clean correction. The new categories=["requirement"] examples in the tool docstring accurately describe the feature.


Summary

The implementation is solid end-to-end — correct SQL, consistent threading through all retrieval modes, good test coverage on SQLite + Postgres, and clean docstrings. One meaningful behavioral bug to address before merging: search_notes(categories=["requirement"]) without entity_types silently returns zero results due to the default entity_types=["entity"] override. The fix is a 3-line conditional at search.py:1006.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2df4e737fd

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +971 to +972
if categories:
search_query.categories = categories

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Don't force entity results when filtering categories

When an MCP caller supplies categories without also supplying entity_types, this assignment makes SearchQuery.no_criteria() pass, but the later default still adds entity_types=[entity]. The repository then searches with both type = 'entity' and category IN (...); because entity rows have NULL categories, category-only searches such as search_notes(project="p", categories=["requirement"]) return no results even though the schema/service/repository now support categories as a standalone criterion and the no-criteria message advertises categories as sufficient. Infer observation or skip the entity default when a category filter is present.

Useful? React with 👍 / 👎.

search_notes defaults entity_types to 'entity' when unset (issue #31), but categories only exist on observations — so search_notes(categories=[...]) without entity_types ANDed the category against entity rows (NULL category) and returned nothing. When categories is supplied without an explicit entity_types, default to observations instead. Adds a regression test for the categories-only path (Codex review on #908).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: phernandez <paul@basicmachines.co>
@github-actions

github-actions Bot commented Jun 7, 2026

Copy link
Copy Markdown
Contributor

Claude finished @phernandez's task in 1m 29s —— View job


PR Review: feat(core): add observation category filter to search

Tasks

  • Gather context and understand the request
  • Read the fix commit (6dd61e5) — verify the categories/entity_types default behavior
  • Check if a missing test was added for standalone categories path
  • Review full checklist against changed files
  • Post comprehensive review

Code Quality & Standards

Item Status
Follows Basic Memory coding conventions (CLAUDE.md)
Python 3.12+ type annotations and async patterns
SQLAlchemy 2.0 best practices
FastAPI and Typer conventions followed
100-character line length maintained
No commented-out code blocks

Testing & Documentation

Item Status
Unit tests for new functions/methods
Integration tests (SQLite + Postgres)
Edge case: exact match, standalone filter, multi-category union
Regression test: categories without entity_types ✅ Added in fix commit
100% test coverage maintained
Docstrings fixed (removed never-implemented category: token)
CLAUDE.md updated N/A

Basic Memory Architecture

Item Status
MCP tools follow atomic, composable design
No Alembic migration needed category column already indexed
Threaded through all retrieval modes (FTS, vector, hybrid)
Knowledge graph consistency preserved
Parameterized SQL (defense-in-depth)

Security & Performance

Item Status
No hardcoded secrets
SQL injection defense ✅ fully parameterized placeholders with category_{idx}
Proper error handling
No unnecessary indexes ✅ purely additive

Compatibility

Item Status
No emojis/unicode in console/log output

Findings

✅ Bug fixed: categories without entity_types no longer silently returns zero results

The 🔴 bug flagged in the prior review (and independently by Codex) has been addressed in commit 6dd61e5. The fix at search.py:1006-1017 is correct:

if not search_query.entity_types:
    # Trigger: a category filter was supplied without an explicit entity_types.
    # Why: categories only exist on observations — defaulting to "entity"
    #      (whose rows have NULL category) would AND a category filter against
    #      entity rows and return nothing, defeating a category-only search.
    # Outcome: scope the implicit default to observations so
    #          search_notes(categories=[...]) returns the matching bullets.
    if search_query.categories:
        search_query.entity_types = [SearchItemType("observation")]
    else:
        search_query.entity_types = [SearchItemType("entity")]

The Trigger/Why/Outcome comment style matches CLAUDE.md's literate programming convention exactly.

✅ Regression test added

tests/mcp/test_tool_search.py:425test_search_categories_without_entity_types_returns_observations covers the previously untested path: search_notes(categories=["requirement"]) without entity_types, asserting:

  • Results are non-empty
  • All results have type == "observation"
  • All results have category == "requirement"

This precisely tests the behavioral boundary introduced by the fix.

✅ SQL filter implementation is correct

sqlite_search_repository.py:798-810 and the Postgres counterpart both use search_index.category IN (:category_0, ...) with a per-index param map — consistent with how search_item_types and note_types are handled, and immune to injection.

no_criteria() guard is correct

schemas/search.py:90categories_is_empty = not self.categories is included in the guard, and the service-layer test test_search_categories_only_is_not_no_criteria validates this boundary.

✅ Docstring accuracy

The removed category:observation token (which was never implemented) and new categories=["requirement"] examples are accurate and match the actual behavior. No documentation drift.


Summary

Both commits together deliver a clean, complete feature: additive categories filter threaded end-to-end through schema → service → repository (FTS, vector, and hybrid modes) on SQLite and Postgres, with the implicit-default bug squashed and regression-tested. Ready to merge.

… param

Adding the categories param to search_notes changed its stable signature and added a category_filter_count telemetry attribute; update test_tool_contracts.py and test_tool_telemetry.py expectations accordingly. These suites weren't run in the original PR check.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: phernandez <paul@basicmachines.co>
@github-actions

github-actions Bot commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

Claude finished @phernandez's task in 1m 27s —— View job


PR Review: feat(core): add observation category filter to search

Tasks

  • Gather context and understand the request
  • Review new commit (ca24c05) — tool contract + telemetry for categories param
  • Read changed source files and verify implementation
  • Review against team checklist
  • Post comprehensive review

Code Quality & Standards

Item Status
Follows Basic Memory coding conventions (CLAUDE.md)
Python 3.12+ type annotations and async patterns
SQLAlchemy 2.0 best practices
FastAPI and Typer conventions followed
100-character line length maintained
No commented-out code blocks

Testing & Documentation

Item Status
Unit tests for new functions/methods
Integration tests (SQLite + Postgres)
Edge cases: exact match, standalone filter, multi-category union
Regression test: categories without entity_types test_search_categories_without_entity_types_returns_observations
Tool contract test updated ca24c05"categories" added to EXPECTED_TOOL_SIGNATURES["search_notes"]
Telemetry test updated ca24c05"category_filter_count": 0 added to expected span attrs
100% test coverage maintained
Docstrings fixed (removed never-implemented category: token)
CLAUDE.md updated N/A

Basic Memory Architecture

Item Status
MCP tools follow atomic, composable design
No Alembic migration needed category column already indexed
Threaded through all retrieval modes (FTS, vector, hybrid)
Knowledge graph consistency preserved
Parameterized SQL (defense-in-depth) category_{idx} placeholders

Security & Performance

Item Status
No hardcoded secrets
SQL injection defense ✅ fully parameterized
Proper error handling
No unnecessary indexes ✅ purely additive

Findings

✅ Commit ca24c05 closes the last test gap cleanly

The final commit fixes two test suites that were missed in the original PR run:

test_tool_contracts.py"categories" is correctly inserted between "entity_types" and "after_date" in EXPECTED_TOOL_SIGNATURES["search_notes"], matching the actual parameter order in the tool signature.

test_tool_telemetry.py"category_filter_count": 0 is correctly added to the expected span attributes. This matches the category_filter_count=len(categories) attribute emitted at search.py:900, which runs before any request is made (so the value for this particular test fixture is 0).

Both additions are minimal and precise — no surrounding churn.

has_filters span correctly includes categories

search.py:901-908 — the has_filters boolean was expanded to include or categories. This means a categories-only search correctly sets has_filters=True in telemetry, consistent with how tags, note_types, and entity_types are handled.

✅ All prior findings resolved

  • 🔴 Bug (first review): categories without entity_types silently returned zero results — fixed in 6dd61e5 with the correct if search_query.categories: entity_types = [observation] conditional at search.py:1006-1017, with a Trigger/Why/Outcome comment matching CLAUDE.md's literate programming convention.
  • 🟡 Missing test (first review): standalone categories path in MCP tool — added in 6dd61e5 as test_search_categories_without_entity_types_returns_observations.

Summary

Three commits, clean progression: feature → bug fix → test gap closure. The implementation is correct end-to-end (schema → service → repository in FTS, vector, and hybrid modes on SQLite + Postgres), the silent-zero-results bug is squashed and regression-tested, and the contract/telemetry suites are now in sync with the new parameter. Ready to merge.

@phernandez phernandez merged commit 4fe6fe0 into main Jun 8, 2026
25 checks passed
@phernandez phernandez deleted the feat/search-category-filter branch June 8, 2026 00:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEATURE] Allow observation category filtering to return only observations whose category is exactly matched

1 participant