Skip to content

test(mcp): integration coverage for colliding observation permalinks (index + delete)#926

Merged
phernandez merged 1 commit into
mainfrom
fix/issue-909
Jun 10, 2026
Merged

test(mcp): integration coverage for colliding observation permalinks (index + delete)#926
phernandez merged 1 commit into
mainfrom
fix/issue-909

Conversation

@groksrc

@groksrc groksrc commented Jun 9, 2026

Copy link
Copy Markdown
Member

Refs #909

Summary

Reworked per @phernandez's comment: #931 merged the fix for #909 first (digest-suffixed observation permalinks), so this PR now carries only the additive integration coverage — the source changes and unit tests that would double-handle the collision are dropped.

Two end-to-end MCP tests (test-int/mcp/test_observation_permalink_collision_integration.py) that pin the behavior independent of the disambiguation mechanism:

  1. Both colliding observations stay searchable — two observations sharing a category and a 200-char content prefix are independently findable via search_notes (the original [BUG] Duplicate observation permalink collision drops the second observation from the search index #909 repro, green since fix(core): disambiguate truncated observation permalinks to prevent index collisions #931).
  2. delete_note leaves no ghost index rowssearch_index has no FK cascade from entity, so index-time permalink disambiguation must be matched by delete-time cleanup or the extra observation survives as a ghost row pointing at the deleted file. Post-delete assertions inspect the index via search_service directly, because the MCP search pipeline hides rows whose entity is gone — which would mask exactly the orphan this test guards against (that hiding behavior may deserve its own look).

The delete/ghost scenario is the one our earlier review round demonstrated empirically against an -{obs.id}-style fix; #931's recomputable digest permalinks handle it correctly, and this test keeps it that way.

Test plan

  • uv run pytest test-int/mcp/test_observation_permalink_collision_integration.py — 2 passed against current main
  • ruff check / ruff format --check clean
  • Branch rebuilt as a single signed-off commit on top of main (DCO)

🤖 Generated with Claude Code

Reviewed SHA: 5cdc109ef0ddc64da9f013f1f3382cf5e7805ae9
Verdict: approve
Status: success - BM Bossbot approved this head SHA

Summary:
Reviewed the full provided diff. The added integration coverage exercises the MCP write/search/delete flow and uses the direct search service only for the post-delete orphan-index assertion, which is an appropriate oracle for this regression. No blocking issues found.

Blocking findings:

  • None

Non-blocking findings:

  • None

@github-actions

github-actions Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Claude finished @groksrc's task in 2m 49s —— View job


PR Review: fix(search): duplicate observation permalink collision

  • Gather context and understand the request
  • Read changed files
  • Review against team checklist
  • Post findings

Summary: This PR correctly fixes a real bug where two observations sharing a category and a 200-char content prefix produced identical synthetic permalinks, causing the second to be silently dropped from the search index. The core fix (while-loop disambiguation + entity-id cleanup) is sound. Coverage is thorough. A few issues below.


Code Quality & Standards

Item Status
Follows Basic Memory coding conventions
Python 3.12+ type annotations / async patterns
SQLAlchemy 2.0 best practices
100-char line length
No commented-out code blocks
Literate-style Trigger/Why/Outcome comments

Testing & Documentation

Item Status
Unit tests for new behavior
Integration tests (MCP level)
Edge case: pathological double-collision
Incoming-relation cleanup regression
100% coverage maintained

Issues

1. SearchService.handle_delete still leaves incoming relation rows orphaned

SyncService.handle_delete now correctly cleans up incoming relations after the entity-id delete:

# sync_service.py ~L1312
for relation in entity.incoming_relations:
    await self.search_service.delete_by_permalink(relation.permalink)

But SearchService.handle_delete (called from entity_service.delete_entity → MCP delete_note) does not:

# search_service.py ~L987
await self.delete_by_entity_id(entity.id)
# ← no incoming_relations loop
await self._clear_entity_vectors(entity.id)

This was a pre-existing gap (old code only iterated outgoing_relations), but now SyncService explicitly handles the case and the two paths diverge. Deleting a note via MCP will leave the search index rows for its incoming relations pointing at a deleted entity.

The fix is a straightforward copy of the same loop into SearchService.handle_delete:

# After delete_by_entity_id(entity.id), before _clear_entity_vectors:
for relation in entity.incoming_relations:
    await self.delete_by_permalink(relation.permalink)

Fix this →


2. Stale docstring in SearchService.handle_delete

search_service.py:972:

"""Handle complete entity deletion from search and semantic index state.

This replicates the logic from sync_service.handle_delete() to properly clean up
all search index entries for an entity and its related data.

The "replicates" claim is now inaccurate since the two diverge on incoming relations. Should be updated to describe what it actually does, not reference sync_service.


3. Disambiguation creates a permanent permalink mismatch between DB and search index

This is a known trade-off and the PR handles cleanup correctly (entity-id delete), but it's worth making explicit: once obs_permalink is suffixed with -{obs.id}, the value stored in search_index.permalink no longer matches obs.permalink in the DB. Any future code that tries to look up search rows by obs.permalink directly will miss these rows. Since the only supported cleanup path is now delete_by_entity_id, this is safe as long as that invariant is maintained. No action required, just worth noting for reviewers.


4. Minor: the disambiguation loop appends the same obs.id every iteration

The comment explains termination correctly but for a future reader: in each iteration obs_permalink gets longer by -{obs.id} with the same obs.id. So the collision chain is p, p-N, p-N-N, etc. This is correct (each value is distinct) but slightly surprising. The pathological test at the unit level exercises this. No change needed.


Architecture

Item Status
MCP tools atomic, composable
No DB schema changes (no migration needed)
Knowledge graph consistency maintained
Local-first architecture preserved

Security & Performance

Item Status
No hardcoded secrets
Cleanup path (entity-id delete) is more efficient than N permalink deletes
Input validation unchanged

Verdict: The core fix is correct and the test coverage is excellent. The main issue worth addressing before merge is the asymmetry in SearchService.handle_delete (item 1) — it means delete_note via MCP will leave incoming-relation ghost rows that sync delete wouldn't. Everything else is minor.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: eaab505f3b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/basic_memory/services/search_service.py Outdated

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4ed21a8b01

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/basic_memory/services/search_service.py Outdated

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3e630b9c4e

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread tests/services/test_search_service.py Outdated
@phernandez phernandez added the needs-review Maintainer decision needed before merge label Jun 10, 2026
@phernandez

Copy link
Copy Markdown
Member

Labeling needs-review: #931 merged a fix for #909 first (digest-suffixed observation permalinks), so the search_service/sync_service changes here would double-handle the collision. The integration test in this PR may still be additive over #931's coverage — worth a maintainer look at salvaging it with authorship preserved (as done for #918 via #943) rather than closing outright.

Distinct observations sharing a category and a 200-char content prefix
used to collide on their truncated synthetic permalink and the second
was silently dropped from the search index (#909, fixed by #931 via a
content digest suffix).

These end-to-end MCP tests pin the behavior independent of the
disambiguation mechanism: both observations stay searchable, and
delete_note removes every index row — including disambiguated ones,
which would otherwise survive as ghost rows since search_index has no
FK cascade from entity. Post-delete assertions inspect the index via
search_service directly because the MCP search pipeline hides rows
whose entity is gone, which would mask an orphan.

Refs #909

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Signed-off-by: Drew Cain <groksrc@gmail.com>
@groksrc groksrc changed the title fix(search): duplicate observation permalink collision drops the second observation from the search index test(mcp): integration coverage for colliding observation permalinks (index + delete) Jun 10, 2026
@groksrc

groksrc commented Jun 10, 2026

Copy link
Copy Markdown
Member Author

Reworked as suggested: dropped the src changes and unit tests superseded by #931, kept just the additive MCP integration coverage (collision searchability + delete-time ghost-row guard). Branch is now a single signed-off, test-only commit on top of main. The delete/ghost test is the scenario the review round on this PR originally surfaced — #931's digest approach handles it, and this pins it.

@phernandez phernandez removed the needs-review Maintainer decision needed before merge label Jun 10, 2026
@phernandez

Copy link
Copy Markdown
Member

Thanks for the rework — this is exactly the right salvage. The stale threads on the dropped approach are resolved; this is queued to land.

@phernandez phernandez merged commit 0a3a6bb into main Jun 10, 2026
30 of 32 checks passed
@phernandez phernandez deleted the fix/issue-909 branch June 10, 2026 19:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants