fix(core): disambiguate truncated observation permalinks to prevent index collisions by phernandez · Pull Request #931 · basicmachines-co/basic-memory

phernandez · 2026-06-09T23:11:55Z

Summary

Observation.permalink truncates content to 200 chars to stay under PostgreSQL's 2704-byte btree index row limit. Two distinct observations with the same category and an identical 200-char prefix collided on the same synthetic permalink, and index_entity_markdown's permalink-keyed dedup silently dropped the second one from the search index, making it unfindable.

This change appends a short stable digest (first 12 hex chars of sha256 of the full content) to the truncated prefix, so distinct content always yields distinct permalinks while exact duplicates still dedupe and the permalink stays well under the btree limit. The permalink is a computed property, not a stored column, so no migration is needed.

Upgrade note: existing search_index rows for >200-char observations keep their old digest-less permalinks until the entity is re-synced or bm reindex --search runs; release notes should mention this.

Test plan

uv run pytest tests/repository/test_observation_repository.py tests/services/test_search_service.py -x -q — 73 passed

Closes #909

🤖 Generated with Claude Code

Reviewed SHA: e1fe35560c58b4f0f825b55a243cb361796b726e
Verdict: approve
Status: success - BM Bossbot approved this head SHA

Summary:
Reviewed the full provided diff against the Basic Memory engineering style and the affected observation permalink/search-indexing path. The change preserves short-content permalinks, keeps exact long duplicates dedupable, and disambiguates distinct long observations that share a truncated prefix. The added repository and search-service regression tests cover the reported failure mode. I found no blocking issues.

Blocking findings:

None

Non-blocking findings:

None

BM Bossbot image provenance

Pull request: #931
Generated asset: /home/runner/work/basic-memory/basic-memory/docs/assets/infographics/pr-931.webp
Image model: gpt-image-2
Size: 1536x1024
Quality: high
Visual format: auto
Theme source: none

Theme / choice instruction:
None supplied.

Image prompt sent to gpt-image-2:

Create a polished landscape WebP visual for Basic Memory PR #931.

This is a non-gating visual summary. The authoritative merge gate is the
GitHub commit status named BM Bossbot Approval, not this image.

Use the BM Bossbot review summary below as source material. Preserve the
concrete before/after value story without inventing facts or turning
implementation details into clutter.

Choose the most appropriate visual form: infographic, map, scene, poster,
painting, tableau, cover image, illustrated artifact, or another image form that
best communicates the PR intent. Choose exactly one visual mode and follow only
that mode's rules. Do not blend the modes.

Mode A - infographic/map:
- Use a readable map backbone with structured information design: sections,
  route lines, checkpoints, nodes, annotations, status badges, compact evidence
  bullets, and a legend.
- Use this mode when the PR needs several facts, gates, checks, or before/after
  points to be read explicitly.

Mode B - editorial scene/poster/painting:
- Use image-first composition: an actual scene, movie poster, painting, tableau,
  cover image, or symbolic illustrated artifact.
- Use this mode when the PR intent can be shown through one staged visual moment
  with minimal text.
- Avoid dashboard layouts, data panels, timeline strips, flowcharts, legends,
  before/after boxes, bullet lists, and checklist columns in this mode.

The visual theme should drive the composition through original style cues while
the engineering meaning stays easy to scan.

Use high contrast, smooth anti-aliased text when text is present, clean edges,
and non-tiny labels. Text is optional for scene-first images, but any text that
appears must be readable.

Avoid fake screenshots, code blocks, invented claims, copyrighted characters,
logos, named fictional universes, direct band logos, album art, celebrity
likenesses, or decorations that obscure content.

BM Bossbot summary:
Reviewed SHA: `e1fe35560c58b4f0f825b55a243cb361796b726e`
Verdict: `approve`
Status: `success` - BM Bossbot approved this head SHA

Summary:
Reviewed the full provided diff against the Basic Memory engineering style and the affected observation permalink/search-indexing path. The change preserves short-content permalinks, keeps exact long duplicates dedupable, and disambiguates distinct long observations that share a truncated prefix. The added repository and search-service regression tests cover the reported failure mode. I found no blocking issues.

Blocking findings:
- None

Non-blocking findings:
- None

Images API revised prompt:
Not provided by the Images API.

…ndex collisions Observation.permalink truncates content to 200 chars to stay under PostgreSQL's 2704-byte btree index row limit. Two distinct observations with the same category and an identical 200-char prefix therefore collided on the same synthetic permalink, and index_entity_markdown's permalink-keyed dedup silently dropped the second one from the search index, making it unfindable. Append a short stable digest (first 12 hex chars of sha256 of the FULL content) to the truncated prefix so distinct content always yields distinct permalinks, while exact duplicates still dedupe and the permalink stays well under the btree limit. The permalink is a computed property, not a stored column, so no migration is needed. Fixes #909 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: phernandez <paul@basicmachines.co>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e1fe35560c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-10T00:43:43Z

+            digest = hashlib.sha256(self.content.encode("utf-8")).hexdigest()[:12]
+            content_for_permalink = f"{self.content[:200]}-{digest}"


Delete legacy observation index rows during upgrade

For projects that already have indexed >200-character observations, this changes the computed permalink while the old digest-less search_index rows remain until reindex. If such a note is deleted before it is re-synced, both delete paths (SearchService.handle_delete and SyncService.handle_delete) compute the new digest permalink from the ORM observation and call delete_by_permalink, so the legacy observation row is never matched and remains searchable with a deleted entity_id; consider deleting by entity_id for observation cleanup or also removing the pre-digest permalink during the transition.

Useful? React with 👍 / 👎.

Distinct observations sharing a category and a 200-char content prefix used to collide on their truncated synthetic permalink and the second was silently dropped from the search index (#909, fixed by #931 via a content digest suffix). These end-to-end MCP tests pin the behavior independent of the disambiguation mechanism: both observations stay searchable, and delete_note removes every index row — including disambiguated ones, which would otherwise survive as ghost rows since search_index has no FK cascade from entity. Post-delete assertions inspect the index via search_service directly because the MCP search pipeline hides rows whose entity is gone, which would mask an orphan. Refs #909 Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Signed-off-by: Drew Cain <groksrc@gmail.com>

…926) Refs #909. Integration coverage over the #931 fix. Signed-off-by: phernandez <paul@basicmemory.com>

phernandez and others added 2 commits June 9, 2026 17:39

Merge branch 'main' into fix/909-observation-permalink-collision

e1fe355

chatgpt-codex-connector Bot reviewed Jun 10, 2026

View reviewed changes

phernandez merged commit 93ed340 into main Jun 10, 2026
29 checks passed

phernandez deleted the fix/909-observation-permalink-collision branch June 10, 2026 01:20

phernandez mentioned this pull request Jun 10, 2026

test(mcp): integration coverage for colliding observation permalinks (index + delete) #926

Merged

groksrc mentioned this pull request Jun 10, 2026

fix(core): use Observation.permalink in build_context to match the search index #946

Merged

phernandez pushed a commit that referenced this pull request Jun 10, 2026

test(mcp): integration coverage for colliding observation permalinks (#…

0a3a6bb

…926) Refs #909. Integration coverage over the #931 fix. Signed-off-by: phernandez <paul@basicmemory.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(core): disambiguate truncated observation permalinks to prevent index collisions#931

fix(core): disambiguate truncated observation permalinks to prevent index collisions#931
phernandez merged 2 commits into
mainfrom
fix/909-observation-permalink-collision

phernandez commented Jun 9, 2026 •

edited by github-actions Bot

Loading

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot Jun 10, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

		digest = hashlib.sha256(self.content.encode("utf-8")).hexdigest()[:12]
		content_for_permalink = f"{self.content[:200]}-{digest}"

Conversation

phernandez commented Jun 9, 2026 • edited by github-actions Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jun 10, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

phernandez commented Jun 9, 2026 •

edited by github-actions Bot

Loading