fix(mcp): recover edit_note when file exists on disk but is not indexed#934
fix(mcp): recover edit_note when file exists on disk but is not indexed#934phernandez wants to merge 12 commits into
Conversation
edit_note failed with "Entity not found" when the markdown file existed on disk but had not been indexed yet (write file directly to disk, immediately edit it). Fixes #581. Recovery path: - Add POST /v2/projects/{project_id}/knowledge/sync-file, which indexes a single markdown file under the project root via the existing sync_one_markdown_file primitive (fail-fast: 400 for traversal or non-markdown paths, 404 when the file is not on disk). - Add KnowledgeClient.sync_file for the new endpoint. - When identifier resolution misses in edit_note, index the candidate file (identifier + .md, or the identifier itself when it already ends in .md) and retry resolution exactly once. Recovery runs before the append/prepend auto-create branch so appending to an unindexed on-disk note edits the real file instead of colliding with it. - When the identifier does not map to a file on disk, the existing guidance is returned, now mentioning the file-on-disk-but-not-indexed case and suggesting a sync. The CLI 'bm tool edit-note' delegates to the MCP tool, so it inherits the recovery behavior. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: phernandez <paul@basicmachines.co>
Review fixes for the #581 recovery endpoint: - The sync-file endpoint indexed the caller-supplied path verbatim. On case-insensitive filesystems (macOS/Windows) a wrong-cased path like 'notes/Disk-Note.md' passes the existence check for 'notes/disk-note.md', misses the DB row keyed by the on-disk path, and inserts a duplicate entity under the wrong-cased path. The endpoint now rejects non-normalized segments ('./', '//') with 400 and canonicalizes the path by matching real directory entries (exact name first, then a unique case-insensitive match), so the index lookup and sync always use the on-disk casing. Behavior is identical on case-sensitive and case-insensitive filesystems. - Document the automatic single-file disk recovery in the edit_note docstring. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: phernandez <paul@basicmachines.co>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 31ab176c21
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Address BM Bossbot review findings on the #581 recovery path: - The sync-file endpoint went straight from boundary/markdown checks to sync_one_markdown_file, bypassing the ignore filtering that scan and watch flows apply before indexing. Hidden files, .bmignore matches, and project .gitignore matches could be indexed and made searchable (also indirectly via edit_note disk recovery). The canonical path is now checked against the same should_ignore_path() rules and rejected with 400. Regression tests cover hidden, .gitignore-ignored, and .bmignore-ignored files. - edit_note disk recovery treated every sync-file ToolError as a harmless non-recovery case, masking auth/server/transport failures as not-found and letting append/prepend continue into auto-create. Only expected 400/404 candidate rejections are suppressed now; anything else propagates. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: phernandez <paul@basicmachines.co>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 137408053a
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 79fca12d0e
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
…_note recovery Address BM Bossbot PR feedback on the edit_note disk-recovery path: - Finding A: try the identifier as-is before appending .md, and skip the fallback when it already ends (case-insensitively) in a markdown suffix (.md/.markdown per mimetypes), so notes/foo.markdown no longer becomes notes/foo.markdown.md and recovery finds the on-disk file. - Finding B: distinguish the sync-file ignored-path 400 from generic rejections via a shared IGNORED_PATH_REJECTION_DETAIL marker in ignore_utils (used to build the router detail), and raise a clear ToolError when the file exists on disk but matches ignore rules instead of falling through into append/prepend auto-create. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: phernandez <paul@basicmachines.co>
|
Addressed the remaining two bot findings in cefc8a1:
Tests: added |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: cefc8a147b
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 40a6d0b8a1
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 01b7f8158a
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
validate_project_path runs before canonicalization, on the caller-supplied path. On a case-sensitive filesystem a wrong-cased request like 'LINK/secret.md' does not exist, so resolve() cannot follow the real 'link' symlink and the boundary check passes; canonicalization then rewrites the segment to the on-disk 'link', which may point outside the project root. After canonicalization, re-run validate_project_path on the canonical path and require the fully-resolved absolute target to stay inside the resolved project home, rejecting escapes with 400. Symlinks whose targets stay inside the project keep working as before. Regression tests cover exact-cased and wrong-cased escaping requests plus the preserved in-project symlink behavior. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: phernandez <paul@basicmachines.co>
Two gaps in the edit_note disk-recovery path (#581): - A stem identifier like 'notes/foo' only probed 'notes/foo.md', so an on-disk 'notes/foo.markdown' was missed and append/prepend auto-created a shadow 'notes/foo.md'. Recovery now probes '.md' then '.markdown' when the identifier carries no markdown suffix. - After sync-file canonicalizes a wrong-cased identifier (notes/Disk-Note -> notes/disk-note.md), the retry strictly re-resolved the raw identifier and could miss the entity just indexed. Recovery now returns the entity identity (external_id) straight from the sync-file response, keeping a strict re-resolve only for older servers that omit the field — and letting that re-resolve fail loudly instead of guessing. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: phernandez <paul@basicmachines.co>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: cb2d6c2e1a
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
_canonical_file_path followed escaping symlinks with os.scandir before the post-canonicalization containment check rejected the request: on a case-sensitive filesystem a wrong-cased request like 'LINK/secret.md' passes the pre-check (the path does not exist, so resolve() never follows the real 'link' symlink), and segment matching then descended through 'link' and scanned the outside directory — an information touch beyond the project boundary even though the request was ultimately rejected. Re-check the boundary at every traversal step: before scanning the current directory, its fully-resolved path must still be contained in the resolved project home; bail with None the moment resolution escapes. The post-canonicalization checks remain as defense in depth, and symlinks whose targets stay inside the project keep canonicalizing as before. Regression tests spy on os.scandir to prove the outside directory is never scanned (meaningful on case-insensitive macOS too, where the pre-check already rejects the request), and exercise the boundary bail directly so it is covered on every filesystem. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: phernandez <paul@basicmachines.co>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 18ad8e483a
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| synced = await sync_service.sync_one_markdown_file( | ||
| file_path, new=existing is None, index_search=True | ||
| ) |
There was a problem hiding this comment.
Schedule vector refresh after one-file sync
When semantic search is enabled and a caller uses the new /sync-file endpoint directly to index an on-disk markdown file, this path updates the entity/full-text index but never queues sync_entity_vectors; the other v2 write paths schedule _schedule_vector_sync_if_enabled(...), and full sync batches embeddings after changed files. As a result, a note recovered via this endpoint can remain missing or stale in semantic/hybrid search until some later edit or full sync happens.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e92fd6f037
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| # Canonicalize to the actual on-disk casing so the DB lookup below hits the | ||
| # row keyed by the real path instead of inserting a wrong-cased duplicate. | ||
| file_path = _canonical_file_path(project_config.home, segments) | ||
| if file_path is None or not (project_config.home / file_path).is_file(): |
There was a problem hiding this comment.
Check symlink containment before probing files
When the request differs only by casing from an in-project symlinked file on a case-sensitive filesystem, _canonical_file_path can rewrite it to the real symlink name and this is_file() call follows the symlink before the resolved-target boundary check below runs. For example, SECRET.md matching secret.md -> /tmp/outside/secret.md causes the endpoint to stat outside the project and return 404 vs 400 depending on the external target, even though single-file sync is supposed to stop at project boundaries; move the resolved containment check before any file-type/existence probe that follows symlinks.
Useful? React with 👍 / 👎.
Summary
edit_note failed with "Entity not found" when the markdown file existed on disk but had not been indexed yet (write a file directly to disk, then immediately edit it).
POST /v2/projects/{project_id}/knowledge/sync-file, which indexes a single markdown file under the project root via the existingSyncService.sync_one_markdown_fileprimitive. Fail-fast validation: 400 for traversal, non-normalized (./,//) or non-markdown paths, 404 when the file is not on disk. Paths are canonicalized to the actual on-disk casing by matching real directory entries, so a wrong-cased request on a case-insensitive filesystem (macOS/Windows) hits the existing DB row instead of inserting a duplicate entity.KnowledgeClient.sync_filefor the new endpoint.edit_note, index the candidate file (identifier + .md, or the identifier itself when it already ends in.md) and retry resolution exactly once. Recovery runs before the append/prepend auto-create branch so appending to an unindexed on-disk note edits the real file instead of colliding with it. The behavior is documented in the tool docstring.bm tool edit-notedelegates to the MCP tool, so it inherits the recovery behavior.Test plan
uv run pytest tests/api/v2/test_knowledge_router.py tests/mcp/test_tool_edit_note.py tests/mcp/clients/test_clients.py -x -q(114 passed; includes regression tests for the wrong-cased duplicate, non-normalized segments, directory paths, and parent-segment-is-a-file cases)uv run pytest test-int/mcp/test_edit_note_integration.py -x -q(end-to-end disk recovery)just typecheckuv run ruff check/uv run ruff formaton changed filesCloses #581
🤖 Generated with Claude Code
Reviewed SHA:
18ad8e483a209308465ab97adb9142a68efa0cd8Verdict:
changes_requestedStatus:
failure- BM Bossbot requested changesSummary:
The PR addresses the indexed-missing-file recovery path and fixes the directory symlink scan regression, but the same boundary ordering issue remains for file symlinks. This is a concrete project-boundary/security blocker for the new sync-file endpoint.
Blocking findings:
(project_config.home / file_path).is_file()before checkingresolved_target.is_relative_to(project_config.home.resolve()). A wrong-cased request to an in-project symlinked file, for exampleSECRET.mdmatching on-disksecret.md -> /tmp/outside/secret.mdon a case-sensitive filesystem, bypasses the initialvalidate_project_pathcheck, canonicalizes to the symlink, and thenis_file()follows/stats the outside target before the later boundary rejection. If the target is absent, the response also becomes 404 instead of a boundary rejection. Move the resolved-target containment check before any file-type/existence check that follows symlinks, and add regression coverage for a file symlink escape, not only a directory symlink escape.Non-blocking findings:
BM Bossbot image choices
#934/home/runner/work/basic-memory/basic-memory/docs/assets/infographics/pr-934.webpgpt-image-21536x1024higheditorial-imageautoTheme / visual direction: