Skip to content

fix(mcp): recover edit_note when file exists on disk but is not indexed#934

Open
phernandez wants to merge 12 commits into
mainfrom
fix/581-edit-note-disk-recovery
Open

fix(mcp): recover edit_note when file exists on disk but is not indexed#934
phernandez wants to merge 12 commits into
mainfrom
fix/581-edit-note-disk-recovery

Conversation

@phernandez

@phernandez phernandez commented Jun 9, 2026

Copy link
Copy Markdown
Member

Summary

edit_note failed with "Entity not found" when the markdown file existed on disk but had not been indexed yet (write a file directly to disk, then immediately edit it).

  • Add POST /v2/projects/{project_id}/knowledge/sync-file, which indexes a single markdown file under the project root via the existing SyncService.sync_one_markdown_file primitive. Fail-fast validation: 400 for traversal, non-normalized (./, //) or non-markdown paths, 404 when the file is not on disk. Paths are canonicalized to the actual on-disk casing by matching real directory entries, so a wrong-cased request on a case-insensitive filesystem (macOS/Windows) hits the existing DB row instead of inserting a duplicate entity.
  • Add KnowledgeClient.sync_file for the new endpoint.
  • When identifier resolution misses in edit_note, index the candidate file (identifier + .md, or the identifier itself when it already ends in .md) and retry resolution exactly once. Recovery runs before the append/prepend auto-create branch so appending to an unindexed on-disk note edits the real file instead of colliding with it. The behavior is documented in the tool docstring.
  • The CLI bm tool edit-note delegates to the MCP tool, so it inherits the recovery behavior.

Test plan

  • uv run pytest tests/api/v2/test_knowledge_router.py tests/mcp/test_tool_edit_note.py tests/mcp/clients/test_clients.py -x -q (114 passed; includes regression tests for the wrong-cased duplicate, non-normalized segments, directory paths, and parent-segment-is-a-file cases)
  • uv run pytest test-int/mcp/test_edit_note_integration.py -x -q (end-to-end disk recovery)
  • just typecheck
  • uv run ruff check / uv run ruff format on changed files

Closes #581

🤖 Generated with Claude Code

Reviewed SHA: 18ad8e483a209308465ab97adb9142a68efa0cd8
Verdict: changes_requested
Status: failure - BM Bossbot requested changes

Summary:
The PR addresses the indexed-missing-file recovery path and fixes the directory symlink scan regression, but the same boundary ordering issue remains for file symlinks. This is a concrete project-boundary/security blocker for the new sync-file endpoint.

Blocking findings:

  • sync-file still follows wrong-cased file symlink targets before containment validation: src/basic_memory/api/v2/routers/knowledge_router.py:337: after canonicalizing the caller path, the endpoint calls (project_config.home / file_path).is_file() before checking resolved_target.is_relative_to(project_config.home.resolve()). A wrong-cased request to an in-project symlinked file, for example SECRET.md matching on-disk secret.md -> /tmp/outside/secret.md on a case-sensitive filesystem, bypasses the initial validate_project_path check, canonicalizes to the symlink, and then is_file() follows/stats the outside target before the later boundary rejection. If the target is absent, the response also becomes 404 instead of a boundary rejection. Move the resolved-target containment check before any file-type/existence check that follows symlinks, and add regression coverage for a file symlink escape, not only a directory symlink escape.

Non-blocking findings:

  • None

BM Bossbot image for PR #934

BM Bossbot image choices
  • Pull request: #934
  • Generated asset: /home/runner/work/basic-memory/basic-memory/docs/assets/infographics/pr-934.webp
  • Image model: gpt-image-2
  • Size: 1536x1024
  • Quality: high
  • Image mode: editorial-image
  • Theme source: auto

Theme / visual direction:

knockoff space opera: fleet routes, mission consoles, contraband manifests, and practical starship drama with no named fictional universes

phernandez and others added 2 commits June 9, 2026 17:56
edit_note failed with "Entity not found" when the markdown file existed on
disk but had not been indexed yet (write file directly to disk, immediately
edit it). Fixes #581.

Recovery path:
- Add POST /v2/projects/{project_id}/knowledge/sync-file, which indexes a
  single markdown file under the project root via the existing
  sync_one_markdown_file primitive (fail-fast: 400 for traversal or
  non-markdown paths, 404 when the file is not on disk).
- Add KnowledgeClient.sync_file for the new endpoint.
- When identifier resolution misses in edit_note, index the candidate file
  (identifier + .md, or the identifier itself when it already ends in .md)
  and retry resolution exactly once. Recovery runs before the append/prepend
  auto-create branch so appending to an unindexed on-disk note edits the
  real file instead of colliding with it.
- When the identifier does not map to a file on disk, the existing guidance
  is returned, now mentioning the file-on-disk-but-not-indexed case and
  suggesting a sync.

The CLI 'bm tool edit-note' delegates to the MCP tool, so it inherits the
recovery behavior.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: phernandez <paul@basicmachines.co>
Review fixes for the #581 recovery endpoint:

- The sync-file endpoint indexed the caller-supplied path verbatim. On
  case-insensitive filesystems (macOS/Windows) a wrong-cased path like
  'notes/Disk-Note.md' passes the existence check for 'notes/disk-note.md',
  misses the DB row keyed by the on-disk path, and inserts a duplicate
  entity under the wrong-cased path. The endpoint now rejects non-normalized
  segments ('./', '//') with 400 and canonicalizes the path by matching real
  directory entries (exact name first, then a unique case-insensitive match),
  so the index lookup and sync always use the on-disk casing. Behavior is
  identical on case-sensitive and case-insensitive filesystems.
- Document the automatic single-file disk recovery in the edit_note
  docstring.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: phernandez <paul@basicmachines.co>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 31ab176c21

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/basic_memory/api/v2/routers/knowledge_router.py
Address BM Bossbot review findings on the #581 recovery path:

- The sync-file endpoint went straight from boundary/markdown checks to
  sync_one_markdown_file, bypassing the ignore filtering that scan and watch
  flows apply before indexing. Hidden files, .bmignore matches, and project
  .gitignore matches could be indexed and made searchable (also indirectly
  via edit_note disk recovery). The canonical path is now checked against
  the same should_ignore_path() rules and rejected with 400. Regression
  tests cover hidden, .gitignore-ignored, and .bmignore-ignored files.
- edit_note disk recovery treated every sync-file ToolError as a harmless
  non-recovery case, masking auth/server/transport failures as not-found
  and letting append/prepend continue into auto-create. Only expected
  400/404 candidate rejections are suppressed now; anything else propagates.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: phernandez <paul@basicmachines.co>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 137408053a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/basic_memory/mcp/tools/edit_note.py Outdated

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 79fca12d0e

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/basic_memory/mcp/tools/edit_note.py Outdated
…_note recovery

Address BM Bossbot PR feedback on the edit_note disk-recovery path:

- Finding A: try the identifier as-is before appending .md, and skip the
  fallback when it already ends (case-insensitively) in a markdown suffix
  (.md/.markdown per mimetypes), so notes/foo.markdown no longer becomes
  notes/foo.markdown.md and recovery finds the on-disk file.

- Finding B: distinguish the sync-file ignored-path 400 from generic
  rejections via a shared IGNORED_PATH_REJECTION_DETAIL marker in
  ignore_utils (used to build the router detail), and raise a clear
  ToolError when the file exists on disk but matches ignore rules instead
  of falling through into append/prepend auto-create.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: phernandez <paul@basicmachines.co>
@phernandez

Copy link
Copy Markdown
Member Author

Addressed the remaining two bot findings in cefc8a1:

  • Preserve markdown suffixes (Finding A): _resolve_after_disk_recovery now tries the identifier as-is first and only appends .md when the identifier does not already end (case-insensitively) in a markdown suffix (.md/.markdown, matching what FileService.is_markdown accepts via mimetypes). An unindexed notes/foo.markdown or notes/foo.MD now recovers instead of probing notes/foo.markdown.md.
  • Stop suppressing ignored-path rejections (Finding B): the ignored-path 400 from sync-file is now built from a shared IGNORED_PATH_REJECTION_DETAIL constant in ignore_utils.py; the recovery path matches that marker in the error detail and raises a clear ToolError (file exists on disk but matches .bmignore/.gitignore rules and will not be edited) instead of returning None and falling through into append/prepend auto-create. Other 400/404 rejections keep the existing fall-through behavior.

Tests: added .markdown recovery and gitignored-file refusal tests (both verified to fail on the previous code); tests/mcp/test_tool_edit_note.py + tests/api/v2/test_knowledge_router.py — 101 passed.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cefc8a147b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/basic_memory/mcp/tools/edit_note.py Outdated

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 40a6d0b8a1

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/basic_memory/api/v2/routers/knowledge_router.py

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 01b7f8158a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/basic_memory/mcp/tools/edit_note.py
phernandez and others added 2 commits June 9, 2026 23:35
validate_project_path runs before canonicalization, on the caller-supplied
path. On a case-sensitive filesystem a wrong-cased request like
'LINK/secret.md' does not exist, so resolve() cannot follow the real 'link'
symlink and the boundary check passes; canonicalization then rewrites the
segment to the on-disk 'link', which may point outside the project root.

After canonicalization, re-run validate_project_path on the canonical path
and require the fully-resolved absolute target to stay inside the resolved
project home, rejecting escapes with 400. Symlinks whose targets stay inside
the project keep working as before.

Regression tests cover exact-cased and wrong-cased escaping requests plus
the preserved in-project symlink behavior.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: phernandez <paul@basicmachines.co>
Two gaps in the edit_note disk-recovery path (#581):

- A stem identifier like 'notes/foo' only probed 'notes/foo.md', so an
  on-disk 'notes/foo.markdown' was missed and append/prepend auto-created a
  shadow 'notes/foo.md'. Recovery now probes '.md' then '.markdown' when the
  identifier carries no markdown suffix.

- After sync-file canonicalizes a wrong-cased identifier (notes/Disk-Note ->
  notes/disk-note.md), the retry strictly re-resolved the raw identifier and
  could miss the entity just indexed. Recovery now returns the entity
  identity (external_id) straight from the sync-file response, keeping a
  strict re-resolve only for older servers that omit the field — and letting
  that re-resolve fail loudly instead of guessing.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: phernandez <paul@basicmachines.co>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cb2d6c2e1a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/basic_memory/api/v2/routers/knowledge_router.py
phernandez and others added 2 commits June 10, 2026 00:03
_canonical_file_path followed escaping symlinks with os.scandir before the
post-canonicalization containment check rejected the request: on a
case-sensitive filesystem a wrong-cased request like 'LINK/secret.md' passes
the pre-check (the path does not exist, so resolve() never follows the real
'link' symlink), and segment matching then descended through 'link' and
scanned the outside directory — an information touch beyond the project
boundary even though the request was ultimately rejected.

Re-check the boundary at every traversal step: before scanning the current
directory, its fully-resolved path must still be contained in the resolved
project home; bail with None the moment resolution escapes. The
post-canonicalization checks remain as defense in depth, and symlinks whose
targets stay inside the project keep canonicalizing as before.

Regression tests spy on os.scandir to prove the outside directory is never
scanned (meaningful on case-insensitive macOS too, where the pre-check
already rejects the request), and exercise the boundary bail directly so it
is covered on every filesystem.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: phernandez <paul@basicmachines.co>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 18ad8e483a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +398 to +400
synced = await sync_service.sync_one_markdown_file(
file_path, new=existing is None, index_search=True
)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Schedule vector refresh after one-file sync

When semantic search is enabled and a caller uses the new /sync-file endpoint directly to index an on-disk markdown file, this path updates the entity/full-text index but never queues sync_entity_vectors; the other v2 write paths schedule _schedule_vector_sync_if_enabled(...), and full sync batches embeddings after changed files. As a result, a note recovered via this endpoint can remain missing or stale in semantic/hybrid search until some later edit or full sync happens.

Useful? React with 👍 / 👎.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e92fd6f037

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

# Canonicalize to the actual on-disk casing so the DB lookup below hits the
# row keyed by the real path instead of inserting a wrong-cased duplicate.
file_path = _canonical_file_path(project_config.home, segments)
if file_path is None or not (project_config.home / file_path).is_file():

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Check symlink containment before probing files

When the request differs only by casing from an in-project symlinked file on a case-sensitive filesystem, _canonical_file_path can rewrite it to the real symlink name and this is_file() call follows the symlink before the resolved-target boundary check below runs. For example, SECRET.md matching secret.md -> /tmp/outside/secret.md causes the endpoint to stat outside the project and return 404 vs 400 depending on the external target, even though single-file sync is supposed to stop at project boundaries; move the resolved containment check before any file-type/existence probe that follows symlinks.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Feature: edit-note recovery when note exists on disk but is not indexed yet

1 participant