feat(memory): path-aware episode/skill provenance + human-gated promote#10
Conversation
Fix the "episode dead-end": provenance tainted episodes by tool *name* only,
so read_file/search_files/multi_grep — used in essentially every coding
session — always produced an untrusted episode. Untrusted episodes are
excluded from recall, and nothing ever set UserApproved, so for normal work
the episode tier was written but never replayed (dead weight).
Provenance is now argument-aware via a single shared predicate
`memory.ToolCallTaints(name, argsJSON)`:
- Always untrusted: browser, http_batch, transcribe, and any MCP tool (__).
- Path-scoped (read_file, search_files, multi_grep): taint only when the
`path` argument resolves to a sensitive location (danger.ClassifyPath →
system_write/destructive, e.g. /etc, ~/.ssh, ~/.aws). Workspace and other
non-sensitive local reads stay trusted. Malformed/absent path → conservative
taint. This precisely re-targets the original "covers /etc/, $HOME/.ssh"
concern while letting ordinary sessions stay recallable.
The skills provenance (internal/skills/selfimprove.go) now delegates to the
same predicate, so learned skills stop being over-flagged for review too.
Reusing danger.ClassifyPath means DeriveProvenance keeps its (messages)
signature — no cwd plumbing into the async session-end call sites.
Escape hatch for genuinely-external episodes: new human-gated CLI
`odek memory list` / `odek memory promote <session_id>` sets UserApproved.
It is intentionally NOT an agent tool — a prompt-injected agent must not be
able to self-approve its own poisoned memory.
Also fixes two stale descriptions found while mapping the system: the
extract_on_end docs ("durable facts" → episode summary) and the
FormatEpisodeContext comment ("recency-based ranker, no LLM" → it uses the
configured, LLM-by-default ranker).
Tests: path-aware taint (workspace trusted, sensitive/system tainted, malformed
conservative, always-external), the ToolCallTaints table, episode
promote/pending lifecycle + persistence, skills mirror (guards the
llm.Message→LlmMessage Arguments plumbing), and the CLI command end to end.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…02/D-03/D-05)
Remediates the findings from the AI Verification Protocol run on this PR. The
v1 taint check used a sensitive-path *denylist* (danger.ClassifyPath) over only
three tools, which left several bypasses:
- D-03 (regression): macOS /etc and /var are symlinks to /private/etc, /private
/var. ClassifyPath only matched /etc,/var, so read_file("/private/etc/...")
classified local_write and did NOT taint — a sensitive read that tainted
before this PR. Confirmed live.
- D-01: sibling whole-file readers (batch_read, json_query, head_tail, sort,
tr, diff, count_lines, checksum, word_count, file_info, glob, tree, base64)
were in no taint set at all, so reading /etc/shadow via batch_read produced a
TRUSTED, recallable episode.
- D-02: home credential files outside ClassifyPath's hardcoded subdir list
(e.g. ~/secrets.env) classified local_write and did not taint.
- D-05: session_search re-surfaces prior-session transcripts but never tainted.
Fix — flip the model from a sensitive-path denylist to a workspace-containment
allowlist, and broaden coverage:
- ToolCallTaints now taints a path-reading tool when ANY of its path arguments
(path / path_a / path_b / files[].path) resolves OUTSIDE the workspace trust
zone (cwd, the sandbox /workspace mount, or ~/.odek). Symlinks are resolved
so /etc -> /private/etc can't disguise an escape. Workspace reads stay
trusted (the original goal); everything else taints. Malformed args still
taint conservatively; absent/empty path defaults to the workspace.
- PathReadingTools expanded to every file-reading tool; session_search added to
AlwaysExternalTools. Provenance no longer depends on ClassifyPath.
- Defense in depth: ClassifyPath now strips the macOS /private prefix so
/private/etc|var classify the same as /etc|var for its other callers
(transcribe risk, write confinement).
Also adds the missing manager-wrapper tests (PromoteEpisode /
PendingReviewEpisodes were at 0% coverage — the report's coverage finding;
now 100%) and updates docs/SECURITY.md to describe the containment model
accurately (the report's doc-accuracy finding).
Verified: batch_read/json_query/diff of /etc, read_file of /private/etc and
~/secrets.env, and session_search all taint; workspace reads (relative + abs)
and the /workspace mount stay trusted.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Verification Protocol — remediation pass (commit
|
| Finding | Sev | Status |
|---|---|---|
D-03 macOS /private/etc|var read no longer tainted (regression) |
HIGH | ✅ Fixed — containment check resolves symlinks; ClassifyPath also strips /private. Verified read_file("/private/etc/master.passwd") → taints. |
D-01 sibling whole-file readers (batch_read, json_query, head_tail, diff, …) bypass taint |
HIGH | ✅ Fixed — PathReadingTools broadened to all 16 file-readers; multi-path args (path_a/b, files[].path) extracted. batch_read{/etc/shadow} → taints. |
D-02 home creds outside the hardcoded list (~/secrets.env) not tainted |
MED | ✅ Fixed — any read outside the workspace taints now. |
D-05 session_search recalls prior transcripts untainted |
MED | ✅ Fixed — added to AlwaysExternalTools. |
Coverage PromoteEpisode/PendingReviewEpisodes at 0% |
LOW | ✅ Fixed — now 100%. |
Docs (2.9) SECURITY.md overstated coverage |
— | ✅ Rewritten to the containment model. |
Design change: the taint check flipped from a sensitive-path denylist (ClassifyPath) to a workspace-containment allowlist — a path-reading call taints when any path argument resolves outside the workspace / sandbox /workspace / ~/.odek (symlink-resolved). Workspace reads stay trusted (original goal preserved); everything else taints. This closes the whole class of denylist-incompleteness bypasses rather than patching them one by one.
Re-scored axes
2.1 ✅ · 2.2 ✅ · 2.3 ✅ (was 🔴) · 2.4 ✅ · 2.5 ✅ · 2.6 ✅ · 2.7
η ≈ 0.68 (signals improved: o≈0.75, b≈0.92, s≈0.90, t=1, d=1, f=1, m skipped/redistributed) · ρ = 0.21 (unchanged — single-model pipeline).
Verdict: 🟡 HumanReviewRequired — but now only on the correlation gate
No axis is 🔴 anymore; every substantive finding is fixed. The verdict stays at HumanReviewRequired solely because the pipeline is a monoculture (the author and all verifier roles are the same model/family → ρ=0.21 ∈ (0.20, 0.30], which also drags η below 0.80). Per §3.8 that is a hard cap a single model cannot clear by improving the code — a human reviewer or a genuinely independent second-provider verification is the remaining gate. That's the protocol working as designed: a model may not self-certify to AutoApprove.
Verification artifacts: deterministic tests now cover each closed bypass (TestToolCallTaints, broadened path-tool cases) and the manager wrappers.
Add `memory.auto_approve_episodes` (default false). When enabled, untrusted episodes are stamped approved at session end so they are recalled without a manual `odek memory promote` — an explicit opt-in that trades the human review gate for convenience in trusted/sandboxed deployments. - New EpisodeProvenance.AutoApproved field, kept DISTINCT from UserApproved so the audit trail shows the approval was automatic (policy), while Untrusted + Sources are still recorded. Episode Search and PendingReview treat AutoApproved as approved. - The stamp is applied at write time in OnSessionEndWithProvenance only when the flag is on and the episode is untrusted; trusted episodes are unaffected. - Config plumbed through MemoryConfig (default false), NewMemoryManager merge, and resolveMemory. - docs/CONFIG.md and docs/SECURITY.md document it as a security trade-off that disables the persistence-injection protection for episodes when on. Tests: store-level recall of an AutoApproved episode + exclusion from pending; the secure default is false; and the full OnSessionEnd stamping path with the flag on (AutoApproved set, recallable, not pending) vs off (excluded, pending). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Summary
Fixes the episode-memory dead-end: provenance tainted an episode if the session used
read_file/search_files/multi_grep— by tool name alone. Those run in essentially every coding session, so almost every episode was bornUntrusted, untrusted episodes are filtered out of recall, and nothing ever setUserApproved. Net result: episodes were extracted (an LLM call per session) and stored, but for normal work never recalled — the tier was dead weight.This makes taint argument-aware and adds the missing human-gated escape hatch.
What changed
1. Path-aware taint (shared single source of truth)
New
memory.ToolCallTaints(name, argsJSON)decides per call:browser,http_batch,transcribe, any MCP tool (server__tool).read_file/search_files/multi_greptaint only when theirpathargument resolves to a sensitive location — reusing the existing, testeddanger.ClassifyPath(system_write/destructive:/etc,~/.ssh,~/.aws, …). Workspace and other non-sensitive local reads stay trusted. Malformed/absent path → conservative taint.This precisely re-targets the original "covers /etc/, $HOME/.ssh" concern while letting ordinary sessions remain recallable. Because
ClassifyPathresolves paths itself,DeriveProvenancekeeps its(messages)signature — no cwd plumbing into the async session-end call sites.2. Skills mirror
internal/skills/selfimprove.gonow delegates to the samememory.ToolCallTaints, so learned skills stop being over-flagged for review too. Trust model stays unified.3. Human-gated promote (CLI only)
Deliberately not an agent tool — a prompt-injected agent must not be able to self-approve its own poisoned memory. Mirrors the existing
skill promotepattern.4. Doc/comment drift fixed
docs/CONFIG.md:extract_on_end("durable facts" → it extracts an episode summary).memory.go:FormatEpisodeContextcomment ("recency-based ranker, no LLM" → it uses the configured, LLM-by-default ranker).docs/SECURITY.md§5/§6 updated for the new model.Security rationale
The control still closes the persistence-injection vector: anything that ingests genuinely external content (web/http/MCP/audio) or reads sensitive system/credential paths is still untrusted and non-recallable until a human promotes it. Only the false-positive (workspace reads) is removed.
Verification
go build ./...,go vet,gofmtclean.go test ./internal/memory/... ./internal/skills/... ./cmd/odek/...— all pass. New coverage:/etc/passwd&~/.ssh/id_rsatainted; default-path search trusted; malformed args conservatively taint;http_batch/transcribealways taint;ToolCallTaintstable.llm.Message → LlmMessageconversion carriesArguments.HOME, seeded index): untrusted web episode listed as pending, clean workspace episode not flagged,promotemakes it recallable, error cases exit 1 cleanly.Note
Only new episodes get the refined provenance; existing on-disk tainted episodes stay as-is —
odek memory promoteis exactly how a user recovers those.🤖 Generated with Claude Code