Skip to content

fix(loop): deliver recovered tool-panic message to the LLM + review cleanups#26

Merged
jkyberneees merged 2 commits into
mainfrom
fix/code-review-followup
Jun 10, 2026
Merged

fix(loop): deliver recovered tool-panic message to the LLM + review cleanups#26
jkyberneees merged 2 commits into
mainfrom
fix/code-review-followup

Conversation

@jkyberneees

Copy link
Copy Markdown
Contributor

Summary

Follow-up to #25: applies the findings from the code review of that branch. One real bug fix plus five small cleanups.

Bug fix

Panicking tool's error message was discarded (internal/loop/loop.go). The tool-execution goroutine recovered the panic and built error: tool "x" panicked: ..., but then hit an early return before results[idx] was assigned. Consequences:

  • the LLM received an empty tool result instead of the error, and
  • the empty string doesn't match the error: prefix check, so the consecutive-tool-error circuit breaker never counted the failure — a repeatedly panicking tool could loop to maxIter with no error signal.

The early return is removed so the recovered message flows through like any other tool error.

The existing TestToolPanic_DoesNotKillAgent was a false positive: its len(body.Messages) == 2 trigger never matched (no system prompt → first request has 1 message), so the mock returned the final answer immediately and the panic path never ran. It now uses a call counter and asserts the panic message actually reaches the LLM — it fails against the old code.

Cleanups

  • internal/loop/loop.go: the anonymous interface{ SetTrustAll(bool) } repeated in three places is now a named trustAllSetter type.
  • internal/telegram/config.go: ODEK_TELEGRAM_ALLOW_ALL is parsed with strconv.ParseBool instead of a custom parseBool that accepted yes/on — consistent truthy grammar with the rest of the config surface; malformed values fail closed. (The extended grammar shipped only in fix(security): audit remediation — Telegram path traversal + approval-grant leak #25 and was never documented; all docs use =true.)
  • internal/telegram/config.go + cmd/odek/telegram.go: new TelegramConfig.HasAllowlist() helper replaces the fail-closed condition duplicated between ValidateConfig and the open-bot startup warning.
  • internal/telegram/handler.go: corrected misleading AllowedChats/AllowedUsers comments (AllowAllUsers is only consulted when both lists are empty), and added a SECURITY note on HandleUpdate that every routed handler must enforce isAllowed — the pattern whose omission caused the callback-query bypass fixed in fix(security): audit remediation — Telegram path traversal + approval-grant leak #25.

Test plan

  • go vet ./... clean
  • go test ./... passes
  • TestToolPanic_DoesNotKillAgent strengthened into a regression test for the panic-output drop

🤖 Generated with Claude Code

…leanups

A panicking tool's recovered error message was discarded: the goroutine
returned before results[idx] was assigned, so the LLM received an empty
tool result and the consecutive-error tracker never counted the failure.
Drop the early return so the panic message flows through like any other
tool error. The existing TestToolPanic_DoesNotKillAgent never exercised
this path (its message-count trigger never matched with no system
prompt); switch it to a call counter and assert the panic message
reaches the LLM.

Review cleanups from the #25 follow-up:
- loop: name the thrice-repeated anonymous interface as trustAllSetter
- telegram: parse ODEK_TELEGRAM_ALLOW_ALL with strconv.ParseBool for a
  consistent truthy grammar (malformed values fail closed)
- telegram: add TelegramConfig.HasAllowlist() and use it in both
  ValidateConfig and the open-bot startup warning
- telegram: correct HandlerConfig allowlist comments (AllowAllUsers is
  only consulted when BOTH lists are empty) and document on HandleUpdate
  that every routed handler must enforce isAllowed

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@cloudflare-workers-and-pages

cloudflare-workers-and-pages Bot commented Jun 10, 2026

Copy link
Copy Markdown

Deploying with  Cloudflare Workers  Cloudflare Workers

The latest updates on your project. Learn more about integrating Git with Workers.

Status Name Latest Commit Preview URL Updated (UTC)
✅ Deployment successful!
View logs
odek fb9a98b Commit Preview URL

Branch Preview URL
Jun 10 2026, 11:28 AM

Verification-protocol remediation for #26: the claim that malformed or
extended truthy values ("yes", "on", "junk") fail closed had no
corresponding test, and HasAllowlist had no direct unit test.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@jkyberneees

Copy link
Copy Markdown
Contributor Author

Verification Certificate

(per the AI Verification Protocol v5.2.7, applied within a single-session pipeline — see honesty caveats below)

PR: #26 — fix(loop): deliver recovered tool-panic message to the LLM + review cleanups

Classification: GeneratedCode (correlated-failure risk: code, tests, and review produced by the same model instance)
Generator: Claude Fable 5 (claude-fable-5, Anthropic) — identity known; no formal prompt-lineage manifest (lineage: not_provided)
Pipeline diversity: B=C=D=A (same model instance) → INSUFFICIENT (§0.1 monoculture fallback; maximum correlation penalty applied)
η: ≈ 0.64 (signals: m=skipped, o=0.80, b≈0.90, f=skipped, s=1.0, t=1.0, d=1.0; skipped weights redistributed per §3.5; ρ≈0.25)
Head SHA: fb9a98b

Axes Summary

Axis Status Key Finding
2.1 Semantic Correctness Panic-message fix verified by a strengthened regression test that fails against the old code; the prior test was a false positive (trigger never matched)
2.2 Behavioral Contract ⚠️ One contract clause untested: the open-bot startup warning in cmd/odek/telegram.go (cmd layer has no test exercising that line)
2.3 Security Surface Strict strconv.ParseBool fails closed on malformed ODEK_TELEGRAM_ALLOW_ALL; now pinned by TestConfigFromEnv_allowAllStrictGrammar
2.4 Structural Integrity Named trustAllSetter interface and HasAllowlist() helper remove duplication; no new abstraction layers
2.5 Behavioral Exploration ⚠️ No mutation/fuzz evidence (tooling not in CI); concurrency of results[] writes unchanged from pre-existing design
2.6 Dependency Integrity No dependency changes
2.7 Generator Provenance Generator identity recorded; lineage manifest absent (informational, no ρ penalty per §3.5)
2.8 Adversarial Surface Deterministic pre-scan of diff: no injection markers, unsafe sinks, zero-width/bidi anomalies; diff touches injection-defense code only to improve it
2.9 Documentation Coverage New exported TelegramConfig.HasAllowlist() carries godoc; operator docs verified unaffected (all examples use =true, compatible with the stricter grammar)

Evidence

  • go vet ./... clean; full go test ./... green at fb9a98b
  • Statement coverage on changed functions: runLoop 84.4%, ConfigFromEnv 90.5%, ValidateConfig 93.3%, HasAllowlist 100%, HandleUpdate 100%, isAllowed 100%

Remediations applied during this run

  • Added TestConfigFromEnv_allowAllStrictGrammar — pins that yes/on/junk fail closed (oracle gap on the grammar-change clause)
  • Added TestHasAllowlist — direct unit coverage of the new exported helper

Unverified Gaps

  • Mutation kill rate and fuzz survival not measured (no mutation/fuzz tooling in this repo's CI) — risk: low for a −25/+98 line diff dominated by tests
  • cmd/odek/telegram.go open-bot warning line untested — risk: low (log-only path; condition shares HasAllowlist() with the tested ValidateConfig)

Verdict

  • Human Review REQUIRED — ρ≈0.25 (>0.20) by construction: generator, reviewer, and test author are the same model instance, so the protocol's correlated-failure rule caps the verdict regardless of η

Rationale: all nine axes pass or carry only low-risk gaps, but this pipeline cannot attest independence from its own generator — the human auditor's review of this PR is the required independent check.

🤖 Generated with Claude Code

@jkyberneees jkyberneees merged commit 71c0609 into main Jun 10, 2026
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant