Follow-up to #347 / PR #348. The read-only intent guard correctly stops the Experiment Agent from derailing into scenario/write tools, but on a weak local model (ollama:llama3.1:8b) read-only queries now loop on the read tool and never finish, surfacing as "invalid tool call".
Evidence (session 3b5f965b…; all three queries failed identically)
"List the most recent model runs and tell me which has the lowest WAPE." / "list store details" / "list top product"
For each query: the model calls tool_list_runs → the tool returns the data (10 successful runs, including WAPE) → the model re-calls tool_list_runs 3 more times (tool_call_count 1→2→3→4) → "Exceeded maximum output retries (3)" → UnexpectedModelBehavior → error event.
The answer was available the whole time: lowest WAPE = naive run 2fad611b… (18.93).
Root cause
This is purely a llama3.1:8b structured-output weakness (cause D), made fatal because the guard correctly removed the propose_scenario "escape hatch" the model previously used to emit something. The model never terminates the PromptedOutput(ExperimentReport) run; it re-calls the read tool instead of writing the summary.
Desired behavior (prompt-only hardening)
Call each read-only tool at most once per question.
The moment a read tool returns, STOP calling tools and write the ExperimentReport.summary from what it returned.
Never re-call a tool that already answered (the observed 4× tool_list_runs loop).
If a read tool returns an empty result, say so in the summary (e.g. "No model runs found.") rather than retrying.
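A minimal sketch of how these four rules could be appended to the existing guard text, assuming the guard is a plain Python string. The names ONE_PASS_RULES, extend_guard and READ_ONLY_GUARD, and the base guard wording, are hypothetical and not the project's actual identifiers or text.

```python
# Hypothetical sketch: names and base guard wording are illustrative only.

# The four prompt-only rules, kept as separate strings so tests can
# check each one individually.
ONE_PASS_RULES: tuple[str, ...] = (
    "Call each read-only tool at most once per question.",
    "The moment a read tool returns, STOP calling tools and write "
    "ExperimentReport.summary from what it returned.",
    "Never re-call a tool that already answered.",
    "If a read tool returns an empty result, say so in the summary "
    '(e.g. "No model runs found.") rather than retrying.',
)


def extend_guard(base_guard: str, rules: tuple[str, ...] = ONE_PASS_RULES) -> str:
    """Append the one-pass rules to an existing guard as a bulleted block."""
    bullets = "\n".join(f"- {rule}" for rule in rules)
    return f"{base_guard.rstrip()}\n\n{bullets}\n"


# Stand-in for the existing read-only intent guard (wording assumed).
READ_ONLY_GUARD = extend_guard(
    "This is a read-only question. Use only read-only tools; do not propose "
    "scenarios or call write tools."
)
```

Keeping the rules as a tuple rather than one long string lets a test iterate over them, so a later wording change to one rule fails a single clear assertion.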
Acceptance
Guard text extended with the one-pass / stop-and-summarize / no-loop / empty-result rules.
Deterministic tests assert the new instructions are present in the guard and delivered in the system prompt. No live model calls.
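The acceptance tests could look roughly like the pytest-style sketch below. It is self-contained on purpose: GUARD, REQUIRED_PHRASES and build_system_prompt are hypothetical stand-ins for the real guard constant and the agent's system-prompt assembly (including the assumption that the guard is appended only for read-only intents), so the tests inspect strings only and never call a model.

```python
# Hypothetical stand-ins: swap in imports of the real guard and prompt builder.
GUARD = (
    "Read-only question: use only read-only tools.\n"
    "- Call each read-only tool at most once per question.\n"
    "- The moment a read tool returns, STOP calling tools and write "
    "ExperimentReport.summary from what it returned.\n"
    "- Never re-call a tool that already answered.\n"
    "- If a read tool returns an empty result, say so in the summary "
    '("No model runs found.") rather than retrying.\n'
)

# One key phrase per rule: one-pass / stop-and-summarize / no-loop / empty-result.
REQUIRED_PHRASES = {
    "one_pass": "at most once per question",
    "stop_and_summarize": "STOP calling tools",
    "no_loop": "Never re-call a tool",
    "empty_result": "No model runs found.",
}


def build_system_prompt(base: str, read_only: bool) -> str:
    """Hypothetical prompt assembly: append the guard only for read-only intents."""
    return f"{base}\n\n{GUARD}" if read_only else base


def test_guard_contains_new_rules() -> None:
    for name, phrase in REQUIRED_PHRASES.items():
        assert phrase in GUARD, f"guard is missing the {name} rule"


def test_rules_delivered_in_system_prompt() -> None:
    prompt = build_system_prompt("You are the Experiment Agent.", read_only=True)
    for name, phrase in REQUIRED_PHRASES.items():
        assert phrase in prompt, f"system prompt is missing the {name} rule"
```

Because both tests only check string content, they are deterministic and need no running Ollama instance.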