
fix(agents): stop experiment read-only tool-call loop on weak models #349

Description

@w7-mgfcode

Summary

Follow-up to #347 / PR #348. The read-only intent guard correctly stops the Experiment Agent from derailing into scenario/write tools, but on a weak local model (ollama:llama3.1:8b) read-only queries now loop on the read tool and never finish, surfacing as "invalid tool call".

Evidence (session 3b5f965b…; all three queries failed identically)

  • "List the most recent model runs and tell me which has the lowest WAPE." / "list store details" / "list top product"
  • Each: model calls tool_list_runs → it returns the data (10 successful runs incl. WAPE) → model re-calls tool_list_runs 3 more times (tool_call_count 1→2→3→4) → UnexpectedModelBehavior: "Exceeded maximum output retries (3)" → error event.
  • Zero scenario/write tool calls: the #348 guard (fix(agents): constrain experiment read-only queries) works; the derail is gone.
  • The answer was available the whole time: lowest WAPE = naive run 2fad611b… (18.93).
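The loop in the evidence above has a simple mechanical signature: the same read-only tool is called more than once for a single question. A minimal sketch for spotting it in a recorded trace, assuming events are plain dicts with hypothetical "type" and "tool_name" keys (the real session event schema is not shown in this issue); `repeated_read_calls` and `READ_ONLY_TOOLS` are hypothetical names.

```python
from collections import Counter
from typing import Iterable, Mapping

# Only tool_list_runs is named in the issue; add the agent's other
# read-only tools here (hypothetical set).
READ_ONLY_TOOLS = frozenset({"tool_list_runs"})


def repeated_read_calls(
    events: Iterable[Mapping[str, str]],
    read_only_tools: frozenset[str] = READ_ONLY_TOOLS,
) -> dict[str, int]:
    """Return each read-only tool called more than once, with its call count.

    Assumes a hypothetical flat event list: dicts with a "type" key
    ("tool_call", "tool_return", ...) and a "tool_name" key.
    """
    counts = Counter(
        event["tool_name"]
        for event in events
        if event.get("type") == "tool_call"
        and event.get("tool_name") in read_only_tools
    )
    return {name: n for name, n in counts.items() if n > 1}


# A trace shaped like the failing sessions: call/return pairs, four times over.
trace = [
    {"type": "tool_call", "tool_name": "tool_list_runs"},
    {"type": "tool_return", "tool_name": "tool_list_runs"},
] * 4
print(repeated_read_calls(trace))  # {'tool_list_runs': 4}
```

A one-pass run (a single call/return pair) yields an empty dict, so the check separates the looping behavior from the desired one without involving a model.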

Root cause

A pure llama3.1:8b structured-output weakness (cause D), made fatal because the guard correctly removed the propose_scenario "escape hatch" the model previously used to emit some output. The model never completes the PromptedOutput(ExperimentReport) run; it re-calls the read tool instead of writing the summary.

Desired behavior (prompt-only hardening)

  • Call each read-only tool at most once per question.
  • The moment a read tool returns, STOP calling tools and write the ExperimentReport.summary from what it returned.
  • Never re-call a tool that already answered (the observed 4× tool_list_runs loop).
  • If a read tool returns an empty result, say so in summary ("No model runs found.") rather than retrying.
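The four rules are plain prompt text, so extending the guard amounts to appending them. A minimal sketch, assuming the guard and system prompt are plain strings; `READ_ONLY_LOOP_RULES`, `extend_guard`, and `build_system_prompt` are hypothetical names, and the wording is illustrative rather than the project's actual guard text.

```python
# Hypothetical rule text: one entry per rule in the list above. The project's
# real guard wording and constant names are not shown in this issue.
READ_ONLY_LOOP_RULES = (
    "Call each read-only tool at most once per question.",
    "As soon as a read-only tool returns, stop calling tools and write "
    "ExperimentReport.summary from what it returned.",
    "Never re-call a tool that has already answered.",
    "If a read-only tool returns an empty result, say so in the summary "
    '(for example "No model runs found.") instead of retrying.',
)


def extend_guard(existing_guard: str) -> str:
    """Append the one-pass / stop-and-summarize / no-loop / empty-result rules."""
    bullets = "\n".join(f"- {rule}" for rule in READ_ONLY_LOOP_RULES)
    return f"{existing_guard.rstrip()}\n{bullets}\n"


def build_system_prompt(base_prompt: str, guard: str) -> str:
    """Hypothetical helper: deliver the guard as the tail of the system prompt."""
    return f"{base_prompt.rstrip()}\n\n{guard}"


# Placeholder strings, not the project's actual prompt or guard text.
guard = extend_guard("Read-only request: do not call scenario or write tools.")
prompt = build_system_prompt("You are the Experiment Agent.", guard)
print(prompt)
```

Keeping the rules in a tuple lets tests check each rule individually instead of matching one long string.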

Acceptance

  • Guard text extended with the one-pass / stop-and-summarize / no-loop / empty-result rules.
  • Deterministic tests assert the new instructions are present in the guard and delivered in the system prompt. No live model calls.
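The acceptance tests only need substring checks on strings, so they stay deterministic and never construct or call a model. A sketch using the standard-library `unittest`, assuming the guard and system prompt are available as strings; `REQUIRED_MARKERS`, `missing_markers`, and the stand-in `GUARD` / `SYSTEM_PROMPT` values are hypothetical placeholders for the project's real guard constant and prompt builder.

```python
import unittest

# Hypothetical marker phrases, one per rule; a real test would import the
# project's guard constant and prompt builder instead of these stand-ins.
REQUIRED_MARKERS = (
    "at most once per question",  # one-pass
    "stop calling tools",         # stop-and-summarize
    "Never re-call a tool",       # no-loop
    "empty result",               # empty-result
)


def missing_markers(text: str, markers=REQUIRED_MARKERS) -> list[str]:
    """Return the markers that do not appear in `text` (case-sensitive)."""
    return [m for m in markers if m not in text]


class GuardRulesTest(unittest.TestCase):
    # Stand-in strings; no model is constructed or called.
    GUARD = (
        "Call each read-only tool at most once per question. "
        "When a read-only tool returns, stop calling tools and write the summary. "
        "Never re-call a tool that has already answered. "
        "If a read-only tool returns an empty result, say so in the summary."
    )
    SYSTEM_PROMPT = "You are the Experiment Agent.\n\n" + GUARD

    def test_guard_contains_all_rules(self):
        self.assertEqual(missing_markers(self.GUARD), [])

    def test_system_prompt_delivers_guard(self):
        self.assertIn(self.GUARD, self.SYSTEM_PROMPT)
        self.assertEqual(missing_markers(self.SYSTEM_PROMPT), [])
```

Run it with `python -m unittest`. Matching short marker phrases rather than whole sentences keeps the tests from breaking on minor rewording of the guard.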

Metadata

  • Assignees: no one assigned
  • Labels: bug (Something isn't working)
  • Projects: none
  • Milestone: none
  • Relationships: none yet
  • Development: no branches or pull requests
