fix(agents): stop experiment read-only tool-call loop on weak models

## Summary

Follow-up to #347 / PR #348. The read-only intent guard correctly stops the Experiment Agent from derailing into scenario/write tools — but on a weak local model (`ollama:llama3.1:8b`) read-only queries now **loop the read tool and never finish**, surfacing as "invalid tool call".

## Evidence (session `3b5f965b…`, identical behavior across all three queries)

- "List the most recent model runs and tell me which has the lowest WAPE." / "list store details" / "list top product"
- Each: model calls `tool_list_runs` → it **returns the data** (10 successful runs incl. WAPE) → model **re-calls `tool_list_runs` 3 more times** (`tool_call_count` 1→2→3→4) → `"Exceeded maximum output retries (3)"` → `UnexpectedModelBehavior` → error event.
- **Zero** scenario/write tool calls — the #348 guard works; the derail is gone.
- The answer was available the whole time: lowest WAPE = `naive` run `2fad611b…` (18.93).
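The loop pattern above can be flagged mechanically from a session trace. A minimal sketch, assuming the trace can be reduced to an ordered list of tool names; `repeated_tool_calls` is a hypothetical helper, not something that exists in the repo:

```python
from collections import Counter


def repeated_tool_calls(tool_names: list[str]) -> dict[str, int]:
    """Return every tool that was called more than once, with its call count.

    `tool_names` is assumed to be the ordered tool-call names from one
    session; the real trace format may differ.
    """
    counts = Counter(tool_names)
    return {name: n for name, n in counts.items() if n > 1}


# The observed session: one call that returned data, then three re-calls.
observed = ["tool_list_runs"] * 4
print(repeated_tool_calls(observed))  # {'tool_list_runs': 4}
```

A healthy read-only run would produce an empty dict here, since each read tool is expected to be called once.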

## Root cause

The root cause is purely the structured-output weakness of `llama3.1:8b` (cause D). It became terminal because the guard correctly removed the `propose_scenario` "escape hatch" that the model previously used to emit *something*. The model never terminates the `PromptedOutput(ExperimentReport)` run: it re-calls the read tool instead of writing the `summary`.

## Desired behavior (prompt-only hardening)

- Call each read-only tool **at most once** per question.
- The moment a read tool returns, **STOP calling tools** and write the `ExperimentReport.summary` from what it returned.
- **Never re-call** a tool that already answered (the observed 4× `tool_list_runs` loop).
- If a read tool returns an **empty** result, say so in `summary` ("No model runs found.") rather than retrying.
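One way the rules above might be expressed as guard text. This is a sketch only: `READ_ONLY_LOOP_RULES` and `extend_guard` are hypothetical names, and the actual guard constant and its wording in the repo may differ.

```python
# Hypothetical constant: the four rules from this issue as prompt text.
READ_ONLY_LOOP_RULES = (
    "Call each read-only tool at most once per question.\n"
    "As soon as a read tool returns, STOP calling tools and write "
    "ExperimentReport.summary from what it returned.\n"
    "Never re-call a tool that has already answered.\n"
    "If a read tool returns an empty result, say so in summary "
    '(for example "No model runs found.") instead of retrying.'
)


def extend_guard(existing_guard: str) -> str:
    """Append the loop rules to an existing guard without duplicating them."""
    if READ_ONLY_LOOP_RULES in existing_guard:
        return existing_guard
    return existing_guard.rstrip() + "\n\n" + READ_ONLY_LOOP_RULES
```

Keeping the rules in a single constant means the tests can assert on the same text that is sent to the model.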

## Acceptance

- Guard text extended with the one-pass / stop-and-summarize / no-loop / empty-result rules.
- Deterministic tests assert the new instructions are present in the guard and delivered in the system prompt. No live model calls.
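The deterministic tests could look roughly like the following. Everything here is a stand-in for illustration: `READ_ONLY_GUARD`, `build_system_prompt`, and the required phrases are assumptions, not the repo's actual names or wording. No model is called.

```python
# Phrases the hardened guard is expected to contain (wording is an assumption).
REQUIRED_PHRASES = (
    "at most once",
    "STOP calling tools",
    "Never re-call",
    "empty result",
)

# Stand-in for the real guard constant; the actual name and text live in the repo.
READ_ONLY_GUARD = (
    "Call each read-only tool at most once per question. "
    "When a read tool returns, STOP calling tools and write the summary. "
    "Never re-call a tool that already answered. "
    "On an empty result, say so in the summary instead of retrying."
)


def build_system_prompt(base: str, read_only: bool) -> str:
    """Stand-in prompt builder: appends the guard only for read-only intents."""
    return f"{base}\n\n{READ_ONLY_GUARD}" if read_only else base


def missing_phrases(text: str) -> list[str]:
    """List the required phrases that do not appear in `text`."""
    return [p for p in REQUIRED_PHRASES if p not in text]


def test_guard_contains_loop_rules() -> None:
    assert missing_phrases(READ_ONLY_GUARD) == []


def test_system_prompt_delivers_guard() -> None:
    prompt = build_system_prompt("You are the Experiment Agent.", read_only=True)
    assert missing_phrases(prompt) == []
```

Both tests are plain string assertions, so they run under pytest without network access or a local model.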


fix(agents): stop experiment read-only tool-call loop on weak models #349
