diff --git a/PRPs/PRP-reliability-E2-surface-fallback-failures.md b/PRPs/PRP-reliability-E2-surface-fallback-failures.md new file mode 100644 index 00000000..55a7855c --- /dev/null +++ b/PRPs/PRP-reliability-E2-surface-fallback-failures.md @@ -0,0 +1,647 @@ +name: "PRP — Reliability E2: surface fallback model failures with classified, actionable details" +description: | + Parallel epic of umbrella #380 (platform reliability hardening), after Foundation E1 (#334). + Issue: #335 · Branch: `fix/agents-surface-fallback-failures` off `dev` · Commit scope: `agents,api` + (primary surface is `app/features/agents/`; the additive RFC 7807 extension plumbing touches + `app/core/{exceptions,problem_details}.py` = `api`, mirroring E1's scope reasoning). + +--- + +## Goal + +When every model in the PydanticAI `FallbackModel` chain fails (or a single configured model +fails with a provider error), the client must receive a **classified, secret-safe summary of +each per-model failure** — `{model_name, status_code, reason, detail}` — instead of today's +generic `Stream error: All models from FallbackModel failed (2 sub-exceptions)`: + +- **WebSocket `/agents/stream`** — one `error` StreamEvent with `error_type="fallback_exhausted"`, + a human-actionable `error` summary string, and a structured `failures` list. +- **REST `POST /agents/sessions/{id}/chat`** — a **502** `application/problem+json` with + `code="AGENT_FALLBACK_EXHAUSTED"` and a `failures` extension member. + +**Deliverable:** one new classifier module (`app/features/agents/failures.py`), one new schema +(`ModelFailureDetail`), two new `except` arms in `AgentService.chat` / `stream_chat`, one new +core exception (`AgentFallbackExhaustedError`) riding a new additive `extensions` pass-through +in the RFC 7807 helpers, plus tests at classifier / service / route levels. 
+ +**Success definition:** the exact failure from the issue (primary `404` model-not-found + +fallback `429` quota-exhausted) renders in the chat UI as a readable two-leg diagnosis with +zero frontend changes, and a route test proves the REST 502 carries both classified legs. +No secret-like material (API keys, bearer tokens) can appear in any surfaced payload. + +## Why + +- **Diagnosability from the UI.** The 2026-06-01 incident (issue #335) required reading + container logs to learn that the primary leg was a 404 (bad model name) and the fallback leg + a 429 (free-tier quota). Both causes were sitting in `agents.websocket_stream_error`; the + client got an opaque one-liner. +- **E1 (#334) stabilized the surface.** The doubled-prefix 404 class is now rejected at config + time (PR #382), so the classification matrix built here tests against a stable failure + surface (umbrella #380 Foundation ordering). +- **Zero-frontend-change win.** `frontend/src/pages/chat.tsx:95-108` renders + `Error: ${event.data.error}` verbatim — making the backend's `error` string itself the + classified human summary upgrades the UI for free; the structured `failures` list is the + additive machine-readable layer for future UI work. 
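
The zero-frontend-change point above rests on the `error` string itself being the readable diagnosis. A rough sketch of how that string and the event `data` dict might be assembled follows; `summarize` and `build_error_event_data` are hypothetical names for illustration, failures are plain dicts here rather than schema objects, and the labels and separator are taken from the summary format this PRP specifies.

```python
import json

# Human labels per reason, as listed in this PRP's summary specification.
LABELS = {
    "model_not_found": "model not found / invalid model name",
    "quota_exhausted": "quota or rate limit exhausted",
    "auth_error": "authentication/permission error",
    "provider_unavailable": "provider unavailable",
    "provider_error": "provider error",
    "response_rejected": "response rejected",
    "unknown": "unexpected failure",
}

# Separator between the headline and the per-model legs (the dash the PRP's
# example summary uses), written as an escape so the source stays ASCII.
SEPARATOR = " \u2014 "


def summarize(failures: list[dict]) -> str:
    """Build the deterministic one-line summary from classified failures."""
    legs = []
    for failure in failures:
        leg = f"{failure['model_name']}: {LABELS.get(failure['reason'], 'unexpected failure')}"
        if failure.get("status_code") is not None:
            leg += f" (HTTP {failure['status_code']})"  # omitted for non-HTTP failures
        legs.append(leg)
    headline = (
        "The configured agent model failed"
        if len(legs) == 1
        else "All configured agent models failed"
    )
    return headline + SEPARATOR + "; ".join(legs)


def build_error_event_data(failures: list[dict]) -> dict:
    """Assemble the `data` dict of the WebSocket `error` event (plain dicts only)."""
    return {
        "error": summarize(failures),  # what chat.tsx renders verbatim
        "error_type": "fallback_exhausted",
        "recoverable": True,
        "failures": failures,  # additive machine-readable layer
    }


def to_wire(failures: list[dict]) -> str:
    """Serialize the event data the way a JSON transport would."""
    return json.dumps(build_error_event_data(failures))
```

Because `failures` holds plain dicts, the payload survives `json.dumps` unchanged, which is the property the JSON-serializability gotcha later in this PRP asks for.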
+ +## What + +### Behavior change + +| Surface | Today | After | +|---------|-------|-------| +| WS `error` event, both models fail | `error="Stream error: All models from FallbackModel failed (2 sub-exceptions)"`, `error_type="FallbackExceptionGroup"` (the exception class name, set via `type(e).__name__` in the `websocket.py` generic catch) | `error_type="fallback_exhausted"`, `error="All configured agent models failed — google-gla:gemini-3-flash-preview: model not found / invalid model name (HTTP 404); google-gla:gemini-2.5-flash: quota or rate limit exhausted (HTTP 429)"`, `failures=[{model_name, status_code, reason, detail}, …]`, `recoverable=true` | +| REST chat, both models fail | uncaught `FallbackExceptionGroup` → generic 500 `INTERNAL_ERROR` problem+json | **502** problem+json, `code="AGENT_FALLBACK_EXHAUSTED"`, `type="/errors/agent-fallback-exhausted"`, `detail=`, `failures=[…]` extension | +| Single-model config (no fallback wired), provider error | same generic surfaces | same classified treatment (a bare `ModelAPIError` is classified as a 1-element `failures` list) | +| Model misbehavior (`UnexpectedModelBehavior`) | salvage → friendly message / `error_type="model_behavior_error"` | **unchanged** — the new arm catches only provider-API failures | +| Secrets in provider response bodies | `str(ModelHTTPError)` embeds `body` verbatim (leak risk if echoed) | surfaced `detail` is extracted → scrubbed (`AIza…`, `sk-…`, `Bearer …`, `api_key=…`) → truncated to 300 chars | + +### Reason classification (exact) + +| Evidence | `reason` | +|----------|----------| +| `ModelHTTPError.status_code == 404` | `model_not_found` | +| `status_code == 429` | `quota_exhausted` | +| `status_code in (401, 403)` | `auth_error` | +| `status_code >= 500` | `provider_unavailable` | +| any other `ModelHTTPError` | `provider_error` | +| non-HTTP `ModelAPIError` (connection, etc.)
| `provider_error` (status_code `null`) | +| `pydantic_ai.models.fallback.ResponseRejected` member | `response_rejected` | +| anything else inside the group | `unknown` | + +### Success Criteria + +- [ ] `classify_model_failures` maps 404/429/401/403/5xx/other-HTTP/non-HTTP/`ResponseRejected`/unknown and recurses into nested `ExceptionGroup`s +- [ ] Stream path: a `FallbackExceptionGroup(404 + 429)` raised by `agent.run_stream` yields exactly ONE `error` event with `error_type="fallback_exhausted"`, `recoverable=True`, a 2-entry `failures` list, and a summary naming both models — and the raw group string (`"sub-exceptions"`) does NOT appear +- [ ] REST path: the same failure → 502 `application/problem+json` with `code="AGENT_FALLBACK_EXHAUSTED"` and `failures` extension (route test covers both legs — umbrella #380 criterion) +- [ ] A planted secret (`AIzaFakeKey123…` / `sk-fake…` / `Bearer xyz`) in `ModelHTTPError.body` never appears in any serialized event/response payload (regression test asserts on the full JSON dump) +- [ ] Single bare `ModelAPIError` (no FallbackModel) gets the same classified treatment +- [ ] Existing `model_behavior_error` behavior and tests untouched (only extended) +- [ ] All five validation gates green; `docs/_base/API_CONTRACTS.md` updated additively + +## All Needed Context + +### Documentation & References + +```yaml +# ── Where the failures escape today (the two catch points to add) ──────────── +- file: app/features/agents/service.py + lines: 24-26, 295-354, 520-570, 693-771 + why: | + Imports (line 25 already pulls UnexpectedModelBehavior from pydantic_ai.exceptions — + extend it). chat(): the try at 298-308 wraps agent.run; excepts at 309 (TimeoutError) + and 313 (UnexpectedModelBehavior) — the NEW arm slots between them. 
stream_chat(): + try at 525 wraps run_stream (533, streaming) AND agent.run (560-568, #342 ollama + non-streaming fallback) — one new arm covers both; excepts at 693/697; the + misbehavior error-yield at 759-770 is the EXACT yield pattern to mirror + (data dict with error/error_type/recoverable, datetime.now(UTC) timestamp, + session.last_activity update + db.flush() before yielding, then `return`). + +# ── The generic backstop that produced the bad UX (do NOT remove — keep as backstop) +- file: app/features/agents/websocket.py + lines: 96-123, 132-158 + why: | + The `except Exception` at 109-123 is what stringified the group today + (f"Stream error: {e}", error_type=type(e).__name__) and logged + "agents.websocket_stream_error". After this PRP the service yields the classified + event BEFORE the exception reaches here; the handler stays as the backstop for + everything else. NO changes in this file. + +# ── Schema home for the new detail model + additive ErrorEvent field ───────── +- file: app/features/agents/schemas.py + lines: 145-163, 229-248, 304-316 + why: | + ChatResponse (no error field — REST errors go through problem+json, NOT this model), + StreamEvent (data is dict[str, Any] — the failures list rides inside data), + ErrorEvent (error/error_type/recoverable) — add Optional `failures` here so the + documented event shape matches what the service emits. Define ModelFailureDetail + in this file (schemas.py is the slice's schema home). + +# ── FallbackModel construction (read-only — explains when a group vs bare error escapes) +- file: app/features/agents/agents/base.py + lines: 168-176, 201-249 + why: | + build_agent_model_with_fallback returns a bare primary model when no distinct + key-backed fallback exists (→ bare ModelAPIError escapes, no group) and + FallbackModel(primary, fallback) otherwise (→ FallbackExceptionGroup escapes when + BOTH legs fail). reset_agent_caches (168) is why PATCH /config/ai applies live — + used by the Level-3 plan. 
+ +# ── RFC 7807 plumbing: the precedent and the two additive core edits ───────── +- file: app/core/exceptions.py + lines: 27-61, 227-254, 262-290 + why: | + ForecastLabError base (gains optional `extensions` kwarg; note `details` is + LOG-ONLY — the handler at 279-288 drops it from the response body, which is WHY + the new extensions channel exists). EmbeddingProviderAuthError (227-254) is the + EXACT precedent to mirror for AgentFallbackExhaustedError: module-level code + constant, error_type_uri from ERROR_TYPES, fixed status 502, narrow __init__. + forecastlab_exception_handler (262-290) passes title=exc.title (derived from code: + "AGENT_FALLBACK_EXHAUSTED" → "Agent Fallback Exhausted") — add extensions pass-through. +- file: app/core/problem_details.py + lines: 28-46, 54-114, 135-199 + why: | + EMBEDDING_AUTH_CODE constant pattern (30) + ERROR_TYPES dict (32-46) — add + AGENT_FALLBACK_EXHAUSTED. ProblemDetail has ConfigDict(extra="allow") (RFC 7807 + extension members are sanctioned). problem_response (169-199) serializes via + model_dump(exclude_none=True) — merge extensions into the serialized dict there + (NOT via ProblemDetail(**extensions); see gotcha on the mypy/pydantic-plugin trap). + +# ── Test patterns to mirror (extend, never weaken) ─────────────────────────── +- file: app/features/agents/tests/test_service.py + lines: 426-480 + why: | + test_stream_chat_model_misbehavior_yields_error_event — THE pattern for the new + stream test: AgentService() + monkeypatch settings.agent_default_model to + "anthropic:claude-test" (line 444 — pins the run_stream path, #342), mock_db AsyncMock with + scalar_one_or_none → sample_active_session fixture, _RaisingStream async CM that + raises on __aenter__, patch.object(service, "_get_agent"), collect events, assert + on events[0].data. Note it asserts the LITERAL "model_behavior_error" (line 478) — + error_type strings are load-bearing; pick "fallback_exhausted" once and keep it stable. 
+- file: app/features/agents/tests/test_routes.py + lines: 1-60 + why: | + Route tests are @pytest.mark.integration (real Postgres via conftest `client` + fixture). Pattern: create a session via POST /agents/sessions with the agent + factory patched, then exercise the endpoint. The new 502 test patches + AgentService.chat (or _get_agent with an agent whose run raises the group) and + asserts status/content-type/code/failures on the problem+json body. +- file: app/features/agents/tests/conftest.py + why: sample_active_session fixture used by the service tests; client fixture for routes. + +# ── Frontend consumer (READ-ONLY — proves no frontend change is needed) ────── +- file: frontend/src/pages/chat.tsx + lines: 95-108 + why: | + case 'error' renders `Error: ${event.data.error}` verbatim into the transcript. + The human summary string IS the UI improvement. AgentStreamEvent.data is + Record (frontend/src/types/api.ts:601-605) so the additive + failures key needs no type change. + +# ── External references (verified against installed pydantic-ai 1.96.0, 2026-06-11) +- url: https://pydantic.dev/docs/ai/models/overview/ + section: "Fallback Model" + why: FallbackModel semantics — falls back on ModelAPIError; raises FallbackExceptionGroup when all legs fail +- url: https://pydantic.dev/docs/ai/api/pydantic-ai/exceptions/ + why: ModelHTTPError / ModelAPIError / FallbackExceptionGroup API reference +- url: https://docs.python.org/3/library/exceptions.html#exception-groups + why: | + ExceptionGroup.exceptions is a TUPLE; sub-groups can nest — the classifier must + recurse. A plain `except FallbackExceptionGroup:` works (it subclasses Exception); + `except*` syntax is NOT needed and would complicate the single-yield contract. 
+``` + +### Current Codebase tree (relevant subset) + +``` +app/core/ + exceptions.py # ForecastLabError + handler ← MODIFY (additive) + problem_details.py # ERROR_TYPES + problem_response ← MODIFY (additive) + tests/ # (no problem_details test file today) ← ADD test file +app/features/agents/ + agents/base.py # FallbackModel construction (read-only) + service.py # chat() / stream_chat() except arms ← MODIFY + websocket.py # generic backstop (no change) + schemas.py # StreamEvent / ErrorEvent ← MODIFY (additive) + routes.py # chat endpoint (no change — global handler covers 502) + tests/ + test_service.py # stream/chat error tests ← EXTEND + test_routes.py # integration route tests ← EXTEND +docs/_base/API_CONTRACTS.md # WS ErrorEvent + chat endpoint docs ← EXTEND +``` + +### Desired Codebase tree + +``` +app/features/agents/failures.py # NEW — classify_model_failures / summarize_model_failures / _sanitize +app/features/agents/tests/test_failures.py # NEW — classification matrix + secret-scrub + summary tests +app/core/tests/test_problem_details.py # NEW — extensions merge + reserved-key guard + no-extensions unchanged +``` + +No migration (nothing persisted changes). No frontend changes. No new dependencies. + +### Known Gotchas & Library Quirks + +```python +# ── VERIFIED LIBRARY CLAIM #1: the exception family (pydantic-ai 1.96.0) ────────────── +# uv run python -c " +# from pydantic_ai.exceptions import FallbackExceptionGroup, ModelHTTPError, ModelAPIError +# print(FallbackExceptionGroup.__mro__) # → ExceptionGroup → BaseExceptionGroup → Exception +# print(ModelHTTPError.__mro__) # → ModelAPIError → AgentRunError → RuntimeError +# import inspect; print(inspect.signature(ModelHTTPError.__init__))" +# # → (self, status_code: 'int', model_name: 'str', body: 'object | None' = None) +# ModelHTTPError IS a ModelAPIError → FallbackModel's default fallback_on=(ModelAPIError,) +# catches it per-leg; the group only escapes when ALL legs fail. Re-verify on upgrade. 
+ +# ── VERIFIED LIBRARY CLAIM #2: group anatomy ────────────────────────────────────────── +# uv run python -c " +# from pydantic_ai.exceptions import FallbackExceptionGroup, ModelHTTPError +# g = FallbackExceptionGroup('All models from FallbackModel failed', +# [ModelHTTPError(404, 'm1'), ModelHTTPError(429, 'm2')]) +# print(type(g.exceptions), g.message)" +# # → <class 'tuple'> All models from FallbackModel failed +# .exceptions is an immutable TUPLE (not list). The constructor REJECTS an empty list. +# The message literal 'All models from FallbackModel failed' is what users saw — assert +# it does NOT leak into the new surfaced error string. + +# ── VERIFIED LIBRARY CLAIM #3: str(ModelHTTPError) embeds body VERBATIM ─────────────── +# uv run python -c " +# from pydantic_ai.exceptions import ModelHTTPError +# print(str(ModelHTTPError(404, 'gemini-x', body={'error': {'message': 'nope'}})))" +# # → status_code: 404, model_name: gemini-x, body: {'error': {'message': 'nope'}} +# NEVER put str(exc) or exc.body raw into a client payload. Extract the provider message +# (Google/OpenAI shape body['error']['message'], else str(body)), scrub, truncate (300). +# Issue #335 hard constraint: no API keys / Bearer tokens / AIza… values, ever. + +# ── VERIFIED LIBRARY CLAIM #4: ResponseRejected can be a group member ───────────────── +# uv run python -c " +# from pydantic_ai.models.fallback import ResponseRejected; print(str(ResponseRejected(2)))" +# # → 2 model response(s) rejected by fallback_on handler +# It carries NO model_name → classify with model_name="(response rejected)" or similar +# deterministic placeholder, reason="response_rejected". + +# ── GOTCHA: classification arm placement & non-overlap ──────────────────────────────── +# UnexpectedModelBehavior is NOT a ModelAPIError (separate AgentRunError branches), so +# `except (FallbackExceptionGroup, ModelAPIError) as e:` cannot shadow the existing +# misbehavior arm.
Place the new arm AFTER TimeoutError, BEFORE UnexpectedModelBehavior +# in BOTH chat() and stream_chat(). Do NOT attempt _salvage_* in the new arm — nothing +# ran, there is nothing to salvage. + +# ── GOTCHA: inner `except Exception` at service.py:545 ─────────────────────────────── +# stream_text() iteration errors are swallowed by an inner handler (structured-output +# agents can't stream deltas); a mid-stream provider failure re-raises from +# result.get_output() and still hits the OUTER except arms. Put the new arm on the +# OUTER try only — do not touch the inner handler. + +# ── GOTCHA: forecastlab_exception_handler DROPS exc.details from the response ───────── +# app/core/exceptions.py:279-288 logs details but problem_response never receives them. +# That is BY DESIGN (details may carry internals). Do NOT stuff failures into details — +# add the parallel `extensions` channel (default None ⇒ zero behavior change for every +# existing raiser) and pass it through explicitly. + +# ── GOTCHA: merge extensions on the SERIALIZED dict, not via ProblemDetail(**ext) ────── +# ProblemDetail has extra="allow", but unpacking arbitrary **dict[str, Any] into a +# pydantic-plugin-checked constructor risks mypy/pyright --strict errors. problem_response +# already does problem.model_dump(exclude_none=True) — update THAT dict, guarded by a +# reserved-key frozenset {type,title,status,detail,instance,errors,code,request_id}. + +# ── GOTCHA: error_type strings are load-bearing test/UI contracts ───────────────────── +# test_service.py:477 asserts the literal "model_behavior_error". The new literal is +# "fallback_exhausted" — used in service.py, asserted in tests, documented in +# API_CONTRACTS.md. Pick once; never rename casually. + +# ── GOTCHA: StreamEvent.data must stay JSON-serializable ───────────────────────────── +# websocket.py sends event.model_dump(mode="json"). 
Put PLAIN DICTS in data: +# failures=[f.model_dump(mode="json") for f in details] — not BaseModel instances. + +# ── GOTCHA: .env bleed + settings singleton (only if a test touches Settings) ───────── +# Service tests monkeypatch service.settings fields (see test_service.py:443) — that +# pattern self-restores. If any new test constructs Settings(...), pass _env_file=None +# (RUNBOOKS incident class). + +# ── GOTCHA: Level-3 mutates the operator's persisted config — snapshot/restore ──────── +# PATCH /config/ai persists to app_config AND applies live (reset_agent_caches, +# config/service.py:214-216). The local operator override is agent_default_model= +# ollama:gemma4-agent — GET /config/ai first, restore the exact values after the curl +# matrix (E1 session precedent). + +# ── GOTCHA: repo has mixed CRLF/LF line endings ─────────────────────────────────────── +# Check `git diff --stat` after editing: if a file shows ~all lines changed, your editor +# rewrote line endings — re-edit preserving the file's existing endings. 
+``` + +## Implementation Blueprint + +### Data models and structure + +```python +# app/features/agents/schemas.py — new model + additive ErrorEvent field + +FailureReason = Literal[ + "model_not_found", "quota_exhausted", "auth_error", + "provider_unavailable", "provider_error", "response_rejected", "unknown", +] + +class ModelFailureDetail(BaseModel): + """One classified per-model failure from a FallbackModel chain (issue #335).""" + model_name: str + status_code: int | None = None + reason: FailureReason + detail: str = "" # sanitized + truncated provider message — NEVER raw body + +class ErrorEvent(BaseModel): + error: str + error_type: str + recoverable: bool = True + failures: list[ModelFailureDetail] | None = None # additive (issue #335) +``` + +```python +# app/features/agents/failures.py — NEW module (pure functions, fully unit-testable) + +_SECRET_PATTERNS = ( + re.compile(r"AIza[0-9A-Za-z_\-]{10,}"), # Google API keys + re.compile(r"sk-[A-Za-z0-9_\-]{10,}"), # OpenAI/Anthropic-style keys + re.compile(r"(?i)bearer\s+[A-Za-z0-9._\-]+"), # Authorization bearer tokens + re.compile(r"(?i)(api[_-]?key|token|authorization)[=:]\s*\S+"), +) +_MAX_DETAIL_LEN = 300 + +def _sanitize(text: str) -> str: + # sub each pattern with "[redacted]", then truncate to _MAX_DETAIL_LEN + +def _provider_message(body: object | None) -> str: + # dict with Google/OpenAI shape → body["error"]["message"]; str → as-is; + # anything else → str(body) or "" — ALWAYS through _sanitize at the call site + +def classify_model_failures(exc: BaseException) -> list[ModelFailureDetail]: + # ExceptionGroup (incl. 
FallbackExceptionGroup): recurse over exc.exceptions (tuple) + # ModelHTTPError: status map per the reason table; detail=_sanitize(_provider_message(body)) + # ResponseRejected: reason="response_rejected", model_name placeholder + # other ModelAPIError: reason="provider_error", status None, detail=_sanitize(str(exc)) + # fallback: reason="unknown", detail=_sanitize(str(exc)) + +def summarize_model_failures(failures: list[ModelFailureDetail]) -> str: + # deterministic (tests match substrings): + # 1 failure → "The configured agent model failed — {leg}" + # n failures → "All configured agent models failed — {leg}; {leg}; …" + # leg = "{model_name}: {human label} (HTTP {status_code})" (omit HTTP part when None) + # labels: model_not_found→"model not found / invalid model name", + # quota_exhausted→"quota or rate limit exhausted", + # auth_error→"authentication/permission error", + # provider_unavailable→"provider unavailable", + # provider_error→"provider error", response_rejected→"response rejected", + # unknown→"unexpected failure" +``` + +```python +# app/core/problem_details.py — additive +AGENT_FALLBACK_EXHAUSTED_CODE = "AGENT_FALLBACK_EXHAUSTED" # next to EMBEDDING_AUTH_CODE +ERROR_TYPES[AGENT_FALLBACK_EXHAUSTED_CODE] = f"{ERROR_TYPE_BASE}/agent-fallback-exhausted" + +_RESERVED_PROBLEM_KEYS = frozenset( + {"type", "title", "status", "detail", "instance", "errors", "code", "request_id"} +) + +def problem_response(..., extensions: dict[str, Any] | None = None) -> ProblemDetailResponse: + content = problem.model_dump(exclude_none=True) + if extensions: + content.update({k: v for k, v in extensions.items() if k not in _RESERVED_PROBLEM_KEYS}) + return ProblemDetailResponse(status_code=status, content=content) +``` + +```python +# app/core/exceptions.py — additive +class ForecastLabError(Exception): + def __init__(self, message, code=..., status_code=..., details=None, + extensions: dict[str, Any] | None = None) -> None: + ... 
+ self.extensions = extensions or {} # RESPONSE-VISIBLE (details stays log-only) + +# handler: problem_response(..., extensions=exc.extensions or None) + +class AgentFallbackExhaustedError(ForecastLabError): + """502 — every model in the agent's fallback chain failed (issue #335). + + Mirrors EmbeddingProviderAuthError: machine-readable code so clients can + classify; carries the per-model failures as an RFC 7807 extension member. + """ + error_type_uri = ERROR_TYPES[AGENT_FALLBACK_EXHAUSTED_CODE] + def __init__(self, message: str, failures: list[dict[str, Any]]) -> None: + super().__init__(message=message, code=AGENT_FALLBACK_EXHAUSTED_CODE, + status_code=502, extensions={"failures": failures}) +``` + +### Tasks (in order) + +```yaml +Task 1: +MODIFY app/features/agents/schemas.py: + - ADD FailureReason Literal alias + ModelFailureDetail near ErrorEvent (line ~304) + - ADD `failures: list[ModelFailureDetail] | None = None` to ErrorEvent + - PRESERVE every existing field and Literal value on StreamEvent/ErrorEvent + +Task 2: +CREATE app/features/agents/failures.py: + - Pure module: _SECRET_PATTERNS, _sanitize, _provider_message, + classify_model_failures, summarize_model_failures per blueprint + - Imports: pydantic_ai.exceptions (ModelAPIError, ModelHTTPError), + pydantic_ai.models.fallback (ResponseRejected), app.features.agents.schemas + - Recursion guard: ExceptionGroup members may nest — recurse; classify leaves only + +Task 3: +MODIFY app/core/problem_details.py: + - ADD AGENT_FALLBACK_EXHAUSTED_CODE constant next to EMBEDDING_AUTH_CODE (line 30) + - ADD ERROR_TYPES entry "/errors/agent-fallback-exhausted" + - ADD optional `extensions` param to problem_response; merge on the serialized dict + guarded by _RESERVED_PROBLEM_KEYS (see gotcha — do NOT ProblemDetail(**extensions)) + - PRESERVE the no-extensions output byte-for-byte (default None) + +Task 4: +MODIFY app/core/exceptions.py: + - ADD optional `extensions` kwarg on ForecastLabError.__init__ (stored 
attribute) + - ADD AgentFallbackExhaustedError mirroring EmbeddingProviderAuthError (lines 227-254) + - MODIFY forecastlab_exception_handler: pass extensions=exc.extensions or None + - PRESERVE: details stays log-only; every existing subclass signature unchanged + +Task 5: +MODIFY app/features/agents/service.py: + - EXTEND import line 25: from pydantic_ai.exceptions import ( + FallbackExceptionGroup, ModelAPIError, UnexpectedModelBehavior) + - ADD import: classify_model_failures, summarize_model_failures from .failures; + AgentFallbackExhaustedError from app.core.exceptions + - chat(): NEW arm between TimeoutError (309) and UnexpectedModelBehavior (313): + except (FallbackExceptionGroup, ModelAPIError) as e: + failures = classify_model_failures(e) + logger.warning("agents.chat_fallback_exhausted", session_id=session_id, + failure_count=len(failures), + reasons=[f.reason for f in failures]) # safe fields only + raise AgentFallbackExhaustedError( + summarize_model_failures(failures), + failures=[f.model_dump(mode="json") for f in failures]) from e + - stream_chat(): NEW arm between TimeoutError (693) and UnexpectedModelBehavior (697), + mirroring the misbehavior tail at 759-770: + except (FallbackExceptionGroup, ModelAPIError) as e: + failures = classify_model_failures(e) + logger.warning("agents.stream_chat_fallback_exhausted", ...) 
# same safe fields + now = datetime.now(UTC); session.last_activity = now; await db.flush() + yield StreamEvent(event_type="error", data={ + "error": summarize_model_failures(failures), + "error_type": "fallback_exhausted", + "recoverable": True, + "failures": [f.model_dump(mode="json") for f in failures], + }, timestamp=now) + return + - PRESERVE: no _salvage_* calls in the new arms; misbehavior arms byte-identical + +Task 6: +CREATE app/features/agents/tests/test_failures.py: + - Classification matrix: parametrize ModelHTTPError statuses + (404→model_not_found, 429→quota_exhausted, 401/403→auth_error, + 500/503→provider_unavailable, 418→provider_error) + - Group of (404 + 429) → 2 details preserving model_name order + - Nested group (group inside group) → flattened leaves + - Bare ModelAPIError (construct a minimal subclass or ModelHTTPError-free instance) + → provider_error, status None + - ResponseRejected member → response_rejected + - Unknown exception → unknown + - Secret scrub: body={"error": {"message": "key AIzaFakeKey1234567890abcdef leaked"}} + → "[redacted]" in detail, "AIza" not in detail; same for "sk-fake…" and "Bearer x.y.z" + - Truncation: 1000-char provider message → len(detail) <= 300 + - summarize_model_failures: exact-substring asserts for 1-leg and 2-leg shapes + +Task 7: +EXTEND app/features/agents/tests/test_service.py: + - TestAgentServiceStreamChat.test_stream_chat_fallback_exhausted_yields_classified_error: + MIRROR test_stream_chat_model_misbehavior_yields_error_event (426-480) exactly, + but _RaisingStream.__aenter__ raises FallbackExceptionGroup( + "All models from FallbackModel failed", + [ModelHTTPError(404, "google-gla:gemini-3-flash-preview", + body={"error": {"message": "models/... is not found"}}), + ModelHTTPError(429, "gemini-2.5-flash", + body={"error": {"message": "RESOURCE_EXHAUSTED ... 
AIzaFakeKey123456789"}})]) + ASSERT: len(events)==1; event_type=="error"; data["error_type"]=="fallback_exhausted"; + data["recoverable"] is True; len(data["failures"])==2; + failures[0]["reason"]=="model_not_found"; failures[1]["reason"]=="quota_exhausted"; + "sub-exceptions" not in data["error"]; + "AIza" not in json.dumps(events[0].model_dump(mode="json")) + - TestAgentServiceStreamChat.test_stream_chat_bare_model_api_error_classified: + same harness, __aenter__ raises ModelHTTPError(401, "anthropic:claude-test") → + 1 error event, failures==1, reason=="auth_error" + - TestAgentServiceChat.test_chat_fallback_exhausted_raises_classified_error: + MIRROR the chat misbehavior test harness; agent.run = AsyncMock(side_effect=); + pytest.raises(AgentFallbackExhaustedError) → exc.status_code==502, + exc.code=="AGENT_FALLBACK_EXHAUSTED", len(exc.extensions["failures"])==2 + +Task 8: +EXTEND app/features/agents/tests/test_routes.py (integration): + - test_chat_fallback_exhausted_returns_502_problem_json: + create session (patched agent factory, existing pattern), then patch the service + agent so run raises the 404+429 group; POST /agents/sessions/{id}/chat → + ASSERT status 502; headers content-type startswith "application/problem+json"; + body["code"]=="AGENT_FALLBACK_EXHAUSTED"; + body["type"].endswith("/errors/agent-fallback-exhausted"); + len(body["failures"])==2 with both reasons; "request_id" present + +Task 9: +CREATE app/core/tests/test_problem_details.py: + - test_problem_response_without_extensions_unchanged: no extensions → body has no + "failures" key; code/type/status as before + - test_problem_response_merges_extensions: extensions={"failures":[{"a":1}]} → in body + - test_problem_response_extensions_cannot_override_reserved: + extensions={"status": 200, "code": "HACK"} → body keeps the real status/code + +Task 10 (docs, same PR): +EXTEND docs/_base/API_CONTRACTS.md: + - WS `/agents/stream` error bullet: document `error_type="fallback_exhausted"` and the + 
additive Optional `failures: [{model_name, status_code, reason, detail}]` data key + - agents chat row: note the 502 AGENT_FALLBACK_EXHAUSTED problem+json (additive) +``` + +### Integration Points + +```yaml +DATABASE: none — nothing persisted changes; no migration +ROUTES: none — REST surface comes via the global ForecastLabError handler (502) +WEBSOCKET: service-level yield only; websocket.py generic handler untouched (backstop) +CONFIG: none — no new settings; no change to agent_require_approval (no new mutation surface) +FRONTEND: none — chat.tsx renders the summary string as-is; failures key is additive +DOCS: docs/_base/API_CONTRACTS.md (Task 10) +``` + +## Validation Loop + +### Level 1: Syntax & Style + +```bash +uv run ruff check app/features/agents/ app/core/ && uv run ruff format --check . +uv run mypy app/ && uv run pyright app/ # both --strict; zero new errors +``` + +### Level 2: Unit tests (no DB) + +```bash +uv run pytest -v \ + app/features/agents/tests/test_failures.py \ + app/features/agents/tests/test_service.py \ + app/core/tests/test_problem_details.py +# Full unit gate — proves misbehavior/salvage paths and every other consumer untouched: +uv run pytest -v -m "not integration" +``` + +### Level 3: Integration (live API; snapshot config FIRST — see gotcha) + +```bash +docker compose up -d +uv run pytest -v -m integration app/features/agents/tests/test_routes.py + +# Live REST leg (fresh uvicorn; snapshot + restore the operator's persisted config!): +curl -s http://localhost:8123/config/ai # SNAPSHOT current model ids +curl -si -X PATCH http://localhost:8123/config/ai -H 'Content-Type: application/json' \ + -d '{"agent_default_model":"openai:gpt-nonexistent-e2","agent_fallback_model":"openai:gpt-also-nonexistent"}' +SID=$(curl -s -X POST http://localhost:8123/agents/sessions \ + -H 'Content-Type: application/json' -d '{"agent_type":"experiment"}' | python3 -c 'import sys,json;print(json.load(sys.stdin)["session_id"])') +curl -si -X POST 
http://localhost:8123/agents/sessions/$SID/chat \ + -H 'Content-Type: application/json' -d '{"message":"hello"}' | head -30 +# expect: HTTP/1.1 502, application/problem+json, code AGENT_FALLBACK_EXHAUSTED, +# failures[] with reason "model_not_found" on both legs +curl -si -X PATCH http://localhost:8123/config/ai -H 'Content-Type: application/json' \ + -d '{"agent_default_model":"","agent_fallback_model":""}' # RESTORE: replace the empty strings with the model ids from the snapshot above +``` + +### Level 4 (optional dogfood): chat UI over WebSocket + +With the broken model pair patched in, open `/chat` (localhost:5173), send a message → +the transcript should show the classified two-leg summary (`model not found … (HTTP 404); …`) +instead of `Stream error: All models from FallbackModel failed`. Restore config after. + +## Final Validation Checklist + +- [ ] `uv run ruff check . && uv run ruff format --check .` clean +- [ ] `uv run mypy app/ && uv run pyright app/` clean (strict) +- [ ] `uv run pytest -v -m "not integration"` green — including the untouched + `model_behavior_error` and salvage tests +- [ ] New tests cover: status matrix, nested group, bare ModelAPIError, ResponseRejected, + secret scrub (AIza/sk-/Bearer), truncation, summary shapes, stream 404+429 event, + stream bare-401 event, chat raise, route 502 (both legs), extensions merge + guard +- [ ] `uv run pytest -v -m integration app/features/agents/tests/test_routes.py` green +- [ ] Level-3 curl matrix matches; operator config snapshot RESTORED and re-verified +- [ ] No secret-like string in any serialized payload (asserted on full JSON dumps) +- [ ] `git diff --stat` shows surgical diffs (no whole-file line-ending churn) +- [ ] Commits: `fix(agents,api): surface fallback model failures with classified details (#335)` + (+ `docs(docs): …` for API_CONTRACTS if split); no AI trailers +- [ ] PR into `dev` from `fix/agents-surface-fallback-failures`; CI green + +--- + +## Out of Scope (this PRP) + +- **Frontend failure-detail rendering** (chips/expandable list from the 
`failures` key) — + the summary string already lands in the transcript; promote to its own `feat(ui)` issue + if dogfood demands richer rendering. +- **Retry/circuit-breaker middleware or metrics** — explicitly rejected in umbrella #380 + (violates the no-external-observability / single-host principle). +- **Classifying `UsageLimitExceeded` / `ConcurrencyLimitExceeded`** — pydantic-ai usage-cap + errors, not provider failures; today's behavior (generic backstop) stands. +- **Surfacing agent-BUILD failures** (missing API key → `ValueError` in + `build_agent_model_with_fallback`) — a config-time failure class, already log-visible; + separate concern from run-time provider failure. +- **E6 release-gate dogfood** — umbrella #380's own closing epic. + +## Anti-Patterns to Avoid + +- ❌ Don't put `str(exception)` or `exc.body` raw into any client payload — sanitize-then-truncate only. +- ❌ Don't stuff failures into `ForecastLabError.details` — the handler drops it by design; use `extensions`. +- ❌ Don't use `except*` — a plain `except FallbackExceptionGroup` keeps the single-yield contract simple. +- ❌ Don't touch `websocket.py` — the generic handler is the deliberate backstop. +- ❌ Don't salvage (`_salvage_*`) in the new arms — no model ran; there is nothing to salvage. +- ❌ Don't rename `model_behavior_error` or weaken its tests — extend alongside. +- ❌ Don't widen `agent_require_approval` or any mutation surface — this is read-path-only hardening. +- ❌ Don't forget to RESTORE the operator's persisted `ollama:gemma4-agent` override after Level 3. + +--- + +**One-pass confidence score: 8/10** — every catch point, schema, and precedent is +runtime-verified with exact line anchors, and the classifier is a pure module with a mirrored +test harness. 
Deductions: the stream-test async-CM mocking is fiddly (mitigated by mirroring +test_service.py:426-480 verbatim), and the `extensions` merge must dodge the +pydantic-plugin/strict-mypy trap (mitigated by the serialized-dict merge decision). diff --git a/app/core/exceptions.py b/app/core/exceptions.py index 1e6279ea..d67d9a0b 100644 --- a/app/core/exceptions.py +++ b/app/core/exceptions.py @@ -10,6 +10,7 @@ from app.core.logging import get_logger from app.core.problem_details import ( + AGENT_FALLBACK_EXHAUSTED_CODE, EMBEDDING_AUTH_CODE, ERROR_TYPES, ProblemDetailResponse, @@ -40,6 +41,7 @@ def __init__( code: str = "INTERNAL_ERROR", status_code: int = 500, details: dict[str, Any] | None = None, + extensions: dict[str, Any] | None = None, ) -> None: """Initialize application error. @@ -47,13 +49,19 @@ def __init__( message: Human-readable error message. code: Machine-readable error code. status_code: HTTP status code. - details: Additional error context. + details: Additional error context. LOG-ONLY — the exception + handler never copies it into the response body (it may carry + internals). + extensions: RFC 7807 extension members the handler DOES merge + into the problem+json response body (#335). Only put + client-safe, already-sanitized data here. """ super().__init__(message) self.message = message self.code = code self.status_code = status_code self.details = details or {} + self.extensions = extensions or {} @property def title(self) -> str: @@ -254,6 +262,41 @@ def __init__( ) +class AgentFallbackExhaustedError(ForecastLabError): + """502 — every model in the agent's fallback chain failed (issue #335). + + Raised when the PydanticAI ``FallbackModel`` chain (or a single configured + model) fails with provider-API errors on every leg. 
Mirrors + :class:`EmbeddingProviderAuthError`: keeps the public status at 502 (an + upstream failure from the caller's perspective) and emits a + *machine-readable* ``AGENT_FALLBACK_EXHAUSTED`` problem ``type``/``code`` + so clients can classify it. The per-model classified failures ride the + response-visible ``extensions`` channel as a ``failures`` member — + ``details`` stays log-only by design. + """ + + error_type_uri: str = ERROR_TYPES[AGENT_FALLBACK_EXHAUSTED_CODE] + + def __init__( + self, + message: str, + failures: list[dict[str, Any]], + ) -> None: + """Initialize with the human summary and classified per-model legs. + + Args: + message: Human-actionable summary (already secret-safe). + failures: Serialized ``ModelFailureDetail`` dicts — sanitized + upstream by the classifier; surfaced verbatim to the client. + """ + super().__init__( + message=message, + code=AGENT_FALLBACK_EXHAUSTED_CODE, + status_code=502, + extensions={"failures": failures}, + ) + + # ============================================================================= # Exception Handlers (RFC 7807) # ============================================================================= @@ -287,6 +330,7 @@ async def forecastlab_exception_handler( title=exc.title, detail=exc.message, error_code=exc.code, + extensions=exc.extensions or None, ) diff --git a/app/core/problem_details.py b/app/core/problem_details.py index f8bba455..1078a1b9 100644 --- a/app/core/problem_details.py +++ b/app/core/problem_details.py @@ -29,6 +29,11 @@ # demo pipeline's classifier) so the marker never drifts between the two. EMBEDDING_AUTH_CODE = "EMBEDDING_AUTH" +# Machine-readable code for an exhausted agent model fallback chain (#335). +# Single source of truth shared by the producer (AgentFallbackExhaustedError) +# and any consumer classifying the 502 — mirrors EMBEDDING_AUTH_CODE. 
+AGENT_FALLBACK_EXHAUSTED_CODE = "AGENT_FALLBACK_EXHAUSTED" + ERROR_TYPES = { "NOT_FOUND": f"{ERROR_TYPE_BASE}/not-found", "VALIDATION_ERROR": f"{ERROR_TYPE_BASE}/validation", @@ -43,8 +48,16 @@ "SERVICE_UNAVAILABLE": f"{ERROR_TYPE_BASE}/service-unavailable", "GATEWAY_TIMEOUT": f"{ERROR_TYPE_BASE}/gateway-timeout", EMBEDDING_AUTH_CODE: f"{ERROR_TYPE_BASE}/embedding-auth", + AGENT_FALLBACK_EXHAUSTED_CODE: f"{ERROR_TYPE_BASE}/agent-fallback-exhausted", } +# RFC 7807 extension members may never shadow the spec/base fields the +# ProblemDetail schema already owns — reserved keys are dropped from any +# `extensions` merge in problem_response (#335). +_RESERVED_PROBLEM_KEYS = frozenset( + {"type", "title", "status", "detail", "instance", "errors", "code", "request_id"} +) + # ============================================================================= # Problem Detail Schema @@ -172,6 +185,7 @@ def problem_response( detail: str | None = None, error_code: str = "INTERNAL_ERROR", errors: list[dict[str, Any]] | None = None, + extensions: dict[str, Any] | None = None, ) -> ProblemDetailResponse: """Create a ProblemDetailResponse with proper content type. @@ -181,6 +195,9 @@ def problem_response( detail: Detailed explanation (optional). error_code: Internal error code for type URI lookup. errors: Field-level validation errors (optional). + extensions: Additional RFC 7807 extension members merged into the + response body (optional, #335). Reserved base-field keys are + silently dropped — extensions can never shadow type/status/etc. Returns: JSONResponse with problem+json content type. """ @@ -192,9 +209,17 @@ def problem_response( errors=errors, ) + # Merge on the serialized dict (not ProblemDetail(**extensions)) so + # arbitrary extension payloads never fight the pydantic constructor. 
+ content = problem.model_dump(exclude_none=True) + if extensions: + content.update( + {key: value for key, value in extensions.items() if key not in _RESERVED_PROBLEM_KEYS} + ) + return ProblemDetailResponse( status_code=status, - content=problem.model_dump(exclude_none=True), + content=content, ) diff --git a/app/core/tests/test_problem_details.py b/app/core/tests/test_problem_details.py new file mode 100644 index 00000000..9db1673d --- /dev/null +++ b/app/core/tests/test_problem_details.py @@ -0,0 +1,110 @@ +"""Unit tests for RFC 7807 problem_response extension members (issue #335). + +The `extensions` channel lets a ForecastLabError surface client-safe data +(e.g. classified per-model failures) in the problem+json body without going +through the log-only `details` attribute. +""" + +import json +from typing import Any + +import pytest +from fastapi import Request + +from app.core.exceptions import ( + AgentFallbackExhaustedError, + forecastlab_exception_handler, +) +from app.core.problem_details import problem_response + + +def _body(response: Any) -> dict[str, Any]: + """Decode a ProblemDetailResponse body.""" + decoded: dict[str, Any] = json.loads(response.body) + return decoded + + +def test_problem_response_without_extensions_unchanged() -> None: + """Default (no extensions) output keeps the existing shape exactly.""" + response = problem_response( + status=404, + title="Not Found", + detail="Resource not found", + error_code="NOT_FOUND", + ) + + body = _body(response) + assert response.status_code == 404 + assert body["status"] == 404 + assert body["code"] == "NOT_FOUND" + assert body["type"] == "/errors/not-found" + assert "failures" not in body + + +def test_problem_response_merges_extensions() -> None: + """Extension members are merged into the serialized body.""" + response = problem_response( + status=502, + title="Agent Fallback Exhausted", + detail="All configured agent models failed", + error_code="AGENT_FALLBACK_EXHAUSTED", + 
extensions={"failures": [{"model_name": "m1", "reason": "model_not_found"}]}, + ) + + body = _body(response) + assert body["code"] == "AGENT_FALLBACK_EXHAUSTED" + assert body["type"] == "/errors/agent-fallback-exhausted" + assert body["failures"] == [{"model_name": "m1", "reason": "model_not_found"}] + + +def test_problem_response_extensions_cannot_override_reserved() -> None: + """Reserved base-field keys in extensions are silently dropped.""" + response = problem_response( + status=502, + title="Agent Fallback Exhausted", + detail="real detail", + error_code="AGENT_FALLBACK_EXHAUSTED", + extensions={ + "status": 200, + "code": "HACK", + "detail": "spoofed", + "type": "about:blank", + "title": "spoofed", + "safe_key": "kept", + }, + ) + + body = _body(response) + assert response.status_code == 502 + assert body["status"] == 502 + assert body["code"] == "AGENT_FALLBACK_EXHAUSTED" + assert body["detail"] == "real detail" + assert body["type"] == "/errors/agent-fallback-exhausted" + assert body["title"] == "Agent Fallback Exhausted" + assert body["safe_key"] == "kept" + + +@pytest.mark.asyncio +async def test_exception_handler_propagates_extensions() -> None: + """The full exception → handler → problem+json path carries extensions. + + Guards the wiring: ForecastLabError.extensions must reach the response + body via forecastlab_exception_handler's pass-through (issue #335). 
+ """ + failures = [ + {"model_name": "m1", "status_code": 404, "reason": "model_not_found", "detail": ""}, + {"model_name": "m2", "status_code": 429, "reason": "quota_exhausted", "detail": ""}, + ] + exc = AgentFallbackExhaustedError("All configured agent models failed", failures=failures) + request = Request(scope={"type": "http", "method": "POST", "path": "/", "headers": []}) + + response = await forecastlab_exception_handler(request, exc) + + body = _body(response) + assert response.status_code == 502 + assert body["status"] == 502 + assert body["code"] == "AGENT_FALLBACK_EXHAUSTED" + assert body["type"] == "/errors/agent-fallback-exhausted" + assert body["title"] == "Agent Fallback Exhausted" + assert body["detail"] == "All configured agent models failed" + assert body["failures"] == failures diff --git a/app/features/agents/failures.py b/app/features/agents/failures.py new file mode 100644 index 00000000..57b56803 --- /dev/null +++ b/app/features/agents/failures.py @@ -0,0 +1,156 @@ +"""Classify provider-API model failures into secret-safe, actionable details. + +When every model in the PydanticAI ``FallbackModel`` chain fails (or a single +configured model fails with a provider error), the raw exception surface is an +opaque one-liner (``All models from FallbackModel failed (2 sub-exceptions)``) +and ``str(ModelHTTPError)`` embeds the provider response body verbatim — a +secret-leak risk. This module turns that exception tree into a list of +:class:`ModelFailureDetail` entries plus a deterministic human summary that the +chat UI renders as-is (issue #335). + +Pure functions only — fully unit-testable without a DB or network. +""" + +from __future__ import annotations + +import re + +from pydantic_ai.exceptions import ModelAPIError, ModelHTTPError +from pydantic_ai.models.fallback import ResponseRejected + +from app.features.agents.schemas import FailureReason, ModelFailureDetail + +# Secret-shaped substrings scrubbed from any surfaced provider message. 
+# Issue #335 hard constraint: no API keys / Bearer tokens, ever. +_SECRET_PATTERNS: tuple[re.Pattern[str], ...] = ( + re.compile(r"AIza[0-9A-Za-z_\-]{10,}"), # Google API keys + re.compile(r"sk-[A-Za-z0-9_\-]{10,}"), # OpenAI/Anthropic-style keys + re.compile(r"(?i)bearer\s+[A-Za-z0-9._\-]+"), # Authorization bearer tokens + re.compile(r"(?i)(api[_-]?key|token|authorization)[=:]\s*\S+"), +) + +# Cap on the surfaced per-model detail string. +_MAX_DETAIL_LEN = 300 + +# Placeholder model names for failures that carry none. +_RESPONSE_REJECTED_MODEL = "(response rejected)" +_UNKNOWN_MODEL = "(unknown model)" + +# Human labels for the summary string (rendered verbatim by the chat UI). +_REASON_LABELS: dict[FailureReason, str] = { + "model_not_found": "model not found / invalid model name", + "quota_exhausted": "quota or rate limit exhausted", + "auth_error": "authentication/permission error", + "provider_unavailable": "provider unavailable", + "provider_error": "provider error", + "response_rejected": "response rejected", + "unknown": "unexpected failure", +} + + +def _sanitize(text: str) -> str: + """Scrub secret-shaped substrings, then truncate to the detail cap.""" + for pattern in _SECRET_PATTERNS: + text = pattern.sub("[redacted]", text) + return text[:_MAX_DETAIL_LEN] + + +def _provider_message(body: object | None) -> str: + """Extract the provider's human message from an HTTP error body. + + Handles the Google/OpenAI ``{"error": {"message": ...}}`` shape; a plain + string passes through; anything else is stringified. Callers MUST pass the + result through :func:`_sanitize` before surfacing it. 
+ """ + if body is None: + return "" + if isinstance(body, dict): + error = body.get("error") + if isinstance(error, dict): + message = error.get("message") + if isinstance(message, str): + return message + if isinstance(body, str): + return body + return str(body) + + +def _classify_http_status(status_code: int) -> FailureReason: + """Map an HTTP status to the issue #335 reason taxonomy.""" + if status_code == 404: + return "model_not_found" + if status_code == 429: + return "quota_exhausted" + if status_code in (401, 403): + return "auth_error" + if status_code >= 500: + return "provider_unavailable" + return "provider_error" + + +def classify_model_failures(exc: BaseException) -> list[ModelFailureDetail]: + """Flatten an exception (group) into classified per-model failures. + + ``FallbackExceptionGroup.exceptions`` is a tuple and sub-groups can nest — + recurse into groups and classify only the leaves, preserving leg order. + """ + if isinstance(exc, BaseExceptionGroup): + details: list[ModelFailureDetail] = [] + for member in exc.exceptions: + details.extend(classify_model_failures(member)) + return details + if isinstance(exc, ModelHTTPError): + return [ + ModelFailureDetail( + model_name=exc.model_name, + status_code=exc.status_code, + reason=_classify_http_status(exc.status_code), + detail=_sanitize(_provider_message(exc.body)), + ) + ] + if isinstance(exc, ResponseRejected): + return [ + ModelFailureDetail( + model_name=_RESPONSE_REJECTED_MODEL, + status_code=None, + reason="response_rejected", + detail=_sanitize(str(exc)), + ) + ] + if isinstance(exc, ModelAPIError): + return [ + ModelFailureDetail( + model_name=exc.model_name, + status_code=None, + reason="provider_error", + detail=_sanitize(str(exc)), + ) + ] + return [ + ModelFailureDetail( + model_name=_UNKNOWN_MODEL, + status_code=None, + reason="unknown", + detail=_sanitize(str(exc)), + ) + ] + + +def summarize_model_failures(failures: list[ModelFailureDetail]) -> str: + """Build the deterministic 
human summary the chat UI renders verbatim. + + One failure → ``The configured agent model failed — {leg}``; several → + ``All configured agent models failed — {leg}; {leg}; …`` where each leg is + ``{model_name}: {label} (HTTP {status_code})`` (HTTP part omitted when the + failure was not HTTP-shaped). + """ + legs: list[str] = [] + for failure in failures: + leg = f"{failure.model_name}: {_REASON_LABELS[failure.reason]}" + if failure.status_code is not None: + leg = f"{leg} (HTTP {failure.status_code})" + legs.append(leg) + joined = "; ".join(legs) + if len(failures) == 1: + return f"The configured agent model failed — {joined}" + return f"All configured agent models failed — {joined}" diff --git a/app/features/agents/schemas.py b/app/features/agents/schemas.py index 69b74261..f6f02724 100644 --- a/app/features/agents/schemas.py +++ b/app/features/agents/schemas.py @@ -301,6 +301,33 @@ class CompleteEvent(BaseModel): tool_calls_count: int +FailureReason = Literal[ + "model_not_found", + "quota_exhausted", + "auth_error", + "provider_unavailable", + "provider_error", + "response_rejected", + "unknown", +] + + +class ModelFailureDetail(BaseModel): + """One classified per-model failure from a FallbackModel chain (issue #335). + + Args: + model_name: Provider-prefixed model identifier that failed. + status_code: HTTP status from the provider, when the failure was HTTP. + reason: Machine-readable failure classification. + detail: Sanitized + truncated provider message — NEVER the raw body. + """ + + model_name: str + status_code: int | None = None + reason: FailureReason + detail: str = "" + + class ErrorEvent(BaseModel): """Error event. @@ -308,11 +335,14 @@ class ErrorEvent(BaseModel): error: Error message. error_type: Type of error. recoverable: Whether the session can continue. + failures: Classified per-model failures when ``error_type`` is + ``fallback_exhausted`` (issue #335); ``None`` otherwise. 
""" error: str error_type: str recoverable: bool = True + failures: list[ModelFailureDetail] | None = None # ============================================================================= diff --git a/app/features/agents/service.py b/app/features/agents/service.py index 6372fd9c..ba865e9f 100644 --- a/app/features/agents/service.py +++ b/app/features/agents/service.py @@ -22,13 +22,15 @@ import structlog from pydantic_ai import Agent, capture_run_messages -from pydantic_ai.exceptions import UnexpectedModelBehavior +from pydantic_ai.exceptions import FallbackExceptionGroup, ModelAPIError, UnexpectedModelBehavior from pydantic_ai.messages import ModelMessage, ModelMessagesTypeAdapter, ToolReturnPart from sqlalchemy import select from sqlalchemy.ext.asyncio import AsyncSession from app.core.config import get_settings +from app.core.exceptions import AgentFallbackExhaustedError from app.features.agents.deps import AgentDeps +from app.features.agents.failures import classify_model_failures, summarize_model_failures from app.features.agents.models import AgentSession, AgentType, SessionStatus from app.features.agents.schemas import ( ApprovalResponse, @@ -310,6 +312,23 @@ async def chat( raise TimeoutError( f"Agent response timed out after {self.settings.agent_timeout_seconds} seconds" ) from e + except (FallbackExceptionGroup, ModelAPIError) as e: + # Every model in the fallback chain failed (or the single + # configured model failed) with a provider-API error before any + # output was produced — nothing to salvage. Classify each leg into + # a secret-safe detail and surface the summary as a 502 + # problem+json via the global handler (issue #335). 
+ failures = classify_model_failures(e) + logger.warning( + "agents.chat_fallback_exhausted", + session_id=session_id, + failure_count=len(failures), + reasons=[f.reason for f in failures], + ) + raise AgentFallbackExhaustedError( + summarize_model_failures(failures), + failures=[f.model_dump(mode="json") for f in failures], + ) from e except UnexpectedModelBehavior as e: # The model misbehaved (e.g. a tool call exceeded its retry budget). # This is recoverable from the user's perspective — surface a clean @@ -694,6 +713,33 @@ async def stream_chat( raise TimeoutError( f"Agent response timed out after {self.settings.agent_timeout_seconds} seconds" ) from e + except (FallbackExceptionGroup, ModelAPIError) as e: + # Every model in the fallback chain failed (or the single + # configured model failed) with a provider-API error before any + # output was produced — nothing to salvage. Yield ONE classified, + # secret-safe `error` event instead of letting the raw exception + # reach the generic WebSocket backstop (issue #335). + failures = classify_model_failures(e) + logger.warning( + "agents.stream_chat_fallback_exhausted", + session_id=session_id, + failure_count=len(failures), + reasons=[f.reason for f in failures], + ) + fallback_now = datetime.now(UTC) + session.last_activity = fallback_now + await db.flush() + yield StreamEvent( + event_type="error", + data={ + "error": summarize_model_failures(failures), + "error_type": "fallback_exhausted", + "recoverable": True, + "failures": [f.model_dump(mode="json") for f in failures], + }, + timestamp=fallback_now, + ) + return except UnexpectedModelBehavior as e: # The model misbehaved (e.g. a tool call exceeded its retry budget). 
# Emit a clean, recoverable `error` event rather than letting the raw diff --git a/app/features/agents/tests/test_failures.py b/app/features/agents/tests/test_failures.py new file mode 100644 index 00000000..cf5a218c --- /dev/null +++ b/app/features/agents/tests/test_failures.py @@ -0,0 +1,232 @@ +"""Unit tests for the model-failure classifier (issue #335). + +Covers the status-code classification matrix, exception-group recursion, +secret scrubbing, detail truncation, and the deterministic human summary. +""" + +import pytest +from pydantic_ai.exceptions import FallbackExceptionGroup, ModelAPIError, ModelHTTPError +from pydantic_ai.models.fallback import ResponseRejected + +from app.features.agents.failures import ( + classify_model_failures, + summarize_model_failures, +) +from app.features.agents.schemas import ModelFailureDetail + + +class TestClassifyModelFailures: + """Classification matrix for classify_model_failures.""" + + @pytest.mark.parametrize( + ("status_code", "expected_reason"), + [ + (404, "model_not_found"), + (429, "quota_exhausted"), + (401, "auth_error"), + (403, "auth_error"), + (500, "provider_unavailable"), + (503, "provider_unavailable"), + (418, "provider_error"), + ], + ) + def test_http_status_matrix(self, status_code: int, expected_reason: str) -> None: + """Each HTTP status maps to its issue #335 reason.""" + failures = classify_model_failures(ModelHTTPError(status_code, "test:model")) + + assert len(failures) == 1 + assert failures[0].model_name == "test:model" + assert failures[0].status_code == status_code + assert failures[0].reason == expected_reason + + def test_fallback_group_preserves_leg_order(self) -> None: + """A 404 + 429 group yields two details in model order.""" + group = FallbackExceptionGroup( + "All models from FallbackModel failed", + [ + ModelHTTPError(404, "google-gla:gemini-3-flash-preview"), + ModelHTTPError(429, "google-gla:gemini-2.5-flash"), + ], + ) + + failures = classify_model_failures(group) + + assert 
len(failures) == 2 + assert failures[0].model_name == "google-gla:gemini-3-flash-preview" + assert failures[0].reason == "model_not_found" + assert failures[1].model_name == "google-gla:gemini-2.5-flash" + assert failures[1].reason == "quota_exhausted" + + def test_nested_group_flattens_leaves(self) -> None: + """Sub-groups inside the group are recursed into, not classified as legs.""" + inner = FallbackExceptionGroup( + "inner", + [ModelHTTPError(429, "inner:model")], + ) + outer = FallbackExceptionGroup( + "outer", + [ModelHTTPError(404, "outer:model"), inner], + ) + + failures = classify_model_failures(outer) + + assert [f.model_name for f in failures] == ["outer:model", "inner:model"] + assert [f.reason for f in failures] == ["model_not_found", "quota_exhausted"] + + def test_mixed_group_classifies_unknown_members(self) -> None: + """A group mixing known and unexpected members flattens in order, + classifying the unexpected member as unknown.""" + group = FallbackExceptionGroup( + "All models from FallbackModel failed", + [ + ModelHTTPError(404, "google-gla:gemini-3-flash-preview"), + RuntimeError("boom"), + ], + ) + + failures = classify_model_failures(group) + + assert len(failures) == 2 + assert failures[0].model_name == "google-gla:gemini-3-flash-preview" + assert failures[0].reason == "model_not_found" + assert failures[1].reason == "unknown" + assert failures[1].status_code is None + assert "boom" in failures[1].detail + + def test_bare_model_api_error_is_provider_error(self) -> None: + """A non-HTTP ModelAPIError (connection failure) → provider_error, no status.""" + failures = classify_model_failures( + ModelAPIError("ollama:gemma4-agent", "connection refused") + ) + + assert len(failures) == 1 + assert failures[0].model_name == "ollama:gemma4-agent" + assert failures[0].status_code is None + assert failures[0].reason == "provider_error" + + def test_response_rejected_member(self) -> None: + """A ResponseRejected group member classifies as 
response_rejected.""" + group = FallbackExceptionGroup( + "All models from FallbackModel failed", + [ResponseRejected(2)], + ) + + failures = classify_model_failures(group) + + assert len(failures) == 1 + assert failures[0].reason == "response_rejected" + assert failures[0].status_code is None + + def test_unknown_exception_is_unknown(self) -> None: + """A bare, unrecognized exception (outside any group) classifies as unknown.""" + failures = classify_model_failures(RuntimeError("boom")) + + assert len(failures) == 1 + assert failures[0].reason == "unknown" + assert failures[0].status_code is None + assert "boom" in failures[0].detail + + @pytest.mark.parametrize( + "secret", + [ + "AIzaFakeKey1234567890abcdef", + "sk-fakekey1234567890abcdef", + "Bearer xyz.abc-123", + "api_key=supersecretvalue", + ], + ) + def test_secret_scrubbed_from_detail(self, secret: str) -> None: + """Secret-shaped substrings in the provider body never reach the detail.""" + exc = ModelHTTPError( + 429, + "test:model", + body={"error": {"message": f"quota exceeded for {secret} retry later"}}, + ) + + failures = classify_model_failures(exc) + + assert "[redacted]" in failures[0].detail + assert "AIzaFake" not in failures[0].detail + assert "sk-fake" not in failures[0].detail + assert "xyz.abc-123" not in failures[0].detail + assert "supersecretvalue" not in failures[0].detail + + def test_detail_truncated_to_cap(self) -> None: + """A 1000-char provider message is truncated to the 300-char cap.""" + exc = ModelHTTPError( + 500, + "test:model", + body={"error": {"message": "x" * 1000}}, + ) + + failures = classify_model_failures(exc) + + assert len(failures[0].detail) <= 300 + + def test_provider_message_string_body(self) -> None: + """A plain-string body passes through (sanitized).""" + failures = classify_model_failures(ModelHTTPError(404, "test:model", body="not found")) + + assert failures[0].detail == "not found" + + def test_provider_message_none_body(self) -> None: + """A missing body yields an empty 
detail.""" + failures = classify_model_failures(ModelHTTPError(404, "test:model")) + + assert failures[0].detail == "" + + +class TestSummarizeModelFailures: + """Deterministic summary shapes (rendered verbatim by the chat UI).""" + + def test_single_leg_shape(self) -> None: + failures = [ + ModelFailureDetail( + model_name="anthropic:claude-test", + status_code=401, + reason="auth_error", + ) + ] + + summary = summarize_model_failures(failures) + + assert summary == ( + "The configured agent model failed — " + "anthropic:claude-test: authentication/permission error (HTTP 401)" + ) + + def test_two_leg_shape(self) -> None: + failures = [ + ModelFailureDetail( + model_name="google-gla:gemini-3-flash-preview", + status_code=404, + reason="model_not_found", + ), + ModelFailureDetail( + model_name="google-gla:gemini-2.5-flash", + status_code=429, + reason="quota_exhausted", + ), + ] + + summary = summarize_model_failures(failures) + + assert summary == ( + "All configured agent models failed — " + "google-gla:gemini-3-flash-preview: model not found / invalid model name (HTTP 404); " + "google-gla:gemini-2.5-flash: quota or rate limit exhausted (HTTP 429)" + ) + + def test_non_http_leg_omits_status(self) -> None: + failures = [ + ModelFailureDetail( + model_name="ollama:gemma4-agent", + status_code=None, + reason="provider_error", + ) + ] + + summary = summarize_model_failures(failures) + + assert "(HTTP" not in summary + assert "ollama:gemma4-agent: provider error" in summary diff --git a/app/features/agents/tests/test_routes.py b/app/features/agents/tests/test_routes.py index d53bb914..12ff4711 100644 --- a/app/features/agents/tests/test_routes.py +++ b/app/features/agents/tests/test_routes.py @@ -7,6 +7,7 @@ import pytest from httpx import AsyncClient +from pydantic_ai.exceptions import FallbackExceptionGroup, ModelHTTPError from app.features.agents.schemas import ExperimentReport @@ -162,6 +163,56 @@ async def test_chat_session_not_found(self, client: AsyncClient) 
-> None: assert response.status_code == 404 + @pytest.mark.asyncio + async def test_chat_fallback_exhausted_returns_502_problem_json( + self, client: AsyncClient + ) -> None: + """Both fallback legs failing → 502 problem+json with classified + per-model failures (#335, umbrella #380 route-level criterion).""" + with patch("app.features.agents.agents.experiment.get_experiment_agent") as mock_get: + group = FallbackExceptionGroup( + "All models from FallbackModel failed", + [ + ModelHTTPError( + 404, + "google-gla:gemini-3-flash-preview", + body={"error": {"message": "models/gemini-3-flash-preview is not found"}}, + ), + ModelHTTPError( + 429, + "google-gla:gemini-2.5-flash", + body={"error": {"message": "RESOURCE_EXHAUSTED key AIzaFakeKey123456789"}}, + ), + ], + ) + mock_agent = MagicMock() + mock_agent.run = AsyncMock(side_effect=group) + mock_get.return_value = mock_agent + + create_response = await client.post( + "/agents/sessions", + json={"agent_type": "experiment"}, + ) + session_id = create_response.json()["session_id"] + + response = await client.post( + f"/agents/sessions/{session_id}/chat", + json={"message": "hello"}, + ) + + assert response.status_code == 502 + assert response.headers["content-type"].startswith("application/problem+json") + body = response.json() + assert body["code"] == "AGENT_FALLBACK_EXHAUSTED" + assert body["type"].endswith("/errors/agent-fallback-exhausted") + assert len(body["failures"]) == 2 + assert body["failures"][0]["reason"] == "model_not_found" + assert body["failures"][1]["reason"] == "quota_exhausted" + assert "request_id" in body + # The opaque group string and secrets must never reach the client. 
+        assert "sub-exceptions" not in body["detail"]
+        assert "AIza" not in response.text
+
 
 @pytest.mark.integration
 class TestApprovalRoutes:
diff --git a/app/features/agents/tests/test_schemas.py b/app/features/agents/tests/test_schemas.py
index 7294a294..9c50ea31 100644
--- a/app/features/agents/tests/test_schemas.py
+++ b/app/features/agents/tests/test_schemas.py
@@ -12,8 +12,10 @@
     ChatMessage,
     ChatRequest,
     ChatResponse,
+    ErrorEvent,
     ExperimentPlan,
     ExperimentReport,
+    ModelFailureDetail,
     PendingAction,
     RAGAnswer,
     SessionCreateRequest,
@@ -304,6 +306,38 @@ def test_error_event(self) -> None:
         assert event.event_type == "error"
 
 
+class TestErrorEvent:
+    """Tests for the ErrorEvent schema (failures added by issue #335)."""
+
+    def test_non_fallback_error_has_no_failures(self) -> None:
+        """Non-fallback error types must keep failures None in serialized output."""
+        event = ErrorEvent(
+            error="The assistant produced an invalid tool call.",
+            error_type="model_behavior_error",
+        )
+
+        serialized = event.model_dump(mode="json")
+        assert serialized.get("failures") is None
+
+    def test_fallback_exhausted_carries_failures(self) -> None:
+        """fallback_exhausted events carry the classified per-model failures."""
+        event = ErrorEvent(
+            error="All configured agent models failed",
+            error_type="fallback_exhausted",
+            failures=[
+                ModelFailureDetail(
+                    model_name="google-gla:gemini-3-flash-preview",
+                    status_code=404,
+                    reason="model_not_found",
+                )
+            ],
+        )
+
+        serialized = event.model_dump(mode="json")
+        assert serialized["failures"][0]["reason"] == "model_not_found"
+        assert serialized["failures"][0]["status_code"] == 404
+
+
 class TestExperimentPlan:
     """Tests for ExperimentPlan schema."""
 
diff --git a/app/features/agents/tests/test_service.py b/app/features/agents/tests/test_service.py
index 759e0284..612783cb 100644
--- a/app/features/agents/tests/test_service.py
+++ b/app/features/agents/tests/test_service.py
@@ -8,7 +8,11 @@
 
 import pytest
 from pydantic_ai import Agent
-from pydantic_ai.exceptions import UnexpectedModelBehavior
+from pydantic_ai.exceptions import (
+    FallbackExceptionGroup,
+    ModelHTTPError,
+    UnexpectedModelBehavior,
+)
 from pydantic_ai.messages import (
     ModelMessage,
     ModelRequest,
@@ -18,6 +22,7 @@
     UserPromptPart,
 )
 
+from app.core.exceptions import AgentFallbackExhaustedError
 from app.features.agents.deps import AgentDeps
 from app.features.agents.models import AgentSession, AgentType, SessionStatus
 from app.features.agents.schemas import ExperimentReport
@@ -422,6 +427,58 @@ async def test_chat_runs_tools_sequentially(
         mock_mode.assert_called_once_with("sequential")
 
+    @pytest.mark.asyncio
+    async def test_chat_fallback_exhausted_raises_classified_error(
+        self,
+        sample_active_session: AgentSession,
+    ) -> None:
+        """A FallbackExceptionGroup from agent.run must raise the classified
+        502 AgentFallbackExhaustedError, not bubble the raw group (#335)."""
+        service = AgentService()
+        mock_db = AsyncMock()
+
+        mock_result = MagicMock()
+        mock_result.scalar_one_or_none.return_value = sample_active_session
+        mock_db.execute.return_value = mock_result
+
+        group = FallbackExceptionGroup(
+            "All models from FallbackModel failed",
+            [
+                ModelHTTPError(
+                    404,
+                    "google-gla:gemini-3-flash-preview",
+                    body={"error": {"message": "models/gemini-3-flash-preview is not found"}},
+                ),
+                ModelHTTPError(
+                    429,
+                    "google-gla:gemini-2.5-flash",
+                    body={"error": {"message": "RESOURCE_EXHAUSTED key AIzaFakeKey123456789"}},
+                ),
+            ],
+        )
+        mock_agent = MagicMock()
+        mock_agent.run = AsyncMock(side_effect=group)
+
+        with patch.object(service, "_get_agent", return_value=mock_agent):
+            with pytest.raises(AgentFallbackExhaustedError) as exc_info:
+                await service.chat(
+                    db=mock_db,
+                    session_id=sample_active_session.session_id,
+                    message="Hello",
+                )
+
+        exc = exc_info.value
+        assert exc.status_code == 502
+        assert exc.code == "AGENT_FALLBACK_EXHAUSTED"
+        failures = exc.extensions["failures"]
+        assert len(failures) == 2
+        assert failures[0]["reason"] == "model_not_found"
+        assert failures[1]["reason"] == "quota_exhausted"
+        assert "sub-exceptions" not in exc.message
+        # Issue #335 hard constraint: no secret-like material anywhere.
+        serialized = json.dumps({"message": exc.message, "extensions": exc.extensions})
+        assert "AIza" not in serialized
+
 
 class TestAgentServiceStreamChat:
     """Tests for streaming chat functionality."""
@@ -477,6 +534,127 @@ async def __aexit__(self, *exc: object) -> bool:
         assert events[0].data["recoverable"] is True
         assert events[0].data["error_type"] == "model_behavior_error"
         assert "exceeded max retries" not in events[0].data["error"]
+        # failures is exclusive to fallback_exhausted events (#335).
+        assert "failures" not in events[0].data
+
+    @pytest.mark.asyncio
+    async def test_stream_chat_fallback_exhausted_yields_classified_error(
+        self,
+        sample_active_session: AgentSession,
+        monkeypatch: pytest.MonkeyPatch,
+    ) -> None:
+        """All fallback legs failing must yield ONE classified `error` event
+        with per-model failures — never the raw group string (#335)."""
+        service = AgentService()
+        # Pin a streaming-capable (cloud) provider so this exercises the
+        # run_stream path regardless of the local .env (#342).
+        monkeypatch.setattr(service.settings, "agent_default_model", "anthropic:claude-test")
+        mock_db = AsyncMock()
+
+        mock_result = MagicMock()
+        mock_result.scalar_one_or_none.return_value = sample_active_session
+        mock_db.execute.return_value = mock_result
+
+        class _RaisingStream:
+            """Async context manager that fails on entry like an exhausted chain."""
+
+            async def __aenter__(self) -> Any:
+                raise FallbackExceptionGroup(
+                    "All models from FallbackModel failed",
+                    [
+                        ModelHTTPError(
+                            404,
+                            "google-gla:gemini-3-flash-preview",
+                            body={
+                                "error": {"message": "models/gemini-3-flash-preview is not found"}
+                            },
+                        ),
+                        ModelHTTPError(
+                            429,
+                            "google-gla:gemini-2.5-flash",
+                            body={
+                                "error": {"message": "RESOURCE_EXHAUSTED key AIzaFakeKey123456789"}
+                            },
+                        ),
+                    ],
+                )
+
+            async def __aexit__(self, *exc: object) -> bool:
+                return False
+
+        mock_agent = MagicMock()
+        mock_agent.run_stream = MagicMock(return_value=_RaisingStream())
+
+        with patch.object(service, "_get_agent", return_value=mock_agent):
+            events = [
+                event
+                async for event in service.stream_chat(
+                    db=mock_db,
+                    session_id=sample_active_session.session_id,
+                    message="Hello",
+                )
+            ]
+
+        assert len(events) == 1
+        assert events[0].event_type == "error"
+        assert events[0].data["error_type"] == "fallback_exhausted"
+        assert events[0].data["recoverable"] is True
+        failures = events[0].data["failures"]
+        assert len(failures) == 2
+        assert failures[0]["reason"] == "model_not_found"
+        assert failures[1]["reason"] == "quota_exhausted"
+        assert "google-gla:gemini-3-flash-preview" in events[0].data["error"]
+        assert "google-gla:gemini-2.5-flash" in events[0].data["error"]
+        # The opaque group string must never reach the client.
+        assert "sub-exceptions" not in events[0].data["error"]
+        # Issue #335 hard constraint: no secret-like material anywhere.
+        assert "AIza" not in json.dumps(events[0].model_dump(mode="json"))
+
+    @pytest.mark.asyncio
+    async def test_stream_chat_bare_model_api_error_classified(
+        self,
+        sample_active_session: AgentSession,
+        monkeypatch: pytest.MonkeyPatch,
+    ) -> None:
+        """A bare ModelAPIError (single-model config, no fallback wired) gets
+        the same classified treatment as a 1-element failures list (#335)."""
+        service = AgentService()
+        monkeypatch.setattr(service.settings, "agent_default_model", "anthropic:claude-test")
+        mock_db = AsyncMock()
+
+        mock_result = MagicMock()
+        mock_result.scalar_one_or_none.return_value = sample_active_session
+        mock_db.execute.return_value = mock_result
+
+        class _RaisingStream:
+            """Async context manager that fails on entry like a provider 401."""
+
+            async def __aenter__(self) -> Any:
+                raise ModelHTTPError(401, "anthropic:claude-test")
+
+            async def __aexit__(self, *exc: object) -> bool:
+                return False
+
+        mock_agent = MagicMock()
+        mock_agent.run_stream = MagicMock(return_value=_RaisingStream())
+
+        with patch.object(service, "_get_agent", return_value=mock_agent):
+            events = [
+                event
+                async for event in service.stream_chat(
+                    db=mock_db,
+                    session_id=sample_active_session.session_id,
+                    message="Hello",
+                )
+            ]
+
+        assert len(events) == 1
+        assert events[0].event_type == "error"
+        assert events[0].data["error_type"] == "fallback_exhausted"
+        failures = events[0].data["failures"]
+        assert len(failures) == 1
+        assert failures[0]["reason"] == "auth_error"
+        assert failures[0]["model_name"] == "anthropic:claude-test"
 
     @pytest.mark.asyncio
     async def test_chat_surfaces_pending_action_on_model_misbehavior(
diff --git a/docs/_base/API_CONTRACTS.md b/docs/_base/API_CONTRACTS.md
index 232b7da3..27d75ea1 100644
--- a/docs/_base/API_CONTRACTS.md
+++ b/docs/_base/API_CONTRACTS.md
@@ -52,7 +52,7 @@ All endpoints serve JSON; error responses use `application/problem+json` (RFC 78
 | rag | DELETE | `/rag/sources/{source_id}` | Delete source + cascaded chunks |
 | agents | POST | `/agents/sessions` | Create session (`agent_type`: `experiment` or `rag_assistant`) |
 | agents | GET | `/agents/sessions/{session_id}` | Status + message history (Postgres JSONB) |
-| agents | POST | `/agents/sessions/{session_id}/chat` | Send user message; returns full response |
+| agents | POST | `/agents/sessions/{session_id}/chat` | Send user message; returns full response. **#335** — when every model in the agent's fallback chain fails with a provider error, returns **502** `application/problem+json` with `code="AGENT_FALLBACK_EXHAUSTED"`, `type=/errors/agent-fallback-exhausted`, and an additive `failures: [{model_name, status_code, reason, detail}]` extension member (secret-scrubbed, 300-char-capped details) |
 | agents | POST | `/agents/sessions/{session_id}/approve` | Approve/reject a pending tool call (HITL gate) |
 | agents | DELETE | `/agents/sessions/{session_id}` | Close session |
 | agents | WS | `/agents/stream` | Token-by-token streaming + tool-call events |
@@ -77,7 +77,7 @@ Verified against `app/features/agents/websocket.py` and `app/features/agents/sch
   - `tool_call_end` — `data: {"tool_name": str, "tool_call_id": str, "result": Any, "duration_ms": float}` (`ToolCallEndEvent`)
   - `approval_required` — emitted when a tool in `agent_require_approval` is pending; the chat REST `/agents/sessions/{id}/approve` endpoint releases it
   - `complete` — `data: {"message": str, "tokens_used": int, "tool_calls_count": int}` (`CompleteEvent`)
-  - `error` — `data: {"error": str, "error_type": str, "recoverable": bool}` (`ErrorEvent`). On `recoverable: false` (e.g., `session_not_found`, `session_expired`), the client should close.
+  - `error` — `data: {"error": str, "error_type": str, "recoverable": bool}` (`ErrorEvent`). On `recoverable: false` (e.g., `session_not_found`, `session_expired`), the client should close. **#335** — when every model in the agent's fallback chain fails with a provider error, the event carries `error_type="fallback_exhausted"`, `recoverable=true`, a human-actionable per-leg summary in `error`, and an additive Optional `failures: [{model_name, status_code: int|null, reason, detail}]` key (`reason` ∈ `model_not_found` / `quota_exhausted` / `auth_error` / `provider_unavailable` / `provider_error` / `response_rejected` / `unknown`; `detail` is secret-scrubbed and 300-char-capped).
 
 ## WebSocket Events (`/demo/stream`)