diff --git a/PRPs/PRP-reliability-E2-surface-fallback-failures.md b/PRPs/PRP-reliability-E2-surface-fallback-failures.md
new file mode 100644
index 00000000..55a7855c
--- /dev/null
+++ b/PRPs/PRP-reliability-E2-surface-fallback-failures.md
@@ -0,0 +1,647 @@
+name: "PRP — Reliability E2: surface fallback model failures with classified, actionable details"
+description: |
+  Parallel epic of umbrella #380 (platform reliability hardening), after Foundation E1 (#334).
+  Issue: #335 · Branch: `fix/agents-surface-fallback-failures` off `dev` · Commit scope: `agents,api`
+  (primary surface is `app/features/agents/`; the additive RFC 7807 extension plumbing touches
+  `app/core/{exceptions,problem_details}.py` = `api`, mirroring E1's scope reasoning).
+
+---
+
+## Goal
+
+When every model in the PydanticAI `FallbackModel` chain fails (or a single configured model
+fails with a provider error), the client must receive a **classified, secret-safe summary of
+each per-model failure** — `{model_name, status_code, reason, detail}` — instead of today's
+generic `Stream error: All models from FallbackModel failed (2 sub-exceptions)`:
+
+- **WebSocket `/agents/stream`** — one `error` StreamEvent with `error_type="fallback_exhausted"`,
+  a human-actionable `error` summary string, and a structured `failures` list.
+- **REST `POST /agents/sessions/{id}/chat`** — a **502** `application/problem+json` with
+  `code="AGENT_FALLBACK_EXHAUSTED"` and a `failures` extension member.
+
+**Deliverable:** one new classifier module (`app/features/agents/failures.py`), one new schema
+(`ModelFailureDetail`), two new `except` arms in `AgentService.chat` / `stream_chat`, one new
+core exception (`AgentFallbackExhaustedError`) riding a new additive `extensions` pass-through
+in the RFC 7807 helpers, plus tests at classifier / service / route levels.
+
+**Success definition:** the exact failure from the issue (primary `404` model-not-found +
+fallback `429` quota-exhausted) renders in the chat UI as a readable two-leg diagnosis with
+zero frontend changes, and a route test proves the REST 502 carries both classified legs.
+No secret-like material (API keys, bearer tokens) can appear in any surfaced payload.
+
+## Why
+
+- **Diagnosability from the UI.** The 2026-06-01 incident (issue #335) required reading
+  container logs to learn that the primary leg was a 404 (bad model name) and the fallback leg
+  a 429 (free-tier quota). Both causes were recorded in the `agents.websocket_stream_error` log
+  event, while the client got an opaque one-liner.
+- **E1 (#334) stabilized the surface.** The doubled-prefix 404 class is now rejected at config
+  time (PR #382), so the classification matrix built here tests against a stable failure
+  surface (umbrella #380 Foundation ordering).
+- **Zero-frontend-change win.** `frontend/src/pages/chat.tsx:95-108` renders
+  `Error: ${event.data.error}` verbatim, so making the backend's `error` string the classified
+  human summary improves the UI with no frontend work. The structured `failures` list is the
+  additive machine-readable layer for future UI work.
+
+## What
+
+### Behavior change
+
+| Surface | Today | After |
+|---------|-------|-------|
+| WS `error` event, both models fail | `error="Stream error: All models from FallbackModel failed (2 sub-exceptions)"`, `error_type="FallbackExceptionGroup"` (the exception class name, set via `type(e).__name__` in the `websocket.py` generic catch) | `error_type="fallback_exhausted"`, `error="All configured agent models failed — google-gla:gemini-3-flash-preview: model not found / invalid model name (HTTP 404); google-gla:gemini-2.5-flash: quota or rate limit exhausted (HTTP 429)"`, `failures=[{model_name, status_code, reason, detail}, …]`, `recoverable=true` |
+| REST chat, both models fail | uncaught `FallbackExceptionGroup` → generic 500 `INTERNAL_ERROR` problem+json | **502** problem+json, `code="AGENT_FALLBACK_EXHAUSTED"`, `type="/errors/agent-fallback-exhausted"`, `detail=<same human summary>`, `failures=[…]` extension |
+| Single-model config (no fallback wired), provider error | same generic surfaces | same classified treatment (a bare `ModelAPIError` is classified as a 1-element `failures` list) |
+| Model misbehavior (`UnexpectedModelBehavior`) | salvage → friendly message / `error_type="model_behavior_error"` | **unchanged** — the new arm catches only provider-API failures |
+| Secrets in provider response bodies | `str(ModelHTTPError)` embeds `body` verbatim (leak risk if echoed) | surfaced `detail` is extracted → scrubbed (`AIza…`, `sk-…`, `Bearer …`, `api_key=…`) → truncated to 300 chars |
+
+### Reason classification (exact)
+
+| Evidence | `reason` |
+|----------|----------|
+| `ModelHTTPError.status_code == 404` | `model_not_found` |
+| `status_code == 429` | `quota_exhausted` |
+| `status_code in (401, 403)` | `auth_error` |
+| `status_code >= 500` | `provider_unavailable` |
+| any other `ModelHTTPError` | `provider_error` |
+| non-HTTP `ModelAPIError` (connection, etc.) | `provider_error` (status_code `null`) |
+| `pydantic_ai.models.fallback.ResponseRejected` member | `response_rejected` |
+| anything else inside the group | `unknown` |
+
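+The table reads as an ordered lookup: exact statuses first, then the `>= 500` range, then the catch-all. A minimal sketch of the status rows, assuming a hypothetical helper name `reason_for_status` (the blueprint does not prescribe one) and using `None` for the non-HTTP `ModelAPIError` row:
+
+```python
+def reason_for_status(status_code: int | None) -> str:
+    """Map a provider HTTP status to a `reason` label per the table above."""
+    if status_code is None:
+        return "provider_error"        # non-HTTP ModelAPIError (connection, etc.)
+    if status_code == 404:
+        return "model_not_found"
+    if status_code == 429:
+        return "quota_exhausted"
+    if status_code in (401, 403):
+        return "auth_error"
+    if status_code >= 500:
+        return "provider_unavailable"
+    return "provider_error"            # any other HTTP status, e.g. 418
+
+
+print([reason_for_status(s) for s in (404, 429, 503, 418, None)])
+```
+
+The `response_rejected` and `unknown` rows depend on the exception type rather than the status, so they are out of scope for this helper.
+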
+### Success Criteria
+
+- [ ] `classify_model_failures` maps 404/429/401/403/5xx/other-HTTP/non-HTTP/`ResponseRejected`/unknown and recurses into nested `ExceptionGroup`s
+- [ ] Stream path: a `FallbackExceptionGroup(404 + 429)` raised by `agent.run_stream` yields exactly ONE `error` event with `error_type="fallback_exhausted"`, `recoverable=True`, a 2-entry `failures` list, and a summary naming both models — and the raw group string (`"sub-exceptions"`) does NOT appear
+- [ ] REST path: the same failure → 502 `application/problem+json` with `code="AGENT_FALLBACK_EXHAUSTED"` and `failures` extension (route test covers both legs — umbrella #380 criterion)
+- [ ] A planted secret (`AIzaFakeKey123…` / `sk-fake…` / `Bearer xyz`) in `ModelHTTPError.body` never appears in any serialized event/response payload (regression test asserts on the full JSON dump)
+- [ ] Single bare `ModelAPIError` (no FallbackModel) gets the same classified treatment
+- [ ] Existing `model_behavior_error` behavior unchanged; its tests are only extended, never weakened
+- [ ] All five validation gates green; `docs/_base/API_CONTRACTS.md` updated additively
+
+## All Needed Context
+
+### Documentation & References
+
+```yaml
+# ── Where the failures escape today (the two catch points to add) ────────────
+- file: app/features/agents/service.py
+  lines: 24-26, 295-354, 520-570, 693-771
+  why: |
+    Imports (line 25 already pulls UnexpectedModelBehavior from pydantic_ai.exceptions —
+    extend it). chat(): the try at 298-308 wraps agent.run; excepts at 309 (TimeoutError)
+    and 313 (UnexpectedModelBehavior) — the NEW arm slots between them. stream_chat():
+    try at 525 wraps run_stream (533, streaming) AND agent.run (560-568, #342 ollama
+    non-streaming fallback) — one new arm covers both; excepts at 693/697; the
+    misbehavior error-yield at 759-770 is the EXACT yield pattern to mirror
+    (data dict with error/error_type/recoverable, datetime.now(UTC) timestamp,
+    session.last_activity update + db.flush() before yielding, then `return`).
+
+# ── The generic backstop that produced the bad UX (do NOT remove — keep as backstop)
+- file: app/features/agents/websocket.py
+  lines: 96-123, 132-158
+  why: |
+    The `except Exception` at 109-123 is what stringified the group today
+    (f"Stream error: {e}", error_type=type(e).__name__) and logged
+    "agents.websocket_stream_error". After this PRP the service yields the classified
+    event BEFORE the exception reaches here; the handler stays as the backstop for
+    everything else. NO changes in this file.
+
+# ── Schema home for the new detail model + additive ErrorEvent field ─────────
+- file: app/features/agents/schemas.py
+  lines: 145-163, 229-248, 304-316
+  why: |
+    ChatResponse (no error field — REST errors go through problem+json, NOT this model),
+    StreamEvent (data is dict[str, Any] — the failures list rides inside data),
+    ErrorEvent (error/error_type/recoverable) — add Optional `failures` here so the
+    documented event shape matches what the service emits. Define ModelFailureDetail
+    in this file (schemas.py is the slice's schema home).
+
+# ── FallbackModel construction (read-only — explains when a group vs bare error escapes)
+- file: app/features/agents/agents/base.py
+  lines: 168-176, 201-249
+  why: |
+    build_agent_model_with_fallback returns a bare primary model when no distinct
+    key-backed fallback exists (→ bare ModelAPIError escapes, no group) and
+    FallbackModel(primary, fallback) otherwise (→ FallbackExceptionGroup escapes when
+    BOTH legs fail). reset_agent_caches (168) is why PATCH /config/ai applies live —
+    used by the Level-3 plan.
+
+# ── RFC 7807 plumbing: the precedent and the two additive core edits ─────────
+- file: app/core/exceptions.py
+  lines: 27-61, 227-254, 262-290
+  why: |
+    ForecastLabError base (gains optional `extensions` kwarg; note `details` is
+    LOG-ONLY — the handler at 279-288 drops it from the response body, which is WHY
+    the new extensions channel exists). EmbeddingProviderAuthError (227-254) is the
+    EXACT precedent to mirror for AgentFallbackExhaustedError: module-level code
+    constant, error_type_uri from ERROR_TYPES, fixed status 502, narrow __init__.
+    forecastlab_exception_handler (262-290) passes title=exc.title (derived from code:
+    "AGENT_FALLBACK_EXHAUSTED" → "Agent Fallback Exhausted") — add extensions pass-through.
+- file: app/core/problem_details.py
+  lines: 28-46, 54-114, 135-199
+  why: |
+    EMBEDDING_AUTH_CODE constant pattern (30) + ERROR_TYPES dict (32-46) — add
+    AGENT_FALLBACK_EXHAUSTED. ProblemDetail has ConfigDict(extra="allow") (RFC 7807
+    extension members are sanctioned). problem_response (169-199) serializes via
+    model_dump(exclude_none=True) — merge extensions into the serialized dict there
+    (NOT via ProblemDetail(**extensions); see gotcha on the mypy/pydantic-plugin trap).
+
+# ── Test patterns to mirror (extend, never weaken) ───────────────────────────
+- file: app/features/agents/tests/test_service.py
+  lines: 426-480
+  why: |
+    test_stream_chat_model_misbehavior_yields_error_event — THE pattern for the new
+    stream test: AgentService() + monkeypatch settings.agent_default_model to
+    "anthropic:claude-test" (line 444 — pins the run_stream path, #342), mock_db AsyncMock with
+    scalar_one_or_none → sample_active_session fixture, _RaisingStream async CM that
+    raises on __aenter__, patch.object(service, "_get_agent"), collect events, assert
+    on events[0].data. Note it asserts the LITERAL "model_behavior_error" (line 478) —
+    error_type strings are load-bearing; pick "fallback_exhausted" once and keep it stable.
+- file: app/features/agents/tests/test_routes.py
+  lines: 1-60
+  why: |
+    Route tests are @pytest.mark.integration (real Postgres via conftest `client`
+    fixture). Pattern: create a session via POST /agents/sessions with the agent
+    factory patched, then exercise the endpoint. The new 502 test patches
+    AgentService.chat (or _get_agent with an agent whose run raises the group) and
+    asserts status/content-type/code/failures on the problem+json body.
+- file: app/features/agents/tests/conftest.py
+  why: sample_active_session fixture used by the service tests; client fixture for routes.
+
+# ── Frontend consumer (READ-ONLY — proves no frontend change is needed) ──────
+- file: frontend/src/pages/chat.tsx
+  lines: 95-108
+  why: |
+    case 'error' renders `Error: ${event.data.error}` verbatim into the transcript.
+    The human summary string IS the UI improvement. AgentStreamEvent.data is
+    Record<string, unknown> (frontend/src/types/api.ts:601-605) so the additive
+    failures key needs no type change.
+
+# ── External references (verified against installed pydantic-ai 1.96.0, 2026-06-11)
+- url: https://pydantic.dev/docs/ai/models/overview/
+  section: "Fallback Model"
+  why: FallbackModel semantics — falls back on ModelAPIError; raises FallbackExceptionGroup when all legs fail
+- url: https://pydantic.dev/docs/ai/api/pydantic-ai/exceptions/
+  why: ModelHTTPError / ModelAPIError / FallbackExceptionGroup API reference
+- url: https://docs.python.org/3/library/exceptions.html#exception-groups
+  why: |
+    ExceptionGroup.exceptions is a TUPLE; sub-groups can nest — the classifier must
+    recurse. A plain `except FallbackExceptionGroup:` works (it subclasses Exception);
+    `except*` syntax is NOT needed and would complicate the single-yield contract.
+```
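+
+The recursion requirement from the last reference can be sketched with the standard library alone. This assumes Python 3.11+ (built-in `ExceptionGroup`); `iter_leaf_exceptions` is a hypothetical name, and plain built-in exceptions stand in for the provider errors:
+
+```python
+from collections.abc import Iterator
+
+
+def iter_leaf_exceptions(exc: BaseException) -> Iterator[BaseException]:
+    """Yield every non-group exception, depth-first and left to right."""
+    if isinstance(exc, BaseExceptionGroup):
+        # .exceptions is a tuple; members may themselves be groups.
+        for member in exc.exceptions:
+            yield from iter_leaf_exceptions(member)
+    else:
+        yield exc
+
+
+nested = ExceptionGroup(
+    "outer",
+    [ValueError("leg 1"), ExceptionGroup("inner", [KeyError("leg 2"), RuntimeError("leg 3")])],
+)
+leaf_types = [type(leaf).__name__ for leaf in iter_leaf_exceptions(nested)]
+print(leaf_types)  # → ['ValueError', 'KeyError', 'RuntimeError']
+```
+
+Because a bare exception is yielded as its own single leaf, the same walk covers the single-model case where no group is raised.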
+
+### Current Codebase tree (relevant subset)
+
+```
+app/core/
+  exceptions.py                     # ForecastLabError + handler                ← MODIFY (additive)
+  problem_details.py                # ERROR_TYPES + problem_response            ← MODIFY (additive)
+  tests/                            # (no problem_details test file today)      ← ADD test file
+app/features/agents/
+  agents/base.py                    # FallbackModel construction                (read-only)
+  service.py                        # chat() / stream_chat() except arms        ← MODIFY
+  websocket.py                      # generic backstop                          (no change)
+  schemas.py                        # StreamEvent / ErrorEvent                  ← MODIFY (additive)
+  routes.py                         # chat endpoint                             (no change — global handler covers 502)
+  tests/
+    test_service.py                 # stream/chat error tests                   ← EXTEND
+    test_routes.py                  # integration route tests                   ← EXTEND
+docs/_base/API_CONTRACTS.md         # WS ErrorEvent + chat endpoint docs        ← EXTEND
+```
+
+### Desired Codebase tree
+
+```
+app/features/agents/failures.py            # NEW — classify_model_failures / summarize_model_failures / _sanitize
+app/features/agents/tests/test_failures.py # NEW — classification matrix + secret-scrub + summary tests
+app/core/tests/test_problem_details.py     # NEW — extensions merge + reserved-key guard + no-extensions unchanged
+```
+
+No migration (nothing persisted changes). No frontend changes. No new dependencies.
+
+### Known Gotchas & Library Quirks
+
+```python
+# ── VERIFIED LIBRARY CLAIM #1: the exception family (pydantic-ai 1.96.0) ──────────────
+#   uv run python -c "
+#   from pydantic_ai.exceptions import FallbackExceptionGroup, ModelHTTPError, ModelAPIError
+#   print(FallbackExceptionGroup.__mro__)   # → ExceptionGroup → BaseExceptionGroup → Exception
+#   print(ModelHTTPError.__mro__)           # → ModelAPIError → AgentRunError → RuntimeError
+#   import inspect; print(inspect.signature(ModelHTTPError.__init__))"
+#   # → (self, status_code: 'int', model_name: 'str', body: 'object | None' = None)
+# ModelHTTPError IS a ModelAPIError → FallbackModel's default fallback_on=(ModelAPIError,)
+# catches it per-leg; the group only escapes when ALL legs fail. Re-verify on upgrade.
+
+# ── VERIFIED LIBRARY CLAIM #2: group anatomy ──────────────────────────────────────────
+#   uv run python -c "
+#   from pydantic_ai.exceptions import FallbackExceptionGroup, ModelHTTPError
+#   g = FallbackExceptionGroup('All models from FallbackModel failed',
+#                              [ModelHTTPError(404, 'm1'), ModelHTTPError(429, 'm2')])
+#   print(type(g.exceptions), g.message)"
+#   # → <class 'tuple'> All models from FallbackModel failed
+# .exceptions is an immutable TUPLE (not list). The constructor REJECTS an empty list.
+# The message literal 'All models from FallbackModel failed' is what users saw — assert
+# it does NOT leak into the new surfaced error string.
+
+# ── VERIFIED LIBRARY CLAIM #3: str(ModelHTTPError) embeds body VERBATIM ───────────────
+#   uv run python -c "
+#   from pydantic_ai.exceptions import ModelHTTPError
+#   print(str(ModelHTTPError(404, 'gemini-x', body={'error': {'message': 'nope'}})))"
+#   # → status_code: 404, model_name: gemini-x, body: {'error': {'message': 'nope'}}
+# NEVER put str(exc) or exc.body raw into a client payload. Extract the provider message
+# (Google/OpenAI shape body['error']['message'], else str(body)), scrub, truncate (300).
+# Issue #335 hard constraint: no API keys / Bearer tokens / AIza… values, ever.
+
+# ── VERIFIED LIBRARY CLAIM #4: ResponseRejected can be a group member ─────────────────
+#   uv run python -c "
+#   from pydantic_ai.models.fallback import ResponseRejected; print(str(ResponseRejected(2)))"
+#   # → 2 model response(s) rejected by fallback_on handler
+# It carries NO model_name → classify with model_name="(response rejected)" or similar
+# deterministic placeholder, reason="response_rejected".
+
+# ── GOTCHA: classification arm placement & non-overlap ────────────────────────────────
+# UnexpectedModelBehavior is NOT a ModelAPIError (separate AgentRunError branches), so
+# `except (FallbackExceptionGroup, ModelAPIError) as e:` cannot shadow the existing
+# misbehavior arm. Place the new arm AFTER TimeoutError, BEFORE UnexpectedModelBehavior
+# in BOTH chat() and stream_chat(). Do NOT attempt _salvage_* in the new arm — nothing
+# ran, there is nothing to salvage.
+
+# ── GOTCHA: inner `except Exception` at service.py:545 ───────────────────────────────
+# stream_text() iteration errors are swallowed by an inner handler (structured-output
+# agents can't stream deltas); a mid-stream provider failure re-raises from
+# result.get_output() and still hits the OUTER except arms. Put the new arm on the
+# OUTER try only — do not touch the inner handler.
+
+# ── GOTCHA: forecastlab_exception_handler DROPS exc.details from the response ─────────
+# app/core/exceptions.py:279-288 logs details but problem_response never receives them.
+# That is BY DESIGN (details may carry internals). Do NOT stuff failures into details —
+# add the parallel `extensions` channel (default None ⇒ zero behavior change for every
+# existing raiser) and pass it through explicitly.
+
+# ── GOTCHA: merge extensions on the SERIALIZED dict, not via ProblemDetail(**ext) ──────
+# ProblemDetail has extra="allow", but unpacking arbitrary **dict[str, Any] into a
+# pydantic-plugin-checked constructor risks mypy/pyright --strict errors. problem_response
+# already does problem.model_dump(exclude_none=True) — update THAT dict, guarded by a
+# reserved-key frozenset {type,title,status,detail,instance,errors,code,request_id}.
+
+# ── GOTCHA: error_type strings are load-bearing test/UI contracts ─────────────────────
+# test_service.py:477 asserts the literal "model_behavior_error". The new literal is
+# "fallback_exhausted" — used in service.py, asserted in tests, documented in
+# API_CONTRACTS.md. Pick once; never rename casually.
+
+# ── GOTCHA: StreamEvent.data must stay JSON-serializable ─────────────────────────────
+# websocket.py sends event.model_dump(mode="json"). Put PLAIN DICTS in data:
+# failures=[f.model_dump(mode="json") for f in details] — not BaseModel instances.
+
+# ── GOTCHA: .env bleed + settings singleton (only if a test touches Settings) ─────────
+# Service tests monkeypatch service.settings fields (see test_service.py:443) — that
+# pattern self-restores. If any new test constructs Settings(...), pass _env_file=None
+# (RUNBOOKS incident class).
+
+# ── GOTCHA: Level-3 mutates the operator's persisted config — snapshot/restore ────────
+# PATCH /config/ai persists to app_config AND applies live (reset_agent_caches,
+# config/service.py:214-216). The local operator override is agent_default_model=
+# ollama:gemma4-agent — GET /config/ai first, restore the exact values after the curl
+# matrix (E1 session precedent).
+
+# ── GOTCHA: repo has mixed CRLF/LF line endings ───────────────────────────────────────
+# Check `git diff --stat` after editing: if a file shows ~all lines changed, your editor
+# rewrote line endings — re-edit preserving the file's existing endings.
+```
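+
+The scrub-then-truncate rule from claim #3 can be sketched as follows. The regexes are the ones listed in the blueprint; the public names `SECRET_PATTERNS`, `MAX_DETAIL_LEN` and `sanitize` are illustrative stand-ins, not the final module API:
+
+```python
+import re
+
+SECRET_PATTERNS = (
+    re.compile(r"AIza[0-9A-Za-z_\-]{10,}"),                       # Google API keys
+    re.compile(r"sk-[A-Za-z0-9_\-]{10,}"),                         # OpenAI/Anthropic-style keys
+    re.compile(r"(?i)bearer\s+[A-Za-z0-9._\-]+"),                  # Authorization bearer tokens
+    re.compile(r"(?i)(api[_-]?key|token|authorization)[=:]\s*\S+"),
+)
+MAX_DETAIL_LEN = 300
+
+
+def sanitize(text: str) -> str:
+    """Redact secret-like substrings first, then cap the length."""
+    for pattern in SECRET_PATTERNS:
+        text = pattern.sub("[redacted]", text)
+    return text[:MAX_DETAIL_LEN]
+
+
+print(sanitize("key AIzaFakeKey1234567890abcdef leaked"))  # → key [redacted] leaked
+```
+
+Scrubbing before truncating matters: cutting first could slice a key in half and leave a fragment too short for the patterns to match.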
+
+## Implementation Blueprint
+
+### Data models and structure
+
+```python
+# app/features/agents/schemas.py — new model + additive ErrorEvent field
+
+FailureReason = Literal[
+    "model_not_found", "quota_exhausted", "auth_error",
+    "provider_unavailable", "provider_error", "response_rejected", "unknown",
+]
+
+class ModelFailureDetail(BaseModel):
+    """One classified per-model failure from a FallbackModel chain (issue #335)."""
+    model_name: str
+    status_code: int | None = None
+    reason: FailureReason
+    detail: str = ""          # sanitized + truncated provider message — NEVER raw body
+
+class ErrorEvent(BaseModel):
+    error: str
+    error_type: str
+    recoverable: bool = True
+    failures: list[ModelFailureDetail] | None = None   # additive (issue #335)
+```
+
+```python
+# app/features/agents/failures.py — NEW module (pure functions, fully unit-testable)
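+
+import re
+
+from pydantic_ai.exceptions import ModelAPIError, ModelHTTPError
+from pydantic_ai.models.fallback import ResponseRejected
+
+from app.features.agents.schemas import ModelFailureDetail
+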
+_SECRET_PATTERNS = (
+    re.compile(r"AIza[0-9A-Za-z_\-]{10,}"),                      # Google API keys
+    re.compile(r"sk-[A-Za-z0-9_\-]{10,}"),                        # OpenAI/Anthropic-style keys
+    re.compile(r"(?i)bearer\s+[A-Za-z0-9._\-]+"),                 # Authorization bearer tokens
+    re.compile(r"(?i)(api[_-]?key|token|authorization)[=:]\s*\S+"),
+)
+_MAX_DETAIL_LEN = 300
+
+def _sanitize(text: str) -> str:
+    """Replace each _SECRET_PATTERNS match with "[redacted]", then truncate to _MAX_DETAIL_LEN."""
+    ...
+
+def _provider_message(body: object | None) -> str:
+    """Extract the provider's message from an error body.
+
+    dict with Google/OpenAI shape → body["error"]["message"]; str → as-is;
+    anything else → str(body) or "". The call site ALWAYS passes the result through _sanitize.
+    """
+    ...
+
+def classify_model_failures(exc: BaseException) -> list[ModelFailureDetail]:
+    """Return one ModelFailureDetail per leaf exception.
+
+    ExceptionGroup (incl. FallbackExceptionGroup): recurse over exc.exceptions (tuple)
+    ModelHTTPError: status map per the reason table; detail=_sanitize(_provider_message(body))
+    ResponseRejected: reason="response_rejected", model_name placeholder
+    other ModelAPIError: reason="provider_error", status None, detail=_sanitize(str(exc))
+    fallback: reason="unknown", detail=_sanitize(str(exc))
+    """
+    ...
+
+def summarize_model_failures(failures: list[ModelFailureDetail]) -> str:
+    """Build the deterministic human summary (tests match substrings).
+
+    1 failure  → "The configured agent model failed — {leg}"
+    n failures → "All configured agent models failed — {leg}; {leg}; …"
+    leg = "{model_name}: {human label} (HTTP {status_code})" (omit HTTP part when None)
+    labels: model_not_found→"model not found / invalid model name",
+            quota_exhausted→"quota or rate limit exhausted",
+            auth_error→"authentication/permission error",
+            provider_unavailable→"provider unavailable",
+            provider_error→"provider error", response_rejected→"response rejected",
+            unknown→"unexpected failure"
+    """
+    ...
+```
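+
+The summary format above can be sketched as a runnable function. `Failure` is a hypothetical dataclass standing in for `ModelFailureDetail`, and `summarize` for `summarize_model_failures`; the labels and separators are copied from the blueprint:
+
+```python
+from dataclasses import dataclass
+
+LABELS = {
+    "model_not_found": "model not found / invalid model name",
+    "quota_exhausted": "quota or rate limit exhausted",
+    "auth_error": "authentication/permission error",
+    "provider_unavailable": "provider unavailable",
+    "provider_error": "provider error",
+    "response_rejected": "response rejected",
+    "unknown": "unexpected failure",
+}
+
+
+@dataclass(frozen=True)
+class Failure:  # stand-in for ModelFailureDetail (assumption)
+    model_name: str
+    reason: str
+    status_code: int | None = None
+
+
+def summarize(failures: list[Failure]) -> str:
+    legs = []
+    for f in failures:
+        leg = f"{f.model_name}: {LABELS[f.reason]}"
+        if f.status_code is not None:      # omit the HTTP part for non-HTTP failures
+            leg += f" (HTTP {f.status_code})"
+        legs.append(leg)
+    head = ("The configured agent model failed" if len(legs) == 1
+            else "All configured agent models failed")
+    # \u2014 is the dash separator used in the blueprint's summary strings.
+    return f"{head} \u2014 {'; '.join(legs)}"
+
+
+two_legs = summarize([
+    Failure("google-gla:gemini-3-flash-preview", "model_not_found", 404),
+    Failure("google-gla:gemini-2.5-flash", "quota_exhausted", 429),
+])
+print(two_legs)
+```
+
+For the 404 + 429 incident this yields the exact string shown in the "After" column of the behavior table, which is what the chat UI would render verbatim.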
+
+```python
+# app/core/problem_details.py — additive
+AGENT_FALLBACK_EXHAUSTED_CODE = "AGENT_FALLBACK_EXHAUSTED"          # next to EMBEDDING_AUTH_CODE
+ERROR_TYPES[AGENT_FALLBACK_EXHAUSTED_CODE] = f"{ERROR_TYPE_BASE}/agent-fallback-exhausted"
+
+_RESERVED_PROBLEM_KEYS = frozenset(
+    {"type", "title", "status", "detail", "instance", "errors", "code", "request_id"}
+)
+
+def problem_response(..., extensions: dict[str, Any] | None = None) -> ProblemDetailResponse:
+    content = problem.model_dump(exclude_none=True)
+    if extensions:
+        content.update({k: v for k, v in extensions.items() if k not in _RESERVED_PROBLEM_KEYS})
+    return ProblemDetailResponse(status_code=status, content=content)
+```
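+
+The reserved-key guard can be exercised in isolation on plain dicts. A minimal sketch, assuming a hypothetical helper `merge_extensions` that stands in for the merge step inside `problem_response`:
+
+```python
+from typing import Any
+
+RESERVED_PROBLEM_KEYS = frozenset(
+    {"type", "title", "status", "detail", "instance", "errors", "code", "request_id"}
+)
+
+
+def merge_extensions(
+    content: dict[str, Any], extensions: dict[str, Any] | None
+) -> dict[str, Any]:
+    """Return a copy of content with non-reserved extension members added."""
+    merged = dict(content)
+    if extensions:
+        merged.update(
+            {k: v for k, v in extensions.items() if k not in RESERVED_PROBLEM_KEYS}
+        )
+    return merged
+
+
+body = {"status": 502, "code": "AGENT_FALLBACK_EXHAUSTED"}
+merged = merge_extensions(body, {"status": 200, "code": "HACK", "failures": [{"a": 1}]})
+print(merged)
+# → {'status': 502, 'code': 'AGENT_FALLBACK_EXHAUSTED', 'failures': [{'a': 1}]}
+```
+
+Dropping reserved keys silently keeps a misbehaving raiser from rewriting the status or code; raising instead would be a reasonable alternative that the blueprint does not choose.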
+
+```python
+# app/core/exceptions.py — additive
+class ForecastLabError(Exception):
+    def __init__(self, message, code=..., status_code=..., details=None,
+                 extensions: dict[str, Any] | None = None) -> None:
+        ...
+        self.extensions = extensions or {}   # RESPONSE-VISIBLE (details stays log-only)
+
+# handler: problem_response(..., extensions=exc.extensions or None)
+
+class AgentFallbackExhaustedError(ForecastLabError):
+    """502 — every model in the agent's fallback chain failed (issue #335).
+
+    Mirrors EmbeddingProviderAuthError: machine-readable code so clients can
+    classify; carries the per-model failures as an RFC 7807 extension member.
+    """
+    error_type_uri = ERROR_TYPES[AGENT_FALLBACK_EXHAUSTED_CODE]
+    def __init__(self, message: str, failures: list[dict[str, Any]]) -> None:
+        super().__init__(message=message, code=AGENT_FALLBACK_EXHAUSTED_CODE,
+                         status_code=502, extensions={"failures": failures})
+```
+
+### Tasks (in order)
+
+```yaml
+Task 1:
+MODIFY app/features/agents/schemas.py:
+  - ADD FailureReason Literal alias + ModelFailureDetail near ErrorEvent (line ~304)
+  - ADD `failures: list[ModelFailureDetail] | None = None` to ErrorEvent
+  - PRESERVE every existing field and Literal value on StreamEvent/ErrorEvent
+
+Task 2:
+CREATE app/features/agents/failures.py:
+  - Pure module: _SECRET_PATTERNS, _sanitize, _provider_message,
+    classify_model_failures, summarize_model_failures per blueprint
+  - Imports: pydantic_ai.exceptions (ModelAPIError, ModelHTTPError),
+    pydantic_ai.models.fallback (ResponseRejected), app.features.agents.schemas
+  - Recursion guard: ExceptionGroup members may nest — recurse; classify leaves only
+
+Task 3:
+MODIFY app/core/problem_details.py:
+  - ADD AGENT_FALLBACK_EXHAUSTED_CODE constant next to EMBEDDING_AUTH_CODE (line 30)
+  - ADD ERROR_TYPES entry "/errors/agent-fallback-exhausted"
+  - ADD optional `extensions` param to problem_response; merge on the serialized dict
+    guarded by _RESERVED_PROBLEM_KEYS (see gotcha — do NOT ProblemDetail(**extensions))
+  - PRESERVE the no-extensions output byte-for-byte (default None)
+
+Task 4:
+MODIFY app/core/exceptions.py:
+  - ADD optional `extensions` kwarg on ForecastLabError.__init__ (stored attribute)
+  - ADD AgentFallbackExhaustedError mirroring EmbeddingProviderAuthError (lines 227-254)
+  - MODIFY forecastlab_exception_handler: pass extensions=exc.extensions or None
+  - PRESERVE: details stays log-only; every existing subclass signature unchanged
+
+Task 5:
+MODIFY app/features/agents/service.py:
+  - EXTEND import line 25: from pydantic_ai.exceptions import (
+      FallbackExceptionGroup, ModelAPIError, UnexpectedModelBehavior)
+  - ADD import: classify_model_failures, summarize_model_failures from .failures;
+    AgentFallbackExhaustedError from app.core.exceptions
+  - chat(): NEW arm between TimeoutError (309) and UnexpectedModelBehavior (313):
+      except (FallbackExceptionGroup, ModelAPIError) as e:
+          failures = classify_model_failures(e)
+          logger.warning("agents.chat_fallback_exhausted", session_id=session_id,
+                         failure_count=len(failures),
+                         reasons=[f.reason for f in failures])   # safe fields only
+          raise AgentFallbackExhaustedError(
+              summarize_model_failures(failures),
+              failures=[f.model_dump(mode="json") for f in failures]) from e
+  - stream_chat(): NEW arm between TimeoutError (693) and UnexpectedModelBehavior (697),
+    mirroring the misbehavior tail at 759-770:
+      except (FallbackExceptionGroup, ModelAPIError) as e:
+          failures = classify_model_failures(e)
+          logger.warning("agents.stream_chat_fallback_exhausted", ...)  # same safe fields
+          now = datetime.now(UTC); session.last_activity = now; await db.flush()
+          yield StreamEvent(event_type="error", data={
+              "error": summarize_model_failures(failures),
+              "error_type": "fallback_exhausted",
+              "recoverable": True,
+              "failures": [f.model_dump(mode="json") for f in failures],
+          }, timestamp=now)
+          return
+  - PRESERVE: no _salvage_* calls in the new arms; misbehavior arms byte-identical
+
+Task 6:
+CREATE app/features/agents/tests/test_failures.py:
+  - Classification matrix: parametrize ModelHTTPError statuses
+    (404→model_not_found, 429→quota_exhausted, 401/403→auth_error,
+     500/503→provider_unavailable, 418→provider_error)
+  - Group of (404 + 429) → 2 details preserving model_name order
+  - Nested group (group inside group) → flattened leaves
+  - Bare ModelAPIError (instantiate ModelAPIError directly, or a minimal non-HTTP subclass)
+    → provider_error, status None
+  - ResponseRejected member → response_rejected
+  - Unknown exception → unknown
+  - Secret scrub: body={"error": {"message": "key AIzaFakeKey1234567890abcdef leaked"}}
+    → "[redacted]" in detail, "AIza" not in detail; same for "sk-fake…" and "Bearer x.y.z"
+  - Truncation: 1000-char provider message → len(detail) <= 300
+  - summarize_model_failures: exact-substring asserts for 1-leg and 2-leg shapes
+
+Task 7:
+EXTEND app/features/agents/tests/test_service.py:
+  - TestAgentServiceStreamChat.test_stream_chat_fallback_exhausted_yields_classified_error:
+      MIRROR test_stream_chat_model_misbehavior_yields_error_event (426-480) exactly,
+      but _RaisingStream.__aenter__ raises FallbackExceptionGroup(
+        "All models from FallbackModel failed",
+        [ModelHTTPError(404, "google-gla:gemini-3-flash-preview",
+                        body={"error": {"message": "models/... is not found"}}),
+         ModelHTTPError(429, "google-gla:gemini-2.5-flash",
+                        body={"error": {"message": "RESOURCE_EXHAUSTED ... AIzaFakeKey123456789"}})])
+      ASSERT: len(events)==1; event_type=="error"; data["error_type"]=="fallback_exhausted";
+      data["recoverable"] is True; len(data["failures"])==2;
+      failures[0]["reason"]=="model_not_found"; failures[1]["reason"]=="quota_exhausted";
+      "sub-exceptions" not in data["error"];
+      "AIza" not in json.dumps(events[0].model_dump(mode="json"))
+  - TestAgentServiceStreamChat.test_stream_chat_bare_model_api_error_classified:
+      same harness, __aenter__ raises ModelHTTPError(401, "anthropic:claude-test") →
+      exactly 1 error event; len(data["failures"])==1; failures[0]["reason"]=="auth_error"
+  - TestAgentServiceChat.test_chat_fallback_exhausted_raises_classified_error:
+      MIRROR the chat misbehavior test harness; agent.run = AsyncMock(side_effect=<group>);
+      pytest.raises(AgentFallbackExhaustedError) → exc.status_code==502,
+      exc.code=="AGENT_FALLBACK_EXHAUSTED", len(exc.extensions["failures"])==2
+
+Task 8:
+EXTEND app/features/agents/tests/test_routes.py (integration):
+  - test_chat_fallback_exhausted_returns_502_problem_json:
+      create session (patched agent factory, existing pattern), then patch the service
+      agent so run raises the 404+429 group; POST /agents/sessions/{id}/chat →
+      ASSERT status 502; headers content-type startswith "application/problem+json";
+      body["code"]=="AGENT_FALLBACK_EXHAUSTED";
+      body["type"].endswith("/errors/agent-fallback-exhausted");
+      len(body["failures"])==2 with reasons model_not_found and quota_exhausted; "request_id" present
+
+Task 9:
+CREATE app/core/tests/test_problem_details.py:
+  - test_problem_response_without_extensions_unchanged: no extensions → body has no
+    "failures" key; code/type/status as before
+  - test_problem_response_merges_extensions: extensions={"failures":[{"a":1}]} → body["failures"]==[{"a":1}]
+  - test_problem_response_extensions_cannot_override_reserved:
+    extensions={"status": 200, "code": "HACK"} → body keeps the real status/code
+
+Task 10 (docs, same PR):
+EXTEND docs/_base/API_CONTRACTS.md:
+  - WS `/agents/stream` error bullet: document `error_type="fallback_exhausted"` and the
+    additive Optional `failures: [{model_name, status_code, reason, detail}]` data key
+  - agents chat row: note the 502 AGENT_FALLBACK_EXHAUSTED problem+json (additive)
+```
+
+### Integration Points
+
+```yaml
+DATABASE:  none — nothing persisted changes; no migration
+ROUTES:    none — REST surface comes via the global ForecastLabError handler (502)
+WEBSOCKET: service-level yield only; websocket.py generic handler untouched (backstop)
+CONFIG:    none — no new settings; no change to agent_require_approval (no new mutation surface)
+FRONTEND:  none — chat.tsx renders the summary string as-is; failures key is additive
+DOCS:      docs/_base/API_CONTRACTS.md (Task 10)
+```
+
+## Validation Loop
+
+### Level 1: Syntax & Style
+
+```bash
+uv run ruff check app/features/agents/ app/core/ && uv run ruff format --check .
+uv run mypy app/ && uv run pyright app/        # both --strict; zero new errors
+```
+
+### Level 2: Unit tests (no DB)
+
+```bash
+uv run pytest -v \
+  app/features/agents/tests/test_failures.py \
+  app/features/agents/tests/test_service.py \
+  app/core/tests/test_problem_details.py
+# Full unit gate: proves the misbehavior/salvage paths and every other consumer are untouched.
+uv run pytest -v -m "not integration"
+```
+
+### Level 3: Integration (live API; snapshot config FIRST — see gotcha)
+
+```bash
+docker compose up -d
+uv run pytest -v -m integration app/features/agents/tests/test_routes.py
+
+# Live REST leg (fresh uvicorn; snapshot + restore the operator's persisted config!):
+curl -s http://localhost:8123/config/ai          # SNAPSHOT current model ids
+curl -si -X PATCH http://localhost:8123/config/ai -H 'Content-Type: application/json' \
+  -d '{"agent_default_model":"openai:gpt-nonexistent-e2","agent_fallback_model":"openai:gpt-also-nonexistent"}'
+SID=$(curl -s -X POST http://localhost:8123/agents/sessions \
+  -H 'Content-Type: application/json' -d '{"agent_type":"experiment"}' | python3 -c 'import sys,json;print(json.load(sys.stdin)["session_id"])')
+curl -si -X POST http://localhost:8123/agents/sessions/$SID/chat \
+  -H 'Content-Type: application/json' -d '{"message":"hello"}' | head -30
+#   expect: HTTP/1.1 502, application/problem+json, code AGENT_FALLBACK_EXHAUSTED,
+#           failures[] with reason "model_not_found" on both legs
+curl -si -X PATCH http://localhost:8123/config/ai -H 'Content-Type: application/json' \
+  -d '{"agent_default_model":"<snapshot>","agent_fallback_model":"<snapshot>"}'   # RESTORE
+```
+
+### Level 4 (optional dogfood): chat UI over WebSocket
+
+With the broken model pair patched in, open `/chat` (localhost:5173), send a message →
+the transcript should show the classified two-leg summary (`model not found … (HTTP 404); …`)
+instead of `Stream error: All models from FallbackModel failed`. Restore config after.
+
+## Final Validation Checklist
+
+- [ ] `uv run ruff check . && uv run ruff format --check .` clean
+- [ ] `uv run mypy app/ && uv run pyright app/` clean (strict)
+- [ ] `uv run pytest -v -m "not integration"` green — including the untouched
+      `model_behavior_error` and salvage tests
+- [ ] New tests cover: status matrix, nested group, bare ModelAPIError, ResponseRejected,
+      secret scrub (AIza/sk-/Bearer), truncation, summary shapes, stream 404+429 event,
+      stream bare-401 event, chat raise, route 502 (both legs), extensions merge + guard
+- [ ] `uv run pytest -v -m integration app/features/agents/tests/test_routes.py` green
+- [ ] Level-3 curl matrix matches; operator config snapshot RESTORED and re-verified
+- [ ] No secret-like string in any serialized payload (asserted on full JSON dumps)
+- [ ] `git diff --stat` shows surgical diffs (no whole-file line-ending churn)
+- [ ] Commits: `fix(agents,api): surface fallback model failures with classified details (#335)`
+      (+ `docs(docs): …` for API_CONTRACTS if split); no AI trailers
+- [ ] PR into `dev` from `fix/agents-surface-fallback-failures`; CI green
+
+---
+
+## Out of Scope (this PRP)
+
+- **Frontend failure-detail rendering** (chips/expandable list from the `failures` key) —
+  the summary string already lands in the transcript; promote to its own `feat(ui)` issue
+  if dogfood demands richer rendering.
+- **Retry/circuit-breaker middleware or metrics** — explicitly rejected in umbrella #380
+  (violates the no-external-observability / single-host principle).
+- **Classifying `UsageLimitExceeded` / `ConcurrencyLimitExceeded`** — pydantic-ai usage-cap
+  errors, not provider failures; today's behavior (generic backstop) stands.
+- **Surfacing agent-BUILD failures** (missing API key → `ValueError` in
+  `build_agent_model_with_fallback`) — a config-time failure class, already log-visible;
+  separate concern from run-time provider failure.
+- **E6 release-gate dogfood** — umbrella #380's own closing epic.
+
+## Anti-Patterns to Avoid
+
+- ❌ Don't put `str(exception)` or `exc.body` raw into any client payload — sanitize-then-truncate only.
+- ❌ Don't stuff failures into `ForecastLabError.details` — the handler drops it by design; use `extensions`.
+- ❌ Don't use `except*` — a plain `except FallbackExceptionGroup` keeps the single-yield contract simple.
+- ❌ Don't touch `websocket.py` — the generic handler is the deliberate backstop.
+- ❌ Don't salvage (`_salvage_*`) in the new arms: no model produced any output, so there is nothing to salvage.
+- ❌ Don't rename `model_behavior_error` or weaken its tests — extend alongside.
+- ❌ Don't widen `agent_require_approval` or any mutation surface — this is read-path-only hardening.
+- ❌ Don't forget to RESTORE the operator's persisted `ollama:gemma4-agent` override after Level 3.
+
+---
+
+**One-pass confidence score: 8/10** — every catch point, schema, and precedent is
+runtime-verified with exact line anchors, and the classifier is a pure module with a mirrored
+test harness. Deductions: the stream-test async-CM mocking is fiddly (mitigated by mirroring
+test_service.py:426-480 verbatim), and the `extensions` merge must dodge the
+pydantic-plugin/strict-mypy trap (mitigated by the serialized-dict merge decision).
diff --git a/app/core/exceptions.py b/app/core/exceptions.py
index 1e6279ea..d67d9a0b 100644
--- a/app/core/exceptions.py
+++ b/app/core/exceptions.py
@@ -10,6 +10,7 @@
 
 from app.core.logging import get_logger
 from app.core.problem_details import (
+    AGENT_FALLBACK_EXHAUSTED_CODE,
     EMBEDDING_AUTH_CODE,
     ERROR_TYPES,
     ProblemDetailResponse,
@@ -40,6 +41,7 @@ def __init__(
         code: str = "INTERNAL_ERROR",
         status_code: int = 500,
         details: dict[str, Any] | None = None,
+        extensions: dict[str, Any] | None = None,
     ) -> None:
         """Initialize application error.
 
@@ -47,13 +49,19 @@ def __init__(
             message: Human-readable error message.
             code: Machine-readable error code.
             status_code: HTTP status code.
-            details: Additional error context.
+            details: Additional error context. LOG-ONLY — the exception
+                handler never copies it into the response body (it may carry
+                internals).
+            extensions: RFC 7807 extension members the handler DOES merge
+                into the problem+json response body (#335). Only put
+                client-safe, already-sanitized data here.
         """
         super().__init__(message)
         self.message = message
         self.code = code
         self.status_code = status_code
         self.details = details or {}
+        self.extensions = extensions or {}
 
     @property
     def title(self) -> str:
@@ -254,6 +262,41 @@ def __init__(
         )
 
 
+class AgentFallbackExhaustedError(ForecastLabError):
+    """502 — every model in the agent's fallback chain failed (issue #335).
+
+    Raised when the PydanticAI ``FallbackModel`` chain (or a single configured
+    model) fails with provider-API errors on every leg. Mirrors
+    :class:`EmbeddingProviderAuthError`: keeps the public status at 502 (an
+    upstream failure from the caller's perspective) and emits a
+    *machine-readable* ``AGENT_FALLBACK_EXHAUSTED`` problem ``type``/``code``
+    so clients can classify it. The per-model classified failures ride the
+    response-visible ``extensions`` channel as a ``failures`` member —
+    ``details`` stays log-only by design.
+    """
+
+    error_type_uri: str = ERROR_TYPES[AGENT_FALLBACK_EXHAUSTED_CODE]
+
+    def __init__(
+        self,
+        message: str,
+        failures: list[dict[str, Any]],
+    ) -> None:
+        """Initialize with the human summary and classified per-model legs.
+
+        Args:
+            message: Human-actionable summary (already secret-safe).
+            failures: Serialized ``ModelFailureDetail`` dicts — sanitized
+                upstream by the classifier; surfaced verbatim to the client.
+        """
+        super().__init__(
+            message=message,
+            code=AGENT_FALLBACK_EXHAUSTED_CODE,
+            status_code=502,
+            extensions={"failures": failures},
+        )
+
+
 # =============================================================================
 # Exception Handlers (RFC 7807)
 # =============================================================================
@@ -287,6 +330,7 @@ async def forecastlab_exception_handler(
         title=exc.title,
         detail=exc.message,
         error_code=exc.code,
+        extensions=exc.extensions or None,
     )
 
 
diff --git a/app/core/problem_details.py b/app/core/problem_details.py
index f8bba455..1078a1b9 100644
--- a/app/core/problem_details.py
+++ b/app/core/problem_details.py
@@ -29,6 +29,11 @@
 # demo pipeline's classifier) so the marker never drifts between the two.
 EMBEDDING_AUTH_CODE = "EMBEDDING_AUTH"
 
+# Machine-readable code for an exhausted agent model fallback chain (#335).
+# Single source of truth shared by the producer (AgentFallbackExhaustedError)
+# and any consumer classifying the 502 — mirrors EMBEDDING_AUTH_CODE.
+AGENT_FALLBACK_EXHAUSTED_CODE = "AGENT_FALLBACK_EXHAUSTED"
+
 ERROR_TYPES = {
     "NOT_FOUND": f"{ERROR_TYPE_BASE}/not-found",
     "VALIDATION_ERROR": f"{ERROR_TYPE_BASE}/validation",
@@ -43,8 +48,16 @@
     "SERVICE_UNAVAILABLE": f"{ERROR_TYPE_BASE}/service-unavailable",
     "GATEWAY_TIMEOUT": f"{ERROR_TYPE_BASE}/gateway-timeout",
     EMBEDDING_AUTH_CODE: f"{ERROR_TYPE_BASE}/embedding-auth",
+    AGENT_FALLBACK_EXHAUSTED_CODE: f"{ERROR_TYPE_BASE}/agent-fallback-exhausted",
 }
 
+# RFC 7807 extension members may never shadow the spec/base fields the
+# ProblemDetail schema already owns — reserved keys are dropped from any
+# `extensions` merge in problem_response (#335).
+_RESERVED_PROBLEM_KEYS = frozenset(
+    {"type", "title", "status", "detail", "instance", "errors", "code", "request_id"}
+)
+
 
 # =============================================================================
 # Problem Detail Schema
@@ -172,6 +185,7 @@ def problem_response(
     detail: str | None = None,
     error_code: str = "INTERNAL_ERROR",
     errors: list[dict[str, Any]] | None = None,
+    extensions: dict[str, Any] | None = None,
 ) -> ProblemDetailResponse:
     """Create a ProblemDetailResponse with proper content type.
 
@@ -181,6 +195,9 @@ def problem_response(
         detail: Detailed explanation (optional).
         error_code: Internal error code for type URI lookup.
         errors: Field-level validation errors (optional).
+        extensions: Additional RFC 7807 extension members merged into the
+            response body (optional, #335). Reserved base-field keys are
+            silently dropped — extensions can never shadow type/status/etc.
     Returns:
         JSONResponse with problem+json content type.
     """
@@ -192,9 +209,17 @@ def problem_response(
         errors=errors,
     )
 
+    # Merge on the serialized dict (not ProblemDetail(**extensions)) so
+    # arbitrary extension payloads never fight the pydantic constructor.
+    content = problem.model_dump(exclude_none=True)
+    if extensions:
+        content.update(
+            {key: value for key, value in extensions.items() if key not in _RESERVED_PROBLEM_KEYS}
+        )
+
     return ProblemDetailResponse(
         status_code=status,
-        content=problem.model_dump(exclude_none=True),
+        content=content,
     )
 
 
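Reviewer note: the reserved-key guard in `problem_response` can be sketched with plain dicts. This is a minimal illustration under stated assumptions, not the project's code: `merge_extensions` and `RESERVED_KEYS` are hypothetical stand-ins for the merge step and `_RESERVED_PROBLEM_KEYS`, and the base body is hand-written rather than produced by `ProblemDetail.model_dump`.

```python
from __future__ import annotations

from typing import Any

# Hypothetical stand-in for _RESERVED_PROBLEM_KEYS in problem_details.py.
RESERVED_KEYS = frozenset(
    {"type", "title", "status", "detail", "instance", "errors", "code", "request_id"}
)


def merge_extensions(
    base: dict[str, Any], extensions: dict[str, Any] | None
) -> dict[str, Any]:
    """Return a copy of ``base`` with only non-reserved extension members merged in."""
    merged = dict(base)
    if extensions:
        merged.update(
            {key: value for key, value in extensions.items() if key not in RESERVED_KEYS}
        )
    return merged


# A serialized base body with no "instance" member (as with exclude_none=True).
base = {
    "type": "/errors/agent-fallback-exhausted",
    "status": 502,
    "code": "AGENT_FALLBACK_EXHAUSTED",
}
body = merge_extensions(
    base,
    {
        "status": 200,  # reserved: dropped
        "code": "HACK",  # reserved: dropped
        "instance": "/spoofed",  # reserved: dropped even though base lacks it
        "failures": [{"reason": "model_not_found"}],  # extension: kept
    },
)
```

One consequence of checking against the reserved set, rather than against the keys present in the body, is that `instance` is dropped here even though the base body has no `instance` member. That appears to be the intent of the guard, but it is worth confirming against the tests.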
diff --git a/app/core/tests/test_problem_details.py b/app/core/tests/test_problem_details.py
new file mode 100644
index 00000000..9db1673d
--- /dev/null
+++ b/app/core/tests/test_problem_details.py
@@ -0,0 +1,110 @@
+"""Unit tests for RFC 7807 problem_response extension members (issue #335).
+
+The `extensions` channel lets a ForecastLabError surface client-safe data
+(e.g. classified per-model failures) in the problem+json body without going
+through the log-only `details` attribute.
+"""
+
+import json
+from typing import Any
+
+import pytest
+from fastapi import Request
+
+from app.core.exceptions import (
+    AgentFallbackExhaustedError,
+    forecastlab_exception_handler,
+)
+from app.core.problem_details import problem_response
+
+
+def _body(response: Any) -> dict[str, Any]:
+    """Decode a ProblemDetailResponse body."""
+    decoded: dict[str, Any] = json.loads(response.body)
+    return decoded
+
+
+def test_problem_response_without_extensions_unchanged() -> None:
+    """Default (no extensions) output keeps the existing shape exactly."""
+    response = problem_response(
+        status=404,
+        title="Not Found",
+        detail="Resource not found",
+        error_code="NOT_FOUND",
+    )
+
+    body = _body(response)
+    assert response.status_code == 404
+    assert body["status"] == 404
+    assert body["code"] == "NOT_FOUND"
+    assert body["type"] == "/errors/not-found"
+    assert "failures" not in body
+
+
+def test_problem_response_merges_extensions() -> None:
+    """Extension members are merged into the serialized body."""
+    response = problem_response(
+        status=502,
+        title="Agent Fallback Exhausted",
+        detail="All configured agent models failed",
+        error_code="AGENT_FALLBACK_EXHAUSTED",
+        extensions={"failures": [{"model_name": "m1", "reason": "model_not_found"}]},
+    )
+
+    body = _body(response)
+    assert body["code"] == "AGENT_FALLBACK_EXHAUSTED"
+    assert body["type"] == "/errors/agent-fallback-exhausted"
+    assert body["failures"] == [{"model_name": "m1", "reason": "model_not_found"}]
+
+
+def test_problem_response_extensions_cannot_override_reserved() -> None:
+    """Reserved base-field keys in extensions are silently dropped."""
+    response = problem_response(
+        status=502,
+        title="Agent Fallback Exhausted",
+        detail="real detail",
+        error_code="AGENT_FALLBACK_EXHAUSTED",
+        extensions={
+            "status": 200,
+            "code": "HACK",
+            "detail": "spoofed",
+            "type": "about:blank",
+            "title": "spoofed",
+            "safe_key": "kept",
+        },
+    )
+
+    body = _body(response)
+    assert response.status_code == 502
+    assert body["status"] == 502
+    assert body["code"] == "AGENT_FALLBACK_EXHAUSTED"
+    assert body["detail"] == "real detail"
+    assert body["type"] == "/errors/agent-fallback-exhausted"
+    assert body["title"] == "Agent Fallback Exhausted"
+    assert body["safe_key"] == "kept"
+
+
+@pytest.mark.asyncio
+async def test_exception_handler_propagates_extensions() -> None:
+    """The full exception → handler → problem+json path carries extensions.
+
+    Guards the wiring: ForecastLabError.extensions must reach the response
+    body via forecastlab_exception_handler's pass-through (issue #335).
+    """
+    failures = [
+        {"model_name": "m1", "status_code": 404, "reason": "model_not_found", "detail": ""},
+        {"model_name": "m2", "status_code": 429, "reason": "quota_exhausted", "detail": ""},
+    ]
+    exc = AgentFallbackExhaustedError("All configured agent models failed", failures=failures)
+    request = Request(scope={"type": "http", "method": "POST", "path": "/", "headers": []})
+
+    response = await forecastlab_exception_handler(request, exc)
+
+    body = _body(response)
+    assert response.status_code == 502
+    assert body["status"] == 502
+    assert body["code"] == "AGENT_FALLBACK_EXHAUSTED"
+    assert body["type"] == "/errors/agent-fallback-exhausted"
+    assert body["title"] == "Agent Fallback Exhausted"
+    assert body["detail"] == "All configured agent models failed"
+    assert body["failures"] == failures
diff --git a/app/features/agents/failures.py b/app/features/agents/failures.py
new file mode 100644
index 00000000..57b56803
--- /dev/null
+++ b/app/features/agents/failures.py
@@ -0,0 +1,156 @@
+"""Classify provider-API model failures into secret-safe, actionable details.
+
+When every model in the PydanticAI ``FallbackModel`` chain fails (or a single
+configured model fails with a provider error), the raw exception surface is an
+opaque one-liner (``All models from FallbackModel failed (2 sub-exceptions)``)
+and ``str(ModelHTTPError)`` embeds the provider response body verbatim — a
+secret-leak risk. This module turns that exception tree into a list of
+:class:`ModelFailureDetail` entries plus a deterministic human summary that the
+chat UI renders as-is (issue #335).
+
+Pure functions only — fully unit-testable without a DB or network.
+"""
+
+from __future__ import annotations
+
+import re
+
+from pydantic_ai.exceptions import ModelAPIError, ModelHTTPError
+from pydantic_ai.models.fallback import ResponseRejected
+
+from app.features.agents.schemas import FailureReason, ModelFailureDetail
+
+# Secret-shaped substrings scrubbed from any surfaced provider message.
+# Issue #335 hard constraint: no API keys / Bearer tokens, ever.
+_SECRET_PATTERNS: tuple[re.Pattern[str], ...] = (
+    re.compile(r"AIza[0-9A-Za-z_\-]{10,}"),  # Google API keys
+    re.compile(r"sk-[A-Za-z0-9_\-]{10,}"),  # OpenAI/Anthropic-style keys
+    re.compile(r"(?i)bearer\s+[A-Za-z0-9._\-]+"),  # Authorization bearer tokens
+    re.compile(r"(?i)(api[_-]?key|token|authorization)[=:]\s*\S+"),  # key=value credentials
+)
+
+# Cap on the surfaced per-model detail string.
+_MAX_DETAIL_LEN = 300
+
+# Placeholder model names for failures that carry none.
+_RESPONSE_REJECTED_MODEL = "(response rejected)"
+_UNKNOWN_MODEL = "(unknown model)"
+
+# Human labels for the summary string (rendered verbatim by the chat UI).
+_REASON_LABELS: dict[FailureReason, str] = {
+    "model_not_found": "model not found / invalid model name",
+    "quota_exhausted": "quota or rate limit exhausted",
+    "auth_error": "authentication/permission error",
+    "provider_unavailable": "provider unavailable",
+    "provider_error": "provider error",
+    "response_rejected": "response rejected",
+    "unknown": "unexpected failure",
+}
+
+
+def _sanitize(text: str) -> str:
+    """Scrub secret-shaped substrings, then truncate to the detail cap."""
+    for pattern in _SECRET_PATTERNS:
+        text = pattern.sub("[redacted]", text)
+    return text[:_MAX_DETAIL_LEN]
+
+
+def _provider_message(body: object | None) -> str:
+    """Extract the provider's human message from an HTTP error body.
+
+    Handles the Google/OpenAI ``{"error": {"message": ...}}`` shape; a plain
+    string passes through; anything else is stringified. Callers MUST pass the
+    result through :func:`_sanitize` before surfacing it.
+    """
+    if body is None:
+        return ""
+    if isinstance(body, dict):
+        error = body.get("error")
+        if isinstance(error, dict):
+            message = error.get("message")
+            if isinstance(message, str):
+                return message
+    if isinstance(body, str):
+        return body
+    return str(body)
+
+
+def _classify_http_status(status_code: int) -> FailureReason:
+    """Map an HTTP status to the issue #335 reason taxonomy."""
+    if status_code == 404:
+        return "model_not_found"
+    if status_code == 429:
+        return "quota_exhausted"
+    if status_code in (401, 403):
+        return "auth_error"
+    if status_code >= 500:
+        return "provider_unavailable"
+    return "provider_error"
+
+
+def classify_model_failures(exc: BaseException) -> list[ModelFailureDetail]:
+    """Flatten an exception (group) into classified per-model failures.
+
+    ``FallbackExceptionGroup.exceptions`` is a tuple and sub-groups can nest —
+    recurse into groups and classify only the leaves, preserving leg order.
+    """
+    if isinstance(exc, BaseExceptionGroup):
+        details: list[ModelFailureDetail] = []
+        for member in exc.exceptions:
+            details.extend(classify_model_failures(member))
+        return details
+    if isinstance(exc, ModelHTTPError):
+        return [
+            ModelFailureDetail(
+                model_name=exc.model_name,
+                status_code=exc.status_code,
+                reason=_classify_http_status(exc.status_code),
+                detail=_sanitize(_provider_message(exc.body)),
+            )
+        ]
+    if isinstance(exc, ResponseRejected):
+        return [
+            ModelFailureDetail(
+                model_name=_RESPONSE_REJECTED_MODEL,
+                status_code=None,
+                reason="response_rejected",
+                detail=_sanitize(str(exc)),
+            )
+        ]
+    if isinstance(exc, ModelAPIError):
+        return [
+            ModelFailureDetail(
+                model_name=exc.model_name,
+                status_code=None,
+                reason="provider_error",
+                detail=_sanitize(str(exc)),
+            )
+        ]
+    return [
+        ModelFailureDetail(
+            model_name=_UNKNOWN_MODEL,
+            status_code=None,
+            reason="unknown",
+            detail=_sanitize(str(exc)),
+        )
+    ]
+
+
+def summarize_model_failures(failures: list[ModelFailureDetail]) -> str:
+    """Build the deterministic human summary the chat UI renders verbatim.
+
+    One failure → ``The configured agent model failed — {leg}``; several →
+    ``All configured agent models failed — {leg}; {leg}; …`` where each leg is
+    ``{model_name}: {label} (HTTP {status_code})`` (HTTP part omitted when the
+    failure was not HTTP-shaped).
+    """
+    legs: list[str] = []
+    for failure in failures:
+        leg = f"{failure.model_name}: {_REASON_LABELS[failure.reason]}"
+        if failure.status_code is not None:
+            leg = f"{leg} (HTTP {failure.status_code})"
+        legs.append(leg)
+    joined = "; ".join(legs)
+    if len(failures) == 1:
+        return f"The configured agent model failed — {joined}"
+    return f"All configured agent models failed — {joined}"
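Reviewer note: the order inside `_sanitize` (scrub first, truncate second) matters, and the sketch below shows why. It is a standalone illustration that assumes one pattern and the 300-character cap copied from the module above; `scrub_then_truncate` and `truncate_then_scrub` are hypothetical helper names, and the key is a made-up value.

```python
import re

# Assumed to mirror the "sk-" pattern and the detail cap from failures.py.
SECRET = re.compile(r"sk-[A-Za-z0-9_\-]{10,}")
MAX_LEN = 300


def scrub_then_truncate(text: str) -> str:
    """Order used by _sanitize: redact on the full text, then cap the length."""
    return SECRET.sub("[redacted]", text)[:MAX_LEN]


def truncate_then_scrub(text: str) -> str:
    """Reversed order: the cap can cut a key short so the pattern no longer matches."""
    return SECRET.sub("[redacted]", text[:MAX_LEN])


# A fake key that straddles the 300-character boundary.
message = "x" * 292 + "sk-abcdefghijklmnop"

safe = scrub_then_truncate(message)
leaky = truncate_then_scrub(message)
```

With the reversed order, the truncated text ends in `sk-abcde`, which is too short for the `{10,}` quantifier, so a key prefix survives. A regression test with a secret placed across the cap would pin this ordering down.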
diff --git a/app/features/agents/schemas.py b/app/features/agents/schemas.py
index 69b74261..f6f02724 100644
--- a/app/features/agents/schemas.py
+++ b/app/features/agents/schemas.py
@@ -301,6 +301,33 @@ class CompleteEvent(BaseModel):
     tool_calls_count: int
 
 
+FailureReason = Literal[
+    "model_not_found",
+    "quota_exhausted",
+    "auth_error",
+    "provider_unavailable",
+    "provider_error",
+    "response_rejected",
+    "unknown",
+]
+
+
+class ModelFailureDetail(BaseModel):
+    """One classified per-model failure from a FallbackModel chain (issue #335).
+
+    Args:
+        model_name: Provider-prefixed model identifier that failed.
+        status_code: HTTP status from the provider, when the failure was HTTP.
+        reason: Machine-readable failure classification.
+        detail: Sanitized + truncated provider message — NEVER the raw body.
+    """
+
+    model_name: str
+    status_code: int | None = None
+    reason: FailureReason
+    detail: str = ""
+
+
 class ErrorEvent(BaseModel):
     """Error event.
 
@@ -308,11 +335,14 @@ class ErrorEvent(BaseModel):
         error: Error message.
         error_type: Type of error.
         recoverable: Whether the session can continue.
+        failures: Classified per-model failures when ``error_type`` is
+            ``fallback_exhausted`` (issue #335); ``None`` otherwise.
     """
 
     error: str
     error_type: str
     recoverable: bool = True
+    failures: list[ModelFailureDetail] | None = None
 
 
 # =============================================================================
diff --git a/app/features/agents/service.py b/app/features/agents/service.py
index 6372fd9c..ba865e9f 100644
--- a/app/features/agents/service.py
+++ b/app/features/agents/service.py
@@ -22,13 +22,15 @@
 
 import structlog
 from pydantic_ai import Agent, capture_run_messages
-from pydantic_ai.exceptions import UnexpectedModelBehavior
+from pydantic_ai.exceptions import FallbackExceptionGroup, ModelAPIError, UnexpectedModelBehavior
 from pydantic_ai.messages import ModelMessage, ModelMessagesTypeAdapter, ToolReturnPart
 from sqlalchemy import select
 from sqlalchemy.ext.asyncio import AsyncSession
 
 from app.core.config import get_settings
+from app.core.exceptions import AgentFallbackExhaustedError
 from app.features.agents.deps import AgentDeps
+from app.features.agents.failures import classify_model_failures, summarize_model_failures
 from app.features.agents.models import AgentSession, AgentType, SessionStatus
 from app.features.agents.schemas import (
     ApprovalResponse,
@@ -310,6 +312,23 @@ async def chat(
             raise TimeoutError(
                 f"Agent response timed out after {self.settings.agent_timeout_seconds} seconds"
             ) from e
+        except (FallbackExceptionGroup, ModelAPIError) as e:
+            # Every model in the fallback chain failed (or the single
+            # configured model failed) with a provider-API error before any
+            # output was produced — nothing to salvage. Classify each leg into
+            # a secret-safe detail and surface the summary as a 502
+            # problem+json via the global handler (issue #335).
+            failures = classify_model_failures(e)
+            logger.warning(
+                "agents.chat_fallback_exhausted",
+                session_id=session_id,
+                failure_count=len(failures),
+                reasons=[f.reason for f in failures],
+            )
+            raise AgentFallbackExhaustedError(
+                summarize_model_failures(failures),
+                failures=[f.model_dump(mode="json") for f in failures],
+            ) from e
         except UnexpectedModelBehavior as e:
             # The model misbehaved (e.g. a tool call exceeded its retry budget).
             # This is recoverable from the user's perspective — surface a clean
@@ -694,6 +713,33 @@ async def stream_chat(
             raise TimeoutError(
                 f"Agent response timed out after {self.settings.agent_timeout_seconds} seconds"
             ) from e
+        except (FallbackExceptionGroup, ModelAPIError) as e:
+            # Every model in the fallback chain failed (or the single
+            # configured model failed) with a provider-API error before any
+            # output was produced — nothing to salvage. Yield ONE classified,
+            # secret-safe `error` event instead of letting the raw exception
+            # reach the generic WebSocket backstop (issue #335).
+            failures = classify_model_failures(e)
+            logger.warning(
+                "agents.stream_chat_fallback_exhausted",
+                session_id=session_id,
+                failure_count=len(failures),
+                reasons=[f.reason for f in failures],
+            )
+            fallback_now = datetime.now(UTC)
+            session.last_activity = fallback_now
+            await db.flush()
+            yield StreamEvent(
+                event_type="error",
+                data={
+                    "error": summarize_model_failures(failures),
+                    "error_type": "fallback_exhausted",
+                    "recoverable": True,
+                    "failures": [f.model_dump(mode="json") for f in failures],
+                },
+                timestamp=fallback_now,
+            )
+            return
         except UnexpectedModelBehavior as e:
             # The model misbehaved (e.g. a tool call exceeded its retry budget).
             # Emit a clean, recoverable `error` event rather than letting the raw
diff --git a/app/features/agents/tests/test_failures.py b/app/features/agents/tests/test_failures.py
new file mode 100644
index 00000000..cf5a218c
--- /dev/null
+++ b/app/features/agents/tests/test_failures.py
@@ -0,0 +1,232 @@
+"""Unit tests for the model-failure classifier (issue #335).
+
+Covers the status-code classification matrix, exception-group recursion,
+secret scrubbing, detail truncation, and the deterministic human summary.
+"""
+
+import pytest
+from pydantic_ai.exceptions import FallbackExceptionGroup, ModelAPIError, ModelHTTPError
+from pydantic_ai.models.fallback import ResponseRejected
+
+from app.features.agents.failures import (
+    classify_model_failures,
+    summarize_model_failures,
+)
+from app.features.agents.schemas import ModelFailureDetail
+
+
+class TestClassifyModelFailures:
+    """Classification matrix for classify_model_failures."""
+
+    @pytest.mark.parametrize(
+        ("status_code", "expected_reason"),
+        [
+            (404, "model_not_found"),
+            (429, "quota_exhausted"),
+            (401, "auth_error"),
+            (403, "auth_error"),
+            (500, "provider_unavailable"),
+            (503, "provider_unavailable"),
+            (418, "provider_error"),
+        ],
+    )
+    def test_http_status_matrix(self, status_code: int, expected_reason: str) -> None:
+        """Each HTTP status maps to its issue #335 reason."""
+        failures = classify_model_failures(ModelHTTPError(status_code, "test:model"))
+
+        assert len(failures) == 1
+        assert failures[0].model_name == "test:model"
+        assert failures[0].status_code == status_code
+        assert failures[0].reason == expected_reason
+
+    def test_fallback_group_preserves_leg_order(self) -> None:
+        """A 404 + 429 group yields two details in model order."""
+        group = FallbackExceptionGroup(
+            "All models from FallbackModel failed",
+            [
+                ModelHTTPError(404, "google-gla:gemini-3-flash-preview"),
+                ModelHTTPError(429, "google-gla:gemini-2.5-flash"),
+            ],
+        )
+
+        failures = classify_model_failures(group)
+
+        assert len(failures) == 2
+        assert failures[0].model_name == "google-gla:gemini-3-flash-preview"
+        assert failures[0].reason == "model_not_found"
+        assert failures[1].model_name == "google-gla:gemini-2.5-flash"
+        assert failures[1].reason == "quota_exhausted"
+
+    def test_nested_group_flattens_leaves(self) -> None:
+        """Sub-groups inside the group are recursed into, not classified as legs."""
+        inner = FallbackExceptionGroup(
+            "inner",
+            [ModelHTTPError(429, "inner:model")],
+        )
+        outer = FallbackExceptionGroup(
+            "outer",
+            [ModelHTTPError(404, "outer:model"), inner],
+        )
+
+        failures = classify_model_failures(outer)
+
+        assert [f.model_name for f in failures] == ["outer:model", "inner:model"]
+        assert [f.reason for f in failures] == ["model_not_found", "quota_exhausted"]
+
+    def test_mixed_group_classifies_unknown_members(self) -> None:
+        """A group mixing known and unexpected members flattens in order,
+        classifying the unexpected member as unknown."""
+        group = FallbackExceptionGroup(
+            "All models from FallbackModel failed",
+            [
+                ModelHTTPError(404, "google-gla:gemini-3-flash-preview"),
+                RuntimeError("boom"),
+            ],
+        )
+
+        failures = classify_model_failures(group)
+
+        assert len(failures) == 2
+        assert failures[0].model_name == "google-gla:gemini-3-flash-preview"
+        assert failures[0].reason == "model_not_found"
+        assert failures[1].reason == "unknown"
+        assert failures[1].status_code is None
+        assert "boom" in failures[1].detail
+
+    def test_bare_model_api_error_is_provider_error(self) -> None:
+        """A non-HTTP ModelAPIError (connection failure) → provider_error, no status."""
+        failures = classify_model_failures(
+            ModelAPIError("ollama:gemma4-agent", "connection refused")
+        )
+
+        assert len(failures) == 1
+        assert failures[0].model_name == "ollama:gemma4-agent"
+        assert failures[0].status_code is None
+        assert failures[0].reason == "provider_error"
+
+    def test_response_rejected_member(self) -> None:
+        """A ResponseRejected group member classifies as response_rejected."""
+        group = FallbackExceptionGroup(
+            "All models from FallbackModel failed",
+            [ResponseRejected(2)],
+        )
+
+        failures = classify_model_failures(group)
+
+        assert len(failures) == 1
+        assert failures[0].reason == "response_rejected"
+        assert failures[0].status_code is None
+
+    def test_unknown_exception_is_unknown(self) -> None:
+        """A bare exception that is not a provider error classifies as unknown."""
+        failures = classify_model_failures(RuntimeError("boom"))
+
+        assert len(failures) == 1
+        assert failures[0].reason == "unknown"
+        assert failures[0].status_code is None
+        assert "boom" in failures[0].detail
+
+    @pytest.mark.parametrize(
+        "secret",
+        [
+            "AIzaFakeKey1234567890abcdef",
+            "sk-fakekey1234567890abcdef",
+            "Bearer xyz.abc-123",
+            "api_key=supersecretvalue",
+        ],
+    )
+    def test_secret_scrubbed_from_detail(self, secret: str) -> None:
+        """Secret-shaped substrings in the provider body never reach the detail."""
+        exc = ModelHTTPError(
+            429,
+            "test:model",
+            body={"error": {"message": f"quota exceeded for {secret} retry later"}},
+        )
+
+        failures = classify_model_failures(exc)
+
+        assert "[redacted]" in failures[0].detail
+        assert "AIzaFake" not in failures[0].detail
+        assert "sk-fake" not in failures[0].detail
+        assert "xyz.abc-123" not in failures[0].detail
+        assert "supersecretvalue" not in failures[0].detail
+
+    def test_detail_truncated_to_cap(self) -> None:
+        """A 1000-char provider message is truncated to the 300-char cap."""
+        exc = ModelHTTPError(
+            500,
+            "test:model",
+            body={"error": {"message": "x" * 1000}},
+        )
+
+        failures = classify_model_failures(exc)
+
+        assert len(failures[0].detail) <= 300
+
+    def test_provider_message_string_body(self) -> None:
+        """A plain-string body passes through (sanitized)."""
+        failures = classify_model_failures(ModelHTTPError(404, "test:model", body="not found"))
+
+        assert failures[0].detail == "not found"
+
+    def test_provider_message_none_body(self) -> None:
+        """A missing body yields an empty detail."""
+        failures = classify_model_failures(ModelHTTPError(404, "test:model"))
+
+        assert failures[0].detail == ""
+
+
+class TestSummarizeModelFailures:
+    """Deterministic summary shapes (rendered verbatim by the chat UI)."""
+
+    def test_single_leg_shape(self) -> None:
+        failures = [
+            ModelFailureDetail(
+                model_name="anthropic:claude-test",
+                status_code=401,
+                reason="auth_error",
+            )
+        ]
+
+        summary = summarize_model_failures(failures)
+
+        assert summary == (
+            "The configured agent model failed — "
+            "anthropic:claude-test: authentication/permission error (HTTP 401)"
+        )
+
+    def test_two_leg_shape(self) -> None:
+        failures = [
+            ModelFailureDetail(
+                model_name="google-gla:gemini-3-flash-preview",
+                status_code=404,
+                reason="model_not_found",
+            ),
+            ModelFailureDetail(
+                model_name="google-gla:gemini-2.5-flash",
+                status_code=429,
+                reason="quota_exhausted",
+            ),
+        ]
+
+        summary = summarize_model_failures(failures)
+
+        assert summary == (
+            "All configured agent models failed — "
+            "google-gla:gemini-3-flash-preview: model not found / invalid model name (HTTP 404); "
+            "google-gla:gemini-2.5-flash: quota or rate limit exhausted (HTTP 429)"
+        )
+
+    def test_non_http_leg_omits_status(self) -> None:
+        failures = [
+            ModelFailureDetail(
+                model_name="ollama:gemma4-agent",
+                status_code=None,
+                reason="provider_error",
+            )
+        ]
+
+        summary = summarize_model_failures(failures)
+
+        assert "(HTTP" not in summary
+        assert "ollama:gemma4-agent: provider error" in summary
diff --git a/app/features/agents/tests/test_routes.py b/app/features/agents/tests/test_routes.py
index d53bb914..12ff4711 100644
--- a/app/features/agents/tests/test_routes.py
+++ b/app/features/agents/tests/test_routes.py
@@ -7,6 +7,7 @@
 
 import pytest
 from httpx import AsyncClient
+from pydantic_ai.exceptions import FallbackExceptionGroup, ModelHTTPError
 
 from app.features.agents.schemas import ExperimentReport
 
@@ -162,6 +163,56 @@ async def test_chat_session_not_found(self, client: AsyncClient) -> None:
 
         assert response.status_code == 404
 
+    @pytest.mark.asyncio
+    async def test_chat_fallback_exhausted_returns_502_problem_json(
+        self, client: AsyncClient
+    ) -> None:
+        """Both fallback legs failing → 502 problem+json with classified
+        per-model failures (#335, umbrella #380 route-level criterion)."""
+        with patch("app.features.agents.agents.experiment.get_experiment_agent") as mock_get:
+            group = FallbackExceptionGroup(
+                "All models from FallbackModel failed",
+                [
+                    ModelHTTPError(
+                        404,
+                        "google-gla:gemini-3-flash-preview",
+                        body={"error": {"message": "models/gemini-3-flash-preview is not found"}},
+                    ),
+                    ModelHTTPError(
+                        429,
+                        "google-gla:gemini-2.5-flash",
+                        body={"error": {"message": "RESOURCE_EXHAUSTED key AIzaFakeKey123456789"}},
+                    ),
+                ],
+            )
+            mock_agent = MagicMock()
+            mock_agent.run = AsyncMock(side_effect=group)
+            mock_get.return_value = mock_agent
+
+            create_response = await client.post(
+                "/agents/sessions",
+                json={"agent_type": "experiment"},
+            )
+            session_id = create_response.json()["session_id"]
+
+            response = await client.post(
+                f"/agents/sessions/{session_id}/chat",
+                json={"message": "hello"},
+            )
+
+        assert response.status_code == 502
+        assert response.headers["content-type"].startswith("application/problem+json")
+        body = response.json()
+        assert body["code"] == "AGENT_FALLBACK_EXHAUSTED"
+        assert body["type"].endswith("/errors/agent-fallback-exhausted")
+        assert len(body["failures"]) == 2
+        assert body["failures"][0]["reason"] == "model_not_found"
+        assert body["failures"][1]["reason"] == "quota_exhausted"
+        assert "request_id" in body
+        # The opaque group string and secrets must never reach the client.
+        assert "sub-exceptions" not in body["detail"]
+        assert "AIza" not in response.text
+
 
 @pytest.mark.integration
 class TestApprovalRoutes:
diff --git a/app/features/agents/tests/test_schemas.py b/app/features/agents/tests/test_schemas.py
index 7294a294..9c50ea31 100644
--- a/app/features/agents/tests/test_schemas.py
+++ b/app/features/agents/tests/test_schemas.py
@@ -12,8 +12,10 @@
     ChatMessage,
     ChatRequest,
     ChatResponse,
+    ErrorEvent,
     ExperimentPlan,
     ExperimentReport,
+    ModelFailureDetail,
     PendingAction,
     RAGAnswer,
     SessionCreateRequest,
@@ -304,6 +306,38 @@ def test_error_event(self) -> None:
         assert event.event_type == "error"
 
 
+class TestErrorEvent:
+    """Tests for the ErrorEvent schema (failures added by issue #335)."""
+
+    def test_non_fallback_error_has_no_failures(self) -> None:
+        """Non-fallback error types must keep failures None in serialized output."""
+        event = ErrorEvent(
+            error="The assistant produced an invalid tool call.",
+            error_type="model_behavior_error",
+        )
+
+        serialized = event.model_dump(mode="json")
+        assert serialized.get("failures") is None
+
+    def test_fallback_exhausted_carries_failures(self) -> None:
+        """fallback_exhausted events carry the classified per-model failures."""
+        event = ErrorEvent(
+            error="All configured agent models failed",
+            error_type="fallback_exhausted",
+            failures=[
+                ModelFailureDetail(
+                    model_name="google-gla:gemini-3-flash-preview",
+                    status_code=404,
+                    reason="model_not_found",
+                )
+            ],
+        )
+
+        serialized = event.model_dump(mode="json")
+        assert serialized["failures"][0]["reason"] == "model_not_found"
+        assert serialized["failures"][0]["status_code"] == 404
+
+
 class TestExperimentPlan:
     """Tests for ExperimentPlan schema."""
 
diff --git a/app/features/agents/tests/test_service.py b/app/features/agents/tests/test_service.py
index 759e0284..612783cb 100644
--- a/app/features/agents/tests/test_service.py
+++ b/app/features/agents/tests/test_service.py
@@ -8,7 +8,11 @@
 
 import pytest
 from pydantic_ai import Agent
-from pydantic_ai.exceptions import UnexpectedModelBehavior
+from pydantic_ai.exceptions import (
+    FallbackExceptionGroup,
+    ModelHTTPError,
+    UnexpectedModelBehavior,
+)
 from pydantic_ai.messages import (
     ModelMessage,
     ModelRequest,
@@ -18,6 +22,7 @@
     UserPromptPart,
 )
 
+from app.core.exceptions import AgentFallbackExhaustedError
 from app.features.agents.deps import AgentDeps
 from app.features.agents.models import AgentSession, AgentType, SessionStatus
 from app.features.agents.schemas import ExperimentReport
@@ -422,6 +427,58 @@ async def test_chat_runs_tools_sequentially(
 
         mock_mode.assert_called_once_with("sequential")
 
+    @pytest.mark.asyncio
+    async def test_chat_fallback_exhausted_raises_classified_error(
+        self,
+        sample_active_session: AgentSession,
+    ) -> None:
+        """A FallbackExceptionGroup from agent.run must raise the classified
+        502 AgentFallbackExhaustedError, not bubble the raw group (#335)."""
+        service = AgentService()
+        mock_db = AsyncMock()
+
+        mock_result = MagicMock()
+        mock_result.scalar_one_or_none.return_value = sample_active_session
+        mock_db.execute.return_value = mock_result
+
+        group = FallbackExceptionGroup(
+            "All models from FallbackModel failed",
+            [
+                ModelHTTPError(
+                    404,
+                    "google-gla:gemini-3-flash-preview",
+                    body={"error": {"message": "models/gemini-3-flash-preview is not found"}},
+                ),
+                ModelHTTPError(
+                    429,
+                    "google-gla:gemini-2.5-flash",
+                    body={"error": {"message": "RESOURCE_EXHAUSTED key AIzaFakeKey123456789"}},
+                ),
+            ],
+        )
+        mock_agent = MagicMock()
+        mock_agent.run = AsyncMock(side_effect=group)
+
+        with patch.object(service, "_get_agent", return_value=mock_agent):
+            with pytest.raises(AgentFallbackExhaustedError) as exc_info:
+                await service.chat(
+                    db=mock_db,
+                    session_id=sample_active_session.session_id,
+                    message="Hello",
+                )
+
+        exc = exc_info.value
+        assert exc.status_code == 502
+        assert exc.code == "AGENT_FALLBACK_EXHAUSTED"
+        failures = exc.extensions["failures"]
+        assert len(failures) == 2
+        assert failures[0]["reason"] == "model_not_found"
+        assert failures[1]["reason"] == "quota_exhausted"
+        assert "sub-exceptions" not in exc.message
+        # Issue #335 hard constraint: no secret-like material anywhere.
+        serialized = json.dumps({"message": exc.message, "extensions": exc.extensions})
+        assert "AIza" not in serialized
+
 
 class TestAgentServiceStreamChat:
     """Tests for streaming chat functionality."""
@@ -477,6 +534,127 @@ async def __aexit__(self, *exc: object) -> bool:
         assert events[0].data["recoverable"] is True
         assert events[0].data["error_type"] == "model_behavior_error"
         assert "exceeded max retries" not in events[0].data["error"]
+        # failures is exclusive to fallback_exhausted events (#335).
+        assert "failures" not in events[0].data
+
+    @pytest.mark.asyncio
+    async def test_stream_chat_fallback_exhausted_yields_classified_error(
+        self,
+        sample_active_session: AgentSession,
+        monkeypatch: pytest.MonkeyPatch,
+    ) -> None:
+        """All fallback legs failing must yield ONE classified `error` event
+        with per-model failures — never the raw group string (#335)."""
+        service = AgentService()
+        # Pin a streaming-capable (cloud) provider so this exercises the
+        # run_stream path regardless of the local .env (#342).
+        monkeypatch.setattr(service.settings, "agent_default_model", "anthropic:claude-test")
+        mock_db = AsyncMock()
+
+        mock_result = MagicMock()
+        mock_result.scalar_one_or_none.return_value = sample_active_session
+        mock_db.execute.return_value = mock_result
+
+        class _RaisingStream:
+            """Async context manager that fails on entry like an exhausted chain."""
+
+            async def __aenter__(self) -> Any:
+                raise FallbackExceptionGroup(
+                    "All models from FallbackModel failed",
+                    [
+                        ModelHTTPError(
+                            404,
+                            "google-gla:gemini-3-flash-preview",
+                            body={
+                                "error": {"message": "models/gemini-3-flash-preview is not found"}
+                            },
+                        ),
+                        ModelHTTPError(
+                            429,
+                            "google-gla:gemini-2.5-flash",
+                            body={
+                                "error": {"message": "RESOURCE_EXHAUSTED key AIzaFakeKey123456789"}
+                            },
+                        ),
+                    ],
+                )
+
+            async def __aexit__(self, *exc: object) -> bool:
+                return False
+
+        mock_agent = MagicMock()
+        mock_agent.run_stream = MagicMock(return_value=_RaisingStream())
+
+        with patch.object(service, "_get_agent", return_value=mock_agent):
+            events = [
+                event
+                async for event in service.stream_chat(
+                    db=mock_db,
+                    session_id=sample_active_session.session_id,
+                    message="Hello",
+                )
+            ]
+
+        assert len(events) == 1
+        assert events[0].event_type == "error"
+        assert events[0].data["error_type"] == "fallback_exhausted"
+        assert events[0].data["recoverable"] is True
+        failures = events[0].data["failures"]
+        assert len(failures) == 2
+        assert failures[0]["reason"] == "model_not_found"
+        assert failures[1]["reason"] == "quota_exhausted"
+        assert "google-gla:gemini-3-flash-preview" in events[0].data["error"]
+        assert "google-gla:gemini-2.5-flash" in events[0].data["error"]
+        # The opaque group string must never reach the client.
+        assert "sub-exceptions" not in events[0].data["error"]
+        # Issue #335 hard constraint: no secret-like material anywhere.
+        assert "AIza" not in json.dumps(events[0].model_dump(mode="json"))
+
+    @pytest.mark.asyncio
+    async def test_stream_chat_bare_model_api_error_classified(
+        self,
+        sample_active_session: AgentSession,
+        monkeypatch: pytest.MonkeyPatch,
+    ) -> None:
+        """A bare ModelAPIError (single-model config, no fallback wired) gets the
+        same classified treatment, surfaced as a 1-element failures list (#335)."""
+        service = AgentService()
+        monkeypatch.setattr(service.settings, "agent_default_model", "anthropic:claude-test")
+        mock_db = AsyncMock()
+
+        mock_result = MagicMock()
+        mock_result.scalar_one_or_none.return_value = sample_active_session
+        mock_db.execute.return_value = mock_result
+
+        class _RaisingStream:
+            """Async context manager that fails on entry like a provider 401."""
+
+            async def __aenter__(self) -> Any:
+                raise ModelHTTPError(401, "anthropic:claude-test")
+
+            async def __aexit__(self, *exc: object) -> bool:
+                return False
+
+        mock_agent = MagicMock()
+        mock_agent.run_stream = MagicMock(return_value=_RaisingStream())
+
+        with patch.object(service, "_get_agent", return_value=mock_agent):
+            events = [
+                event
+                async for event in service.stream_chat(
+                    db=mock_db,
+                    session_id=sample_active_session.session_id,
+                    message="Hello",
+                )
+            ]
+
+        assert len(events) == 1
+        assert events[0].event_type == "error"
+        assert events[0].data["error_type"] == "fallback_exhausted"
+        failures = events[0].data["failures"]
+        assert len(failures) == 1
+        assert failures[0]["reason"] == "auth_error"
+        assert failures[0]["model_name"] == "anthropic:claude-test"
 
     @pytest.mark.asyncio
     async def test_chat_surfaces_pending_action_on_model_misbehavior(
diff --git a/docs/_base/API_CONTRACTS.md b/docs/_base/API_CONTRACTS.md
index 232b7da3..27d75ea1 100644
--- a/docs/_base/API_CONTRACTS.md
+++ b/docs/_base/API_CONTRACTS.md
@@ -52,7 +52,7 @@ All endpoints serve JSON; error responses use `application/problem+json` (RFC 78
 | rag | DELETE | `/rag/sources/{source_id}` | Delete source + cascaded chunks |
 | agents | POST | `/agents/sessions` | Create session (`agent_type`: `experiment` or `rag_assistant`) |
 | agents | GET | `/agents/sessions/{session_id}` | Status + message history (Postgres JSONB) |
-| agents | POST | `/agents/sessions/{session_id}/chat` | Send user message; returns full response |
+| agents | POST | `/agents/sessions/{session_id}/chat` | Send user message; returns full response. **#335** — when every model in the agent's fallback chain fails with a provider error, returns **502** `application/problem+json` with `code="AGENT_FALLBACK_EXHAUSTED"`, `type=/errors/agent-fallback-exhausted`, and an additive `failures: [{model_name, status_code, reason, detail}]` extension member (secret-scrubbed, 300-char-capped details) |
 | agents | POST | `/agents/sessions/{session_id}/approve` | Approve/reject a pending tool call (HITL gate) |
 | agents | DELETE | `/agents/sessions/{session_id}` | Close session |
 | agents | WS | `/agents/stream` | Token-by-token streaming + tool-call events |
@@ -77,7 +77,7 @@ Verified against `app/features/agents/websocket.py` and `app/features/agents/sch
   - `tool_call_end` — `data: {"tool_name": str, "tool_call_id": str, "result": Any, "duration_ms": float}` (`ToolCallEndEvent`)
   - `approval_required` — emitted when a tool in `agent_require_approval` is pending; the chat REST `/agents/sessions/{id}/approve` endpoint releases it
   - `complete` — `data: {"message": str, "tokens_used": int, "tool_calls_count": int}` (`CompleteEvent`)
-  - `error` — `data: {"error": str, "error_type": str, "recoverable": bool}` (`ErrorEvent`). On `recoverable: false` (e.g., `session_not_found`, `session_expired`), the client should close.
+  - `error` — `data: {"error": str, "error_type": str, "recoverable": bool}` (`ErrorEvent`). On `recoverable: false` (e.g., `session_not_found`, `session_expired`), the client should close. **#335** — when every model in the agent's fallback chain fails with a provider error, the event carries `error_type="fallback_exhausted"`, `recoverable=true`, a human-actionable per-leg summary in `error`, and an additive, optional `failures: [{model_name, status_code: int|null, reason, detail}]` key (`reason` ∈ `model_not_found` / `quota_exhausted` / `auth_error` / `provider_unavailable` / `provider_error` / `response_rejected` / `unknown`; `detail` is secret-scrubbed and 300-char-capped).
 
 ## WebSocket Events (`/demo/stream`)