fix(agents,api): surface fallback model failures with classified details (#335)#383
Conversation
Reviewer's GuideImplements classified, secret-scrubbed reporting of exhausted agent model fallbacks across REST and WebSocket surfaces by introducing a model-failure classifier, extending agent error schemas and core RFC 7807 plumbing, and wiring AgentService to surface structured per-model failures via a new AgentFallbackExhaustedError and enriched error events. Sequence diagram for REST chat fallback exhaustion handlingsequenceDiagram
participant Client
participant AgentsREST
participant AgentService
participant PydanticAIAgent
participant FailuresClassifier
participant ExceptionHandler
participant ProblemDetails
Client->>AgentsREST: POST /agents/sessions/{id}/chat
AgentsREST->>AgentService: chat(session_id, message)
AgentService->>PydanticAIAgent: run(...)
PydanticAIAgent-->>AgentService: FallbackExceptionGroup / ModelAPIError
AgentService->>FailuresClassifier: classify_model_failures(exception)
FailuresClassifier-->>AgentService: list[ModelFailureDetail]
AgentService->>AgentService: summarize_model_failures(failures)
AgentService->>ExceptionHandler: raise AgentFallbackExhaustedError(message, failures)
ExceptionHandler->>ProblemDetails: problem_response(..., error_code=AGENT_FALLBACK_EXHAUSTED, extensions={failures})
ProblemDetails-->>Client: 502 application/problem+json
Sequence diagram for WebSocket stream fallback exhaustion handlingsequenceDiagram
participant Client
participant AgentsWS
participant AgentService
participant PydanticAIAgent
participant FailuresClassifier
Client->>AgentsWS: CONNECT /agents/stream
Client->>AgentsWS: send chat message
AgentsWS->>AgentService: stream_chat(session_id, message)
AgentService->>PydanticAIAgent: run_stream(...)
PydanticAIAgent-->>AgentService: FallbackExceptionGroup / ModelAPIError
AgentService->>FailuresClassifier: classify_model_failures(exception)
FailuresClassifier-->>AgentService: list[ModelFailureDetail]
AgentService->>AgentService: summarize_model_failures(failures)
AgentService-->>AgentsWS: yield StreamEvent(event_type=error, data={error_type=fallback_exhausted, failures=[...]})
AgentsWS-->>Client: error event with fallback_exhausted and failures
File-Level Changes
Possibly linked issues
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Hey - I've found 3 issues
Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments
### Comment 1
<location path="app/features/agents/tests/test_service.py" line_range="612" />
<code_context>
+ assert "AIza" not in json.dumps(events[0].model_dump(mode="json"))
+
+ @pytest.mark.asyncio
+ async def test_stream_chat_bare_model_api_error_classified(
+ self,
+ sample_active_session: AgentSession,
</code_context>
<issue_to_address>
**suggestion (testing):** Add an assertion that non-fallback error events do not populate the failures field
Since `ErrorEvent.failures` is meant to be set only when `error_type == "fallback_exhausted"`, we should also assert the inverse: for other error types, `failures` stays `None`. Please add a test (here or in a schema-focused test) that builds an `ErrorEvent` with `error_type="model_behavior_error"` (or similar) and verifies that `failures` is `None` in the serialized output, to protect against regressions in other error paths.
Suggested implementation:
```python
assert "google-gla:gemini-3-flash-preview" in events[0].data["error"]
assert "google-gla:gemini-2.5-flash" in events[0].data["error"]
# The opaque group string must never reach the client.
assert "sub-exceptions" not in events[0].data["error"]
# Issue #335 hard constraint: no secret-like material anywhere.
assert "AIza" not in json.dumps(events[0].model_dump(mode="json"))
def test_error_event_non_fallback_has_no_failures(self) -> None:
"""Non-fallback ErrorEvents must not populate the failures field."""
event = ErrorEvent(
type="error",
error_type="model_behavior_error",
error="unexpected model behavior",
failures=None,
)
serialized = event.model_dump(mode="json")
# For non-fallback errors, failures must remain unset/None.
assert serialized.get("failures") is None
@pytest.mark.asyncio
```
If `ErrorEvent` is not already imported in this test module, add:
`from app.features.agents.schemas import ErrorEvent` (or the correct import path for `ErrorEvent`) to the imports at the top of `app/features/agents/tests/test_service.py`.
</issue_to_address>
### Comment 2
<location path="app/features/agents/tests/test_failures.py" line_range="18" />
<code_context>
+from app.features.agents.schemas import ModelFailureDetail
+
+
+class TestClassifyModelFailures:
+ """Classification matrix for classify_model_failures."""
+
</code_context>
<issue_to_address>
**suggestion (testing):** Consider adding a mixed-group test that includes unexpected exception types as members
Current tests cover HTTP errors, nested `FallbackExceptionGroup`s, bare `ModelAPIError`, `ResponseRejected`, and a top-level unknown exception. To better exercise the recursion in `classify_model_failures`, please add a case where a `FallbackExceptionGroup` contains both known and unknown exceptions (e.g. `[ModelHTTPError(...), RuntimeError("boom")]`), and assert that it is flattened, order is preserved, and the unexpected member produces a `ModelFailureDetail` with `reason == "unknown"`.
</issue_to_address>
### Comment 3
<location path="app/core/tests/test_problem_details.py" line_range="37" />
<code_context>
+ assert "failures" not in body
+
+
+def test_problem_response_merges_extensions() -> None:
+ """Extension members are merged into the serialized body."""
+ response = problem_response(
</code_context>
<issue_to_address>
**suggestion (testing):** Add a test that exercises problem_response via a ForecastLabError with extensions
Since `ForecastLabError` now carries an `extensions` dict and the handler passes it through to `problem_response`, consider adding a test that raises a minimal `ForecastLabError` (or `AgentFallbackExhaustedError`) with non-empty `extensions`, runs it through `forecastlab_exception_handler`, and asserts that the response body contains those extension fields and the expected `type`/`code`. This would exercise the full path from exception to problem+json and help catch future wiring regressions.
Suggested implementation:
```python
body = _body(response)
assert response.status_code == 404
assert body["status"] == 404
assert body["code"] == "NOT_FOUND"
assert body["type"] == "/errors/not-found"
assert "failures" not in body
status=404,
title="Not Found",
detail="Resource not found",
error_code="NOT_FOUND",
)
body = _body(response)
assert response.status_code == 404
assert body["status"] == 404
assert body["code"] == "NOT_FOUND"
async def test_forecastlab_exception_handler_includes_extensions() -> None:
"""ForecastLabError extensions are propagated through the exception handler."""
# Arrange
request = Request(scope={"type": "http"})
exc = ForecastLabError(
status_code=400,
title="Bad Request",
detail="Something went wrong",
error_code="BAD_REQUEST",
extensions={"foo": "bar", "trace_id": "abc123"},
)
# Act
response = await forecastlab_exception_handler(request, exc)
# Assert
body = _body(response)
assert response.status_code == 400
assert body["status"] == 400
assert body["code"] == "BAD_REQUEST"
assert body["title"] == "Bad Request"
assert body["detail"] == "Something went wrong"
# Extensions should be merged into the serialized problem+json body
assert body["foo"] == "bar"
assert body["trace_id"] == "abc123"
```
To fully wire this up you will likely need to:
1. Add the necessary imports at the top of `app/core/tests/test_problem_details.py`, for example:
- `from starlette.requests import Request`
- `from app.core.exceptions import ForecastLabError`
- `from app.core.exception_handlers import forecastlab_exception_handler`
Adjust module paths to match your actual project layout.
2. Ensure the async test is executed correctly:
- If you use `pytest`, either:
- Mark the test (or module) with `@pytest.mark.anyio` / `@pytest.mark.asyncio`, or
- Use your existing async test configuration.
- If `forecastlab_exception_handler` is synchronous in your codebase, remove `async` from the test definition and the `await` before calling the handler.
3. If `ForecastLabError` uses different parameter names (e.g. `status` instead of `status_code`, `error_code` vs `code`, or a different way to pass `extensions`), adjust the constructor call in the test accordingly so that:
- The `status_code` of the HTTP response is 400.
- The JSON body includes `"code": "BAD_REQUEST"` and the other fields.
4. If your handler derives the `"type"` field (e.g. `/errors/bad-request`) and you want to assert it as well, add:
- `assert body["type"] == "/errors/bad-request"`
once you confirm the exact value in your implementation.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| assert "AIza" not in json.dumps(events[0].model_dump(mode="json")) | ||
|
|
||
| @pytest.mark.asyncio | ||
| async def test_stream_chat_bare_model_api_error_classified( |
There was a problem hiding this comment.
suggestion (testing): Add an assertion that non-fallback error events do not populate the failures field
Since ErrorEvent.failures is meant to be set only when error_type == "fallback_exhausted", we should also assert the inverse: for other error types, failures stays None. Please add a test (here or in a schema-focused test) that builds an ErrorEvent with error_type="model_behavior_error" (or similar) and verifies that failures is None in the serialized output, to protect against regressions in other error paths.
Suggested implementation:
assert "google-gla:gemini-3-flash-preview" in events[0].data["error"]
assert "google-gla:gemini-2.5-flash" in events[0].data["error"]
# The opaque group string must never reach the client.
assert "sub-exceptions" not in events[0].data["error"]
# Issue #335 hard constraint: no secret-like material anywhere.
assert "AIza" not in json.dumps(events[0].model_dump(mode="json"))
def test_error_event_non_fallback_has_no_failures(self) -> None:
"""Non-fallback ErrorEvents must not populate the failures field."""
event = ErrorEvent(
type="error",
error_type="model_behavior_error",
error="unexpected model behavior",
failures=None,
)
serialized = event.model_dump(mode="json")
# For non-fallback errors, failures must remain unset/None.
assert serialized.get("failures") is None
@pytest.mark.asyncioIf ErrorEvent is not already imported in this test module, add:
from app.features.agents.schemas import ErrorEvent (or the correct import path for ErrorEvent) to the imports at the top of app/features/agents/tests/test_service.py.
| from app.features.agents.schemas import ModelFailureDetail | ||
|
|
||
|
|
||
| class TestClassifyModelFailures: |
There was a problem hiding this comment.
suggestion (testing): Consider adding a mixed-group test that includes unexpected exception types as members
Current tests cover HTTP errors, nested FallbackExceptionGroups, bare ModelAPIError, ResponseRejected, and a top-level unknown exception. To better exercise the recursion in classify_model_failures, please add a case where a FallbackExceptionGroup contains both known and unknown exceptions (e.g. [ModelHTTPError(...), RuntimeError("boom")]), and assert that it is flattened, order is preserved, and the unexpected member produces a ModelFailureDetail with reason == "unknown".
| assert "failures" not in body | ||
|
|
||
|
|
||
| def test_problem_response_merges_extensions() -> None: |
There was a problem hiding this comment.
suggestion (testing): Add a test that exercises problem_response via a ForecastLabError with extensions
Since ForecastLabError now carries an extensions dict and the handler passes it through to problem_response, consider adding a test that raises a minimal ForecastLabError (or AgentFallbackExhaustedError) with non-empty extensions, runs it through forecastlab_exception_handler, and asserts that the response body contains those extension fields and the expected type/code. This would exercise the full path from exception to problem+json and help catch future wiring regressions.
Suggested implementation:
body = _body(response)
assert response.status_code == 404
assert body["status"] == 404
assert body["code"] == "NOT_FOUND"
assert body["type"] == "/errors/not-found"
assert "failures" not in body
status=404,
title="Not Found",
detail="Resource not found",
error_code="NOT_FOUND",
)
body = _body(response)
assert response.status_code == 404
assert body["status"] == 404
assert body["code"] == "NOT_FOUND"
async def test_forecastlab_exception_handler_includes_extensions() -> None:
"""ForecastLabError extensions are propagated through the exception handler."""
# Arrange
request = Request(scope={"type": "http"})
exc = ForecastLabError(
status_code=400,
title="Bad Request",
detail="Something went wrong",
error_code="BAD_REQUEST",
extensions={"foo": "bar", "trace_id": "abc123"},
)
# Act
response = await forecastlab_exception_handler(request, exc)
# Assert
body = _body(response)
assert response.status_code == 400
assert body["status"] == 400
assert body["code"] == "BAD_REQUEST"
assert body["title"] == "Bad Request"
assert body["detail"] == "Something went wrong"
# Extensions should be merged into the serialized problem+json body
assert body["foo"] == "bar"
assert body["trace_id"] == "abc123"To fully wire this up you will likely need to:
- Add the necessary imports at the top of
app/core/tests/test_problem_details.py, for example:from starlette.requests import Requestfrom app.core.exceptions import ForecastLabErrorfrom app.core.exception_handlers import forecastlab_exception_handler
Adjust module paths to match your actual project layout.
- Ensure the async test is executed correctly:
- If you use
pytest, either:- Mark the test (or module) with
@pytest.mark.anyio/@pytest.mark.asyncio, or - Use your existing async test configuration.
- Mark the test (or module) with
- If
forecastlab_exception_handleris synchronous in your codebase, removeasyncfrom the test definition and theawaitbefore calling the handler.
- If you use
- If
ForecastLabErroruses different parameter names (e.g.statusinstead ofstatus_code,error_codevscode, or a different way to passextensions), adjust the constructor call in the test accordingly so that:- The
status_codeof the HTTP response is 400. - The JSON body includes
"code": "BAD_REQUEST"and the other fields.
- The
- If your handler derives the
"type"field (e.g./errors/bad-request) and you want to assert it as well, add:assert body["type"] == "/errors/bad-request"
once you confirm the exact value in your implementation.
Summary
Closes #335 · Reliability E2 of umbrella #380 (after Foundation E1 #334 / PR #382).
When every model in the PydanticAI
FallbackModelchain fails (or a single configured model fails with a provider error), the client now receives a classified, secret-safe summary of each per-model failure instead of the opaqueStream error: All models from FallbackModel failed (2 sub-exceptions):/agents/stream— oneerrorStreamEvent witherror_type="fallback_exhausted", a human-actionable per-leg summary inerror(rendered verbatim by the chat UI — zero frontend changes), and an additivefailures: [{model_name, status_code, reason, detail}]list.POST /agents/sessions/{id}/chat— 502application/problem+jsonwithcode="AGENT_FALLBACK_EXHAUSTED",type=/errors/agent-fallback-exhausted, and thefailuresextension member.Implementation
app/features/agents/failures.py(new) — pure classifier: status matrix (404→model_not_found, 429→quota_exhausted, 401/403→auth_error, 5xx→provider_unavailable, other HTTP→provider_error,ResponseRejected→response_rejected, else→unknown), nested-ExceptionGrouprecursion, secret scrubbing (AIza…/sk-…/Bearer …/api_key=…→[redacted]), 300-char detail cap, deterministic human summary.app/features/agents/schemas.py— newModelFailureDetail+FailureReason; additiveErrorEvent.failures.app/features/agents/service.py— newexcept (FallbackExceptionGroup, ModelAPIError)arms inchat()andstream_chat(), betweenTimeoutErrorandUnexpectedModelBehavior; misbehavior/salvage paths untouched; no salvage in the new arms (nothing ran).app/core/problem_details.py—AGENT_FALLBACK_EXHAUSTEDcode + type URI; additiveextensionsparam onproblem_responsemerged on the serialized dict, guarded by a reserved-key frozenset.app/core/exceptions.py— additiveextensionskwarg onForecastLabError(response-visible channel;detailsstays log-only), newAgentFallbackExhaustedError(502) mirroringEmbeddingProviderAuthError, handler pass-through.docs/_base/API_CONTRACTS.md— chat row + WS error bullet updated additively.websocket.pyuntouched — generic handler remains the backstop.Validation
ruff check+ruff format --checkcleanmypy app/+pyright app/(both strict) cleanpytest -m "not integration"— 1916 passed (incl. 20 new tests: classifier matrix, nested groups, secret scrub, truncation, summary shapes, stream 404+429 event, bare-401 event, chat raise, extensions merge + reserved-key guard)pytest -m integration app/features/agents/tests/test_routes.py— 12 passed (incl. new 502 problem+json route test asserting both classified legs)POST /chatreturned 502 problem+json with two classifiedauth_errorlegs and request_id; operator config snapshotted and restored after🤖 Generated with Claude Code
Summary by Sourcery
Classify and surface detailed, secret-safe information about exhausted agent model fallbacks across REST and WebSocket surfaces, using structured failure metadata and RFC 7807 extensions.
New Features:
Enhancements:
Documentation:
Tests:
Chores: