feat: structured AgentError + classified error UX with Retry #693

Merged
blove merged 13 commits into main from claude/error-ux-agent-error
Jun 18, 2026

@blove blove commented Jun 18, 2026


Summary

Replaces the cryptic "HTTP 500:" the audit flagged with a structured, classified error and a retry affordance, across @threadplane/chat + both adapters.

Previously, a backend or stream failure was caught raw and shown verbatim (String(error)): no classification, no retry UI, and, in LangGraph, a user stop surfaced as an error.

Changes

  • AgentError extends Error (@threadplane/chat): kind (connection/auth/server/interrupted/aborted), retryable, status?, cause. Stays an Error subclass, so existing .message/instanceof reads keep working. + AGENT_ERROR_MESSAGES default copy.
  • toAgentError(raw) — idempotent classifier. Structured .status/.cause.status is authoritative; network markers → connection; only HTTP-shaped message tokens (HTTP 500, status: 503) yield a status (a bare gpt-500 no longer misfires); abort short-circuits; fallback is server + retryable.
  • ContractAgent.error: Signal<AgentError | undefined> (was unknown); new neutral retry() (re-run last input; no-op while loading; clears error).
  • LangGraph adapter — normalizes every error$ write via toAgentError; user-stop → idle (no error); a non-user mid-stream death → interrupted, a non-user pre-stream failure → connection; retry() → clear + resubmitLast.
  • AG-UI adapter — normalizes failures; abort→idle preserved; retry() re-runs the last input via an extracted runCurrentMessages() helper without duplicating the user message, re-emitting the client-tools catalog.
  • ChatErrorComponent — renders the legible AgentError.message and a Retry button only when retryable, wired to agent.retry().
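A minimal sketch of the AgentError shape and the classifier order described above. It is not the code shipped in @threadplane/chat. The regexes, the helper names (isRecord, structuredStatus, kindForStatus), the status-to-kind mapping, the per-kind retryable values, and every message string except the connection copy quoted in this PR are assumptions made for illustration.

```typescript
type AgentErrorKind = 'connection' | 'auth' | 'server' | 'interrupted' | 'aborted';

// Only the 'connection' string is quoted in this PR; the rest are placeholders.
const AGENT_ERROR_MESSAGES: Record<AgentErrorKind, string> = {
  connection: "Can't reach the server. Check your connection and try again.",
  auth: 'You are not authorized to do that.',
  server: 'The server ran into a problem. Try again.',
  interrupted: 'The response was interrupted. Try again.',
  aborted: 'The request was stopped.',
};

class AgentError extends Error {
  readonly kind: AgentErrorKind;
  readonly retryable: boolean;
  readonly status: number | undefined;
  readonly cause: unknown;

  constructor(kind: AgentErrorKind, opts: { retryable: boolean; status?: number; cause?: unknown }) {
    super(AGENT_ERROR_MESSAGES[kind]);
    // Keeps `instanceof AgentError` working even when compiled to ES5.
    Object.setPrototypeOf(this, new.target.prototype);
    this.name = 'AgentError';
    this.kind = kind;
    this.retryable = opts.retryable;
    this.status = opts.status;
    this.cause = opts.cause;
  }
}

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === 'object' && v !== null;
}

// Structured status: raw.status first, then raw.cause.status.
function structuredStatus(raw: unknown): number | undefined {
  if (!isRecord(raw)) return undefined;
  if (typeof raw['status'] === 'number') return raw['status'];
  const cause = raw['cause'];
  return isRecord(cause) && typeof cause['status'] === 'number' ? cause['status'] : undefined;
}

// Assumed mapping: 401/403 are auth, every other status is treated as server.
function kindForStatus(status: number): AgentErrorKind {
  return status === 401 || status === 403 ? 'auth' : 'server';
}

// Assumed patterns: only "HTTP 500" / "status: 503" style tokens count as a status.
const HTTP_STATUS_TOKEN = /\b(?:HTTP|status(?: code)?)\s*[:=]?\s*([1-5]\d{2})\b/i;
const NETWORK_MARKERS = /failed to fetch|network ?error|econnrefused|econnreset/i;

function toAgentError(raw: unknown): AgentError {
  if (raw instanceof AgentError) return raw; // idempotent
  if (isRecord(raw) && raw['name'] === 'AbortError') {
    return new AgentError('aborted', { retryable: false, cause: raw }); // abort short-circuits
  }
  const structured = structuredStatus(raw);
  if (structured !== undefined) {
    const kind = kindForStatus(structured);
    return new AgentError(kind, { retryable: kind !== 'auth', status: structured, cause: raw });
  }
  const text = raw instanceof Error ? raw.message : String(raw);
  if (NETWORK_MARKERS.test(text)) {
    return new AgentError('connection', { retryable: true, cause: raw });
  }
  const token = HTTP_STATUS_TOKEN.exec(text);
  if (token) {
    const status = Number(token[1]);
    const kind = kindForStatus(status);
    return new AgentError(kind, { retryable: kind !== 'auth', status, cause: raw });
  }
  return new AgentError('server', { retryable: true, cause: raw }); // fallback
}
```

Under these assumptions, toAgentError(new Error('HTTP 500: boom')) yields kind 'server' with status 500, while a message such as 'model gpt-500 failed' carries no status because the number is not part of an HTTP-shaped token.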

Validation

  • chat/langgraph/ag-ui test + lint + build + type-tests all green (chat: 858 tests).
  • e2e (examples/chat, fixture replay): an aborted stream now shows "Can't reach the server. Check your connection and try again." (asserts no HTTP \d{3} leak), a Retry button appears, and clicking it recovers — 2/2.
  • Adversarial reviews on the langgraph abort/interrupted semantics + the classifier; the one Important finding (3-digit-number status misfire) was fixed and locked with counterexample tests.

Design spec + TDD plan under docs/superpowers/. No backwards-compatibility constraint (pre-1.0); AgentError extends Error keeps the ripple small.

🤖 Generated with Claude Code

blove and others added 11 commits June 18, 2026 13:19
Structured AgentError (extends Error; kind/retryable/status/cause) + a shared
toAgentError() classifier (5-class: connection/auth/server/interrupted/aborted),
normalized in both adapters; neutral Agent.error re-typed to Signal<AgentError>;
new neutral retry(); ChatErrorComponent renders legible per-kind copy + a
conditional Retry. Fixes the cryptic 'HTTP 500:' surfacing + langgraph abort
inconsistency the audit flagged.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…auth/server/interrupted/aborted)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… neutral retry()

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…l retry()

- Thread `userAbortRequested` flag in the bridge; stop() sets it before
  aborting so the runStream() catch can distinguish user-stop (→ Idle, no
  error) from a mid-stream abort (→ interrupted AgentError) or fresh
  connect abort (→ toAgentError classification).
- Thread `streamingStarted` flag in runStream() (set on first event);
  AbortError after streaming has begun → kind:'interrupted'/retryable:true;
  AbortError with no events yet falls through to toAgentError.
- Normalize all non-abort catch errors through toAgentError from
  @threadplane/chat so agent.error() is always AgentError | undefined.
- Add retry() to agent.fn.ts: no-op while loading, clears error$, then
  calls resubmitLast() — implements the neutral Agent contract method.
- Tighten errorSig to Signal<AgentError | undefined> via a documented cast
  (BehaviorSubject stays unknown to satisfy StreamSubjects invariance).
- MockLangGraphAgent.error re-typed to WritableSignal<AgentError|undefined>.
- Update bridge + agent.fn specs: stop()→Idle assertion; four new error-UX
  tests (server error kind, user-stop→idle, retry() clears+resubmits,
  retry() no-op while loading); fix pre-existing empty-gen lint error.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
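The abort handling and retry() behavior in the commit message above can be sketched roughly as follows. The flag names userAbortRequested and streamingStarted and the resubmitLast step come from the commit; the function names resolveStreamFailure and makeRetry, the Resolution and ClassifiedError types, and the injected classify callback (a stand-in for toAgentError) are hypothetical and only illustrate the decision order.

```typescript
interface StreamFlags {
  userAbortRequested: boolean; // set by stop() before aborting
  streamingStarted: boolean; // set on the first stream event
}

// Hypothetical stand-in for the fields of AgentError used here.
interface ClassifiedError {
  kind: string;
  retryable: boolean;
  cause?: unknown;
}

type Resolution = { state: 'idle' } | { state: 'error'; error: ClassifiedError };

function isAbortError(err: unknown): boolean {
  return typeof err === 'object' && err !== null && (err as { name?: unknown }).name === 'AbortError';
}

function resolveStreamFailure(
  err: unknown,
  flags: StreamFlags,
  classify: (raw: unknown) => ClassifiedError,
): Resolution {
  if (isAbortError(err)) {
    // A user stop is not an error: go back to idle.
    if (flags.userAbortRequested) return { state: 'idle' };
    // An abort after events have arrived is a mid-stream death.
    if (flags.streamingStarted) {
      return { state: 'error', error: { kind: 'interrupted', retryable: true, cause: err } };
    }
  }
  // Aborts before the first event, and all non-abort errors, go to the classifier.
  return { state: 'error', error: classify(err) };
}

// retry(): no-op while loading, otherwise clear the error and resubmit the last input.
function makeRetry(deps: {
  isLoading: () => boolean;
  clearError: () => void;
  resubmitLast: () => void;
}): () => void {
  return () => {
    if (deps.isLoading()) return;
    deps.clearError();
    deps.resubmitLast();
  };
}
```

Passing the classifier in as a parameter keeps the sketch self-contained; in the adapter itself the commit describes calling toAgentError from @threadplane/chat directly.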
…ormalize all error$ sites via toAgentError

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…un last input, no duplicate message)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…onditional Retry

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…try recovery

Split the error-handling e2e into two focused tests: one that asserts the
alert shows human-legible copy (matching /can't reach|connection|server|
interrupted|try again/i and NOT /HTTP \d{3}/) with a visible Retry button,
and a second that clicks Retry after unrouting and confirms a final
assistant bubble appears.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
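The copy assertion in the first e2e test can be reduced to a small predicate. The two regular expressions are the ones quoted in the commit message; the function name isLegibleErrorCopy is hypothetical, and the actual spec presumably asserts against the rendered alert rather than a plain string.

```typescript
// Patterns taken from the commit message above.
const LEGIBLE_COPY = /can't reach|connection|server|interrupted|try again/i;
const RAW_HTTP_LEAK = /HTTP \d{3}/;

// True when the alert text reads as friendly copy and leaks no raw "HTTP 500"-style status.
function isLegibleErrorCopy(alertText: string): boolean {
  return LEGIBLE_COPY.test(alertText) && !RAW_HTTP_LEAK.test(alertText);
}
```

The second check matters because the first pattern alone would accept "HTTP 500: Internal Server Error", which contains the word "Server".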
…ction-before-text; dedup isAbortError/messages
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
vercel Bot commented Jun 18, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| threadplane | Ready | Preview, Comment | Jun 18, 2026 9:57pm |

@blove blove enabled auto-merge (squash) June 18, 2026 21:45
@blove blove merged commit 6027a2e into main Jun 18, 2026
53 of 54 checks passed
blove added a commit that referenced this pull request Jun 18, 2026
#693's friendly AGENT_ERROR_MESSAGES copy no longer contains "fail"/"error";
update the assertion to mirror the chat example so ag-ui — e2e stays green on
this branch too.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
blove added a commit that referenced this pull request Jun 18, 2026
…2e (#695)

#693 replaced the raw stream-failure text with friendly AGENT_ERROR_MESSAGES
copy ("Can't reach the server. Check your connection and try again."), which
no longer contains "fail"/"error" — turning main red on examples/ag-ui — e2e.
The chat example's spec was updated in #693; the ag-ui twin was missed. Mirror
that assertion so the ag-ui error-handling spec asserts the actual copy.

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>