Drive resolver elicitation over the 2026-07-28 input_required flow#2986
Conversation
There was a problem hiding this comment.
3 issues found across 5 files
Reply with feedback, questions, or to request a fix.
Fix all with cubic | Re-trigger cubic
Reviewer's guideStacked on #2969 (review that first). Its base is the Where to look
Author-facing code is unchanged — resolvers still return Design decisions worth a second look
Wire-key correctness (the subtle part): resolvers are deduped in-call by Reviewed by cubic + Claude + a bughunter pass + a Codex pass (which caught two real multi-round state bugs, now fixed). CI green, 100% coverage, no casts. |
…g resolvers Review follow-ups on #2986: - _state_key carries no id(...): it is module:qualname (a callable object uses its type's), so request_state round-trips and resumes on any worker (stateless HTTP). Two resolvers sharing that base (method instances, factory closures) are already disambiguated deterministically at registration (#N), so dropping the id is safe. - An eliciting resolver whose annotation lacks an Elicit[T] arm has elicit_schema None; its answer restored from request_state is now re-validated against the live Elicit.schema (via _elicit consulting res.state) instead of injecting a raw dict. - Move the datetime import to module top (AGENTS.md). Add regression tests: an unannotated eliciting resolver in a multi-round flow, and worker-stable wire keys for method instances and callable objects.
Resolvers that return Elicit[T] now negotiate the transport by protocol version: at >= 2026-07-28 the framework returns an InputRequiredResult carrying the batched questions and resumes when the client retries with input_responses/request_state; at <= 2025-11-25 it keeps the synchronous ctx.elicit() request. Author-facing code (Resolve/Elicit) is unchanged. resolve_arguments becomes a resumable DAG walk: it reads ctx.input_responses / ctx.request_state, memoizes resolver outcomes by a process-stable module:qualname key, batches independent pending elicitations into one round, serializes dependent ones across rounds, and carries resolved outcomes in request_state so each resolver resolves once per logical call. Outcomes restored from request_state are re-validated into their model via the Elicit[T] return arm. request_state is client-trusted for now (HMAC sealing is a follow-up). Add a render_elicitation_schema helper to elicitation.py, MRTR-loop and codec tests, and document the transport in the migration guide.
Replace every cast() with a checked or properly-typed alternative: model the request_state payload with pydantic (_State/_StateEntry) so the untrusted JSON is validated instead of cast; type _Resolution.pending as InputRequests so an ElicitRequest fits without a cast; add a _is_elicit TypeGuard and an _accepted helper that carry the right types; and narrow req.params via isinstance in the test helper. No behavior change.
- In-call cache keyed by _resolver_key (instance-distinct) again; _state_key adds id(__self__) for the wire key, so two instances of one bound method no longer collide and silently share an outcome. - resolve_arguments reads ctx.protocol_version (new Context property, None outside a request) instead of dereferencing request_context, so direct MCPServer.call_tool() works for tools whose resolvers never elicit. - request_state persists only elicited outcomes (always validated models); a resolver that resolves without eliciting is pure and re-runs each round. Fixes the json.dumps crash on non-serializable returns (datetime/set/...) and the dict-degradation of restored values. - _elicit_return_schema handles a bare Elicit[T] return (not only unions). - _INPUT_REQUIRED_VERSION pinned to '2026-07-28' instead of LATEST_MODERN_VERSION. - accept with no content raises ToolError instead of silently reporting cancel. - Independent nested resolver deps batch into one round (catch _Pending per dep). - Test cleanup: drop dead helpers, hoist CreateMessageResult import. Add regression tests for each; document the narrowed elicited-only persistence.
- Carry restored answers forward: an elicited outcome restored from request_state is now re-added to res.elicited, so in a 4+-round dependency chain an early answer is not dropped from request_state and re-asked on a later round. - Collision-free wire keys: assign each resolver a deterministic wire key at registration (module:qualname, disambiguated with #N when bases collide), so two distinct closures from one factory get separate questions/outcomes instead of sharing one. _state_key is now only the base-key source at registration. Add regression tests: a deep chain asserting an early answer is asked once, and factory closures asserting distinct wire keys and correct injected values.
…g resolvers Review follow-ups on #2986: - _state_key carries no id(...): it is module:qualname (a callable object uses its type's), so request_state round-trips and resumes on any worker (stateless HTTP). Two resolvers sharing that base (method instances, factory closures) are already disambiguated deterministically at registration (#N), so dropping the id is safe. - An eliciting resolver whose annotation lacks an Elicit[T] arm has elicit_schema None; its answer restored from request_state is now re-validated against the live Elicit.schema (via _elicit consulting res.state) instead of injecting a raw dict. - Move the datetime import to module top (AGENTS.md). Add regression tests: an unannotated eliciting resolver in a multi-round flow, and worker-stable wire keys for method instances and callable objects.
…t driver - request_state entries are validated unconditionally when the resolver's model is known; an entry that fails validation is dropped and the question re-asked, matching _decode_state's malformed-means-no-progress stance. Accepted models are encoded by alias so the stored shape round-trips through the same validation the client's fresh answer passed. - Recorded state now wins over a re-sent answer on both restore paths (previously only resolvers without an Elicit[T] return arm honored a changed answer). - A fresh answer whose content does not match the requested schema raises an explicit ToolError instead of leaking pydantic internals. - Embedded elicitation is gated on the client's declared form-elicitation capability: a client that did not declare it gets -32021 with the requiredCapabilities payload (the spec says servers MUST NOT send inputRequests the client has not declared support for). - Legacy parity: elicit_with_validation reports accept-with-no-content explicitly; the unexpected-action arm is gone (the action literal makes the remainder provably cancel). - uses_input_required -> _uses_input_required (module-internal). - Tests drive the loop through the real client surfaces - manual wire shapes via session.call_tool(..., allow_input_required=True), outcomes end-to-end through Client.call_tool's auto-driver - plus regressions for forged state, declined-outcome persistence, unknown keys, aliased models, schema-mismatched answers and the rounds cap.
The manifest pinned the story to era="legacy" until resolver elicitation could ride the 2026-07-28 input_required round-trip; that is now the case, so the story runs the full transport x era matrix with identical author code. README caveats describe the per-era transport and what persists across rounds; the spec links point at the real anchor (#input-required-tool-results), fixing the same stale fragment in the mrtr story; legacy_elicitation's cross-reference no longer claims resolver DI rides push elicitation everywhere.
The Dependencies tutorial explains that the framework picks the question's transport from the negotiated protocol version, scopes the run-once claim to what is guaranteed (each question asked once; a non-eliciting resolver may re-run when a call resumes), and documents the determinism constraint on questions. Multi-round-trip requests no longer claims @mcp.tool() has no high-level path to InputRequiredResult - resolver dependencies are that path. The eliciting tutorial snippets are tested on both transports.
a0eb013 to
85756a7
Compare
… returns request_state entries now store exactly the content the client sent (which already passed validation) instead of re-deriving it from the validated model with model_dump - persistence and restore previously used two codecs that diverge whenever validation and serialization aliases differ, dropping a legitimate stored answer on the third round and re-asking forever. Encode and restore are now the identity on the client's bytes, so no serialization knob can break the round-trip; the earlier by-alias dump is gone with the rest of the model-dumping path. A return annotation with more than one distinct Elicit arm now raises InvalidSignature at registration: the restore path validates against the single statically-known schema, so `-> Elicit[A] | Elicit[B]` either never converges (the stored B answer fails A validation each round) or silently injects a wrong-typed model when the shapes happen to overlap. Like cycles and unclassifiable parameters, the ambiguous shape is rejected up front. Legacy parity: elicit_with_validation wraps a schema-mismatched accepted answer in the same stable error the 2026 path uses, instead of letting the raw pydantic ValidationError text reach the client; noted in the migration guide since callers catching ValidationError see ValueError now.
Each question is asked exactly once per call, but on the multi-round-trip form an eliciting resolver's body runs again to consume its answer, and a resolver that answered without asking may run again whenever the call resumes. The tutorial info box, its recap bullet, and the refund_desk caveats now say exactly that, so authors don't hang side effects on a runs-at-most-once reading that only holds for the synchronous form.
| * `Annotated[T, Resolve(fn)]` on a tool parameter: the SDK runs `fn` and injects its return value. | ||
| * A resolved parameter is invisible to the model and cannot be supplied by a client. Values the model must not invent - prices, identities, permissions - belong here. | ||
| * A resolver's parameters are resolved the same way: the `Context`, another `Resolve(...)`, or a tool argument by name. The graph runs each resolver at most once per call. | ||
| * A resolver's parameters are resolved the same way: the `Context`, another `Resolve(...)`, or a tool argument by name. The graph runs each resolver at most once per round, however many consumers it has; each question is asked exactly once, an eliciting resolver runs again to consume its answer, and a resolver that never asked may run again when a call resumes. |
There was a problem hiding this comment.
🟡 Earlier sections of this page still state the old per-call guarantee — line 61 ("the SDK runs the resolver at most once per call, no matter how many declare it") and line 73 ("it runs once per call. One inventory lookup, two consumers") — which now contradicts the per-round contract this PR introduces in the new !!! info box and the rewritten Recap bullet on the same page. Reword those earlier statements (and the "Don't take once-per-call on faith" / "Once per call means exactly that" tips) to the per-round / asked-once phrasing, or add a forward reference to the info box, so an author doesn't write a non-idempotent resolver relying on once-per-call.
Extended reasoning...
What the issue is. docs/tutorial/dependencies.md is now internally inconsistent about the resolver run-once guarantee. This PR added the !!! info box (lines ~119-132: "a resolver that answered without asking, like check_stock, may run again whenever the call resumes after a question") and rewrote the Recap bullet (line ~138: "at most once per round ... a resolver that never asked may run again when a call resumes"). But the earlier sections of the same page were left with the old contract: line 61 says "every tool that needs stock declares the same parameter, and the SDK runs the resolver at most once per call, no matter how many declare it", line 73 says "it runs once per call. One inventory lookup, two consumers", and the nearby tips reinforce it ("Don't take once-per-call on faith ... one line per call"; "Once per call means exactly that").
Why the old wording is no longer universally true. Under the >= 2026-07-28 input_required flow, only elicited outcomes are persisted in request_state; a resolver that resolves without eliciting is pure and re-runs on every retry round whenever the tool also has an eliciting resolver. The PR's own tests assert exactly this: test_auto_driver_answers_independent_questions_in_a_single_round counts rounds == 2 via a pure resolver that re-runs each round, and test_input_required_resolver_asks_and_consumes_then_never_reruns shows an eliciting resolver running twice (ask + consume). The page's own tutorial003 example (the eliciting confirm_backorder) is now exercised on mode="auto" by tests/docs_src/test_dependencies.py, so the multi-round behaviour applies to the very examples this page teaches.
Step-by-step proof. Take the page's tutorial003 shape extended with the tutorial002 dependency: a tool with stock: Annotated[Stock, Resolve(check_stock)] and backorder: Annotated[Backorder, Resolve(confirm_backorder)] where the title is out of stock, on a 2026-07-28 connection. Round 1: check_stock runs (lookup #1), confirm_backorder returns Elicit(...) with no answer yet, so the server returns an InputRequiredResult and only the (empty) elicited-outcome map goes into request_state — check_stock's value is not persisted. Round 2: the client retries with the answer; check_stock runs again (lookup #2), confirm_backorder consumes its answer, the body runs. One logical tools/call, two executions of the non-eliciting resolver — directly contradicting "at most once per call" / "one inventory lookup".
Why this looks like an oversight rather than intent. The author reworded the analogous sentences elsewhere in this PR — examples/stories/refund_desk/README.md ("ask each question at most once per call"), refund_desk/client.py comments, the resolve.py module docstring, and this page's own Recap bullet — and the follow-up commit 1a15cb6 ("Scope the resolver run-once guarantee to questions, not resolver bodies") shows this exact wording class is considered worth correcting. Lines 61/73/77 and the "Once per call means exactly that" tip were simply missed. The earlier-flagged stale wording in docs/migration.md is a different file and was already addressed; this is a distinct remaining location.
Impact. Documentation-only, no runtime effect, and the statements remain literally true for the tutorial001/002 examples in their immediate context (no eliciting resolver, so the call completes in one round). But line 61 is phrased as a general design claim, and an author reading it could hang side effects or expensive non-idempotent work on a once-per-call assumption that the same page later retracts — exactly the confusion the new info box exists to prevent.
How to fix. Reword line 61 and line 73 (and the two reinforcing tips) to the per-round / asked-once phrasing the rest of the page now uses — e.g. "the SDK runs the resolver at most once per round, however many declare it" and "each question is asked once; a resolver that never asks may re-run when a call resumes" — or add a short forward reference to the !!! info box where the full contract is stated.
Summary
#2969 added the
Resolve/Elicitresolver dependency-injection API for@mcp.tool()functions, eliciting via a synchronousctx.elicit()request mid-call - the 2025 model, which doesn't exist at2026-07-28(sampling/roots/elicitation are only embedded inInputRequiredResultthere).This PR makes the resolver
Elicit[T]path negotiate the transport by protocol version, so the same author-facing code works on both:>= 2026-07-28- the framework returns anInputRequiredResultcarrying the batched questions and resumes when the client retriescall_tool(..., input_responses=..., request_state=...). Independent resolvers are asked together in one round; a resolver that depends on another's answer is asked in a later round.<= 2025-11-25- it keeps the synchronouselicitation/createrequest mid-call (no regression for current clients).Author-facing code is unchanged - resolvers still return
T/Elicit[T]; the framework drives the multi-round loop. And sinceClient.call_toolauto-drivesInputRequiredResultretries through the client's normal callbacks (SEP-2322, #2998), a client with anelicitation_callbackhandles resolver questions transparently on either protocol version - several tests here prove that loop end-to-end.How it works
resolve_argumentsbecomes a resumable DAG walk: it readsctx.input_responses/ctx.request_state, runs each resolver, and collects anyElicitit can't yet answer into apendingmap. If anything is pending,Tool.runreturns theInputRequiredResultinstead of running the body.request_stateacross rounds, keyed by a worker-stablemodule:qualnamewire key, so each question is asked exactly once per logical call and two consumers of one resolver share one question. A resolver that resolves without eliciting is pure and may re-run on each round; an eliciting resolver runs again to consume its answer.Elicit[T]model;request_stateis client-supplied, so an entry that fails validation is treated as no-progress and the question is re-asked (the same stance the state decoder takes for a malformed envelope). A fresh answer whose content doesn't match the requested schema is an explicit tool error - on both transports (the legacy path previously let the raw pydanticValidationErrortext reach the client; that change is noted in the migration guide).Elicitarm (-> Elicit[A] | Elicit[B]) raisesInvalidSignatureat registration, like cycles and unclassifiable parameters: the restore path validates against one statically-known schema, so the ambiguous shape either never converges or silently injects a wrong-typed model when the shapes overlap.-32021missing-required-client-capability error with therequiredCapabilitiespayload, instead of questions it can't render.render_elicitation_schema()helper inelicitation.py(shared byelicit_with_validationand theinput_requestsrendering). The legacy path also now reports accept-with-no-content explicitly instead of an "unexpected action" error.Docs & examples
default_factoryid don't bind across rounds). Multi-round-trip requests no longer claims@mcp.tool()has no high-level path toInputRequiredResult- resolver DI is that path now.examples/stories/refund_deskflips fromera = "legacy"toera = "dual"- the manifest comment said to do exactly that once this capability landed - so the story runs the full matrix on both protocol generations with identical author code.Out of scope (follow-ups)
request_stateis client-trusted here: forged entries can't smuggle unvalidated or wrongly-typed values into tool parameters (restore validation, above), but a client can still fabricate a valid-shaped outcome (e.g. a confirmation it never showed the user) - the same thing a lying client can do by answering the live question, on any transport. What sealing adds is integrity against third-party tampering and replay/cross-request binding; HMAC-sealingrequest_statewith a per-server key is a separate PR (the typescript-sdk staged it the same way).Tests
tests/server/mcpserver/test_resolve.py: the MRTR loop is tested both manually (per-round wire shapes viaclient.session.call_tool(..., allow_input_required=True): first-round question shape, independent questions batched into one round, factory closures with distinct wire keys, non-elicitation response) and end-to-end through the client auto-driver (accept/decline/cancel outcomes, dependent chains asked across rounds with each question answered exactly once, deep chains, restored typed models, accept-with-no-content). Hardening tests: forgedrequest_state(non-dict and schema-failing data) re-asks instead of erroring or injecting, declines persist across rounds, unknown wire keys are ignored, aliased and divergent-alias models round-trip, schema-mismatched fresh answers produce the explicit error on both transports, an 11-deep chain trips the client'sinput_required_max_rounds, multi-armElicitannotations are rejected at registration, and the-32021capability gate. The synchronous legacy path keeps its own legs, and the docs/story suites run both transports. 100% branch coverage.AI Disclaimer