fix(server): settle a cancelled serveStdio probe so a pipelined initialize fallback cannot wedge the connection #2317
…ize cannot wedge serveStdio

A client may pipeline an enveloped notifications/cancelled naming its server/discover probe and a fallback initialize behind the probe without waiting for the DiscoverResult. The cancellation aborts the in-flight discover handler, so no response is ever produced for the probe id; the probe-discard path then waited forever for that answer before closing the probe instance, and since the inbound pump processes messages in order, the fallback initialize and every later message were never processed - a silent, permanent connection wedge with no error reported.

The per-connection channel now settles a delivered request when a notifications/cancelled naming its id is delivered (by protocol contract a cancelled request may go unanswered), so the discard wait drains immediately in that case while non-cancelled probes still get their answer delivered to the wire before the probe instance is closed.

As a backstop, the wait-for-answers used by the discard is bounded by a short timeout; on timeout the discard proceeds and the condition is reported through onerror, so no future edge can hold the pump indefinitely.

New test covers the pipelined probe -> cancellation -> initialize sequence falling back to a working legacy session.
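The settle-on-cancel rule described above can be sketched as a small pending-request tracker. This is a minimal illustration, not the SDK's serveStdio code: `PendingRequests`, `WireMessage`, `onDelivered`, and `onAnswered` are hypothetical names, and the real channel may track more state.

```typescript
// Hypothetical names throughout; a sketch of the idea, not the SDK implementation.
type RequestId = string | number;

interface WireMessage {
    jsonrpc: '2.0';
    id?: RequestId;
    method?: string;
    params?: { requestId?: RequestId };
    result?: unknown;
}

class PendingRequests {
    private readonly pending = new Set<RequestId>();

    /** Record an inbound message that was delivered to the instance. */
    onDelivered(message: WireMessage): void {
        if (message.method !== undefined && message.id !== undefined) {
            // A request: an answer is now owed for this id.
            this.pending.add(message.id);
        } else if (message.method === 'notifications/cancelled') {
            // A cancelled request may go unanswered, so stop waiting for it.
            // Compare against undefined so that id 0 is handled too.
            const cancelledId = message.params?.requestId;
            if (cancelledId !== undefined) {
                this.pending.delete(cancelledId);
            }
        }
    }

    /** Record an outbound answer that was handed to the wire. */
    onAnswered(id: RequestId): void {
        this.pending.delete(id);
    }

    get size(): number {
        return this.pending.size;
    }
}

const tracker = new PendingRequests();
tracker.onDelivered({ jsonrpc: '2.0', id: 0, method: 'server/discover' });
tracker.onDelivered({ jsonrpc: '2.0', method: 'notifications/cancelled', params: { requestId: 0 } });
console.log(tracker.size); // 0
```

With nothing left pending after the cancellation, a discard step that waits for the set to drain can proceed immediately instead of waiting for an answer that will never be written.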
} else if (isJSONRPCNotification(message) && message.method === 'notifications/cancelled') {
    // By protocol contract a cancelled request may legitimately go
    // unanswered (the instance aborts the in-flight handler and writes
    // nothing for it), so a delivered cancellation settles the request
    // it names: nothing should keep waiting for an answer that may
    // never come. Non-cancelled requests still settle only when their
    // answer is handed to the wire.
    const cancelledId = (message.params as CancelledNotificationParams | undefined)?.requestId;
    if (cancelledId !== undefined) {
        this._settle(cancelledId);
    }
}
🟣 Pre-existing issue (not introduced by this PR): Protocol._oncancel in packages/core/src/shared/protocol.ts:512-515 guards with if (!notification.params.requestId) return, so a notifications/cancelled naming request id 0 — the very first id an SDK client uses — is silently ignored and the in-flight handler is never aborted. The new channel-level settle here correctly checks cancelledId !== undefined; a follow-up should change the Protocol guard to requestId === undefined so the two layers agree on whether id 0 is cancellable.
What the bug is. Protocol._oncancel (packages/core/src/shared/protocol.ts:512-519) starts with if (!notification.params.requestId) { return; }. RequestId is string | number, and 0 (as well as '') is falsy, so a cancellation that names request id 0 is treated as if the field were absent: the method returns early, the matching AbortController is never looked up, and the in-flight request handler keeps running to completion.
Why id 0 is realistic — in fact the most likely id to be cancelled. Protocol initializes its request counter at 0 (private _requestMessageId = 0, protocol.ts:418) and assigns ids with post-increment (const messageId = this._requestMessageId++, protocol.ts:1132). So an SDK-built client's very first request on a connection — e.g. the opening server/discover probe that this PR's headline scenario is about — carries id 0, and the client's own cancellation path (protocol.ts:1165) sends notifications/cancelled with that same numeric id.
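To make the id-0 point concrete, here is a small sketch of a post-increment counter and the cancellation it leads to. `RequestIdCounter` and `cancelledNotification` are hypothetical stand-ins for illustration; only the start-at-0, post-increment behaviour is taken from the review text.

```typescript
// Hypothetical stand-in for the request-id counter described above.
type RequestId = string | number;

class RequestIdCounter {
    private nextId = 0;

    // Post-increment: return the current value, then advance.
    next(): number {
        return this.nextId++;
    }
}

// Builds a cancellation naming the given request id.
function cancelledNotification(requestId: RequestId) {
    return {
        jsonrpc: '2.0' as const,
        method: 'notifications/cancelled' as const,
        params: { requestId }
    };
}

const ids = new RequestIdCounter();
const probeId = ids.next(); // first id on the connection
const cancel = cancelledNotification(probeId);

// The id is present, yet a truthiness test on it evaluates to false.
console.log(probeId, Boolean(cancel.params.requestId)); // 0 false
```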
Concrete walkthrough.
1) An SDK 2026 client opens a stdio connection and sends server/discover with id 0 (its first request).
2) It decides to abandon the probe and pipelines notifications/cancelled with requestId: 0.
3) On the server, the entry delivers both messages to the probe instance; the new channel code in serveStdio.ts:187-198 correctly settles pending id 0 because it checks cancelledId !== undefined.
4) But Protocol._oncancel evaluates !0 === true and returns: the discover handler's abort signal never fires, the handler runs to completion, and its response is still written. The cancellation is silently ignored for that one id.
Why this PR doesn't prevent or recreate it. The PR only touches the channel layer in serveStdio.ts; protocol.ts is unchanged. Importantly, this does NOT recreate the wedge the PR fixes: because the handler for id 0 is never aborted, the discover answer still reaches the wire, send() settles the pending id, and the discard wait resolves — the connection's pump is never blocked. The visible impact is limited to wasted handler work and a response the cancelling client must ignore per spec. However, after this PR the channel layer (!== undefined) and the Protocol layer (falsy check) disagree on whether id 0 is a valid cancellation target, which is exactly the off-by-falsy class the new code in this diff was careful to avoid.
How to fix. In a follow-up to core (out of scope for this PR), change the guard to if (notification.params.requestId === undefined) { return; } — or drop it entirely, since the schema requires the field. String ids of '' would be handled correctly by the same change.
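The difference between the two guards can be checked in isolation. The functions below are illustrative only (the names are made up and they are not the Protocol code); they just contrast a truthiness test with a strict `undefined` comparison over a few sample ids.

```typescript
type RequestId = string | number;

// Truthiness guard as described in the review: treats 0 and '' as "absent".
function isCancellableFalsyGuard(requestId: RequestId | undefined): boolean {
    return Boolean(requestId);
}

// Strict guard: only a genuinely missing field is skipped.
function isCancellableStrictGuard(requestId: RequestId | undefined): boolean {
    return requestId !== undefined;
}

const samples: Array<RequestId | undefined> = [0, '', 1, 'abc', undefined];
for (const id of samples) {
    // Columns: id, falsy-guard result, strict-guard result
    console.log(JSON.stringify(id), isCancellableFalsyGuard(id), isCancellableStrictGuard(id));
}
```

Only `0` and `''` produce different results under the two guards, which is exactly the set of ids on which the channel layer and the Protocol layer currently disagree.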
All four verifiers independently confirmed the falsy guard, the id-0 starting counter, and that the impact is non-blocking; there were no refutations.
…alize fallback cannot wedge the connection (#2317)
Fixes a connection wedge in serveStdio when a client cancels its server/discover probe before the answer is written and then falls back to initialize on the same connection.
Motivation and Context
A pipelined notifications/cancelled naming the probe request id aborts the in-flight discover handler, so the probe never receives a response; the probe-discard path then waits for that answer indefinitely and the connection's message pump never processes the fallback initialize or anything after it — a silent, permanent wedge only a disconnect clears.
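For orientation, the pipelined sequence might look roughly like this on the wire. The method names come from the description above; the ids, the `reason` text, the protocol version string, and the `clientInfo` values are assumptions for illustration, and any envelope fields carried by the cancellation are omitted.

```typescript
// Illustrative messages only; field values are assumptions, not captured traffic.
interface WireMessage {
    jsonrpc: '2.0';
    id?: number;
    method: string;
    params?: Record<string, unknown>;
}

const pipelined: WireMessage[] = [
    // 1. The probe.
    { jsonrpc: '2.0', id: 0, method: 'server/discover' },
    // 2. Cancellation naming the probe id, sent without waiting for an answer.
    {
        jsonrpc: '2.0',
        method: 'notifications/cancelled',
        params: { requestId: 0, reason: 'falling back to initialize' }
    },
    // 3. The fallback handshake, queued behind the first two messages.
    {
        jsonrpc: '2.0',
        id: 1,
        method: 'initialize',
        params: {
            protocolVersion: '2025-06-18',
            capabilities: {},
            clientInfo: { name: 'example-client', version: '0.0.0' }
        }
    }
];

// stdio framing: one JSON message per line.
const wire = pipelined.map(message => JSON.stringify(message)).join('\n') + '\n';
console.log(wire);
```

Because the inbound pump handles these strictly in order, anything that blocks while discarding the probe after message 2 also blocks message 3.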
How Has This Been Tested?
New regression test for the pipelined cancel-then-initialize sequence (fails before the fix, passes after); the existing probe-window tests (answer-then-cancel, fallback, repeated probe) stay green; server package suite, typecheck, and lint pass.
Breaking Changes
None.
Types of changes
Additional context
A delivered cancellation now settles the pending probe id (a cancelled request may legitimately go unanswered), and the discard-time wait for unanswered probe requests is bounded with an error report as a backstop so no future edge can stall the connection's pump indefinitely.
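The bounded wait can be sketched as a timer raced against the drain promise. This is a minimal sketch under assumptions: `waitBounded` is a hypothetical helper, the timeout value is arbitrary, and `drained` is assumed never to reject; the actual serveStdio code may be structured differently.

```typescript
// Hypothetical helper; names and timeout values are assumptions.
function waitBounded(
    drained: Promise<void>,
    timeoutMs: number,
    onerror: (error: Error) => void
): Promise<'drained' | 'timeout'> {
    return new Promise(resolve => {
        const timer = setTimeout(() => {
            // Backstop: report the condition and let the caller proceed.
            onerror(new Error(`pending answers not drained after ${timeoutMs} ms`));
            resolve('timeout');
        }, timeoutMs);
        // Assumes `drained` never rejects.
        drained.then(() => {
            clearTimeout(timer);
            resolve('drained');
        });
    });
}

// Usage: a wait that never drains still completes, with one reported error.
const errors: string[] = [];
const neverDrains = new Promise<void>(() => {});
waitBounded(neverDrains, 50, error => errors.push(error.message)).then(outcome => {
    console.log(outcome, errors.length); // timeout 1
});
```

Resolving with a tagged outcome rather than rejecting keeps the discard path linear: the caller closes the probe instance either way, and the error callback is the only place the abnormal case surfaces.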