streamable_http: early response.aclose() poisons keepalive connection, causes ~260ms latency on every subsequent tool call #2707

@whocareyw

Description

Summary

In mcp.client.streamable_http.StreamableHTTPTransport._handle_sse_response, the client calls await response.aclose() immediately after receiving the first JSON-RPC response event. This early close leaves the underlying HTTP/1.1 keepalive connection in a state where the next request reusing the same connection blocks for ~260 ms before the server's response status arrives.

The result is that every session.call_tool(...) (and send_ping, list_tools, ...) over streamable_http pays a fixed ~260 ms penalty when calls are serial on a single connection.

Removing the early aclose() and draining the SSE stream to EOF eliminates the penalty entirely (37× speedup: 265 ms → 7 ms per call in my setup).

Environment

  • mcp == 1.27.1
  • Python 3.12.8, Windows 11
  • Server: mcp.server.streamable_http (also 1.27.1), localhost, SSE response mode
  • Transport: streamable HTTP, single long-lived client session, sequential requests

Symptom (numbers)

Same tools/call, same server, same httpx.AsyncClient, all on localhost:

| Path | Avg latency |
|---|---|
| Raw httpx.AsyncClient.stream("POST", ...) + aiter_bytes() to EOF | ~5 ms |
| ClientSession.call_tool(...) (current code) | ~265 ms |
| ClientSession.call_tool(...) (with aclose() removed) | ~7 ms |

Status code arrival timing (measured with raw httpx on the same client/headers):

  • Status: 1.5 ms
  • First chunk: 4.7 ms
  • EOF: 5 ms

So the server replies in single-digit ms. The 260 ms appears only after MCP's early aclose() on the previous request.
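The first-chunk and EOF timings above can be collected with a small helper like the one below. It is a stdlib-only sketch; `time_stream` is a hypothetical name, not part of httpx or MCP, and the demo feeds it a fake SSE byte stream instead of a real response.

```python
import asyncio
import time
from typing import AsyncIterator, Optional, Tuple


async def time_stream(chunks: AsyncIterator[bytes]) -> Tuple[Optional[float], float, int]:
    """Read `chunks` to EOF, returning (first_chunk_ms, eof_ms, total_bytes).

    Both times are measured from the call to this helper; first_chunk_ms is
    None when the stream produced no chunks at all.
    """
    t0 = time.perf_counter()
    first_chunk_ms: Optional[float] = None
    total_bytes = 0
    async for chunk in chunks:
        if first_chunk_ms is None:
            first_chunk_ms = (time.perf_counter() - t0) * 1000
        total_bytes += len(chunk)
    eof_ms = (time.perf_counter() - t0) * 1000
    return first_chunk_ms, eof_ms, total_bytes


async def _demo() -> None:
    # Hypothetical stand-in for resp.aiter_bytes(): two SSE chunks with a short gap.
    async def fake_sse():
        yield b"event: message\n"
        await asyncio.sleep(0.005)
        yield b'data: {"jsonrpc": "2.0", "id": 1, "result": {}}\n\n'

    print(await time_stream(fake_sse()))


asyncio.run(_demo())
```

Against a real server one would presumably pass resp.aiter_bytes() from inside an `async with client.stream("POST", ...) as resp:` block, taking a separate timestamp just before entering that block: httpx enters the block once the status line and headers have arrived, so that gap approximates the "Status" time.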

Reproducer

This assumes any reachable streamable_http MCP server that exposes one cheap, read-only tool. Replace URL and TOOL_NAME (and TOOL_ARGS, if the tool takes arguments):

import asyncio
import time
from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client

URL = "http://localhost:PORT/mcp"
TOOL_NAME = "your_cheap_tool"
TOOL_ARGS = {}

async def main():
    async with streamablehttp_client(URL) as (r, w, _):
        async with ClientSession(r, w) as s:
            await s.initialize()
            # warm up
            for _ in range(2):
                await s.call_tool(TOOL_NAME, TOOL_ARGS)
            # measure
            times = []
            for _ in range(10):
                t0 = time.perf_counter()
                await s.call_tool(TOOL_NAME, TOOL_ARGS)
                times.append((time.perf_counter() - t0) * 1000)
            print(f"avg = {sum(times)/len(times):.2f} ms")

asyncio.run(main())

On my setup this prints avg = 267.40 ms. After the patch below it prints avg = 7.28 ms.
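To check that the penalty is a fixed per-call cost rather than a few slow outliers, the `times` list from the reproducer can also be summarized with median, min and max next to the mean: if all of them sit near ~265 ms, essentially every call pays it. A stdlib sketch follows; `summarize` and `speedup` are hypothetical helper names. Plugging in the two averages reported above gives 267.40 / 7.28 ≈ 36.7, consistent with the roughly 37× speedup quoted in the summary.

```python
import statistics
from typing import Dict, List


def summarize(times_ms: List[float]) -> Dict[str, float]:
    """Summarize per-call latencies (ms) collected by the reproducer's loop."""
    return {
        "mean": statistics.fmean(times_ms),
        "median": statistics.median(times_ms),
        "min": min(times_ms),
        "max": max(times_ms),
    }


def speedup(before_ms: float, after_ms: float) -> float:
    """How many times faster the 'after' latency is than the 'before' one."""
    return before_ms / after_ms


# Averages reported above for the unpatched and patched client.
print(f"{speedup(267.40, 7.28):.1f}x")  # 36.7x
```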

Root cause

In src/mcp/client/streamable_http.py, _handle_sse_response:

async def _handle_sse_response(self, response, ctx, is_initialization=False):
    ...
    async for sse in event_source.aiter_sse():
        ...
        is_complete = await self._handle_sse_event(...)
        if is_complete:
            await response.aclose()   # <-- offending line
            return

After the response event is received, the SSE stream is force-closed before it reaches EOF, and the connection goes back to the keepalive pool without having been fully drained. The next POST that reuses this connection blocks for ~260 ms before its status arrives. The likely cause is a server-side SSE idle/reconnect window: sse_starlette.EventSourceResponse keeps the writer task alive after sending its only event.

Confirming evidence (instrumented timings across many runs):

  • Time inside _handle_post_request from entry to first _handle_sse_event call: 266 ms (always)
  • Bare client.stream(POST) issued on the same httpx.AsyncClient and same event loop, outside the MCP call path: 5 ms
  • Bare client.stream(POST) + aiter_bytes() to EOF, issued inside MCP's post_writer subtask and immediately followed by the original _handle_post_request: bare = 266 ms, orig = 5 ms. The bare request reuses the connection left behind by the previous call's early aclose() and pays the penalty; the original request that follows is fast because the bare request drained the connection to EOF.

So the penalty is paid on the request following every early-aclose, not on the request that did the aclose.
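The accounting above can be illustrated with a deterministic toy model. Everything here is hypothetical (these are not httpx or httpcore internals); the 5 ms and 260 ms costs are simply the numbers measured in this issue, and the model only encodes the hypothesis that a connection released before EOF delays the next request that reuses it.

```python
from dataclasses import dataclass, field
from typing import List

# Toy model only: these classes are hypothetical and are NOT httpx/httpcore
# internals. The two costs are the latencies measured in this issue.
SERVER_MS = 5.0     # time for the server to answer a request
PENALTY_MS = 260.0  # extra delay observed on a reused, undrained connection


@dataclass
class ToyConnection:
    # Whether the previous response on this connection was read to EOF.
    drained: bool = True


@dataclass
class ToyPool:
    conn: ToyConnection = field(default_factory=ToyConnection)

    def request(self, drain_to_eof: bool) -> float:
        """Send one request on the single keepalive connection; return its cost in ms."""
        cost = SERVER_MS
        if not self.conn.drained:
            # Charged to THIS request because the PREVIOUS one released the
            # connection early (the hypothesis stated in this issue).
            cost += PENALTY_MS
        # This request's own release state decides what the NEXT request pays.
        self.conn.drained = drain_to_eof
        return cost


def serial_calls(n: int, drain_to_eof: bool) -> List[float]:
    pool = ToyPool()
    return [pool.request(drain_to_eof) for _ in range(n)]


print(serial_calls(3, drain_to_eof=False))  # [5.0, 265.0, 265.0]
print(serial_calls(3, drain_to_eof=True))   # [5.0, 5.0, 5.0]
```

Interleaving the two kinds of release reproduces the third measurement: `pool.request(False)`, then `pool.request(True)`, then `pool.request(False)` costs 5, 265 and 5 ms, i.e. the draining "bare" request pays for the earlier early close and the MCP call after it is fast.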

Proposed fix

Drain the SSE stream to EOF instead of aborting early:

@@ async def _handle_sse_response(self, response, ctx, is_initialization=False):
-        try:
-            event_source = EventSource(response)
-            async for sse in event_source.aiter_sse():
-                ...
-                is_complete = await self._handle_sse_event(...)
-                if is_complete:
-                    await response.aclose()
-                    return
-        except Exception as e:
-            logger.debug(f"SSE stream ended: {e}")
+        try:
+            event_source = EventSource(response)
+            async for sse in event_source.aiter_sse():
+                ...
+                await self._handle_sse_event(...)
+        except Exception as e:
+            logger.debug(f"SSE stream ended: {e}")

(The last_event_id / reconnect bookkeeping below is unaffected: we still observe every event and the loop now exits naturally on EOF.)
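The same drain-to-EOF control flow, reduced to the standard library so it runs without a server, might look like this. It is a sketch, not the MCP implementation: the real code iterates httpx_sse's EventSource(response).aiter_sse() and calls self._handle_sse_event(...), while SSEEvent, iter_sse and consume_to_eof below are hypothetical stand-ins with a deliberately simplified parser. The point is that the handler's "complete" flag no longer ends the loop; only the end of the stream does.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, Awaitable, Callable, List, Optional


@dataclass
class SSEEvent:  # hypothetical stand-in for httpx_sse's event type
    event: str = "message"
    data: str = ""
    id: Optional[str] = None


async def iter_sse(lines: AsyncIterator[str]) -> AsyncIterator[SSEEvent]:
    """Minimal SSE parser (simplified; not the full WHATWG event-stream rules).

    Handles `event:`, `data:` and `id:` fields and skips `:` comment lines.
    An event is dispatched on a blank line if it carried any data.
    """
    ev = SSEEvent()
    data: List[str] = []
    async for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if data:
                ev.data = "\n".join(data)
                yield ev
            ev, data = SSEEvent(), []
            continue
        if line.startswith(":"):
            continue  # comment / keepalive ping
        name, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]
        if name == "event":
            ev.event = value
        elif name == "data":
            data.append(value)
        elif name == "id":
            ev.id = value


async def consume_to_eof(
    events: AsyncIterator[SSEEvent],
    handle: Callable[[SSEEvent], Awaitable[bool]],
) -> Optional[str]:
    """Hand every event to `handle` and keep reading until the stream ends.

    `handle` may return True once the JSON-RPC response has arrived, but that
    value is deliberately not used to break out early: the loop ends only when
    the source iterator is exhausted. Returns the last event id seen.
    """
    last_event_id: Optional[str] = None
    async for sse in events:
        await handle(sse)
        if sse.id is not None:
            last_event_id = sse.id
    return last_event_id
```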

Caveat

This relies on the server closing the SSE stream after sending the response, which sse_starlette.EventSourceResponse does once sse_writer exits via break on JSONRPCResponse (see mcp/server/streamable_http.py). If a server is configured to keep the stream open for multi-message responses, draining will also wait for those messages, which is the desired behavior. A request_read_timeout_seconds-aware variant could be added if needed.
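One possible shape for the timeout-aware variant, assuming the timeout should bound the whole drain rather than each individual read. drain_with_timeout is a hypothetical helper, not an existing MCP function; it relies on asyncio.wait_for, which cancels the drain once the deadline passes and waits indefinitely when the timeout is None.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable, Optional, TypeVar

T = TypeVar("T")


async def drain_with_timeout(
    events: AsyncIterator[T],
    handle: Callable[[T], Awaitable[object]],
    timeout_s: Optional[float],
) -> bool:
    """Handle every event until EOF, giving up after `timeout_s` seconds in total.

    Returns True if EOF was reached, False if the timeout fired first.
    timeout_s=None means plain drain-to-EOF with no deadline.
    """

    async def _drain() -> None:
        async for ev in events:
            await handle(ev)

    try:
        await asyncio.wait_for(_drain(), timeout=timeout_s)
        return True
    except asyncio.TimeoutError:
        # The drain was cancelled; the caller decides how to release the response.
        return False
```

What to do after a timeout is left to the caller. Presumably it would fall back to response.aclose(), so the keepalive penalty would only be paid when a server genuinely holds the stream open too long.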

Happy to send a PR.
