[v1.x] Backport transport test deflakes from main (tests-only) by maxisbey · Pull Request #2837 · modelcontextprotocol/python-sdk

maxisbey · 2026-06-11T14:55:55Z

Backports the transport test deflaking work from main (#2764, #2765, #2767, #2788) to v1.x. Tests-only: the diff touches nothing under src/.

Stacked on #2717 (maxisbey/v1x-interaction-backport), which supplies the in-process streaming bridge these tests are built on.

Motivation and Context

The v1.x transport tests still run over real sockets with uvicorn servers in threads or subprocesses, and CI pays for it: ephemeral port races, server startup polling, and a real-clock session-timeout bet that loses under scheduler load. main fixed all of this by moving the tests in process; this PR ports those rewrites to v1.x so the v1.x matrix stops flaking on them.

Two of the rewritten files have direct v1.x CI evidence (test_streamable_http_multiple_reconnections and test_json_response failures on Windows legs); the rest share the same socket-based harness and mechanism. The session-timeout deflake matters for #2717 itself: the suite that PR backports still carries the pre-#2788 real-clock test, which fails roughly 1 in 3 full-suite runs under coverage load, once per transport leg. Folding the first three commits here into #2717 instead would work equally well.

Adaptations made for v1.x, each noted in the commit that makes it:

The 4-line src/ delta from Run StreamableHTTP transport tests in process instead of over sockets #2767 (a pragma removal and a stale docstring note) is dropped; v1.x has no strict pragma checking, so the retained pragma is harmless.
9 wildcard-Accept-header test params are omitted because v1.x predates the RFC 7231 wildcard parsing from fix: accept wildcard media types in Accept header per RFC 7231 #2152. The v1.x-only deprecation tests (streamablehttp_client alias, deprecated transport params) are re-added in in-process form, so no existing v1.x test is lost.
In-process serving exposes two v1.x-specific teardown behaviors that the subprocess harness hid: sse-starlette < 3.0 binds a module-global exit event to the first event loop that serves a response (guarded with a reset fixture, exercised by the lowest-direct legs), and the server transport leaks anyio memory streams whose GC-time warnings would otherwise escape the per-test filters (handled with scoped filters plus a per-test gc.collect() so isolated runs stay clean).
The streaming bridge now ignores response messages after the response has completed, matching what a client of a real ASGI server observes, with contract tests pinning the behavior.
One ported test verifies inside its connection loop because Python 3.11 loses line tracing after an async-with teardown throws (Bad interaction between asyncio task cancellation and trace functions (only 3.11) python/cpython#106749), which otherwise misreports coverage on 3.11 legs.

How Has This Been Tested?

Full v1.x gate (pytest plus 100% line and branch coverage) green throughout the series.
The rewritten files were run repeatedly in a scratch CI environment replicating the v1.x matrix (Windows and Ubuntu, Python 3.10 to 3.14, locked and lowest-direct resolutions): roughly 900 repetitions of the affected files with zero failures, including the lowest-direct legs that install sse-starlette 1.6.1.
Contrast evidence for the timeout deflake: on the pre-deflake base, 5 of 8 full-suite runs under coverage load failed on test_session_level_timeout_applies_to_every_request; with the virtual-clock version the failure has never reproduced (28 full-suite runs plus 150 isolated repetitions).
The full-suite failures that did appear during matrix stress were all in files this PR does not touch (the known Windows stdio child-cleanup and experimental-tasks flakes), which are out of scope here because fixing them requires src/ changes.

Breaking Changes

None. Tests-only; no public interface is touched.

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)
Documentation update

Checklist

I have read the MCP Documentation
My code follows the repository's style guidelines
New and existing tests pass locally
I have added appropriate error handling
I have added or updated documentation as needed

Additional context

Draft while #2717 is in review. If #2717 is amended, the only coupling points are tests/interaction/transports/__init__.py (the re-export added here) and the small _bridge.py and conftest.py changes; everything else is independent of its internals. Once #2717 merges this can be retargeted to v1.x.

_{AI Disclaimer}

The session-level timeout test now runs on trio's virtual clock. At teardown the streamable-HTTP client abandons its httpx/httpx-sse response generators; trio's asyncgen finalizer warns about each one (asyncio finalizes abandoned generators silently at loop shutdown), and filterwarnings=error turns that into a test failure. Scope-ignore the two third-party generator signatures, and make the bridge's response body delegate to the memory stream's own iterator so the harness itself leaves no abandoned generator.

…rt point

Tests-only backport to v1.x; adapted from main commit b3025f9.

…ty tests The three tests that complete the initialize handshake leak anyio memory streams at transport teardown when run in process. The scoped filters in tests/interaction/conftest.py cover full-suite runs, but they only load when that package is collected, so a targeted run of this file alone turns the ResourceWarning into a failure under filterwarnings=error. Mirror the module-level filterwarnings marks already used in tests/shared/test_sse.py.

#2765) Tests-only backport to v1.x; adapted from main commit ed39e73.

sse-starlette <3.0 stores its exit Event on the AppStatus class the first time an EventSourceResponse runs; the Event is bound to that test's event loop and breaks every later in-process SSE response on the same worker. test_http_unicode.py serves all its requests as EventSourceResponses (json_response=False) but had no reset fixture, so on a sse-starlette<3.0 install (CI's lowest-direct legs) it could poison the worker for any later SSE-based test. - Copy the autouse AppStatus reset fixture into test_http_unicode.py. - Reset on both sides of the yield, here and in test_sse.py, so each module also survives a stale Event left behind by an earlier test. - Correct the filterwarnings comments in both files: the item-scoped markers cannot cover the GC flush at session cleanup, so isolated runs without xdist (-n 0) still exit nonzero after all tests pass.

…#2767) Tests-only backport to v1.x; adapted from main commit 19fe9fa.

…lters The module-level filterwarnings marks only apply while a test in this file is the active warning context, but v1's leaked memory streams sit in reference cycles and were garbage-collected later: at pytest's session-unconfigure unraisable sweep (every test in this file passed but pytest exited 1 when run without xdist, e.g. -n 0 for --pdb debugging), or during an unrelated test on the same worker, where the global filterwarnings = ["error"] turned the deallocator warning into a failure of that innocent test. An autouse fixture now runs gc.collect() in each test's teardown, so the deallocator warnings fire where the scoped ignores apply. Verified: every previously failing single-test and full-file -n 0 invocation now exits 0, and cross-file runs no longer misattribute the warnings.

…bal exit event tests/shared/test_streamable_http.py and tests/server/test_streamable_http_security.py serve responses in process; on sse-starlette <3.0 (what the lowest-direct CI legs install) the first EventSourceResponse binds AppStatus.should_exit_event to that test's event loop and every later SSE response in the module fails. Port the both-sides reset fixture from tests/client/test_http_unicode.py into both files. Also give the security module the same per-test gc.collect() teardown as test_streamable_http.py so leaked-stream warnings stay inside the module's scoped filters, and drop the comment claiming a -n 0 run exits nonzero after all tests pass: with the flush in place it exits 0.

tests/shared/test_sse.py and tests/client/test_http_unicode.py carry the same scoped MemoryObject*Stream filterwarnings marks as tests/shared/test_streamable_http.py and had the same gap: the leaked streams sit in reference cycles, so their deallocator warnings fired at pytest's session-unconfigure unraisable sweep, exiting 1 after all tests passed when run without xdist (-n 0). Both files documented this in their filter-mark comments. Apply the same autouse fixture that fixed test_streamable_http.py — a gc.collect() in each test's teardown, where the item-scoped ignores still apply — and update the comments to describe the fixed behavior. Verified: -n 0 runs of each file now exit 0 (previously 26 passed/exit 1 and 2 passed/exit 1), and five consecutive default-option runs of both files together pass.

Once an application sends the final http.response.body chunk, the bridge now drops any further http.response.start/body messages, matching what a real ASGI server's client observes. Starlette's request_response sends a trailing response when an endpoint's sub-application has already completed a rejection response, so the SSE security rejection tests no longer depend on scheduling for the client to read the first status. Pinned by two new bridge contract tests (registered as harness self-tests). Also widen the trio-leg unraisable-warning filter to the whole httpx/httpx-sse generator chain: abandoning EventSource.aiter_sse abandons the nested aiter_lines -> aiter_text -> aiter_bytes -> aiter_raw generators, and which link the finalizer reports depends on GC timing and Python version.

Statements after the connection loop sit in the Python 3.11 trace-loss shadow (python/cpython#106749) and were reported uncovered on 3.11 matrix cells; verifying inside the traced region removes the gap without a coverage exclusion.

The arc fires after the final async-with teardown, inside the same Python 3.11 trace-loss shadow as the previous commit; only the arc is excluded, the loop body stays measured.

The checks must stay after the streamable_http_client context exits so a teardown-time mutation cannot escape them, which on Python 3.11 places them in the trace-loss shadow of python/cpython#106749: they execute and assert on every matrix cell but go unmeasured on 3.11. Same convention as the suite's existing lax exclusions.

The TestChildProcessCleanup tests asserted that spawned writers had started after fixed sleeps (0.5s startup, 0.3s observation window). On loaded CI runners the child interpreter can take longer than that to boot, so the "child should be writing" assertion failed with "assert 0 > 0" before the cleanup logic under test ever ran. Replace the fixed sleeps with bounded polling: - _wait_for_first_write polls until a marker file has grown, proving the writer reached its write loop, with a 15s timeout. - _wait_for_writes_to_stop polls until two samples taken 0.3s apart (3x the writers' 0.1s write interval) observe the same size; if the file never stops growing the timeout fails the test, so a genuine cleanup failure is still reported. Also terminate the spawned process tree in each test's finally block, so a failed assertion can no longer leak a running process tree, and collect garbage before leaving each test so subprocess transports are finalized while the test's ResourceWarning filters are still active. Removing the unconditional sleeps also makes the tests faster.

… tests Two follow-up fixes to the child process cleanup tests: Require three consecutive stable samples before declaring writes stopped. The previous check exited on the first pair of samples 0.3s apart with no growth, retrying within the 15s budget, which made it easier for a CPU-starved (but alive) writer to be mistaken for a terminated one. The counter resets on any observed growth, and a file that never stops growing still fails the test via the timeout. Only re-terminate the process tree in the finally block if the test failed before reaching its own _terminate_process_tree call. The unconditional second call ran termination against an already-closed job object handle on Windows and logged a spurious fallback warning on POSIX in every passing run. The skipped branch only executes on failing runs, so it is excluded from coverage.

maxisbey added 15 commits June 11, 2026 14:52

Deflake the session-level timeout test with trio's virtual clock (#2788)

b880950

tests: re-export StreamingASGITransport as the sanctioned bridge impo…

8d73896

…rt point

Run transport security tests in process instead of over sockets (#2764)

6a6e7b7

Tests-only backport to v1.x; adapted from main commit b3025f9.

Run SSE and Unicode transport tests in process instead of over sockets (

92c4fe0

#2765) Tests-only backport to v1.x; adapted from main commit ed39e73.

Run StreamableHTTP transport tests in process instead of over sockets (…

43cf9ec

…#2767) Tests-only backport to v1.x; adapted from main commit 19fe9fa.

Exclude the connection loop's exit arc from branch coverage

3e53636

The arc fires after the final async-with teardown, inside the same Python 3.11 trace-loss shadow as the previous commit; only the arc is excluded, the loop body stays measured.

maxisbey marked this pull request as ready for review June 11, 2026 15:02

maxisbey added 2 commits June 11, 2026 15:56

felixweinberger approved these changes Jun 11, 2026

View reviewed changes

maxisbey merged commit a3234b6 into maxisbey/v1x-interaction-backport Jun 11, 2026
19 checks passed

maxisbey deleted the maxisbey/v1x-tests-deflake branch June 11, 2026 16:15

maxisbey mentioned this pull request Jun 11, 2026

[v1.x] Deflake the child process cleanup tests #2839

Merged

9 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[v1.x] Backport transport test deflakes from main (tests-only) - #2837

[v1.x] Backport transport test deflakes from main (tests-only)#2837
maxisbey merged 17 commits into
maxisbey/v1x-interaction-backportfrom
maxisbey/v1x-tests-deflake

maxisbey commented Jun 11, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

maxisbey commented Jun 11, 2026

Motivation and Context

How Has This Been Tested?

Breaking Changes

Types of changes

Checklist

Additional context

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants