[v1.x] Backport transport test deflakes from main (tests-only)#2837
Merged
maxisbey merged 17 commits intoJun 11, 2026
Conversation
The session-level timeout test now runs on trio's virtual clock. At teardown the streamable-HTTP client abandons its httpx/httpx-sse response generators; trio's asyncgen finalizer warns about each one (asyncio finalizes abandoned generators silently at loop shutdown), and filterwarnings=error turns that into a test failure. Scope-ignore the two third-party generator signatures, and make the bridge's response body delegate to the memory stream's own iterator so the harness itself leaves no abandoned generator.
Tests-only backport to v1.x; adapted from main commit b3025f9.
…ty tests The three tests that complete the initialize handshake leak anyio memory streams at transport teardown when run in process. The scoped filters in tests/interaction/conftest.py cover full-suite runs, but they only load when that package is collected, so a targeted run of this file alone turns the ResourceWarning into a failure under filterwarnings=error. Mirror the module-level filterwarnings marks already used in tests/shared/test_sse.py.
sse-starlette <3.0 stores its exit Event on the AppStatus class the first time an EventSourceResponse runs; the Event is bound to that test's event loop and breaks every later in-process SSE response on the same worker. test_http_unicode.py serves all its requests as EventSourceResponses (json_response=False) but had no reset fixture, so on a sse-starlette<3.0 install (CI's lowest-direct legs) it could poison the worker for any later SSE-based test. - Copy the autouse AppStatus reset fixture into test_http_unicode.py. - Reset on both sides of the yield, here and in test_sse.py, so each module also survives a stale Event left behind by an earlier test. - Correct the filterwarnings comments in both files: the item-scoped markers cannot cover the GC flush at session cleanup, so isolated runs without xdist (-n 0) still exit nonzero after all tests pass.
…lters The module-level filterwarnings marks only apply while a test in this file is the active warning context, but v1's leaked memory streams sit in reference cycles and were garbage-collected later: at pytest's session-unconfigure unraisable sweep (every test in this file passed but pytest exited 1 when run without xdist, e.g. -n 0 for --pdb debugging), or during an unrelated test on the same worker, where the global filterwarnings = ["error"] turned the deallocator warning into a failure of that innocent test. An autouse fixture now runs gc.collect() in each test's teardown, so the deallocator warnings fire where the scoped ignores apply. Verified: every previously failing single-test and full-file -n 0 invocation now exits 0, and cross-file runs no longer misattribute the warnings.
…bal exit event tests/shared/test_streamable_http.py and tests/server/test_streamable_http_security.py serve responses in process; on sse-starlette <3.0 (what the lowest-direct CI legs install) the first EventSourceResponse binds AppStatus.should_exit_event to that test's event loop and every later SSE response in the module fails. Port the both-sides reset fixture from tests/client/test_http_unicode.py into both files. Also give the security module the same per-test gc.collect() teardown as test_streamable_http.py so leaked-stream warnings stay inside the module's scoped filters, and drop the comment claiming a -n 0 run exits nonzero after all tests pass: with the flush in place it exits 0.
tests/shared/test_sse.py and tests/client/test_http_unicode.py carry the same scoped MemoryObject*Stream filterwarnings marks as tests/shared/test_streamable_http.py and had the same gap: the leaked streams sit in reference cycles, so their deallocator warnings fired at pytest's session-unconfigure unraisable sweep, exiting 1 after all tests passed when run without xdist (-n 0). Both files documented this in their filter-mark comments. Apply the same autouse fixture that fixed test_streamable_http.py — a gc.collect() in each test's teardown, where the item-scoped ignores still apply — and update the comments to describe the fixed behavior. Verified: -n 0 runs of each file now exit 0 (previously 26 passed/exit 1 and 2 passed/exit 1), and five consecutive default-option runs of both files together pass.
Once an application sends the final http.response.body chunk, the bridge now drops any further http.response.start/body messages, matching what a real ASGI server's client observes. Starlette's request_response sends a trailing response when an endpoint's sub-application has already completed a rejection response, so the SSE security rejection tests no longer depend on scheduling for the client to read the first status. Pinned by two new bridge contract tests (registered as harness self-tests). Also widen the trio-leg unraisable-warning filter to the whole httpx/httpx-sse generator chain: abandoning EventSource.aiter_sse abandons the nested aiter_lines -> aiter_text -> aiter_bytes -> aiter_raw generators, and which link the finalizer reports depends on GC timing and Python version.
Statements after the connection loop sit in the Python 3.11 trace-loss shadow (python/cpython#106749) and were reported uncovered on 3.11 matrix cells; verifying inside the traced region removes the gap without a coverage exclusion.
The arc fires after the final async-with teardown, inside the same Python 3.11 trace-loss shadow as the previous commit; only the arc is excluded, the loop body stays measured.
The checks must stay after the streamable_http_client context exits so a teardown-time mutation cannot escape them, which on Python 3.11 places them in the trace-loss shadow of python/cpython#106749: they execute and assert on every matrix cell but go unmeasured on 3.11. Same convention as the suite's existing lax exclusions.
The TestChildProcessCleanup tests asserted that spawned writers had started after fixed sleeps (0.5s startup, 0.3s observation window). On loaded CI runners the child interpreter can take longer than that to boot, so the "child should be writing" assertion failed with "assert 0 > 0" before the cleanup logic under test ever ran. Replace the fixed sleeps with bounded polling: - _wait_for_first_write polls until a marker file has grown, proving the writer reached its write loop, with a 15s timeout. - _wait_for_writes_to_stop polls until two samples taken 0.3s apart (3x the writers' 0.1s write interval) observe the same size; if the file never stops growing the timeout fails the test, so a genuine cleanup failure is still reported. Also terminate the spawned process tree in each test's finally block, so a failed assertion can no longer leak a running process tree, and collect garbage before leaving each test so subprocess transports are finalized while the test's ResourceWarning filters are still active. Removing the unconditional sleeps also makes the tests faster.
… tests Two follow-up fixes to the child process cleanup tests: Require three consecutive stable samples before declaring writes stopped. The previous check exited on the first pair of samples 0.3s apart with no growth, retrying within the 15s budget, which made it easier for a CPU-starved (but alive) writer to be mistaken for a terminated one. The counter resets on any observed growth, and a file that never stops growing still fails the test via the timeout. Only re-terminate the process tree in the finally block if the test failed before reaching its own _terminate_process_tree call. The unconditional second call ran termination against an already-closed job object handle on Windows and logged a spurious fallback warning on POSIX in every passing run. The skipped branch only executes on failing runs, so it is excluded from coverage.
felixweinberger
approved these changes
Jun 11, 2026
9 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Backports the transport test deflaking work from main (#2764, #2765, #2767, #2788) to v1.x. Tests-only: the diff touches nothing under
src/.Stacked on #2717 (
maxisbey/v1x-interaction-backport), which supplies the in-process streaming bridge these tests are built on.Motivation and Context
The v1.x transport tests still run over real sockets with uvicorn servers in threads or subprocesses, and CI pays for it: ephemeral port races, server startup polling, and a real-clock session-timeout bet that loses under scheduler load. main fixed all of this by moving the tests in process; this PR ports those rewrites to v1.x so the v1.x matrix stops flaking on them.
Two of the rewritten files have direct v1.x CI evidence (
test_streamable_http_multiple_reconnectionsandtest_json_responsefailures on Windows legs); the rest share the same socket-based harness and mechanism. The session-timeout deflake matters for #2717 itself: the suite that PR backports still carries the pre-#2788 real-clock test, which fails roughly 1 in 3 full-suite runs under coverage load, once per transport leg. Folding the first three commits here into #2717 instead would work equally well.Adaptations made for v1.x, each noted in the commit that makes it:
src/delta from Run StreamableHTTP transport tests in process instead of over sockets #2767 (a pragma removal and a stale docstring note) is dropped; v1.x has no strict pragma checking, so the retained pragma is harmless.streamablehttp_clientalias, deprecated transport params) are re-added in in-process form, so no existing v1.x test is lost.gc.collect()so isolated runs stay clean).How Has This Been Tested?
test_session_level_timeout_applies_to_every_request; with the virtual-clock version the failure has never reproduced (28 full-suite runs plus 150 isolated repetitions).src/changes.Breaking Changes
None. Tests-only; no public interface is touched.
Types of changes
Checklist
Additional context
Draft while #2717 is in review. If #2717 is amended, the only coupling points are
tests/interaction/transports/__init__.py(the re-export added here) and the small_bridge.pyandconftest.pychanges; everything else is independent of its internals. Once #2717 merges this can be retargeted tov1.x.AI Disclaimer