fix: thread conn= into failover timeout paths; invalidate pooled connections in tracker #1255

Merged
karenc-bq merged 1 commit into aws:main from AhmadMasry:fix/failover-abort-and-tracker-invalidate on Jul 2, 2026
AhmadMasry (Contributor) commented on Jul 2, 2026

Description

Follow-up to #1252 (sync foundation for asyncio support, phase 1 of #1251). This is sync-side hardening of the failover paths, intended to land before the async phase (phase 2) is introduced.

Two changes:

  1. Thread conn= into the failover plugins' timeout paths. #1252 (feat: sync foundation for asyncio support, phase 1 of #1251) introduced the abort-and-drain mechanism for timed-out offloaded operations: on timeout, it shuts down the operation's socket via driver_dialect.abort_connection and waits for its worker thread, so a later close or reuse of the connection cannot race the still-running operation. That PR applied the mechanism to the host-monitoring and limitless paths. This PR passes conn= to driver_dialect.execute() at the failover plugins' rollback/close call sites (failover_plugin.py, failover_v2_plugin.py), completing the mechanism for the failover paths.

  2. Prefer PoolProxiedConnection.invalidate() over close() in the connection tracker. When wrapper-internal connection pools (SqlAlchemyPooledConnectionProvider) are enabled, the connections tracked by OpenedConnectionTracker are SQLAlchemy pool proxies. Calling close() on a proxy checks it back into the pool. The check-in first runs rollback-on-return against the failed writer, which is an unbounded blocking call on the tracker thread when the host is unreachable, and then re-pools the connection if that rollback happens to succeed (e.g. the old writer came back as a reader). invalidate() (public SQLAlchemy API) skips the rollback and discards the connection, so the pool opens a fresh connection on the next checkout. Plain driver connections keep the existing close() behavior.
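For readers unfamiliar with the abort-and-drain idea in item 1, here is a minimal standard-library sketch of the pattern. It is not the wrapper's code: `FakeConnection`, `execute_with_abort`, and the `drain_sec` parameter are hypothetical names, and a local socket pair stands in for a connection to an unreachable host.

```python
import socket
import threading
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


class FakeConnection:
    """Hypothetical stand-in for a driver connection backed by a socket."""

    def __init__(self):
        # Nothing is ever written to `_peer`, so recv() on `sock` blocks,
        # much like a rollback sent to an unreachable host.
        self.sock, self._peer = socket.socketpair()
        self.worker_finished = threading.Event()

    def rollback(self):
        try:
            self.sock.recv(1)  # blocks until data arrives or the socket is shut down
        finally:
            self.worker_finished.set()

    def abort(self):
        # Shutting the socket down wakes up the thread blocked in recv().
        self.sock.shutdown(socket.SHUT_RDWR)

    def close(self):
        self.sock.close()
        self._peer.close()


def execute_with_abort(executor, conn, operation, timeout_sec, drain_sec=5.0):
    """Run `operation` on a worker; on timeout, abort `conn` and drain the worker."""
    future = executor.submit(operation)
    try:
        return future.result(timeout=timeout_sec)
    except FutureTimeout:
        conn.abort()  # abort: unblock the worker by shutting down its socket
        try:
            future.result(timeout=drain_sec)  # drain: wait for the worker to return
        except Exception:
            pass  # the aborted operation may fail; it only needs to be finished
        raise TimeoutError(f"operation exceeded {timeout_sec}s") from None


def demo():
    conn = FakeConnection()
    with ThreadPoolExecutor(max_workers=1) as executor:
        try:
            execute_with_abort(executor, conn, conn.rollback, timeout_sec=0.1)
            timed_out = False
        except TimeoutError:
            timed_out = True
    drained = conn.worker_finished.is_set()
    conn.close()  # safe now: no worker is still using the socket
    return timed_out, drained
```

The point of passing the connection into the timeout path is visible in `execute_with_abort`: without `conn`, the caller could only give up waiting, leaving the worker blocked on a socket that a later close or reuse would then race.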

Unit tests cover both the invalidate-vs-close dispatch and the end-to-end pool behavior against a real QueuePool (connection discarded without rollback-on-return, fresh connection on next checkout).
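The two behaviors these tests check can be reproduced outside the test suite. The following is a sketch under stated assumptions, not the PR's test code: `FakeDbapiConnection`, `close_or_invalidate`, and `demo` are hypothetical names, the duck-typed check for `invalidate` is one possible way to dispatch (the tracker may detect proxies differently), and the pool uses SQLAlchemy's default rollback-on-return setting.

```python
from sqlalchemy.pool import QueuePool


class FakeDbapiConnection:
    """Hypothetical DBAPI connection that records what the pool does to it."""

    def __init__(self):
        self.rollbacks = 0
        self.closed = False

    def rollback(self):
        self.rollbacks += 1

    def close(self):
        self.closed = True


def close_or_invalidate(conn):
    """Discard pool proxies via invalidate(); close plain driver connections."""
    invalidate = getattr(conn, "invalidate", None)
    if callable(invalidate):
        invalidate()
    else:
        conn.close()


def demo():
    created = []

    def creator():
        raw = FakeDbapiConnection()
        created.append(raw)
        return raw

    pool = QueuePool(creator, pool_size=1, max_overflow=0)

    # Path 1: invalidate a pool proxy.
    proxy = pool.connect()
    close_or_invalidate(proxy)
    rollbacks_on_invalidate = created[0].rollbacks

    # The next checkout has to build a new raw connection.
    proxy = pool.connect()
    created_after_invalidate = len(created)

    # Path 2: a plain close() checks the proxy back in and resets it.
    proxy.close()
    rollbacks_on_close = created[1].rollbacks
    proxy = pool.connect()
    created_after_close = len(created)
    proxy.close()

    # Path 3: a plain driver connection has no invalidate(), so it is closed.
    plain = FakeDbapiConnection()
    close_or_invalidate(plain)

    return {
        "rollbacks_on_invalidate": rollbacks_on_invalidate,
        "created_after_invalidate": created_after_invalidate,
        "rollbacks_on_close": rollbacks_on_close,
        "created_after_close": created_after_close,
        "plain_closed": plain.closed,
    }
```

Comparing `created_after_invalidate` with `created_after_close` shows the difference the PR relies on: after `invalidate()` the pool calls `creator` again, whereas after `close()` the same raw connection is handed out on the next checkout.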

Proposed CHANGELOG entry (under ### :bug: Fixed):

Failover plugins now pass the target connection into the execute-timeout path so timed-out rollback/close operations abort their socket and drain their worker; the Aurora connection tracker now invalidates (rather than re-pools) connections from wrapper-internal connection pools after writer failover. (PR #1255)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

…ections in tracker

Follow-up to the sync foundation PR aws#1252 (phase 1 of aws#1251): sync-side
hardening of the failover paths, intended to land before the async phase
(phase 2) is introduced.

Pass conn= to driver_dialect.execute() in the failover plugins' rollback/close
paths so a timed-out operation can abort its socket and drain its worker.
PR aws#1252 introduced this abort-and-drain mechanism and applied it to the
host-monitoring and limitless paths; this completes the failover paths.

In the connection tracker, prefer PoolProxiedConnection.invalidate() over
close() for pool-proxied connections (wrapper-internal connection pools):
close() checks the connection back into the pool, which runs an unbounded
rollback-on-return against the failed writer and re-pools the connection if
that rollback succeeds. invalidate() skips the rollback and discards the
connection so the pool opens a fresh one on next checkout.
karenc-bq merged commit d8e210a into aws:main on Jul 2, 2026
7 checks passed
AhmadMasry deleted the fix/failover-abort-and-tracker-invalidate branch on July 3, 2026 16:08