Skip to content

feat: sync foundation for asyncio support (phase 1 of #1251)#1252

Merged
karenc-bq merged 18 commits into
aws:mainfrom
AhmadMasry:feat/sync-phase1
Jul 1, 2026
Merged

feat: sync foundation for asyncio support (phase 1 of #1251)#1252
karenc-bq merged 18 commits into
aws:mainfrom
AhmadMasry:feat/sync-phase1

Conversation

@AhmadMasry

Copy link
Copy Markdown
Contributor

Phase 1 of the #1251 async-support contribution, split per maintainer request into sync-first then async. Sync-side changes + Python 3.14 only; the aws_advanced_python_wrapper.aio subpackage, async SQLAlchemy dialects, and async tests land in phase 2.

  • Sync-side behavioral changes flagged in Async (asyncio) support for the wrapper — a community contribution we'd love your guidance on #1251: cross-thread socket-shutdown on execute-timeout (env-4 SIGSEGV interrupt-and-wait) in the driver dialects
    • decorators.timeout; transient-connect exception classification; failover connection retry logic (RetryUtil), incl. the *_OR_WRITER writer fallback.
  • Sync SQLAlchemy dialects: postgresql+aws_wrapper_psycopg and mysql+aws_wrapper_mysqlconnector (sqlalchemy_dialects/), replacing the old sqlalchemy/mysql_orm_dialect.py; DBAPI-shim restructure (_dbapi.install).
  • Python 3.14: version classifier, requires-python, CI matrix, integration harness PYTHON_3_14 + test-python-3.14-{pg,mysql} tasks; sync dependency bumps (SQLAlchemy/psycopg/mysql-connector/boto3/requests).
  • Tests: unit + sync SA integration tests, plus a new sync RetryUtil writer-fallback regression test.

/verify green: mypy (237 files), flake8, isort, 1114 unit tests on Python 3.14.

Description

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

Phase 1 of the aws#1251 async-support contribution, split per maintainer request
into sync-first then async. This branch carries the sync-side changes and
Python 3.14 support only; the aws_advanced_python_wrapper.aio subpackage, the
async SQLAlchemy dialects, and the async test suites land in phase 2.

- Sync-side behavioral changes flagged in aws#1251: cross-thread socket-shutdown
  on execute-timeout (the env-4 SIGSEGV interrupt-and-wait) in the driver
  dialects + decorators.timeout; transient-connect exception classification;
  and the failover connection retry logic (RetryUtil), incl. the *_OR_WRITER
  reader-then-writer fallback.
- Sync SQLAlchemy dialects: postgresql+aws_wrapper_psycopg and
  mysql+aws_wrapper_mysqlconnector (sqlalchemy_dialects/), replacing the old
  sqlalchemy/mysql_orm_dialect.py; DBAPI-shim restructure (_dbapi.install).
- Python 3.14: version classifier, CI matrix, integration harness PYTHON_3_14
  + test-python-3.14-{pg,mysql} tasks; sync dependency bumps.
- Tests: unit + sync SQLAlchemy integration tests, plus a sync RetryUtil
  writer-fallback regression test.

pyproject uses the [tool.poetry] metadata format (Poetry 1.8.2-compatible, as
on main). The sync surface carries no async/aiomysql/greenlet references.
greenlet 3.1.1 ships no cp314 wheel, so the Python 3.14 unit-CI job
(3.14 added to main.yml in 6817b96) built it from source and failed
against CPython 3.14's internal frame API (_PyInterpreterFrame,
c_recursion_remaining, Py_C_RECURSION_LIMIT). greenlet 3.5.2 provides
cp314 wheels. Pulled transitively via SQLAlchemy on x86_64; lock-only
change, lock-version unchanged (2.1).
Comment thread aws_advanced_python_wrapper/utils/decorators.py Outdated
…abort_connection

Addresses PR aws#1252 review feedback: the timeout decorators reached into driver
internals via a generic _shutdown_connection_socket helper, duplicating the
socket-shutdown that PgDriverDialect.abort_connection already does and leaking
the driver abstraction into utils/decorators.py.

- mysql_driver_dialect.py: implement abort_connection (was an
  UnsupportedOperationError stub) -- shut down conn._socket.sock (SHUT_RDWR) for
  the pure-Python connector, best-effort no-op for the C extension, WITHOUT
  freeing the connection (the owning thread closes it).
- utils/decorators.py: drop _shutdown_connection_socket; call
  driver_dialect.abort_connection(conn) directly in
  preserve_transaction_status_with_timeout and timeout. timeout() gains a
  driver_dialect parameter.
- driver_dialect.py: the sole timeout() call site now passes self.
- tests: add MySQLDriverDialect.abort_connection unit tests.
AhmadMasry added a commit to AhmadMasry/aws-advanced-python-wrapper that referenced this pull request Jun 23, 2026
…abort_connection

Addresses PR aws#1252 review feedback: the timeout decorators reached into driver
internals via a generic _shutdown_connection_socket helper, duplicating the
socket-shutdown that PgDriverDialect.abort_connection already does and leaking
the driver abstraction into utils/decorators.py.

- mysql_driver_dialect.py: implement abort_connection (was an
  UnsupportedOperationError stub) -- shut down conn._socket.sock (SHUT_RDWR) for
  the pure-Python connector, best-effort no-op for the C extension, WITHOUT
  freeing the connection (the owning thread closes it).
- utils/decorators.py: drop _shutdown_connection_socket; call
  driver_dialect.abort_connection(conn) directly in
  preserve_transaction_status_with_timeout and timeout. timeout() gains a
  driver_dialect parameter.
- driver_dialect.py: the sole timeout() call site now passes self.
- tests: add MySQLDriverDialect.abort_connection unit tests.
@karenc-bq

Copy link
Copy Markdown
Contributor

Not part of the modified files but we also need to implement __getattr__ for AwsWrapperConnection and AwsWrapperCursor in wrapper.py:

    def __getattr__(self, name: str) -> Any:
        return getattr(self._plugin_service.current_connection, name)

and

    def __getattr__(self, name: str) -> Any:
        return getattr(self._target_cursor, name)

SQLAlchemy sometimes calls driver--specific extension methods that are not exposed by the Wrapper's PEP-249 API, such as conn.add_notice_handler(_log_notices)

Addresses PR aws#1252 review: SQLAlchemy (and application code) call driver-specific
extension methods that aren't on the wrapper's PEP-249 surface, e.g. psycopg's
conn.add_notice_handler / cursor.statusmessage. AwsWrapperConnection and
AwsWrapperCursor now delegate unknown attributes to the underlying driver
connection (the live one, via current_connection) and cursor respectively, so
those extensions work transparently.

__getattr__ runs only on a normal-lookup miss, so it never shadows the wrapper's
own API. Underscore names are not delegated: that keeps Python internals
(pickle/copy dunders) on the wrapper and prevents infinite recursion if an
internal attribute (e.g. _plugin_service) is read before it is set. Adds unit
tests for delegation and the underscore guard.
…rror CI)

opentelemetry-api 1.42.x's _importlib_metadata shim handles Python 3.10/3.11's
SelectableGroups by calling eps.values(), which raises importlib.metadata's
"SelectableGroups dict interface is deprecated" DeprecationWarning. Under CI's
`pytest ./tests/unit -Werror` that warning becomes an error during conftest
import, so the entire unit suite fails to collect on Python 3.10/3.11.

opentelemetry 1.43.0 fixes this upstream (uses dict.values(eps) to avoid the
deprecated SelectableGroups interface). Bump the opentelemetry-api/sdk/exporter
constraints to ^1.43.0 and refresh the lock (only the opentelemetry packages
change). Verified locally: `pytest ./tests/unit -Werror` passes on both
Python 3.10 and 3.14.
@AhmadMasry

Copy link
Copy Markdown
Contributor Author

Not part of the modified files but we also need to implement __getattr__ for AwsWrapperConnection and AwsWrapperCursor in wrapper.py:

    def __getattr__(self, name: str) -> Any:
        return getattr(self._plugin_service.current_connection, name)

and

    def __getattr__(self, name: str) -> Any:
        return getattr(self._target_cursor, name)

SQLAlchemy sometimes calls driver--specific extension methods that are not exposed by the Wrapper's PEP-249 API, such as conn.add_notice_handler(_log_notices)

Done — added getattr to both AwsWrapperConnection and AwsWrapperCursor so driver-specific extension methods (e.g. psycopg's add_notice_handler) delegate to the underlying connection/cursor; underscore names are excluded to keep the wrapper's internals off the delegation path and avoid recursion. Added unit tests.
I also pushed a fix for the unrelated 3.10/3.11 unit-test failure — opentelemetry 1.42.x raises a SelectableGroups DeprecationWarning that -Werror turns into a conftest import error; bumped to 1.43.0.

@karenc-bq

Copy link
Copy Markdown
Contributor

Thank you for addressing the PR comments. We have ran the integration tests against the changes for verification as well.
To ensure we correctly exclude version 3.14 when running the tests against other Python versions, could you please include these changes as well: e22fdbc

Comment thread aws_advanced_python_wrapper/sql_alchemy_connection_provider.py Outdated
Mirror upstream e22fdbc: add the exclude-python-3-14 system property and the
corresponding PYTHON_3_14 skip guard in the test environment provider, matching
the existing exclude-python-3-12/13 pattern. Lets the integration harness
correctly exclude Python 3.14 when running the matrix against other Python
versions.
@AhmadMasry

Copy link
Copy Markdown
Contributor Author

Thank you for addressing the PR comments. We have ran the integration tests against the changes for verification as well. To ensure we correctly exclude version 3.14 when running the tests against other Python versions, could you please include these changes as well: e22fdbc

Pushed, and thank you for your feedback

…rker can't be drained

Topology/writer-id queries offloaded by preserve_transaction_status_with_timeout
(and timeout) are interrupted via abort_connection on timeout, then drained before
the timeout propagates -- so the caller never closes the connection out from under a
worker still executing on it (a cross-thread PQfinish/close = use-after-free SIGSEGV).

The drain was bounded and best-effort: when abort_connection cannot interrupt the
worker -- it shuts the socket for psycopg and the mysql pure-Python connector, but is
a NO-OP for the mysql C-extension -- the drain timed out, was swallowed by a blanket
except, and the caller closed the connection while the worker was still mid-query.
Reproduced as `Fatal Python error: Segmentation fault` on PG multi-instance failover
(py3.10) under a reader-connectivity storm: ClusterTopologyMonitor teardown closed a
reader connection while an offloaded _query_for_topology worker was in cursor.execute.

Fix: if the post-abort drain still times out, mark the connection abandoned (a
WeakSet registry); connection-close paths (ClusterTopologyMonitor._close_connection)
skip abandoned connections. The still-running worker holds the last reference, so the
connection is finalized safely on the worker's own thread once it finishes. A
connection leaked until GC is strictly better than a process crash (close-during-query)
or a hang (waiting forever for an uninterruptible worker).
Per maintainer review on PR aws#1252: revert _get_connection_func to the original
target_connect_func passthrough. Transient connect errors are better handled at
the plugin level (initial-connection / failover plugin) -- the provider-level
retry overlaps with failover's own error classification and, for a permanently
unavailable host, delays failover to a healthy instance.

Removes the now-unused transient_connect helper module, the
CONNECTION_RETRY_MAX_ATTEMPTS / CONNECTION_RETRY_MAX_BACKOFF_S properties, and the
transient_connect unit tests (all PR-new, not on main). _get_connection_func is
now byte-identical to main.
AhmadMasry added a commit to AhmadMasry/aws-advanced-python-wrapper that referenced this pull request Jun 26, 2026
Per maintainer review on PR aws#1252: revert the connection-provider transient
retry to the original passthrough. Transient connect errors are better handled
at the plugin level (initial-connection / failover plugin) -- the provider-level
retry overlaps with failover's own error classification and, for a permanently
unavailable host, delays failover to a healthy instance.

Reverts _get_connection_func in both the sync SqlAlchemyPooledConnectionProvider
and the async AsyncPooledConnectionProvider, and removes the
CONNECTION_RETRY_MAX_ATTEMPTS / CONNECTION_RETRY_MAX_BACKOFF_S properties. The
transient_connect classifier module is retained -- it is still used by the Django
backend and the integration-test harness (is_transient_connect_error), which are
out of scope for this revert.
…etattr__

Mirror the async-wrapper fix (e75c9cf) on the sync wrapper for parity: the
__getattr__ guard rejected ALL underscore names, which would block single-
underscore driver methods (e.g. psycopg's _close, which SQLAlchemy's adapter
calls) if a sync caller ever reached for one. Narrow both the connection and
cursor guards to block only dunders and the recursion-critical internal field
each method dereferences (_plugin_service / _target_cursor); delegate everything
else, including single-underscore driver attributes. Update the delegation unit
tests to the corrected contract.
AhmadMasry added a commit to AhmadMasry/aws-advanced-python-wrapper that referenced this pull request Jun 27, 2026
…ain)

The transient-connect retry in get_new_connection was added by us, not the
upstream AWS implementation -- main's Django backend is a plain
AwsWrapperConnection.connect() with no retry. Per the maintainer's guidance that
transient connect errors belong at the plugin level rather than duplicated where
they overlap with failover (PR aws#1252), and our convention that sync code follows
the AWS reference, revert the backend to main's exact version. The
transient_connect classifier is retained for now -- still used only by the
integration-test harness (is_transient_connect_error) for setup resilience.
…gin chain

psycopg exposes conn.execute(), conn.set_read_only(), conn.set_autocommit() as
native methods. Without explicit wrapper versions, __getattr__ forwards them raw
to the driver -- bypassing the plugin pipeline (failover / read-write-splitting)
and the session-state service. The practical bug: a read_only/autocommit change
made via the method form is lost on the post-failover reconnect, and
conn.execute() SQL is invisible to plugins. Add plugin-routed versions (execute
opens a wrapper cursor; set_read_only/set_autocommit go through the existing
plugin-aware property setters).

Also add the remaining psycopg-shaped passthroughs the sync wrapper was missing:
invalidate (pool-evict, used by the aurora connection tracker), closed (SA's
psycopg dialect probes it directly), and prepare_threshold / prepared_max
(psycopg connection settings whose setters __getattr__ cannot forward).

These were present on feat/async-parity's sync wrapper but absent here, so the
gap was latent on the phase-1 branch.
mypy (no_implicit_optional) rejected ``abort_releases: threading.Event = None``;
the parameter is genuinely optional, so type it Optional[threading.Event].
DefaultPlugin.execute did not pass the connection an operation runs on into
driver_dialect.execute, so on a timeout the worker thread kept running the query
on a connection a later close/reuse could race -- the env-4 execute-path SIGSEGV.
Resolve target to its connection via get_connection_from_obj and pass it as
conn= (driver_dialect.execute already accepts it, from the abort_connection
work). The supporting wiring was present on this branch but unused.

Also add the PgDriverDialect.AbortConnectionShutdownFailed message key that
pg_driver_dialect.abort_connection logs: it was referenced but missing from the
bundle, so that error path would raise NotInResourceBundleError.
The property was defined on this branch but nothing consumes it -- the
read-write-splitting recheck logic that reads it lives only on the async-parity
feature branch. Drop the orphaned definition; it ships with the RWS recheck
feature, not the phase-1 foundation.
After the transient-retry revert dropped `Any` from the typing import, the
remaining names fit on a single line but were left wrapped across two, so the
full-repo `isort --check-only .` that CI runs (PR aws#1252) failed -- which skipped
the unit-test job on every Python version. Collapse to one line. /verify green
(mypy, flake8, isort, pytest -Werror).
Complete the env-4 execute-path SIGSEGV coverage already on this branch
(default_plugin's conn=): pass the operation's connection into
driver_dialect.execute on the host-monitoring (v1/v2) liveness probe and the
limitless query helper, so a timeout can interrupt-and-wait the right connection
instead of leaving it racing a later close/reuse. driver_dialect.execute already
accepts conn= (from the abort_connection work); these three call sites just
weren't passing it.
Open an [Unreleased] section with Python 3.14 support (Added) and the env-4
execute-timeout cross-thread use-after-free fix (Fixed).

@karenc-bq karenc-bq left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for all the improvements your team is making to the wrapper. Just a few more comments, otherwise this is ready to be merged.

Comment thread docs/examples/MySQLSQLAlchemyReadWriteSplitting.py Outdated
Comment thread docs/examples/PGSQLAlchemyFailover.py Outdated
Comment thread docs/examples/PGSQLAlchemyFailover.py
Comment thread docs/examples/MySQLSQLAlchemyFailover.py Outdated
Comment thread docs/using-the-python-wrapper/SqlAlchemySupport.md Outdated
- Creator= examples now use the wrapper's custom dialects:
  postgresql+aws_wrapper_psycopg (required for PG -- the stock psycopg dialect's
  TypeInfo.fetch() raises "TypeError: expected Connection or AsyncConnection, got
  AwsWrapperConnection" on the wrapper proxy) and mysql+aws_wrapper_mysqlconnector
  (consistency / future-proofing).
- Add cluster_id to the PG failover example.
- Remove the read/write-splitting SQLAlchemy examples + references: RWS is not
  supported with SQLAlchemy (no routing of execution_options readonly to the
  wrapper's read_only attribute).
- Restore the Plugin Compatibility table, marking read_write_splitting and srw
  unsupported, with a pointer to SQLAlchemy session binding.
@karenc-bq karenc-bq merged commit 465901f into aws:main Jul 1, 2026
7 checks passed
AhmadMasry added a commit to AhmadMasry/aws-advanced-python-wrapper that referenced this pull request Jul 2, 2026
…ections in tracker

Follow-up to the sync foundation PR aws#1252 (phase 1 of aws#1251): sync-side
hardening of the failover paths, intended to land before the async phase
(phase 2) is introduced.

Pass conn= to driver_dialect.execute() in the failover plugins' rollback/close
paths so a timed-out operation can abort its socket and drain its worker.
PR aws#1252 introduced this abort-and-drain mechanism and applied it to the
host-monitoring and limitless paths; this completes the failover paths.

In the connection tracker, prefer PoolProxiedConnection.invalidate() over
close() for pool-proxied connections (wrapper-internal connection pools):
close() checks the connection back into the pool, which runs an unbounded
rollback-on-return against the failed writer and re-pools the connection if
that rollback succeeds. invalidate() skips the rollback and discards the
connection so the pool opens a fresh one on next checkout.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
AhmadMasry added a commit to AhmadMasry/aws-advanced-python-wrapper that referenced this pull request Jul 2, 2026
…ections in tracker

Follow-up to the sync foundation PR aws#1252 (phase 1 of aws#1251): sync-side
hardening of the failover paths, intended to land before the async phase
(phase 2) is introduced.

Pass conn= to driver_dialect.execute() in the failover plugins' rollback/close
paths so a timed-out operation can abort its socket and drain its worker.
PR aws#1252 introduced this abort-and-drain mechanism and applied it to the
host-monitoring and limitless paths; this completes the failover paths.

In the connection tracker, prefer PoolProxiedConnection.invalidate() over
close() for pool-proxied connections (wrapper-internal connection pools):
close() checks the connection back into the pool, which runs an unbounded
rollback-on-return against the failed writer and re-pools the connection if
that rollback succeeds. invalidate() skips the rollback and discards the
connection so the pool opens a fresh one on next checkout.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants