Skip to content

fix: six small issues found during the sync/async parity review#1256

Open
AhmadMasry wants to merge 1 commit into
aws:mainfrom
AhmadMasry:fix/sync-findings-from-async-parity-review
Open

fix: six small issues found during the sync/async parity review#1256
AhmadMasry wants to merge 1 commit into
aws:mainfrom
AhmadMasry:fix/sync-findings-from-async-parity-review

Conversation

@AhmadMasry

@AhmadMasry AhmadMasry commented Jul 3, 2026

Copy link
Copy Markdown
Contributor

Description

While preparing the async (asyncio) phase of #1251, we ran a systematic sync/async parity review — comparing each module of the upcoming async port against its sync counterpart line by line, with the sync implementation as the source of truth. Along the way the review surfaced a handful of small pre-existing issues in the sync code. This PR fixes the six of them we could verify, each with a focused regression test. They are all small and independent; happy to split or drop any of them if you'd prefer.

  1. Blue/Green: two message keys referenced by logger calls are missing from the bundle. SuspendConnectRouting.WaitConnectUntilCorrespondingHostFound is logged with a format argument (blue_green_plugin.py:504); Logger.debug's formatted path has no missing-key guard, so with DEBUG logging enabled a switchover in the suspend-until-corresponding-host path raises NotInResourceBundleError. BlueGreenStatusProvider.AllGreenHostsChangedName (no-arg, :1791) was silently skipped. Both keys are added.
  2. Limitless: _is_login_exception discarded its result (limitless_plugin.py:524-525 had no return), so the if self._is_login_exception(e): raise guards in the connect/retry paths never fired and login failures were retried like transient errors. It now returns the classification verdict. Note: this is a small behavior change — auth failures now surface immediately instead of burning the router retry budget, which appears to be the callers' intent.
  3. Limitless: the synchronous get-routers path queried a stale connection. After context.set_connection(...) replaces a dead/None monitoring connection, the router query still used the pre-reconnect local variable (:505-510). It now queries the fresh connection.
  4. Session state: the transfer-handler reset was a silent no-op. reset_transfer_session_state_on_switch_func assigned a nonexistent attribute (states/session_state_service.py:99-101reset_transfer_session_state_on_switch_callable instead of transfer_session_state_on_switch_callable), so a registered custom transfer handler could never be cleared.
  5. Plugin service: raw message key raised. plugin_service.py:475 raised the literal string "PluginServiceImpl.CouldNotDetermineCurrentHost" instead of the resolved message (the key exists in the bundle).
  6. Aurora initial connection strategy: the retry deadline never used its timeout property. Both _get_verified_writer_connection and _get_verified_reader_connection computed the deadline from OPEN_CONNECTION_RETRY_INTERVAL_MS (default 1 s), and OPEN_CONNECTION_RETRY_TIMEOUT_MS (default 30 s) was never read anywhere — effectively capping the whole retry budget at one interval. The deadline now uses the timeout property.

The five limitless retry tests were updated to stub is_login_exception as False (they exercise generic retryable errors — previously the discarded verdict masked the distinction).

Verification: mypy (237 files), flake8, isort, and pytest ./tests/unit -Werror (1,123 passed) all clean.

Proposed CHANGELOG entry (under ### :bug: Fixed):

Six small fixes found during the sync/async parity review: missing Blue/Green log message keys (crash under DEBUG logging during switchover), Limitless login-exception classification discarded and stale monitoring-connection read, session-state transfer-handler reset not clearing the handler, a raw message key raised by the plugin service, and the Aurora initial connection strategy never reading open_connection_retry_timeout_ms. (PR #1256)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

- Blue/Green: add the two message keys referenced by logger calls
  (SuspendConnectRouting.WaitConnectUntilCorrespondingHostFound is logged
  WITH a format arg, so its absence raised NotInResourceBundleError
  whenever DEBUG logging was enabled during a switchover;
  BlueGreenStatusProvider.AllGreenHostsChangedName was silently skipped).
- Limitless: _is_login_exception now returns the classification verdict
  (previously discarded, so login failures were never short-circuited out
  of the router retry loop as the callers intend); the synchronous
  get-routers path queries the freshly reconnected monitoring connection
  instead of the stale pre-reconnect local.
- Session state: reset_transfer_session_state_on_switch_func assigned a
  nonexistent attribute, silently keeping the old transfer handler
  registered -- it now clears the real one.
- Plugin service: the CouldNotDetermineCurrentHost error resolves its
  message key instead of raising the raw key string.
- Aurora initial connection strategy: the retry deadline now comes from
  open_connection_retry_timeout_ms (previously it reused
  open_connection_retry_interval_ms and never read the timeout property).

Each fix has a focused regression test.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant