Skip to content

JAVA-6194 Add MongoSocksProxyException for CMAP backpressure labeling#1968

Open
nhachicha wants to merge 45 commits into
mongodb:backpressurefrom
nhachicha:nh/backpressure/socks5_exception
Open

JAVA-6194 Add MongoSocksProxyException for CMAP backpressure labeling#1968
nhachicha wants to merge 45 commits into
mongodb:backpressurefrom
nhachicha:nh/backpressure/socks5_exception

Conversation

@nhachicha
Copy link
Copy Markdown
Collaborator

nhachicha and others added 10 commits March 5, 2026 13:34
…ABEL` (mongodb#1926)

This commit only adds the labels, and does not fully implement the tickets specified below.

The reason there are four JAVA tickets specified is that the is a single specification commit that resolved the four corresponding DRIVERS tickets. All of these JAVA tickets have to be done together.

The relevant spec changes:
- https://github.com/mongodb/specifications/blame/ba14b6bdc1dc695aa9cc20ccf9378592da1b2329/source/client-backpressure/client-backpressure.md#L52-L80 - it's a subset of [DRIVERS-3239, DRIVERS-3411, DRIVERS-3370, DRIVERS-3412: Client backpressure (mongodb#1907)](mongodb/specifications@1125200)

JAVA-5956, JAVA-6117, JAVA-6113, JAVA-6119
- Deprioritize sharded clusters on any error, all other topologies only on SystemOverloadedError.
- Pass ClusterType to updateCandidate so onAttemptFailure can distinguish topology types.
- Add retryable reads prose tests 3.1 and 3.2.
- Change ServerSelectionSelectionTest to use BaseCluster server selection chain.

JAVA-6105
JAVA-6021
JAVA-6074
---------
Co-authored-by: Valentin Kovalenko <valentin.male.kovalenko@gmail.com>
Co-authored-by: Ross Lawley <ross.lawley@gmail.com>
- Add enableOverloadRetargeting boolean option to MongoClientSettings and ConnectionString to allow
  the driver to route requests to a different replica set member on retries when the previously
  used server is overloaded
- Add prose test 3.3 to verify that overload errors are retried on the same server when retargeting
  is disabled

JAVA-6167
---------

Co-authored-by: Ross Lawley <ross.lawley@gmail.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a dedicated MongoSocksProxyException to represent SOCKS5 proxy connection/handshake failures (with phase + optional RFC 1928 reply code), and updates the SOCKS implementation/tests to throw and assert this new exception type. This supports more precise classification/handling of SOCKS-related connection failures (per JAVA-6194).

Changes:

  • Added MongoSocksProxyException (with HandshakePhase and optional proxy reply code).
  • Updated SocksSocket and SocketStream to throw/propagate MongoSocksProxyException for SOCKS negotiation/auth/connect failures and proxy TCP connect failures.
  • Updated/added tests to validate the new exception type and its phase/reply-code tagging.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
driver-sync/src/test/functional/com/mongodb/client/Socks5ProseTest.java Updates prose test assertions to expect MongoSocksProxyException during SOCKS auth failures.
driver-core/src/test/unit/com/mongodb/internal/connection/SocksSocketTest.java Adds unit tests validating handshake phase tagging and CONNECT reply-code tagging.
driver-core/src/main/com/mongodb/MongoSocksProxyException.java Adds new public exception type carrying SOCKS handshake phase and optional RFC 1928 reply code.
driver-core/src/main/com/mongodb/internal/connection/SocksSocket.java Replaces some SOCKS protocol failures with MongoSocksProxyException and adds helper to build a ServerAddress.
driver-core/src/main/com/mongodb/internal/connection/SocketStream.java Wraps proxy-enabled IO failures as MongoSocksProxyException and rethrows SOCKS exceptions directly.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread driver-core/src/test/unit/com/mongodb/internal/connection/SocksSocketTest.java Outdated
Comment thread driver-core/src/test/unit/com/mongodb/internal/connection/SocksSocketTest.java Outdated
@stIncMale stIncMale force-pushed the backpressure branch 2 times, most recently from 2d1b2e1 to 11a7b73 Compare May 8, 2026 20:58
nhachicha added 10 commits May 16, 2026 18:38
- Delegate single-arg constructor to four-arg variant
- Clarify Javadoc that proxyReplyCode may be null
- Remove duplicate blank lines between constructors
MongoSocksProxyException extends RuntimeException and bypassed the
existing catch (SocketException) block, leaking the underlying proxy
TCP socket on every SOCKS5 protocol failure (negotiation/auth/relay).
Defensive: avoids any reverse-DNS risk in error paths if the unresolved
invariant on remoteAddress is ever weakened.
- Close inner socket on failure inside initializeSocketOverSocksProxy
  and initializeSslSocketOverSocksProxy so that failures before this.socket
  is assigned do not leak the underlying file descriptor.
- Hoist translateInterruptedException out of the two orElseThrow branches.
Server thread now blocks on the client closing the connection instead
of guessing how long the SOCKS exchange will take. Removes a CI flake
under load.
Port 1 (tcpmux) is unassigned on most systems but not guaranteed.
Binding then releasing an ephemeral port gives a reliably-closed port.
The previous fix used InputStream.transferTo (Java 9+) and
OutputStream.nullOutputStream (Java 11+), which CI on Java 8 surfaced
as NoSuchMethodError. The test source set is compiled by the Groovy
compiler (because src/test/unit contains both Java and Groovy), which
does not honor options.release.set(8) — Java 11 API calls slipped past
compile-time validation but failed at link time on Java 8.

Replace with a plain read-into-discard-buffer loop. Same semantics
(blocks until client closes), Java 8 source compatible.
ApiAliasAndCompanionSpec discovers every public subtype of MongoException
in the Java driver and asserts each one has a corresponding Scala alias.
MongoSocksProxyException is new in 5.8 and was missing from the wrapper.
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.

Comment thread driver-core/src/main/com/mongodb/MongoSocksProxyException.java
Comment thread driver-core/src/main/com/mongodb/internal/connection/SocketStream.java Outdated
Comment thread driver-core/src/main/com/mongodb/MongoSocksProxyException.java Outdated
Comment thread driver-core/src/main/com/mongodb/MongoSocksProxyException.java Outdated
Comment thread driver-core/src/main/com/mongodb/internal/connection/SocksSocket.java Outdated
…eler

JAVA-6194: SOCKS5 failures during post-TCP phases (negotiation, auth,
CONNECT-relay) are configuration/protocol errors and must not carry
SystemOverloadedError or RetryableError labels. Failures during the
PROXY_TCP_CONNECT phase are plain TCP-level reach failures (proxy host
unreachable / overloaded) and continue to receive the labels like any
other socket-open failure. Removes the corresponding TODO.
The defensive null check on getHandshakePhase() and its rationale comment
predated commit dee79f0, which added notNull("handshakePhase", ...) to
all four MongoSocksProxyException constructors. Combined with the field
being private final, the phase reference is guaranteed non-null for any
instance created through the public API, making the labeler-side check
unreachable in normal use.

Collapse the body of isExcludedSocksPostTcpPhase to the bare phase
comparison; the constructor contract is the authoritative non-null
guarantee. Deserialization-driven null is not defended against here for
the same reason it is not defended against elsewhere in the driver
hierarchy: if it were a real concern, readObject on the exception itself
is the correct enforcement point, not a downstream labeler.

SocksSocketTest (11) and BackpressureErrorLabelerTest (49) both pass.
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

nhachicha added 3 commits May 20, 2026 01:27
Backpressure labels (SystemOverloadedError, RetryableError) signal that
the target mongod is overloaded and the driver should back off. The
prior policy excluded post-TCP SOCKS5 phases (NEGOTIATION /
AUTHENTICATION / CONNECT_RELAY) and labeled PROXY_TCP_CONNECT on the
"structurally identical to a socket-open failure" rationale (98483bb,
b3da503). That rule conflated failure mechanism with semantic
attribution: backpressure is about WHICH party is overloaded, not WHAT
kind of failure surfaces.

Re-frame: a SOCKS5 failure is mongod-attributable only when the proxy
acted as a reach-proxy for mongod and reported a transport-level outcome
that mirrors a direct-connection socket-open failure. That is exactly
CONNECT_RELAY with RFC 1928 reply codes 3 (network unreachable),
4 (host unreachable), or 5 (connection refused) — the SOCKS5 analogs of
NoRouteToHostException and ConnectException.

Policy changes:
- PROXY_TCP_CONNECT flips from labeled to NOT labeled — the proxy itself
  is unreachable; mongod was never reached.
- CONNECT_RELAY with codes 3, 4, or 5 flips from NOT labeled to LABELED
  — these mirror direct-connection mongod-overload signals.
- NEGOTIATION, AUTHENTICATION, CONNECT_RELAY with codes 1, 2, 6, 7, 8,
  and CONNECT_RELAY with null replyCode (I/O failure / unrecognised
  reply) remain NOT labeled — proxy-side or ambiguous failures with no
  mongod-side attribution.

Rename isExcludedSocksPostTcpPhase -> isNonMongodAttributableSocksFailure
to reflect the actual semantic and add reply-code-aware logic. Update
MongoSocksProxyException class Javadoc and the labeler method Javadoc
to describe the attribution rule.

Test recalibration:
- socksProxyPostTcpPhaseShouldNotBeLabeled split and renamed:
  socksProxyNonMongodAttributableShouldNotBeLabeled (9 cases) and
  socksProxyMongodAttributableShouldBeLabeled (3 cases for codes 3/4/5).
- Old socksProxyTcpConnectPhaseShouldBeLabeled removed; PROXY_TCP_CONNECT
  is now covered as the first case of the non-attributable parameterised
  test.

This inverts the explicit Round 2 decision documented in 98483bb
and the matching class-level Javadoc updated in b3da503.
When SocketStream.open() wraps an initial-connect IOException as a
MongoSocksProxyException with PROXY_TCP_CONNECT, the message text was
"Exception connecting to SOCKS5 proxy" without identifying which proxy.
Operators reading a log line could not tell whether the ServerAddress
attached to the exception was the proxy or the target mongod.

Append the proxy host:port to the message string so the proxy endpoint
is identifiable from the message alone. Keep the ServerAddress as the
target mongod — SDAM, the connection pool, and retry logic key off the
cluster-member identity, and substituting the proxy endpoint there would
break those consumers (per the discussion on PR mongodb#1968 thread
r3267511517: message-string enrichment accepted, ServerAddress
substitution rejected).
The throw at SocksSocket.checkServerReply used reply.message verbatim
(e.g. "Host is unreachable") which is indistinguishable in raw log
output from a non-proxy ICMP-driven ConnectException carrying the
same text. Prefix the message with "SOCKS5 CONNECT reply:" and append
the RFC 1928 reply code so operators can attribute the failure to the
SOCKS5 protocol layer from the message alone.

Phrasing chosen to remain distinguishable from the sibling IO-failure
path message ("SOCKS5 CONNECT relay failed: ..." at SocksSocket.java
line 130): "reply" signals a parsed proxy response; "relay failed"
signals an IO failure during the CONNECT exchange. Two textual styles
let operators tell the two CONNECT_RELAY sub-cases apart.

reply.replyNumber remains available via getProxyReplyCode() for
programmatic consumers; embedding it in the message is purely a
diagnostic convenience for log readers.
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.

Comment thread driver-core/src/main/com/mongodb/internal/connection/SocksSocket.java Outdated
nhachicha added 2 commits May 20, 2026 03:30
After 78d6b01 (per-phase IOException wrapping) and 27417eb
(removal of redundant re-throws), the two outer catches in
SocksSocket.connect handle disjoint, well-defined cases — but the
comments hadn't caught up.

- catch (MongoSocksProxyException) — now the destination for every
  SOCKS5 protocol failure, including the RFC 1928 X'FF' "no acceptable
  method" self-close which performNegotiation throws as
  MongoSocksProxyException directly (not as an IOException). Note this
  in the comment.

- catch (IOException) — now only ever fires for IOExceptions from the
  initial proxy TCP connect (timeout.checkedRun above); inner-phase
  IOExceptions are caught and rewrapped by the per-phase wrappers and
  land in the MongoSocksProxyException catch above. The stale comment
  block here referenced the RFC 1928 X'FF' path, which no longer
  reaches this catch. Replace with an accurate description of what
  actually lands here, and the leak-prevention rationale (JDK
  auto-close on connect timeout is implementation-defined).
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.

Comment thread driver-core/src/test/unit/com/mongodb/internal/connection/SocksSocketTest.java Outdated
nhachicha added 3 commits May 20, 2026 05:19
The helper wrapped the SocksSocket try-with-resources in
catch (Exception ignored) {} followed by a trailing return null, which
silently absorbed any non-IOException, non-MongoSocksProxyException
failure (RuntimeException from SocksSocket internals, close()
failures, etc.) and returned null. assertProxy(null) then failed with
"Expected MongoSocksProxyException but got: null" instead of the
actual stack trace — masking real regressions.

Remove the catch and the trailing return null. Unexpected exceptions
now propagate with the original stack trace; the inner catch still
captures the expected MongoSocksProxyException / IOException as the
helper's return value. Move t.join(5000) into its own try with a
narrow InterruptedException catch that preserves the interrupt flag
without masking a primary exception that may already be unwinding.
Reviewers reading the four constructor overloads can mistake the
ordering for inconsistent because handshakePhase appears at a different
absolute position in each (third in the 3-arg ctor, fourth when cause
or proxyReplyCode is present, etc.). The ordering is actually rule-
driven: parent-class params first (message, address, optional cause),
then SOCKS-specific args (handshakePhase, optional proxyReplyCode).
Adding the rule to the class-level Javadoc closes the documentation
gap without restructuring the API — keeping parent-class symmetry on
the super(...) calls and avoiding a one-off divergence from the rest
of the MongoSocket*Exception hierarchy.
@nhachicha nhachicha requested a review from Copilot May 20, 2026 04:55
@nhachicha nhachicha marked this pull request as ready for review May 20, 2026 04:57
@nhachicha nhachicha requested a review from a team as a code owner May 20, 2026 04:57
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated no new comments.

try {
authenticate(authenticationMethod, timeout);
} catch (IOException e) {
throw new MongoSocksProxyException("SOCKS5 authentication failed: " + e.getMessage(),
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need SOCK5 prefix in the messages given that the exception class is a "SOCKS5" exception?

Additionally, we have message naming inconsistency currently- there are call sites with a different prefix for authentication:

 throw new MongoSocksProxyException(
                        "Authentication failed. Proxy server returned status: " + authStatus,
                        targetServerAddress());

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in
a838766

return false;
}
MongoSocksProxyException socksException = (MongoSocksProxyException) t;
if (socksException.getHandshakePhase() != MongoSocksProxyException.HandshakePhase.CONNECT_RELAY) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think HandshakePhase may be adding API surface mainly to support internal backpressure-labeling. For this PR, the user-visible value of exposing a handshake phase seems unclear.

The internal logic in labeler only needs to distinguish whether the failure came from a CONNECT relay response with a SOCKS reply code that should be inspected. The same gate can be achieved with getProxyReplyCode() == null (not a CONNECT relay error) vs non-null (has a reply code to inspect).

Should we simplify that and remove HandshakePhase?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taking it further, do we need getProxyReplyCode() either? The exception message already contains the ServerReply message text, for example "Network is unreachable", "Host is unreachable", or "Connection has been refused". Since BackpressureErrorLabeler is in the same package as SocksSocket.ServerReply, it could compare against ServerReply.getMessage() directly, without adding public API surface beyond the exception class itself.

 private static boolean isSocksFailure(final MongoSocketException t) {
      if (!(t instanceof MongoSocksProxyException)) {
          return false;
      }
      String message = t.getMessage();
      return !SocksSocket.ServerReply.NET_UNREACHABLE.getMessage().equals(message)
              && !SocksSocket.ServerReply.HOST_UNREACHABLE.getMessage().equals(message)
              && !SocksSocket.ServerReply.CONN_REFUSED.getMessage().equals(message);
  }

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good point, I removed the enum in 9efad2b However I believe we need to keep the reply code to avoid brittle string based checks

Comment on lines +186 to +191
try {
socksProxy.connect(toSocketAddress(address.getHost(), address.getPort()),
operationContext.getTimeoutContext().getConnectTimeoutMs());
} catch (IOException e) {
throw wrapAsProxyTcpConnect(e);
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we make SocksSocket.connect() throw only MongoSocksProxyException by contract?

    @Override
    public void connect(final SocketAddress endpoint, final int connectTimeoutMs) // no throws IOException

If it catches IOException internally and always wraps it as MongoSocksProxyException, callers would not need to branch on exception types.

Currently, SocketStream has two places that catch IOException from SocksSocket.connect() and wrap it as MongoSocksProxyException:

  • initializeSslSocketOverSocksProxy - lines 140–142
  • initializeSocketOverSocksProxy - lines 188–191

With that contract, both catch blocks could go away, along with the wrapAsProxyTcpConnect helper.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we can:
1- timeout.checkedRun(...) inside can throw IOException this is not caught and wrapped in a MongoSocksProxyException
2- the connect method is an override and cannot change the exception type from the parent MongoSocksProxyException is unchecked, IOException is checked

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed IRL, we wrap it under MongoSocksProxyException for simplification

Comment thread driver-core/src/main/com/mongodb/internal/connection/SocksSocket.java Outdated
Comment thread driver-core/src/main/com/mongodb/internal/connection/SocketStream.java Outdated
Comment thread driver-core/src/test/unit/com/mongodb/internal/connection/SocksSocketTest.java Outdated
@nhachicha nhachicha requested a review from vbabanin May 21, 2026 14:58
Comment on lines +145 to +151
} catch (IOException | RuntimeException e) {
try {
toClose.close();
} catch (IOException closeException) {
e.addSuppressed(closeException);
}
throw e;
Copy link
Copy Markdown
Member

@vbabanin vbabanin May 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants