JAVA-6194 Add MongoSocksProxyException for CMAP backpressure labeling#1968
JAVA-6194 Add MongoSocksProxyException for CMAP backpressure labeling#1968nhachicha wants to merge 45 commits into
Conversation
…ABEL` (mongodb#1926) This commit only adds the labels, and does not fully implement the tickets specified below. The reason there are four JAVA tickets specified is that the is a single specification commit that resolved the four corresponding DRIVERS tickets. All of these JAVA tickets have to be done together. The relevant spec changes: - https://github.com/mongodb/specifications/blame/ba14b6bdc1dc695aa9cc20ccf9378592da1b2329/source/client-backpressure/client-backpressure.md#L52-L80 - it's a subset of [DRIVERS-3239, DRIVERS-3411, DRIVERS-3370, DRIVERS-3412: Client backpressure (mongodb#1907)](mongodb/specifications@1125200) JAVA-5956, JAVA-6117, JAVA-6113, JAVA-6119
…b#1929) The relevant spec changes: - https://github.com/mongodb/specifications/blame/ba14b6bdc1dc695aa9cc20ccf9378592da1b2329/source/retryable-writes/tests/README.md#L265-L418 - See also https://jira.mongodb.org/browse/DRIVERS-3432 for the phrasing fixes for "Test 3 Case 3" JAVA-6055
JAVA-6141
- Deprioritize sharded clusters on any error, all other topologies only on SystemOverloadedError. - Pass ClusterType to updateCandidate so onAttemptFailure can distinguish topology types. - Add retryable reads prose tests 3.1 and 3.2. - Change ServerSelectionSelectionTest to use BaseCluster server selection chain. JAVA-6105 JAVA-6021 JAVA-6074 --------- Co-authored-by: Valentin Kovalenko <valentin.male.kovalenko@gmail.com> Co-authored-by: Ross Lawley <ross.lawley@gmail.com>
The relevant spec changes: - https://github.com/mongodb/specifications/blob/8a8a7c56429c80b51ec62268dcafc5e5e3c477ef/source/client-backpressure/tests/README.md JAVA-5956, JAVA-6117, JAVA-6113, JAVA-6119, JAVA-6141
- Add enableOverloadRetargeting boolean option to MongoClientSettings and ConnectionString to allow the driver to route requests to a different replica set member on retries when the previously used server is overloaded - Add prose test 3.3 to verify that overload errors are retried on the same server when retargeting is disabled JAVA-6167 --------- Co-authored-by: Ross Lawley <ross.lawley@gmail.com>
There was a problem hiding this comment.
Pull request overview
This PR introduces a dedicated MongoSocksProxyException to represent SOCKS5 proxy connection/handshake failures (with phase + optional RFC 1928 reply code), and updates the SOCKS implementation/tests to throw and assert this new exception type. This supports more precise classification/handling of SOCKS-related connection failures (per JAVA-6194).
Changes:
- Added
MongoSocksProxyException(withHandshakePhaseand optional proxy reply code). - Updated
SocksSocketandSocketStreamto throw/propagateMongoSocksProxyExceptionfor SOCKS negotiation/auth/connect failures and proxy TCP connect failures. - Updated/added tests to validate the new exception type and its phase/reply-code tagging.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| driver-sync/src/test/functional/com/mongodb/client/Socks5ProseTest.java | Updates prose test assertions to expect MongoSocksProxyException during SOCKS auth failures. |
| driver-core/src/test/unit/com/mongodb/internal/connection/SocksSocketTest.java | Adds unit tests validating handshake phase tagging and CONNECT reply-code tagging. |
| driver-core/src/main/com/mongodb/MongoSocksProxyException.java | Adds new public exception type carrying SOCKS handshake phase and optional RFC 1928 reply code. |
| driver-core/src/main/com/mongodb/internal/connection/SocksSocket.java | Replaces some SOCKS protocol failures with MongoSocksProxyException and adds helper to build a ServerAddress. |
| driver-core/src/main/com/mongodb/internal/connection/SocketStream.java | Wraps proxy-enabled IO failures as MongoSocksProxyException and rethrows SOCKS exceptions directly. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
2d1b2e1 to
11a7b73
Compare
…re/socks5_exception
- Delegate single-arg constructor to four-arg variant - Clarify Javadoc that proxyReplyCode may be null - Remove duplicate blank lines between constructors
MongoSocksProxyException extends RuntimeException and bypassed the existing catch (SocketException) block, leaking the underlying proxy TCP socket on every SOCKS5 protocol failure (negotiation/auth/relay).
Defensive: avoids any reverse-DNS risk in error paths if the unresolved invariant on remoteAddress is ever weakened.
- Close inner socket on failure inside initializeSocketOverSocksProxy and initializeSslSocketOverSocksProxy so that failures before this.socket is assigned do not leak the underlying file descriptor. - Hoist translateInterruptedException out of the two orElseThrow branches.
Server thread now blocks on the client closing the connection instead of guessing how long the SOCKS exchange will take. Removes a CI flake under load.
Port 1 (tcpmux) is unassigned on most systems but not guaranteed. Binding then releasing an ephemeral port gives a reliably-closed port.
The previous fix used InputStream.transferTo (Java 9+) and OutputStream.nullOutputStream (Java 11+), which CI on Java 8 surfaced as NoSuchMethodError. The test source set is compiled by the Groovy compiler (because src/test/unit contains both Java and Groovy), which does not honor options.release.set(8) — Java 11 API calls slipped past compile-time validation but failed at link time on Java 8. Replace with a plain read-into-discard-buffer loop. Same semantics (blocks until client closes), Java 8 source compatible.
ApiAliasAndCompanionSpec discovers every public subtype of MongoException in the Java driver and asserts each one has a corresponding Scala alias. MongoSocksProxyException is new in 5.8 and was missing from the wrapper.
…eler JAVA-6194: SOCKS5 failures during post-TCP phases (negotiation, auth, CONNECT-relay) are configuration/protocol errors and must not carry SystemOverloadedError or RetryableError labels. Failures during the PROXY_TCP_CONNECT phase are plain TCP-level reach failures (proxy host unreachable / overloaded) and continue to receive the labels like any other socket-open failure. Removes the corresponding TODO.
The defensive null check on getHandshakePhase() and its rationale comment predated commit dee79f0, which added notNull("handshakePhase", ...) to all four MongoSocksProxyException constructors. Combined with the field being private final, the phase reference is guaranteed non-null for any instance created through the public API, making the labeler-side check unreachable in normal use. Collapse the body of isExcludedSocksPostTcpPhase to the bare phase comparison; the constructor contract is the authoritative non-null guarantee. Deserialization-driven null is not defended against here for the same reason it is not defended against elsewhere in the driver hierarchy: if it were a real concern, readObject on the exception itself is the correct enforcement point, not a downstream labeler. SocksSocketTest (11) and BackpressureErrorLabelerTest (49) both pass.
Backpressure labels (SystemOverloadedError, RetryableError) signal that the target mongod is overloaded and the driver should back off. The prior policy excluded post-TCP SOCKS5 phases (NEGOTIATION / AUTHENTICATION / CONNECT_RELAY) and labeled PROXY_TCP_CONNECT on the "structurally identical to a socket-open failure" rationale (98483bb, b3da503). That rule conflated failure mechanism with semantic attribution: backpressure is about WHICH party is overloaded, not WHAT kind of failure surfaces. Re-frame: a SOCKS5 failure is mongod-attributable only when the proxy acted as a reach-proxy for mongod and reported a transport-level outcome that mirrors a direct-connection socket-open failure. That is exactly CONNECT_RELAY with RFC 1928 reply codes 3 (network unreachable), 4 (host unreachable), or 5 (connection refused) — the SOCKS5 analogs of NoRouteToHostException and ConnectException. Policy changes: - PROXY_TCP_CONNECT flips from labeled to NOT labeled — the proxy itself is unreachable; mongod was never reached. - CONNECT_RELAY with codes 3, 4, or 5 flips from NOT labeled to LABELED — these mirror direct-connection mongod-overload signals. - NEGOTIATION, AUTHENTICATION, CONNECT_RELAY with codes 1, 2, 6, 7, 8, and CONNECT_RELAY with null replyCode (I/O failure / unrecognised reply) remain NOT labeled — proxy-side or ambiguous failures with no mongod-side attribution. Rename isExcludedSocksPostTcpPhase -> isNonMongodAttributableSocksFailure to reflect the actual semantic and add reply-code-aware logic. Update MongoSocksProxyException class Javadoc and the labeler method Javadoc to describe the attribution rule. Test recalibration: - socksProxyPostTcpPhaseShouldNotBeLabeled split and renamed: socksProxyNonMongodAttributableShouldNotBeLabeled (9 cases) and socksProxyMongodAttributableShouldBeLabeled (3 cases for codes 3/4/5). - Old socksProxyTcpConnectPhaseShouldBeLabeled removed; PROXY_TCP_CONNECT is now covered as the first case of the non-attributable parameterised test. This inverts the explicit Round 2 decision documented in 98483bb and the matching class-level Javadoc updated in b3da503.
When SocketStream.open() wraps an initial-connect IOException as a MongoSocksProxyException with PROXY_TCP_CONNECT, the message text was "Exception connecting to SOCKS5 proxy" without identifying which proxy. Operators reading a log line could not tell whether the ServerAddress attached to the exception was the proxy or the target mongod. Append the proxy host:port to the message string so the proxy endpoint is identifiable from the message alone. Keep the ServerAddress as the target mongod — SDAM, the connection pool, and retry logic key off the cluster-member identity, and substituting the proxy endpoint there would break those consumers (per the discussion on PR mongodb#1968 thread r3267511517: message-string enrichment accepted, ServerAddress substitution rejected).
The throw at SocksSocket.checkServerReply used reply.message verbatim
(e.g. "Host is unreachable") which is indistinguishable in raw log
output from a non-proxy ICMP-driven ConnectException carrying the
same text. Prefix the message with "SOCKS5 CONNECT reply:" and append
the RFC 1928 reply code so operators can attribute the failure to the
SOCKS5 protocol layer from the message alone.
Phrasing chosen to remain distinguishable from the sibling IO-failure
path message ("SOCKS5 CONNECT relay failed: ..." at SocksSocket.java
line 130): "reply" signals a parsed proxy response; "relay failed"
signals an IO failure during the CONNECT exchange. Two textual styles
let operators tell the two CONNECT_RELAY sub-cases apart.
reply.replyNumber remains available via getProxyReplyCode() for
programmatic consumers; embedding it in the message is purely a
diagnostic convenience for log readers.
After 78d6b01 (per-phase IOException wrapping) and 27417eb (removal of redundant re-throws), the two outer catches in SocksSocket.connect handle disjoint, well-defined cases — but the comments hadn't caught up. - catch (MongoSocksProxyException) — now the destination for every SOCKS5 protocol failure, including the RFC 1928 X'FF' "no acceptable method" self-close which performNegotiation throws as MongoSocksProxyException directly (not as an IOException). Note this in the comment. - catch (IOException) — now only ever fires for IOExceptions from the initial proxy TCP connect (timeout.checkedRun above); inner-phase IOExceptions are caught and rewrapped by the per-phase wrappers and land in the MongoSocksProxyException catch above. The stale comment block here referenced the RFC 1928 X'FF' path, which no longer reaches this catch. Replace with an accurate description of what actually lands here, and the leak-prevention rationale (JDK auto-close on connect timeout is implementation-defined).
The helper wrapped the SocksSocket try-with-resources in
catch (Exception ignored) {} followed by a trailing return null, which
silently absorbed any non-IOException, non-MongoSocksProxyException
failure (RuntimeException from SocksSocket internals, close()
failures, etc.) and returned null. assertProxy(null) then failed with
"Expected MongoSocksProxyException but got: null" instead of the
actual stack trace — masking real regressions.
Remove the catch and the trailing return null. Unexpected exceptions
now propagate with the original stack trace; the inner catch still
captures the expected MongoSocksProxyException / IOException as the
helper's return value. Move t.join(5000) into its own try with a
narrow InterruptedException catch that preserves the interrupt flag
without masking a primary exception that may already be unwinding.
Reviewers reading the four constructor overloads can mistake the ordering for inconsistent because handshakePhase appears at a different absolute position in each (third in the 3-arg ctor, fourth when cause or proxyReplyCode is present, etc.). The ordering is actually rule- driven: parent-class params first (message, address, optional cause), then SOCKS-specific args (handshakePhase, optional proxyReplyCode). Adding the rule to the class-level Javadoc closes the documentation gap without restructuring the API — keeping parent-class symmetry on the super(...) calls and avoiding a one-off divergence from the rest of the MongoSocket*Exception hierarchy.
| try { | ||
| authenticate(authenticationMethod, timeout); | ||
| } catch (IOException e) { | ||
| throw new MongoSocksProxyException("SOCKS5 authentication failed: " + e.getMessage(), |
There was a problem hiding this comment.
Do we need SOCK5 prefix in the messages given that the exception class is a "SOCKS5" exception?
Additionally, we have message naming inconsistency currently- there are call sites with a different prefix for authentication:
throw new MongoSocksProxyException(
"Authentication failed. Proxy server returned status: " + authStatus,
targetServerAddress());
| return false; | ||
| } | ||
| MongoSocksProxyException socksException = (MongoSocksProxyException) t; | ||
| if (socksException.getHandshakePhase() != MongoSocksProxyException.HandshakePhase.CONNECT_RELAY) { |
There was a problem hiding this comment.
I think HandshakePhase may be adding API surface mainly to support internal backpressure-labeling. For this PR, the user-visible value of exposing a handshake phase seems unclear.
The internal logic in labeler only needs to distinguish whether the failure came from a CONNECT relay response with a SOCKS reply code that should be inspected. The same gate can be achieved with getProxyReplyCode() == null (not a CONNECT relay error) vs non-null (has a reply code to inspect).
Should we simplify that and remove HandshakePhase?
There was a problem hiding this comment.
Taking it further, do we need getProxyReplyCode() either? The exception message already contains the ServerReply message text, for example "Network is unreachable", "Host is unreachable", or "Connection has been refused". Since BackpressureErrorLabeler is in the same package as SocksSocket.ServerReply, it could compare against ServerReply.getMessage() directly, without adding public API surface beyond the exception class itself.
private static boolean isSocksFailure(final MongoSocketException t) {
if (!(t instanceof MongoSocksProxyException)) {
return false;
}
String message = t.getMessage();
return !SocksSocket.ServerReply.NET_UNREACHABLE.getMessage().equals(message)
&& !SocksSocket.ServerReply.HOST_UNREACHABLE.getMessage().equals(message)
&& !SocksSocket.ServerReply.CONN_REFUSED.getMessage().equals(message);
}
There was a problem hiding this comment.
This is a good point, I removed the enum in 9efad2b However I believe we need to keep the reply code to avoid brittle string based checks
| try { | ||
| socksProxy.connect(toSocketAddress(address.getHost(), address.getPort()), | ||
| operationContext.getTimeoutContext().getConnectTimeoutMs()); | ||
| } catch (IOException e) { | ||
| throw wrapAsProxyTcpConnect(e); | ||
| } |
There was a problem hiding this comment.
Could we make SocksSocket.connect() throw only MongoSocksProxyException by contract?
@Override
public void connect(final SocketAddress endpoint, final int connectTimeoutMs) // no throws IOException
If it catches IOException internally and always wraps it as MongoSocksProxyException, callers would not need to branch on exception types.
Currently, SocketStream has two places that catch IOException from SocksSocket.connect() and wrap it as MongoSocksProxyException:
initializeSslSocketOverSocksProxy- lines 140–142initializeSocketOverSocksProxy- lines 188–191
With that contract, both catch blocks could go away, along with the wrapAsProxyTcpConnect helper.
There was a problem hiding this comment.
I don't think we can:
1- timeout.checkedRun(...) inside can throw IOException this is not caught and wrapped in a MongoSocksProxyException
2- the connect method is an override and cannot change the exception type from the parent MongoSocksProxyException is unchecked, IOException is checked
There was a problem hiding this comment.
Discussed IRL, we wrap it under MongoSocksProxyException for simplification
…in SocksSocket.connect
| } catch (IOException | RuntimeException e) { | ||
| try { | ||
| toClose.close(); | ||
| } catch (IOException closeException) { | ||
| e.addSuppressed(closeException); | ||
| } | ||
| throw e; |
JAVA-6194