Make OpenTelemetry tracing the single default middleware#2995
Conversation
The SDK shipped two server-side OpenTelemetry span mechanisms: a dispatch-tier `otel_middleware` wired on by default (span `MCP handle <method>`) and a context-tier `OpenTelemetryMiddleware` the docs told users to append (span `<method>`, plus GenAI attributes). Appending the documented one produced a second, nested span, and the "you must append it" instruction was wrong since a span was already emitted by default. Collapse to one tier: remove `otel_middleware` (and its `__all__` export), default `dispatch_middleware` to empty, and seed `Server.middleware` with `OpenTelemetryMiddleware()` so tracing is on by default through the richer context-tier span. Drop it from the list to opt out. Docs get a dedicated OpenTelemetry page; the middleware page now points to it.
The context-tier OpenTelemetryMiddleware closes its span when the middleware chain returns, but result dumping and spec-result serialization ran after that in _on_request. A handler returning an unsupported type or a malformed dict for a spec method surfaced as a JSON-RPC error to the client while the span finished UNSET with no error attributes, so default error-rate dashboards missed the failure. Move dumping and serialization into the chain (a new _serialize helper called from _inner) so the span observes these failures. Serializing inside the chain also lets a middleware short-circuit own its result shape directly, replacing the post-chain KeyError catch. Also point the SEP-414 doc link at its spec-repo PR rather than the repository root.
There was a problem hiding this comment.
My earlier findings have been addressed (result serialization now runs inside the traced chain with a regression test, and the SEP-414 link now points at the spec PR), and I found no new issues — but since this changes the default tracing behavior and refactors the request path in ServerRunner, it deserves a human look at the design.
Extended reasoning...
Overview
The PR collapses the two server-side OpenTelemetry mechanisms into one: the dispatch-tier otel_middleware is removed (dispatch_middleware defaults to ()), and OpenTelemetryMiddleware() is seeded into Server.middleware by default. To keep post-handler failures observable on the span, result dumping and spec-result serialization were moved from _on_request into the middleware chain via a new ServerRunner._serialize helper. Docs gain a dedicated OpenTelemetry page with a tested example, and the test suites were rewritten around the default-on middleware.
Prior feedback
My earlier review flagged that post-chain failures (_dump_result / serialize_server_result) would close the span as a success while the client saw an error, and that the SEP-414 link pointed at the spec repo root. The current revision addresses both: serialization runs inside _inner (covered by test_records_error_status_on_malformed_spec_result), and the link now targets modelcontextprotocol/pull/414.
Security risks
No auth, crypto, or permission code is touched. The main consideration is that every server now emits spans by default; with no exporter installed these are no-ops, and trace-context extraction from _meta already existed, so I see no new exposure.
Level of scrutiny
This is a behavior-changing refactor of the core request path (ServerRunner._on_request) plus a default-behavior decision (tracing seeded into every Server.middleware list, mutable per-instance opt-out). That is a design call — span naming, default-on telemetry, and the placement of serialization inside the middleware chain — that a maintainer should sign off on rather than a bot.
Other factors
Test coverage is thorough: the OTel tests now exercise the default configuration, the docs example is tested, and the runner tests retain coverage of the error paths. The PR description reports full test/lint/docs verification.
The SDK shipped two server-side OpenTelemetry span mechanisms:
otel_middleware, wired on by default (dispatch_middleware=(otel_middleware,)), spanMCP handle <method>;OpenTelemetryMiddlewarethe docs told you toappend, span<method>, plus the GenAI semantic-convention attributes.So a server already emitted a span out of the box, and following the docs (
server.middleware.append(OpenTelemetryMiddleware())) added a second, nested span. The "you must append it for tracing" instruction was simply wrong.This collapses the two into one tier:
otel_middleware(and its__all__export);dispatch_middlewarenow defaults to()but stays a usable extension point;Server.middlewarewithOpenTelemetryMiddleware()inServer.__init__, so bothServerandMCPServertrace on by default through the richer context-tier span (mcp.protocol.version+ GenAI attrs). Drop it from the list to opt out.Docs: a dedicated
docs/advanced/opentelemetry.mdpage replaces the misleading section on the middleware page (which now just points to it), plus a tested example.No
migration.mdentry: v1 ships no OpenTelemetry, so for a v1->v2 migrant this is a new feature, not a break;MCP handle/otel_middlewareonly ever lived on v2main.Verification
./scripts/test: all pass, 100% coverage,strict-no-coverclean.mkdocs build --strictall pass.AI Disclaimer
This PR was developed with the assistance of either Claude or Codex. I've reviewed and verified the changes.