Skip to content

Make OpenTelemetry tracing the single default middleware#2995

Merged
Kludex merged 2 commits into
mainfrom
otel-default-middleware
Jun 26, 2026
Merged

Make OpenTelemetry tracing the single default middleware#2995
Kludex merged 2 commits into
mainfrom
otel-default-middleware

Conversation

@Kludex

@Kludex Kludex commented Jun 26, 2026

Copy link
Copy Markdown
Member

The SDK shipped two server-side OpenTelemetry span mechanisms:

  • a dispatch-tier otel_middleware, wired on by default (dispatch_middleware=(otel_middleware,)), span MCP handle <method>;
  • a context-tier OpenTelemetryMiddleware the docs told you to append, span <method>, plus the GenAI semantic-convention attributes.

So a server already emitted a span out of the box, and following the docs (server.middleware.append(OpenTelemetryMiddleware())) added a second, nested span. The "you must append it for tracing" instruction was simply wrong.

This collapses the two into one tier:

  • remove otel_middleware (and its __all__ export); dispatch_middleware now defaults to () but stays a usable extension point;
  • seed Server.middleware with OpenTelemetryMiddleware() in Server.__init__, so both Server and MCPServer trace on by default through the richer context-tier span (mcp.protocol.version + GenAI attrs). Drop it from the list to opt out.
  • it remains a no-op until an OpenTelemetry exporter is installed.

Docs: a dedicated docs/advanced/opentelemetry.md page replaces the misleading section on the middleware page (which now just points to it), plus a tested example.

No migration.md entry: v1 ships no OpenTelemetry, so for a v1->v2 migrant this is a new feature, not a break; MCP handle / otel_middleware only ever lived on v2 main.

Verification

  • ./scripts/test: all pass, 100% coverage, strict-no-cover clean.
  • ruff, pyright, markdownlint, mkdocs build --strict all pass.

AI Disclaimer

This PR was developed with the assistance of either Claude or Codex. I've reviewed and verified the changes.

The SDK shipped two server-side OpenTelemetry span mechanisms: a dispatch-tier
`otel_middleware` wired on by default (span `MCP handle <method>`) and a
context-tier `OpenTelemetryMiddleware` the docs told users to append (span
`<method>`, plus GenAI attributes). Appending the documented one produced a
second, nested span, and the "you must append it" instruction was wrong since a
span was already emitted by default.

Collapse to one tier: remove `otel_middleware` (and its `__all__` export), default
`dispatch_middleware` to empty, and seed `Server.middleware` with
`OpenTelemetryMiddleware()` so tracing is on by default through the richer
context-tier span. Drop it from the list to opt out.

Docs get a dedicated OpenTelemetry page; the middleware page now points to it.

@cubic-dev-ai cubic-dev-ai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues found across 12 files

Re-trigger cubic

Comment thread src/mcp/server/runner.py
Comment thread docs/advanced/opentelemetry.md
The context-tier OpenTelemetryMiddleware closes its span when the
middleware chain returns, but result dumping and spec-result
serialization ran after that in _on_request. A handler returning an
unsupported type or a malformed dict for a spec method surfaced as a
JSON-RPC error to the client while the span finished UNSET with no
error attributes, so default error-rate dashboards missed the failure.

Move dumping and serialization into the chain (a new _serialize helper
called from _inner) so the span observes these failures. Serializing
inside the chain also lets a middleware short-circuit own its result
shape directly, replacing the post-chain KeyError catch.

Also point the SEP-414 doc link at its spec-repo PR rather than the
repository root.

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My earlier findings have been addressed (result serialization now runs inside the traced chain with a regression test, and the SEP-414 link now points at the spec PR), and I found no new issues — but since this changes the default tracing behavior and refactors the request path in ServerRunner, it deserves a human look at the design.

Extended reasoning...

Overview

The PR collapses the two server-side OpenTelemetry mechanisms into one: the dispatch-tier otel_middleware is removed (dispatch_middleware defaults to ()), and OpenTelemetryMiddleware() is seeded into Server.middleware by default. To keep post-handler failures observable on the span, result dumping and spec-result serialization were moved from _on_request into the middleware chain via a new ServerRunner._serialize helper. Docs gain a dedicated OpenTelemetry page with a tested example, and the test suites were rewritten around the default-on middleware.

Prior feedback

My earlier review flagged that post-chain failures (_dump_result / serialize_server_result) would close the span as a success while the client saw an error, and that the SEP-414 link pointed at the spec repo root. The current revision addresses both: serialization runs inside _inner (covered by test_records_error_status_on_malformed_spec_result), and the link now targets modelcontextprotocol/pull/414.

Security risks

No auth, crypto, or permission code is touched. The main consideration is that every server now emits spans by default; with no exporter installed these are no-ops, and trace-context extraction from _meta already existed, so I see no new exposure.

Level of scrutiny

This is a behavior-changing refactor of the core request path (ServerRunner._on_request) plus a default-behavior decision (tracing seeded into every Server.middleware list, mutable per-instance opt-out). That is a design call — span naming, default-on telemetry, and the placement of serialization inside the middleware chain — that a maintainer should sign off on rather than a bot.

Other factors

Test coverage is thorough: the OTel tests now exercise the default configuration, the docs example is tested, and the runner tests retain coverage of the error paths. The PR description reports full test/lint/docs verification.

@Kludex Kludex merged commit b31d95a into main Jun 26, 2026
39 checks passed
@Kludex Kludex deleted the otel-default-middleware branch June 26, 2026 13:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants