fix(anthropic): surface cache_read/write input tokens in metadata chunk #2302

Open
jehoon-shin-mathflat wants to merge 1 commit into strands-agents:main from jehoon-shin-mathflat:feat/anthropic-cache-token-usage

@jehoon-shin-mathflat

Description of changes

When prompt caching is in use, Anthropic reports only the non-cached portion of the prompt in input_tokens. The metadata chunk formatter in models/anthropic.py reads input_tokens and output_tokens from the API response but drops the cache_read_input_tokens and cache_creation_input_tokens fields. As a result, anything that derives cost from Agent.metrics.accumulated_usage under-reports real usage, sometimes by an order of magnitude on image+text workloads where the cached prefix carries most of the prompt.

Repro

A vision classifier I run sends [{"text": ...}, {"cachePoint": {}}, {"image": ...}] so that the text+image prefix is cached. After warmup, the Anthropic API returns roughly:

{
  "input_tokens": 3,
  "cache_read_input_tokens": 1500,
  "cache_creation_input_tokens": 0,
  "output_tokens": 120
}

Strands currently surfaces only inputTokens=3, so accumulated cost looks ~10× lower than what Anthropic actually bills. The Usage TypedDict in types/event_loop.py already declares the cache fields as optional members — they just weren't being populated by the Anthropic adapter.
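To make the gap concrete, here is a small arithmetic sketch over the usage numbers from the repro. It compares token counts only and says nothing about price: cache reads and cache writes are billed at different rates from regular input, so the cost ratio will not equal the token ratio. The variable names are illustrative and are not taken from the Strands code.

```python
# Usage block from the repro above (values as reported after warmup).
api_usage = {
    "input_tokens": 3,
    "cache_read_input_tokens": 1500,
    "cache_creation_input_tokens": 0,
    "output_tokens": 120,
}

# What is surfaced before this change: the uncached input only.
surfaced_input = api_usage["input_tokens"]

# What the request actually consumed as input:
# uncached input + cache reads + cache writes.
full_input = (
    api_usage["input_tokens"]
    + api_usage["cache_read_input_tokens"]
    + api_usage["cache_creation_input_tokens"]
)

legacy_total = surfaced_input + api_usage["output_tokens"]
full_total = full_input + api_usage["output_tokens"]

print(surfaced_input, full_input)  # 3 1503
print(legacy_total, full_total)    # 123 1623
```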

Change

models/anthropic.py — in the metadata case of format_chunk:

  • Pull cache_read_input_tokens and cache_creation_input_tokens from the usage dict (defaulting to 0 when absent).
  • Emit them as cacheReadInputTokens / cacheWriteInputTokens on the usage chunk, matching the existing field names already defined in types.event_loop.Usage.
  • Recompute totalTokens as uncached_input + cache_read + cache_write + output_tokens so it reflects the full billed input.
  • Omit the cache fields when both are zero so the chunk shape for non-cached responses is unchanged (no consumer needs to update).
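The four points above can be sketched as a standalone function. This is a minimal illustration of the mapping described here, not the actual diff: format_usage is a hypothetical helper name, and the real code lives inside the metadata case of format_chunk and may be structured differently.

```python
from typing import Any


def format_usage(usage: dict[str, Any]) -> dict[str, int]:
    """Hypothetical helper: map an Anthropic usage dict to a Strands-style usage dict."""
    uncached_input = usage.get("input_tokens") or 0
    output = usage.get("output_tokens") or 0
    # Default to 0 when the cache fields are absent (or None).
    cache_read = usage.get("cache_read_input_tokens") or 0
    cache_write = usage.get("cache_creation_input_tokens") or 0

    formatted = {
        "inputTokens": uncached_input,
        "outputTokens": output,
        # Total covers the full input: uncached + cache reads + cache writes.
        "totalTokens": uncached_input + cache_read + cache_write + output,
    }

    # Emit the cache fields only when at least one is non-zero, so the
    # shape for non-cached responses stays the same as before.
    if cache_read or cache_write:
        formatted["cacheReadInputTokens"] = cache_read
        formatted["cacheWriteInputTokens"] = cache_write

    return formatted


cached = format_usage(
    {
        "input_tokens": 3,
        "cache_read_input_tokens": 1500,
        "cache_creation_input_tokens": 0,
        "output_tokens": 120,
    }
)
uncached = format_usage({"input_tokens": 10, "output_tokens": 5})
```

With the repro numbers, cached carries totalTokens of 1623 along with both cache fields, while uncached keeps the three-key legacy shape. A consumer that reads the cache fields with .get(..., 0) would handle both shapes.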

Tests

tests/strands/models/test_anthropic.py:

  • test_format_chunk_metadata_with_cache_tokens — both cache fields present.
  • test_format_chunk_metadata_omits_zero_cache_tokens — fields absent/zero, legacy shape preserved.

All 59 tests in tests/strands/models/test_anthropic.py pass.

Related Issues

n/a — surfaced while debugging cost accounting in a downstream POC.

Documentation PR

n/a — no public API surface change, only adds optional fields that Usage already declares.

Type of change

  • Bug fix (non-breaking change which fixes an issue)

Testing

pytest tests/strands/models/test_anthropic.py
# 59 passed

Checklist

  • I have read the CONTRIBUTING document
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have added/updated necessary documentation (if applicable)

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Anthropic returns `input_tokens` as the NON-CACHED portion only when prompt
caching is in use. `cache_read_input_tokens` and `cache_creation_input_tokens`
were dropped during metadata chunk formatting, so downstream consumers
(`Agent.metrics.accumulated_usage` and anything that derives cost from it)
saw only the uncached delta and under-reported real usage / cost — sometimes
by an order of magnitude on image+text workloads where the image dominates
the cached prefix.

This change:
- Maps `cache_read_input_tokens` → `cacheReadInputTokens` and
  `cache_creation_input_tokens` → `cacheWriteInputTokens` on the metadata
  chunk, both already defined as optional members of `types.event_loop.Usage`.
- Recomputes `totalTokens` as `uncached_input + cache_read + cache_write +
  output_tokens` so it reflects the actual billed input.
- Omits the cache fields when both are zero/absent, preserving the existing
  chunk shape for non-cached responses (no consumer change required).

Added tests covering the cached and non-cached metadata shapes.
@yonib05 added the area-model, area-otel, bug, and python labels on May 27, 2026
@yonib05 removed the area-otel label on Jun 9, 2026
Labels

area-model, bug, python, size/s
