Anthropic 1h cache TTL: expand model list, add message-breakpoint sub-toggle (#316679)

* Expand extended cache TTL model list + add message-breakpoint sub-toggle

Two related changes to Anthropic Messages API prompt caching:

1. Expand modelSupportsExtendedCacheTtl beyond just the 1M context
   variants. Per Anthropic docs, the 1h cache TTL is available on all
   active models; this widens our opt-in list to Claude Opus 4.5/4.6/4.7
   and Sonnet 4.5/4.6 (all variants, not just -1m).

2. Add a new experiment-based setting
   chat.anthropic.promptCaching.extendedTtlMessages as a strict
   sub-toggle of the existing extendedTtl setting. When both are on, the
   rolling message-level breakpoints (last cacheable user / tool-result
   blocks set by addMessagesApiCacheControl) also use the 1h TTL
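   instead of the default 5m. The setting is nested rather than
   orthogonal because Anthropic requires longer-TTL breakpoints to
   appear before shorter ones in the tools->system->messages prefix
   order, so a 1h message breakpoint is only valid when the tools and
   system breakpoints are also 1h.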
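
To make the nesting argument concrete, here is a minimal sketch of how
the two settings could map onto breakpoint TTLs, plus a check of the
ordering rule. The names planBreakpoints and respectsTtlOrdering and the
Breakpoint shape are hypothetical and exist only for illustration; the
change itself works through addToolsAndSystemCacheControl and
addMessagesApiCacheControl.

```typescript
// Illustrative sketch only: these helpers are assumptions, not the extension's code.
type CacheTtl = '5m' | '1h';

interface CacheControl {
  type: 'ephemeral';
  ttl?: CacheTtl; // omitted means the API default of 5m
}

interface Breakpoint {
  section: 'tools' | 'system' | 'messages';
  cacheControl: CacheControl;
}

// Hypothetical planner: derive each breakpoint's TTL from the two settings.
function planBreakpoints(extendedTtl: boolean, extendedTtlMessages: boolean): Breakpoint[] {
  const prefix: CacheControl = extendedTtl
    ? { type: 'ephemeral', ttl: '1h' }
    : { type: 'ephemeral' };
  // The sub-toggle only takes effect when the parent setting is on.
  const messages: CacheControl = extendedTtl && extendedTtlMessages
    ? { type: 'ephemeral', ttl: '1h' }
    : { type: 'ephemeral' };
  return [
    { section: 'tools', cacheControl: prefix },
    { section: 'system', cacheControl: prefix },
    { section: 'messages', cacheControl: messages },
  ];
}

// Ordering rule: once a 5m breakpoint has appeared, no later breakpoint may be 1h.
function respectsTtlOrdering(breakpoints: Breakpoint[]): boolean {
  let seenShort = false;
  for (const bp of breakpoints) {
    const ttl = bp.cacheControl.ttl ?? '5m';
    if (ttl === '5m') {
      seenShort = true;
    } else if (seenShort) {
      return false;
    }
  }
  return true;
}
```

With a nested sub-toggle, all four setting combinations produce a valid
ordering. An orthogonal messages-only toggle could produce 5m tools and
system breakpoints followed by a 1h message breakpoint, which the check
above rejects.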

Tests: 69 passed. Added a suite for isExtendedCacheTtlMessagesEnabled
(parent on/off x sub on/off matrix + inherited model/location/subagent
gates) and two tests verifying addMessagesApiCacheControl propagates
the new cacheTtl argument.
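
For readers unfamiliar with the model gate these tests inherit, the
following is a rough sketch of what a family/version check might look
like. The model-id formats, the regular expression, and the name
modelSupportsExtendedCacheTtlSketch are assumptions for illustration;
the real modelSupportsExtendedCacheTtl may match ids differently.

```typescript
// Illustrative sketch only: id formats and matching logic are assumptions.
const EXTENDED_TTL_VERSIONS: Record<string, string[]> = {
  opus: ['4.5', '4.6', '4.7'],
  sonnet: ['4.5', '4.6'],
};

function modelSupportsExtendedCacheTtlSketch(modelId: string): boolean {
  // Accept "4.5" or "4-5" as the version; the lookahead keeps a trailing
  // date stamp such as "-20250514" from being read as a minor version.
  const match = /claude-(opus|sonnet)-(\d+)[.-](\d{1,2})(?!\d)/.exec(modelId.toLowerCase());
  if (!match) {
    return false;
  }
  const family = match[1] ?? '';
  const version = `${match[2]}.${match[3]}`;
  return EXTENDED_TTL_VERSIONS[family]?.includes(version) ?? false;
}
```

Under these assumptions, any suffix after the version (for example a
-1m variant marker) does not affect the result, which mirrors the "all
variants, not just -1m" intent of the change.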

* Slim down extended cache TTL tests

- Trim isExtendedCacheTtlEnabled model-list test to just verify the
  delegation (full boundaries are covered by modelSupportsExtendedCacheTtl).
- Remove the redundant 'inherits gates from parent' test in the
  isExtendedCacheTtlMessagesEnabled suite; the parent×sub matrix plus
  the parent's own gate tests already cover this.
- Merge the two addMessagesApiCacheControl ttl tests into one
  parameterized assertion.

* Update stale comment about message breakpoint TTL

The comment claimed message breakpoints 'always use the default 5m TTL',
but that's no longer true when the new extendedTtlMessages sub-toggle is on.

* Refactor extended cache TTL: pass parentEnabled, drop misleading coercions

- isExtendedCacheTtlMessagesEnabled now takes parentEnabled: boolean
  instead of re-running the parent gate. Call site passes the resolved
  useExtendedCacheTtl directly, eliminating a duplicate experiment-service
  lookup per request. Makes the 'sub-toggle of' relationship literal in
  the signature.
- Drop the !! coercion on getExperimentBasedConfig<boolean> returns;
  the generic already types the result as boolean, so the coercion was
  misleading defensive noise.
- Narrow cacheTtl param from '5m' | '1h' to just '1h' on both
  addToolsAndSystemCacheControl and addMessagesApiCacheControl. Per
  Anthropic docs, { type: 'ephemeral' } already defaults to 5m, so '5m'
  is never actually emitted on the wire and call sites never passed it.
- Stronger composition test for isExtendedCacheTtlEnabled — replaces
  four single-axis tests with one table-driven matrix that exercises
  all four gates simultaneously, catching short-circuit refactors.
- Table-driven 2x2 matrix for isExtendedCacheTtlMessagesEnabled.
- Trim user-facing extendedTtlMessages setting description; team-only
  rationale (rolling breakpoints, 2x write premium) lives in the JSDoc.
- Update stale comment claiming message breakpoints always use 5m.
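
The refactor points above can be pictured with a simplified sketch: a
sub-toggle gate that takes the resolved parent value, a cacheTtl
parameter narrowed to '1h', and a table-driven 2x2 check. The signatures
are deliberately reduced (the real functions also take services and
request data), and buildCacheControl, subSetting, and the Sketch suffix
are hypothetical names.

```typescript
// Illustrative sketch only: simplified signatures, not the extension's code.
type ExtendedCacheTtl = '1h';

interface CacheControl {
  type: 'ephemeral';
  ttl?: ExtendedCacheTtl;
}

// Omitting cacheTtl yields { type: 'ephemeral' }, which the API treats
// as 5m, so '5m' never needs to be passed or emitted explicitly.
function buildCacheControl(cacheTtl?: ExtendedCacheTtl): CacheControl {
  return cacheTtl ? { type: 'ephemeral', ttl: cacheTtl } : { type: 'ephemeral' };
}

// The gate receives the already-resolved parent value instead of
// re-running the parent check; subSetting stands in for the
// experiment-based config lookup.
function isExtendedCacheTtlMessagesEnabledSketch(parentEnabled: boolean, subSetting: boolean): boolean {
  return parentEnabled && subSetting;
}

// Table-driven 2x2 matrix: every parent/sub combination in one place.
const matrix: Array<{ parent: boolean; sub: boolean; expected: boolean }> = [
  { parent: false, sub: false, expected: false },
  { parent: false, sub: true, expected: false },
  { parent: true, sub: false, expected: false },
  { parent: true, sub: true, expected: true },
];

for (const { parent, sub, expected } of matrix) {
  const actual = isExtendedCacheTtlMessagesEnabledSketch(parent, sub);
  if (actual !== expected) {
    throw new Error(`parent=${parent} sub=${sub}: expected ${expected}, got ${actual}`);
  }
}
```

A call site would then pass '1h' only when the relevant gate is on, for
example buildCacheControl(messagesEnabled ? '1h' : undefined).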

* Remove outdated comments about extended cache TTL models

Author: Bhavya U
Commit: 0bd2387d0e948846941d82641fc16bb5f08a6690
Parent: 7e3f0c1
Committed by GitHub <noreply@github.com> on 5/15/2026, 9:57:03 PM