
fix(gradient): detect free-write-cache providers and compress eagerly (#487)

## Summary

Fixes unbounded context growth for models with free cache writes (e.g.
MiniMax passive caching) or no caching. The gradient system's
`shouldCompress()` economic gate incorrectly rejected compression
because it applied fallback Anthropic-like pricing ($3.75/M write cost)
to unknown models, even though their actual write cost is $0.

**Confirmed via Sentry traces** from Onur's MiniMax-M2.7 sessions
(`server_name:onur-ThinkPad-E470`, 7,421 events): every span shows
`cache_creation_input_tokens: null`, `cache_read_input_tokens: 0`, and
input tokens growing unbounded to 178k (past the 168k maxInput limit).

## Problem

MiniMax's Anthropic-compatible endpoint uses passive caching with **zero
write cost** (`cache_creation_input_tokens` is always 0/null). The
gradient system:

1. Gets fallback pricing ($3.75/M write, $0.30/M read) — a 12.5:1 ratio
2. `shouldCompress()` computes bust cost ($0.36) vs continue cost
($0.05) → rejects compression
3. Context grows at layer 0 to `maxInput × 0.95 = 159,600` with zero
compression
4. Tool-heavy implementation turns (5-15k tokens each) push past the API
limit
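
To illustrate why the pricing assumption matters, here is a minimal sketch of a price-ratio gate. It is not the real `shouldCompress()`: the `Pricing` shape, `naiveGate`, and the cost model (one full cache write versus one full cache read of the same context) are assumptions made for illustration, so it does not reproduce the $0.36 / $0.05 figures above. It only shows that the same context flips from "reject" to "compress" once the write price is zero.

```typescript
// Prices in USD per million tokens. Field names are hypothetical.
interface Pricing {
  cacheWritePerM: number;
  cacheReadPerM: number;
}

// Fallback figures quoted in the description above.
const FALLBACK: Pricing = { cacheWritePerM: 3.75, cacheReadPerM: 0.3 };
// Assumed pricing for a passive-caching provider: writes are free.
const FREE_WRITE: Pricing = { ...FALLBACK, cacheWritePerM: 0 };

// Simplified stand-in for the economic gate (NOT the real shouldCompress()):
// compress only when re-writing the cache costs less than re-reading it.
function naiveGate(contextTokens: number, p: Pricing): boolean {
  const bustCost = (contextTokens / 1_000_000) * p.cacheWritePerM;
  const continueCost = (contextTokens / 1_000_000) * p.cacheReadPerM;
  return bustCost < continueCost;
}

const rejectedWithFallback = naiveGate(100_000, FALLBACK); // 12.5:1 ratio blocks compression
const allowedWithFreeWrite = naiveGate(100_000, FREE_WRITE); // zero write cost allows it
```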

## Changes

- **Free-write detection**: Tracks `zeroCacheWriteTurns` per session in
`recordCacheUsage()`. After 3 consecutive turns with zero cache writes,
`isFreeWriteSession()` returns true
- **`shouldCompress()` bypass**: Accepts `{ freeWrite: true }` option —
returns true unconditionally (compression is free when there's no write
cost). The 5-consecutive-bust guard still applies to prevent
compress→overflow loops
- **Earlier compression trigger**: For free-write sessions,
`layer0Ceiling` drops to 65% of maxInput (~109k of the 168k maxInput on
a 200k context), leaving ~59k headroom for tool-heavy turns instead of
the normal ~8k
- **Self-correcting**: If the upstream ever reports non-zero
`cache_creation_input_tokens`, the counter resets and normal economic
analysis resumes
- **OpenCode provider sync**: Adds `minimax`, `minimax-cn`, `zai`,
`kimi-coding` to `GATEWAY_PROVIDERS` (matching Pi plugin)
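
The detection and ceiling changes above can be sketched roughly as follows. This is an illustrative approximation, not the code from this commit: the `SessionCacheState` class, the `CacheUsage` shape, and the standalone `layer0Ceiling` function signature are assumptions; only the identifier names, the 3-turn threshold, and the 65% / 95% factors come from the description.

```typescript
// Hypothetical usage shape; only the field name comes from the description.
interface CacheUsage {
  cache_creation_input_tokens: number | null;
}

// "After 3 consecutive turns with zero cache writes"
const FREE_WRITE_THRESHOLD = 3;

// Hypothetical per-session holder for the counter.
class SessionCacheState {
  zeroCacheWriteTurns = 0;

  recordCacheUsage(usage: CacheUsage): void {
    // Treat null the same as 0: the provider reported no cache write.
    const written = usage.cache_creation_input_tokens ?? 0;
    if (written > 0) {
      // Self-correcting: any real cache write resets the counter.
      this.zeroCacheWriteTurns = 0;
    } else {
      this.zeroCacheWriteTurns += 1;
    }
  }

  isFreeWriteSession(): boolean {
    return this.zeroCacheWriteTurns >= FREE_WRITE_THRESHOLD;
  }
}

// Layer-0 ceiling: 65% of maxInput for free-write sessions, 95% otherwise.
function layer0Ceiling(maxInput: number, freeWrite: boolean): number {
  return Math.round(maxInput * (freeWrite ? 0.65 : 0.95));
}

const session = new SessionCacheState();
for (let turn = 0; turn < 3; turn++) {
  session.recordCacheUsage({ cache_creation_input_tokens: null });
}
const ceiling = layer0Ceiling(168_000, session.isFreeWriteSession());
```

With a 168k `maxInput`, this yields a ceiling of 109,200 tokens for the free-write session, compared with 159,600 under the normal 95% factor.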

## Test plan

- 14 new tests across 4 test suites covering: zero-cache-write tracking,
counter reset, bust counter reset at threshold, `isFreeWriteSession()`
behavior, `shouldCompress()` with freeWrite option, backward
compatibility, and an integration test verifying `transform()`
compresses at ~65% for free-write sessions vs ~95% for normal
- All 1957 tests pass, typecheck clean across all 4 packages
Burak Yigit Kaya committed
82b903f1142e2f653bde3f38a83b0f6f9707dcf8
Parent: 2c5533f
Committed by GitHub <noreply@github.com> on 5/28/2026, 10:18:51 PM