Correct performance claims with real measured numbers

Previous docs included estimated DDTree throughput (73-95 tok/s, "2.6x
over AR") computed by multiplying DFlash long-context speeds by DDTree's
speedup ratio. These estimates were never measured directly and
significantly overstated actual performance.

Real end-to-end measurements (code generation, 8K max tokens):
- Autoregressive: 27.9 tok/s
- DFlash: 38.6 tok/s (1.38x)
- DFlash + DDTree: 42.3 tok/s (1.52x over AR, ~10% over DFlash)
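
The ratios quoted above can be re-derived from the three measured
throughputs. The following is a minimal sketch for checking that
arithmetic, not part of the benchmark harness; the `measured` dict and
the `speedup` helper are hypothetical names introduced only for this
illustration.

```python
# Measured throughput in tokens/second, copied from the list above.
measured = {
    "Autoregressive": 27.9,
    "DFlash": 38.6,
    "DFlash + DDTree": 42.3,
}


def speedup(new_tok_s: float, base_tok_s: float) -> float:
    """Return the throughput ratio new/base (hypothetical helper)."""
    return new_tok_s / base_tok_s


ar = measured["Autoregressive"]
dflash = measured["DFlash"]
ddtree = measured["DFlash + DDTree"]

dflash_over_ar = speedup(dflash, ar)
ddtree_over_ar = speedup(ddtree, ar)
# Relative gain of DDTree over plain DFlash, as a percentage.
ddtree_over_dflash_pct = (speedup(ddtree, dflash) - 1.0) * 100.0

print(f"DFlash vs AR:          {dflash_over_ar:.2f}x")
print(f"DFlash + DDTree vs AR: {ddtree_over_ar:.2f}x")
print(f"DDTree gain on DFlash: {ddtree_over_dflash_pct:.1f}%")
```

Dividing each measured rate by its own baseline, rather than chaining
ratios taken from different workloads, is what keeps these figures
consistent with one another.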

Also document content-type sensitivity: DDTree helps 10-15% on code
and structured content (85% draft acceptance), but provides ~0% benefit
on creative prose (5-10% draft acceptance).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Thanh Pham committed
4b12590abc9909fb03bfdf7dd736e76cef7ebdb0
Parent: 56d9103