Correct performance claims with real measured numbers
Previous docs included estimated DDTree throughput (73-95 tok/s, "2.6x over AR") computed by multiplying DFlash long-context speeds by DDTree's speedup ratio. These were never directly measured and significantly overstated actual performance.

Real end-to-end measurements (code generation, 8K max tokens):
- Autoregressive: 27.9 tok/s
- DFlash: 38.6 tok/s (1.38x)
- DFlash + DDTree: 42.3 tok/s (1.52x over AR, ~10% over DFlash)

Also document content-type sensitivity: DDTree helps 10-15% on code and structured content (85% draft acceptance), but provides ~0% benefit on creative prose (5-10% draft acceptance).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
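As a quick sanity check, the speedup ratios quoted above can be recomputed directly from the three measured throughputs. This is only a minimal sketch: the `measured` dictionary and the `speedup` helper are hypothetical names introduced here for illustration and are not part of the repository.

```python
# Measured end-to-end throughput quoted in the commit message, in tokens/second.
# The dictionary and helper below are illustrative names, not project code.
measured = {
    "Autoregressive": 27.9,
    "DFlash": 38.6,
    "DFlash + DDTree": 42.3,
}


def speedup(tok_s: float, baseline_tok_s: float) -> float:
    """Return the throughput ratio of a configuration over a baseline."""
    return tok_s / baseline_tok_s


ar = measured["Autoregressive"]
for name, tok_s in measured.items():
    # Each ratio comes from two directly measured numbers,
    # not from multiplying one speed by another configuration's ratio.
    print(f"{name}: {tok_s} tok/s ({speedup(tok_s, ar):.2f}x over AR)")

ddtree_over_dflash = speedup(measured["DFlash + DDTree"], measured["DFlash"])
print(f"DDTree gain over DFlash: {100 * (ddtree_over_dflash - 1):.1f}%")
```

The ratios round to 1.38x and 1.52x over autoregressive decoding, and the DDTree gain over DFlash comes out at about 9.6%, consistent with the "~10%" figure.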
Thanh Pham committed
4b12590abc9909fb03bfdf7dd736e76cef7ebdb0
Parent: 56d9103