🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
enable cpu paged cache (#42869)
* enable cpu paged cache
* enable cpu example
* fix device map
* update tests
* revert xpu deterministic
* fix format
* fix format
* update test_paged_attention for CPU
* update cpu ground truth for CI
* use accelerator
* fix typo
* fix tests
* fix example
* update tests
* update tests
* fix tests
* fix num_return_sequences
* fix num_return_sequence
* fix max_seqlen_q
* cpu does not support FA2 without paged
* add cpu expected outputs
* revert useless change
* revert wrong change
* fix format
* update comments
* add flex attn for CPU
* fix tests
* fix comment
* fix ground truth check
* fix graph check
* Simplify _graphs initialization for CUDA graphs
  Refactor the initialization of _graphs to simplify the condition for using CUDA graphs.
* Update src/transformers/generation/continuous_batching/requests.py
* Update src/transformers/generation/continuous_batching/continuous_api.py

---------

Signed-off-by: jiqing-feng <[email protected]>
Co-authored-by: Rémi Ouazan <[email protected]>
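For readers unfamiliar with the term, the general idea behind a paged cache can be sketched in a few lines of plain Python. This is only an illustration of the concept (fixed-size blocks handed out to requests on demand), not the code touched by this commit; the class name `PagedCacheAllocator` and its methods are hypothetical and do not exist in Transformers.

```python
class PagedCacheAllocator:
    """Toy block allocator (hypothetical, for illustration only).

    The cache is split into fixed-size blocks. Each request owns a list of
    physical block ids, so its tokens need not be contiguous in memory.
    """

    def __init__(self, num_blocks: int, block_size: int) -> None:
        self.block_size = block_size
        self.free_blocks = list(range(num_blocks))  # physical blocks not in use
        self.block_tables = {}  # request id -> list of physical block ids
        self.lengths = {}       # request id -> number of tokens stored

    def append_tokens(self, request_id: str, num_tokens: int) -> list[int]:
        """Reserve room for `num_tokens` more tokens and return their flat slot indices."""
        table = self.block_tables.get(request_id, [])
        length = self.lengths.get(request_id, 0)
        # Ceil division gives the total blocks required after the append.
        total_blocks = -(-(length + num_tokens) // self.block_size)
        needed = total_blocks - len(table)
        if needed > len(self.free_blocks):
            raise MemoryError("not enough free cache blocks")
        for _ in range(needed):
            table.append(self.free_blocks.pop())
        self.block_tables[request_id] = table
        # Map each logical position to (physical block, offset inside block).
        slots = [
            table[pos // self.block_size] * self.block_size + pos % self.block_size
            for pos in range(length, length + num_tokens)
        ]
        self.lengths[request_id] = length + num_tokens
        return slots

    def free(self, request_id: str) -> None:
        """Return all blocks of a finished request to the free pool."""
        self.free_blocks.extend(self.block_tables.pop(request_id, []))
        self.lengths.pop(request_id, None)
```

In this sketch a request only grabs a new block when its current blocks are full, and finished requests return their blocks for reuse, which is what lets many requests of different lengths share one pre-allocated cache.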
jiqing-feng committed
071e178be163917777dac272e8e26525bc20db08
Parent: e7a2c0c
Committed by GitHub <[email protected]>
on 1/29/2026, 3:27:39 PM