
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal domains, for both inference and training.


fix: improve processor loading performance by avoiding redundant tokenizer parsing (#44927)

* fix(tokenization_utils_tokenizers): avoid parsing full vocab in from_file when only post_processor/padding/truncation are needed

* fix(tokenization_utils_tokenizers): fall back to from_file when model type is missing in tokenizer.json

* fix(tokenization_utils_tokenizers): restrict minimal tokenizer optimization to BPE/WordPiece/WordLevel only

* fix(tokenization_utils_tokenizers): add comment explaining why Unigram and older formats fall back to from_file

* apply suggestions

* fix
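The bullets above describe the optimization: when only the `post_processor`, `padding`, and `truncation` sections of `tokenizer.json` are needed, skip deserializing the full vocabulary, but only for model types whose layout is known (`BPE`, `WordPiece`, `WordLevel`), falling back to `from_file` for Unigram, older formats, or a missing model type. A minimal sketch of that branching logic (hypothetical helper names, not the actual transformers implementation; the real savings come from avoiding the `tokenizers` library's full deserialization, which this illustration does not reproduce):

```python
import json

# Model types whose tokenizer.json layout is known well enough to
# extract only the lightweight sections. Unigram and older formats
# fall back to a full parse because their structure differs.
FAST_PARSE_MODELS = {"BPE", "WordPiece", "WordLevel"}


def load_processor_sections(tokenizer_json: str) -> dict:
    """Return post_processor/padding/truncation without using the vocab.

    If the model type is missing or unsupported, signal a fallback to
    the full ``from_file`` path instead.
    """
    data = json.loads(tokenizer_json)
    model_type = data.get("model", {}).get("type")
    if model_type not in FAST_PARSE_MODELS:
        # Fall back to the full parse (here: just return everything).
        return {"full_parse": True, "config": data}
    # Only the lightweight sections are extracted; the (potentially
    # huge) vocabulary under data["model"] is never iterated.
    return {
        "full_parse": False,
        "post_processor": data.get("post_processor"),
        "padding": data.get("padding"),
        "truncation": data.get("truncation"),
    }
```

For a BPE tokenizer the helper returns only the three small sections; for a Unigram tokenizer (or one with no `model.type`) it signals the `from_file` fallback.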

---------

Co-authored-by: ydshieh <[email protected]>
Yih-Dar committed
b1527a32a1010cd94bfb4f937af247bb5871f6fd
Parent: 9dc8d8a
Committed by GitHub <[email protected]> on 3/23/2026, 10:46:49 AM