🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal domains, for both inference and training.
fix: improve processor loading performance by avoiding redundant tokenizer parsing (#44927)
* fix(tokenization_utils_tokenizers): avoid parsing the full vocab in from_file when only post_processor/padding/truncation are needed

* fix(tokenization_utils_tokenizers): fall back to from_file when the model type is missing in tokenizer.json

* fix(tokenization_utils_tokenizers): restrict the minimal-tokenizer optimization to BPE/WordPiece/WordLevel only

* fix(tokenization_utils_tokenizers): add comment explaining why Unigram and older formats fall back to from_file

* apply suggestions

* fix

---------

Co-authored-by: ydshieh <[email protected]>
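A minimal sketch of the idea behind this commit, assuming a simplified view of the `tokenizer.json` layout: when only the `post_processor`/`padding`/`truncation` sections are needed, the (potentially huge) `model.vocab` can be left untouched, but only for model types whose serialization keeps the vocab in a separate section. The function name and fallback signaling below are illustrative, not the actual transformers implementation.

```python
import json

# Model types whose tokenizer.json stores the vocab in a self-contained
# "model" section, so the other sections can be read without parsing it.
# (Assumption based on the commit message: BPE/WordPiece/WordLevel only.)
MINIMAL_MODEL_TYPES = {"BPE", "WordPiece", "WordLevel"}


def extract_minimal_sections(tokenizer_json: str):
    """Return (sections, used_fast_path).

    sections is a dict with post_processor/padding/truncation when the
    fast path applies; (None, False) signals that the caller should fall
    back to a full from_file-style parse.
    """
    data = json.loads(tokenizer_json)
    model = data.get("model") or {}
    model_type = model.get("type")
    if model_type not in MINIMAL_MODEL_TYPES:
        # Unigram and older serialization formats (or a missing model
        # type) require the full parse, mirroring the commit's fallback.
        return None, False
    sections = {key: data.get(key) for key in ("post_processor", "padding", "truncation")}
    return sections, True


# Example: a tiny BPE tokenizer.json; the vocab entries are never read.
example = json.dumps({
    "model": {"type": "BPE", "vocab": {"a": 0, "b": 1}, "merges": []},
    "post_processor": {"type": "TemplateProcessing"},
    "padding": None,
    "truncation": None,
})
sections, fast = extract_minimal_sections(example)
```

The restriction to BPE/WordPiece/WordLevel matters because a Unigram model's serialization cannot be safely skipped over, so correctness is preserved by falling back to the slower full parse in every other case.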
Yih-Dar committed
b1527a32a1010cd94bfb4f937af247bb5871f6fd
Parent: 9dc8d8a
Committed by GitHub <[email protected]>
on 3/23/2026, 10:46:49 AM