🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal domains, for both inference and training.
Accumulate tokens into batches in `PreTrainedTokenizerBase.add_tokens()` (#17119)
* Accumulate tokens into batches in PreTrainedTokenizerBase.add_tokens()

  For tokenizers with a small number of special tokens, or special tokens with consecutive token IDs, this reduces the time complexity of creating the trie from quadratic to linear; see also #16936.

* Extend explanation of batching added tokens
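The commit message does not show the tokenizer internals, but the complexity claim can be illustrated with a minimal sketch. The assumption here (hypothetical, not taken from the diff) is that adding tokens one at a time rebuilds the no-split-token trie after each addition, which costs O(n²) node insertions for n tokens, whereas accumulating the tokens into a batch and building the trie once costs O(n):

```python
class Trie:
    """Minimal character trie used only to count insertion work (illustrative,
    not the actual transformers Trie implementation)."""

    def __init__(self):
        self.root = {}
        self.inserts = 0  # tally of per-character insertion steps

    def add(self, word):
        node = self.root
        for ch in word:
            node = node.setdefault(ch, {})
            self.inserts += 1
        node[""] = True  # mark end of a token


def add_tokens_one_by_one(tokens):
    """Rebuild the trie from scratch after every added token: O(n^2) total work."""
    work = 0
    known = []
    for tok in tokens:
        known.append(tok)
        trie = Trie()  # hypothetical pre-fix behavior: full rebuild per token
        for t in known:
            trie.add(t)
        work += trie.inserts
    return work


def add_tokens_batched(tokens):
    """Accumulate all tokens first and build the trie once: O(n) total work."""
    trie = Trie()
    for tok in tokens:
        trie.add(tok)
    return trie.inserts


tokens = [f"<extra_{i}>" for i in range(100)]
print(add_tokens_one_by_one(tokens))  # grows quadratically with len(tokens)
print(add_tokens_batched(tokens))     # grows linearly with len(tokens)
```

With 100 tokens the batched build performs one insertion step per character, while the rebuild-per-token variant repeats earlier work on every addition, which is the quadratic-to-linear improvement the commit describes.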
Vít Novotný committed
6ee1474b67b088829555364a14ebfb45e661fac4
Parent: 52e7c92
Committed by GitHub <[email protected]>
on 5/31/2022, 2:36:45 PM