
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal domains, for both inference and training.


Accumulate tokens into batches in `PreTrainedTokenizerBase.add_tokens()` (#17119)

* Accumulate tokens into batches in PreTrainedTokenizerBase.add_tokens()

For tokenizers with a small number of special tokens, or with special tokens
that have consecutive token IDs, this reduces the time complexity of creating
the trie from quadratic to linear; see also #16936.

* Extend explanation of batching added tokens
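The optimization described above can be sketched in a minimal, self-contained way. This is not the actual transformers implementation — the `Trie` class and the two `add_tokens_*` functions below are hypothetical simplifications that only illustrate why rebuilding the trie once per batch is linear while rebuilding it once per token is quadratic:

```python
# Minimal sketch of batching added tokens before (re)building a trie.
# Hypothetical simplification, not the code in PreTrainedTokenizerBase.

class Trie:
    """A tiny character trie; the end of a word is marked with "" -> True."""

    def __init__(self):
        self.data = {}

    def add(self, word):
        node = self.data
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = True

    def has(self, word):
        node = self.data
        for ch in word:
            if ch not in node:
                return False
            node = node[ch]
        return "" in node


def add_tokens_naive(existing, new_tokens):
    """Rebuild the trie after every single added token.

    Each rebuild re-inserts all tokens seen so far, so adding n tokens
    costs O(n^2) insertions in total.
    """
    trie = Trie()
    for tok in new_tokens:
        existing.append(tok)
        trie = Trie()
        for t in existing:  # full rebuild for every added token
            trie.add(t)
    return trie


def add_tokens_batched(existing, new_tokens):
    """Accumulate all new tokens first, then build the trie once.

    One pass over the accumulated tokens: O(n) insertions in total.
    """
    existing.extend(new_tokens)
    trie = Trie()
    for t in existing:
        trie.add(t)
    return trie
```

Both variants produce the same trie; only the amount of rebuilding differs, which is where the quadratic-to-linear improvement comes from.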
Vít Novotný committed
6ee1474b67b088829555364a14ebfb45e661fac4
Parent: 52e7c92
Committed by GitHub <[email protected]> on 5/31/2022, 2:36:45 PM