
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal domains, for both inference and training.


Accumulate tokens into batches in `PreTrainedTokenizerBase.add_tokens()` (#17119)

* Accumulate tokens into batches in PreTrainedTokenizerBase.add_tokens()

For tokenizers with a small number of special tokens, or with special tokens
that have consecutive token IDs, this reduces the time complexity of creating
the trie from quadratic to linear; see also #16936.

* Extend explanation of batching added tokens
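The optimization described above can be sketched in a minimal, self-contained way. This is not the actual transformers implementation — the `Trie` class and the two `add_tokens_*` functions below are hypothetical simplifications that only illustrate why rebuilding the trie once per batch is linear while rebuilding it once per token is quadratic:

```python
# Minimal sketch of batching added tokens before (re)building a trie.
# Hypothetical simplification, not the code in PreTrainedTokenizerBase.

class Trie:
    """A tiny character trie; the end of a word is marked with "" -> True."""

    def __init__(self):
        self.data = {}

    def add(self, word):
        node = self.data
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = True

    def has(self, word):
        node = self.data
        for ch in word:
            if ch not in node:
                return False
            node = node[ch]
        return "" in node


def add_tokens_naive(existing, new_tokens):
    """Rebuild the trie after every single added token.

    Each rebuild re-inserts all tokens seen so far, so adding n tokens
    costs O(n^2) insertions in total.
    """
    trie = Trie()
    for tok in new_tokens:
        existing.append(tok)
        trie = Trie()
        for t in existing:  # full rebuild for every added token
            trie.add(t)
    return trie


def add_tokens_batched(existing, new_tokens):
    """Accumulate all new tokens first, then build the trie once.

    One pass over the accumulated tokens: O(n) insertions in total.
    """
    existing.extend(new_tokens)
    trie = Trie()
    for t in existing:
        trie.add(t)
    return trie
```

Both variants produce the same trie; only the amount of rebuilding differs, which is where the quadratic-to-linear improvement comes from.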
Vít Novotný committed
6ee1474b67b088829555364a14ebfb45e661fac4
Parent: 52e7c92
Committed by GitHub <[email protected]> on 5/31/2022, 2:36:45 PM