🤗 Transformers: the model-definition framework for state-of-the-art machine learning models for text, vision, audio, and multimodal tasks, for both inference and training.
Add Parakeet (#39062)
* first commit
* update to handle masking for bs>1
* Add tests and docs
* update model ids
* update docs and improve style
* update librosa location
* import guard torch too
* ruff code checks fix
* ruff format check
* updated to parakeet names
* update script
* Add tokenizer decoding
* Remove other model dependency
* clean tests
* fix tests
* linting
* fix ruff lint warnings
* move to separate folders
* add parakeet ctc model code
* simplify encoder structure
* update documentation
* add parakeet to toctree
* fix tests
* add parakeet doc
* Address comments
* Update featurizer to compute lens directly
* fix ruff tests
* fix encoding format
* fix minor ctc decoding
* revert modular_model_converter.py changes
* revert check_config_attributes.py changes
* refactor: fastconformer & parakeet_ctc -> parakeet
* modeling update
* test update
* propagate feature extractor updates
* propagate doc changes
* propagate doc changes
* propagate tokenization changes
* propagate conversion changes
* remove fastconformer tests
* remove modular
* update processor
* update processor
* test update
* diverse fixes
* 100% matching greedy batched
* Update conversion script.
* Refactor docs.
* Refactor auto loading.
* Refactor and fix tokenization and processing.
* Update integration test.
* Modeling fixes:
  - ensure correct attention mask shape
  - ensure layer drop returns valid output
  - correct blank token ID when computing CTC loss
* Format and repo consistency.
* Update model doc.
* Fix feature extraction tests.
* Fix (most) tokenizer tests.
* Add pipeline example.
* Fixes
* Use eager_attention_forward from Llama.
* Small tweaks.
* Replace Sequential with ModuleList
* Add check if not all layers copied
* Clean tokenizer.
* Standardize FastSpeech2ConformerConvolutionModule for Parakeet.
* Switch to modular for modeling and processing.
* Add processor tests.
* Fix modeling tests.
* Formatting and docstrings.
* Add `return_attention_mask` like other feature extractors.
* clean up after merging main.
* nits on modeling
* configuration update
* nit
* simplification: use PretrainedTokenizerFast, simplify processor
* add dtype arg to mel_filter_bank
* feature extraction: simplify!
* modeling update
* change to ParakeetTokenizerFast
* correct attention mask handling
* auto update
* proc update
* test update
* feature extraction fixes
* modeling update
* conversion script update
* update tests feature integration
* update tokenization and tests
* processor tests
* revert audio_utils
* config docstring update
* blank_token -> pad_token
* modeling update
* doc update
* fix tests
* fix test
* fix tests
* address review comments
* add comment
* add comment
* explicitly not support flash
* attention: straightforward masking
* fix
* tokenizer update: skipping blank tokens by default
* doc update
* fix max_positions_embeddings handling
* nits
* change atol in feature extraction integration tests
* doc update + fix loss
* doc update
* nit
* update integration test for A10
* repo id name
* nit

---------

Signed-off-by: nithinraok <[email protected]>
Co-authored-by: Eustache Le Bihan <[email protected]>
Co-authored-by: eustlb <[email protected]>
Co-authored-by: Eric B <[email protected]>
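Several of the commits above concern CTC decoding (correct blank token ID, skipping blank tokens in the tokenizer by default, matching greedy batched decoding). As a rough illustration of the general technique those commits refer to, and not the actual Parakeet implementation, here is a minimal CTC greedy-decoding sketch; the per-frame argmax token ids and the blank id of 0 are assumptions for this example:

```python
# Minimal sketch of CTC greedy decoding (illustrative only, not the
# Parakeet code): collapse consecutive repeated frame predictions,
# then drop blank tokens. Assumes blank_id=0 for this example.
from itertools import groupby

def ctc_greedy_decode(frame_ids, blank_id=0):
    """Collapse runs of identical frame ids, then remove blanks."""
    collapsed = [tok for tok, _ in groupby(frame_ids)]
    return [tok for tok in collapsed if tok != blank_id]

# Frames predicting "h h - e - l l - l o" with blank=0:
print(ctc_greedy_decode([8, 8, 0, 5, 0, 12, 12, 0, 12, 15]))
# -> [8, 5, 12, 12, 15]  (the repeated 12 survives because a blank
#    separates the two runs, which is how CTC encodes double letters)
```

This collapse-then-drop order matters: removing blanks first would merge the two `l` runs into one, which is why the PR's "skipping blank tokens by default" happens in the tokenizer only after repeats are collapsed.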
Nithin Rao committed
a579de7f5e00a9fdb1e9828aa3ab78385959f231
Parent: 1dd22a2
Committed by GitHub <[email protected]> on 9/25/2025, 1:52:24 PM