🤗 Transformers: the model-definition framework for state-of-the-art machine learning models for text, vision, audio, and multimodal tasks, for both inference and training.
Add Parakeet (#39062)
* first commit
* update to handle masking for bs>1
* Add tests and docs
* update model ids
* update docs and improve style
* update librosa location
* import guard torch too
* ruff code checks fix
* ruff format check
* updated to parakeet names
* update script
* Add tokenizer decoding
* Remove other model dependency
* clean tests
* fix tests
* linting
* fix ruff lint warnings
* move to separate folders
* add parakeet ctc model code
* simplify encoder structure
* update documentation
* add parakeet to toctree
* fix tests
* add parakeet doc
* Address comments
* Update featurizer to compute lens directly
* fix ruff tests
* fix encoding format
* fix minor ctc decoding
* revert modular_model_converter.py changes
* revert check_config_attributes.py changes
* refactor: fastconformer & parakeet_ctc -> parakeet
* modeling update
* test update
* propagate feature extractor updates
* propagate doc changes
* propagate doc changes
* propagate tokenization changes
* propagate conversion changes
* remove fastconformer tests
* remove modular
* update processor
* update processor
* test update
* diverse fixes
* 100% matching greedy batched
* Update conversion script.
* Refactor docs.
* Refactor auto loading.
* Refactor and fix tokenization and processing.
* Update integration test.
* Modeling fixes:
  - ensure correct attention mask shape
  - ensure layer drop returns valid output
  - correct blank token ID when computing CTC loss
* Format and repo consistency.
* Update model doc.
* Fix feature extraction tests.
* Fix (most) tokenizer tests.
* Add pipeline example.
* Fixes
* Use eager_attention_forward from Llama.
* Small tweaks.
* Replace Sequential with ModuleList
* Add check if not all layers copied
* Clean tokenizer.
* Standardize FastSpeech2ConformerConvolutionModule for Parakeet.
* Switch to modular for modeling and processing.
* Add processor tests.
* Fix modeling tests.
* Formatting and docstrings.
* Add `return_attention_mask` like other feature extractors.
* clean up after merging main.
* nits on modeling
* configuration update
* nit
* simplification: use PretrainedTokenizerFast, simplify processor
* add dtype arg to mel_filter_bank
* feature extraction: simplify!
* modeling update
* change to ParakeetTokenizerFast
* correct attention mask handling
* auto update
* proc update
* test update
* feature extraction fixes
* modeling update
* conversion script update
* update tests feature integration
* update tokenization and tests
* processor tests
* revert audio_utils
* config docstring update
* blank_token -> pad_token
* modeling update
* doc update
* fix tests
* fix test
* fix tests
* address review comments
* add comment
* add comment
* explicitly not support flash
* attention: straightforward masking
* fix
* tokenizer update: skipping blank tokens by default
* doc update
* fix max_positions_embeddings handling
* nits
* change atol in feature extraction integration tests
* doc update + fix loss
* doc update
* nit
* update integration test for A10
* repo id name
* nit

---------

Signed-off-by: nithinraok <[email protected]>
Co-authored-by: Eustache Le Bihan <[email protected]>
Co-authored-by: eustlb <[email protected]>
Co-authored-by: Eric B <[email protected]>
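Several of the commits above concern CTC decoding (correct blank token ID, skipping blank tokens in the tokenizer by default, matching greedy batched decoding). As a rough illustration of the general technique those commits refer to, and not the actual Parakeet implementation, here is a minimal CTC greedy-decoding sketch; the per-frame argmax token ids and the blank id of 0 are assumptions for this example:

```python
# Minimal sketch of CTC greedy decoding (illustrative only, not the
# Parakeet code): collapse consecutive repeated frame predictions,
# then drop blank tokens. Assumes blank_id=0 for this example.
from itertools import groupby

def ctc_greedy_decode(frame_ids, blank_id=0):
    """Collapse runs of identical frame ids, then remove blanks."""
    collapsed = [tok for tok, _ in groupby(frame_ids)]
    return [tok for tok in collapsed if tok != blank_id]

# Frames predicting "h h - e - l l - l o" with blank=0:
print(ctc_greedy_decode([8, 8, 0, 5, 0, 12, 12, 0, 12, 15]))
# -> [8, 5, 12, 12, 15]  (the repeated 12 survives because a blank
#    separates the two runs, which is how CTC encodes double letters)
```

This collapse-then-drop order matters: removing blanks first would merge the two `l` runs into one, which is why the PR's "skipping blank tokens by default" happens in the tokenizer only after repeats are collapsed.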
Nithin Rao committed
a579de7f5e00a9fdb1e9828aa3ab78385959f231
Parent: 1dd22a2
Committed by GitHub <[email protected]> on 9/25/2025, 1:52:24 PM