🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal domains, for both inference and training.
Add GraniteMoeHybrid support for 4.0 (#37658)
* initial config and MLA layer
* first pass at decoder
* completion of layers
* modeling class
* adding hybrid class to imports
* fix imports granitemoehybrid
* fix granitehybrid imports
* fix granitehybrid import
* fix generated modeling file
* add some comments
* minor fixes in layers
* add sharedMLP layer
* correct layer names
* fixes in mamba config
* fix mamba config
* change name of MLP layer
* fix seq mixer layers
* correct mamba config
* fixes in param names
* enable hybrid model
* update config
* fix config granite hybrid
* fix attention layer
* cleanup to re-use mamba code
* keep layer types
* attention bias cleanup
* update mamba layer name
* first pass at tests
* first pass at tests
* use granite attention
* fix: self attn weights
* pass at making pos_emb optional
* initialize self_attn only as needed
* overwrite forward to create HybridMambaCache
* Log invalid layer types
* Add attention outputs test
* Only emit attentions/logits if not None
* Fix config test hidden size divisibility
* mark granitemoehybrid as stateful
* Initialize mamba convolutional layers
* Formatting fixes
* config docstring, removed some unused attrs
* Fix missing arg in models test
* Fix create and check decoder model test
* support logits_to_keep in granitemoe
* regen to pass logits_to_keep
* Allow None or rope
* Fix gradient checkpointing
* Add granitemoehybrid as special cache for generate check
* Remove unused MLA refs
* Fix mamba layer mask
* Remove logits_to_keep from config
* Minor docstring nits
* Update licenses
* Enable cache by default
* map layer types to layer block type
* First pass at granite moe hybrid docs
* Ignore granite moe hybrid in valid checkpoint check
* Align attention interfaces
* regenerate modular granitemoeshared attention interface
* Align granite moe hybrid attn interface
* run formatting
* Handle mamba initialization
* avoid conditional attr defs
* Move hybrid layer validation to config
* Add placeholder integration tests
* Docs nits / Update model names
* Clean up forward conditions
* Use gradient checkpointing layer
* Remove some copied bamba tests + inherit align test init; delete more tests; use common layer init with bamba tests; finish test consolidation
* avoid redundant intermediate std var
* use @can_return_tuple
* Remove unused moe state
* make skipped test names consistent
* Fix docstring order
* Add missing toc
* Always create the shared mlp
* Fix name in docstring
* link preview model in docs

---------

Signed-off-by: Sukriti-Sharma4 <[email protected]>
Co-authored-by: Alex-Brooks <[email protected]>
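Several of the items above ("keep layer types", "map layer types to layer block type", "Log invalid layer types", "Move hybrid layer validation to config") describe the hybrid architecture's core mechanism: the config carries a per-layer type list, and the decoder instantiates a Mamba or attention block for each position. A minimal, library-free sketch of that dispatch follows; the class and attribute names here (`MambaBlock`, `AttentionBlock`, `layer_types`) are illustrative stand-ins, not the transformers API.

```python
# Illustrative sketch of hybrid layer dispatch, assuming a config-style list
# naming a block type per layer. These classes are hypothetical stand-ins
# for the real GraniteMoeHybrid decoder layers.

class MambaBlock:
    def __init__(self, idx):
        self.idx, self.kind = idx, "mamba"  # state-space (Mamba) sequence mixer

class AttentionBlock:
    def __init__(self, idx):
        self.idx, self.kind = idx, "attention"  # softmax-attention sequence mixer

BLOCKS = {"mamba": MambaBlock, "attention": AttentionBlock}

def build_layers(layer_types):
    # Validate up front, mirroring "Move hybrid layer validation to config":
    # unknown layer types are rejected before any layer is constructed.
    invalid = [t for t in layer_types if t not in BLOCKS]
    if invalid:
        raise ValueError(f"invalid layer types: {invalid}")
    return [BLOCKS[t](i) for i, t in enumerate(layer_types)]

layers = build_layers(["mamba", "mamba", "attention", "mamba"])
print([layer.kind for layer in layers])  # → ['mamba', 'mamba', 'attention', 'mamba']
```

Because the two block families cache different state (a convolutional/SSM state vs. key-value pairs), a hybrid cache keyed by the same per-layer type list is what the "overwrite forward to create HybridMambaCache" item refers to.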
Sukriti Sharma committed 471958b6208bb9e94e6305d279fad9a05aa42c36
Parent: fe29b8c
Committed by GitHub <[email protected]> on 5/6/2025, 4:47:43 AM