🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal domains, for both inference and training.


Add GraniteMoeHybrid support for 4.0 (#37658)

* initial config and MLA layer

* first pass at decoder

* completion of layers

* modeling class

* adding hybrid class to imports

* fix imports granitemoehybrid

* fix granitehybrid imports

* fix granitehybrid import

* fix generated modeling file

* add some comments

* minor fixes in layers

* add sharedMLP layer

* correct layer names

* fixes in mamba config

* fix mamba config

* change name of MLP layer

* fix seq mixer layers

* correct mamba config

* fixes in param names

* enable hybrid model

* update config

* fix config granite hybrid

* fix attention layer

* cleanup to re-use mamba code

* keep layer types

* attention bias cleanup

* update mamba layer name

* first pass at tests

* first pass at tests

* use granite attention

* fix: self attn weights

* pass at making pos_emb optional

* initialize self_attn only as needed

* overwrite forward to create HybridMambaCache

* Log invalid layer types

* Add attention outputs test

* Only emit attentions/logits if not None

* Fix config test hidden size divisibility

* mark granitemoehybrid as stateful

* Initialize mamba convolutional layers

* Formatting fixes

* config docstring, removed some unused attrs

* Fix missing arg in models test

* Fix create and check decoder model test

* support logits to keep in granitemoe

* regen to pass logits_to_keep

* Allow None or rope

* Fix gradient checkpointing

* Add granitemoehybrid as special cache for generate check

* Remove unused MLA refs

* Fix mamba layer mask

* Remove logits to keep from config

* Minor docstring nits

* Update licenses

* Enable cache by default

* map layer types to layer block type

* First pass at granite moe hybrid docs

* Ignore granite moe hybrid in valid checkpoint check

* Align attention interfaces

* regenerate modular granitemoeshared attention interface

* Align granite moe hybrid attn interface

* run formatting

* Handle mamba initialization

* avoid conditional attr defs

* Move hybrid layer validation to config

* Add placeholder integration tests

* Docs nits / Update model names

* Clean up forward conditions

* Use gradient checkpointing layer

* Remove some copied bamba tests + inherit

align test init

delete more tests

Use common layer init with bamba tests

finish test consolidation

* avoid redundant intermediate std var

* use @can_return_tuple

* Remove unused moe state

* make skipped test names consistent

* Fix docstring order

* Add missing toc

* Always create the shared mlp

* Fix name in docstring

* link preview model in docs
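
Several of the commits above ("keep layer types", "initialize self_attn only as needed", "map layer types to layer block type", "Move hybrid layer validation to config") revolve around picking either an attention block or a Mamba block per decoder layer from a config-level list. Below is a minimal, self-contained sketch of that pattern; every name in it is an illustrative assumption, not the actual GraniteMoeHybrid implementation.

```python
# Toy sketch of per-layer dispatch in a hybrid decoder: some layers use
# attention, the rest use a Mamba-style sequence mixer, chosen from a
# config-level list of layer types. Names are illustrative assumptions.
import torch
import torch.nn as nn


class ToyAttentionMixer(nn.Module):
    def __init__(self, hidden_size, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(hidden_size, num_heads, batch_first=True)

    def forward(self, x):
        out, _ = self.attn(x, x, x, need_weights=False)
        return out


class ToyMambaMixer(nn.Module):
    """Stand-in for a state-space mixer: a causal depthwise convolution."""

    def __init__(self, hidden_size, kernel_size=4):
        super().__init__()
        self.conv = nn.Conv1d(hidden_size, hidden_size, kernel_size,
                              padding=kernel_size - 1, groups=hidden_size)

    def forward(self, x):
        y = self.conv(x.transpose(1, 2))[..., : x.shape[1]]
        return y.transpose(1, 2)


class ToyHybridLayer(nn.Module):
    def __init__(self, hidden_size, layer_type):
        super().__init__()
        # Only build the mixer the layer actually needs ("initialize
        # self_attn only as needed"); reject invalid types up front
        # ("Move hybrid layer validation to config").
        if layer_type == "attention":
            self.mixer = ToyAttentionMixer(hidden_size)
        elif layer_type == "mamba":
            self.mixer = ToyMambaMixer(hidden_size)
        else:
            raise ValueError(f"Invalid layer type: {layer_type!r}")
        self.norm = nn.LayerNorm(hidden_size)

    def forward(self, x):
        return x + self.mixer(self.norm(x))


class ToyHybridModel(nn.Module):
    def __init__(self, hidden_size=64,
                 layer_types=("mamba", "mamba", "attention", "mamba")):
        super().__init__()
        self.layers = nn.ModuleList(
            ToyHybridLayer(hidden_size, t) for t in layer_types
        )

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x


if __name__ == "__main__":
    model = ToyHybridModel()
    print(model(torch.randn(1, 10, 64)).shape)  # torch.Size([1, 10, 64])
```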

---------

Signed-off-by: Sukriti-Sharma4 <[email protected]>
Co-authored-by: Alex-Brooks <[email protected]>
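
The "support logits to keep in granitemoe" and "regen to pass logits_to_keep" commits refer to computing logits only for the last few positions rather than the full sequence, which saves memory when the vocabulary is large. A toy illustration of the idea, with assumed names rather than the library's actual code:

```python
# Compute logits only for the final positions instead of the whole sequence.
import torch
import torch.nn as nn

hidden_size, vocab_size = 64, 50_000
lm_head = nn.Linear(hidden_size, vocab_size, bias=False)

hidden_states = torch.randn(2, 128, hidden_size)  # (batch, seq_len, hidden)

logits_to_keep = 1  # e.g. only the last position is needed during generation
slice_indices = slice(-logits_to_keep, None) if logits_to_keep else slice(None)
logits = lm_head(hidden_states[:, slice_indices, :])

print(logits.shape)  # torch.Size([2, 1, 50000]) instead of (2, 128, 50000)
```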
Sukriti Sharma committed 471958b6208bb9e94e6305d279fad9a05aa42c36
Parent: fe29b8c
Committed by GitHub <[email protected]> on 5/6/2025, 4:47:43 AM
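
For reference, a minimal sketch of loading the model class added by this PR through the Auto classes. The checkpoint name below assumes the Granite 4.0 preview model referenced in "link preview model in docs" and may differ:

```python
# Minimal usage sketch for the new GraniteMoeHybrid model; the checkpoint
# name is an assumption, substitute the one you intend to use.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ibm-granite/granite-4.0-tiny-preview"  # assumed preview checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```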