
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.


Add hrm text (#46025)

* make new branch because other branch has messed up diff

* fix

* ignore trf rule here, special case where we set requires grad

* forgot the conversion mapping

* Fix: pad `L_bp_cycles` with `H_cycles`, not `L_cycles`

`L_bp_cycles_padded` is indexed by `high_cycle_idx ∈ [0, H_cycles)`
inside the recurrent forward, but `HrmTextModel.__init__` was
left-padding it to length `config.L_cycles` instead of
`config.H_cycles`. With the upstream defaults (`H_cycles=2`,
`L_cycles=3`, `L_bp_cycles=[2]`) this silently produced
`L_bp_cycles_padded=[1, 1, 2]`, so the index-1 read in the second
H-cycle picked up the leading pad value (1) and the trailing 2 was
never reached. Inference is unaffected (the value is only consulted
under autograd in training); training-time gradient propagation
through the last H-cycle was capped at 1 L-iteration instead of
`raw_bp[-1]`, the last entry of `L_bp_cycles` (default 2).
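A minimal plain-Python sketch of the padding mistake described above. `pad_bp_cycles` and `budgets_read` are hypothetical helpers, not the actual code in `HrmTextModel.__init__`. The sketch assumes left-padding with 1, as the message describes, and reproduces only the indexing, not the autograd gating.

```python
from typing import List


def pad_bp_cycles(raw_bp: List[int], target_len: int, pad_value: int = 1) -> List[int]:
    """Hypothetical helper: left-pad raw_bp with pad_value to target_len entries.

    Left-padding aligns the last entries of raw_bp with the last cycles.
    """
    if len(raw_bp) > target_len:
        raise ValueError("more backprop entries than cycles to index")
    return [pad_value] * (target_len - len(raw_bp)) + list(raw_bp)


def budgets_read(padded: List[int], h_cycles: int) -> List[int]:
    """Hypothetical helper: values consulted when indexing by high_cycle_idx in [0, h_cycles)."""
    return [padded[high_cycle_idx] for high_cycle_idx in range(h_cycles)]


# The upstream defaults quoted in the commit message.
H_cycles, L_cycles, raw_bp = 2, 3, [2]

buggy = pad_bp_cycles(raw_bp, L_cycles)  # padded to the wrong length
fixed = pad_bp_cycles(raw_bp, H_cycles)  # padded to the length it is indexed by

print(buggy, budgets_read(buggy, H_cycles))  # [1, 1, 2] [1, 1]
print(fixed, budgets_read(fixed, H_cycles))  # [1, 2] [1, 2]
```

With the wrong length, the second H-cycle reads the pad value 1. After the fix, it reads `raw_bp[-1] == 2`. The buggy list is longer than `H_cycles`, not shorter, so indexing never raises. That explains why the error was silent. A check such as `len(padded) == H_cycles` would have caught it.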

* fixes

* test

* skip TP tests for now

* style

* last skip

---------

Co-authored-by: vasqu <antonprogamer@gmail.com>
yifei wu committed
ca80e95782220b009afbeec6bf864258a67d988b
Parent: 461d428
Committed by GitHub <noreply@github.com> on 5/18/2026, 9:05:29 AM