🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
Add hrm text (#46025)
* make new branch because other branch has messed up diff
* fix
* ignore trf rule here, special case where we set requires grad
* forgot the conversion mapping
* Fix: pad `L_bp_cycles` with `H_cycles`, not `L_cycles`

  `L_bp_cycles_padded` is indexed by `high_cycle_idx ∈ [0, H_cycles)` inside the recurrent forward, but `HrmTextModel.__init__` was left-padding it to length `config.L_cycles` instead of `config.H_cycles`. With the upstream defaults (`H_cycles=2`, `L_cycles=3`, `L_bp_cycles=[2]`) this silently produced `L_bp_cycles_padded=[1, 1, 2]`, so the index-1 read in the second H-cycle picked up the leading pad value (1) and the trailing 2 was never reached.

  Inference is unaffected (the value is only consulted under autograd in training); training-time gradient propagation through the last H-cycle was capped at 1 L-iteration instead of `raw_bp[-1]` (default 2).
* fixes
* test
* skip TP tests for now
* style
* last skip

---------

Co-authored-by: vasqu <antonprogamer@gmail.com>
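
The padding bug described in the commit message can be reproduced in a few lines of plain Python. This is only a sketch under stated assumptions: `left_pad` is a hypothetical helper, not the code in `HrmTextModel.__init__`, and the pad value of 1 is inferred from the `[1, 1, 2]` example above.

```python
def left_pad(raw_bp, target_len, pad_value=1):
    """Hypothetical helper: left-pad raw_bp with pad_value up to target_len."""
    missing = max(target_len - len(raw_bp), 0)
    return [pad_value] * missing + list(raw_bp)


# Upstream defaults quoted in the commit message.
H_cycles, L_cycles, L_bp_cycles = 2, 3, [2]

# Before the fix: padded to L_cycles, although it is indexed by the H-cycle.
buggy = left_pad(L_bp_cycles, L_cycles)
# After the fix: padded to H_cycles, matching the index range [0, H_cycles).
fixed = left_pad(L_bp_cycles, H_cycles)

# Values actually read inside the recurrent forward, one per H-cycle.
buggy_reads = [buggy[h] for h in range(H_cycles)]
fixed_reads = [fixed[h] for h in range(H_cycles)]

print(buggy, buggy_reads)
print(fixed, fixed_reads)
```

With the list padded to `L_cycles`, the last H-cycle reads a pad entry and the configured value at the end of the list is never reached. Padding to `H_cycles` aligns the last entry with the last H-cycle.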
yifei wu committed
ca80e95782220b009afbeec6bf864258a67d988b
Parent: 461d428
Committed by GitHub <noreply@github.com>
on 5/18/2026, 9:05:29 AM