Tensor-parallel: Fix delayed AllReduce on Gemma-4 MoE (#22129)
* Fix delayed AllReduce on Gemma-4 MoE

  Skip forward past nodes that don't consume the current one, and allow a chain of MULs.

* Check for all sources before skipping nodes

* Address review comments
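
For illustration only, the traversal described above might look roughly like the sketch below. It is not the code from this commit: the `Node` type, the `find_delayed_allreduce_point` helper, the op names, and the stop condition are all hypothetical, and the sketch assumes the goal is to find the last node in a chain of MULs after which the AllReduce can be placed.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical graph node: a name, an op, and the names of its sources."""
    name: str
    op: str
    srcs: list = field(default_factory=list)


def find_delayed_allreduce_point(nodes, start):
    """Return the index of the last node in the MUL chain that begins at `start`.

    Assumption: `nodes` is in execution order, and the AllReduce may be
    delayed past any MUL that consumes the current node.
    """
    cur = start
    for i in range(start + 1, len(nodes)):
        node = nodes[i]
        # Check all sources of the node, not just the first one, before
        # deciding that it does not consume the current node.
        if nodes[cur].name not in node.srcs:
            continue  # unrelated node: skip forward past it
        if node.op == "MUL":
            cur = i  # extend the chain of MULs
            continue
        break  # first non-MUL consumer ends the chain
    return cur


# Hypothetical MoE-like fragment: an unrelated node sits between the
# matmul output and the two MULs that scale it.
graph = [
    Node("ffn_out", "MUL_MAT", ["w_down", "h"]),
    Node("router", "SOFT_MAX", ["logits"]),
    Node("scaled", "MUL", ["router", "ffn_out"]),
    Node("scaled2", "MUL", ["scaled", "scale"]),
    Node("sum", "ADD", ["scaled2", "residual"]),
]
point = find_delayed_allreduce_point(graph, 0)
print(graph[point].name)
```

In this sketch, `router` is skipped because none of its sources is `ffn_out`, and the consumer `scaled` is still found even though `ffn_out` is its second source.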
Gaurav Garg committed
fd6ae4ca1cd5446442f6c2e5e73a2a4c9bc44993
Parent: fb19f94
Committed by GitHub <noreply@github.com>
on 4/20/2026, 4:25:39 PM