model : NvFP4 quantized LM head support (#23046)
* NvFP4 quantized LM head support
* Address review comments
* Add assert for NvFP4 LM head and tied embeddings
* Address review comments
* Create output_s tensor only when the LM head is NvFP4

Signed-off-by: ynankani <ynankani@nvidia.com>
ynankani committed
42928bc14d33bde7b8fe8ea436ca44b40357737e
Parent: 59778f0
Committed by GitHub <noreply@github.com>
on 5/16/2026, 9:09:27 AM