[Performance] Optimize Adam GPU kernel and change learning rate type to float64 (#78972)
* [Performance] Optimize Adam GPU kernel and change learning rate type to float64

  - Add get_lr_dtype() to Adam returning float64 for higher-precision lr
  - Add AdamDenseKernel_compatible (GPU): torch-compatible Adam math with double-precision lr, FMA moment updates, and amsgrad support
  - Change AdamKernelREG/MEM lr parameter from const MT* to const double* so the non-compatible path also reads lr at full precision
  - Set kernel->InputAt(2).SetDataType(FLOAT64) for adam/merged_adam on GPU, CPU, and XPU to ensure the lr tensor arrives as float64
  - Fix CPU AdamDenseKernel and MergedAdamKernel to read lr as double
  - Fix XPU AdamDenseKernel and MergedAdamKernel to cast double lr to float32 before passing to XPU XDNN functions
  - Add _create_regularization_of_grad override in Adam for L2Decay to replicate PyTorch float64 weight-decay math
  - Update test_adam_op.py: LearningRate dtype float32->float64, rtol=2e-4

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix test1
* fix test2
* fix test3
* fix test4
* fix test5
* fix test6

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
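To illustrate why reading lr as float64 can matter, here is a minimal sketch in plain Python. It is not the kernel code from this commit: `to_float32` and `adam_step` are hypothetical helpers, the update is the textbook scalar Adam rule without amsgrad or weight decay, and the hyperparameters are common defaults chosen only for illustration. It compares one step taken with lr = 1e-3 kept in double precision against the same step with lr first rounded to float32.

```python
import math
import struct


def to_float32(x: float) -> float:
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def adam_step(param, grad, m, v, lr, step, beta1=0.9, beta2=0.999, eps=1e-8):
    """One scalar Adam update in float64 arithmetic (textbook form, no amsgrad)."""
    m = beta1 * m + (1.0 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1.0 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1.0 - beta1 ** step)             # bias correction
    v_hat = v / (1.0 - beta2 ** step)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


lr64 = 1e-3                  # lr kept at double precision
lr32 = to_float32(lr64)      # lr after a round trip through float32
lr_rel_err = abs(lr32 - lr64) / lr64

# Same parameter, gradient, and moments; only the lr precision differs.
p64, _, _ = adam_step(1.0, 0.5, 0.0, 0.0, lr64, step=1)
p32, _, _ = adam_step(1.0, 0.5, 0.0, 0.0, lr32, step=1)
update_diff = abs(p64 - p32)

print(f"relative lr rounding error: {lr_rel_err:.3e}")
print(f"parameter difference after one step: {update_diff:.3e}")
```

The per-step difference is tiny, but it is systematic: every step uses the same slightly perturbed lr, which is one plausible reason a float32 lr would show up as a mismatch when comparing results against a reference that keeps lr in double precision.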
zhengshengning committed
f4014bfa7b9acddfcfcaffb57b57b2a5c8fe9e7a
Parent: 3fc714d
Committed by GitHub <noreply@github.com>
on 5/19/2026, 3:17:30 AM