[Performance] Optimize Adam GPU kernel and change learning rate type to float64 (#78972)

* [Performance] Optimize Adam GPU kernel and change learning rate type to float64

- Add get_lr_dtype() to Adam returning float64 for higher-precision lr
- Add AdamDenseKernel_compatible (GPU): PyTorch-compatible Adam math with
  double-precision lr, FMA moment updates, and amsgrad support
- Change AdamKernelREG/MEM lr parameter from const MT* to const double*
  so the non-compatible path also reads lr at full precision
- Set kernel->InputAt(2).SetDataType(FLOAT64) for adam/merged_adam on
  GPU, CPU, and XPU to ensure the lr tensor arrives as float64
- Fix CPU AdamDenseKernel and MergedAdamKernel to read lr as double
- Fix XPU AdamDenseKernel and MergedAdamKernel to cast double lr to
  float32 before passing to XPU XDNN functions
- Add _create_regularization_of_grad override in Adam for L2Decay to
  replicate PyTorch float64 weight-decay math
- Update test_adam_op.py: LearningRate dtype float32->float64, rtol=2e-4
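
For readers unfamiliar with the math referred to above, the following is a
minimal host-side sketch of one PyTorch-style Adam step for a single element.
It keeps lr in double, updates the moments with fused multiply-add, and
optionally applies amsgrad. It is not the AdamDenseKernel_compatible code from
this commit: AdamState, AdamConfig, and AdamStepSketch are hypothetical names,
and computing the bias correction in double is an assumption of the sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical per-element optimizer state (illustrative, not Paddle's layout).
struct AdamState {
  float param;
  float moment1;
  float moment2;
  float moment2_max;  // only meaningful when amsgrad is enabled
};

// Hypothetical hyperparameter bundle; lr is deliberately kept as double.
struct AdamConfig {
  double lr;
  float beta1;
  float beta2;
  float epsilon;
  bool amsgrad;
};

// One Adam step for a single element; `step` is the 1-based iteration count.
inline void AdamStepSketch(AdamState* s, float grad, const AdamConfig& cfg,
                           int step) {
  assert(step >= 1);

  // m = beta1 * m + (1 - beta1) * g, with the outer multiply-add fused.
  s->moment1 = std::fma(cfg.beta1, s->moment1, (1.0f - cfg.beta1) * grad);
  // v = beta2 * v + (1 - beta2) * g^2, fused the same way.
  s->moment2 =
      std::fma(cfg.beta2, s->moment2, (1.0f - cfg.beta2) * grad * grad);

  // amsgrad uses the running maximum of the second moment.
  float v = s->moment2;
  if (cfg.amsgrad) {
    s->moment2_max = std::max(s->moment2_max, v);
    v = s->moment2_max;
  }

  // Bias corrections and the step size are computed in double (assumption),
  // so lr is never rounded to float32 before it is used.
  const double bc1 = 1.0 - std::pow(static_cast<double>(cfg.beta1), step);
  const double bc2 = 1.0 - std::pow(static_cast<double>(cfg.beta2), step);
  const double step_size = cfg.lr / bc1;
  const double denom =
      std::sqrt(static_cast<double>(v)) / std::sqrt(bc2) + cfg.epsilon;

  // Round to the parameter dtype only once, at the final store.
  s->param = static_cast<float>(s->param - step_size * s->moment1 / denom);
}
```

On the first step with lr = 0.1 and a gradient of 0.5, the bias-corrected
update is lr * 0.5 / 0.5, so a parameter starting at 1.0 moves to about 0.9.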

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix test1

* fix test2

* fix test3

* fix test4

* fix test5

* fix test6

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

zhengshengning committed 1mo ago

f4014bfa7b9acddfcfcaffb57b57b2a5c8fe9e7a

Parent: 3fc714d

Committed by GitHub <noreply@github.com> on 5/19/2026, 3:17:30 AM