onnx: add com.microsoft MultiHeadAttention handler
Standard (bidirectional) multi-head attention over unpacked query/key/value, lowered onto tract Sdpa, with optional present_key/present_value outputs.

Bias, attention/padding masks, packed QKV and past KV cache are rejected with clear errors.

Validated bit-close against onnxruntime (output + present_key/value).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
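For reference, the computation this handler covers can be sketched in plain Python. This is not the tract code and not the onnxruntime kernel; it is a minimal illustration that assumes the unpacked layout described above (query as [batch, seq, hidden], key/value as [batch, kv_seq, hidden]), no bias, no mask, no past cache, and a default scale of 1/sqrt(head_size). The function name `multi_head_attention` and the nested-list representation are hypothetical, and the present_key/present_value layout of [batch, num_heads, kv_seq, head_size] is an assumption based on the com.microsoft operator documentation.

```python
import math


def multi_head_attention(query, key, value, num_heads, scale=None):
    """Hypothetical reference: unmasked multi-head attention on nested lists.

    query: [B][S][D]; key, value: [B][L][D].
    Returns (output [B][S][D_v], present_key, present_value), the last two
    in the assumed [B][N][L][H] layout.
    """

    def split_heads(x):
        # [B][T][D] -> [B][N][T][H], slicing the hidden dimension per head.
        head = len(x[0][0]) // num_heads
        return [
            [[row[n * head:(n + 1) * head] for row in batch]
             for n in range(num_heads)]
            for batch in x
        ]

    q, k, v = split_heads(query), split_heads(key), split_heads(value)
    head_size = len(q[0][0][0])
    if scale is None:
        scale = 1.0 / math.sqrt(head_size)  # assumed default scale

    output = []
    for qb, kb, vb in zip(q, k, v):
        rows = [[] for _ in qb[0]]  # one merged output row per query position
        for qh, kh, vh in zip(qb, kb, vb):
            for s, qvec in enumerate(qh):
                # Scaled dot-product scores against every key position.
                scores = [scale * sum(a * b for a, b in zip(qvec, kvec))
                          for kvec in kh]
                # Numerically stable softmax over the key axis.
                top = max(scores)
                exps = [math.exp(x - top) for x in scores]
                total = sum(exps)
                weights = [e / total for e in exps]
                # Weighted sum of values, appended in head order.
                ctx = [sum(w * vvec[j] for w, vvec in zip(weights, vh))
                       for j in range(len(vh[0]))]
                rows[s].extend(ctx)
        output.append(rows)
    return output, k, v
```

With no past cache, the present outputs in this sketch are simply the key and value tensors rearranged per head, which is why they can be checked alongside the attention output when comparing against a reference runtime.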
czoli1976 committed
7b2ea86fb9d7c321632f331ead65bbe996fa180c
Parent: 4876cc7
Committed by Mathieu Poumeyrol <kali@users.noreply.github.com>
on 5/27/2026, 3:11:46 PM