ggml-webgpu: makes the flash attn vec path subgroup-aware (#23040)

* ggml-webgpu: makes the flash attn vec path compile and size its split/reduce work from the device's reported subgroup range instead of assuming a subgroup size of 32.

* ggml-webgpu: remove the extra max_wg_size >= max_subgroup_size guard, and remove the hardcoded 32 used to determine the values of reduce_wg_size and vec_nwg_cap.
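A minimal C++ sketch of the sizing idea, assuming a device that reports a subgroup range [min, max] with power-of-two sizes. `SubgroupRange`, `VecReduceParams`, and `size_vec_path` are hypothetical names, not the ggml-webgpu code, and the formulas used here for `reduce_wg_size` and `vec_nwg_cap` are illustrative assumptions. The only point it demonstrates is that every quantity is derived from the reported range rather than from a fixed 32.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the subgroup range a WebGPU device reports
// (the min/max subgroup sizes); this is NOT the ggml-webgpu struct.
struct SubgroupRange {
    uint32_t min_size;
    uint32_t max_size;
};

// Hypothetical sizing parameters for the vec path's split/reduce passes.
struct VecReduceParams {
    uint32_t max_partials_per_wg;  // shared-memory slots for per-subgroup partial results
    uint32_t reduce_wg_size;       // workgroup size of the reduce pass (assumed formula)
    uint32_t vec_nwg_cap;          // cap on split workgroups feeding one reduce (assumed formula)
};

VecReduceParams size_vec_path(SubgroupRange sg, uint32_t wg_size) {
    assert(sg.min_size > 0 && sg.min_size <= sg.max_size);
    VecReduceParams p{};
    // The driver may pick any subgroup size in [min_size, max_size] at runtime,
    // so the worst-case number of subgroups per workgroup comes from min_size.
    p.max_partials_per_wg = (wg_size + sg.min_size - 1) / sg.min_size;
    // Assumption: with power-of-two sizes, a workgroup of max_size invocations
    // always holds a whole number of subgroups, whatever size the driver picks.
    p.reduce_wg_size = sg.max_size;
    // Assumption: each reduce invocation folds at most one split's partial,
    // so the number of split workgroups is capped at the reduce workgroup size.
    p.vec_nwg_cap = p.reduce_wg_size;
    return p;
}
```

With a range of [8, 32] and a 128-invocation workgroup, the sketch reserves 16 partial slots, whereas a fixed subgroup size of 32 would have reserved only 4 and undersized the scratch space on such a device.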

Authored by Zheyuan Chen

5ec717d1256e34558a44dc09adf1e6e16f2e2682

Parent: 0c3e4fc

Committed by GitHub <noreply@github.com> on 5/14/2026, 4:31:36 PM