n_heads × d_head -> d_head × d_head in DeltaNet (#903)

Clarified the explanation of the memory size calculation for `KV_cache_DeltaNet` and updated the quadratic term from `n_heads × d_head` to `d_head × d_head`.
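
For reference, the corrected term can be sketched as below. This is a minimal illustration, not code from the repository: it assumes each head in each layer keeps one `d_head × d_head` state matrix, and the function name `deltanet_state_bytes` and its parameters (`batch_size`, `n_layers`, `n_heads`, `d_head`, `bytes_per_elem`) are hypothetical.

```python
def deltanet_state_bytes(batch_size, n_layers, n_heads, d_head, bytes_per_elem=2):
    """Estimate the DeltaNet recurrent-state memory in bytes.

    Assumption (not taken from the repository): every head in every layer
    stores one d_head x d_head state matrix, so the quadratic term is
    d_head * d_head and n_heads enters only as a linear factor.
    """
    per_head_state = d_head * d_head  # quadratic term: d_head x d_head
    return batch_size * n_layers * n_heads * per_head_state * bytes_per_elem


# Hypothetical configuration: 48 layers, 16 heads, d_head = 128, bf16 (2 bytes).
size = deltanet_state_bytes(batch_size=1, n_layers=48, n_heads=16, d_head=128)
print(size / 2**20, "MiB")  # -> 24.0 MiB
```

Under these assumptions the estimate has no sequence-length factor, so it stays constant as more tokens are processed.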

Sebastian Raschka committed 4mo ago

bcc73f731d09cec9c091b4ed563eed68fbdeecf0

Parent: 488bef7

Committed by GitHub <[email protected]> on 11/6/2025, 12:28:37 AM