Implement a ChatGPT-like LLM in PyTorch from scratch, step by step
n_heads × d_head -> d_head × d_head in DeltaNet (#903)
Clarified the explanation of the memory size calculation for `KV_cache_DeltaNet` and updated the quadratic term from `n_heads × d_head` to `d_head × d_head`.
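The size comparison behind this change can be sketched as follows. This is a minimal illustration, not the repository's code: the function names, the `n_layers` factor, and the 2-byte default element size (bf16/fp16) are assumptions. It also assumes each head keeps one `d_head × d_head` state matrix, which is what the corrected quadratic term implies.

```python
def kv_cache_bytes(batch_size, n_tokens, n_layers, n_heads, d_head, bytes_per_elem=2):
    # Standard attention caches one key and one value vector of size d_head
    # per token, per head, per layer (hence the factor of 2).
    # Hypothetical helper for illustration only.
    return batch_size * n_tokens * n_layers * n_heads * d_head * 2 * bytes_per_elem


def deltanet_state_bytes(batch_size, n_layers, n_heads, d_head, bytes_per_elem=2):
    # Assumption: each head stores a fixed d_head x d_head state matrix,
    # so the size does not depend on the number of tokens.
    # Hypothetical helper for illustration only.
    return batch_size * n_layers * n_heads * d_head * d_head * bytes_per_elem


if __name__ == "__main__":
    # Example settings chosen for illustration, not taken from the repository.
    cfg = dict(batch_size=1, n_layers=1, n_heads=8, d_head=64)
    kv = kv_cache_bytes(n_tokens=1024, **cfg)
    dn = deltanet_state_bytes(**cfg)
    print(f"KV cache: {kv} bytes, DeltaNet state: {dn} bytes")
```

Under these assumptions the KV cache grows linearly with `n_tokens`, while the DeltaNet state stays constant, so the ratio between the two is `2 × n_tokens / d_head`.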
Sebastian Raschka committed
bcc73f731d09cec9c091b4ed563eed68fbdeecf0
Parent: 488bef7
Committed by GitHub
on 11/6/2025, 12:28:37 AM