rasbt / LLMs-from-scratch

Implement a ChatGPT-like LLM in PyTorch from scratch, step by step

Fix MHAEinsum weight dimension bug when d_in != d_out (#857) (#893)

* Fix MHAEinsum weight dimension bug when d_in != d_out (#857)

Previously, MHAEinsum initialized its weight matrices with shape (d_out, d_in) and paired them with an unsuitable einsum notation, which caused failures whenever the input and output dimensions differ (d_in != d_out). This commit initializes the weights with shape (d_in, d_out), updates the einsum notation to 'bnd,do->bno', and adds three unit tests that verify parity across different d_in and d_out settings. All tests pass.
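
As a rough illustration of the corrected projection, the sketch below applies the 'bnd,do->bno' notation to a weight of shape (d_in, d_out) and compares the result with a plain matrix multiplication. The tensor sizes and variable names are assumptions chosen for the example, not values from the repository, and the final check only shows that this notation rejects a (d_out, d_in) weight; it is not a reproduction of the pre-fix code.

```python
import torch

torch.manual_seed(0)

# Hypothetical sizes, chosen so that d_in != d_out.
b, n, d_in, d_out = 2, 4, 6, 8

x = torch.randn(b, n, d_in)    # (batch, tokens, d_in)
W = torch.randn(d_in, d_out)   # weight laid out as (d_in, d_out)

# 'bnd,do->bno': contract over the shared input dimension d.
out_einsum = torch.einsum("bnd,do->bno", x, W)

# Reference: batched matrix multiplication gives the same projection.
out_matmul = x @ W

shape_ok = out_einsum.shape == (b, n, d_out)
matches = torch.allclose(out_einsum, out_matmul, atol=1e-5)

# Illustration only: with a (d_out, d_in) weight, the label d would have
# to be both d_in (from x) and d_out (from the weight), so einsum raises.
W_transposed_layout = torch.randn(d_out, d_in)
try:
    torch.einsum("bnd,do->bno", x, W_transposed_layout)
    transposed_layout_fails = False
except RuntimeError:
    transposed_layout_fails = True

print(shape_ok, matches, transposed_layout_fails)
```

When d_in == d_out, both layouts have the same shape, so a layout mix-up raises no shape error; this is why tests with non-square dimensions are needed to catch it.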

* use pytest

* Update .gitignore

---------

Co-authored-by: rasbt <[email protected]>
Aviral Garg committed
27d52d6378b99c419006d284cd1a7301ad533fca
Parent: b1db33b
Committed by GitHub <[email protected]> on 11/1/2025, 2:45:31 AM