Fix bug in masking when kv cache is used. (#697)

* Fix bug in masking when kv cache is used.

* add tests

* add tests

* upd

* add kv cache test to gh workflow

* explicit mask slicing

* upd
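
For context on the "explicit mask slicing" item, here is a minimal sketch of why a causal mask needs slicing by absolute position when a KV cache is in use. It is an illustration under assumptions, not the code changed in this commit: the helper names `causal_mask` and `slice_mask_for_cache` and the parameters `cache_len` and `num_new` are hypothetical, and plain Python lists stand in for tensors.

```python
def causal_mask(context_length):
    # True marks positions a query must NOT attend to (keys in the future).
    return [[k > q for k in range(context_length)]
            for q in range(context_length)]


def slice_mask_for_cache(mask, cache_len, num_new):
    # Hypothetical helper: with a KV cache, the queries are only the new
    # tokens, located at absolute positions cache_len .. cache_len + num_new - 1,
    # while the keys span the cached tokens plus the new ones.
    total = cache_len + num_new
    return [row[:total] for row in mask[cache_len:total]]


mask = causal_mask(6)

# Three tokens are already cached and two new tokens arrive.
step = slice_mask_for_cache(mask, cache_len=3, num_new=2)

# Slicing rows from 0 instead would reuse the mask rows of the first two
# positions and hide cached tokens that the new queries should see.
naive = [row[:5] for row in mask[:2]]
```

In this sketch the first new query (absolute position 3) is blocked only from the key at position 4, and the second new query sees all five keys, whereas the naive slice blocks most of the cached keys.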

---------

Co-authored-by: rasbt <[email protected]>

Martin Ma committed 9mo ago

6522be94beb2640ae2a811249c73c67d0845567c

Parent: 37b26c2

Committed by GitHub <[email protected]> on 6/23/2025, 6:12:56 PM