* initial implementation
* CB support
* change how we call item on max_seq_len_q/k
* fix
* tests
* fix fa2 clash
* unify the fa dispatch
* fix
* modernbert...
* oops
* parity test
* style
* nit
* fixup imports for fa4
* enable attention sinks, fixup logits checks in parity test
* style
* change dispatch logic and introduce lower bound for FA
* style
* fix test
* min fa2, avoid 2x device sync
* style
* simple min version instead of list
* fixup error message on non init check
* fixup non init check a tad more
* refactor some FA constants out to main fa utils
* new marker for all fas needed
* oops
* style and make the fa kernel fallback generalized
* default none...
* more refactors
* style
* fix
* this test is faulty even on main, xformers can apparently handle any shape, yikes
* let's make this more robust, we should check for none within...
* fix
* oops
Joao is regrettably no longer with us 🫡 so we should really stop getting users to ping him! This PR makes @cyrilvallez responsible for `generate` issues outside of VLMs.
Makes sure extras can be installed on all supported Python versions.
- cleaned up extras (removed natten, tweaked mistral-common, etc.)
- adds a supported Python version range (3.10 -> 3.14)
- dynamically update the metadata
- run a smoke test in the CI every night to verify pip install works on all extras
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
* Automatic release
* Install transformers from within the build
* setuptools
* Check build doesn't need to exist anymore
* Check build doesn't need to exist anymore
* -y
* torch install for pipeline
* TestPypi upload
* Fine tune
* Fine tune
* Update release instructions
* Update .github/workflows/release.yml
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* draft
* draft
* draft
* draft
* draft
* draft
* draft
* draft
* draft
* draft
* draft
* fail to see the check
* fail to see the check
* fail to see the check
* fail to see the check
* fail to see the check
* Apply style fixes
* fail to see the check
* fail to see the check
* fail to see the check
* Apply repo. consistency fixes
* fail to see the check
* Apply repo. consistency fixes
* fail to see the check
* delete
* Apply repo. consistency fixes
* comment
* Apply repo. consistency fixes
* comment
* Apply repo. consistency fixes
* comment
* Apply repo. consistency fixes
* comment
* Apply repo. consistency fixes
* back
* check
* check
* check
* check
* check
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* debug(ci): run `pwd` to check what we're working with
* fix(ci): `ls -lR`
* fix(ci): remove working directory which should not be there?
* fix(cb): make sure memory is freed when calling `stop`
* fix(ci): effectively clear cache
* fix(ci): reduce memory safety margin
* refactor(cb): add fixme note on default safety margin value
* feat(ci): add continuous batching to benchmarks
* refactor(ci): PR comments
* refactor(cb): when stopping, block by default
* fix(benchmarks): `stream` -> `streaming`
* fix(benchmarks): invalid configuration when cb has attn_impl == sdpa
* tests(cb): fix attn impl
* fix(benchmarks): update `get_throughput` formula
* fix(benchmarks): prevent version conflicts and ensure proper cleanup in continuous batching (#42063)
* Initial plan
* fix(benchmarks): ensure proper cleanup and remove transformers from requirements
- Remove transformers from benchmark_v2/requirements.txt to prevent version conflicts
- Add try-finally block to ensure ContinuousBatchingManager.stop() is always called
- This fixes the TypeError about an unexpected 'streaming' argument and prevents OOM from improper cleanup
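
The try-finally cleanup described above can be sketched roughly as follows. This is a minimal illustration, not the actual benchmark code: `FakeManager` and `run_benchmark` are hypothetical stand-ins, and the real `ContinuousBatchingManager` API may differ. The point is only that `stop()` runs whether or not the workload raises, and that the exception still propagates so it can be logged by the caller.

```python
class FakeManager:
    """Hypothetical stand-in for a continuous batching manager (not the real class)."""

    def __init__(self):
        self.running = False
        self.stopped = False

    def start(self):
        self.running = True

    def stop(self):
        # Assumed cleanup hook: releases resources held by the manager.
        self.running = False
        self.stopped = True


def run_benchmark(manager, workload):
    """Run `workload` with the manager started, always stopping it afterwards."""
    manager.start()
    try:
        # Any exception raised here propagates to the caller after cleanup.
        return workload()
    finally:
        # Runs on success and on failure, so the manager is never left running.
        manager.stop()
```

Because there is no `except` clause, a failing workload still surfaces its error to the caller, which matches the later commits that keep the `finally` block while raising the exception.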
Co-authored-by: McPatate <9112841+McPatate@users.noreply.github.com>
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: McPatate <9112841+McPatate@users.noreply.github.com>
* fix(benchmarks): raise the exception on failure instead of ignoring
we catch the exception later on and raising it here helps debugging
because it will be logged
* test(cb): comment out failing tests for now
added a `FIXME` mark
* fix(benchmarks): revert `finally` removal but keep raising exception
* test(cb): fix missing `require_read_token` import
* refactor(benchmarks): error if no benchmarks were run
* refactor(benchmarks): change default lvls of cb bench config
---------
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: McPatate <9112841+McPatate@users.noreply.github.com>