* draft
* fail to see the check
* Apply style fixes
* fail to see the check
* Apply repo. consistency fixes
* fail to see the check
* Apply repo. consistency fixes
* fail to see the check
* delete
* Apply repo. consistency fixes
* comment
* Apply repo. consistency fixes
* comment
* Apply repo. consistency fixes
* comment
* Apply repo. consistency fixes
* comment
* Apply repo. consistency fixes
* back
* check
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* debug(ci): run `pwd` to check what we're working with
* fix(ci): `ls -lR`
* fix(ci): remove working directory which should not be there?
* fix(cb): make sure memory is freed when calling `stop`
* fix(ci): effectively clear cache
* fix(ci): reduce memory safety margin
* refactor(cb): add fixme note on default safety margin value
* feat(ci): add continuous batching to benchmarks
* refactor(ci): PR comments
* refactor(cb): when stopping, block by default
* fix(benchmarks): `stream` -> `streaming`
* fix(benchmarks): invalid configuration when cb has attn_impl == sdpa
* tests(cb): fix attn impl
* fix(benchmarks): update `get_throughput` formula
* fix(benchmarks): prevent version conflicts and ensure proper cleanup in continuous batching (#42063)
* Initial plan
* fix(benchmarks): ensure proper cleanup and remove transformers from requirements
- Remove transformers from benchmark_v2/requirements.txt to prevent version conflicts
- Add try-finally block to ensure ContinuousBatchingManager.stop() is always called
- This fixes TypeError about unexpected 'streaming' argument and prevents OOM from improper cleanup
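
A minimal sketch of the try-finally cleanup pattern described above. `FakeManager` and `run_benchmark` are hypothetical stand-ins for illustration only, not the actual `ContinuousBatchingManager` API or the benchmark code; the point is that `stop()` runs whether the work succeeds or raises.

```python
class FakeManager:
    """Hypothetical stand-in for a continuous batching manager (assumption)."""

    def __init__(self):
        self.running = False
        self.stopped = False

    def start(self):
        self.running = True

    def stop(self):
        # In the real manager this is where memory would be released.
        self.running = False
        self.stopped = True


def run_benchmark(manager, work):
    """Run `work()` and always stop the manager, even if `work` raises."""
    manager.start()
    try:
        return work()
    finally:
        # Runs on success and on failure; the exception still propagates.
        manager.stop()
```

Because the exception is not swallowed, a caller can still log it or re-raise it, which matches the later commits that keep the `finally` block but raise on failure.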
Co-authored-by: McPatate <9112841+McPatate@users.noreply.github.com>
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: McPatate <9112841+McPatate@users.noreply.github.com>
* fix(benchmarks): raise the exception on failure instead of ignoring
we catch the exception later on and raising it here helps debugging
because it will be logged
* test(cb): comment out failing tests for now
added a `FIXME` mark
* fix(benchmarks): revert `finally` removal but keep raising exception
* test(cb): fix missing `require_read_token` import
* refactor(benchmarks): error if no benchmarks were run
* refactor(benchmarks): change default lvls of cb bench config
---------
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: McPatate <9112841+McPatate@users.noreply.github.com>
- add a new workflow to scan which permissions github_token really needs and to advise on permissions
- add actions-permissions/monitor on almost all workflows
=> the goal is to properly define all permissions blocks per job
Co-authored-by: Pauline <pauline@Paulines-MacBook-Pro-2.local>
* CodeQL workflow for security analysis
Created CodeQL workflow to use reusable workflow from internal and simplified configuration.
* Update CodeQL workflow for main branch only and remove python from analysis
Restrict CodeQL analysis to 'actions' language only.
* Disable pull_request trigger in CodeQL workflow temporarily
Comment out pull_request trigger for CodeQL workflow
* Change ssh runner type
* Add wait step to SSH runner workflow
* Rename wait step to wait2 in ssh-runner.yml
* Remove wait step from ssh-runner.yml
Removed the wait step from the SSH runner workflow.
* Update runner type for single GPU A10 instance
* Update SSH runner version to 1.90.3
* Add sha256sum to ssh-runner workflow
* Update runner type and remove unused steps
* first commit
* add tests
* add kernel config
* add more tests
* add ci
* small fix
* change branch name
* update tests
* nit
* change test name
* revert jobs
* addressing review
* reenable all jobs
* address second review
* Big refactor, still classes to move around and script to re-complexify
* Move to streamer, isolate benches, propagate num tokens
* Some refactoring
* Added compile mode to name
* Re-order
* Move to dt_tokens
* Better format
* Fix and disable use_cache by default
* Fixed compile and SDPA backend default
* Refactor results format
* Added default compile mode
* Always use cache
* Fixed cache and added flex
* Plan for missing modules
* Experiments: no cg and shuffle
* Disable compile for FA
* Remove wall time, add sweep mode, get git commit
* Review compliance, start
* Apply suggestions from code review
Co-authored-by: Luc Georges <McPatate@users.noreply.github.com>
* Update benchmark_v2/framework/benchmark_runner.py
Co-authored-by: Luc Georges <McPatate@users.noreply.github.com>
* Disable workflow
* Pretty print
* Added some pretty names to have pretty logs
* Review n2 compliance (end?)
* Style and end of PR
---------
Co-authored-by: Luc Georges <McPatate@users.noreply.github.com>
* Update CI workflows to use devmi355 branch
* Add workflow trigger for AMD scheduled CI caller
* Remove unnecessary blank line in workflow YAML
* Add trigger for workflow_run on main branch
* Update workflow references from devmi355 to main
* Change runner_scale_set to runner_group in CI config