Commit Graph

667 Commits

Author SHA1 Message Date
Pauline Bailly-Masson
7b325cd573 Fix security issue 5 (#42072)
fix

Co-authored-by: Pauline <pauline@Paulines-MacBook-Pro-2.local>
2025-11-06 19:50:59 +01:00
Pauline Bailly-Masson
a9e2b80c71 add workflow to check permissions and advise a set of permissions req… (#42071)
add workflow to check permissions and advise a set of permissions required

Co-authored-by: Pauline <pauline@Paulines-MacBook-Pro-2.local>
2025-11-06 18:55:01 +01:00
Yih-Dar
5aa7dd07da Revert back to use GitHub context (#42066)
* check

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-11-06 14:41:58 +01:00
Yih-Dar
76fea9b482 Fix another Argument list too long in pr_slow_ci_suggestion.yml (#42061)
* fix

* trigger

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-11-06 13:33:23 +01:00
Yih-Dar
8a96f5fbe8 Be careful at explicit checkout actions (#42060)
final

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-11-06 11:01:06 +01:00
Yih-Dar
17fdaf9b7a Avoid explicit checkout in workflow (#42057)
* remove explicit checkout

* check 1

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-11-06 09:31:20 +01:00
Yih-Dar
bb65d2d953 Fix pr_slow_ci_suggestion.yml after #42023 (#42049)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-11-05 22:10:12 +01:00
Yih-Dar
57bdb4a680 Cleanup workflow - part 1 (#42023)
* part 1

* part 2

* part 3

* part 4

* part 5

* fix 1

* check 1

* part 6

* part 7

* part 8

* part 9

* part 10: rename file

* OK: new_model_pr_merged_notification.yml

* part 11

* fix 2

* revert check

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-11-05 21:01:06 +01:00
Yih-Dar
561233cabf Change trigger time for AMD CI (#42034)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-11-05 14:17:12 +01:00
Pauline Bailly-Masson
20396951af CodeQL workflow for security analysis (#42015)
* CodeQL workflow for security analysis

Created CodeQL workflow to use reusable workflow from internal and simplified configuration.

* Update CodeQL workflow for main branch only and removing python from analysis

Restrict CodeQL analysis to 'actions' language only.

* Disable pull_request trigger in CodeQL workflow temporarily

Comment out pull_request trigger for CodeQL workflow
2025-11-05 10:59:37 +01:00
Rémi Ouazan
dd4e048e75 Reduce the number of benchmark in the CI (#42008)
Changed how benchmark cfgs are chosen
2025-11-04 14:07:17 +01:00
Yih-Dar
6d4450e341 Fix torch+deepspeed docker file (#41985)
* fix

* delete

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-11-04 10:41:22 +00:00
Yih-Dar
258c76e4dc Fix run slow v2: empty report when there is only one model (#42002)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-11-04 06:46:21 +01:00
Guillaume LEGENDRE
1619a3475f fix (CI): Refactor SSH runners (#41991)
* Change ssh runner type

* Add wait step to SSH runner workflow

* Rename wait step to wait2 in ssh-runner.yml

* Remove wait step from ssh-runner.yml

Removed the wait step from the SSH runner workflow.

* Update runner type for single GPU A10 instance

* Update SSH runner version to 1.90.3

* Add sha256sum to ssh-runner workflow

* Update runner type and remove unused steps
2025-11-03 18:16:32 +01:00
Rémi Ouazan
ff0f7d6498 More data in benchmarking (#41848)
* Reduce scope of cross-generate

* Rm generate_sall configs

* Workflow benchmarks more

* Prevent crash when FA is not installed
2025-11-03 18:05:26 +01:00
Rémi Ouazan
80305364e2 Move the Mi355 to regular docker (#41989)
* Move the Mi355 to regular docker

* Disable gfx950 compilation for FA on AMD
2025-11-03 16:41:06 +01:00
Mohamed Mekkouri
a623cda427 [kernels] Add Tests & CI for kernels (#41765)
* first commit

* add tests

* add kernel config

* add more tests

* add ci

* small fix

* change branch name

* update tests

* nit

* change test name

* revert jobs

* addressing review

* reenable all jobs

* address second review
2025-11-03 16:36:52 +01:00
Yih-Dar
8fb854cac8 Run slow v2 (#41914)
* Super

* Super

* Super

* Super

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-11-01 19:40:40 +01:00
Yih-Dar
cad7eeeb5e Minor fix in docker image build workflow (#41949)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-10-30 11:02:11 +01:00
Jitesh Gupta
76fc50a152 Cache latest pytorch amd image locally on mi325 CI runner cluster (#41926)
2025-10-29 19:45:37 +01:00
Yih-Dar
10d557123b Update some workflow files (#41892)
* update

* update

* final check

* final check

* final clean

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-10-29 14:42:05 +01:00
Yih-Dar
e2e8dbed13 CI workflow for Flash Attn (#41857)
ci for flash attn

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-10-25 09:45:47 +02:00
Anton Vlasjuk
2c5b888c95 [Onnx docs] Remove some traces (#41791)
fix
2025-10-23 10:34:25 +02:00
Luc Georges
71db0d49e9 feat: add benchmark v2 ci with results pushed to dataset (#41672)
2025-10-20 08:56:58 +01:00
Yih-Dar
307c523854 further improve utils/check_bad_commit.py (#41658) (#41690)
* fix

* Update utils/check_bad_commit.py

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
2025-10-17 23:07:00 +02:00
Steven Liu
b9bd8c45a1 [CI] Build translated docs (#41632)
fix
2025-10-16 14:01:33 +02:00
Marc Sun
2b2c20f315 Update issue template (#41573)
* update

* fix
2025-10-15 13:54:37 +02:00
Rémi Ouazan
94df0e6560 Benchmark overhaul (#41408)
* Big refactor, still classes to move around and script to re-complexify

* Move to streamer, isolate benches, propagate num tokens

* Some refacto

* Added compile mode to name

* Re-order

* Move to dt_tokens

* Better format

* Fix and disable use_cache by default

* Fixed compile and SDPA backend default

* Refactor results format

* Added default compile mode

* Always use cache

* Fixed cache and added flex

* Plan for missing modules

* Experiments: no cg and shuffle

* Disable compile for FA

* Remove wall time, add sweep mode, get git commit

* Review compliance, start

* Apply suggestions from code review

Co-authored-by: Luc Georges <McPatate@users.noreply.github.com>

* Update benchmark_v2/framework/benchmark_runner.py

Co-authored-by: Luc Georges <McPatate@users.noreply.github.com>

* Disable workflow

* Pretty print

* Added some pretty names to have pretty logs

* Review n2 compliance (end?)

* Style and end of PR

---------

Co-authored-by: Luc Georges <McPatate@users.noreply.github.com>
2025-10-14 21:41:43 +02:00
Marc Sun
1a3a5f5289 Remove SigOpt (#41479)
* remove sigopt

* style
2025-10-09 18:05:55 +02:00
Yih-Dar
42bcc81ba2 Minor security fix for ssh-runner.yml (#41317)
security issue

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-10-03 14:14:34 +02:00
Yih-Dar
7adb43e60a Build doc in 2 jobs: en and other languages (#41290)
* separate

* separate

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-10-02 14:33:57 +00:00
Yih-Dar
e1f1d32af0 Remove some previous team members from allow list of triggering Github Actions (#41263)
* delete

* delete

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-10-02 16:32:28 +02:00
Luc Georges
639ad8ccd9 feat: use aws-highcpu-32-priv for amd docker img build (#41285)
* feat: use `aws-highcpu-32-priv` for amd docker img build

* feat: add `workflow_dispatch` event to docker build CI
2025-10-02 12:53:14 +00:00
Yih-Dar
9d8f693c7e add peft team members to issue/pr template (#41262)
* add

* Update .github/PULL_REQUEST_TEMPLATE.md

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
2025-10-01 17:26:59 +00:00
Yih-Dar
8e7b0655f1 update code owners (#41221)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-30 16:21:19 +02:00
Tom Aarsen
1f1e93e095 Align pull request template to bug report template (#41220)
The only difference is that I don't direct users to https://discuss.huggingface.co/ for hub issues.
2025-09-30 14:25:41 +02:00
Ákos Hadnagy
399c589dfa Separate docker images for Nvidia and AMD in benchmarking (#41119)
Separate docker images for Nvidia and AMD
2025-09-29 17:03:27 +02:00
Guillaume LEGENDRE
2dcb20dcec CI Runners - move amd runners mi355 and 325 to runner group (#41193)
* Update CI workflows to use devmi355 branch

* Add workflow trigger for AMD scheduled CI caller

* Remove unnecessary blank line in workflow YAML

* Add trigger for workflow_run on main branch

* Update workflow references from devmi355 to main

* Change runner_scale_set to runner_group in CI config
2025-09-29 11:14:19 +02:00
Yih-Dar
03c92884b5 Update team member list for some CI workflows (#41094)
* update list

* update list

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-23 09:48:40 +00:00
Yih-Dar
1bb69cce82 Fix CI jobs being all red 🔴 (false positive) (#41059)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-22 16:51:00 +02:00
Ákos Hadnagy
b9d337b6f3 Add write token for uploading benchmark results to the Hub (#41047)
* Separate write token for Hub upload

* Address review comments

* Address review comments
2025-09-22 14:13:46 +00:00
Ákos Hadnagy
67097bf340 Fix benchmark runner argument name (#41012)
2025-09-20 10:53:56 +02:00
Yuanyuan Chen
96a3e898cd RUFF fix on CI scripts (#40805)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-19 13:50:26 +00:00
Joao Gante
fce746512b [docs] rm stray tf/flax autodocs references (#40999)
rm tf references
2025-09-19 12:04:12 +01:00
Ákos Hadnagy
61eff450d3 Benchmarking v2 GH workflows (#40716)
* WIP benchmark v2 workflow

* Container was missing

* Change to sandbox branch name

* Wrong place for image name

* Variable declarations

* Remove references to file logging

* Remove unnecessary step

* Fix deps install

* Syntax

* Add workdir

* Add upload feature

* typo

* No need for hf_transfer

* Pass in runner

* Runner config

* Runner config

* Runner config

* Runner config

* Runner config

* mi325 caller

* Name workflow runs properly

* Copy-paste error

* Add final repo IDs and schedule

* Review comments

* Remove wf params

* Remove parametrization from workflow files

* Fix callers

* Change push trigger to pull_request + label

* Add back schedule event

* Push to the same dataset

* Simplify parameter description
2025-09-19 08:54:49 +00:00
Yih-Dar
5ac3c5171a Track the CI (model) jobs that don't produce test output files (process being killed etc.) (#40981)
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-18 18:27:27 +02:00
Yih-Dar
738b223f57 Add captured actual outputs to CI artifacts (#40965)
* fix

* fix

* Remove `# TODO: ???` as it makes me `???`

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-18 15:40:53 +02:00
Yih-Dar
270da89708 Remove runner_map (#40880)
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-16 15:18:07 +02:00
Arthur
96d3795cfc Update model tags and integration references in bug report (#40881)
2025-09-15 12:08:29 +02:00
Ákos Hadnagy
9c804f7ec4 Redirect MI355 CI results to dummy dataset (#40862)
2025-09-14 18:42:49 +02:00