* Add a Dockerfile for PyTorch + ROCm based on official AMD released artifact
* Add a new artifact single-amdgpu testing on main
* Attempt to test the workflow without merging.
* Changed BERT to check if things are triggered
* Meet the dependencies graph on workflow
* Revert BERT changes
* Add check_runners_amdgpu to correctly mount and check availability
* Rename setup to setup_gpu for CUDA and add setup_amdgpu for AMD
* Fix all the needs.setup -> needs.setup_[gpu|amdgpu] dependencies
* Fix setup dependency graph to use check_runner_amdgpu
* Let's do the runner status check only on AMDGPU target
* Update the Dockerfile.amd to put ourselves in / rather than /var/lib
* Restore the whole setup for CUDA too.
* Let's redisable them
* Change BERT to trigger tests
* Restore BERT
* Add torchaudio with rocm 5.6 to AMD Dockerfile (#26050)
fix dockerfile
Co-authored-by: Felix Marty <felix@hf.co>
* Place AMD GPU tests in a separate workflow (correct branch) (#26105)
AMDGPU CI lives in an other workflow
* Fix invalid job name is dependencies.
* Remove tests multi-amdgpu for now.
* Use single-amdgpu
* Use --net=host for now.
* Remote host networking.
* Removed duplicated check_runners_amdgpu step
* Let's tag machine-types with mi210 for now.
* Machine type should be only mi210
* Remove unnecessary push.branches item
* Apply review suggestions moving from `x-amdgpu` to `x-gpu` introducing `amd-gpu` and `miXXX` labels.
* Remove amdgpu from step names.
* finalize
* delete
---------
Co-authored-by: fxmarty <9808326+fxmarty@users.noreply.github.com>
Co-authored-by: Felix Marty <felix@hf.co>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* mixed precision support via accelerate
* fix issues
* fix for the sharded ddp case
* fix flax and tf failing tests
* `refactor the place to create `Accelerator` object
* move ddp prep to accelerate
* fix 😅
* resolving comments
* move fsdp handling to accelerate
* fixex
* fix saving
* shift torch dynamo handling to accelerate
* shift deepspeed integration and save & load utils to accelerate
* fix accelerate launcher support
* oops
* fix 🐛
* save ckpt fix
* Trigger CI
* nasty 🐛😅
* as deepspeed needs grad_acc fixes, transfer grad_acc to accelerate
* make tests happy
* quality ✨
* loss tracked needs to account for grad_acc
* fixing the deepspeed tests
* quality ✨
* 😅😅😅
* tests 😡
* quality ✨
* Trigger CI
* resolve comments and fix the issue with the previous merge from branch
* Trigger CI
* accelerate took over deepspeed integration
---------
Co-authored-by: Stas Bekman <stas@stason.org>
* intiial commit
* new styling
* update
* just run doctest in CI
* remove more test for fast dev
* update
* update refs
* update path and fetch upstream
* update documentatyion trests
* typo
* parse pwd
* don't check for files that are in hidden folders
* just give paths relative to transformers
* update
* update
* update
* major refactoring
* make sure options is ok
* lest test that mdx is tested
* doctest glob
* nits
* update doctest nightly
* some cleaning
* run correct test on diff
* debug
* run on a single worker
* skip_cuda_test tampkate
* updates
* add rA and continue on failure
* test options
* parse `py` codeblock?
* we don't need to replace ignore results, don't remember whyu I put it
* cleanup
* more cleaning
* fix arg
* more cleaning
* clean an todo
* more pre-processing
* doctest-module has none so extra `- ` is needed
* remove logs
* nits
* doctest-modules ....
* oups
* let's use sugar
* make dataset go quiet
* add proper timeout
* nites
* spleling timeout
* update
* properly skip tests that have CUDSA
* proper skipping
* cleaning main and get tests to run
* remove make report?
* remove tee
* some updates
* tee was removed but is the full output still available?
* [all-test]
* only our tests
* don't touch tee in this PR
* no atee-sys
* proper sub
* monkey
* only replace call
* fix sub
* nits
* nits
* fix invalid syntax
* add skip cuda doctest env variable
* make sure all packages are installed
* move file
* update check repo
* revert changes
* nit
* finish cleanup
* fix re
* findall
* update don't test init files
* ignore pycache
* `-ignore-pycache` when running pytests
* try to fix the import missmatch error
* install dec
* pytest is required as doctest_utils imports things from it
* the only log issues were dataset, ignore results should work
* more cleaning
* Update .circleci/create_circleci_config.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* [ydshieh] empty string if cuda is found
* [ydshieh] fix condition
* style
* [ydshieh] fix
* Add comment
* style
* style
* show failure
* trigger CI
---------
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Simplify update metadata job
* Match more branch names
* Install all what is necessary
* Install all what is necessary
* Forgot the dev
* Install less stuff
* This syntax?