Pretrain, finetune ANY AI model of ANY size on 1 or 10,000+ GPUs with zero code changes.
COMMITS / .github
June 5, 2023
ci: fix typo in skip if for TPU (#17757)
Jirka Borovec committed
ci: drop NGC as required check (#17754)
Jirka Borovec committed
June 1, 2023
ci: fix TPU skip if (#17672)
Jirka Borovec committed
May 30, 2023
update group-check (#17719)
Jirka Borovec committed
May 29, 2023
replace local adjustment script with external (#17582)
Jirka Borovec committed
May 25, 2023
ci: update gcheck name (#17690)
Jirka Borovec committed
May 19, 2023
ci: drop e2e as required check (#17658)
Jirka Borovec committed
May 12, 2023
Support true 16-bit precision with deepspeed (#17576)
Adrian Wälchli committed
Allow setting the `SLURMEnvironment.main_address` via an env variable (#17596)
David Carreto Fidalgo committed
May 11, 2023
Set fixed seed for pytest execution order (#17614)
Adrian Wälchli committed
May 9, 2023
Fix docs levels and broken links (without rename) (#17545)
edenlightning committed
ci: use randon seed (#17571)
Jirka Borovec committed
May 4, 2023
tests: randomized order for PT & Fabric (#17460)
Jirka Borovec committed
Adding test for legacy checkpoints (#17562)
Jirka Borovec committed
ci: drop secondary pkg for LAI (#17565)
Jirka Borovec committed
update tags in bug issue (#17551)
Jirka Borovec committed
Adding tests for legacy checkpoints - 1.8.x (#17374)
Jirka Borovec committed
May 2, 2023
Bump playwright from 1.30.0 to 1.32.1 in /requirements (#17537)
dependabot[bot] committed
April 28, 2023
[TPU] Call `auto_device_count` for `is_available` (#17509)
Carlos Mocholí committed
April 27, 2023
fix issue labeler (#17501)
Jirka Borovec committed
gh: fix duplicate id in bug issue (#17495)
Jirka Borovec committed
Replace IPU with external implementation (#17075)
Jirka Borovec committed
ci: label issue with version (#17484)
Jirka Borovec committed
April 24, 2023
Install project specific dependencies (#17376)
Carlos Mocholí committed
app/tests: skip instead of fail (#17461)
Jirka Borovec committed
ci: update OS for pkg release (#17455)
Jirka Borovec committed
[App] Fix resolution of latest version in CLI (#17351)
Ethan Harris committed
April 19, 2023
[TPU] Do not delete jobs with "keepalive" in the name (#17411)
Carlos Mocholí committed
drop failing e2e quick app (#17409)
Jirka Borovec committed
April 18, 2023
[TPU] Add support for PJRT from PyTorch/XLA 2.0 (#17352)
Liyang90 committed
[TPU] Fix workflow (#17406)
Carlos Mocholí committed
Fix PyTorch MPS test failure in master (#17405)
Carlos Mocholí committed
[TPU] Fix workflow condition (#17379)
Carlos Mocholí committed
skip some App tests (#17401)
Jirka Borovec committed
April 17, 2023
Update pip upgrade command in CI (#17395)
Adrian Wälchli committed
April 16, 2023
Save and load sharded checkpoints with FSDP in Fabric (#17323)
Adrian Wälchli committed
April 14, 2023
[TPU] Use `pull_request_target` event (#17377)
Carlos Mocholí committed
[TPU] Add testing matrix with PJRT (#17368)
Carlos Mocholí committed
[TPU] Replace GKE in CI with manual gcloud usage (#17362)
Carlos Mocholí committed
April 12, 2023
docker: fix building PL image (#17353)
Jirka Borovec committed
April 11, 2023
Bump peter-evans/create-pull-request from 4 to 5 (#17313)
dependabot[bot] committed
[TPU] Improve TPU workflow (#17237)
Carlos Mocholí committed
Update CODEOWNERS (#17322)
Carlos Mocholí committed
March 30, 2023
checkgroup
Carlos Mocholí committed
Remove TODO
Carlos Mocholí committed
Trigger TPU tests if [TPU] is in the PR title
Carlos Mocholí committed
April 4, 2023
update bug template with version (#17226)
Jirka Borovec committed
March 27, 2023
ci/docs: wheels from cache (#17201)
Jirka Borovec committed
ci: fix docs with caches (#17200)
Jirka Borovec committed
March 25, 2023
ci: update runner for IPU (#17183)
Jirka Borovec committed