Pretrain, finetune ANY AI model of ANY size on 1 or 10,000+ GPUs with zero code changes.
COMMITS / .github/workflows
July 11, 2023
drop AWS action (#18050)
Jirka Borovec committed
July 10, 2023
drop environment.yml (#18040)
Jirka Borovec committed
July 7, 2023
Requirements update (#17998)
Justus Schock committed
July 3, 2023
ci: mark job as canceled if not TPU allocation fails (#17978)
Jirka Borovec committed
docker: CUDA with runtime (#17977)
Jirka Borovec committed
Bump Lightning-AI/utilities from 0.8.0 to 0.9.0 (#17971)
dependabot[bot] committed
June 26, 2023
ci: drop aws creds (#17893)
Jirka Borovec committed
June 23, 2023
ci: ghost creates PR (#17902)
Jirka Borovec committed
June 22, 2023
ci: format json in signaling (#17891)
Jirka Borovec committed
June 21, 2023
tests: marking some flakiness (#17849)
Jirka Borovec committed
June 20, 2023
fix uploading created legacy ckpts (#17840)
Jirka Borovec committed
June 16, 2023
docs: adjust base image to ubuntu20.04 (#17846)
Jirka Borovec committed
June 15, 2023
ci: hotfix for doctests (#17841)
Jirka Borovec committed
Bump playwright from 1.32.1 to 1.35.0 in /requirements (#17835)
dependabot[bot] committed
ci: allow fail building NGC (#17815)
Jirka Borovec committed
June 13, 2023
Lightning Dataset (including optimized dataloading of s3 buckets) (#17743)
Noha Alon committed
June 5, 2023
ci: fix typo in skip if for TPU (#17757)
Jirka Borovec committed
June 1, 2023
ci: fix TPU skip if (#17672)
Jirka Borovec committed
May 29, 2023
replace local adjustment script with external (#17582)
Jirka Borovec committed
May 25, 2023
ci: update gcheck name (#17690)
Jirka Borovec committed
May 12, 2023
Support true 16-bit precision with deepspeed (#17576)
Adrian Wälchli committed
Allow setting the `SLURMEnvironment.main_address` via an env variable (#17596)
David Carreto Fidalgo committed
May 11, 2023
Set fixed seed for pytest execution order (#17614)
Adrian Wälchli committed
May 9, 2023
ci: use randon seed (#17571)
Jirka Borovec committed
May 4, 2023
tests: randomized order for PT & Fabric (#17460)
Jirka Borovec committed
Adding test for legacy checkpoints (#17562)
Jirka Borovec committed
ci: drop secondary pkg for LAI (#17565)
Jirka Borovec committed
Adding tests for legacy checkpoints - 1.8.x (#17374)
Jirka Borovec committed
May 2, 2023
Bump playwright from 1.30.0 to 1.32.1 in /requirements (#17537)
dependabot[bot] committed
April 27, 2023
fix issue labeler (#17501)
Jirka Borovec committed
Replace IPU with external implementation (#17075)
Jirka Borovec committed
ci: label issue with version (#17484)
Jirka Borovec committed
April 24, 2023
Install project specific dependencies (#17376)
Carlos Mocholí committed
app/tests: skip instead of fail (#17461)
Jirka Borovec committed
ci: update OS for pkg release (#17455)
Jirka Borovec committed
[App] Fix resolution of latest version in CLI (#17351)
Ethan Harris committed
April 19, 2023
[TPU] Do not delete jobs with "keepalive" in the name (#17411)
Carlos Mocholí committed
April 18, 2023
[TPU] Fix workflow (#17406)
Carlos Mocholí committed
Fix PyTorch MPS test failure in master (#17405)
Carlos Mocholí committed
[TPU] Fix workflow condition (#17379)
Carlos Mocholí committed
skip some App tests (#17401)
Jirka Borovec committed
April 17, 2023
Update pip upgrade command in CI (#17395)
Adrian Wälchli committed
April 16, 2023
Save and load sharded checkpoints with FSDP in Fabric (#17323)
Adrian Wälchli committed
April 14, 2023
[TPU] Use `pull_request_target` event (#17377)
Carlos Mocholí committed
[TPU] Add testing matrix with PJRT (#17368)
Carlos Mocholí committed
[TPU] Replace GKE in CI with manual gcloud usage (#17362)
Carlos Mocholí committed
April 12, 2023
docker: fix building PL image (#17353)
Jirka Borovec committed
April 11, 2023
Bump peter-evans/create-pull-request from 4 to 5 (#17313)
dependabot[bot] committed
[TPU] Improve TPU workflow (#17237)
Carlos Mocholí committed
March 30, 2023
Remove TODO
Carlos Mocholí committed