COMMITS
tests/models/test_tpu.py
June 10, 2021
Clean-up after logger connector redesign 2/2 (#7631)
Carlos Mocholí committed
June 8, 2021
New logger connector code (#7882)
Carlos Mocholí committed
May 30, 2021
Some test updates (#7761)
Carlos Mocholí committed
May 27, 2021
Rename and move Result (#7736)
Carlos Mocholí committed
May 7, 2021
Use `torch.nn.utils.clip_grad_norm_` and add `clip_grad_by_value` support for TPU (#7025)
Carlos Mocholí committed
May 4, 2021
`TrainerState` refactor [5/5] (#7173)
Carlos Mocholí committed
April 30, 2021
Device updates for TPU Pod (#7243)
Kaushik B committed
April 27, 2021
Add `debug` flag to TPU Training Plugins (PT_XLA_DEBUG) (#7219)
Kaushik B committed
April 14, 2021
Fix the `gradient_clip_algorithm` has no effect issue. (#6928)
CeShine Lee committed
April 13, 2021
Fix sync_dist for tpus (#6950)
Kaushik B committed
April 9, 2021
Fix TPU Spawn gather (#6896)
Kaushik B committed
April 6, 2021
Add `Trainer(gradient_clip_algorithm='value'|'norm')` (#6123)
Anthony Kim committed
[Fix] TPU Training Type Plugin (#6816)
Kaushik B committed
March 25, 2021
Fix checkpoint callback & Trainer.test(_) issue for TPUs (#6654)
Kaushik B committed
March 19, 2021
Update Gradient Clipping for TPU Accelerator (#6576)
Kaushik B committed
March 2, 2021
[bugfix] TPU test hangs to barrier on 1 process (#6272)
thomas chaton committed
Refactor: Runif for TPU and Horovod 5/n (#6301)
Jirka Borovec committed
February 23, 2021
fixing miss-leading tested acc values (#5876)
Jirka Borovec committed
February 18, 2021
rename accelerator_backend -> accelerator (#6034)
Adrian Wälchli committed
February 17, 2021
[HotFix] Resolve TPU Training (#6027)
chaton committed
February 12, 2021
PoC: Accelerator refactor (#5743)
Justus Schock committed
February 11, 2021
[tests/models] refactor with BoringModel (#5507)
Rohit Gupta committed
February 8, 2021
Refactor simplify tests (#5861)
Jirka Borovec committed
February 6, 2021
formatting tests: 4/n (#5846)
Jirka Borovec committed
January 14, 2021
Fix pre-commit isort failure on tests/models/*.py (#5423)
Arnaud Gelas committed
January 12, 2021
prune check on Trainer fit result (#5453)
Jirka Borovec committed
December 21, 2020
fix/enable - check F401 (#5201)
Jirka Borovec committed
December 14, 2020
set xxx_AVAILABLE as protected (#5082)
Jirka Borovec committed
December 12, 2020
drop unused test with result api (#5058)
Jirka Borovec committed
December 2, 2020
Tpu save (#4309)
Lezwon Castelino committed
November 26, 2020
simplify imports xla / TPU (#4872)
Jirka Borovec committed
November 14, 2020
isolate PL debugger in tests (#4643)
Jirka Borovec committed
October 13, 2020
notices (#4118)
William Falcon committed
October 11, 2020
ref: accelerator names (#4066)
William Falcon committed
October 7, 2020
clean and organize fit (#3938)
William Falcon committed
October 6, 2020
Added check to verify xla device is TPU (#3274)
Lezwon Castelino committed
Rename log_save_interval, row_log_interval (#3748)
Teddy Koker committed
October 4, 2020
Deprecate early_stop_callback Trainer argument (part 2) (#3845)
Adrian Wälchli committed
added broadcast option to tpu (#3814)
Lezwon Castelino committed
October 2, 2020
revert backend types (#3788)
Jirka Borovec committed
September 30, 2020
define distributed as a type (#3740)
Jirka Borovec committed
September 19, 2020
drop v0.10 deprecated (#3454)
Jirka Borovec committed
September 1, 2020
bugfix/3185 transpose (#3252)
Lezwon Castelino committed
August 13, 2020
Bugfix/2956 tpu distrib backend fix (#2959)
Lezwon Castelino committed
July 31, 2020
update CI testing with pip upgrade (#2380)
Jirka Borovec committed
pytorch 1.6 (#2745)
Jirka Borovec committed
July 30, 2020
July 27, 2020
fixing TPU tests (#2632)
Jirka Borovec committed
July 22, 2020
EvalResult support for val loop (PR 3/5) (#2651)
William Falcon committed
July 9, 2020
Fixes .test() for ddp (#2570)
William Falcon committed