Commit Graph

  • 1b9ef304d1 fix links in DeepSpeed docs (#21153) Mariano Ntrougkas 2025-09-03 16:32:09 +03:00
  • 70ad4fa09f tests: fix missing import Jirka B 2025-09-03 11:11:26 +02:00
  • b7e72686ef chore: remove redundant words (#21151) slicesequal 2025-09-03 17:08:27 +08:00
  • 647881dcf1 docs: clarify DeviceStatsMonitor logged metrics (#20895) Anay Dongre 2025-09-03 01:32:13 -07:00
  • 8c868a6f59 uv for pytorch tests (#21148) Shion Matsumoto 2025-09-03 10:02:35 +02:00
  • 773482a02b docs: warning don’t use torch.profiler.profile context manager around Trainer methods (#20864) Kavyansh Tyagi 2025-09-03 12:22:45 +05:30
  • 6a7acf03ff docs: update setup and restore. (#21149) GdoongMathew 2025-09-03 14:05:23 +08:00
  • 2fe67a7724 Fix: no_grad with AMP bug (#20921) Bas Krahmer 2025-09-02 23:34:18 +02:00
  • 120737c2bb Impr docs on batch sizes and limits in distributed (#21070) Nicki Skafte Detlefsen 2025-09-02 18:20:52 +02:00
  • d9118afe1e Add support for deepspeeds exclude_frozen_parameters (#21060) Nicki Skafte Detlefsen 2025-09-02 16:15:38 +02:00
  • e72fc19037 Fix LightningCLI not using ckpt_path hyperparameters to instantiate classes (#21116) Mauricio Villegas 2025-09-02 15:18:04 +02:00
  • 5f437672ee Fix workflow matrix reference (#21145) Shion Matsumoto 2025-09-02 05:15:42 -04:00
  • 617d03d622 Warning on eval module (#21146) Nicki Skafte Detlefsen 2025-09-02 11:07:32 +02:00
  • bcb506593b Fix TQDM progress bar showing the wrong total when using a finite and iterable dataloader (#21147) Nicki Skafte Detlefsen 2025-09-02 10:05:22 +02:00
  • 94586038df docs: move on_fit_end hook to teardown stage. (#21143) GdoongMathew 2025-09-02 15:58:03 +08:00
  • 0df60336f8 docs: update optimizer_zero_grad order, and the backward pass. (#21144) GdoongMathew 2025-09-02 15:57:23 +08:00
  • 8f79734ff4 build(deps): update ipython[all] requirement from <8.19.0 to <9.0.0 in /requirements (#21138) dependabot[bot] 2025-09-01 17:47:16 +02:00
  • 6c95450e5f Simplify Fabric tests workflow matrix (#21142) Shion Matsumoto 2025-09-01 11:47:03 -04:00
  • ba4cc1d58f Add missing device id for pytorch 2.8 (#21105) Nicki Skafte Detlefsen 2025-09-01 15:16:17 +02:00
  • 8fa457f89a Docs on hook call order (#21120) Nicki Skafte Detlefsen 2025-09-01 14:19:07 +02:00
  • aedf8e3464 Simplify workflow matrix (#21132) Shion Matsumoto 2025-09-01 06:45:37 -04:00
  • 6203f3be56 build(deps): update typing-extensions requirement from <4.15.0,>4.5.0 to >4.5.0,<4.16.0 in /requirements (#21133) dependabot[bot] 2025-09-01 12:25:52 +02:00
  • 1701d6e9b7 build(deps): update onnx requirement from <1.19.0,>1.12.0 to >1.12.0,<1.20.0 in /requirements (#21136) dependabot[bot] 2025-09-01 12:10:25 +02:00
  • 6de756b89c build(deps): bump pytest-rerunfailures from 15.1 to 16.0 in /requirements (#21137) dependabot[bot] 2025-09-01 11:45:26 +02:00
  • 1b14c0d597 build(deps): bump coverage from 7.10.5 to 7.10.6 in /requirements (#21135) dependabot[bot] 2025-09-01 11:45:18 +02:00
  • 8fe968ce47 build(deps): update ipython[notebook] requirement from <9.5.0 to <9.6.0 in /requirements (#21134) dependabot[bot] 2025-09-01 09:36:34 +02:00
  • 3c812ba13c build(deps): bump google-github-actions/setup-gcloud from 2 to 3 (#21140) dependabot[bot] 2025-09-01 09:34:49 +02:00
  • 366f6a83b5 build(deps): bump google-github-actions/auth from 2 to 3 (#21139) dependabot[bot] 2025-09-01 09:34:31 +02:00
  • ecf95235e5 fix(callbacks): Defer step/time-triggered ModelCheckpoint saves until validation metrics are available (#21106) littlebullGit 2025-08-29 18:24:07 -04:00
  • 729a146a91 resolve failing tests with pt-2.1 (#21130) Jirka Borovec 2025-08-30 00:20:25 +02:00
  • 85596f8e14 Adding test for legacy checkpoint created with 2.5.4 (#21128) PL Ghost 2025-08-29 14:23:00 +02:00
  • a708e2f8af docs: simplify version table (#21123) Jirka Borovec 2025-08-29 12:11:00 +02:00
  • 8923390497 chlog: new section Jirka B 2025-09-03 10:56:15 +02:00
  • 92544c3524 fabric: list hydra in optional extra (#21165) Jirka Borovec 2025-09-05 13:35:46 +02:00
  • 825f683cc1 run lit CI also on release PRs (#21166) Jirka Borovec 2025-09-05 13:35:32 +02:00
  • f1b413e96c Use uv for package installations in Makefile docs & test commands (#21167) Bhimraj Yadav 2025-09-05 13:37:32 +05:45
  • 1fc077b6e4 Time based validation support (#21071) Sohaib Ahmed 2025-09-04 23:53:28 +05:00
  • 3c81316aed debug failing tests for Fabric with ddp_fork on PT 2.8 (#21093) Jirka Borovec 2025-09-04 17:05:26 +02:00
  • bb4769f711 update flaky syntax (#21156) Jirka Borovec 2025-09-04 17:04:08 +02:00
  • e1e2534d32 Tuner cleanup on error (#21162) Nicki Skafte Detlefsen 2025-09-04 13:47:53 +02:00
  • d6499eddeb Respect verbose=False in seed_everything when no seed is provided (#21161) Kavyansh Tyagi 2025-09-04 17:10:10 +05:30
  • f724451f7d revert flush_logs_every_n_steps value (branch: ioannis@18861-CSVLogger-fails-on-remote-fs) Bhimraj Yadav 2025-09-04 09:29:48 +00:00
  • e5f784981f test: enhance CSVLogger tests for column handling and remote filesystem behavior Bhimraj Yadav 2025-09-04 09:27:04 +00:00
  • d85dbe413b refactor: enhance CSVLogger's metric writing logic for local and remote filesystem support Bhimraj Yadav 2025-09-04 09:25:44 +00:00
  • cdc0db4898 Remove forgotten deprecated parse_as_dict argument in LightningCLI test (#21159) Mauricio Villegas 2025-09-04 10:18:47 +02:00
  • 011f0c39f8 Merge branch 'master' into ioannis@18861-CSVLogger-fails-on-remote-fs Bhimraj Yadav 2025-09-04 09:43:38 +05:45
  • 663b6ce3a8 uv for tests-fabric (#21155) Shion Matsumoto 2025-09-03 09:53:04 -04:00
  • f61713a849 fix links in DeepSpeed docs (#21153) Mariano Ntrougkas 2025-09-03 16:32:09 +03:00
  • 10f10b6151 chore: remove redundant words (#21151) slicesequal 2025-09-03 17:08:27 +08:00
  • ca912863d1 Merge branch 'master' into ioannis@18861-CSVLogger-fails-on-remote-fs Bhimraj Yadav 2025-09-03 14:33:59 +05:45
  • 9f757c0155 docs: clarify DeviceStatsMonitor logged metrics (#20895) Anay Dongre 2025-09-03 01:32:13 -07:00
  • 29e8ce4b8a uv for pytorch tests (#21148) Shion Matsumoto 2025-09-03 04:02:35 -04:00
  • 04e103be54 docs: warning don’t use torch.profiler.profile context manager around Trainer methods (#20864) Kavyansh Tyagi 2025-09-03 12:22:45 +05:30
  • 8ea6165029 docs: update setup and restore. (#21149) GdoongMathew 2025-09-03 14:05:23 +08:00
  • 216f9ec90c Fix: no_grad with AMP bug (#20921) Bas Krahmer 2025-09-02 23:34:18 +02:00
  • c2564a7ac8 docs: update for Fabric (#21125) Jirka Borovec 2025-09-02 20:50:29 +02:00
  • d3996ad32e Impr docs on batch sizes and limits in distributed (#21070) Nicki Skafte Detlefsen 2025-09-02 18:20:52 +02:00
  • 5071a04ae5 Add support for deepspeeds exclude_frozen_parameters (#21060) Nicki Skafte Detlefsen 2025-09-02 16:15:38 +02:00
  • 4824cc15d3 Fix LightningCLI not using ckpt_path hyperparameters to instantiate classes (#21116) Mauricio Villegas 2025-09-02 15:18:04 +02:00
  • c3ca8a516f Fix workflow matrix reference (#21145) Shion Matsumoto 2025-09-02 05:15:42 -04:00
  • 90eba3ffdb Warning on eval module (#21146) Nicki Skafte Detlefsen 2025-09-02 11:07:32 +02:00
  • 14a57c76bd Fix TQDM progress bar showing the wrong total when using a finite and iterable dataloader (#21147) Nicki Skafte Detlefsen 2025-09-02 10:05:22 +02:00
  • 630db82da6 docs: move on_fit_end hook to teardown stage. (#21143) GdoongMathew 2025-09-02 15:58:03 +08:00
  • 06bed20190 docs: update optimizer_zero_grad order, and the backward pass. (#21144) GdoongMathew 2025-09-02 15:57:23 +08:00
  • f165af7d8b [pre-commit.ci] pre-commit suggestions (#431) pre-commit-ci[bot] 2025-09-02 07:51:59 +00:00
  • 936dc8b91c Merge branch 'master' into ioannis@18861-CSVLogger-fails-on-remote-fs Jirka Borovec 2025-09-01 18:29:54 +02:00
  • da7f2f9a9f build(deps): update ipython[all] requirement from <8.19.0 to <9.0.0 in /requirements (#21138) dependabot[bot] 2025-09-01 17:47:16 +02:00
  • e760ad5643 Simplify Fabric tests workflow matrix (#21142) Shion Matsumoto 2025-09-01 11:47:03 -04:00
  • 6fc44c9254 Add missing device id for pytorch 2.8 (#21105) Nicki Skafte Detlefsen 2025-09-01 15:16:17 +02:00
  • 3d56296978 Docs on hook call order (#21120) Nicki Skafte Detlefsen 2025-09-01 14:19:07 +02:00
  • dff5db2d8a Merge branch 'master' into ioannis@18861-CSVLogger-fails-on-remote-fs Jirka Borovec 2025-09-01 13:17:05 +02:00
  • db77fa7a44 Simplify workflow matrix (#21132) Shion Matsumoto 2025-09-01 06:45:37 -04:00
  • 46445cb49f build(deps): update typing-extensions requirement from <4.15.0,>4.5.0 to >4.5.0,<4.16.0 in /requirements (#21133) dependabot[bot] 2025-09-01 12:25:52 +02:00
  • 9ace8b7b7e build(deps): update onnx requirement from <1.19.0,>1.12.0 to >1.12.0,<1.20.0 in /requirements (#21136) dependabot[bot] 2025-09-01 12:10:25 +02:00
  • 824dfb1e72 build(deps): bump pytest-rerunfailures from 15.1 to 16.0 in /requirements (#21137) dependabot[bot] 2025-09-01 11:45:26 +02:00
  • a93a5bbc0b build(deps): bump coverage from 7.10.5 to 7.10.6 in /requirements (#21135) dependabot[bot] 2025-09-01 11:45:18 +02:00
  • 2e4c99c2a0 build(deps): update ipython[notebook] requirement from <9.5.0 to <9.6.0 in /requirements (#21134) dependabot[bot] 2025-09-01 09:36:34 +02:00
  • 9a41f1f68e build(deps): bump google-github-actions/setup-gcloud from 2 to 3 (#21140) dependabot[bot] 2025-09-01 09:34:49 +02:00
  • c504088b03 build(deps): bump google-github-actions/auth from 2 to 3 (#21139) dependabot[bot] 2025-09-01 09:34:31 +02:00
  • b1cc925d94 fix(callbacks): Defer step/time-triggered ModelCheckpoint saves until validation metrics are available (#21106) littlebullGit 2025-08-29 18:24:07 -04:00
  • d85c474eb4 resolve failing tests with pt-2.1 (#21130) Jirka Borovec 2025-08-30 00:20:25 +02:00
  • 634e6e6d06 drop redundant find-path for install (#21127) Jirka Borovec 2025-08-29 14:48:48 +02:00
  • ca3bb2f70a docs: update chlog after 2.5.4 (#21129) Jirka Borovec 2025-08-29 14:30:33 +02:00
  • 0ebf1dd017 Adding test for legacy checkpoint created with 2.5.4 (#21128) PL Ghost 2025-08-29 14:23:00 +02:00
  • 691f084fd2 docs: simplify version table (#21123) Jirka Borovec 2025-08-29 12:11:00 +02:00
  • 58b89edb7a releasing 2.5.4 (tag: 2.5.4) Jirka B 2025-08-29 10:10:14 +02:00
  • 33a17db3ac Update throughput table to include H200 stats (#21119) Nicki Skafte Detlefsen 2025-08-27 16:58:08 +02:00
  • 9288c1f2bc Update versioning governance document (#21107) Dan Dale 2025-08-27 07:19:44 -07:00
  • b4cef868cd docs: replace broken link to Torch 2.x (#21121) Jirka Borovec 2025-08-27 13:58:09 +02:00
  • 225d272dd9 build(deps): update myst-parser requirement from <4.0.0,>=0.18.1 to >=0.18.1,<5.0.0 in /requirements (#21114) dependabot[bot] 2025-08-25 22:39:52 +02:00
  • d8b071ba1b build(deps): bump coverage from 7.10.4 to 7.10.5 in /requirements (#21115) dependabot[bot] 2025-08-25 22:39:39 +02:00
  • 1e94608984 build(deps): update onnxscript requirement from <0.4.0,>=0.2.2 to >=0.2.2,<0.5.0 in /requirements (#21113) dependabot[bot] 2025-08-25 22:39:12 +02:00
  • 4708ee727a docs: fix log_metrics step description (#21109) Alexander Zhipa 2025-08-23 09:36:13 -04:00
  • 476540f020 Fix rich progress bar crashing on empty val dataloader sanity checking (#21108) Nicki Skafte Detlefsen 2025-08-22 12:03:19 +02:00
  • f2996d82d1 ci: pin also test requirements for minimal setup (#21102) Jirka Borovec 2025-08-21 18:07:18 +02:00
  • 72cc5223e0 fix mis-alignment column while using rich model summary in DeepSpeedstrategy (#21100) GdoongMathew 2025-08-21 20:52:20 +08:00
  • 28c2208f6b Fix: TorchMetrics documentation source link (#21104) Bhimraj Yadav 2025-08-21 17:39:18 +05:45
  • 913ffae83a docs: add note on TorchMetrics integration for logging best practices (#21103) Bhimraj Yadav 2025-08-21 17:38:38 +05:45
  • 6a9d1101e8 Make asyncio checkpointing work if validate/fit is called more than once (#20952) jj hunt 2025-08-19 17:40:45 +01:00
  • 2b23b2b3e5 bump: try deepspeed >=0.14.1,<=0.15.0 (#21076) Jirka Borovec 2025-08-19 15:13:19 +02:00