Commit Graph

9305 Commits

Author SHA1 Message Date
Carlos Mocholí f3c49b8e77
Remove warning on `no_backward_sync` with XLA strategy (#17761) 2023-06-07 16:07:03 +02:00
Bas Krahmer 420eb6f248
Added configurable strict loading for Fabric strategies (#17645)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: bas <bas.krahmer@talentflyxpert.com>
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
2023-06-06 18:26:13 -04:00
Taylor Robie 9c07cb397c
[FSDP] utility to apply optimizer during backward (#17710)
* utility to apply optimizer during backward

* start to address CI failures

* address CI failures

* address review comments and harden test

* change union annotation syntax

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* try to debug CI

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add skip_windows and standalone to fsdp test

---------

Co-authored-by: Taylor Robie <taylor.robie@lightning.ai>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-06-06 21:41:26 +02:00
M. Fox f67031b832
Add Fabric internal hooks (#17759)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-06-06 16:04:19 +00:00
M. Fox e2986fab14
External callback registry through entry points for Fabric (#17756) 2023-06-06 11:53:19 +00:00
Adrian Wälchli d23c772f3c
Expose public and private IP in LightningWork (#17742)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-06-06 11:35:08 +00:00
Jirka Borovec a901571fdf
ci: fix typo in skip if for TPU (#17757)
* ci: fix typo in skip if for TPU

* >

* $

* \

* |

* blablab

* rew
2023-06-05 14:02:07 -04:00
dependabot[bot] 26088eef12
Update tensorboardx requirement from <=2.5.1,>=2.2 to >=2.2,<=2.6 in /requirements (#17750)
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-06-05 17:51:11 +00:00
Adrian Wälchli be9761e5c4
Simplify step redirection in strategy (#17531)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
2023-06-05 11:38:56 +00:00
Adrian Wälchli 0eb8fdc138
Upgrade deepspeed version (#17748)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
2023-06-05 10:28:30 +00:00
Jirka Borovec bf143f35e0
ci: drop NGC as required check (#17754) 2023-06-05 12:06:01 +02:00
Adrian Wälchli 67a14795cf
Address feedback for `Fabric.init_module()` (4/4) (#17607) 2023-06-03 02:07:02 +00:00
Carlos Mocholí 255b18823e
Fix race condition when downloading data (#17732) 2023-06-02 12:35:44 +02:00
Jirka Borovec 1f670a5cbd
docker: NGC prune git (#17740) 2023-06-02 02:22:59 +02:00
Jirka Borovec e314d3a772
ci: fix TPU skip if (#17672) 2023-06-01 18:03:55 +01:00
Quasar Kim 1307b605e8
Fix multithreading checkpoint loading (#17678)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: awaelchli <aedu.waelchli@gmail.com>
2023-05-31 18:41:57 -04:00
Adrian Wälchli fd296e0605
Enable loading full state dict checkpoints with FSDP (#17623)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-05-31 11:30:07 -04:00
Adrian Wälchli e0ce34e8e0
Address feedback for `Fabric.init_module()` (3/4) (#17723)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-05-31 15:03:49 +00:00
Adrian Wälchli 53815e6635
Fix overlapping samples in DDP when no global seed is set (#17713)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-05-31 14:55:15 +00:00
Adrian Wälchli 41cfa33c01
Address feedback for `Fabric.init_module()` (2/4) (#17722) 2023-05-31 14:31:24 +00:00
Adrian Wälchli 88cd100369
Address feedback for `Fabric.init_module()` (1/4) (#17721)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-05-31 14:05:29 +00:00
Leng Yue 6f4524a25c
Support kwargs input for `LayerSummary` (#17709)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-05-30 00:17:50 -04:00
Ryan Mukherjee c3ad7568e1
avoid unnecessary workers with sequential `CombinedLoader` (#17639)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2023-05-30 04:02:50 +00:00
Jirka Borovec cf14d624ae
update group-check (#17719) 2023-05-30 02:23:36 +02:00
Jirka Borovec 51b0e81105
replace local adjustment script with external (#17582) 2023-05-29 19:34:04 +00:00
dependabot[bot] bd53b0350b
Update deepdiff requirement from <6.2.4,>=5.7.0 to >=5.7.0,<6.3.1 in /requirements (#17631)
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-05-29 14:44:25 +00:00
dependabot[bot] 614a2997ad
Update docker requirement from <6.1.2,>=5.0.0 to >=5.0.0,<6.1.3 in /requirements (#17632)
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-05-29 14:41:30 +00:00
dependabot[bot] dff46e6c3b
Bump coverage from 6.5.0 to 7.2.5 in /requirements (#17629)
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-05-29 13:44:29 +00:00
dependabot[bot] 7cba9510c4
Update torchvision requirement from <=0.15.1,>=0.12.0 to >=0.12.0,<=0.15.2 in /requirements (#17714)
Update torchvision requirement in /requirements

Updates the requirements on [torchvision](https://github.com/pytorch/vision) to permit the latest version.
- [Release notes](https://github.com/pytorch/vision/releases)
- [Commits](https://github.com/pytorch/vision/compare/v0.12.0...v0.15.2)

---
updated-dependencies:
- dependency-name: torchvision
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
2023-05-29 15:26:18 +02:00
Felipe Whitaker a209f8b419
Fix doc formatting in `batch_size_finder.py` (#17696)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2023-05-29 12:58:35 +00:00
Jirka Borovec c8415dfcbe
ci: fix runif ref (#17716) 2023-05-29 14:05:06 +02:00
William Falcon 29bb16da2a
Update README.md 2023-05-26 23:06:31 -04:00
Jirka Borovec 0cc458e237
runif consistency (#17686) 2023-05-25 16:56:28 +00:00
Jirka Borovec 6ef6d0cb6b
ci: update gcheck name (#17690)
* ci: update gcheck name

* name

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* name

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-05-25 12:20:43 -04:00
Jirka Borovec 56377d9b1f
ci: separate parity/benchmarks (#17502)
* ci: separate benchmarks

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* measure

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* conf

* isort

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* ci

* parity

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* taska

* name

* ...

* var

* ...

* ...

* ...

* cd

* reset_cudnn_benchmark

* import

* imports

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* models

* xfail

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-05-24 19:16:41 -04:00
Leng Yue 2c8758f0a8
Fix Mixed Precision settings for FSDP Plugins (#17670) 2023-05-23 11:35:37 -04:00
dependabot[bot] 52257a9590
Update tensorboard requirement from <2.12.0,>=2.9.1 to >=2.9.1,<2.14.0 in /requirements (#17674)
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-05-23 17:33:38 +02:00
Bas Krahmer dea1ff6633
Add Profiler table kwargs (#17662)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-05-23 11:31:17 -04:00
Adrian Wälchli 00909ba3ff
Raise environment variable collision errors only when Fabric CLI is used (#17679)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-05-22 19:12:26 -04:00
Adrian Wälchli e6b7f1383c
Refactor run-method-style Fabric tests (#17669)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-05-21 09:04:01 -04:00
Adrian Wälchli aa4d0d053d
Update email address in SECURITY.md (#17664) 2023-05-21 08:23:50 +02:00
Manan Goel 4d5df5fa8a
Fixed error when using W&B project name from environment variables (#16222)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2023-05-20 22:42:48 -04:00
Yurij Mikhalevich 61246c3b35
fix: get project (#17666) 2023-05-19 18:29:22 +00:00
Jirka Borovec 3a6d0d80c3
ci: drop e2e as required check (#17658) 2023-05-19 13:43:48 -04:00
Bas Krahmer ca9e006681
refactor Fabric tests to use launch method (#17648)
Co-authored-by: bas <bas.krahmer@talentflyxpert.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-05-19 13:42:49 -04:00
Bas Krahmer 3a68493d0a
Log `LearningRateMonitor` values to `Trainer.callback_metrics` for `EarlyStopping` (#17626)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-05-18 13:01:52 -04:00
Bas Krahmer 2ce975882d
Bugfix: LR finder max val batches (#17636) 2023-05-18 08:46:44 -04:00
Adrian Wälchli ccdd563bd5
Make `configure_sharded_model` implementation in test models idempotent (#17625) 2023-05-17 21:51:27 +00:00
Bas Krahmer a37f5a546c
omitted mention of QuantizationAwareTraining callback (#17646)
Co-authored-by: bas <bas.krahmer@talentflyxpert.com>
2023-05-17 17:23:46 +00:00
dependabot[bot] deb7046496
Update tqdm requirement from <4.65.0,>=4.57.0 to >=4.57.0,<4.66.0 in /requirements (#17630)
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-05-15 21:28:59 +00:00