Adrian Wälchli
888466b144
Support true 16-bit precision with FSDP in Trainer ( #18219 )
2023-08-10 04:15:35 -04:00
Adrian Wälchli
70e31b6480
Make `all_reduce` consistent for both NCCL and GLOO ( #18235 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2023-08-09 17:39:57 -04:00
Jirka Borovec
efa7b2f9ef
docformatter: config with black ( #18064 )
...
* docformatter: config with black
* additional_dependencies: [tomli]
* 119
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* fix
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-08-09 10:44:20 -04:00
pre-commit-ci[bot]
834bd61164
[pre-commit.ci] pre-commit suggestions ( #17983 )
...
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
Co-authored-by: Jirka B <j.borovec+github@gmail.com>
2023-08-08 16:26:06 +02:00
Adrian Wälchli
7e13eb7299
Monitor subprocesses to avoid zombies ( #18218 )
...
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-08-08 09:25:21 +02:00
Gerson Kroiz
d7c2e597a1
[TPU] Add Fabric support for PyTorch XLA FSDP ( #18126 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: awaelchli <aedu.waelchli@gmail.com>
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
2023-08-02 12:56:00 -04:00
Adrian Wälchli
50e01c7012
Meta device initialization for FSDP in Fabric ( #18122 )
...
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-08-02 07:58:32 -04:00
Adrian Wälchli
74dfd88090
Avoid reinstantiation of DataLoader if distributed sampler not required ( #18191 )
...
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
2023-08-01 15:27:50 -04:00
Bilel Omrani
b4435bd29c
Fix Google Cloud Storage checkpointing ( #18088 )
...
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2023-08-01 20:08:42 +02:00
Adrian Wälchli
1db471305d
Avoid setting the multiprocessing context when importing lightning ( #18177 )
...
* avoid import at top module
* tests
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* remove comment
* update docs
* changelog
* mypy
* trigger app tests
* can't import lightning on py 3.8
* Update .github/workflows/ci-tests-app.yml
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
2023-07-31 18:05:21 +02:00
Adrian Wälchli
d9493545cf
Allow accessing rank information before processes are launched in XLA ( #18194 )
2023-07-31 10:37:35 -04:00
Adrian Wälchli
508f02a624
Remove the unused `checkpoint_io` argument from the `FSDPStrategy` in Fabric ( #18192 )
2023-07-31 04:07:32 -04:00
Adrian Wälchli
41f0425a8d
Disable auto-detection of Kubeflow environment ( #18137 )
2023-07-28 05:03:48 -04:00
Adrian Wälchli
220e3b8e04
Add lazy checkpoint loading for FSDP full-state checkpoints ( #18150 )
...
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-07-26 18:38:15 -04:00
Carlos Mocholí
4c57c0bc07
[TPU] Do not cancel all jobs when one fails ( #18052 )
...
* Update tpu-tests.yml
* Update tpu-tests.yml
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Needs
* if:
* missed this
* Fix issue on multinode
* Latest fixes
* last fix?
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-07-25 14:24:50 +02:00
Carlos Mocholí
0e7e6b31c5
Fix [TPU] tests ( #18140 )
...
* Fix [TPU] tests
* More
2023-07-24 15:13:36 +02:00
Carlos Mocholí
3d573d5e79
Fix [TPU] tests ( #18136 )
...
* Debug [TPU] tests
* -U
* Uninstall typing extensions
* Minor simplifications
* Silly cancelling logic
* pip3?
* sudo
* More
* Revert "Silly cancelling logic"
This reverts commit ce31d874f3.
2023-07-23 13:39:00 +02:00
Carlos Mocholí
01b82e4fb1
Minor miscellaneous fixes ( #18077 )
...
* Various miscellaneous fixes
* Update
* Update
* succeeded
* Comment everywhere
* hasattr
2023-07-20 14:44:51 +02:00
Adrian Wälchli
d6b5f3af15
Fix "optimizer in backward" compatibility with torch 2.1 nightly ( #18119 )
...
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-07-20 07:22:54 -04:00
Adrian Wälchli
ed6a48ed57
DeepSpeed precision simplifications ( #18113 )
...
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-07-20 07:13:31 -04:00
Carlos Mocholí
071f85842e
Support NVIDIA's Transformer Engine as a precision plugin ( #17597 )
2023-07-19 18:21:58 +02:00
Carlos Mocholí
d653e4e088
Relax the assumption that the root module is FSDP wrapped ( #18054 )
2023-07-19 15:34:03 +02:00
Adrian Wälchli
dab373de54
Support loading a raw PyTorch state-dict checkpoint in Fabric ( #18049 )
...
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-07-18 14:06:17 -04:00
Ishan Dutta
7116a9f9bb
Include parent directory validation check for deepspeed ( #17795 )
...
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
Co-authored-by: awaelchli <aedu.waelchli@gmail.com>
2023-07-17 19:09:38 -04:00
Shihao Yin
c31ef77510
Fix `TensorBoardLogger.log_graph` not recording the graph ( #17926 )
...
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
Co-authored-by: awaelchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-07-17 18:18:39 -04:00
Adrian Wälchli
080eaf38fa
Enable setting the sharding strategy as string in FSDP ( #18087 )
2023-07-15 18:07:09 +02:00
Carlos Mocholí
c60f67e736
Support sets for policies in FSDP ( #18084 )
2023-07-15 17:39:28 +02:00
Carlos Mocholí
e9c42ed11f
More XLA fixes for nightly support ( #18085 )
2023-07-15 01:16:42 +02:00
Adrian Wälchli
356f5d0c65
Fix detection of next version in Fabric's CSVLogger ( #17986 )
2023-07-14 16:08:16 -04:00
Carlos Mocholí
2f657ae46e
Support custom policies for activation checkpointing with FSDP ( #18045 )
2023-07-14 20:00:52 +02:00
Carlos Mocholí
340eecd846
Add `Trainer.init_module` and `LightningModule.configure_model` ( #18004 )
2023-07-14 19:15:05 +02:00
Carlos Mocholí
3a55f0c0a1
Minor miscellaneous fixes ( #18068 )
2023-07-13 06:01:58 -04:00
Carlos Mocholí
ad74f8623f
Don't reapply activation checkpointing ( #18006 )
2023-07-10 13:24:09 +00:00
Justus Schock
7ca49f2cb7
Requirements update ( #18014 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-07-10 13:00:20 +00:00
Adrian Wälchli
acc70d0ae5
Support all half-precision modes in FSDP precision plugin ( #17807 )
...
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
2023-07-09 18:40:46 +00:00
Adrian Wälchli
b14ddd9c49
Fix state dict loading for ddp/dp in Fabric ( #17997 )
...
* fix state dict loading for ddp/dp
* test
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* changelog
* update test
* move params to same device before equality test
* test strategy
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-07-06 13:47:17 +02:00
Adrian Wälchli
3f4790bd27
Validate selected device indices in `DeepSpeedStrategy` ( #17952 )
2023-07-04 18:58:38 +00:00
Adrian Wälchli
c5fae6426e
Show CUDA matmul precision info only ever once ( #17960 )
2023-07-04 03:47:27 -04:00
Adrian Wälchli
c03dd38c6c
Refactor more Fabric tests that use the old .run() method ( #17930 )
...
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-07-03 16:26:58 +02:00
Adrian Wälchli
5d7669af46
Remove requirement to call `Fabric.launch()` with DP strategy ( #17931 )
2023-06-30 08:20:01 +00:00
Adrian Wälchli
7eca2a2fdd
Fix automatic step tracking in Fabric's CSVLogger ( #17942 )
2023-06-28 14:33:37 +02:00
Adrian Wälchli
8f7ad991ff
Reduce false positive warnings when calling module methods in Fabric ( #17875 )
2023-06-26 17:35:27 +02:00
Carlos Mocholí
58d2387e0c
Add `Fabric.save(filter=...)` ( #17845 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2023-06-20 18:18:59 +00:00
Carlos Mocholí
f78db4c674
Remove automatic sharding support with `Fabric.run` or `fabric.launch(fn)` ( #17832 )
...
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
2023-06-15 16:02:09 +00:00
Boon
377bfd2768
Pass-through setattr for FabricModule ( #17731 )
...
Co-authored-by: awaelchli <aedu.waelchli@gmail.com>
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
2023-06-12 19:33:51 +00:00
Adrian Wälchli
9ff7d7120b
Add `rank_zero_first` utility ( #17784 )
...
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-06-12 10:32:32 +00:00
Leng Yue
a23bae39c4
Enable loading full optimizer checkpoints with FSDP ( #17747 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2023-06-10 11:28:02 +00:00
Adrian Wälchli
24a3115995
Support empty weight initialization in `Fabric.init_module()` ( #17627 )
2023-06-07 18:33:53 +00:00
Alexander Kreuzer
f111bd483b
Fix: parameters to `MixedPrecisionPlugin` are not validated and do not match the docstring ( #17687 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2023-06-07 14:35:54 +00:00
Carlos Mocholí
f3c49b8e77
Remove warning on `no_backward_sync` with XLA strategy ( #17761 )
2023-06-07 16:07:03 +02:00
Bas Krahmer
420eb6f248
Added configurable strict loading for Fabric strategies ( #17645 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: bas <bas.krahmer@talentflyxpert.com>
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
2023-06-06 18:26:13 -04:00
Taylor Robie
9c07cb397c
[FSDP] utility to apply optimizer during backward ( #17710 )
...
* utility to apply optimizer during backward
* start to address CI failures
* address CI failures
* address review comments and harden test
* change union annotation syntax
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* try to debug CI
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* add skip_windows and standalone to fsdp test
---------
Co-authored-by: Taylor Robie <taylor.robie@lightning.ai>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-06-06 21:41:26 +02:00
M. Fox
f67031b832
Add Fabric internal hooks ( #17759 )
...
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-06-06 16:04:19 +00:00
M. Fox
e2986fab14
External callback registry through entry points for Fabric ( #17756 )
2023-06-06 11:53:19 +00:00
Adrian Wälchli
67a14795cf
Address feedback for `Fabric.init_module()` (4/4) ( #17607 )
2023-06-03 02:07:02 +00:00
Adrian Wälchli
fd296e0605
Enable loading full state dict checkpoints with FSDP ( #17623 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-05-31 11:30:07 -04:00
Adrian Wälchli
e0ce34e8e0
Address feedback for `Fabric.init_module()` (3/4) ( #17723 )
...
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-05-31 15:03:49 +00:00
Adrian Wälchli
41cfa33c01
Address feedback for `Fabric.init_module()` (2/4) ( #17722 )
2023-05-31 14:31:24 +00:00
Adrian Wälchli
88cd100369
Address feedback for `Fabric.init_module()` (1/4) ( #17721 )
...
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-05-31 14:05:29 +00:00
Jirka Borovec
51b0e81105
replace local adjustment script with external ( #17582 )
2023-05-29 19:34:04 +00:00
Jirka Borovec
0cc458e237
runif consistency ( #17686 )
2023-05-25 16:56:28 +00:00
Jirka Borovec
56377d9b1f
ci: separate parity/benchmarks ( #17502 )
...
* ci: separate benchmarks
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* measure
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* conf
* isort
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* ci
* parity
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* taska
* name
* ...
* var
* ...
* ...
* ...
* cd
* reset_cudnn_benchmark
* import
* imports
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* models
* xfail
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-05-24 19:16:41 -04:00
Leng Yue
2c8758f0a8
Fix mixed precision settings for FSDP plugins ( #17670 )
2023-05-23 11:35:37 -04:00
Adrian Wälchli
00909ba3ff
Raise environment variable collision errors only when Fabric CLI is used ( #17679 )
...
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-05-22 19:12:26 -04:00
Adrian Wälchli
e6b7f1383c
Refactor run-method-style Fabric tests ( #17669 )
...
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-05-21 09:04:01 -04:00
Bas Krahmer
ca9e006681
refactor Fabric tests to use launch method ( #17648 )
...
Co-authored-by: bas <bas.krahmer@talentflyxpert.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-05-19 13:42:49 -04:00
Adrian Wälchli
7268670d1a
Support true 16-bit precision with deepspeed ( #17576 )
...
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-05-12 23:21:32 +00:00
David Carreto Fidalgo
1ade737488
Allow setting the `SLURMEnvironment.main_address` via an env variable ( #17596 )
...
Co-authored-by: awaelchli <aedu.waelchli@gmail.com>
2023-05-12 11:31:48 +00:00
Adrian Wälchli
c712ec1ba9
Add support for saving with full state-dict in Fabric's FSDP ( #17526 )
...
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-05-11 13:02:30 -04:00
Zixuan Zhao
a36af3f9f8
Fixes a bug that causes `CSVLogger` to overwrite `version_0` when `root_dir` is a relative path. ( #17139 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
2023-05-06 00:10:12 +00:00
Gerson Kroiz
8e6f24baa6
[TPU] For XLA Strategy, added function arg to control `broadcast_master_param()` ( #17522 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2023-05-05 17:57:24 +00:00
Carlos Mocholí
54e8095a78
Split `init_module` into `init` + `sharded_model` ( #17488 )
2023-05-05 15:54:52 +02:00
Jirka Borovec
4413e98e4e
ruff: enable & fixing RET ( #17540 )
2023-05-05 09:34:40 +00:00
Adrian Wälchli
fd5cae4635
Verify `Fabric.launch()` was called ( #17570 )
...
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-05-05 06:36:21 +00:00
Jirka Borovec
384c203532
ruff: PT some more fixes ( #17569 )
2023-05-05 08:25:15 +02:00
Carlos Mocholí
76caa81bf2
Compose RunIf utilities ( #17520 )
2023-05-05 01:21:58 +02:00
Jirka Borovec
f55d10f5ee
ruff: autofix PT ( #17541 )
...
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-05-04 11:50:39 -04:00
Adrian Wälchli
a533f68693
Support compiling a module after it was set up by Fabric ( #17529 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-05-03 09:00:11 +02:00
Adrian Wälchli
249395bfe0
DDP Parity tests as standalone task ( #17503 )
2023-05-03 05:36:07 +02:00
Adrian Wälchli
7523dd3199
Avoid creating CUDA stream if not running on CUDA ( #17499 )
2023-04-29 03:13:56 +00:00
Carlos Mocholí
6ec9a6bd9e
[TPU] Rename classes to use XLA instead of TPU ( #17383 )
2023-04-28 12:36:22 -04:00
Jirka Borovec
77889aa6bb
fabric: upstream runif to pkg ( #17504 )
2023-04-28 15:32:45 +00:00
Adrian Wälchli
ce3701bfc0
Update `Fabric.init_module` for FSDP ( #17510 )
2023-04-28 12:44:52 +00:00
Carlos Mocholí
114a6d64a3
[TPU] Call `auto_device_count` for `is_available` ( #17509 )
2023-04-28 12:32:23 +00:00
Carlos Mocholí
abc634d17c
Fix setup_model typos in Fabric ( #17498 )
2023-04-28 00:31:17 +00:00
Anton Kiselev
6b6594b831
Add timeout argument for `FSDPStrategy` ( #17274 )
...
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
Co-authored-by: awaelchli <aedu.waelchli@gmail.com>
2023-04-28 00:27:06 +00:00
Jirka Borovec
db9f095b0b
Replace IPU with external implementation ( #17075 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-04-27 16:09:51 +00:00
Adrian Wälchli
614dcdf502
True half-precision support in Fabric ( #17287 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
2023-04-27 12:37:33 +00:00
Jirka Borovec
156786343b
adding check for bandit vulnerabilities 1/n ( #17382 )
...
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-04-27 09:43:12 +00:00
pre-commit-ci[bot]
91cb4b9b87
[pre-commit.ci] pre-commit suggestions ( #17271 )
...
* [pre-commit.ci] pre-commit suggestions
updates:
- [github.com/PyCQA/docformatter: v1.4 → v1.6.0](https://github.com/PyCQA/docformatter/compare/v1.4...v1.6.0)
- [github.com/psf/black: 22.12.0 → 23.3.0](https://github.com/psf/black/compare/22.12.0...23.3.0)
- [github.com/charliermarsh/ruff-pre-commit: v0.0.237 → v0.0.260](https://github.com/charliermarsh/ruff-pre-commit/compare/v0.0.237...v0.0.260)
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* update
* apply
* fixing
* docs/lines
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
2023-04-26 21:37:41 +02:00
Adrian Wälchli
4d17b5fe77
Improved model initialization API for Fabric ( #17462 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
2023-04-26 11:25:33 -04:00
dependabot[bot]
b792c90ea7
Update deepspeed requirement support window ( #16813 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
2023-04-25 17:26:49 +02:00
Carlos Mocholí
f4b1fc0f71
Input validation for `Fabric.launch` ( #17423 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2023-04-25 00:44:48 +02:00
Jirka Borovec
df97141781
add & apply flake8-simplify ( #17386 )
2023-04-24 21:57:08 +00:00
Adrian Wälchli
d9b4ebd726
Enable precision autocast for `LightningModule` step methods in Fabric ( #17439 )
2023-04-24 11:50:59 +00:00
Adrian Wälchli
0631fa02ef
Handle edge case in `Fabric.setup()` when model has no parameters ( #17441 )
2023-04-24 10:13:36 +02:00
Adrian Wälchli
877d95f8d7
Minor Fabric backward refactor ( #17433 )
2023-04-21 19:36:46 +00:00
Adrian Wälchli
0ee71d6a7a
Fix LightningModule step methods bypassing DDP wrapper in Fabric ( #17424 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-04-21 15:29:32 -04:00
Jirka Borovec
111d1ba088
ruff: fixing flake8-comprehensions ( #17385 )
2023-04-21 09:07:58 +00:00
Carlos Mocholí
8dac251273
[TPU] Fix PjRT tests ( #17408 )
2023-04-19 16:39:00 +02:00
Adrian Wälchli
21ae19c69f
Add dynamo RunIf skip condition ( #17404 )
...
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-04-19 01:09:42 +02:00
Liyang90
47726391ad
[TPU] Add support for PJRT from PyTorch/XLA 2.0 ( #17352 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2023-04-18 18:52:36 +02:00
Carlos Mocholí
90ad36795a
[TPU] Refactor availability check ( #17384 )
2023-04-18 17:52:13 +02:00
Ryan Smith
8d5a91a2dd
Update Fabric CPU tests to work on GPU machines ( #17391 )
2023-04-18 14:03:40 +00:00
Adrian Wälchli
affe72cc3e
Add test for compiling FSDP model in Fabric ( #17394 )
2023-04-17 15:34:23 -04:00
Adrian Wälchli
0dc42f523e
Save and load sharded checkpoints with FSDP in Fabric ( #17323 )
...
Co-authored-by: Luca Antiga <luca.antiga@gmail.com>
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-04-16 14:11:49 -04:00
Ishan Dutta
e9d6856355
NumPy to Torch for lightning/fabric ( #17291 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2023-04-15 15:21:56 +00:00
Carlos Mocholí
05b481e3ae
[TPU] Add testing matrix with PJRT ( #17368 )
...
* Replace GKE in CI with manual gcloud usage
* Fix XRT test
* Reduce timeout to 35 minutes
* [TPU] Run tests with PJRT
* runtime as part of the job name
* CHANGELOG
* Update for app too
2023-04-14 16:39:13 +02:00
Carlos Mocholí
856b29fc72
[TPU] Replace GKE in CI with manual gcloud usage ( #17362 )
2023-04-14 12:47:31 +00:00
Adrian Wälchli
50662eb078
Fixes around `Strategy.set_world_ranks` ( #16966 )
...
* don't call set_world_ranks in xla strategy
* update
* fabric and other strategies
* CHANGELOG
* Typos
* Reuse test
---------
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2023-04-13 17:45:42 +02:00
Carlos Mocholí
0489f2efed
[TPU] v4 support ( #17227 )
2023-04-11 22:24:11 +00:00
Gerson Kroiz
7b8fd85e01
[TPU] Remove error check for IterableDatasets ( #17331 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2023-04-11 22:04:17 +00:00
Adrian Wälchli
51697a8bd6
Combined setup of model and optimizer with FSDP ( #17305 )
...
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-04-11 19:58:53 +00:00
Jirka Borovec
355dd9d343
test: adjust `is_timing_close` ( #17178 )
2023-03-24 12:07:07 +00:00
belerico
bb861cba7e
Let TorchCollective work on the `torch.distributed` WORLD process group by default ( #16995 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
2023-03-20 23:30:27 +00:00
Atharva Phatak
ea708da55a
Add `is_wrapped` utility function for Fabric ( #16953 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: awaelchli <aedu.waelchli@gmail.com>
2023-03-14 13:03:38 +00:00
janEbert
dd02397720
Allow frozen data classes in optimizer state dict ( #16656 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2023-03-10 15:37:18 +00:00
Adrian Wälchli
aa7f2522dc
Fix race condition in Fabric test ( #17002 )
2023-03-08 16:36:00 -05:00
Adrian Wälchli
b6c693d345
Add test for `torch.compile()` with `Fabric.setup()` ( #16977 )
2023-03-07 10:57:31 -05:00
Adrian Wälchli
7749525cbd
Document SLURM interactive mode ( #16955 )
2023-03-06 20:58:46 +00:00
Adrian Wälchli
3e04353c1c
New fabric parity tests ( #16899 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
2023-03-06 20:19:25 +00:00
Carlos Mocholí
fca69e68da
Fabric: Test PyTorch 2.0 pre-release on CPU and CUDA ( #16905 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
2023-03-03 17:48:49 +00:00
Jirka Borovec
760612fb8a
update list of first-party packages ( #16859 )
...
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-03-03 16:55:48 +00:00
Carlos Mocholí
888686e72b
Fix tests on single-GPU machine ( #16911 )
2023-03-03 01:33:45 +01:00
Adrian Wälchli
7820a117bc
Optimize precision conversion in forward of Fabric module wrapper ( #16903 )
2023-03-02 23:41:37 +00:00
Justus Schock
3d1927e6bc
Adds Gradient Clipping to Fabric ( #16715 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
2023-02-27 23:44:13 +00:00
Yi Heng Lim
4444d0c37d
Fix support for passing -1 to `find_usable_cuda_devices` function ( #16866 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2023-02-27 20:08:42 +00:00
Adrian Wälchli
e3efbaa7f6
Incorporate pytorch's fixes in device_count_nvml ( #16795 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2023-02-27 18:07:55 +00:00
Adrian Wälchli
462f1ee691
Fix amp ddp test in Fabric ( #16862 )
2023-02-23 19:05:30 -05:00
Carlos Mocholí
d486f94dd2
Fabric: auto default ( #16842 )
2023-02-23 13:45:27 +00:00
Carlos Mocholí
235e692259
Fabric: do `set_epoch` for `batch_sampler.sampler` ( #16841 )
2023-02-23 00:11:29 +00:00
Carlos Mocholí
914effa04c
Rename `replace_sampler_ddp|replace_sampler` to `use_distributed_sampler` ( #16829 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2023-02-22 14:07:02 +01:00
Adrian Wälchli
0e4ca7c286
Set accelerator through CLI only if set explicitly ( #16818 )
2023-02-20 13:45:06 +00:00
Adrian Wälchli
81b7c30291
Make DDP subprocess the default launcher for multi-device ( #16780 )
2023-02-20 11:20:50 +00:00
Adrian Wälchli
2844e9e246
Fix XLAEnvironment detection on TPU pod ( #16806 )
2023-02-20 11:01:06 +01:00
Justus Schock
ac5fa03385
Introduce new precision layout in fabric ( #16767 )
2023-02-17 10:41:18 +00:00
Adrian Wälchli
91e692c767
Rename the TPUSpawnStrategy to XLAStrategy ( #16781 )
...
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-02-17 02:06:24 +00:00
Adrian Wälchli
c4c4793d56
Fix strategy type validation in connectors ( #16693 )
2023-02-10 10:50:56 +00:00
Adrian Wälchli
923a842e9c
Fix import from torch.distributed when distributed not available ( #16658 )
2023-02-07 04:51:59 -05:00
Carlos Mocholí
1b1241ceb1
Fix TPU tests ( #16628 )
2023-02-06 17:21:26 +00:00
Jirka Borovec
770b792925
copyright Lightning AI team ( #16647 )
...
* copyright Lightning AI team
* more...
2023-02-06 15:26:51 +01:00
Adrian Wälchli
0f75dce8b4
Add MPI cluster environment ( #16570 )
...
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-02-03 10:45:11 +00:00
Liyang90
e20172d370
Avoid wrapping prediction dataloader twice on TPU ( #16571 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2023-02-03 10:36:56 +01:00
Adrian Wälchli
85f7e1c9c8
Show tf32 info only on rank 0 ( #16152 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
2023-02-03 00:56:12 +01:00
Jirka Borovec
377210d85d
tests: switch imports for fabric ( #16592 )
2023-02-01 20:34:38 +00:00
Carlos Mocholí
ef2a6088ff
Drop support for PyTorch 1.10 ( #16492 )
...
* Drop support for PyTorch 1.10
* CHANGELOG
* READMEs
* mypy
* ls
* New poplar version
* Fixed tests
* links
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* skip azure badges
* Table
* Matching dockerfiles
* Drop unnecessary channels and packages
* Push nightly
* Undo unrelated changes
* Revert "Push nightly"
This reverts commit 9618f737c4.
---------
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-02-01 14:09:12 -05:00
Carlos Mocholí
dc298f2340
Drop support for Python 3.7 ( #16579 )
...
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
2023-02-01 01:36:42 +00:00
Carlos Mocholí
b2387136ba
Fix `torch.compile` tests ( #16503 )
2023-01-27 02:41:45 +00:00
Adrian Wälchli
23e71a880a
Fabric checkpointing 3/n: Implement missing `get_module_state_dict` for strategies ( #16487 )
2023-01-26 13:10:14 +00:00
Jirka Borovec
50fd12f841
fabric: test with tbX ( #16511 )
2023-01-26 12:52:02 +00:00
Carlos Mocholí
d78cf99176
Remove the "native" suffix from the codebase ( #16490 )
2023-01-25 14:09:09 +00:00
Adrian Wälchli
96b7ed77e6
Enable more shorthand strategy names in the Fabric CLI ( #16485 )
2023-01-25 09:52:03 +00:00
Adrian Wälchli
c87bb71fa8
Add `Fabric.all_reduce` ( #16459 )
2023-01-24 22:35:00 +00:00
Adrian Wälchli
7603dd09cb
Fabric checkpointing 2/n: DeepSpeed implementation ( #16452 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2023-01-24 18:53:26 +01:00
Adrian Wälchli
9faa25f86f
Test that connector defaults match the ones in Trainer/Fabric ( #16463 )
2023-01-23 05:09:45 -05:00
Nikhil Shenoy
81914c7167
LightningFabric: Error handling for accelerator="mps" and ddp strategy pairing ( #16455 )
...
Co-authored-by: awaelchli <aedu.waelchli@gmail.com>
2023-01-22 17:57:24 +00:00
Adrian Wälchli
39acb81b9b
Fabric checkpointing 1/n: base implementation ( #16434 )
...
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2023-01-19 20:40:12 +00:00
Adrian Wälchli
285cc53738
Make subprocess launcher the default in Lite ( #16388 )
2023-01-17 10:16:33 +00:00
Adrian Wälchli
f1e0fda879
Rename `Strategy.reduce` to `Strategy.all_reduce` in Lite ( #16370 )
2023-01-16 08:17:45 -05:00
Adrian Wälchli
8f1269283f
Add CSVLogger for Lightning Lite ( #16346 )
...
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
2023-01-13 13:09:44 +00:00
Adrian Wälchli
0a2ee68ea0
Fix configuration validation error message in Lite CLI ( #16334 )
2023-01-12 15:09:28 +00:00
Carlos Mocholí
428844d01d
Fabric: drop FairScale's sharded implementation ( #16329 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2023-01-11 17:08:18 +00:00
Carlos Mocholí
3c3bff5e6e
Fabric: Remove `_Connector.is_distributed` ( #16327 )
2023-01-11 16:29:51 +01:00
Carlos Mocholí
794685493d
Remove `_StrategyType` ( #16328 )
2023-01-10 23:05:12 +01:00
Carlos Mocholí
047b4374a5
Annotate `Fabric.log_dict` with mapping input ( #16325 )
2023-01-10 23:02:55 +01:00
Lightning Forever
91aaa5313a
Lite: Support `self.log` from a LightningModule ( #16311 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2023-01-10 16:11:47 +00:00
Adrian Wälchli
b085fa12d3
Rename leftover definitions in Lite tests ( #16309 )
2023-01-10 15:02:05 +00:00
Lightning Forever
f24349bb64
Logger support in Lite ( #16121 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2023-01-09 18:33:18 +00:00
Adrian Wälchli
c656307127
Handle `set_to_none` when using DeepSpeed optimizer in Lite ( #16275 )
2023-01-09 09:01:11 -05:00
Adrian Wälchli
4c3ce605ad
Update precision input type annotations ( #14857 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2023-01-06 20:08:20 +00:00
pre-commit-ci[bot]
b59941cc52
[pre-commit.ci] pre-commit suggestions ( #16224 )
...
* [pre-commit.ci] pre-commit suggestions
updates:
- [github.com/pre-commit/pre-commit-hooks: v4.3.0 → v4.4.0](https://github.com/pre-commit/pre-commit-hooks/compare/v4.3.0...v4.4.0)
- [github.com/asottile/pyupgrade: v2.34.0 → v3.3.1](https://github.com/asottile/pyupgrade/compare/v2.34.0...v3.3.1)
- https://github.com/myint/docformatter → https://github.com/PyCQA/docformatter
- [github.com/PyCQA/docformatter: v1.4 → v1.5.1](https://github.com/PyCQA/docformatter/compare/v1.4...v1.5.1)
- [github.com/asottile/yesqa: v1.3.0 → v1.4.0](https://github.com/asottile/yesqa/compare/v1.3.0...v1.4.0)
- [github.com/PyCQA/isort: 5.10.1 → 5.11.4](https://github.com/PyCQA/isort/compare/5.10.1...5.11.4)
- [github.com/psf/black: 22.6.0 → 22.12.0](https://github.com/psf/black/compare/22.6.0...22.12.0)
- [github.com/executablebooks/mdformat: 0.7.14 → 0.7.16](https://github.com/executablebooks/mdformat/compare/0.7.14...0.7.16)
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-01-04 18:48:35 -05:00
Carlos Mocholí
15ef52bc73
Rename LightningLite to Fabric ( #16244 )
...
* Rename LightningLite to Fabric
* Fix introspection test
* Fix deprecated Lite tests
* Undo accidental Horovod removal
* Fixes
2023-01-04 10:57:18 -05:00