Commit Graph

63 Commits

Author SHA1 Message Date
Carlos Mocholí 12d6e44796
Grep for potential errors in standalone tests (#15341)
Co-authored-by: awaelchli <aedu.waelchli@gmail.com>
2022-11-05 04:29:38 +01:00
Rohit Gupta 61ae35c378
Use sklearn in runif (#15426)
* Use sklearn in runif
* test by removing sklearn dep
* remove repeated code
* seed
2022-11-01 11:40:32 +00:00
Rohit Gupta 773cb3e8c8
Fix skipped tests due to sklearn (#15311)
Co-authored-by: awaelchli <aedu.waelchli@gmail.com>
2022-10-31 13:58:34 +05:30
Rohit Gupta 0a729f6da1
Avoid initializing optimizers during deepspeed evaluation (#14944) 2022-10-22 00:37:03 +05:30
HELSON dd33528e00
[docs] Docs for ColossalaiStrategy (#15093) 2022-10-13 16:14:03 +00:00
Rohit Gupta eb17dc9839
Deprecate tuning enum and trainer properties (#15100) 2022-10-13 13:29:50 +00:00
ver217 2fef6d9403
Add ColossalAI strategy (#14224)
Co-authored-by: HELSON <c2h214748@gmail.com>
Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>
Co-authored-by: otaj <ota@lightning.ai>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-10-11 13:59:09 +02:00
Carlos Mocholí c334b7766c
Remove old testing artifacts (#15052) 2022-10-10 17:34:18 +00:00
Adrian Wälchli c76a95ea12
More tests for TPU accelerator in Lite (#14960)
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2022-10-08 15:42:21 +00:00
otaj 7e518cacd2
Use `torch.testing.assert_close` everywhere (#15031)
remove unnecessary version check
2022-10-07 16:59:04 +02:00
Carlos Mocholí 7ef87464dd
Refactor XLA and TPU checks across codebase (#14550) 2022-10-04 22:54:14 +00:00
Adrian Wälchli 498cb60417
Fairscale integration tests for Lite (#14921)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-09-29 17:46:49 +00:00
Adrian Wälchli 822a7f50af
Align ddp and ddp-spawn strategies in setting up the environment (#11073)
Co-authored-by: Kushashwa Ravi Shrimali <kushashwaravishrimali@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-09-29 19:30:09 +02:00
Adrian Wälchli d8e90f6581
Fairscale import updates (#14721)
* fairscale imports
* refactor to avoid meta package build issue

Co-authored-by: Jirka <jirka.borovec@seznam.cz>
Co-authored-by: thomas chaton <thomas@grid.ai>
2022-09-29 16:45:27 +00:00
Adrian Wälchli 5b446aec4d
DeepSpeed integration tests for Lite (#14901) 2022-09-29 16:39:32 +00:00
Carlos Mocholí 7893eb259a
Prepare CI to run on 3090s (#14910) 2022-09-29 14:01:59 +00:00
Justin Goodwin 45ca78167e
Improving Hydra+DDP support (#11617)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
2022-09-22 16:03:13 +00:00
Adrian Wälchli dc1dc0df36
Attempt to query device count via NVML (#14631)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-09-22 09:57:13 +00:00
Carlos Mocholí 7e803ba53e
Clean-up dtype management (#14823)
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
2022-09-22 00:07:36 +00:00
Carlos Mocholí e9c571d39f
Move accelerator-specific parsing functions with their accelerators (#14753)
Co-authored-by: awaelchli <aedu.waelchli@gmail.com>
2022-09-18 22:48:45 +00:00
Adrian Wälchli 35c65b0287
Fix test suite when running on MPS-enabled hardware (#14708) 2022-09-16 19:21:36 +00:00
Adrian Wälchli 5bef75648e
Remove deprecated `torch_distributed_backend` logic (#14693)
* Remove deprecated torch_distributed_backend logic
* changelog
* mention deprecated
* imports

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
2022-09-16 17:27:36 +02:00
Adrian Wälchli 6333caabb0
Standalone Lite: Strategy base classes and registry (#14662)
* add accelerator implementations to lite

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix imports

* rename registry argument

* fix test

* fix tests

* remove duplicated test

* fix tests

* deprecation

* deprecations

* flake8

* fixes

* add mps to runif

* fix tests

* Apply suggestions from code review

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* remove more

* local import

* undo device stats :(

* fix import

* stupid typehints

* more refactors :(

* fix

* rename init_device to setup_device

* remove unused import

* make uppercase to differentiate from class

* trick test after moving import locally

* add base classes and registry

* reg

* registry

* tests

* update to other branches

* resolve todo(lite)

* add very basic unit tests

* fix name assignment

* Update src/lightning_lite/strategies/parallel.py

Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>

* remove deprecated property

* remove pre- and post backward for now

* protecting the registry utility function

* remove unused import

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2022-09-14 09:15:21 -04:00
Adrian Wälchli 024e7b8204
Standalone Lite: Cluster Environments (#14509) 2022-09-12 12:20:08 +02:00
Adrian Wälchli d2459df2ff
Standalone Lite: Remaining Utilities (#14492)
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Laverne Henderson <laverne.henderson@coupa.com>
Co-authored-by: Felonious-Spellfire <felonious.spellfire@gmail.com>
2022-09-07 15:25:23 +00:00
Rohit Gupta 8c6119fbce
Add auto wrapping support for `DDPFullyShardedStrategy` (#14383) 2022-09-05 19:07:26 +00:00
Carlos Mocholí e0c2c3e677
Clean up fairscale imports (#14476) 2022-09-01 18:08:40 +02:00
Adrian Wälchli 28e18881a9
Mark stage argument in hooks as required (#14064)
Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>
2022-09-01 15:47:40 +02:00
Adrian Wälchli fafd254678
Fix device parser logic to avoid creating CUDA context (#14319)
* let environment disable forking

* add helper function and error messages

* tests

* changelog

Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2022-08-26 15:41:38 +00:00
Rohit Gupta 6d00f31f0c
Add auto wrapping for `DDPFullyShardedNativeStrategy` (#14252) 2022-08-26 09:01:48 +00:00
Adrian Wälchli e67842dcba
Support sharded optimizer state dumping outside of sharded strategies (#14208) 2022-08-26 07:58:21 +00:00
Rohit Gupta 48c23e5716
Use fsdp module to initialize precision scalar for fsdp native (#14092)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Laverne Henderson <laverne.henderson@coupa.com>
2022-08-13 07:52:06 +00:00
Adrian Wälchli 807f9d8c96
Replace unwrapping logic in strategies (#13738)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2022-08-12 08:24:04 +00:00
Adrian Wälchli 4008f9cd41
Convert subprocess test to standalone test (#14101) 2022-08-10 17:15:12 -04:00
Carlos Mocholí 9b61b1c482
Remove duplicated test classes (#14122)
Remove duplicated classes
2022-08-10 17:21:05 +02:00
Adrian Wälchli 06c255c5c1
Skip ddp fork tests on windows (#14121) 2022-08-09 22:54:10 +00:00
Rohit Gupta ac369f5570
Fix incorrect `precision="mixed"` being used with `DeepSpeedStrategy` and `IPUStrategy` (#14041)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-08-09 21:25:23 +05:30
Adrian Wälchli 0cfc53d6b4
Fix regression on default value for `find_unused_parameters` (#14095) 2022-08-09 13:56:02 +05:30
Rohit Gupta b25275ccc2
Cast to fp16 before moving to device with deepspeed (#14000) 2022-08-05 22:15:15 +00:00
Carlos Mocholí 91dd6a68fb
Remove meta device utilities in favor of torchdistx (#13868) 2022-08-05 12:20:27 +00:00
Rohit Gupta e78bf2044b
Raise an error if batch transfer hooks are overridden with IPUAccelerator (#13961)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2022-08-04 12:04:42 +00:00
Adrian Wälchli eb233ea12d
Snapshot selected globals and restore them in spawned process (#13921)
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2022-08-01 22:21:46 +00:00
Carlos Mocholí 1299e4f984
Run GPU tests with PyTorch 1.12 (#13716)
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
2022-07-28 19:37:57 +05:30
Carlos Mocholí 511875e567
Support DeepSpeed >=0.6.0, <0.6.5 (#13863)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2022-07-27 18:57:52 +02:00
Adrian Wälchli fff62f0ae5
Fix TPU testing and collect all tests (#11098)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2022-07-27 15:40:40 +00:00
Adrian Wälchli 81f149e9d4
Rename spawn-based launchers (#13743) 2022-07-23 11:48:15 -04:00
Adrian Wälchli d24978baa3
Add ddp_notebook alias for ddp_fork (#13744)
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2022-07-23 09:06:35 -04:00
Adrian Wälchli c3299d2c59
Add support for DDP fork (#13405)
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2022-07-22 16:05:35 +00:00
Sean Naren d78698528d
[FIX] Native FSDP precision + tests (#12985) 2022-07-20 11:32:35 +00:00
Justus Schock c75457da99
Rename GPUAccelerator to CUDAAccelerator 2022-07-19 13:06:30 -04:00