Carlos Mocholí
12d6e44796
Grep for potential errors in standalone tests (#15341)
...
Co-authored-by: awaelchli <aedu.waelchli@gmail.com>
2022-11-05 04:29:38 +01:00
Rohit Gupta
61ae35c378
Use sklearn in runif (#15426)
...
* Use sklearn in runif
* test by removing sklearn dep
* remove repeated code
* seed
2022-11-01 11:40:32 +00:00
Rohit Gupta
773cb3e8c8
Fix skipped tests due to sklearn (#15311)
...
Co-authored-by: awaelchli <aedu.waelchli@gmail.com>
2022-10-31 13:58:34 +05:30
Rohit Gupta
0a729f6da1
Avoid initializing optimizers during deepspeed evaluation (#14944)
2022-10-22 00:37:03 +05:30
HELSON
dd33528e00
[docs] Docs for ColossalaiStrategy (#15093)
2022-10-13 16:14:03 +00:00
Rohit Gupta
eb17dc9839
Deprecate tuning enum and trainer properties (#15100)
2022-10-13 13:29:50 +00:00
ver217
2fef6d9403
Add ColossalAI strategy (#14224)
...
Co-authored-by: HELSON <c2h214748@gmail.com>
Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>
Co-authored-by: otaj <ota@lightning.ai>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-10-11 13:59:09 +02:00
Carlos Mocholí
c334b7766c
Remove old testing artifacts (#15052)
2022-10-10 17:34:18 +00:00
Adrian Wälchli
c76a95ea12
More tests for TPU accelerator in Lite (#14960)
...
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2022-10-08 15:42:21 +00:00
otaj
7e518cacd2
Use `torch.testing.assert_close` everywhere (#15031)
...
remove unnecessary version check
2022-10-07 16:59:04 +02:00
Carlos Mocholí
7ef87464dd
Refactor XLA and TPU checks across codebase (#14550)
2022-10-04 22:54:14 +00:00
Adrian Wälchli
498cb60417
Fairscale integration tests for Lite (#14921)
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-09-29 17:46:49 +00:00
Adrian Wälchli
822a7f50af
Align ddp and ddp-spawn strategies in setting up the environment (#11073)
...
Co-authored-by: Kushashwa Ravi Shrimali <kushashwaravishrimali@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-09-29 19:30:09 +02:00
Adrian Wälchli
d8e90f6581
Fairscale import updates (#14721)
...
* fairscale imports
* refactor to avoid meta package build issue
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
Co-authored-by: thomas chaton <thomas@grid.ai>
2022-09-29 16:45:27 +00:00
Adrian Wälchli
5b446aec4d
DeepSpeed integration tests for Lite (#14901)
2022-09-29 16:39:32 +00:00
Carlos Mocholí
7893eb259a
Prepare CI to run on 3090s (#14910)
2022-09-29 14:01:59 +00:00
Justin Goodwin
45ca78167e
Improving Hydra+DDP support (#11617)
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
2022-09-22 16:03:13 +00:00
Adrian Wälchli
dc1dc0df36
Attempt to query device count via NVML (#14631)
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-09-22 09:57:13 +00:00
Carlos Mocholí
7e803ba53e
Clean-up dtype management (#14823)
...
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
2022-09-22 00:07:36 +00:00
Carlos Mocholí
e9c571d39f
Move accelerator-specific parsing functions with their accelerators (#14753)
...
Co-authored-by: awaelchli <aedu.waelchli@gmail.com>
2022-09-18 22:48:45 +00:00
Adrian Wälchli
35c65b0287
Fix test suite when running on MPS-enabled hardware (#14708)
2022-09-16 19:21:36 +00:00
Adrian Wälchli
5bef75648e
Remove deprecated `torch_distributed_backend` logic (#14693)
...
* Remove deprecated torch_distributed_backend logic
* changelog
* mention deprecated
* imports
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
2022-09-16 17:27:36 +02:00
Adrian Wälchli
6333caabb0
Standalone Lite: Strategy base classes and registry (#14662)
...
* add accelerator implementations to lite
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* fix imports
* rename registry argument
* fix test
* fix tests
* remove duplicated test
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* fix tests
* deprecation
* deprecations
* flake8
* fixes
* add mps to runif
* fix tests
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Apply suggestions from code review
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* remove more
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* local import
* undo device stats :(
* fix import
* stupid typehints
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* more refactors :(
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* fix
* rename init_device to setup_device
* remove unused import
* make uppercase to differentiate from class
* trick test after moving import locally
* add base classes and registry
* reg
* registry
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* tests
* update to other branches
* resolve todo(lite)
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* add very basic unit tests
* fix name assignment
* Update src/lightning_lite/strategies/parallel.py
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
* remove deprecated property
* remove pre- and post backward for now
* protecting the registry utility function
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* remove unused import
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2022-09-14 09:15:21 -04:00
Adrian Wälchli
024e7b8204
Standalone Lite: Cluster Environments (#14509)
2022-09-12 12:20:08 +02:00
Adrian Wälchli
d2459df2ff
Standalone Lite: Remaining Utilities (#14492)
...
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Laverne Henderson <laverne.henderson@coupa.com>
Co-authored-by: Felonious-Spellfire <felonious.spellfire@gmail.com>
2022-09-07 15:25:23 +00:00
Rohit Gupta
8c6119fbce
Add auto wrapping support for `DDPFullyShardedStrategy` (#14383)
2022-09-05 19:07:26 +00:00
Carlos Mocholí
e0c2c3e677
Clean up fairscale imports (#14476)
2022-09-01 18:08:40 +02:00
Adrian Wälchli
28e18881a9
Mark stage argument in hooks as required (#14064)
...
Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>
2022-09-01 15:47:40 +02:00
Adrian Wälchli
fafd254678
Fix device parser logic to avoid creating CUDA context (#14319)
...
* let environment disable forking
* add helper function and error messages
* tests
* changelog
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2022-08-26 15:41:38 +00:00
Rohit Gupta
6d00f31f0c
Add auto wrapping for `DDPFullyShardedNativeStrategy` (#14252)
2022-08-26 09:01:48 +00:00
Adrian Wälchli
e67842dcba
Support sharded optimizer state dumping outside of sharded strategies (#14208)
2022-08-26 07:58:21 +00:00
Rohit Gupta
48c23e5716
Use fsdp module to initialize precision scalar for fsdp native (#14092)
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Laverne Henderson <laverne.henderson@coupa.com>
2022-08-13 07:52:06 +00:00
Adrian Wälchli
807f9d8c96
Replace unwrapping logic in strategies (#13738)
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2022-08-12 08:24:04 +00:00
Adrian Wälchli
4008f9cd41
Convert subprocess test to standalone test (#14101)
2022-08-10 17:15:12 -04:00
Carlos Mocholí
9b61b1c482
Remove duplicated test classes (#14122)
...
Remove duplicated classes
2022-08-10 17:21:05 +02:00
Adrian Wälchli
06c255c5c1
Skip ddp fork tests on windows (#14121)
2022-08-09 22:54:10 +00:00
Rohit Gupta
ac369f5570
Fix incorrect `precision="mixed"` being used with `DeepSpeedStrategy` and `IPUStrategy` (#14041)
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-08-09 21:25:23 +05:30
Adrian Wälchli
0cfc53d6b4
Fix regression on default value for `find_unused_parameters` (#14095)
2022-08-09 13:56:02 +05:30
Rohit Gupta
b25275ccc2
Cast to fp16 before moving to device with deepspeed (#14000)
2022-08-05 22:15:15 +00:00
Carlos Mocholí
91dd6a68fb
Remove meta device utilities in favor of torchdistx (#13868)
2022-08-05 12:20:27 +00:00
Rohit Gupta
e78bf2044b
Raise an error if batch transfer hooks are overridden with IPUAccelerator (#13961)
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2022-08-04 12:04:42 +00:00
Adrian Wälchli
eb233ea12d
Snapshot selected globals and restore them in spawned process (#13921)
...
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2022-08-01 22:21:46 +00:00
Carlos Mocholí
1299e4f984
Run GPU tests with PyTorch 1.12 (#13716)
...
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
2022-07-28 19:37:57 +05:30
Carlos Mocholí
511875e567
Support DeepSpeed >=0.6.0, <0.6.5 (#13863)
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2022-07-27 18:57:52 +02:00
Adrian Wälchli
fff62f0ae5
Fix TPU testing and collect all tests (#11098)
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2022-07-27 15:40:40 +00:00
Adrian Wälchli
81f149e9d4
Rename spawn-based launchers (#13743)
2022-07-23 11:48:15 -04:00
Adrian Wälchli
d24978baa3
Add ddp_notebook alias for ddp_fork (#13744)
...
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2022-07-23 09:06:35 -04:00
Adrian Wälchli
c3299d2c59
Add support for DDP fork (#13405)
...
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2022-07-22 16:05:35 +00:00
Sean Naren
d78698528d
[FIX] Native FSDP precision + tests (#12985)
2022-07-20 11:32:35 +00:00
Justus Schock
c75457da99
Rename GPUAccelerator to CUDAAccelerator
2022-07-19 13:06:30 -04:00