Carlos Mocholí
02074f16c7
Fix PyTorch versions in Lite CI ( #15338 )
...
* replace oldest in lite
* Fix PyTorch versions in Lite CI
* This will be moved to install pkg workflow in the mirror PR
* 1.13 fixes
* Windows fix
* sorting
Co-authored-by: otaj <ota@lightning.ai>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2022-10-26 15:09:08 -04:00
Adrian Wälchli
38a9e69543
Extend the detection of interactive mode ( #15293 )
...
* extend interactive mode detection
* update test names
* changelog
* test
2022-10-26 15:24:11 +00:00
Adrian Wälchli
0f9156374d
Mark internal Lite APIs as protected ( #15307 )
...
* mark internal lite apis as protected
* formatting
* docs update
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2022-10-26 12:51:50 +00:00
otaj
76e462a0be
Do not lose references of trainer in test ( #15272 )
...
* Fix reference error
* Skip flaky hanging test
* .
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-10-25 09:23:15 -04:00
Dan Dale
27585a9bcf
Fix and refactor `test_deepspeed_engine_is_steppable` test ( #15251 )
...
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2022-10-23 21:25:36 +00:00
Carlos Mocholí
961e395677
Resolve collectives test issues ( #15195 )
...
Co-authored-by: otaj <ota@lightning.ai>
2022-10-21 01:08:38 +00:00
Carlos Mocholí
b866dc3a6a
Collective's PREMUL_SUM support with PyTorch 1.13 ( #15201 )
...
* Collective's PREMUL_SUM support with PyTorch 1.13
* Fix test
* Skip under 1.13
2022-10-20 12:36:06 +00:00
Carlos Mocholí
bf458701de
Avoid underscore suffix in filenames ( #15189 )
2022-10-20 07:39:19 -04:00
otaj
741462f373
[LAI] Make lite tests safe for combined package ( #15204 )
...
Make lite tests safe for combined package
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
2022-10-20 09:10:39 +00:00
Adrian Wälchli
576757fd79
Validate SRUN variables when launching in SLURM ( #15011 )
2022-10-19 21:42:11 +00:00
Adrian Wälchli
045c2f5715
Efficient gradient accumulation in LightningLite ( #14966 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-10-19 19:55:12 +00:00
Jirka Borovec
d0b092fda8
Lite: setting extras & fix CI ( #15192 )
...
* extras
* test.txt
* doctest
* Apply suggestions from code review
* Fix imports
* Oops
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-10-19 19:05:23 +00:00
Carlos Mocholí
24c26f7db2
Standardize Lite's filenames ( #15058 )
2022-10-19 14:09:41 +02:00
Carlos Mocholí
0e18266023
Fix collective tests with PyTorch 1.13 ( #15167 )
2022-10-18 14:31:48 +02:00
Justus Schock
27965cc36b
Fix locally failing lite tests ( #15137 )
2022-10-18 09:49:14 +00:00
Adrian Wälchli
ed891e5049
Force NVML-based CUDA check in PyTorch 1.14+ ( #15110 )
2022-10-13 13:10:29 -04:00
Carlos Mocholí
da25d1d30d
Remove unused Lite code ( #15000 )
...
* Remove unused Lite code
* Remove duplicate import
* Group variable
* Fix monkeypatch
2022-10-10 22:16:56 +00:00
Carlos Mocholí
c334b7766c
Remove old testing artifacts ( #15052 )
2022-10-10 17:34:18 +00:00
Carlos Mocholí
d15bd1520e
[Lite] precision_plugin -> precision ( #15001 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2022-10-10 15:00:32 +00:00
Carlos Mocholí
0b04aa879f
Resolve interactions between CUDA tests ( #15042 )
2022-10-09 06:20:40 -04:00
Adrian Wälchli
c76a95ea12
More tests for TPU accelerator in Lite ( #14960 )
...
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2022-10-08 15:42:21 +00:00
Carlos Mocholí
62ca073a41
Introduce base collective and main subclasses ( #15016 )
...
Co-authored-by: otaj <ota@lightning.ai>
2022-10-07 19:53:19 +00:00
Dan Dale
3b75c52869
Support ddp_fork strategy with native AMP by attempting NVML-based CUDA availability assessment ( #14984 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2022-10-05 18:52:06 -04:00
Dan Dale
ab1eb6531e
Fix fork tests failing in environments with CUDA available ( #14982 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-10-05 00:02:55 +00:00
Carlos Mocholí
7ef87464dd
Refactor XLA and TPU checks across codebase ( #14550 )
2022-10-04 22:54:14 +00:00
Carlos Mocholí
3028fd287d
Fix TPU test CI ( #14926 )
...
* Fix TPU test CI
* +x first
* Lite first to uncover errors faster
* Fixes
* One more
* Simplify XLALauncher wrapping to avoid pickle error
* debug
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Debug commit successful. Trying local definitions
* Require tpu for mock test
* ValueError: The number of devices must be either 1 or 8, got 4 instead
* Fix mock test
* Simplify call, rely on defaults
* Skip OSError for now. Maybe upgrading will help
* Simplify launch tests, move some to lite
* Stricter typing
* RuntimeError: Accessing the XLA device before processes have spawned is not allowed.
* Revert "RuntimeError: Accessing the XLA device before processes have spawned is not allowed."
This reverts commit f65107ebf3.
* Alternative boring solution to the reverted commit
* Fix failing test on CUDA machine
* Workarounds
* Try latest mkl
* Revert "Try latest mkl"
This reverts commit d06813aa67.
* Wrong exception
* xfail
* Mypy
* Comment change
* Spawn launch refactor
* Accept that we cannot lazy init now
* Fix mypy and launch test failures
* The base dockerfile already includes mkl-2022.1.0 - what if we use it?
* try a different mkl version
* Revert mkl version changes
Co-authored-by: awaelchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
2022-10-03 09:13:33 -04:00
Adrian Wälchli
d7af8ce2a5
Simplify root node resolution for SLURM environment ( #14912 )
...
Co-authored-by: Seppo Enarvi <seppo.git@marjaniemi.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-09-30 15:40:43 +00:00
Adrian Wälchli
cd9247a782
Introduce primitives for input/output dtype conversion in Lite Precision ( #14792 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: otaj <6065855+otaj@users.noreply.github.com>
2022-09-30 15:29:03 +00:00
Carlos Mocholí
6256a318d7
Refactor launching tests to use our launchers ( #14954 )
2022-09-30 09:57:18 +02:00
Atharva Phatak
fdcb5cc90b
Hydra changes to lightning-lite ( #14950 )
...
Co-authored-by: awaelchli <aedu.waelchli@gmail.com>
2022-09-29 21:59:35 -04:00
Adrian Wälchli
498cb60417
Fairscale integration tests for Lite ( #14921 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-09-29 17:46:49 +00:00
Adrian Wälchli
5b446aec4d
DeepSpeed integration tests for Lite ( #14901 )
2022-09-29 16:39:32 +00:00
Adrian Wälchli
ea5e817973
Better error message when trying to re-initialize CUDA in forked subprocess ( #14709 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-09-28 05:07:33 -04:00
Carlos Mocholí
9fc4ff3278
Move logic to error out on deprecation warnings into conftest ( #14902 )
2022-09-27 17:49:25 +02:00
Adrian Wälchli
d572a7e2ec
Fix double precision support in Lite ( #14827 )
2022-09-27 08:38:20 +00:00
Adrian Wälchli
d7404c775a
Integration tests for Precision in Lite ( #14815 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
2022-09-26 18:50:11 +00:00
Adrian Wälchli
dc1dc0df36
Attempt to query device count via NVML ( #14631 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-09-22 09:57:13 +00:00
otaj
5ee2b86c44
Tests for fixed TypeError ( #14821 )
...
* tests for 14809
* Apply suggestions from code review
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2022-09-22 09:04:27 +02:00
Carlos Mocholí
7e803ba53e
Clean-up dtype management ( #14823 )
...
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
2022-09-22 00:07:36 +00:00
Adrian Wälchli
3f0fec591d
Update device attribute in Lite's module wrapper ( #14822 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-09-21 19:06:10 +00:00
Carlos Mocholí
abc805f9ef
Remove the model argument from Lite's `optimizer_step` via structural typing ( #14810 )
...
Co-authored-by: awaelchli <aedu.waelchli@gmail.com>
2022-09-21 19:28:45 +02:00
awaelchli
c0ff7a1b77
Add backward-compatibility for LightningLite in PL ( #14735 )
2022-09-20 13:31:56 +02:00
awaelchli
e3e71670e6
Move src/pytorch_lightning/lite to src/lightning_lite ( #14735 )
2022-09-20 13:31:56 +02:00
Carlos Mocholí
e9c571d39f
Move accelerator-specific parsing functions with their accelerators ( #14753 )
...
Co-authored-by: awaelchli <aedu.waelchli@gmail.com>
2022-09-18 22:48:45 +00:00
Adrian Wälchli
1092265140
Remove check `num_slurm_tasks` in Lite ( #14761 )
2022-09-18 14:01:49 -04:00
Adrian Wälchli
35c65b0287
Fix test suite when running on MPS-enabled hardware ( #14708 )
2022-09-16 19:21:36 +00:00
Adrian Wälchli
47f0d336f1
Standalone Lite: Update LightningLite ( #14726 )
2022-09-16 17:25:27 +00:00
Adrian Wälchli
619e76f22d
Remove silent behavior when `num_slurm_tasks` does not correspond to number of processes in Trainer ( #14300 )
...
* simplify logic
* remove hpc
* update
* add changelog
* more tests
* update test
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2022-09-16 11:00:09 +00:00
Adrian Wälchli
38d89713a5
Standalone Lite: Connector ( #14692 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2022-09-15 14:14:51 +00:00
Adrian Wälchli
d3dcd68852
Standalone Lite: DDP Spawn Strategy Family ( #14675 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2022-09-15 10:51:12 +00:00