Commit Graph

26 Commits

Author SHA1 Message Date
Adrian Wälchli 05dbf48ad0
Activation checkpointing in FSDP without boilerplate (#15826)
* initial
* input type
* checkpointing
* fsdp in pl
* all_close

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2022-12-06 15:45:33 +00:00
Adrian Wälchli 657bfc586a
Fix device placement when setting up FSDP model in Lite (#15822)
* fix
* debug test
* simplify

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2022-11-28 04:05:48 +01:00
Adrian Wälchli 88b2e5a258
Revert new Hydra launch behavior (#15737)
* revert new hydra cwd behavior
* remove debug statements
* changelog

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
2022-11-21 20:19:13 +00:00
Adrian Wälchli 86568521fd
FSDP (native) support for LightningLite (#14967)
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2022-11-21 13:58:37 +00:00
Adrian Wälchli ec7acb5ae5
Speed up subprocess launch (#15738) 2022-11-21 06:19:47 -05:00
Adrian Wälchli 0dfb3d28ce
Support individual setup of model and optimizer in Lite (#15185) 2022-11-11 14:36:59 +01:00
Carlos Mocholí 12d6e44796
Grep for potential errors in standalone tests (#15341)
Co-authored-by: awaelchli <aedu.waelchli@gmail.com>
2022-11-05 04:29:38 +01:00
Adrian Wälchli 045c2f5715
Efficient gradient accumulation in LightningLite (#14966)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-10-19 19:55:12 +00:00
Jirka Borovec d0b092fda8
Lite: setting extras & fix CI (#15192)
* extras
* test.txt
* doctest
* Apply suggestions from code review
* Fix imports
* Oops

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-10-19 19:05:23 +00:00
Carlos Mocholí d15bd1520e
[Lite] precision_plugin -> precision (#15001)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2022-10-10 15:00:32 +00:00
Adrian Wälchli c76a95ea12
More tests for TPU accelerator in Lite (#14960)
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2022-10-08 15:42:21 +00:00
Dan Dale ab1eb6531e
Fix fork tests failing in environments with CUDA available (#14982)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-10-05 00:02:55 +00:00
Carlos Mocholí 7ef87464dd
Refactor XLA and TPU checks across codebase (#14550) 2022-10-04 22:54:14 +00:00
Carlos Mocholí 3028fd287d
Fix TPU test CI (#14926)
* Fix TPU test CI

* +x first

* Lite first to uncover errors faster

* Fixes

* One more

* Simplify XLALauncher wrapping to avoid pickle error

* debug

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Debug commit successful. Trying local definitions

* Require tpu for mock test

* ValueError: The number of devices must be either 1 or 8, got 4 instead

* Fix mock test

* Simplify call, rely on defaults

* Skip OSError for now. Maybe upgrading will help

* Simplify launch tests, move some to lite

* Stricter typing

* RuntimeError: Accessing the XLA device before processes have spawned is not allowed.

* Revert "RuntimeError: Accessing the XLA device before processes have spawned is not allowed."

This reverts commit f65107ebf3.

* Alternative boring solution to the reverted commit

* Fix failing test on CUDA machine

* Workarounds

* Try latest mkl

* Revert "Try latest mkl"

This reverts commit d06813aa67.

* Wrong exception

* xfail

* Mypy

* Comment change

* Spawn launch refactor

* Accept that we cannot lazy init now

* Fix mypy and launch test failures

* The base dockerfile already includes mkl-2022.1.0 - what if we use it?

* try a different mkl version

* Revert mkl version changes

Co-authored-by: awaelchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
2022-10-03 09:13:33 -04:00
Carlos Mocholí 6256a318d7
Refactor launching tests to use our launchers (#14954) 2022-09-30 09:57:18 +02:00
Atharva Phatak fdcb5cc90b
Hydra changes to lightning-lite (#14950)
Co-authored-by: awaelchli <aedu.waelchli@gmail.com>
2022-09-29 21:59:35 -04:00
Adrian Wälchli 498cb60417
Fairscale integration tests for Lite (#14921)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-09-29 17:46:49 +00:00
Adrian Wälchli 5b446aec4d
DeepSpeed integration tests for Lite (#14901) 2022-09-29 16:39:32 +00:00
Adrian Wälchli ea5e817973
Better error message when trying to re-initialize CUDA in forked subprocess (#14709)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-09-28 05:07:33 -04:00
Adrian Wälchli dc1dc0df36
Attempt to query device count via NVML (#14631)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-09-22 09:57:13 +00:00
Adrian Wälchli d3dcd68852
Standalone Lite: DDP Spawn Strategy Family (#14675)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2022-09-15 10:51:12 +00:00
Adrian Wälchli deca6cc5c4
Standalone Lite: DDP Strategy Family (#14670)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2022-09-15 01:36:17 +00:00
Adrian Wälchli 7867d152b3
Standalone Lite: DataParallel Strategy (#14681)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2022-09-14 19:27:53 -04:00
Adrian Wälchli 32cb774a5c
Standalone Lite: Single Device TPU Strategy (#14663)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2022-09-14 14:22:07 +00:00
Adrian Wälchli 6333caabb0
Standalone Lite: Strategy base classes and registry (#14662)
* add accelerator implementations to lite

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix imports

* rename registry argument

* fix test

* fix tests

* remove duplicated test

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix tests

* deprecation

* deprecations

* flake8

* fixes

* add mps to runif

* fix tests

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Apply suggestions from code review

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove more

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* local import

* undo device stats :(

* fix import

* stupid typehints

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* more refactors :(

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix

* rename init_device to setup_device

* remove unused import

* make uppercase to differentiate from class

* trick test after moving import locally

* add base classes and registry

* reg

* registry

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* tests

* update to other branches

* resolve todo(lite)

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add very basic unit tests

* fix name assignment

* Update src/lightning_lite/strategies/parallel.py

Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>

* remove deprecated property

* remove pre- and post backward for now

* protecting the registry utility function

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove unused import

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2022-09-14 09:15:21 -04:00
Adrian Wälchli 8f0a64dab6
Standalone Lite: Launchers (#14555)
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2022-09-12 14:15:42 +00:00