15 Commits

Author SHA1 Message Date
Dan Dale ab1eb6531e
Fix fork tests failing in environments with CUDA available (#14982)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-10-05 00:02:55 +00:00
Carlos Mocholí 7ef87464dd
Refactor XLA and TPU checks across codebase (#14550) 2022-10-04 22:54:14 +00:00
Carlos Mocholí 3028fd287d
Fix TPU test CI (#14926)
* Fix TPU test CI

* +x first

* Lite first to uncover errors faster

* Fixes

* One more

* Simplify XLALauncher wrapping to avoid pickle error

* debug

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Debug commit successful. Trying local definitions

* Require tpu for mock test

* ValueError: The number of devices must be either 1 or 8, got 4 instead

* Fix mock test

* Simplify call, rely on defaults

* Skip OSError for now. Maybe upgrading will help

* Simplify launch tests, move some to lite

* Stricter typing

* RuntimeError: Accessing the XLA device before processes have spawned is not allowed.

* Revert "RuntimeError: Accessing the XLA device before processes have spawned is not allowed."

This reverts commit f65107ebf3.

* Alternative boring solution to the reverted commit

* Fix failing test on CUDA machine

* Workarounds

* Try latest mkl

* Revert "Try latest mkl"

This reverts commit d06813aa67.

* Wrong exception

* xfail

* Mypy

* Comment change

* Spawn launch refactor

* Accept that we cannot lazy init now

* Fix mypy and launch test failures

* The base dockerfile already includes mkl-2022.1.0 - what if we use it?

* try a different mkl version

* Revert mkl version changes

Co-authored-by: awaelchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
2022-10-03 09:13:33 -04:00
Carlos Mocholí 6256a318d7
Refactor launching tests to use our launchers (#14954) 2022-09-30 09:57:18 +02:00
Atharva Phatak fdcb5cc90b
Hydra changes to lightning-lite (#14950)
Co-authored-by: awaelchli <aedu.waelchli@gmail.com>
2022-09-29 21:59:35 -04:00
Adrian Wälchli 498cb60417
Fairscale integration tests for Lite (#14921)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-09-29 17:46:49 +00:00
Adrian Wälchli 5b446aec4d
DeepSpeed integration tests for Lite (#14901) 2022-09-29 16:39:32 +00:00
Adrian Wälchli ea5e817973
Better error message when trying to re-initialize CUDA in forked subprocess (#14709)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-09-28 05:07:33 -04:00
Adrian Wälchli dc1dc0df36
Attempt to query device count via NVML (#14631)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-09-22 09:57:13 +00:00
Adrian Wälchli d3dcd68852
Standalone Lite: DDP Spawn Strategy Family (#14675)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2022-09-15 10:51:12 +00:00
Adrian Wälchli deca6cc5c4
Standalone Lite: DDP Strategy Family (#14670)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2022-09-15 01:36:17 +00:00
Adrian Wälchli 7867d152b3
Standalone Lite: DataParallel Strategy (#14681)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2022-09-14 19:27:53 -04:00
Adrian Wälchli 32cb774a5c
Standalone Lite: Single Device TPU Strategy (#14663)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2022-09-14 14:22:07 +00:00
Adrian Wälchli 6333caabb0
Standalone Lite: Strategy base classes and registry (#14662)
* add accelerator implementations to lite

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix imports

* rename registry argument

* fix test

* fix tests

* remove duplicated test

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix tests

* deprecation

* deprecations

* flake8

* fixes

* add mps to runif

* fix tests

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Apply suggestions from code review

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove more

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* local import

* undo device stats :(

* fix import

* stupid typehints

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* more refactors :(

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix

* rename init_device to setup_device

* remove unused import

* make uppercase to differentiate from class

* trick test after moving import locally

* add base classes and registry

* reg

* registry

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* tests

* update to other branches

* resolve todo(lite)

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add very basic unit tests

* fix name assignment

* Update src/lightning_lite/strategies/parallel.py

Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>

* remove deprecated property

* remove pre- and post backward for now

* protecting the registry utility function

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove unused import

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2022-09-14 09:15:21 -04:00
Adrian Wälchli 8f0a64dab6
Standalone Lite: Launchers (#14555)
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2022-09-12 14:15:42 +00:00