8 Commits

Author SHA1 Message Date
Dan Dale ab1eb6531e
Fix fork tests failing in environments with CUDA available (#14982)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-10-05 00:02:55 +00:00
Carlos Mocholí 7ef87464dd
Refactor XLA and TPU checks across codebase (#14550)
2022-10-04 22:54:14 +00:00
Carlos Mocholí 3028fd287d
Fix TPU test CI (#14926)
* Fix TPU test CI

* +x first

* Lite first to uncover errors faster

* Fixes

* One more

* Simplify XLALauncher wrapping to avoid pickle error

* debug

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Debug commit successful. Trying local definitions

* Require tpu for mock test

* ValueError: The number of devices must be either 1 or 8, got 4 instead

* Fix mock test

* Simplify call, rely on defaults

* Skip OSError for now. Maybe upgrading will help

* Simplify launch tests, move some to lite

* Stricter typing

* RuntimeError: Accessing the XLA device before processes have spawned is not allowed.

* Revert "RuntimeError: Accessing the XLA device before processes have spawned is not allowed."

This reverts commit f65107ebf3.

* Alternative boring solution to the reverted commit

* Fix failing test on CUDA machine

* Workarounds

* Try latest mkl

* Revert "Try latest mkl"

This reverts commit d06813aa67.

* Wrong exception

* xfail

* Mypy

* Comment change

* Spawn launch refactor

* Accept that we cannot lazy init now

* Fix mypy and launch test failures

* The base dockerfile already includes mkl-2022.1.0 - what if we use it?

* try a different mkl version

* Revert mkl version changes

Co-authored-by: awaelchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
2022-10-03 09:13:33 -04:00
Carlos Mocholí 6256a318d7
Refactor launching tests to use our launchers (#14954)
2022-09-30 09:57:18 +02:00
Atharva Phatak fdcb5cc90b
Hydra changes to lightning-lite (#14950)
Co-authored-by: awaelchli <aedu.waelchli@gmail.com>
2022-09-29 21:59:35 -04:00
Adrian Wälchli ea5e817973
Better error message when trying to re-initialize CUDA in forked subprocess (#14709)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-09-28 05:07:33 -04:00
Adrian Wälchli dc1dc0df36
Attempt to query device count via NVML (#14631)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-09-22 09:57:13 +00:00
Adrian Wälchli 8f0a64dab6
Standalone Lite: Launchers (#14555)
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2022-09-12 14:15:42 +00:00