Dan Dale
acaeab27f6
Fix GPU tests that fail to raise expected configuration error when run in a CUDA environment ( #14983 )
2022-10-04 18:40:55 -04:00
thomas chaton
b936fd4380
[app] Add CloudCompute ID serializable within the flow and works state ( #14819 )
2022-10-04 19:46:44 +00:00
Sherin Thomas
53694eb93d
[App/Improvement] Cleaning up Queue abstraction ( #14977 )
2022-10-04 22:07:31 +05:30
Ethan Harris
ce919ee7d6
Fix commands and API test ( #14947 )
2022-10-04 15:38:40 +00:00
geoffrey-g-delhomme
9832d36851
Fix `ReduceLROnPlateau` update issue while resuming from a checkpoint ( #14702 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-10-04 11:55:51 +00:00
Kishan Savant
c059db446e
Remove the deprecated device_stats_monitor_prefix_keys ( #14890 )
...
* Remove the deprecated device_stats_monitor_prefix_keys
* Added pr no to changelog.md
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: otaj <6065855+otaj@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
2022-10-03 17:13:02 +00:00
DP
c764221615
fixes typing errors in rich_progress.py ( #14963 )
2022-10-03 14:11:18 +00:00
Adam J. Stewart
09a8001923
Trainer: fix support for non-distributed PyTorch ( #14971 )
...
* Trainer: fix non-distributed use
* Update CHANGELOG
2022-10-03 13:15:07 +00:00
Carlos Mocholí
3028fd287d
Fix TPU test CI ( #14926 )
...
* Fix TPU test CI
* +x first
* Lite first to uncover errors faster
* Fixes
* One more
* Simplify XLALauncher wrapping to avoid pickle error
* debug
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Debug commit successful. Trying local definitions
* Require tpu for mock test
* ValueError: The number of devices must be either 1 or 8, got 4 instead
* Fix mock test
* Simplify call, rely on defaults
* Skip OSError for now. Maybe upgrading will help
* Simplify launch tests, move some to lite
* Stricter typing
* RuntimeError: Accessing the XLA device before processes have spawned is not allowed.
* Revert "RuntimeError: Accessing the XLA device before processes have spawned is not allowed."
This reverts commit f65107ebf3.
* Alternative boring solution to the reverted commit
* Fix failing test on CUDA machine
* Workarounds
* Try latest mkl
* Revert "Try latest mkl"
This reverts commit d06813aa67.
* Wrong exception
* xfail
* Mypy
* Comment change
* Spawn launch refactor
* Accept that we cannot lazy init now
* Fix mypy and launch test failures
* The base dockerfile already includes mkl-2022.1.0 - what if we use it?
* try a different mkl version
* Revert mkl version changes
Co-authored-by: awaelchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
2022-10-03 09:13:33 -04:00
otaj
e290c206c9
Bump version of fsspec ( #14975 )
...
fsspec verbump
2022-10-03 09:53:15 +00:00
Jerome Anand
e62521caf1
Update hpu mixed precision link ( #14974 )
...
Signed-off-by: Jerome <janand@habana.ai>
2022-10-03 09:05:17 +02:00
Carlos Mocholí
be7bfdba27
Remove unused gcsfs dependency ( #14962 )
2022-10-01 16:08:36 +00:00
otaj
511a070c52
Find last checkpoints on restart ( #14907 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-09-30 20:14:18 +00:00
Ziyad Sheebaelhamd
db26e087e7
Close profiler when `StopIteration` is raised ( #14945 )
2022-09-30 19:29:12 +00:00
Adrian Wälchli
d7af8ce2a5
Simplify root node resolution for SLURM environment ( #14912 )
...
Co-authored-by: Seppo Enarvi <seppo.git@marjaniemi.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-09-30 15:40:43 +00:00
Adrian Wälchli
cd9247a782
Introduce primitives for input/output dtype conversion in Lite Precision ( #14792 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: otaj <6065855+otaj@users.noreply.github.com>
2022-09-30 15:29:03 +00:00
Andres Algaba
3daa4c9cc0
Remove deprecated on_init_start_end ( #14867 )
...
Co-authored-by: awaelchli <aedu.waelchli@gmail.com>
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
Co-authored-by: otaj <6065855+otaj@users.noreply.github.com>
2022-09-30 15:11:38 +00:00
Pritam Soni
2721a2f06b
feat: option to add custom meta tags to the UI container ( #14915 )
2022-09-30 18:56:57 +05:30
Carlos Mocholí
fd2779e55f
Fix fork skip condition in GitHub workflows ( #14955 )
...
Co-authored-by: otaj <6065855+otaj@users.noreply.github.com>
2022-09-30 08:30:47 -04:00
Mauricio Villegas
15aa9c679d
An instance of SaveConfigCallback should only save the config once ( #14927 )
2022-09-30 12:16:37 +00:00
Masahiro Wada
abea29bfa3
Move type annotation into __init__ ( #14943 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-09-30 11:03:03 +00:00
Lee Jungwon
a9142d637a
Fix mypy typing errors in pytorch_lightning/trainer/trainer.py ( #14204 )
...
Co-authored-by: otaj <ota@lightning.ai>
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-09-30 10:50:42 +00:00
Akihiro Nitta
021c2f1447
Fix typo in checkgroup.yml ( #14959 )
...
Fix typo
2022-09-30 10:12:06 +00:00
Carlos Mocholí
6256a318d7
Refactor launching tests to use our launchers ( #14954 )
2022-09-30 09:57:18 +02:00
Akihiro Nitta
e47d5a2376
CI: Combine conda and full testing into a single workflow ( #14387 )
...
* Remove conda job
* Remove conda job from readme
* Remove conda jobs from checkgroup
* Remove conda from docker builds
* Remove base-conda dockerfile
* Rewrite the strategy matrix while keeping equivalent
* Run the workflow on this branch
* Revert "Rewrite the strategy matrix while keeping equivalent"
This reverts commit e54298d60e57cffbf8107890987be3fe4a006c77.
* Add PyTorch versions
* Run on draft and disable unrelated costly CI
* Revert "Run the workflow on this branch"
This reverts commit 51ed8b905d8926b630dce4817124bd486135d3ec.
* tmp: Lightweight relevant CI
* Fix CI pathfilter
* Update matrix
* Drop skipping logic
* pip list
* reorder pip list
* tmp: lightweight ci
* Install specified pytorch
* Fix torch installation
* Uncomment steps
* Increase timeout
* bad merge
* Revert "Run on draft and disable unrelated costly CI"
This reverts commit eb5dc5e6bd.
* Update checkgroup
* Update docs and remove Python/PyTorch versions
* Remove pip-list
* Fail if wrong pytorch version installed
* Add Python 3.8, PyTorch 1.9 job
* tmp: remove azure jobs
* tmp: remove dockers
* tmp: remove others
* Run all combinations
* Include oldest
* Exclude no Python 3.10 distributions
* tmp: no concurrency
* tmp: double timeout
* Add pytest log reporter
* Add pytest-reportlog
* Fewer jobs
* Revert "tmp: no concurrency"
This reverts commit 4a7978dcb3.
* fix artifact name
* Revert test reports
* Revert unrelated changes
* Revert unrelated changes
* Add the combination of ex-conda jobs
* Update checkgroup
* revert timeout
* remove conda job
* revert docker build workflow file
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2022-09-29 22:39:04 -04:00
Atharva Phatak
fdcb5cc90b
Hydra changes to lightning-lite ( #14950 )
...
Co-authored-by: awaelchli <aedu.waelchli@gmail.com>
2022-09-29 21:59:35 -04:00
Jirka Borovec
f9ef19f108
Run CI helpers' doctests in a workflow ( #14498 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
2022-09-30 01:56:56 +02:00
Kishan Savant
1e5411b143
Removed the deprecated datamodule_checkpointhooks ( #14909 )
...
Co-authored-by: awaelchli <aedu.waelchli@gmail.com>
Co-authored-by: otaj <6065855+otaj@users.noreply.github.com>
2022-09-29 22:31:58 +00:00
Aliaksandr Kuzmik
4c43e57b6f
Comet.ml logger - add usage tracking ( #14906 )
...
Co-authored-by: Aliaksandr.Kuzmik <AliaksandrK@comet.ml>
2022-09-29 21:10:54 +00:00
Adrian Wälchli
c8059d4464
Update quick start guide with latest info ( #14880 )
...
Co-authored-by: thomas chaton <thomas@grid.ai>
2022-09-29 20:54:20 +00:00
Suyash Sonawane
72ac4b592f
Fixed docstring for unwatch method ( #14920 )
2022-09-29 19:20:42 +00:00
Tianshu Wang
485ab5e0de
Fix wandb `save_dir` is not overridden by `None` `dir` when using CLI ( #14878 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2022-09-29 19:20:07 +00:00
Prince Canuma
04aaf83901
Fix MissingFieldException in offline mode ( #14919 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: otaj <6065855+otaj@users.noreply.github.com>
2022-09-29 18:47:51 +00:00
Adrian Wälchli
498cb60417
Fairscale integration tests for Lite ( #14921 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-09-29 17:46:49 +00:00
Adrian Wälchli
822a7f50af
Align ddp and ddp-spawn strategies in setting up the environment ( #11073 )
...
Co-authored-by: Kushashwa Ravi Shrimali <kushashwaravishrimali@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-09-29 19:30:09 +02:00
Rohit Gupta
3a70e5dbcb
Call `LightningDataModule.load_state_dict` hook while restoring checkpoint using `LightningDataModule.load_from_checkpoint` ( #14883 )
2022-09-29 16:55:59 +00:00
Ethan Harris
93e802afc2
Simplify bug report template ( #14925 )
...
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: edenlightning <66261195+edenlightning@users.noreply.github.com>
2022-09-29 16:49:45 +00:00
Adrian Wälchli
d8e90f6581
Fairscale import updates ( #14721 )
...
* fairscale imports
* refactor to avoid meta package build issue
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
Co-authored-by: thomas chaton <thomas@grid.ai>
2022-09-29 16:45:27 +00:00
Adrian Wälchli
5b446aec4d
DeepSpeed integration tests for Lite ( #14901 )
2022-09-29 16:39:32 +00:00
Kaushik B
0abdd80104
Prepare v1.8.0rc0 ( #14918 )
2022-09-29 18:00:25 +02:00
Carlos Mocholí
6e70f55f00
Clean up CODEOWNERS for PL and Lite ( #14942 )
...
* Clean up CODEOWNERS for PL and Lite
* Update
2022-09-29 10:17:05 -04:00
Carlos Mocholí
b8cc4525bd
Skip CircleCI trigger for forks ( #14930 )
2022-09-29 10:16:37 -04:00
Carlos Mocholí
7893eb259a
Prepare CI to run on 3090s ( #14910 )
2022-09-29 14:01:59 +00:00
Carlos Mocholí
4c53eae0f4
Self-review of the recent Trainer changes ( #14916 )
2022-09-29 13:59:16 +00:00
Carlos Mocholí
4eb7766f3c
Make internal torchscript check a class attribute ( #14904 )
2022-09-29 13:40:25 +00:00
otaj
5f0c4aad12
Introduce `ckpt_path="hpc"` keyword for checkpoint loading ( #14911 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-09-29 12:45:51 +00:00
Adrian Wälchli
ff3c5b7b9d
Docs section for SLURM troubleshooting ( #14873 )
...
Co-authored-by: Laverne Henderson <laverne.henderson@coupa.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2022-09-29 12:41:31 +00:00
Adrian Wälchli
a45c047b38
Remove deprecated LightningIPUModule ( #14830 )
...
* Remove deprecated LightningIPUModule
* chlog
* fix import
* Fix 1.10 depr test
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-09-29 13:07:45 +01:00
Masahiro Wada
d377d0efde
Fix type hints of tuner/batch_size_scaling.py ( #13518 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: otaj <ota@lightning.ai>
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
2022-09-29 12:00:42 +00:00
Jerome Anand
136d57312d
Upgrade HPU image to release 1.6.1 ( #14932 )
2022-09-29 11:22:27 +00:00