5378 Commits

Author SHA1 Message Date
Carlos Mocholí 8a7f504b6f
Detach hiddens and add test (#8249) 2021-07-02 14:03:12 +02:00
Sean Naren 07b1ce227c
[IPU] Fix Custom Poptorch options to IPUPlugin (#8241)
* Fixes to ensure ipu options are respected

* Better setter

* Add test for poptorch Options

* Fix test

* fix ipu test

* Update pytorch_lightning/plugins/training_type/ipu.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-07-02 11:23:57 +00:00
Adrian Wälchli e7139ab9f7
Support `DDPPlugin` to be used on CPU (#6208)
* Skip test due to 'Python bus error'

* Debug NCCL

* Remove NCCL_DEBUG statement

* Revert "Skip test due to 'Python bus error'"

This reverts commit e0a3e8785d.

* fix

* add test

* changelog

* yapf

* patch os environ

* make a special test

* destroy pg

* debug

* revert

* revert

* problematic test

* skip

* try the fixture

* test

* update sensitive test

* update changelog

* remove comment

* update wrong test

* update test name

* parameterization

* Revert "parameterization"

This reverts commit b0542f43f59c5ce66800883b5e2f0c66a97408cc.

* remove conftest

* ignore test

* teardown

* fix merge

* deep speed parameterization

* uncomment test

* update chlog

* update changelog

* split tests

* update test

* update test

* update test

* update test

* update test comments

* unroll test

* unroll test

* unroll test

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* increase shm

* sudo

* unroll ipu

* Revert "sudo"

This reverts commit 6cc68c1478.

* Revert "increase shm"

This reverts commit 8c27163483.

* x

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* find guilty test

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* POPTORCH_WAIT_FOR_IPU=1

* move test

* redo parameterize for ipu

* de-comment test

* move chlog

* Update tests/accelerators/test_accelerator_connector.py

Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>

* Update tests/accelerators/test_accelerator_connector.py

Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>

Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-07-02 12:00:24 +01:00
Carlos Mocholí a2e41045d2
Mark some loop attributes as protected (#8250) 2021-07-02 11:51:51 +01:00
deepsource-autofix[bot] 7e2f84e050
Remove methods with unnecessary super delegation. (#8148)
* Remove methods with unnecessary super delegation.

* Update fully_sharded.py

* replace init in test

Co-authored-by: deepsource-autofix[bot] <62050782+deepsource-autofix[bot]@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Ethan Harris <ethanwharris@gmail.com>
2021-07-02 08:00:55 +00:00
Kaushik B 365a9bae33
Update Torch Elastic documentation (#8248)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-07-02 03:47:58 +05:30
Adrian Wälchli af52de1198
update changelog after 1.3.8 patch release (#8239)
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
2021-07-01 21:49:06 +00:00
Guillaume Tauzin baa7de2d9e
Fix truncated_bptt_steps hiddens detach() and improve docs (#8145)
* Fix truncated_bptt_steps hiddens detach()
* Improve truncated_bptt_docs
* Add missing import
* Improve documentation wordings
* pep8
* detach typo
* Update test
* Implement comments
* parametrize test
* Apply suggestions from code review

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

Signed-off-by: Guillaume Tauzin <guillaumetauzin.ut@gmail.com>

* Remove import

Signed-off-by: Guillaume Tauzin <guillaumetauzin.ut@gmail.com>

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-07-01 22:16:14 +01:00
ananthsub 8b0aec8565
Deprecate `LightningModule.loaded_optimizer_states_dict` (#8229) 2021-07-01 23:02:29 +02:00
thomas chaton d51b0ae7fc
Add `state_dict` to loops (#8197)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-07-01 15:54:37 +00:00
Ethan Harris c0caeb3ea9
Update docs for new template (#8232)
* Update docs for new template

* Fixes

* Fixes

* Drop links
2021-07-01 16:19:09 +01:00
Carlos Mocholí 3e6f884a89
Avoid Pillow 8.3.0 due to errors with numpy (#8234)
* Avoid Pillow 8.3.0

* Move it to last
2021-07-01 13:16:38 +00:00
Palermo 36b893c43e
Add `ModelSummary.max_depth` (#8062)
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-07-01 12:08:16 +02:00
Mauricio Villegas 3c74502919
Add support for optimizers and learning rate schedulers to LightningCLI (#8093)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-07-01 12:04:11 +02:00
karthikrangasai 1afc1ca7ef
Logging Non-matching keys when loading from checkpoint in non-strict … (#8152)
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-06-30 18:33:13 +00:00
thomas chaton acb6f26006
[Refactor] Remove should_raise_exception (#8202)
Co-authored-by: Ethan Harris <ethanwharris@gmail.com>
2021-06-30 17:02:10 +00:00
deepsource-autofix[bot] c0782ffd1f
Remove unnecessary generator (#8154)
Co-authored-by: deepsource-autofix[bot] <62050782+deepsource-autofix[bot]@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-06-30 11:40:13 +00:00
Carlos Mocholí 74eb6cc7e9
Clean `cuda.empty_cache` usage (#8199) 2021-06-30 13:04:24 +02:00
Ethan Harris 57dce7244c
Fix double precision casting complex buffers (#8208)
* Fix double precision casting complex buffers

* Update CHANGELOG.md

* Fixes

* Fixes

* Fix

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-06-30 10:57:42 +01:00
Adrian Wälchli d2203a8f18
update bug report issue template - include PL version (#8209)
* update github template

* Update .github/ISSUE_TEMPLATE/bug_report.md

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Update .github/ISSUE_TEMPLATE/bug_report.md

Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
2021-06-30 10:47:44 +02:00
SATISH J 4af8eff0a1
fix: training_step_end doesn't work as stated in docs (#8188) 2021-06-30 00:24:06 +00:00
Carlos Mocholí 2e537b75e3
Deprecate `DDPPlugin.task_idx` (#8203) 2021-06-30 01:02:55 +02:00
Carlos Mocholí 87b1b86e2f
Add missing logging tests (#8195) 2021-06-29 22:52:50 +00:00
Carlos Mocholí df601405d9
Use full `torch.distributed` import (#8200) 2021-06-29 22:44:10 +00:00
Carlos Mocholí 47c76548aa
Sync our torchmetrics wrappers after the 0.4 release (#8205)
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
2021-06-29 22:05:48 +00:00
Kaushik B 9444a08d56
Fix Deprecation warning in DDPSpawn (#8193) 2021-06-29 09:29:51 -07:00
Carlos Mocholí 9aaa6822ec
Add CODEOWNERS for progress dataclasses (#8196) 2021-06-29 10:01:51 -04:00
thomas chaton bae08514d1
[refactor] Add should_raise_exception for gpus / tpus utilities (#8194)
* add should_raise

* update changelog

* Update pytorch_lightning/utilities/device_parser.py

Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>

* add to tpu_cores parser

* add should_raise description

* update on comments

* update changelog

Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-06-29 10:00:06 -04:00
Jirka Borovec df6885cd37
add how to contribute (#8129)
* add how to contribute

* docs

* Apply suggestions from code review

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Apply suggestions from code review

Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: Roger Shieh <sh.rog@protonmail.ch>

* Update README.md

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: Roger Shieh <sh.rog@protonmail.ch>
2021-06-29 13:25:10 +00:00
Carlos Mocholí 571a810a7c
Improvements and changes to progress tracking dataclasses (#8140)
* Improvements to progress dataclasses

* Update CHANGELOG

* Rename function

* Undo CODEOWNERS update
2021-06-29 13:47:41 +01:00
Kaushik B 2a7fad92b9
Avoid passing unnecessary params from TPUSpawn to DDPSpawn (#8192) 2021-06-29 14:30:54 +02:00
Kaushik B f60aae9815
Update `dataloaders` params in example (#8191) 2021-06-29 14:23:48 +02:00
Adrian Wälchli 6db0fe3659
training loop refactor - move val loop (#8120)
* EvaluationDataLoaderLoop -> EvaluationLoop

* proposed rename files

* imports

* bad merge

* update init files

* glue imports together

* rename fit_loop.validation_loop to fit_loop.val_loop

* move loop

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Group imports

* Resolve circular import

* Comment

* fix test

* try to resolve circ import

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-06-29 09:06:44 +00:00
Adrian Wälchli ae34df00cc
remove deadcode in trainer (#8121) 2021-06-29 09:11:24 +01:00
Justus Schock b12a0d0a0a
Make Plugins Proxies after transferring ownership (#8117)
* Update accelerator_connector.py

* Update accelerator_connector.py

* Update accelerator_connector.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update accelerator_connector.py

* Update accelerator_connector.py

* Apply suggestions from code review

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-06-28 22:21:48 +01:00
Justus Schock d6435a5b73
Bugfix/swa iterable dset (#8172)
* add test

* add fix

* Update CHANGELOG.md
2021-06-28 21:18:25 +00:00
Ethan Harris b1d8840fd8
Fix metric attribute lookup (#8181)
* Fix metric attribute lookup

* Update CHANGELOG.md

* Split tests
2021-06-28 20:17:43 +00:00
Adrian Wälchli bf54ac1cad
fix NCCL error with non-consecutive trainer gpus (#8165)
* device ids in barrier

* x

* x

* s

* same fix for spawn

* fix non-nccl

* x

* add changelog

* get nccl backend

* get backend

Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-06-28 22:08:10 +02:00
Kaushik B 2f3c65e57b
XLA Profiler integration (#8014) 2021-06-29 00:58:05 +05:30
thomas chaton c521624a92
[bugfix] Add mechanism to prevent deadlock for DDP on Exception Trigger (#8167)
* add mechanism to prevent deadlock

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* resolve flake8 + update changelog

* update on comments

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update

* remove space

* resolve bugs

* overwrite config

* update on comments

* update on comments

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update

* update

* update test with comments

* Update pytorch_lightning/plugins/training_type/parallel.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* update on comments

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-06-28 19:26:03 +00:00
thomas chaton 1f025789fc
[bugfix] Clean Validation Sanity Checking metrics (#8171)
* resolve logging issue

* update changelog

* remove breakpoint

* resolve bugs

* remove pass
2021-06-28 13:49:56 -04:00
thomas chaton c4492ad6aa
Merge pull request #8174 from PyTorchLightning/bugfix/8159_log_gpu_memory_on_step
[bugfix] Resolve memory not logged when missing metrics
2021-06-28 09:39:17 -04:00
Ethan Harris 2a372e3682
Fix module dict in base finetuning (#8170)
* Fix module dict in base finetuning

* Update CHANGELOG.md
2021-06-28 10:55:32 +00:00
Adrian Wälchli b978d2a1f2
remove message (#8163) 2021-06-28 09:57:52 +00:00
deepsource-autofix[bot] 03154eb30a
Refactor unnecessary `else` / `elif` when `if` block has a `return` statement (#8156)
Co-authored-by: deepsource-autofix[bot] <62050782+deepsource-autofix[bot]@users.noreply.github.com>
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
2021-06-28 15:27:41 +05:30
deepsource-autofix[bot] 67f7e1318f
Fix dangerous default argument (#8164)
Co-authored-by: deepsource-autofix[bot] <62050782+deepsource-autofix[bot]@users.noreply.github.com>
2021-06-28 09:52:37 +00:00
deepsource-autofix[bot] 9bd3747c71
Remove unnecessary use of comprehension (#8147)
Co-authored-by: deepsource-autofix[bot] <62050782+deepsource-autofix[bot]@users.noreply.github.com>
2021-06-28 11:38:50 +02:00
deepsource-autofix[bot] c3065c5ce9
Iterate dictionary directly (#8155)
Co-authored-by: deepsource-autofix[bot] <62050782+deepsource-autofix[bot]@users.noreply.github.com>
2021-06-27 21:55:16 +02:00
Adrian Wälchli 51ea84222b
resurface lost ddp info message (#8111) 2021-06-27 21:51:15 +02:00
Jirka Borovec 28afc7a10d
ignore tests in DeepSource analyses (#8151)
* ignore tests

* .
2021-06-27 11:08:20 +00:00