Commit Graph

2959 Commits

Author SHA1 Message Date
Kaushik B 7b6d0a842c
Fix progress bar updates for Pod Training (#8258)
* Fix progress bar updates for Pod Training

* Fix progress bar updates for Pod Training

* Add _pod_progress_bar_force_stdout
2021-07-05 10:38:38 +01:00
Adrian Wälchli ea5cfd2005
move batch to device before sending it to hooks (#7378)
* update train step

* test

* x

* limits

* val

* typo

* x

* x

* step

* min gpus

* run all loops

* x

* limit test

* profiler

* clean up accelerator code

* move files

* rename

* move tests

* changelog

* reorder callbacks and model hooks

* add test description

* replace unnecessary method

* fix chlog

* adjust batch_to_device for DP Plugin

* update tests for dataloader idx

* unused imports

* hook change

* switch None

* clear memory

* change to None

* None

* None

* memory savings

* remove redundant todo

* hack

* cheat

* Revert "cheat"

This reverts commit a8433bd0b4.

* Revert "hack"

This reverts commit 43a6d1edeb.

* update new epoch loop

* remove from old loop code

* update chlog

* update hook test

* changelog

* teardown

* integrate changes in new eval loop

* fix hook calls

* add prediction step

* bad merge

* Revert "bad merge"

This reverts commit 488080863c.

* fix train batch hook test

* rm -rf _notebooks

* update chlog

* release memory

* fix type

* notebooks mess

* debug

* Revert "debug"

This reverts commit eec4ee2f77.

* teardown

* fix teardown bug

* debug

* x

* debug

* Revert "debug"

This reverts commit a6e6101946.

Revert "debug"

This reverts commit 5ddeaec069.

debug


debug


Revert "debug"

This reverts commit 605be746f7daedf265b2c05a1c153ce543394435.

Revert "Revert "debug""

This reverts commit a7612d5410409ed886cfb609457349ecf44cbfa8.

debug


x


x


x


s


tol


x


tol

* Fix changelog

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-07-05 09:31:39 +01:00
Yuta Hayashibe 8193bae6bd
Add periods to the documentation (#8252) 2021-07-02 16:48:55 +02:00
Carlos Mocholí 0e19d16ca6
Move result teardown to loops (#8245)
* Move result teardown to loops

* Update CHANGELOG

* Remove teardown from run

* Move previous teardown to on_run_end

* Add comment

* Merge 8250

* Remove stage set to None where it shouldn't
2021-07-02 14:36:14 +01:00
thomas chaton f3e74abad0
[feat] Add restore to base loop (#8247)
* add loop restart

* update
2021-07-02 13:40:31 +01:00
Carlos Mocholí 8a7f504b6f
Detach hiddens and add test (#8249) 2021-07-02 14:03:12 +02:00
Sean Naren 07b1ce227c
[IPU] Fix Custom Poptorch options to IPUPlugin (#8241)
* Fixes to ensure ipu options are respected

* Better setter

* Add test for poptorch Options

* Fix test

* fix ipu test

* Update pytorch_lightning/plugins/training_type/ipu.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-07-02 11:23:57 +00:00
Adrian Wälchli e7139ab9f7
Support `DDPPlugin` to be used on CPU (#6208)
* Skip test due to 'Python bus error'

* Debug NCCL

* Remove NCCL_DEBUG statement

* Revert "Skip test due to 'Python bus error'"

This reverts commit e0a3e8785d.

* fix

* add test

* changelog

* yapf

* patch os environ

* make a special test

* destroy pg

* debug

* revert

* revert

* problematic test

* skip

* try the fixture

* test

* update sensitive test

* update changelog

* remove comment

* update wrong test

* update test name

* parameterization

* Revert "parameterization"

This reverts commit b0542f43f59c5ce66800883b5e2f0c66a97408cc.

* remove conftest

* ignore test

* teardown

* fix merge

* deep speed parameterization

* uncomment test

* update chlog

* update changelog

* split tests

* update test


update test


update test


update test

* update test comments

* unroll test

* unroll test

* unroll test

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* increase shm

* sudo

* unroll ipu

* Revert "sudo"

This reverts commit 6cc68c1478.

* Revert "increase shm"

This reverts commit 8c27163483.

* x

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* find guilty test

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* POPTORCH_WAIT_FOR_IPU=1

* move test

* redo parameterize for ipu

* de-comment test

* move chlog

* Update tests/accelerators/test_accelerator_connector.py

Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>

* Update tests/accelerators/test_accelerator_connector.py

Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>

Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-07-02 12:00:24 +01:00
Carlos Mocholí a2e41045d2
Mark some loop attributes as protected (#8250) 2021-07-02 11:51:51 +01:00
deepsource-autofix[bot] 7e2f84e050
Remove methods with unnecessary super delegation. (#8148)
* Remove methods with unnecessary super delegation.

* Update fully_sharded.py

* replace init in test

Co-authored-by: deepsource-autofix[bot] <62050782+deepsource-autofix[bot]@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Ethan Harris <ethanwharris@gmail.com>
2021-07-02 08:00:55 +00:00
Guillaume Tauzin baa7de2d9e
Fix truncated_bptt_steps hiddens detach() and improve docs (#8145)
* Fix truncated_bptt_steps hiddens detach()
* Improve truncated_bptt_docs
* Add missing import
* Improve documentation wordings
* pep8
* detach typo
* Update test
* Implement comments
* parametrize test
* Apply suggestions from code review

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

Signed-off-by: Guillaume Tauzin <guillaumetauzin.ut@gmail.com>

* Remove import

Signed-off-by: Guillaume Tauzin <guillaumetauzin.ut@gmail.com>

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-07-01 22:16:14 +01:00
ananthsub 8b0aec8565
Deprecate `LightningModule.loaded_optimizer_states_dict` (#8229) 2021-07-01 23:02:29 +02:00
thomas chaton d51b0ae7fc
Add `state_dict` to loops (#8197)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-07-01 15:54:37 +00:00
Palermo 36b893c43e
Add `ModelSummary.max_depth` (#8062)
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-07-01 12:08:16 +02:00
Mauricio Villegas 3c74502919
Add support for optimizers and learning rate schedulers to LightningCLI (#8093)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-07-01 12:04:11 +02:00
karthikrangasai 1afc1ca7ef
Logging Non-matching keys when loading from checkpoint in non-strict … (#8152)
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-06-30 18:33:13 +00:00
thomas chaton acb6f26006
[Refactor] Remove should_raise_exception (#8202)
Co-authored-by: Ethan Harris <ethanwharris@gmail.com>
2021-06-30 17:02:10 +00:00
deepsource-autofix[bot] c0782ffd1f
Remove unnecessary generator (#8154)
Co-authored-by: deepsource-autofix[bot] <62050782+deepsource-autofix[bot]@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-06-30 11:40:13 +00:00
Carlos Mocholí 74eb6cc7e9
Clean `cuda.empty_cache` usage (#8199) 2021-06-30 13:04:24 +02:00
Ethan Harris 57dce7244c
Fix double precision casting complex buffers (#8208)
* Fix double precision casting complex buffers

* Update CHANGELOG.md

* Fixes

* Fixes

* Fix

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-06-30 10:57:42 +01:00
Carlos Mocholí 2e537b75e3
Deprecate `DDPPlugin.task_idx` (#8203) 2021-06-30 01:02:55 +02:00
Carlos Mocholí df601405d9
Use full `torch.distributed` import (#8200) 2021-06-29 22:44:10 +00:00
Carlos Mocholí 47c76548aa
Sync our torchmetrics wrappers after the 0.4 release (#8205)
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
2021-06-29 22:05:48 +00:00
Kaushik B 9444a08d56
Fix Deprecation warning in DDPSpawn (#8193) 2021-06-29 09:29:51 -07:00
thomas chaton bae08514d1
[refactor] Add should_raise_exception for gpus / tpus utilities (#8194)
* add should_raise

* update changelog

* Update pytorch_lightning/utilities/device_parser.py

Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>

* add to tpu_cores parser

* add should_raise description

* update on comments

* update changelog

Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-06-29 10:00:06 -04:00
Carlos Mocholí 571a810a7c
Improvements and changes to progress tracking dataclasses (#8140)
* Improvements to progress dataclasses

* Update CHANGELOG

* Rename function

* Undo CODEOWNERS update
2021-06-29 13:47:41 +01:00
Kaushik B 2a7fad92b9
Avoid passing unnecessary params from TPUSpawn to DDPSpawn (#8192) 2021-06-29 14:30:54 +02:00
Adrian Wälchli 6db0fe3659
training loop refactor - move val loop (#8120)
* EvaluationDataLoaderLoop -> EvaluationLoop

* proposed rename files

* imports

* bad merge

* update init files

* glue imports together

* rename fit_loop.validation_loop to fit_loop.val_loop

* move loop

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Group imports

* Resolve circular import

* Comment

* fix test

* try to resolve circ import

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-06-29 09:06:44 +00:00
Adrian Wälchli ae34df00cc
remove deadcode in trainer (#8121) 2021-06-29 09:11:24 +01:00
Justus Schock b12a0d0a0a
Make Plugins Proxies after transferring ownership (#8117)
* Update accelerator_connector.py

* Update accelerator_connector.py

* Update accelerator_connector.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update accelerator_connector.py

* Update accelerator_connector.py

* Apply suggestions from code review

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-06-28 22:21:48 +01:00
Justus Schock d6435a5b73
Bugfix/swa iterable dset (#8172)
* add test

* add fix

* Update CHANGELOG.md
2021-06-28 21:18:25 +00:00
Ethan Harris b1d8840fd8
Fix metric attribute lookup (#8181)
* Fix metric attribute lookup

* Update CHANGELOG.md

* Split tests
2021-06-28 20:17:43 +00:00
Adrian Wälchli bf54ac1cad
fix NCCL error with non-consecutive trainer gpus (#8165)
* device ids in barrier


x


x


s


same fix for spawn


fix non-nccl


x

* add changelog

* get nccl backend

* get backend

Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-06-28 22:08:10 +02:00
Kaushik B 2f3c65e57b
XLA Profiler integration (#8014) 2021-06-29 00:58:05 +05:30
thomas chaton c521624a92
[bugfix] Add mechanism to prevent deadlock for DDP on Exception Trigger (#8167)
* add mechanism to prevent deadlock

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* resolve flake8 + update changelog

* update on comments

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update

* remove space

* resolve bugs

* overwrite config

* update on comments

* update on comments

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update

* update

* update test with comments

* Update pytorch_lightning/plugins/training_type/parallel.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* update on comments

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-06-28 19:26:03 +00:00
thomas chaton 1f025789fc
[bugfix] Clean Validation Sanity Checking metrics (#8171)
* resolve logging issue

* update changelog

* remove breakpoint

* resolve bugs

* remove pass
2021-06-28 13:49:56 -04:00
thomas chaton c4492ad6aa
Merge pull request #8174 from PyTorchLightning/bugfix/8159_log_gpu_memory_on_step
[bugfix] Resolve memory not logged when missing metrics
2021-06-28 09:39:17 -04:00
Ethan Harris 2a372e3682
Fix module dict in base finetuning (#8170)
* Fix module dict in base finetuning

* Update CHANGELOG.md
2021-06-28 10:55:32 +00:00
Adrian Wälchli b978d2a1f2
remove message (#8163) 2021-06-28 09:57:52 +00:00
deepsource-autofix[bot] 03154eb30a
Refactor unnecessary `else` / `elif` when `if` block has a `return` statement (#8156)
Co-authored-by: deepsource-autofix[bot] <62050782+deepsource-autofix[bot]@users.noreply.github.com>
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
2021-06-28 15:27:41 +05:30
deepsource-autofix[bot] c3065c5ce9
Iterate dictionary directly (#8155)
Co-authored-by: deepsource-autofix[bot] <62050782+deepsource-autofix[bot]@users.noreply.github.com>
2021-06-27 21:55:16 +02:00
Adrian Wälchli 51ea84222b
resurface lost ddp info message (#8111) 2021-06-27 21:51:15 +02:00
deepsource-autofix[bot] e11fe19673
Remove unnecessary use of comprehension (#8149)
Co-authored-by: deepsource-autofix[bot] <62050782+deepsource-autofix[bot]@users.noreply.github.com>
2021-06-27 10:00:02 +01:00
thomas chaton 24db914093
Support state restoration of logged results 2/2 (#7966)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-06-25 19:16:11 +00:00
DJ ad95710812
document exceptions in utilities (#8122)
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-06-25 13:41:45 +00:00
Adrian Wälchli 55a90af7fc
`pytorch_lightning.loops` file structure: group by dataloader, epoch, and batch loop (#8077) 2021-06-24 23:40:46 +02:00
Carlos Mocholí 4d9b72b8a9
Nuke RPC (#8101) 2021-06-23 18:31:13 +00:00
Sean Naren 8bd7b1bdd7
Add torchelastic check when sanitizing GPUs (#8095)
* Add torchelastic check

* Add changelog

* Address review

* fix
2021-06-23 14:09:53 +02:00
Adrian Wälchli 4dc08e4035
Loop Refactor 6/N - Remove Old Predict Loop (#8094) 2021-06-23 14:05:06 +02:00
Adrian Wälchli fe48203111
restrict public interface of training loop (#8024)
* active optimizers

* check checkpoint callback

* epoch loop properties

* epoch loop methods

* training_batch_loop

* changelog

* update chlog

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* unused imports

* yapf

* backward

* fix missing string reference

* is_last_batch remains public

* remove dead code

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
2021-06-23 10:25:29 +00:00