Kaushik B
7b6d0a842c
Fix progress bar updates for Pod Training ( #8258 )
* Fix progress bar updates for Pod Training
* Fix progress bar updates for Pod Training
* Add _pod_progress_bar_force_stdout
2021-07-05 10:38:38 +01:00
Adrian Wälchli
ea5cfd2005
move batch to device before sending it to hooks ( #7378 )
* update train step
* test
* x
* limits
* val
* typo
* x
* x
* step
* min gpus
* run all loops
* x
* limit test
* profiler
* clean up accelerator code
* move files
* rename
* move tests
* changelog
* reorder callbacks and model hooks
* add test description
* replace unnecessary method
* fix chlog
* adjust batch_to_device for DP Plugin
* update tests for dataloader idx
* unused imports
* hook change
* switch None
* clear memory
* change to None
* None
* None
* memory savings
* remove redundant todo
* hack
* cheat
* Revert "cheat"
This reverts commit a8433bd0b4.
* Revert "hack"
This reverts commit 43a6d1edeb.
* update new epoch loop
* remove from old loop code
* update chlog
* update hook test
* changelog
* teardown
* integrate changes in new eval loop
* fix hook calls
* add prediction step
* bad merge
* Revert "bad merge"
This reverts commit 488080863c.
* fix train batch hook test
* rm -rf _notebooks
* update chlog
* release memory
* fix type
* notebooks mess
* debug
* Revert "debug"
This reverts commit eec4ee2f77.
* teardown
* fix teardown bug
* debug
* x
* debug
* Revert "debug"
This reverts commit a6e6101946.
Revert "debug"
This reverts commit 5ddeaec069.
debug
debug
Revert "debug"
This reverts commit 605be746f7daedf265b2c05a1c153ce543394435.
Revert "Revert "debug""
This reverts commit a7612d5410409ed886cfb609457349ecf44cbfa8.
debug
x
x
x
s
tol
x
tol
* Fix changelog
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-07-05 09:31:39 +01:00
Yuta Hayashibe
8193bae6bd
Add periods to the documentation ( #8252 )
2021-07-02 16:48:55 +02:00
Carlos Mocholí
0e19d16ca6
Move result teardown to loops ( #8245 )
* Move result teardown to loops
* Update CHANGELOG
* Remove teardown from run
* Move previous teardown to on_run_end
* Add comment
* Merge 8250
* Remove stage set to None where it shouldn't
2021-07-02 14:36:14 +01:00
thomas chaton
f3e74abad0
[feat] Add restore to base loop ( #8247 )
* add loop restart
* update
2021-07-02 13:40:31 +01:00
Carlos Mocholí
8a7f504b6f
Detach hiddens and add test ( #8249 )
2021-07-02 14:03:12 +02:00
Sean Naren
07b1ce227c
[IPU] Fix Custom Poptorch options to IPUPlugin ( #8241 )
* Fixes to ensure ipu options are respected
* Better setter
* Add test for poptorch Options
* Fix test
* fix ipu test
* Update pytorch_lightning/plugins/training_type/ipu.py
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-07-02 11:23:57 +00:00
Adrian Wälchli
e7139ab9f7
Support `DDPPlugin` to be used on CPU ( #6208 )
* Skip test due to 'Python bus error'
* Debug NCCL
* Remove NCCL_DEBUG statement
* Revert "Skip test due to 'Python bus error'"
This reverts commit e0a3e8785d.
* fix
* add test
* changelog
* yapf
* patch os environ
* make a special test
* destroy pg
* debug
* revert
* revert
* problematic test
* skip
* try the fixture
* test
* update sensitive test
* update changelog
* remove comment
* update wrong test
* update test name
* parameterization
* Revert "parameterization"
This reverts commit b0542f43f59c5ce66800883b5e2f0c66a97408cc.
* remove conftest
* ignore test
* teardown
* fix merge
* deep speed parameterization
* uncomment test
* update chlog
* update changelog
* split tests
* update test
update test
update test
update test
* update test comments
* unroll test
* unroll test
* unroll test
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* increase shm
* sudo
* unroll ipu
* Revert "sudo"
This reverts commit 6cc68c1478.
* Revert "increase shm"
This reverts commit 8c27163483.
* x
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* find guilty test
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* POPTORCH_WAIT_FOR_IPU=1
* move test
* redo parameterize for ipu
* de-comment test
* move chlog
* Update tests/accelerators/test_accelerator_connector.py
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
* Update tests/accelerators/test_accelerator_connector.py
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-07-02 12:00:24 +01:00
Carlos Mocholí
a2e41045d2
Mark some loop attributes as protected ( #8250 )
2021-07-02 11:51:51 +01:00
deepsource-autofix[bot]
7e2f84e050
Remove methods with unnecessary super delegation. ( #8148 )
* Remove methods with unnecessary super delegation.
* Update fully_sharded.py
* replace init in test
Co-authored-by: deepsource-autofix[bot] <62050782+deepsource-autofix[bot]@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Ethan Harris <ethanwharris@gmail.com>
2021-07-02 08:00:55 +00:00
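This log does not include the commit's diff. The sketch below is a generic illustration of the pattern named in the title. The class and method names are hypothetical and are not taken from the Lightning code base. The idea is that an override which only forwards to `super()` with the same arguments can be deleted without changing behavior:

```python
class Base:
    def __init__(self, name: str) -> None:
        self.name = name

    def describe(self) -> str:
        return f"plugin {self.name}"


# Before (hypothetical): both overrides only forward to the parent with identical arguments.
class ChildBefore(Base):
    def __init__(self, name: str) -> None:
        super().__init__(name)

    def describe(self) -> str:
        return super().describe()


# After: the redundant overrides are removed, so the inherited methods are used directly.
class ChildAfter(Base):
    pass


before = ChildBefore("ddp")
after = ChildAfter("ddp")
print(before.describe() == after.describe())  # True
```

Removing an override is only safe when it is pure delegation. An override that changes a default argument, reorders parameters or adds logic is not redundant and must stay.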
Guillaume Tauzin
baa7de2d9e
Fix truncated_bptt_steps hiddens detach() and improve docs ( #8145 )
* Fix truncated_bptt_steps hiddens detach()
* Improve truncated_bptt_docs
* Add missing import
* Improve documentation wordings
* pep8
* detach typo
* Update test
* Implement comments
* parametrize test
* Apply suggestions from code review
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Signed-off-by: Guillaume Tauzin <guillaumetauzin.ut@gmail.com>
* Remove import
Signed-off-by: Guillaume Tauzin <guillaumetauzin.ut@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-07-01 22:16:14 +01:00
ananthsub
8b0aec8565
Deprecate `LightningModule.loaded_optimizer_states_dict` ( #8229 )
2021-07-01 23:02:29 +02:00
thomas chaton
d51b0ae7fc
Add `state_dict` to loops ( #8197 )
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-07-01 15:54:37 +00:00
Palermo
36b893c43e
Add `ModelSummary.max_depth` ( #8062 )
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-07-01 12:08:16 +02:00
Mauricio Villegas
3c74502919
Add support for optimizers and learning rate schedulers to LightningCLI ( #8093 )
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-07-01 12:04:11 +02:00
karthikrangasai
1afc1ca7ef
Logging Non-matching keys when loading from checkpoint in non-strict … ( #8152 )
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-06-30 18:33:13 +00:00
thomas chaton
acb6f26006
[Refactor] Remove should_raise_exception ( #8202 )
Co-authored-by: Ethan Harris <ethanwharris@gmail.com>
2021-06-30 17:02:10 +00:00
deepsource-autofix[bot]
c0782ffd1f
Remove unnecessary generator ( #8154 )
Co-authored-by: deepsource-autofix[bot] <62050782+deepsource-autofix[bot]@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-06-30 11:40:13 +00:00
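The commit's diff is not shown here. One common form of an "unnecessary generator" check flags a generator expression passed straight into `list()`, `set()` or `dict()`. The sketch below assumes that is the kind of rewrite this commit applied; the variable names are illustrative only.

```python
values = [3, 1, 2, 3]

# Before: generator expressions wrapped in a constructor call.
squares_before = list(v * v for v in values)
unique_before = set(v for v in values if v > 1)
index_before = dict((v, i) for i, v in enumerate(values))

# After: equivalent comprehensions, which skip building a generator
# object only to pass it to a constructor.
squares_after = [v * v for v in values]
unique_after = {v for v in values if v > 1}
index_after = {v: i for i, v in enumerate(values)}

print(squares_after, unique_after, index_after)
```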
Carlos Mocholí
74eb6cc7e9
Clean `cuda.empty_cache` usage ( #8199 )
2021-06-30 13:04:24 +02:00
Ethan Harris
57dce7244c
Fix double precision casting complex buffers ( #8208 )
* Fix double precision casting complex buffers
* Update CHANGELOG.md
* Fixes
* Fixes
* Fix
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-06-30 10:57:42 +01:00
Carlos Mocholí
2e537b75e3
Deprecate `DDPPlugin.task_idx` ( #8203 )
2021-06-30 01:02:55 +02:00
Carlos Mocholí
df601405d9
Use full `torch.distributed` import ( #8200 )
2021-06-29 22:44:10 +00:00
Carlos Mocholí
47c76548aa
Sync our torchmetrics wrappers after the 0.4 release ( #8205 )
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
2021-06-29 22:05:48 +00:00
Kaushik B
9444a08d56
Fix Deprecation warning in DDPSpawn ( #8193 )
2021-06-29 09:29:51 -07:00
thomas chaton
bae08514d1
[refactor] Add should_raise_exception for gpus / tpus utilities ( #8194 )
* add should_raise
* update changelog
* Update pytorch_lightning/utilities/device_parser.py
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
* add to tpu_cores parser
* add should_raise description
* update on comments
* update changelog
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-06-29 10:00:06 -04:00
Carlos Mocholí
571a810a7c
Improvements and changes to progress tracking dataclasses ( #8140 )
* Improvements to progress dataclasses
* Update CHANGELOG
* Rename function
* Undo CODEOWNERS update
2021-06-29 13:47:41 +01:00
Kaushik B
2a7fad92b9
Avoid passing unnecessary params from TPUSpawn to DDPSpawn ( #8192 )
2021-06-29 14:30:54 +02:00
Adrian Wälchli
6db0fe3659
training loop refactor - move val loop ( #8120 )
* EvaluationDataLoaderLoop -> EvaluationLoop
* proposed rename files
* imports
* bad merge
* update init files
* glue imports together
* rename fit_loop.validation_loop to fit_loop.val_loop
* move loop
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Group imports
* Resolve circular import
* Comment
* fix test
* try to resolve circ import
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-06-29 09:06:44 +00:00
Adrian Wälchli
ae34df00cc
remove deadcode in trainer ( #8121 )
2021-06-29 09:11:24 +01:00
Justus Schock
b12a0d0a0a
Make Plugins Proxies after transferring ownership ( #8117 )
* Update accelerator_connector.py
* Update accelerator_connector.py
* Update accelerator_connector.py
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Update accelerator_connector.py
* Update accelerator_connector.py
* Apply suggestions from code review
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-06-28 22:21:48 +01:00
Justus Schock
d6435a5b73
Bugfix/swa iterable dset ( #8172 )
* add test
* add fix
* Update CHANGELOG.md
2021-06-28 21:18:25 +00:00
Ethan Harris
b1d8840fd8
Fix metric attribute lookup ( #8181 )
* Fix metric attribute lookup
* Update CHANGELOG.md
* Split tests
2021-06-28 20:17:43 +00:00
Adrian Wälchli
bf54ac1cad
fix NCCL error with non-consecutive trainer gpus ( #8165 )
* device ids in barrier
x
x
s
same fix for spawn
fix non-nccl
x
* add changelog
* get nccl backend
* get backend
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-06-28 22:08:10 +02:00
Kaushik B
2f3c65e57b
XLA Profiler integration ( #8014 )
2021-06-29 00:58:05 +05:30
thomas chaton
c521624a92
[bugfix] Add mechanism to prevent deadlock for DDP on Exception Trigger ( #8167 )
* add mechanism to prevent deadlock
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* resolve flake8 + update changelog
* update on comments
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* update
* remove space
* resolve bugs
* overwrite config
* update on comments
* update on comments
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* update
* update
* update test with comments
* Update pytorch_lightning/plugins/training_type/parallel.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* update on comments
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-06-28 19:26:03 +00:00
thomas chaton
1f025789fc
[bugfix] Clean Validation Sanity Checking metrics ( #8171 )
* resolve logging issue
* update changelog
* remove breakpoint
* resolve bugs
* remove pass
2021-06-28 13:49:56 -04:00
thomas chaton
c4492ad6aa
Merge pull request #8174 from PyTorchLightning/bugfix/8159_log_gpu_memory_on_step
[bugfix] Resolve memory not logged when missing metrics
2021-06-28 09:39:17 -04:00
Ethan Harris
2a372e3682
Fix module dict in base finetuning ( #8170 )
* Fix module dict in base finetuning
* Update CHANGELOG.md
2021-06-28 10:55:32 +00:00
Adrian Wälchli
b978d2a1f2
remove message ( #8163 )
2021-06-28 09:57:52 +00:00
deepsource-autofix[bot]
03154eb30a
Refactor unnecessary `else` / `elif` when `if` block has a `return` statement ( #8156 )
Co-authored-by: deepsource-autofix[bot] <62050782+deepsource-autofix[bot]@users.noreply.github.com>
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
2021-06-28 15:27:41 +05:30
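The title describes a general Python refactoring, and the commit's actual diff is not reproduced in this log. The following is a generic sketch with hypothetical function names. When every `if` branch ends in `return`, the following `elif`/`else` keywords can be flattened without changing behavior:

```python
def sign_before(x: int) -> str:
    if x > 0:
        return "positive"
    elif x < 0:
        return "negative"
    else:
        return "zero"


def sign_after(x: int) -> str:
    # Each branch returns, so control only reaches the next test when the
    # previous one failed; `elif`/`else` would add nesting but no behavior.
    if x > 0:
        return "positive"
    if x < 0:
        return "negative"
    return "zero"


print([sign_after(x) for x in (-3, 0, 7)])
```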
deepsource-autofix[bot]
c3065c5ce9
Iterate dictionary directly ( #8155 )
Co-authored-by: deepsource-autofix[bot] <62050782+deepsource-autofix[bot]@users.noreply.github.com>
2021-06-27 21:55:16 +02:00
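The diff itself is not part of this log. As a generic illustration (function and key names are hypothetical), iterating over a `dict` already yields its keys in insertion order, so calling `.keys()` just to loop over them is redundant:

```python
from typing import Dict


def total_before(counts: Dict[str, int]) -> int:
    total = 0
    for key in counts.keys():  # .keys() adds nothing when only iterating
        total += counts[key]
    return total


def total_after(counts: Dict[str, int]) -> int:
    total = 0
    for key in counts:  # iterating a dict yields its keys directly
        total += counts[key]
    return total


counts = {"train": 3, "val": 2}
print(total_before(counts), total_after(counts))
print("train" in counts)  # membership tests also work without .keys()
```

Real code that only needs the total would more likely use `sum(counts.values())`. The loop form is kept here to isolate the `.keys()` change.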
Adrian Wälchli
51ea84222b
resurface lost ddp info message ( #8111 )
2021-06-27 21:51:15 +02:00
deepsource-autofix[bot]
e11fe19673
Remove unnecessary use of comprehension ( #8149 )
Co-authored-by: deepsource-autofix[bot] <62050782+deepsource-autofix[bot]@users.noreply.github.com>
2021-06-27 10:00:02 +01:00
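This log does not show which comprehensions the commit changed. A common case of an "unnecessary comprehension" is one that copies its input element by element without transforming anything. Assuming that is the pattern meant here, the direct constructor call says the same thing more plainly. The variable names below are illustrative only.

```python
pairs = [("gpus", 2), ("precision", 16)]

# Before: comprehensions that rebuild their input unchanged.
items_before = [pair for pair in pairs]
mapping_before = {key: value for key, value in pairs}
unique_before = {pair for pair in pairs}

# After: the constructors express the same copy directly.
items_after = list(pairs)
mapping_after = dict(pairs)
unique_after = set(pairs)

print(items_after, mapping_after, unique_after)
```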
thomas chaton
24db914093
Support state restoration of logged results 2/2 ( #7966 )
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-06-25 19:16:11 +00:00
DJ
ad95710812
document exceptions in utilities ( #8122 )
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-06-25 13:41:45 +00:00
Adrian Wälchli
55a90af7fc
`pytorch_lightning.loops` file structure: group by dataloader, epoch, and batch loop ( #8077 )
2021-06-24 23:40:46 +02:00
Carlos Mocholí
4d9b72b8a9
Nuke RPC ( #8101 )
2021-06-23 18:31:13 +00:00
Sean Naren
8bd7b1bdd7
Add torchelastic check when sanitizing GPUs ( #8095 )
* Add torchelastic check
* Add changelog
* Address review
* fix
2021-06-23 14:09:53 +02:00
Adrian Wälchli
4dc08e4035
Loop Refactor 6/N - Remove Old Predict Loop ( #8094 )
2021-06-23 14:05:06 +02:00
Adrian Wälchli
fe48203111
restrict public interface of training loop ( #8024 )
* active optimizers
* check checkpoint callback
* epoch loop properties
* epoch loop methods
* training_batch_loop
* changelog
* update chlog
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* unused imports
* yapf
* backward
* fix missing string reference
* is_last_batch remains public
* remove dead code
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
2021-06-23 10:25:29 +00:00