Commit Graph

824 Commits

Luis Perez 009e05d14f
[bugfix] Minor improvements to `apply_to_collection` and type signature of `log_dict` (#7851)
* minor fixes

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-06-07 09:31:36 +01:00
Adrian Wälchli cfd01d7f8d
move amp checkpoint state management to precision plugin (#7831) 2021-06-07 07:45:01 +00:00
Ruotian(RT) Luo dff1047851
Fix an incorrect CHANGELOG link (#7850) 2021-06-06 23:57:23 +00:00
Sean Naren 7c7182d334
[IPU] Call accelerator hooks regardless if LM hook overridden 1/n (#7826)
* Modify API to ensure hooks defined in the accelerator are called as expected

* handle step_end in dp

* Add changelog

* Update pytorch_lightning/trainer/trainer.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Add todo and explanation

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-06-04 16:19:08 +00:00
thomas chaton 51d370f4c2
[doc] Move each profiler to its own file + Add missing PyTorchProfiler to the doc (#7822) 2021-06-04 21:08:29 +05:30
shuyingsunshine21 ca89a7f344
[sharded plugin] Fix check for fp16 precision (#7825)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
2021-06-04 08:34:39 +02:00
Mauricio Villegas f34584001c
Fix support for torch Module type hints in LightningCLI (#7807)
* Fixed support for torch Module type hints in LightningCLI

* - Fix issue with serializing values when type hint is Any.
- Run unit test only on newer torchvision versions in which the base class is Module.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Minor change

* Update CHANGELOG.md

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-06-04 07:43:43 +02:00
Adrian Wälchli 36770b22fd
validate manual optimization and supported features before running training (#7788)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-06-03 08:42:37 -07:00
Ethan Harris 03bb389b21
Fix double precision + ddp_spawn (#6924)
* Initial fix

* Initial fix

* Initial fix

* Updates

* Updates

* Update typing and docs

* Undo accidental refactor

* Remove unused imports

* Add DDP double precision test

* Remove unused variable

* Update CHANGELOG.md

* Fix test

* Update tests

* Formatting

* Revert bad change

* Add back changes

* Correct wrapping order

* Improve unwrapping

* Correct wrapping order

* Fix... finally

* Respond to comments

* Drop ddp test

* Simplify ddp spawn test

* Simplify ddp spawn test

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-06-01 15:21:17 +00:00
Carlos Mocholí 195b24ba51
`apply_to_collection` improvements and add `apply_to_collections` (#7769)
* `apply_to_collection` improvements and add `apply_to_collections`

* Update CHANGELOG

* Minor fix

* Minor fix

* Remove attr

* Swap if first is None

* None test

* OrderedDict support

* flake8

* Fix docstring
2021-06-01 12:09:20 +00:00
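
The commit above extends `apply_to_collection` and introduces `apply_to_collections`. As a rough, hedged illustration (assuming the `pytorch_lightning.utilities.apply_func` module path used around this release; this is not code taken from the commit itself):

    import torch
    from pytorch_lightning.utilities.apply_func import apply_to_collection, apply_to_collections

    batch = {"x": torch.ones(2), "meta": {"y": torch.zeros(3)}}

    # Apply a function to every tensor leaf in a (possibly nested) collection.
    doubled = apply_to_collection(batch, dtype=torch.Tensor, function=lambda t: t * 2)

    # Zip two collections with the same structure and combine matching leaves.
    other = {"x": torch.ones(2), "meta": {"y": torch.ones(3)}}
    summed = apply_to_collections(batch, other, dtype=torch.Tensor, function=lambda a, b: a + b)
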
Carlos Mocholí 1dd61e4e35
Extend support for logging a collection (#7771) 2021-06-01 12:51:50 +01:00
Carlos Mocholí 0dd6d3a798
Avoid adding `None` loss values in `training_epoch_end` (#7772) 2021-05-31 19:28:28 +00:00
Adrian Wälchli 7e6010fc93
fix info message when max training time reached (#7780)
* call time_elapsed

* elapsed formatting

* format

* update test

* changelog
2021-05-31 14:50:16 +02:00
Mauricio Villegas f6b5e3df57
Added save_config_filename init argument to LightningCLI (#7741) 2021-05-28 09:30:16 +02:00
Boris Dayma 9097347ea8
feat(wandb): log models as artifacts (#6231)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-05-27 20:15:02 +02:00
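
The commit above adds model logging to the W&B integration. A minimal sketch, assuming `WandbLogger` exposes a `log_model` argument as described (the project name is a placeholder):

    from pytorch_lightning import Trainer
    from pytorch_lightning.loggers import WandbLogger

    # log_model=True uploads saved checkpoints to W&B as artifacts at the end of
    # training (behaviour assumed from the feature description).
    logger = WandbLogger(project="my-project", log_model=True)
    trainer = Trainer(logger=logger, max_epochs=1)
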
Carlos Mocholí 9304c0df8f
Rename and move Result (#7736)
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-05-27 12:27:52 +00:00
Carlos Mocholí 906c067b07
Update hooks pseudocode (#7713) 2021-05-27 12:27:26 +02:00
Kaushik B 04dcb1786d
Add `__len__` method to IndexBatchSamplerWrapper (#7681)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-05-26 18:20:13 +02:00
Carlos Mocholí 311d9fe67e
Always run validation inside the training loop epoch (#7357)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-05-26 14:26:48 +02:00
Carlos Mocholí d26953c8bc
Add `ModelPruning(prune_on_train_epoch_end)` to choose when to apply pruning (#7704)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-05-26 00:57:56 +02:00
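
The commit above adds a `prune_on_train_epoch_end` flag to `ModelPruning`. A hedged usage sketch (only the flag name comes from the commit title; the other arguments are illustrative):

    from pytorch_lightning import Trainer
    from pytorch_lightning.callbacks import ModelPruning

    # True applies pruning at the end of each training epoch; False defers it
    # to the end of validation (assumed semantics).
    pruning = ModelPruning("l1_unstructured", amount=0.5, prune_on_train_epoch_end=True)
    trainer = Trainer(callbacks=[pruning])
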
Xinyao(Alvin) Sun 7e2f7e956b
fix: improve UserWarning message (#7685)
* fix: improve UserWarning message
when both overfit and training dataloader shuffling are enabled

fixes issue: #7656

* chore: update changelog

* Polish userwarning msg in pytorch_lightning/trainer/data_loading.py

Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>

* shuffling typo

* Update CHANGELOG.md

Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-05-25 17:35:15 +00:00
Kaushik B e7057d5898
Add `should_rank_save_checkpoint` property to Training Plugins (#7684)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-05-25 23:02:05 +05:30
Carlos Mocholí 8ba6304c73
Increment the total batch idx before the accumulation early exit (#7692)
* Increment the total batch idx before the accumulation early exit

* Update CHANGELOG
2021-05-25 10:23:40 +02:00
Jirka Borovec ad168fc4c6
chlog for 1.3.2 + legacy test (#7676) 2021-05-24 17:55:02 +00:00
Carlos Mocholí 8b01497e42
Fix global step update when the epoch is skipped (#7677)
* Fix global step update when the epoch is skipped

* Update CHANGELOG

* Move test
2021-05-24 17:36:56 +01:00
ananthsub fa41c588f4
Remove ProfilerConnector class (#7654)
* Remove ProfilerConnector class

* Update trainer.py

* Update CHANGELOG.md

* Update trainer.py

* Update trainer.py

* tests
2021-05-24 08:58:15 -07:00
Gyeongjae Choi a54bc5dba3
Fix progress bar print error when called before training (#7674)
* Check progress bar existence before printing

* Add tests for predict_progress_bar

* Add tests for progress_bar printing without training

* Update changelog
2021-05-24 17:33:28 +02:00
Xinyao(Alvin) Sun 0c958c5a1f
Fix dataloaders are not reset when tuning the model (#7566)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-05-24 10:21:45 +02:00
shuyingsunshine21 299f2c481b
FSDP with full state dict (#7487)
* Fix some test errors
Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

* checkpoint consolidation

* Update ddp_spawn.py

* Update test_metric_result_integration.py

* Update test_results.py

* Update utils.py

* Update utils.py

* Update test_all_gather_grad.py

* Update test_all_gather_grad.py

* Update test_results.py

* Revert "Update test_results.py"

This reverts commit 9d4a2b891d.

* Revert "Merge pull request #1 from shuyingsunshine21/shuyingsunshine21-checkpoint_consolidate"

This reverts commit c5053da789, reversing
changes made to 0d23d75bc9.

* Revert "Update test_all_gather_grad.py"

This reverts commit 0d23d75bc9.

* Revert "Update utils.py"

This reverts commit 70fe5da9c6.

* Revert "Update utils.py"

This reverts commit a9aae99f6e.

* Revert "Update test_results.py"

This reverts commit ea74906878.

* Revert "Update test_metric_result_integration.py"

This reverts commit bf70e431b3.

* Revert "Update ddp_spawn.py"

This reverts commit f17210183b.

* Revert "checkpoint consolidation"

This reverts commit 536c1323b0.

* Revert "Revert "checkpoint consolidation""

This reverts commit 3a9fde915a.

* Revert "Revert "Revert "checkpoint consolidation"""

This reverts commit 7a369f47e1.

* Revert "Revert "Update ddp_spawn.py""

This reverts commit 8222dc98ea.

* Revert "Revert "Update test_metric_result_integration.py""

This reverts commit 6c095b2370.

* Revert "Revert "Update test_results.py""

This reverts commit 250d0aaaa2.

* Revert "Revert "Update utils.py""

This reverts commit 8651d54d79.

* Revert "Revert "Update test_all_gather_grad.py""

This reverts commit dcdcd29731.

* modify distributed environment to make test pass

* fix version for ddp plugin test

* fix

* fix

* changelog

* Update CHANGELOG.md

* fsdp with full state dict

* fix missing import

* modify unit test

* fix

* fix

* fix typo

* modify test and add changelog

* fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* limit max_epoch to 1 for testing

* test

* fix

* update

* testing remove special for multi gpu

* assert gpu

* add assertion for gpu

* fix

* Re-enable special test, use ModelCheckpoint

* Fix paths

* Fix path passing

* test

* test

* fix test

* fix

* pre-commit format

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: SeanNaren <sean@grid.ai>
2021-05-24 08:11:45 +01:00
Xinyao(Alvin) Sun 01109cdf0c
Fix/mismatched toggle optimizer (#7563)
* fix: avoid potential mismatched toggling of optimizer
Refs #7405

chore: update CHANGELOG

[pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

fix: resolve a conflict

chore: update changelog

* feat: add a test that fails in master

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix typo in tests/trainer/optimization/test_multiple_optimizers.py

Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>

* Polish tests/trainer/optimization/test_multiple_optimizers.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Polish tests/trainer/optimization/test_multiple_optimizers.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* fix: change placeholder in optimizer_step from positional args to keyword args

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-05-23 04:30:28 +02:00
shuyingsunshine21 2242423b75
refactor accelerator teardown -> training type plugin teardown (#7579) 2021-05-22 13:19:24 -07:00
Carlos Mocholí a8d9b5f783
Remove tbptt `self.log` flags and other dead code [5/n] (#7644) 2021-05-22 01:13:00 +00:00
Carlos Mocholí 33a1f5271f
[2/N] Define dataclasses for progress tracking (#7574)
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
2021-05-22 03:09:08 +02:00
ananthsub f6d892ac21
[feat] Support custom filesystems in LightningModule.to_torchscript (#7617)
* [feat] Support custom filesystems in LightningModule.to_torchscript

* Update CHANGELOG.md

* Update test_torchscript.py

* Update test_torchscript.py

* Update CHANGELOG.md

* Update test_torchscript.py
2021-05-21 11:23:15 +00:00
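
The commit above lets `to_torchscript` write through custom filesystems. A small sketch, assuming fsspec-style paths are accepted for `file_path` (the remote URL mentioned in the comment is hypothetical and would need the matching fsspec plugin, e.g. s3fs):

    import torch
    from pytorch_lightning import LightningModule

    class TinyModel(LightningModule):
        def __init__(self):
            super().__init__()
            self.layer = torch.nn.Linear(4, 1)

        def forward(self, x):
            return self.layer(x)

    model = TinyModel()
    # A local path works as before; with fsspec support, file_path could instead
    # point at remote storage such as "s3://my-bucket/model.pt".
    model.to_torchscript(file_path="model.pt", method="script")
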
i-aki-y 7eafd8eac6
Add run_name argument to the MLFlowLogger constructor (#7622)
* Add run_name argument to the MLFlowLogger

* Update CHANGELOG

* Fix unnecessary line

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix style by using yapf

* Fix import error when mlflow is not installed

* Update CHANGELOG.md

* Update tests/loggers/test_mlflow.py

Co-authored-by: akiyuki ishikawa <aki.y.ishikwa@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-05-21 09:17:32 +01:00
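
The commit above adds a `run_name` argument to `MLFlowLogger`. A minimal sketch, assuming an MLflow backend is available (experiment and run names are placeholders):

    from pytorch_lightning import Trainer
    from pytorch_lightning.loggers import MLFlowLogger

    logger = MLFlowLogger(experiment_name="my-experiment", run_name="baseline-run")
    trainer = Trainer(logger=logger)
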
Andrew Tritt 92cf396de2
Override `broadcast_object_list` for `torch<1.8` (#7592)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-05-20 08:29:55 +00:00
Yifu Wang ed271905cf
Clear predict_progress_bar in ProgressBar.__getstate__ (#7608)
Co-authored-by: Yifu Wang <yifuwang@2012@gmail.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-05-20 01:38:49 +00:00
ananthsub 8266b141ba
[feat] Support time-based checkpointing during training (#7515)
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-05-19 22:14:13 +00:00
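
The commit above adds time-based checkpointing. A hedged sketch, assuming the option is exposed through a `train_time_interval` argument on `ModelCheckpoint` (the parameter name is an assumption based on the feature description):

    from datetime import timedelta
    from pytorch_lightning import Trainer
    from pytorch_lightning.callbacks import ModelCheckpoint

    # Save a checkpoint roughly every 30 minutes of training wall-clock time.
    checkpoint = ModelCheckpoint(train_time_interval=timedelta(minutes=30))
    trainer = Trainer(callbacks=[checkpoint])
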
ananthsub 9f5d4955b6
[1/N] Define dataclasses for progress tracking (#6603)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-05-19 21:02:20 +00:00
Jan-Henrik Lambrechts 608de6abf4
TensorBoardLogger sub_dir parameter for grouping logs (#6195)
* fixed a small typo

* cleaning up

* added sub_dir argument to tensorboard and wrote test

* sub dir arg exclusively for tensorboard, linted

* resolving merge conflict

* resolved merge conflict

* resolved merge conflict

* resolved merge conflict

* resolve merge conflict before revert

* resolving merge conflict

* reverted to pre-lint

* added tensorboard sub_dir test

* pep8 formatting

* removed sub_dir arg from test_all function:

* updated feature description

* typo in doc description

* updated CHANGELOG

* Update pytorch_lightning/loggers/tensorboard.py

Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>

* swapped argument position

* added expandvars tests

* added expandvars

* removed model init

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix tests

* fix failed test

* Revert "fix failed test"

This reverts commit 50b34c66da.

* add env var to test

* fix typo in tests

* fix tests

* for test consistency

* fix typo

* fix typo 2

Co-authored-by: Ubuntu <azureuser@devhenrik.evuifrmjd4lepbj4relcwwu5va.ax.internal.cloudapp.net>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Roger Shieh <sh.rog@protonmail.ch>
2021-05-19 19:50:58 +00:00
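
The commit above adds a `sub_dir` argument to `TensorBoardLogger`. A minimal sketch (directory names are placeholders):

    from pytorch_lightning import Trainer
    from pytorch_lightning.loggers import TensorBoardLogger

    # Event files are written under save_dir/name/version/sub_dir, which lets
    # related runs group their logs below the version folder.
    logger = TensorBoardLogger(save_dir="logs", name="my_experiment", sub_dir="train")
    trainer = Trainer(logger=logger)
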
ananthsub b4e28e7169
[feat] Add stronger validation for checkpoint_callback argument (#7539)
* [feat] Add stronger validation for checkpoint_callback configuration

* chlog

* Update callback_connector.py

* Update test_model_checkpoint.py

* Update pytorch_lightning/trainer/connectors/callback_connector.py

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Update pytorch_lightning/trainer/connectors/callback_connector.py

* Update tests/checkpointing/test_model_checkpoint.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Update CHANGELOG.md

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-05-19 19:38:08 +00:00
TOKUNAGA Hiroyuki 20f63377f8
Fix the condition for calling update_learning_rates (#7032)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-05-17 17:20:42 +02:00
Adrian Wälchli 502adbced3
refactor optimizer loop logic for manual and automatic optimization (#7526)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
2021-05-17 14:42:01 +02:00
Nic Eggert f4f51e0dcf
Add kubeflow cluster environment (#7300)
* Add kubeflow cluster environment

* Add KubeflowEnvironment to docs

* Add KubeflowEnvironment to the changelog

* break up a long line

* Add method to detect kubeflow environment

* Select Kubeflow environment when available

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Run pre-commit

* task_idx == 0

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-05-17 09:05:24 +01:00
Adrian Wälchli 6e6e29af49
remove trainer hidden state | sanity refactor [2 / n] (#7507) 2021-05-17 08:57:15 +01:00
Mauricio Villegas d0081778f8
Enable fsspec by default for cli config file (#7521)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-05-17 08:53:00 +01:00
Alan Du 6ac16ff348
Fix DistribType for `ddp_cpu` (spawn) (#7492) 2021-05-14 20:53:26 +01:00
Rohit Gupta 7ca41734da
Add `dataloader_idx` to batch transfer hooks (#6241)
* replace with kwargs

* chlog

* fix

* add test

* fix

* device

* deepspeed

* pep

* optional

* docs

* bc

* comments

* pep

* mypy

* pep

* Apply suggestions from code review

* kwargs

* docs

* .

* .

* 1.3 -> 1.4

* kwargs -> step_kwargs
2021-05-13 23:03:55 +05:30
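
The commit above passes `dataloader_idx` to the batch-transfer hooks. A sketch of the extended signature, assuming the hook arguments described in the commit:

    from pytorch_lightning import LightningModule

    class MultiLoaderModel(LightningModule):
        def transfer_batch_to_device(self, batch, device, dataloader_idx):
            # dataloader_idx identifies which dataloader produced the batch,
            # so per-loader handling becomes possible.
            if dataloader_idx == 0:
                return batch.to(device)
            return super().transfer_batch_to_device(batch, device, dataloader_idx)
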
Carlos Mocholí a584196abf
Default `seed_everything(workers=True)` in the `LightningCLI` (#7504) 2021-05-13 12:18:03 +02:00
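
The commit above makes the CLI seed dataloader workers by default. For reference, a hedged sketch of the equivalent manual call (the seed value is arbitrary):

    from pytorch_lightning import Trainer, seed_everything

    # workers=True also seeds each DataLoader worker process so that
    # augmentations and shuffling are reproducible across runs.
    seed_everything(42, workers=True)
    trainer = Trainer(deterministic=True)
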
Adrian Wälchli dd1a17b071
Refactor result handling in training loop (#7506)
* refactor results

* rename dic -> dict

* simplify

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* changelog

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix None check

* chlog wording

* move process_closure_result to the end

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-05-13 09:30:34 +01:00
Jirka Borovec 298f9e5c2d
Prune deprecated utils modules (#7503)
* argparse_utils

* model_utils

* warning_utils

* xla_device_utils

* chlog

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-05-13 07:24:42 +00:00
Jirka Borovec 946aee0c7b
prune data parallel (#7510) 2021-05-13 06:23:02 +01:00
Carlos Mocholí 072ad52b6b
Add `trainer.predict(ckpt_path)` (#7430)
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-05-13 01:49:58 +02:00
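
The commit above adds a `ckpt_path` argument to `trainer.predict`. A hedged sketch; `model` and `predict_dl` in the commented call are placeholders for an already-built LightningModule and dataloader:

    from pytorch_lightning import Trainer

    trainer = Trainer()
    # After trainer.fit(model), predictions can be run from a saved checkpoint:
    # ckpt_path="best" loads the best checkpoint tracked by ModelCheckpoint,
    # or an explicit checkpoint path can be passed instead.
    # predictions = trainer.predict(model, dataloaders=predict_dl, ckpt_path="best")
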
Jirka Borovec d4ec75164c
Prune deprecated trainer attributes (#7501)
* use_single_gpu

* use_horovod

* use_ddp2

* use_ddp

* use_dp

* on_gpu

* use_tpu

* on_tpu

* on_cpu

* cleaning

* chlog

* Apply suggestions from code review

* Apply suggestions from code review
2021-05-12 20:10:15 +00:00
Jirka Borovec 96981091c7
Prune deprecated classif. metrics (#7499)
* stat_scores_multiple_classes

* precision_recall

* precision

* recall

* auc

* auroc

* multiclass_auroc

* iou

* clean-up

* chlog

* flake8

* imports

* prune
2021-05-12 18:03:34 +00:00
Jirka Borovec 140b0c727e
Prune deprecated trainer attributes 2 (#7502)
* accelerator_backend

* get_model

* clean

* chlog

* flake8
2021-05-12 10:19:30 -07:00
Federico Simonetta 8cdbd03d02
MLFlow now uses env variable as default tracking uri (#7457)
* Clarify logger flag

Clarify behavior of boolean values on the logger flag for Trainer.

* Update docs/source/common/trainer.rst

* doc

* MLFlow now uses env variable as default tracking uri

Solves https://github.com/PyTorchLightning/pytorch-lightning/issues/6894

* Update pytorch_lightning/loggers/mlflow.py

Co-authored-by: thomas chaton <thomas@grid.ai>

* changelog

Co-authored-by: SpontaneousDuck <kennywitham4@gmail.com>
Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: jirka <jirka.borovec@seznam.cz>
2021-05-12 11:26:57 +02:00
shuyingsunshine21 8538c1f61e
Accelerator model state dict (#7474)
* Fix some test errors
Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

* checkpoint consolidation

* Update ddp_spawn.py

* Update test_metric_result_integration.py

* Update test_results.py

* Update utils.py

* Update utils.py

* Update test_all_gather_grad.py

* Update test_all_gather_grad.py

* Update test_results.py

* Revert "Update test_results.py"

This reverts commit 9d4a2b891d.

* Revert "Merge pull request #1 from shuyingsunshine21/shuyingsunshine21-checkpoint_consolidate"

This reverts commit c5053da789, reversing
changes made to 0d23d75bc9.

* Revert "Update test_all_gather_grad.py"

This reverts commit 0d23d75bc9.

* Revert "Update utils.py"

This reverts commit 70fe5da9c6.

* Revert "Update utils.py"

This reverts commit a9aae99f6e.

* Revert "Update test_results.py"

This reverts commit ea74906878.

* Revert "Update test_metric_result_integration.py"

This reverts commit bf70e431b3.

* Revert "Update ddp_spawn.py"

This reverts commit f17210183b.

* Revert "checkpoint consolidation"

This reverts commit 536c1323b0.

* Revert "Revert "checkpoint consolidation""

This reverts commit 3a9fde915a.

* Revert "Revert "Revert "checkpoint consolidation"""

This reverts commit 7a369f47e1.

* Revert "Revert "Update ddp_spawn.py""

This reverts commit 8222dc98ea.

* Revert "Revert "Update test_metric_result_integration.py""

This reverts commit 6c095b2370.

* Revert "Revert "Update test_results.py""

This reverts commit 250d0aaaa2.

* Revert "Revert "Update utils.py""

This reverts commit 8651d54d79.

* Revert "Revert "Update test_all_gather_grad.py""

This reverts commit dcdcd29731.

* modify distributed environment to make test pass

* modify model state dict to training type plugin

* remove changes

* add changelog

* fixing isort for pre-commit failure

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Address code review

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: SeanNaren <sean@grid.ai>
2021-05-11 16:39:04 +01:00
Justus Schock 7b283e3c46
Bugfix/Multiple dataloaders (#7433)
* Update supporters.py

* Update apply_func.py

* Update supporters.py

* Update model_train_dataloaders.py

* Update model_train_steps.py

* Update test_dataloaders.py

* Update CHANGELOG.md

* Update model_train_steps.py

* Update test_dataloaders.py

* Update test_dataloaders.py

* Update supporters.py

* Update test_supporters.py

* Apply suggestions from code review

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update tests/trainer/test_dataloaders.py

Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>

* Apply suggestions from code review

Co-authored-by: Edgar Riba <edgar.riba@gmail.com>

* Update supporters.py

* Update supporters.py

* Apply suggestions from code review

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
Co-authored-by: Edgar Riba <edgar.riba@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-05-11 16:33:29 +02:00
Jirka Borovec d7c44cc649
Docs: sync chlog 1.3.1 (#7478) 2021-05-11 12:44:22 +02:00
ananthsub fdf50a5e4b
Mark certain Trainer APIs as protected (#7420) 2021-05-11 11:53:41 +02:00
Adrian Wälchli ad9118f04a
remove trainer hidden state | sanity refactor [1 / n] (#7437)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-05-11 11:09:08 +02:00
David Fidalgo 4a1134db64
Log epoch metrics before firing the `on_evaluation_end` hook (#7272)
* Log epoch metrics before firing the `on_evaluation_end` hook (addresses #7166)

* test that epoch metrics are logged before `on_evaluation_end` hook

* update CHANGELOG

* Shorter test

Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-05-11 10:54:31 +02:00
Carlos Mocholí b65ae79478
Automatically check `DataModule.has_{setup,teardown,prepare_data}` [2/2] (#7238)
* Automatically check `DataModule.has_{setup,teardown,prepare_data}`

* Use variable

* Spacing

* Docs

* Update CHANGELOG

* Remove `_DataModuleWrapper`

* Add test

* Update docs/source/extensions/datamodules.rst

* Bad merge

* add test for invalid name

* Remove ValueError

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-05-11 10:53:00 +02:00
shuyingsunshine21 987530cd38
Set `num_nodes` and `sync_batchnorm` From Trainer for Manually Passed Training Type Plugin (#7026)
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-05-08 11:25:51 +00:00
Akihiro Nitta 710b144b9b
Restore `trainer.current_epoch` after tuning (#7434)
* Add a test

* Save and restore current_epoch

* Update CHANGELOG

* alphabetical order
2021-05-08 07:15:52 +02:00
Ethan Harris 45143fd825
Improve val step logging (#7351)
* Fix val step logging

* Add a type

* Fix

* Update CHANGELOG.md
2021-05-07 22:58:03 +00:00
ananthsub f9e050c5e5
Move DP warning suppression to the DataParallel Plugin (#7421) 2021-05-07 23:02:44 +02:00
ananthsub fecce50355
Deprecate TrainerModelHooksMixin (#7422)
* Deprecate TrainerModelHooksMixin

* Update CHANGELOG.md

* Update model_hooks.py

* Update model_hooks.py
2021-05-07 13:19:36 -07:00
Carlos Mocholí 8208c330eb
Use `torch.nn.utils.clip_grad_norm_` and add `clip_grad_by_value` support for TPU (#7025)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-05-07 16:41:39 +00:00
Leonard Lausen 98b94b810c
Fix DeepSpeedPlugin with IterableDataset (#7362)
* deepspeed add train_micro_batch_size_per_gpu argument

* Update naming and doc

* Modify to use auto naming convention, add test

* Add iterable tests

* Fix tests, attempt by mocking

* Import correct package

* Fix comparison

* Set as special test

* Remove import

* Add Changelog

Co-authored-by: SeanNaren <sean@grid.ai>
2021-05-07 10:46:03 +01:00
Jirka Borovec 28103c67c2
show must go on (#7413)
* chlog + version

* readme

* .
2021-05-06 19:06:21 -04:00
Jirka Borovec fbc8b209f2
update versions (#7409)
* update versions

* chlog

* win

* str
2021-05-06 20:35:39 +00:00
Jirka Borovec b181b8c646
release 1.3.0 (#7404)
* v1.3.0

* ci event

* chlog

* badge

* formatting
2021-05-06 15:05:35 -04:00
Jirka Borovec d52e0a8f3e
v0.1.3.0rc3 + changelogs (#7388)
* v0.1.3.0rc3

* spaces

* wip

* wip

* wip

* wip

* prune

* wip

* wip

* Apply suggestions from code review

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-05-06 07:28:10 -04:00
ananthsub 7b45bcfedb
[2/2] Remove outputs from evaluation epoch end hooks (#7338)
* Remove outputs from on_train_epoch_end

* iterate

* Update callback_hook.py

* update

* early stop?

* fix

* Update pytorch_lightning/trainer/training_loop.py

Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>

* Update trainer.py

* update

* Update training_loop.py

* early stop?

* fix

* Remove outputs from evaluation epoch end hooks

* update

* Update test_remove_1-5.py

* fix lints

* Update base.py

* rm-outputs

* Update evaluation_loop.py

* try-save-more-memory

* Update trainer.py

* Update trainer.py

* cache-at-start

* Update evaluation_loop.py

* Update training_loop.py

* Update training_loop.py

Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
2021-05-05 19:50:58 +00:00
Kaushik B fbcd63aa89
Update changelog for recent releases (#7387) 2021-05-05 15:25:56 -04:00
ananthsub 6104a6316a
[1/2] Deprecate `outputs` in `on_train_epoch_end` hooks (#7339)
* Remove outputs from on_train_epoch_end

* iterate

* Update callback_hook.py

* update

* Update training_loop.py

* Update test_training_loop.py

* early stop?

* fix

* update tests

* Update test_hooks.py

* Update pytorch_lightning/trainer/callback_hook.py

Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>

* Update pytorch_lightning/trainer/training_loop.py

Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>

* Update trainer.py

* Update pytorch_lightning/trainer/trainer.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-05-05 17:18:16 +02:00
ananthsub 98670c83a9
Deprecate `truncated_bptt_steps` flag on Trainer in favor of the same setting on the LightningModule (#7323)
* deprecate-tbptt-trainer

* Update CHANGELOG.md

* Update lightning.py

* test

* Update lightning.py

* Update training_loop.py

* Update training_loop.py

* Update lightning.py

* Update training_loop.py

* Update training_loop.py

* update docs

* Update accelerator.py

* Update accelerator.py

* more docs

* tweaks

* chlog

* comments

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-05-05 11:21:00 +01:00
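
The commit above moves truncated BPTT configuration onto the LightningModule. A hedged sketch of the module-level setting it describes:

    from pytorch_lightning import LightningModule

    class TBPTTModel(LightningModule):
        def __init__(self):
            super().__init__()
            # Previously Trainer(truncated_bptt_steps=2); now configured on the module.
            self.truncated_bptt_steps = 2

        def training_step(self, batch, batch_idx, hiddens):
            # With truncated BPTT enabled, training_step also receives the hidden
            # state carried over from the previous split of the sequence.
            ...
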
Christfried Focke 763a9a9495
Fix Namespace loading in PyYAML 5.4.x (#6673)
* Fix Namespace loading in PyYAML 5.4.x

* Remove OmegaConf reference from PyYAML requirements

* Max allowed version for pyyaml

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-05-04 22:56:11 +00:00
Carlos Mocholí 374ff750f5
Pass `current_epoch`/`global_step` as monitor candidates [1/2] (#7344)
* Pass `current_epoch`/`global_step` as monitor candidates

* Formatting

* Fix deprecated test

* Update CHANGELOG
2021-05-04 16:05:40 -04:00
Ethan Harris 2a740ebe77
Fix support for dataloader with None batches (#7342)
* Fix Dataloader None batch

* Fix Dataloader None batch

* Update CHANGELOG.md

* Fix breaking test

* Address comments
2021-05-04 12:24:03 +00:00
Carlos Mocholí 8c0ea92af2
`TrainerState` refactor [5/5] (#7173)
* `TrainerState` refactor

* flake8

* Update finished check

* Test cleanup

* Fix tests

* Fixes

* Reorder

* flake8

* Update CHANGELOG

* Better docs

* Better docs

* Remove default

* Update tests

* Bad merge
2021-05-04 12:50:56 +02:00
Adrian Wälchli a6aa1a0f82
make gpus=str in Trainer consistent with command line parsing of string (#6388)
* string gpu input

* update docs

* deprecation warning

* Revert "update docs"

This reverts commit c5f3893413.

* deprecation

* add changelog

* update parser

* update warning

* implement v1.5 behavior ahead of time

* formatting

* set accelerator in test to avoid different warning

* add warning

* remove todo warn

* Update pytorch_lightning/utilities/device_parser.py

Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>

* resolve flake8

Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: tchaton <thomas@grid.ai>
2021-05-04 09:56:27 +00:00
Boris Dayma 2a20102321
fix(wandb): allow custom init args (#6989)
* feat(wandb): allow custom init args

* style: pep8

* fix: get dict args

* refactor: simplify init args

* test: test init args

* style: pep8

* docs: update CHANGELOG

* test: check default resume value

* fix: default value of anonymous

* fix: respect order of parameters

* feat: use look-up table for anonymous

* yapf formatting

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-05-04 09:45:36 +00:00
Hemil Desai 82c19e1444
Update LR schedulers only when their corresponding Optimizer is being used (#4868)
* Update LR schedulers only when their corresponding Optimizer is being used.

In the case when optimizer frequencies are specified,
the LR scheduler corresponding to a particular optimizer is updated
only when that optimizer is being used in the training loop or epoch.

* pep8speak fixes

* Fix failing tests

* Add docs

* PR Feedback

* Apply suggestions from code review

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* formatting fix

* PR Feedback - part 2

* More PR feedback

* Apply suggestions from code review

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* Add typing imports

* Stronger tests and fixes related to that

* Add more tests plus PR feedback

* Make optimizer_freq_cumsum a cached property

@cached_property is only available in Python 3.8+, so it had to be done manually.

* Fix tests

* Apply suggestions from code review

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Avoid mutable defaults

* Parametrize lr scheduling tests

* PR feedback

* Apply suggestions from code review

* spell

* Apply suggestions from code review

* flake8

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2021-05-04 09:37:40 +00:00
Adrian Wälchli b780af51be
update test for resume_from_checkpoint on missing file (#7255) 2021-05-04 09:16:34 +00:00
Daniel Mesejo-León 6da747e775
Deprecate `LightningModule.datamodule` reference in favor of the trainer one (#6929) (#7168)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-05-04 00:01:41 +00:00
Adrian Wälchli bf1394a472
improve early stopping verbose logging (#6811) 2021-05-03 20:20:48 +00:00
ananthsub 14c552bb92
[bugfix] Fix dataloading for iterable datasets and limit_train_batches (#7306)
* bugfix-dataloading

* rm-logs

* Update CHANGELOG.md

* Update test_dataloaders.py

* Update test_dataloaders.py

* Update training_loop.py

* Update test_dataloaders.py

* Update CHANGELOG.md

* Update CHANGELOG.md

* Update test_dataloaders.py

* Update training_loop.py

* Update training_loop.py

* comments

* address comments

* more tests

* Update progress.py

* Update test_dataloaders.py

* Update test_dataloaders.py

* Update training_loop.py

* Update training_loop.py

* test ckpt fix?

* update again
2021-05-03 19:50:26 +01:00
Adrian Wälchli e0c64f0ef6
Fix Adagrad optimizer not working with DDP/GPU (#7277)
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
2021-05-03 03:57:17 +05:30
Kaushik B 490cc57809
Device updates for TPU Pod (#7243) 2021-04-30 23:14:06 +05:30
thomas chaton 16d6c9828d
[bugfix] Apex never instantiated. (#7274)
* update

* update

* update apex

* update

* update

* update

* remove test.py

* update

* update

* update on comments

* update changelog

* update

* update

* typo
2021-04-30 13:16:28 -04:00
ananthsub 44fd01734c
Move grad_norm to a dedicated utilities file (#7292)
* rm-grad-norm-mixin

* Update grads.py

* Update CHANGELOG.md

* Apply suggestions from code review

Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Update docstrings

* Update __init__.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-04-30 09:19:22 -07:00
ananthsub e407edba36
[fix] Attach train+val dataloaders to trainer in trainer loop (#7207)
* Update training_loop.py

* Update test_dataloaders.py

* changelog

* delay reload

* go back

* comments

* Update training_loop.py

* Update test_dataloaders.py

* Update tests/trainer/test_dataloaders.py

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-04-30 09:01:31 -07:00
thomas chaton 80b9ca0e38
[bugfix] Add reloading support using BaseFinetuning (#7253)
* update

* wip

* update

* update

* update

* update

* resolve bug

* update on comments

* update on comments

* update

* update

* formatting

* add comments

* update on comments

* update

* Update pytorch_lightning/callbacks/base.py

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* update

* update

* Typing and minor changes

* Refactor

* Fix deprecated test

* Broken commit

* Fix broken commit

* flake8

* Update CHANGELOG

* update on comments

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-04-30 11:14:43 -04:00
Carlos Mocholí 5af086ab9f
Attach data refactor and tuner bugs [4/n] (#7258)
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-04-30 13:54:58 +00:00
Adrian Wälchli b9b3fa371f
fix case where an IterableDataset doesn't produce a batch for an epoch (#7294)
* wip

* fix

* add test

* refactor + test

* rm

* formatting

* update changelog

* doc

* docstring

* remove unused import

* Update CHANGELOG.md

Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>

Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-04-30 12:45:55 +00:00
Adrian Wälchli 8232de427a
fix save_hyperparameters(container) if container is empty (#7268)
* fix

* add tests

* changelog

* fix test
2021-04-30 13:38:42 +01:00
ananthsub 338f5a3311
Remove exp_save_path on the LightningModule (#7266)
* deprecate-exp-save-path

* Update lightning.py

* Update CHANGELOG.md

* remove-not-deprecate
2021-04-29 17:44:04 -04:00