Kaushik B
27eb0035ca
Increase TPU Check timeout ( #7706 )
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-05-26 01:44:29 +00:00
Carlos Mocholí
d26953c8bc
Add `ModelPruning(prune_on_train_epoch_end)` to choose when to apply pruning ( #7704 )
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-05-26 00:57:56 +02:00
Xinyao(Alvin) Sun
7e2f7e956b
fix: improve UserWarning message ( #7685 )
* fix: improve UserWarning message
when both overfit and training dataloader shuffling are enabled
fixes issue: #7656
* chore: update changelog
* Polish userwarning msg in pytorch_lightning/trainer/data_loading.py
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
* shuffling typo
* Update CHANGELOG.md
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-05-25 17:35:15 +00:00
Kaushik B
e7057d5898
Add `should_rank_save_checkpoint` property to Training Plugins ( #7684 )
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-05-25 23:02:05 +05:30
Carlos Mocholí
a1c40f3207
Remove on epoch guard from the should stop validation check ( #7701 )
* Remove on epoch guard from the should stop validation check
* Formatting
2021-05-25 15:59:42 +01:00
Carlos Mocholí
e2ead9abd7
Refactor some loops code and hook tests ( #7682 )
2021-05-25 13:27:54 +02:00
Carlos Mocholí
8ba6304c73
Increment the total batch idx before the accumulation early exit ( #7692 )
* Increment the total batch idx before the accumulation early exit
* Update CHANGELOG
2021-05-25 10:23:40 +02:00
Carlos Mocholí
8b01497e42
Fix global step update when the epoch is skipped ( #7677 )
* Fix global step update when the epoch is skipped
* Update CHANGELOG
* Move test
2021-05-24 17:36:56 +01:00
Kaushik B
3f460b150a
Move parameter validation specific to TPU Training plugins ( #7415 )
* Move parameter validation specific to TPU Training plugins
* update docstring
2021-05-24 16:02:01 +00:00
ananthsub
fa41c588f4
Remove ProfilerConnector class ( #7654 )
* Remove ProfilerConnector class
* Update trainer.py
* Update CHANGELOG.md
* Update trainer.py
* Update trainer.py
* tests
2021-05-24 08:58:15 -07:00
Gyeongjae Choi
a54bc5dba3
Fix progress bar print error when called before training ( #7674 )
* Check progress bar existence before printing
* Add tests for predict_progress_bar
* Add tests for progress_bar printing without training
* Update changelog
2021-05-24 17:33:28 +02:00
Carlos Mocholí
2103b5efc9
Move sync code from step result to lightning module [6/n] ( #7651 )
2021-05-24 13:13:55 +01:00
Xinyao(Alvin) Sun
0c958c5a1f
Fix dataloaders are not reset when tuning the model ( #7566 )
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-05-24 10:21:45 +02:00
shuyingsunshine21
299f2c481b
FSDP with full state dict ( #7487 )
* Fix some test errors
Summary:
Test Plan:
Reviewers:
Subscribers:
Tasks:
Tags:
* checkpoint consolidation
* Update ddp_spawn.py
* Update test_metric_result_integration.py
* Update test_results.py
* Update utils.py
* Update utils.py
* Update test_all_gather_grad.py
* Update test_all_gather_grad.py
* Update test_results.py
* Revert "Update test_results.py"
This reverts commit 9d4a2b891d.
* Revert "Merge pull request #1 from shuyingsunshine21/shuyingsunshine21-checkpoint_consolidate"
This reverts commit c5053da789, reversing changes made to 0d23d75bc9.
* Revert "Update test_all_gather_grad.py"
This reverts commit 0d23d75bc9.
* Revert "Update utils.py"
This reverts commit 70fe5da9c6.
* Revert "Update utils.py"
This reverts commit a9aae99f6e.
* Revert "Update test_results.py"
This reverts commit ea74906878.
* Revert "Update test_metric_result_integration.py"
This reverts commit bf70e431b3.
* Revert "Update ddp_spawn.py"
This reverts commit f17210183b.
* Revert "checkpoint consolidation"
This reverts commit 536c1323b0.
* Revert "Revert "checkpoint consolidation""
This reverts commit 3a9fde915a.
* Revert "Revert "Revert "checkpoint consolidation"""
This reverts commit 7a369f47e1.
* Revert "Revert "Update ddp_spawn.py""
This reverts commit 8222dc98ea.
* Revert "Revert "Update test_metric_result_integration.py""
This reverts commit 6c095b2370.
* Revert "Revert "Update test_results.py""
This reverts commit 250d0aaaa2.
* Revert "Revert "Update utils.py""
This reverts commit 8651d54d79.
* Revert "Revert "Update test_all_gather_grad.py""
This reverts commit dcdcd29731.
* modify distributed environment to make test pass
* fix version for ddp plugin test
* fix
* fix
* changelog
* Update CHANGELOG.md
* fsdp with full state dict
* fix missing import
* modify unittest
* fix
* fix
* fix typo
* modify test and add changelog
* fix
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* limit max_epoch to 1 for testing
* test
* fix
* update
* testing remove special for multi gpu
* assert gpu
* add assertion for gpu
* fix
* Re-enable special test, use ModelCheckpoint
* Fix paths
* Fix path passing
* test
* test
* fix test
* fix
* pre-commit format
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: SeanNaren <sean@grid.ai>
2021-05-24 08:11:45 +01:00
Xinyao(Alvin) Sun
01109cdf0c
Fix/mismatched toggle optimizer ( #7563 )
* fix: avoid potential mismatched toggling of optimizer
Refs #7405
chore: update CHANGELOG
[pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
fix: resolve a conflict
chore: update changelog
* feat: add a test that fails in master
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* fix typo in tests/trainer/optimization/test_multiple_optimizers.py
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
* Polish tests/trainer/optimization/test_multiple_optimizers.py
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* Polish tests/trainer/optimization/test_multiple_optimizers.py
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* fix: change placeholder in optimizer_step from positional args to keyword args
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-05-23 04:30:28 +02:00
shuyingsunshine21
2242423b75
refactor accelerator teardown -> training type plugin teardown ( #7579 )
2021-05-22 13:19:24 -07:00
Carlos Mocholí
a8d9b5f783
Remove tbptt `self.log` flags and other dead code [5/n] ( #7644 )
2021-05-22 01:13:00 +00:00
Carlos Mocholí
33a1f5271f
[2/N] Define dataclasses for progress tracking ( #7574 )
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
2021-05-22 03:09:08 +02:00
Yifu Wang
8d6e2ff7b2
Improve argument validation for validate(), test(), and predict() ( #7605 )
Co-authored-by: Yifu Wang <yifuwang@2012@gmail.com>
2021-05-21 09:03:16 -07:00
ananthsub
f6d892ac21
[feat] Support custom filesystems in LightningModule.to_torchscript ( #7617 )
* [feat] Support custom filesystems in LightningModule.to_torchscript
* Update CHANGELOG.md
* Update test_torchscript.py
* Update test_torchscript.py
* Update CHANGELOG.md
* Update test_torchscript.py
2021-05-21 11:23:15 +00:00
Carlos Mocholí
e8a46bee15
Remove `Result(minimize)` parameter [4/n] ( #7628 )
2021-05-21 12:58:52 +02:00
Carlos Mocholí
603ef2cf7f
Use `trainer.call_hook` in the evaluation loop ( #7626 )
2021-05-21 11:54:52 +01:00
Carlos Mocholí
3d4dd28bec
Replace `CallbackHookNameValidator` with `FxValidator` [3/n] ( #7627 )
* Refactor FxValidator
* Fix tests
* Fix tests
* Class attribute
* Fix tests
* Better error message
* Fix tests
* Update pytorch_lightning/trainer/connectors/logger_connector/fx_validator.py
2021-05-21 11:54:16 +01:00
i-aki-y
7eafd8eac6
Add run_name argument to the MLFlowLogger constructor ( #7622 )
* Add run_name argument to the MLFlowLogger
* Update CHANGELOG
* Fix unnecessary line
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Fix style by using yapf
* Fix import error when mlflow is not installed
* Update CHANGELOG.md
* Update tests/loggers/test_mlflow.py
Co-authored-by: akiyuki ishikawa <aki.y.ishikwa@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-05-21 09:17:32 +01:00
ananthsub
94ef17ce77
Update model_checkpoint.py ( #7625 )
2021-05-20 23:16:18 +02:00
Andrew Tritt
92cf396de2
Override `broadcast_object_list` for `torch<1.8` ( #7592 )
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-05-20 08:29:55 +00:00
Yifu Wang
ed271905cf
Clear predict_progress_bar in ProgressBar.__getstate__ ( #7608 )
Co-authored-by: Yifu Wang <yifuwang@2012@gmail.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-05-20 01:38:49 +00:00
ananthsub
8266b141ba
[feat] Support time-based checkpointing during training ( #7515 )
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-05-19 22:14:13 +00:00
ananthsub
9f5d4955b6
[1/N] Define dataclasses for progress tracking ( #6603 )
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-05-19 21:02:20 +00:00
Carlos Mocholí
901b2bac98
Unify `current_fx_name` and `current_hook_fx_name` [2/n] ( #7594 )
* Minor logger connector cleanup [1/n]
* Missing line
* Address comments
* Rely on validator
* Unify `current_fx_name` and `current_hook_fx_name`
* Fix test
2021-05-19 20:31:06 +00:00
Carlos Mocholí
dbea5bb710
Add typing to `ModelPruning` callback ( #7529 )
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-05-19 22:01:42 +02:00
Jan-Henrik Lambrechts
608de6abf4
TensorBoardLogger sub_dir parameter for grouping logs ( #6195 )
* fixed a small typo
* cleaning up
* added sub_dir argument to tensorboard and wrote test
* sub dir arg exclusively for tensorboard, linted
* resolving merge conflict
* resolved merge conflict
* resolved merge conflict
* resolved merge conflict
* resolve merge conflict before revert
* resolving merge conflict
* reverted to pre-lint
* added tensorboard sub_dir test
* pep8 formatting
* removed sub_dir arg from test_all function:
* updated feature description
* typo in doc description
* updated CHANGELOG
* Update pytorch_lightning/loggers/tensorboard.py
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
* swapped argument position
* added expandvars tests
* added expandvars
* removed model init
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* fix tests
* fix failed test
* Revert "fix failed test"
This reverts commit 50b34c66da.
* add env var to test
* fix typo in tests
* fix tests
* for test consistency
* fix typo
* fix typo 2
Co-authored-by: Ubuntu <azureuser@devhenrik.evuifrmjd4lepbj4relcwwu5va.ax.internal.cloudapp.net>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Roger Shieh <sh.rog@protonmail.ch>
2021-05-19 19:50:58 +00:00
ananthsub
b4e28e7169
[feat] Add stronger validation for checkpoint_callback argument ( #7539 )
* [feat] Add stronger validation for checkpoint_callback configuration
* chlog
* Update callback_connector.py
* Update test_model_checkpoint.py
* Update pytorch_lightning/trainer/connectors/callback_connector.py
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* Update pytorch_lightning/trainer/connectors/callback_connector.py
* Update tests/checkpointing/test_model_checkpoint.py
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* Update CHANGELOG.md
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-05-19 19:38:08 +00:00
Carlos Mocholí
76ff600898
Minor logger connector cleanup [1/n] ( #7590 )
* Minor logger connector cleanup [1/n]
* Missing line
* Address comments
* Rely on validator
2021-05-19 19:25:32 +00:00
TOKUNAGA Hiroyuki
20f63377f8
Fix the condition for calling update_learning_rates ( #7032 )
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-05-17 17:20:42 +02:00
Adrian Wälchli
502adbced3
refactor optimizer loop logic for manual and automatic optimization ( #7526 )
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
2021-05-17 14:42:01 +02:00
Kaushik B
bf46730d92
Support TPU Pod Training (n/n) ( #7296 )
2021-05-17 11:33:44 +00:00
Nic Eggert
f4f51e0dcf
Add kubeflow cluster environment ( #7300 )
* Add kubeflow cluster environment
* Add KubeflowEnvironment to docs
* Add KubeflowEnvironment to the changelog
* break up a long line
* Add method to detect kubeflow environment
* Select Kubeflow environment when available
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Run pre-commit
* task_idx == 0
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-05-17 09:05:24 +01:00
Adrian Wälchli
6e6e29af49
remove trainer hidden state | sanity refactor [2 / n] ( #7507 )
2021-05-17 08:57:15 +01:00
Mauricio Villegas
d0081778f8
Enable fsspec by default for cli config file ( #7521 )
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-05-17 08:53:00 +01:00
Alan Du
6ac16ff348
Fix DistribType for `ddp_cpu` (spawn) ( #7492 )
2021-05-14 20:53:26 +01:00
Rohit Gupta
7ca41734da
Add `dataloader_idx` to batch transfer hooks ( #6241 )
* replace with kwargs
* chlog
* fix
* add test
* fix
* device
* deepspeed
* pep
* optional
* docs
* bc
* comments
* pep
* mypy
* pep
* Apply suggestions from code review
* kwargs
* docs
* .
* .
* 1.3 -> 1.4
* kwargs -> step_kwargs
2021-05-13 23:03:55 +05:30
Carlos Mocholí
a584196abf
Default `seed_everything(workers=True)` in the `LightningCLI` ( #7504 )
2021-05-13 12:18:03 +02:00
Adrian Wälchli
dd1a17b071
Refactor result handling in training loop ( #7506 )
* refactor results
* rename dic -> dict
* simplify
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* changelog
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* fix None check
* chlog wording
* move process_closure_result to the end
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-05-13 09:30:34 +01:00
Jirka Borovec
298f9e5c2d
Prune deprecated utils modules ( #7503 )
* argparse_utils
* model_utils
* warning_utils
* xla_device_utils
* chlog
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-05-13 07:24:42 +00:00
Jirka Borovec
946aee0c7b
prune data parallel ( #7510 )
2021-05-13 06:23:02 +01:00
Carlos Mocholí
072ad52b6b
Add `trainer.predict(ckpt_path)` ( #7430 )
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-05-13 01:49:58 +02:00
Jirka Borovec
d4ec75164c
Prune deprecated trainer attributes ( #7501 )
* use_single_gpu
* use_horovod
* use_ddp2
* use_ddp
* use_dp
* on_gpu
* use_tpu
* on_tpu
* on_cpu
* cleaning
* chlog
* Apply suggestions from code review
* Apply suggestions from code review
2021-05-12 20:10:15 +00:00
Jirka Borovec
96981091c7
Prune deprecated classif. metrics ( #7499 )
* stat_scores_multiple_classes
* precision_recall
* precision
* recall
* auc
* auroc
* multiclass_auroc
* iou
* clean-up
* chlog
* flake8
* imports
* prune
2021-05-12 18:03:34 +00:00
Jirka Borovec
140b0c727e
Prune deprecated trainer attributes 2 ( #7502 )
* accelerator_backend
* get_model
* clean
* chlog
* flake8
2021-05-12 10:19:30 -07:00
Carlos Mocholí
83283fdb20
Fix yapf-isort conflict ( #7500 )
2021-05-12 15:44:57 +02:00
Federico Simonetta
8cdbd03d02
MLFlow now uses env variable as default tracking uri ( #7457 )
* Clarify logger flag
Clarify behavior of boolean values on the logger flag for Trainer.
* Update docs/source/common/trainer.rst
* doc
* MLFlow now uses env variable as default tracking uri
Solves https://github.com/PyTorchLightning/pytorch-lightning/issues/6894
* Update pytorch_lightning/loggers/mlflow.py
Co-authored-by: thomas chaton <thomas@grid.ai>
* changelog
Co-authored-by: SpontaneousDuck <kennywitham4@gmail.com>
Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: jirka <jirka.borovec@seznam.cz>
2021-05-12 11:26:57 +02:00
Christopher Ehmann
b9a52fa2ef
added stage param to LightningDataModule.setup example ( #7483 )
Co-authored-by: Sileadim <christopher@omnius.com>
2021-05-11 23:43:22 +05:30
shuyingsunshine21
8538c1f61e
Accelerator model state dict ( #7474 )
* Fix some test errors
Summary:
Test Plan:
Reviewers:
Subscribers:
Tasks:
Tags:
* checkpoint consolidation
* Update ddp_spawn.py
* Update test_metric_result_integration.py
* Update test_results.py
* Update utils.py
* Update utils.py
* Update test_all_gather_grad.py
* Update test_all_gather_grad.py
* Update test_results.py
* Revert "Update test_results.py"
This reverts commit 9d4a2b891d.
* Revert "Merge pull request #1 from shuyingsunshine21/shuyingsunshine21-checkpoint_consolidate"
This reverts commit c5053da789, reversing changes made to 0d23d75bc9.
* Revert "Update test_all_gather_grad.py"
This reverts commit 0d23d75bc9.
* Revert "Update utils.py"
This reverts commit 70fe5da9c6.
* Revert "Update utils.py"
This reverts commit a9aae99f6e.
* Revert "Update test_results.py"
This reverts commit ea74906878.
* Revert "Update test_metric_result_integration.py"
This reverts commit bf70e431b3.
* Revert "Update ddp_spawn.py"
This reverts commit f17210183b.
* Revert "checkpoint consolidation"
This reverts commit 536c1323b0.
* Revert "Revert "checkpoint consolidation""
This reverts commit 3a9fde915a.
* Revert "Revert "Revert "checkpoint consolidation"""
This reverts commit 7a369f47e1.
* Revert "Revert "Update ddp_spawn.py""
This reverts commit 8222dc98ea.
* Revert "Revert "Update test_metric_result_integration.py""
This reverts commit 6c095b2370.
* Revert "Revert "Update test_results.py""
This reverts commit 250d0aaaa2.
* Revert "Revert "Update utils.py""
This reverts commit 8651d54d79.
* Revert "Revert "Update test_all_gather_grad.py""
This reverts commit dcdcd29731.
* modify distributed environment to make test pass
* modify model state dict to training type plugin
* remove changes
* add changelog
* fixing isort for pre-commit failure
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Address code review
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: SeanNaren <sean@grid.ai>
2021-05-11 16:39:04 +01:00
Justus Schock
7b283e3c46
Bugfix/Multiple dataloaders ( #7433 )
* Update supporters.py
* Update apply_func.py
* Update supporters.py
* Update model_train_dataloaders.py
* Update model_train_steps.py
* Update test_dataloaders.py
* Update CHANGELOG.md
* Update model_train_steps.py
* Update test_dataloaders.py
* Update test_dataloaders.py
* Update supporters.py
* Update test_supporters.py
* Apply suggestions from code review
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Update tests/trainer/test_dataloaders.py
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
* Apply suggestions from code review
Co-authored-by: Edgar Riba <edgar.riba@gmail.com>
* Update supporters.py
* Update supporters.py
* Apply suggestions from code review
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
Co-authored-by: Edgar Riba <edgar.riba@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-05-11 16:33:29 +02:00
ananthsub
fdf50a5e4b
Mark certain Trainer APIs as protected ( #7420 )
2021-05-11 11:53:41 +02:00
Adrian Wälchli
ad9118f04a
remove trainer hidden state | sanity refactor [1 / n] ( #7437 )
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-05-11 11:09:08 +02:00
David Fidalgo
4a1134db64
Log epoch metrics before firing the `on_evaluation_end` hook ( #7272 )
* Log epoch metrics before firing the `on_evaluation_end` hook (addresses #7166 )
* test that epoch metrics are logged before `on_evaluation_end` hook
* update CHANGELOG
* Shorter test
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-05-11 10:54:31 +02:00
Carlos Mocholí
b65ae79478
Automatically check `DataModule.has_{setup,teardown,prepare_data}` [2/2] ( #7238 )
* Automatically check `DataModule.has_{setup,teardown,prepare_data}`
* Use variable
* Spacing
* Docs
* Update CHANGELOG
* Remove `_DataModuleWrapper`
* Add test
* Update docs/source/extensions/datamodules.rst
* Bad merge
* add test for invalid name
* Remove ValueError
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-05-11 10:53:00 +02:00
Adrian Wälchli
6bc616d78f
fix display bug ( #7395 )
2021-05-10 11:26:15 +08:00
shuyingsunshine21
987530cd38
Set `num_nodes` and `sync_batchnorm` From Trainer for Manually Passed Training Type Plugin ( #7026 )
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-05-08 11:25:51 +00:00
Akihiro Nitta
710b144b9b
Restore `trainer.current_epoch` after tuning ( #7434 )
* Add a test
* Save and restore current_epoch
* Update CHANGELOG
* alphabetical order
2021-05-08 07:15:52 +02:00
Ethan Harris
45143fd825
Improve val step logging ( #7351 )
* Fix val step logging
* Add a type
* Fix
* Update CHANGELOG.md
2021-05-07 22:58:03 +00:00
ananthsub
f9e050c5e5
Move DP warning suppression to the DataParallel Plugin ( #7421 )
2021-05-07 23:02:44 +02:00
ananthsub
fecce50355
Deprecate TrainerModelHooksMixin ( #7422 )
* Deprecate TrainerModelHooksMixin
* Update CHANGELOG.md
* Update model_hooks.py
* Update model_hooks.py
2021-05-07 13:19:36 -07:00
Carlos Mocholí
8208c330eb
Use `torch.nn.utils.clip_grad_norm_` and add `clip_grad_by_value` support for TPU ( #7025 )
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-05-07 16:41:39 +00:00
Carlos Mocholí
9ba76ce60c
Unify `configure_optimizers` docs ( #7399 )
2021-05-07 16:10:24 +02:00
Leonard Lausen
98b94b810c
Fix DeepSpeedPlugin with IterableDataset ( #7362 )
* deepspeed add train_micro_batch_size_per_gpu argument
* Update naming and doc
* Modify to use auto naming convention, add test
* Add iterable tests
* Fix tests, attempt by mocking
* Import correct package
* Fix comparison
* Set as special test
* Remove import
* Add Changelog
Co-authored-by: SeanNaren <sean@grid.ai>
2021-05-07 10:46:03 +01:00
Jirka Borovec
28103c67c2
show must go on ( #7413 )
* chlog + version
* readme
* .
2021-05-06 19:06:21 -04:00
Jirka Borovec
b181b8c646
release 1.3.0 ( #7404 )
* v1.3.0
* ci event
* chlog
* badge
* formatting
2021-05-06 15:05:35 -04:00
Gyeongjae Choi
d9bdc56b6a
Add _gpus_arg_default in argparse_utils for backward compatibility ( #7402 )
2021-05-06 13:35:12 +00:00
Jirka Borovec
d52e0a8f3e
v0.1.3.0rc3 + changelogs ( #7388 )
* v0.1.3.0rc3
* spaces
* wip
* wip
* wip
* wip
* prune
* wip
* wip
* Apply suggestions from code review
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-05-06 07:28:10 -04:00
Martin Kristiansen
c3fc0313ef
Updating docs and error message: half precision not available on CPU ( #7384 )
* Updating docs and error message to specify that half precision is not available on CPU
* update messages
Co-authored-by: Martin Kristiansen <martinkristiansen@sixgill.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: jirka <jirka.borovec@seznam.cz>
2021-05-06 09:05:50 +00:00
Carlos Mocholí
6ad05d3338
Update `configure_optimizers` docs ( #7390 )
* Update `configure_optimizers` docs
* Update pytorch_lightning/core/lightning.py
2021-05-06 10:39:01 +02:00
ananthsub
651f93a69f
Add documentation for ways to access all batch outputs for on_train_epoch_end hook ( #7389 )
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-05-05 22:18:45 +00:00
ananthsub
7b45bcfedb
[2/2] Remove outputs from evaluation epoch end hooks ( #7338 )
* Remove outputs from on_train_epoch_end
* iterate
* Update callback_hook.py
* update
* early stop?
* fix
* Update pytorch_lightning/trainer/training_loop.py
Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
* Update trainer.py
* update
* Update training_loop.py
* early stop?
* fix
* Remove outputs from evaluation epoch end hooks
* update
* Update test_remove_1-5.py
* fix lints
* Update base.py
* rm-outputs
* Update evaluation_loop.py
* try-save-more-memory
* Update trainer.py
* Update trainer.py
* cache-at-start
* Update evaluation_loop.py
* Update training_loop.py
* Update training_loop.py
Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
2021-05-05 19:50:58 +00:00
ananthsub
6104a6316a
[1/2] Deprecate `outputs` in `on_train_epoch_end` hooks ( #7339 )
* Remove outputs from on_train_epoch_end
* iterate
* Update callback_hook.py
* update
* Update training_loop.py
* Update test_training_loop.py
* early stop?
* fix
* update tests
* Update test_hooks.py
* Update pytorch_lightning/trainer/callback_hook.py
Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
* Update pytorch_lightning/trainer/training_loop.py
Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
* Update trainer.py
* Update pytorch_lightning/trainer/trainer.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-05-05 17:18:16 +02:00
ananthsub
98670c83a9
Deprecate`truncated_bptt_steps` flag on Trainer in favor of same setting on the LightningModule ( #7323 )
* deprecate-tbptt-trainer
* Update CHANGELOG.md
* Update lightning.py
* test
* Update lightning.py
* Update training_loop.py
* Update training_loop.py
* Update lightning.py
* Update training_loop.py
* Update training_loop.py
* update docs
* Update accelerator.py
* Update accelerator.py
* more docs
* tweaks
* chlog
* comments
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-05-05 11:21:00 +01:00
Kaushik B
e21b7a62d7
Add ddp_find_unused_parameters_false to Registry ( #7224 )
2021-05-04 22:40:00 +00:00
Carlos Mocholí
374ff750f5
Pass `current_epoch`/`global_step` as monitor candidates [1/2] ( #7344 )
* Pass `current_epoch`/`global_step` as monitor candidates
* Formatting
* Fix deprecated test
* Update CHANGELOG
2021-05-04 16:05:40 -04:00
Ethan Harris
2a740ebe77
Fix support for dataloader with None batches ( #7342 )
* Fix Dataloader None batch
* Fix Dataloader None batch
* Update CHANGELOG.md
* Fix breaking test
* Address comments
2021-05-04 12:24:03 +00:00
ramonemiliani93
5db832f181
Fix auto scaling mode when calling tune method on trainer. ( #7321 )
* Add test for non-existing mode, the test should fail if something different from `power` or `binsearch` is passed.
* Add newline.
* Apply fix
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* Update tests/tuner/test_scale_batch_size.py
* Update pytorch_lightning/tuner/batch_size_scaling.py
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-05-04 12:03:51 +00:00
ananthsub
69cf63e2fd
Update trainer.py ( #7340 )
2021-05-04 11:11:27 +00:00
Carlos Mocholí
8c0ea92af2
`TrainerState` refactor [5/5] ( #7173 )
* `TrainerState` refactor
* flake8
* Update finished check
* Test cleanup
* Fix tests
* Fixes
* Reorder
* flake8
* Update CHANGELOG
* Better docs
* Better docs
* Remove default
* Update tests
* Bad merge
2021-05-04 12:50:56 +02:00
Adrian Wälchli
a6aa1a0f82
make gpus=str in Trainer consistent with command line parsing of string ( #6388 )
* string gpu input
* update docs
* deprecation warning
* Revert "update docs"
This reverts commit c5f3893413.
* deprecation
* add changelog
* update parser
* update warning
* implement v1.5 behavior ahead of time
* formatting
* set accelerator in test to avoid different warning
* add warning
* remove todo warn
* Update pytorch_lightning/utilities/device_parser.py
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
* resolve flake8
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: tchaton <thomas@grid.ai>
2021-05-04 09:56:27 +00:00
Boris Dayma
2a20102321
fix(wandb): allow custom init args ( #6989 )
* feat(wandb): allow custom init args
* style: pep8
* fix: get dict args
* refactor: simplify init args
* test: test init args
* style: pep8
* docs: update CHANGELOG
* test: check default resume value
* fix: default value of anonymous
* fix: respect order of parameters
* feat: use look-up table for anonymous
* yapf formatting
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-05-04 09:45:36 +00:00
Hemil Desai
82c19e1444
Update LR schedulers only when their corresponding Optimizer is being… ( #4868 )
* Update LR schedulers only when their corresponding Optimizer is being used.
In the case when optimizer frequencies are specified,
the LR scheduler corresponding to a particular optimizer is updated
only when that optimizer is being used in the training loop or epoch.
* pep8speak fixes
* Fix failing tests
* Add docs
* PR Feedback
* Apply suggestions from code review
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
* formatting fix
* PR Feedback - part 2
* More PR feedback
* Apply suggestions from code review
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
* Add typing imports
* Stronger tests and fixes related to that
* Add more tests plus PR feedback
* Make optimizer_freq_cumsum a cached property
@cached_property is only available after Python 3.8 so had to do it manually.
* Fix tests
* Apply suggestions from code review
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* Avoid mutable defaults
* Parametrize lr scheduling tests
* PR feedback
* Apply suggestions from code review
* spell
* Apply suggestions from code review
* flake8
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2021-05-04 09:37:40 +00:00
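The optimizer-frequency behavior described in #4868 above can be sketched as follows. This is a minimal, hypothetical illustration of the cycling logic, not Lightning's actual implementation; the function names are invented for the example.

```python
from itertools import accumulate

def active_optimizer_index(batch_idx, frequencies):
    """Which optimizer is active for this batch, cycling through the
    given per-optimizer frequencies (e.g. [2, 1] -> opt0, opt0, opt1, ...)."""
    pos = batch_idx % sum(frequencies)
    for i, bound in enumerate(accumulate(frequencies)):
        if pos < bound:
            return i

# With frequencies [2, 1]: optimizer 0 runs for 2 batches, optimizer 1 for 1.
assert [active_optimizer_index(b, [2, 1]) for b in range(6)] == [0, 0, 1, 0, 0, 1]

def step_schedulers(batch_idx, frequencies, schedulers):
    """Per the commit: only the scheduler of the currently active optimizer steps."""
    schedulers[active_optimizer_index(batch_idx, frequencies)].step()
```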
Carlos Mocholí
3fdb61ac1b
Replace `_DataModuleWrapper` with `__new__` [1/2] ( #7289 )
...
* Remove `_DataModuleWrapper`
* Update pytorch_lightning/core/datamodule.py
* Update pytorch_lightning/core/datamodule.py
* Replace `__reduce__` with `__getstate__`
2021-05-04 08:00:24 +00:00
Leonard Lausen
597b309f2e
Fix `Trainer.plugins` type declaration ( #7288 )
...
* Fix trainer.plugins type declaration
* Don't ClusterEnvironment(Plugin)
* fix import error, yapf formatter
* Add test
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-05-04 08:42:57 +02:00
SpontaneousDuck
f135debb6a
Clarify logger flag ( #7190 )
...
* Clarify logger flag
Clarify behavior of boolean values on the logger flag for Trainer.
* Update docs/source/common/trainer.rst
* doc
Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-05-04 00:21:28 +00:00
Daniel Mesejo-León
6da747e775
Deprecate `LightningModule.datamodule` reference in favor of the trainer one ( #6929 ) ( #7168 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-05-04 00:01:41 +00:00
Adrian Wälchli
3e8db4142b
add forgotten test in #7240 ( #7283 )
...
^
2021-05-03 23:56:30 +00:00
Kaushik B
6d7c6d6403
Update Accelerator Connector for Registry ( #7214 )
2021-05-03 21:03:21 +00:00
ananthsub
b7a444883c
Remove model.trainer call inside of dataloading mixin ( #7317 )
...
* Update data_loading.py
* Update data_loading.py
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-05-03 13:53:54 -07:00
Mauricio Villegas
78a6fd5588
Example and documentation for LightningCLI linking model and data arguments ( #7299 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-05-03 20:45:46 +00:00
Adrian Wälchli
bf1394a472
improve early stopping verbose logging ( #6811 )
2021-05-03 20:20:48 +00:00
ananthsub
14c552bb92
[bugfix] Fix dataloading for iterable datasets and limit_train_batches ( #7306 )
...
* bugfix-dataloading
* rm-logs
* Update CHANGELOG.md
* Update test_dataloaders.py
* Update test_dataloaders.py
* Update training_loop.py
* Update test_dataloaders.py
* Update CHANGELOG.md
* Update CHANGELOG.md
* Update test_dataloaders.py
* Update training_loop.py
* Update training_loop.py
* comments
* address comments
* more tests
* Update progress.py
* Update test_dataloaders.py
* Update test_dataloaders.py
* Update training_loop.py
* Update training_loop.py
* test ckpt fix?
* update again
2021-05-03 19:50:26 +01:00
ananthsub
39274273a4
Update accelerator.py ( #7318 )
2021-05-03 11:17:26 -04:00
Carlos Mocholí
badd0bba30
Move trainer functions ( #7295 )
2021-05-03 09:26:38 -04:00
Adrian Wälchli
e0c64f0ef6
Fix Adagrad optimizer not working with DDP/GPU ( #7277 )
...
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
2021-05-03 03:57:17 +05:30
Kaushik B
490cc57809
Device updates for TPU Pod ( #7243 )
2021-04-30 23:14:06 +05:30
thomas chaton
16d6c9828d
[bugfix] Apex never instantiated. ( #7274 )
...
* update
* update
* update apex
* update
* update
* update
* remove test.py
* update
* update
* update on comments
* update changelog
* update
* update
* typo
2021-04-30 13:16:28 -04:00
ananthsub
44fd01734c
Move grad_norm to a dedicated utilities file ( #7292 )
...
* rm-grad-norm-mixin
* Update grads.py
* Update CHANGELOG.md
* Apply suggestions from code review
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* Update docstrings
* Update __init__.py
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-04-30 09:19:22 -07:00
ananthsub
e407edba36
[fix] Attach train+val dataloaders to trainer in trainer loop ( #7207 )
...
* Update training_loop.py
* Update test_dataloaders.py
* changelog
* delay reload
* go back
* comments
* Update training_loop.py
* Update test_dataloaders.py
* Update tests/trainer/test_dataloaders.py
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-04-30 09:01:31 -07:00
thomas chaton
80b9ca0e38
[bugfix] Add reloading support using BaseFinetuning ( #7253 )
...
* update
* wip
* update
* update
* update
* update
* resolve bug
* update on comments
* update on comments
* update
* update
* formatting
* add comments
* update on comments
* update
* Update pytorch_lightning/callbacks/base.py
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* update
* update
* Typing and minor changes
* Refactor
* Fix deprecated test
* Broken commit
* Fix broken commit
* flake8
* Update CHANGELOG
* update on comments
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-04-30 11:14:43 -04:00
Carlos Mocholí
5af086ab9f
Attach data refactor and tuner bugs [4/n] ( #7258 )
...
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-04-30 13:54:58 +00:00
Adrian Wälchli
ea2287e723
update training type plugin docs regarding result caching ( #7261 )
...
* add docs
* typo
* update
2021-04-30 13:03:10 +00:00
Adrian Wälchli
b9b3fa371f
fix case where an IterableDataset doesn't produce a batch for an epoch ( #7294 )
...
* wip
* fix
* add test
* refactor + test
* rm
* formatting
* update changelog
* doc
* docstring
* remove unused import
* Update CHANGELOG.md
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-04-30 12:45:55 +00:00
ananthsub
969e857690
Rename `trainer._launch` to `trainer._run` ( #7265 )
...
* rename-run
* fix
2021-04-30 13:39:02 +01:00
Adrian Wälchli
8232de427a
fix save_hyperparameters(container) if container is empty ( #7268 )
...
* fix
* add tests
* changelog
* fix test
2021-04-30 13:38:42 +01:00
Kaushik B
ac92b57e2b
No need of warning when saved callback_states is None ( #7293 )
2021-04-30 10:48:53 +00:00
ananthsub
338f5a3311
Remove exp_save_path on the LightningModule ( #7266 )
...
* deprecate-exp-save-path
* Update lightning.py
* Update CHANGELOG.md
* remove-not-deprecate
2021-04-29 17:44:04 -04:00
Adrian Wälchli
b6706470c1
fix fast_dev_run parsing from cli ( #7240 )
2021-04-30 01:16:20 +05:30
ananthsub
14b8dd479a
[2/2] Remove training loop force calling early stopping callback ( #7069 )
...
* rebase
* doc
* Update training_loop.py
* Update CHANGELOG.md
* Update CHANGELOG.md
* Update CHANGELOG.md
* Update CHANGELOG.md
* Update CHANGELOG.md
2021-04-29 09:14:53 -07:00
Carlos Mocholí
a5ac3f8a16
Code cleaning in preparation for #7258 [3/n] ( #7262 )
2021-04-29 14:40:51 +02:00
thomas chaton
848288c8d8
[warning] Add a warning with missing callback with resume_from_checkpoint ( #7254 )
...
* add a warning
* add changelog
2021-04-29 12:39:45 +00:00
George
e272bea4dc
Updated `ModelCheckpoint` documentation ( #6873 )
...
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-04-28 23:56:58 +00:00
ananthsub
075de9356c
Reset current_fx properties on lightning module in teardown ( #7247 )
...
* Update trainer.py
* cleanup module properties in teardown
* Update test_trainer.py
* Update lightning.py
* Formatting
* flake8
* Update pytorch_lightning/trainer/trainer.py
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-04-28 12:17:20 -07:00
Carlos Mocholí
40f80230fe
Remove `trainer.fit` return value [2/n] ( #7237 )
...
* `_fit_impl` refactor and types
* Fix return
* Remove return docstring
* Fixes
* Fixes
* Remove `trainer.fit` return value
* Update CHANGELOG
* flake8
* Undo results change
* Fix test
* Revert changes for a separate PR
* flake8
2021-04-28 19:11:32 +01:00
Carlos Mocholí
bdc4272e99
`_launch` refactor and types [1/n] ( #7232 )
2021-04-28 17:41:08 +02:00
ananthsub
947d1cb757
[1/2] Add support for early stopping during training epoch end ( #6944 )
...
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: jirka <jirka.borovec@seznam.cz>
2021-04-28 15:18:56 +02:00
Vaibhav Balloli
ccd87cadfc
Changes resume_from_checkpoint warning to error ( #7075 )
...
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-04-28 15:03:29 +02:00
Ethan Harris
d123aaa6a1
Update fsspec dependency and remove un-needed code ( #7210 )
...
* Update fsspec dep and remove un-needed code
* Remove unused import
2021-04-28 09:10:46 +01:00
Ali Benkassou
cbc6e30b5d
Replace 'step' with 'global_step' ( #7244 )
2021-04-28 06:44:11 +00:00
Kaushik B
94fcaaf5d7
Add `debug` flag to TPU Training Plugins (PT_XLA_DEBUG) ( #7219 )
2021-04-27 20:34:25 +00:00
thomas chaton
e76ebd640e
[feat] Add BasePredictionWriter 3/3 ( #7127 )
...
* wip
* update
* update
* update
* update
* update
* typo
* update on comments
* update
* update
* update
* update
* update changelog
* update
* Fix merge
* Fix merge
* move code
* resolve test
* add extra test
* add an extra test
* update on comments
* add typing
* resolve flake8
* Refactor and Docs
* Fix tests
* Fix tests
* Fix tests
* Duplicate
* Fix tests
* resolve bug
* update
* update on comments
* Update pytorch_lightning/utilities/imports.py
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* Update pytorch_lightning/utilities/device_parser.py
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* update
* update
* update
* update on comments
* resolve flake8
* update test
* Apply suggestions from code review
* update on comments
* Update pytorch_lightning/callbacks/prediction_writer.py
Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
* Update pytorch_lightning/callbacks/prediction_writer.py
Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
* Update pytorch_lightning/callbacks/prediction_writer.py
Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
* update on comments
* update
* update on comment
* Apply suggestions from code review
* update
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-04-27 20:23:55 +00:00
Kaushik B
c6d9f52cb3
Add a check for TPU Spawn barrier ( #7241 )
2021-04-27 19:45:55 +00:00
thomas chaton
5a113a2f05
[bug/feat] Support parameters_to_ignore in DDP ( #7239 )
...
* update
* update
* update
* update on comments
* update
2021-04-27 17:49:32 +00:00
Seongmin Park
7fe8d18477
Do not `shuffle` in `LightningDataModule.from_datasets` for `IterableDataset` ( #7053 )
...
* Expose shuffle argument in LightningDataModule.from_datasets
* Add test for DataModule initialization with iterable datasets
* Add changelog
* Remove trailing whitespace
* Add more tests for coverage
* Fix sequence dataset coverage
* Fix Sequence dataset tests
* Directly check whether each passed dataset is an IterableDataset
* Expose shuffle argument in LightningDataModule.from_datasets
* Add test for DataModule initialization with iterable datasets
* Add changelog
* Remove trailing whitespace
* Add more tests for coverage
* Fix sequence dataset coverage
* Fix Sequence dataset tests
* Directly check whether each passed dataset is an IterableDataset
* Fix changelog to reflect review direction
* Update CHANGELOG.md
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* Fix changelog to reflect review direction (2)
* Add suggested braces
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* Reuse isinstance check
* Merged tests with parametrize. Use mocks
Co-authored-by: Seongmin Park <seongmin.park@actionpower.kr>
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-04-27 12:53:49 -04:00
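The check described in #7053 above can be sketched like this. A minimal, hypothetical stand-in (no real torch classes, invented helper name): shuffling is only requested for map-style datasets, since an `IterableDataset` does not support a shuffling sampler.

```python
class IterableDataset:
    """Stand-in for torch.utils.data.IterableDataset."""
    pass

def dataloader_kwargs(dataset, shuffle=True):
    """Only pass shuffle=True through when the dataset is map-style;
    IterableDataset instances cannot be shuffled via a sampler."""
    return {"shuffle": shuffle and not isinstance(dataset, IterableDataset)}

assert dataloader_kwargs([1, 2, 3])["shuffle"] is True
assert dataloader_kwargs(IterableDataset())["shuffle"] is False
```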
ananthsub
bab7225507
[fix] Add barriers before and after setup hook is run ( #7202 )
...
* Update data_connector.py
* move-barrier
* Update trainer.py
* Update ddp.py
* changelog
* Spacing
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-04-27 17:19:43 +01:00
thomas chaton
f920ba29f2
[bugfix] Metric not logged properly in manual optimization ( #7228 )
...
* resolve bug
* update changelog
* typo
* Update tests/trainer/optimization/test_manual_optimization.py
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* Apply suggestions from code review
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-04-27 09:16:51 -04:00
thomas chaton
e147127c0e
[feat] Add better support for predict + ddp 2/3 ( #7215 )
...
* wip
* update
* update
* update
* update
* update
* typo
* update on comments
* update
* update
* update
* update
* update changelog
* update
* Fix merge
* Fix merge
* move code
* resolve test
* add extra test
* add an extra test
* update on comments
* add typing
* resolve flake8
* Refactor and Docs
* Fix tests
* Fix tests
* Fix tests
* Duplicate
* Fix tests
* resolve bug
* update
* update on comments
* update
* update changelog
* update
* update
* remove tpu
* resolve flake8
* update on comments
* update on comments
* update on comment
* resolve flake8
* add a cpu test for predict
* add None test
* update
* Update CHANGELOG.md
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* resolve tests
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-04-27 08:46:45 -04:00
Carlos Mocholí
ca6c87ffbe
Add back `clip_gradients(model)` ( #7231 )
2021-04-27 11:34:02 +00:00
Adrian Wälchli
3b36d81c03
Fixed `num_sanity_val_steps` affecting reproducibility of training data shuffling ( #7014 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-04-27 09:51:39 +00:00
Kaushik B
5cf9afa176
Add fairscale install msg for Sharded Plugins ( #7213 )
2021-04-27 08:22:44 +00:00
shuyingsunshine21
52a5cee0a7
Set smarter default for DDP sharded for performance optimization ( #6937 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-04-27 04:01:34 +05:30
ananthsub
dd5ec75e48
Deprecate save_function from model checkpoint callback ( #7201 )
...
* Update model_checkpoint.py
* Update CHANGELOG.md
* fix-tests
* deprecate not remove
* Update model_checkpoint.py
* Update test_remove_1-5.py
2021-04-26 17:55:26 +01:00
Alessio Bonfiglio
ac7d6a35c3
Fix `NeptuneLogger.log_text(step=None)` ( #7194 )
2021-04-26 15:28:55 +02:00
Kaushik B
6be0a859db
Update teardown for TPU acc ( #7211 )
2021-04-26 13:30:46 +01:00
ananthsub
bc3f08b0e3
[fix] Add barrier to accelerator's teardown ( #6814 )
2021-04-26 09:23:29 +00:00
ananthsub
68eac4d948
Enforce Lightning module as source of truth for automatic optimization ( #7130 )
...
* make lightning module source of truth for automatic optimization
* Update configuration_validator.py
* Update model_connector.py
* rm-references
* Update CHANGELOG.md
* Update CHANGELOG.md
Co-authored-by: jirka <jirka.borovec@seznam.cz>
2021-04-26 05:36:26 +00:00
Kaushik B
44d775fccf
Update Error message for ProfileConnector ( #7204 )
...
* Update Error message for ProfileConnector
* Update test
2021-04-25 11:37:21 -07:00
ananthsub
31fcd7d0ab
Deprecate write_predictions on the LightningModule ( #7066 )
...
* deprecate-write-predictions
* Update CHANGELOG.md
* Update test_remove_1-5.py
Co-authored-by: thomas chaton <thomas@grid.ai>
2021-04-25 16:54:56 +00:00
ananthsub
b3fe836656
Move metrics_to_scalars to a dedicated utilities file ( #7180 )
...
* rm-trainer-logging
* Update CHANGELOG.md
* Update metrics.py
* Update logging.py
* Update metrics.py
2021-04-24 10:25:33 +01:00
thomas chaton
f58865aada
Properly set `LightningModule.device` after model replacement ( #7188 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-04-23 16:36:52 +02:00
Sean Naren
8439aead66
Update FairScale on CI ( #7017 )
...
* Try updating CI to latest fairscale
* Update availability of imports.py
* Remove some of the fairscale custom ci stuff
* Update grad scaler within the new process as reference is incorrect for spawn
* Remove fairscale from mocks
* Install fairscale 0.3.4 into the base container, remove from extra.txt
* Update docs/source/conf.py
* Fix import issues
* Mock fairscale for docs
* Fix DeepSpeed and FairScale to specific versions
* Swap back to greater than
* extras
* Revert "extras"
This reverts commit 7353479f
* ci
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: jirka <jirka.borovec@seznam.cz>
2021-04-23 12:37:00 +01:00
Akihiro Nitta
92af363270
Fix `lr_finder` suggesting too high learning rates ( #7076 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-04-23 10:59:40 +00:00
Adrian Wälchli
d534e53ec4
add missing predict docs ( #7150 )
...
* update docs
* add datamodule predict
* fix docs
* typo
2021-04-23 10:38:44 +00:00
Tharindu Hasthika
c502e47abf
Fixed setting of _save_dir when run initiated externally ( #7106 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-04-23 01:14:46 +00:00
Jirka Borovec
f48ac62334
fix pip install ( #7170 )
2021-04-22 16:48:11 -04:00
Jirka Borovec
aa7d3dc6cc
Fix `torchmetrics` compatibility ( #7131 )
...
* get_num_classes
* tmp
* fix one test
* fix deprecated tests
* fix deprecate
* pep8
* deprecate 0.3
* wip
* wip
* HaCK
* branch
* branch
* format
* Apply suggestions from code review
* prune
* rev
* multilabel
* Apply suggestions from code review
* master
* rev
* .
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
2021-04-22 20:45:46 +00:00
Jirka Borovec
ef5feac7ba
fix version + yapf ( #6999 )
2021-04-22 18:25:51 +00:00
Carlos Mocholí
33066f8fd9
Add `on_predict_{batch,epoch}_{start,end}` and `Callback.on_predict_{start,end}` ( #7141 )
...
* Update hooks typing and predict hooks
* Update CHANGELOG
* Progress
* Progress
* Add back `on_predict_{start,end}`
* Typing and fix
* Update tests/trainer/logging_/test_logger_connector.py
* Update tests/callbacks/test_lambda_function.py
2021-04-22 10:05:28 -04:00
ananthsub
3f1a08ab00
Fix mypy checks for double precision plugin ( #7151 )
2021-04-22 11:29:38 +01:00
thomas chaton
99b9dfa883
[bugfix] Remove warning for distributed values ( #7132 )
...
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: jirka <jirka.borovec@seznam.cz>
2021-04-22 02:14:46 +02:00
Carlos Mocholí
345e9a0245
Fix argparse docs ( #7148 )
2021-04-22 02:13:00 +02:00
Sean Naren
ce14565ed9
[FSDP] Move on save checkpoint outside of zero check ( #7134 )
...
* Move on save checkpoint outside of zero check
* Remove unnecessary override
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-04-22 01:54:47 +02:00
ananthsub
2f84459d26
Broadcast dirpath for tighter consistency in model checkpoint callback ( #6978 )
...
* Update model_checkpoint.py
* Update model_checkpoint.py
* Update model_checkpoint.py
2021-04-21 10:20:27 -07:00
thomas chaton
013756404b
[bugfix] Add set_default_tensor_type to torch.DoubleTensor with precision=64 ( #7108 )
...
* update
* Update pytorch_lightning/plugins/precision/double.py
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* Update pytorch_lightning/plugins/precision/double.py
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* Update pytorch_lightning/plugins/precision/double.py
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* resolve tests
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-04-20 15:25:37 +00:00
thomas chaton
ca21da4f3b
Move save_hyperparameters to its own function ( #7119 )
...
* move hyper_parameters
* Update pytorch_lightning/core/lightning.py
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* Update pytorch_lightning/utilities/parsing.py
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* resolve flake8
* update
* resolve tests
* Update pytorch_lightning/core/lightning.py
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-04-20 11:04:35 -04:00
Kaushik B
f168a535ca
Add MpModelWrapper in TPU Spawn ( #7045 )
...
Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-04-20 13:05:27 +00:00
Akihiro Nitta
0302b8be32
Disable `lr_scheduler.step()` in manual optimization ( #6825 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
2021-04-20 13:00:45 +02:00
thomas chaton
9beec26c3e
[bugfix] Add support for CombinedLoader in validation with ddp ( #7102 )
...
* add test
* add changelog
* resolve flake8
* remove print
2021-04-20 08:22:02 +00:00
Adrian Wälchli
67528c4665
Fix attribute error for _gpus_arg_default loading checkpoint prior to 1.2.8 ( #7043 )
2021-04-20 07:34:03 +00:00
Adrian Wälchli
6b15ca95f0
fix logger experiment version in multiple run DDP ( #7077 )
...
* fix
* changelog
2021-04-19 17:12:05 +00:00
Adrian Wälchli
d12c6cf2b3
more early stopping options (convergence and divergence threshold) ( #6868 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-04-19 16:49:52 +02:00
Adrian Wälchli
60c1c8fe83
Auto-set `DataLoader.worker_init_fn` with `seed_everything` ( #6960 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
2021-04-19 16:28:37 +02:00
Akihiro Nitta
d1529c28a1
Optimization docs ( #6907 )
...
* .
* .
* Fix link to the section
* Fix link to the section
* Consistent indent
* Update docs
* Apply suggestions from code review
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* Add note for optimizer.optimizer
* .
* Update hooks
* Update closure docstring
* Update optimizer methods
* Update optimizer
* Remove manopt + grad clipping (by @flukeskywalker)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-04-19 10:08:49 -04:00
Adrian Wälchli
2b232d3fbd
fix docs rendering in datamodule ( #7064 )
...
* [docs]: add newline to correctly render Example
* whitespace
Co-authored-by: Matthew Sarmiento <matthewcs@me.com>
2021-04-19 10:08:09 -04:00
Carlos Mocholí
a5e356adb1
Deprecate `@auto_move_data` in favor of `trainer.predict` ( #6993 )
...
* Deprecated `@auto_move_data` in favor of `trainer.predict`
* Update CHANGELOG
2021-04-19 14:53:21 +01:00
Adrian Wälchli
e9fca760ac
Set `DistributedSampler` seed if `seed_everything` was called ( #7024 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-04-19 14:50:31 +01:00
Nicki Skafte
fbee5a86e7
Correctly reset metric objects in self.log ( #7055 )
...
* reset
* fix tests
* fix tests
* Apply suggestions from code review
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
* move logic
* chglog
* pep8
* Add test
* Improve test
Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
2021-04-19 14:48:48 +01:00
mlech26l
e61daff5cc
Typo LightningMoule -> LightningModule ( #7038 )
2021-04-19 13:48:44 +01:00
Carlos Mocholí
898ec8a94a
Create pytorch_lightning/utilities/types.py ( #7048 )
2021-04-19 14:43:16 +02:00
Kaushik B
30b7440e12
TPU Spawn Rank & root device Error ( #7074 )
...
* TPU Spawn Rank Error
* Update tpu spawn
* Fix root device property for tpu spawn
* Update changelog
2021-04-18 23:42:48 +02:00
Kaushik B
97be843226
Better approach to register plugins ( #7063 )
...
* Better approach to register plugins
* Add ddp_with_find_unused_parameters_false
* Remove unnecessary break
* Revert back the ddp commit
* Update register override logic
* Update register override logic
* fix mypy
2021-04-18 11:23:12 +02:00
thomas chaton
7b0b0d2844
update ( #7056 )
2021-04-16 21:22:19 +01:00
ananthsub
8bcd169767
[fix] Fix multi-node DDP launch by using local rank instead of global rank for main process ( #7061 )
...
* Update ddp.py
* Update CHANGELOG.md
2021-04-16 21:18:54 +01:00
Kaushik B
6a7b4cf5d3
Fix mypy for plugins registry ( #7062 )
2021-04-17 01:33:41 +05:30
Adrian Wälchli
3fb8eada34
rc2 ( #7057 )
2021-04-16 20:34:14 +02:00
Kaushik B
832a03af7c
Add Training Type Plugins Registry ( #6982 )
...
Co-authored-by: Sean Naren <sean@grid.ai>
Co-authored-by: thomas chaton <thomas@grid.ai>
2021-04-16 18:01:56 +05:30
Adrian Wälchli
67d21609c9
Add Trainer max_time argument + Callback ( #6823 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
2021-04-16 13:38:57 +02:00
ananthsub
4c07ab5e99
Use PyTorch API logging for Lightning Trainer ( #6771 )
...
* Update trainer.py
* Update trainer.py
* Update trainer.py
2021-04-16 00:10:34 +02:00
Carlos Mocholí
f29ecbfd90
Typing for accelerators and plugins ( #7022 )
2021-04-15 16:48:16 +00:00
ananthsub
f6f81f0430
[fix] Add a cluster environment teardown to clean up environment state ( #6942 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-04-15 16:06:54 +00:00
Mauricio Villegas
f852a4f592
Changed basic_examples to use `LightningCLI` ( #6862 )
...
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-04-15 15:01:16 +00:00
Ethan Harris
f645df5e9a
Add typings for evaluation_loop.py and remove some dead code ( #7015 )
2021-04-15 07:36:04 +00:00
Edward Brown
5bd3cd5f71
Bugfix/cuda oom detection and handling ( #6934 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-04-15 03:22:11 +02:00
Jirka Borovec
895bea1ad3
rename about ( #7002 )
...
* rename about
* .
* ..
2021-04-14 18:56:40 -04:00
Adrian Wälchli
d3f73a0a74
Plugin Docs ( #6952 )
...
Co-authored-by: edenlightning <66261195+edenlightning@users.noreply.github.com>
Co-authored-by: William Falcon <waf2107@columbia.edu>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
2021-04-14 20:53:21 +00:00
SpontaneousDuck
dcff5036a8
Use PickleError base class to detect all pickle errors ( #6917 )
...
* Use PickleError base class to detect all pickle errors
* Update changelog with #6917
* Add pickle test for torch ScriptModule
Co-authored-by: Ken Witham <k.witham@kri.neu.edu>
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
2021-04-14 20:24:32 +00:00
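The idea in #6917 above: `pickle.PickleError` is the common base class of both `PicklingError` and `UnpicklingError`, so catching it covers all pickle failures. A minimal sketch (the helper name is invented for illustration):

```python
import pickle

def is_picklable(obj):
    """Catch the PickleError base class so both PicklingError and
    UnpicklingError subclasses are detected; AttributeError/TypeError
    cover objects pickle rejects before raising its own errors."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PickleError, AttributeError, TypeError):
        return False

assert is_picklable({"a": 1}) is True
assert is_picklable(lambda x: x) is False  # lambdas cannot be pickled
```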
shuyingsunshine21
03a73b37bc
Train End Error Handling Fix ( #6864 )
...
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
2021-04-14 20:35:42 +02:00
Nicki Skafte
7c5ad1905d
Bugfix for predict progressbar ( #6884 )
...
* gating
* tests
* pep8
* changelog
2021-04-14 09:50:36 +01:00
CeShine Lee
24d0295ff1
Fix the issue where `gradient_clip_algorithm` had no effect. ( #6928 )
2021-04-14 14:17:06 +05:30
Adrian Wälchli
33cc9fe138
Clean up environment access in plugins ( #6941 )
...
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-04-13 20:07:40 +02:00
Peng Zhang
89074fa2ad
Fix Multi-GPU join for horovod ( #6954 )
...
* fixjoin
* fix join on cpu
* fix typo
* try to undo horovod skip
* undo
* Try removing skip
* Update CHANGELOG
* add back skip for test_horovod_multi_optimizer
* Add back skip
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-04-13 17:44:41 +01:00
Carlos Mocholí
15926b462c
Add SWA warning if not running every epoch ( #6987 )
...
* Add SWA warning if not running every epoch
* Typo
2021-04-13 18:34:40 +02:00
Ethan Harris
b9bc77293b
Fix inconsistent outputs in `on_*_end` and `*_end` ( #6969 )
...
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-04-13 15:16:21 +01:00
ananthsub
e891ceb836
Remove evaluation loop legacy dict returns for `*_epoch_end` hooks ( #6973 )
...
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-04-13 12:37:54 +01:00
Hinrich B. Winther
b37b58a73e
Fix Checkpoint issue when using Horovod distributed backend (PyTorchLightning#6947) ( #6958 )
...
Co-Authored-By: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-04-13 09:18:52 +00:00