Jirka Borovec
ad168fc4c6
chlog for 1.3.2 + legacy test ( #7676 )
2021-05-24 17:55:02 +00:00
Carlos Mocholí
8b01497e42
Fix global step update when the epoch is skipped ( #7677 )
...
* Fix global step update when the epoch is skipped
* Update CHANGELOG
* Move test
2021-05-24 17:36:56 +01:00
Kaushik B
3f460b150a
Move parameter validation specific to TPU Training plugins ( #7415 )
...
* Move parameter validation specific to TPU Training plugins
* update docstring
2021-05-24 16:02:01 +00:00
ananthsub
fa41c588f4
Remove ProfilerConnector class ( #7654 )
...
* Remove ProfilerConnector class
* Update trainer.py
* Update CHANGELOG.md
* Update trainer.py
* Update trainer.py
* tests
2021-05-24 08:58:15 -07:00
Gyeongjae Choi
a54bc5dba3
Fix progress bar print error when called before training ( #7674 )
...
* Check progress bar existence before printing
* Add tests for predict_progress_bar
* Add tests for progress_bar printing without training
* Update changelog
2021-05-24 17:33:28 +02:00
Carlos Mocholí
2103b5efc9
Move sync code from step result to lightning module [6/n] ( #7651 )
2021-05-24 13:13:55 +01:00
Xinyao(Alvin) Sun
0c958c5a1f
Fix dataloaders are not reset when tuning the model ( #7566 )
...
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-05-24 10:21:45 +02:00
shuyingsunshine21
299f2c481b
FSDP with full state dict ( #7487 )
...
* Fix some test errors
* checkpoint consolidation
* Update ddp_spawn.py
* Update test_metric_result_integration.py
* Update test_results.py
* Update utils.py
* Update utils.py
* Update test_all_gather_grad.py
* Update test_all_gather_grad.py
* Update test_results.py
* Revert "Update test_results.py"
This reverts commit 9d4a2b891d.
* Revert "Merge pull request #1 from shuyingsunshine21/shuyingsunshine21-checkpoint_consolidate"
This reverts commit c5053da789, reversing changes made to 0d23d75bc9.
* Revert "Update test_all_gather_grad.py"
This reverts commit 0d23d75bc9.
* Revert "Update utils.py"
This reverts commit 70fe5da9c6.
* Revert "Update utils.py"
This reverts commit a9aae99f6e.
* Revert "Update test_results.py"
This reverts commit ea74906878.
* Revert "Update test_metric_result_integration.py"
This reverts commit bf70e431b3.
* Revert "Update ddp_spawn.py"
This reverts commit f17210183b.
* Revert "checkpoint consolidation"
This reverts commit 536c1323b0.
* Revert "Revert "checkpoint consolidation""
This reverts commit 3a9fde915a.
* Revert "Revert "Revert "checkpoint consolidation"""
This reverts commit 7a369f47e1.
* Revert "Revert "Update ddp_spawn.py""
This reverts commit 8222dc98ea.
* Revert "Revert "Update test_metric_result_integration.py""
This reverts commit 6c095b2370.
* Revert "Revert "Update test_results.py""
This reverts commit 250d0aaaa2.
* Revert "Revert "Update utils.py""
This reverts commit 8651d54d79.
* Revert "Revert "Update test_all_gather_grad.py""
This reverts commit dcdcd29731.
* modify distributed environment to make test pass
* fix version for ddp plugin test
* fix
* fix
* changelog
* Update CHANGELOG.md
* fsdp with full state dict
* fix missing import
* modify unittest
* fix
* fix
* fix typo
* modify test and add changelog
* fix
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* limit max_epoch to 1 for testing
* test
* fix
* update
* testing remove special for multi gpu
* assert gpu
* add assertion for gpu
* fix
* Re-enable special test, use ModelCheckpoint
* Fix paths
* Fix path passing
* test
* test
* fix test
* fix
* pre-commit format
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: SeanNaren <sean@grid.ai>
2021-05-24 08:11:45 +01:00
Xinyao(Alvin) Sun
01109cdf0c
Fix/mismatched toggle optimizer ( #7563 )
...
* fix: avoid potential mismatched toggling of optimizer
Refs #7405
chore: update CHANGELOG
[pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
fix: resolve a conflict
chore: update changelog
* feat: add a test that fails in master
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* fix typo in tests/trainer/optimization/test_multiple_optimizers.py
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
* Polish tests/trainer/optimization/test_multiple_optimizers.py
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* Polish tests/trainer/optimization/test_multiple_optimizers.py
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* fix: change placeholder in optimizer_step from positional args to keyword args
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-05-23 04:30:28 +02:00
shuyingsunshine21
2242423b75
refactor accelerator teardown -> training type plugin teardown ( #7579 )
2021-05-22 13:19:24 -07:00
Carlos Mocholí
a8d9b5f783
Remove tbptt `self.log` flags and other dead code [5/n] ( #7644 )
2021-05-22 01:13:00 +00:00
Carlos Mocholí
33a1f5271f
[2/N] Define dataclasses for progress tracking ( #7574 )
...
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
2021-05-22 03:09:08 +02:00
Carlos Mocholí
110e49dc99
De-duplicate `DistributedSampler` mentions ( #7636 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
2021-05-21 23:01:13 +02:00
Yifu Wang
8d6e2ff7b2
Improve argument validation for validate(), test(), and predict() ( #7605 )
...
Co-authored-by: Yifu Wang <yifuwang@2012@gmail.com>
2021-05-21 09:03:16 -07:00
Carlos Mocholí
e16d4fbdee
CI code cleaning ( #7615 )
2021-05-21 11:35:12 +00:00
ananthsub
f6d892ac21
[feat] Support custom filesystems in LightningModule.to_torchscript ( #7617 )
...
* [feat] Support custom filesystems in LightningModule.to_torchscript
* Update CHANGELOG.md
* Update test_torchscript.py
* Update test_torchscript.py
* Update CHANGELOG.md
* Update test_torchscript.py
2021-05-21 11:23:15 +00:00
Carlos Mocholí
e8a46bee15
Remove `Result(minimize)` parameter [4/n] ( #7628 )
2021-05-21 12:58:52 +02:00
Carlos Mocholí
603ef2cf7f
Use `trainer.call_hook` in the evaluation loop ( #7626 )
2021-05-21 11:54:52 +01:00
Carlos Mocholí
3d4dd28bec
Replace `CallbackHookNameValidator` with `FxValidator` [3/n] ( #7627 )
...
* Refactor FxValidator
* Fix tests
* Fix tests
* Class attribute
* Fix tests
* Better error message
* Fix tests
* Update pytorch_lightning/trainer/connectors/logger_connector/fx_validator.py
2021-05-21 11:54:16 +01:00
Nik
751975e39f
fix flag name to flush_logs_every_n_steps in logging doc ( #7633 )
...
* fix method name to flush_logs_every_n_steps in logging doc
* apply corrections in comments
2021-05-21 10:50:13 +00:00
deng-cy
03ea68f8a2
removed hparams assignment example ( #7639 )
2021-05-21 11:15:38 +01:00
i-aki-y
7eafd8eac6
Add run_name argument to the MLFlowLogger constructor ( #7622 )
...
* Add run_name argument to the MLFlowLogger
* Update CHANGELOG
* Fix unnecessary line
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Fix style by using yapf
* Fix import error when mlflow is not installed
* Update CHANGELOG.md
* Update tests/loggers/test_mlflow.py
Co-authored-by: akiyuki ishikawa <aki.y.ishikwa@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-05-21 09:17:32 +01:00
ananthsub
94ef17ce77
Update model_checkpoint.py ( #7625 )
2021-05-20 23:16:18 +02:00
Andrew Tritt
92cf396de2
Override `broadcast_object_list` for `torch<1.8` ( #7592 )
...
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-05-20 08:29:55 +00:00
Yifu Wang
ed271905cf
Clear predict_progress_bar in ProgressBar.__getstate__ ( #7608 )
...
Co-authored-by: Yifu Wang <yifuwang@2012@gmail.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-05-20 01:38:49 +00:00
ananthsub
8266b141ba
[feat] Support time-based checkpointing during training ( #7515 )
...
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-05-19 22:14:13 +00:00
Fernando Pérez-García
485554c8b0
Add link to TorchIO tutorial in PyTorch Ecosystem examples ( #7612 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-05-19 21:06:26 +00:00
ananthsub
9f5d4955b6
[1/N] Define dataclasses for progress tracking ( #6603 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-05-19 21:02:20 +00:00
Carlos Mocholí
901b2bac98
Unify `current_fx_name` and `current_hook_fx_name` [2/n] ( #7594 )
...
* Minor logger connector cleanup [1/n]
* Missing line
* Address comments
* Rely on validator
* Unify `current_fx_name` and `current_hook_fx_name`
* Fix test
2021-05-19 20:31:06 +00:00
Carlos Mocholí
dbea5bb710
Add typing to `ModelPruning` callback ( #7529 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-05-19 22:01:42 +02:00
Jan-Henrik Lambrechts
608de6abf4
TensorBoardLogger sub_dir parameter for grouping logs ( #6195 )
...
* fixed a small typo
* cleaning up
* added sub_dir argument to tensorboard and wrote test
* sub dir arg exclusively for tensorboard, linted
* resolving merge conflict
* resolved merge conflict
* resolved merge conflict
* resolved merge conflict
* resolve merge conflict before revert
* resolving merge conflict
* reverted to pre-lint
* added tensorboard sub_dir test
* pep8 formatting
* removed sub_dir arg from test_all function:
* updated feature description
* typo in doc description
* updated CHANGELOG
* Update pytorch_lightning/loggers/tensorboard.py
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
* swapped argument position
* added expandvars tests
* added expandvars
* removed model init
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* fix tests
* fix failed test
* Revert "fix failed test"
This reverts commit 50b34c66da.
* add env var to test
* fix typo in tests
* fix tests
* for test consistency
* fix typo
* fix typo 2
Co-authored-by: Ubuntu <azureuser@devhenrik.evuifrmjd4lepbj4relcwwu5va.ax.internal.cloudapp.net>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Roger Shieh <sh.rog@protonmail.ch>
2021-05-19 19:50:58 +00:00
Jirka Borovec
6e56f56aa1
docker use $(nproc) ( #7606 )
...
* docker use $(nproc)
* Update typo
Co-authored-by: Roger Shieh <sh.rog@protonmail.ch>
2021-05-19 21:48:14 +02:00
ananthsub
b4e28e7169
[feat] Add stronger validation for checkpoint_callback argument ( #7539 )
...
* [feat] Add stronger validation for checkpoint_callback configuration
* chlog
* Update callback_connector.py
* Update test_model_checkpoint.py
* Update pytorch_lightning/trainer/connectors/callback_connector.py
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* Update pytorch_lightning/trainer/connectors/callback_connector.py
* Update tests/checkpointing/test_model_checkpoint.py
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* Update CHANGELOG.md
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-05-19 19:38:08 +00:00
pre-commit-ci[bot]
12adff0573
[pre-commit.ci] pre-commit autoupdate ( #7577 )
...
updates:
- [github.com/pre-commit/pre-commit-hooks: v3.4.0 → v4.0.1](https://github.com/pre-commit/pre-commit-hooks/compare/v3.4.0...v4.0.1)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-05-19 19:37:31 +00:00
Jensun Ravichandran
922c0a607b
Fix incorrect code-snippet in optimizers doc ( #7598 )
...
`training_step(...)` should take `self` as the first argument. It's a simple but necessary fix.
2021-05-19 19:33:09 +00:00
Carlos Mocholí
76ff600898
Minor logger connector cleanup [1/n] ( #7590 )
...
* Minor logger connector cleanup [1/n]
* Missing line
* Address comments
* Rely on validator
2021-05-19 19:25:32 +00:00
Jirka Borovec
7cdf03624f
make extra build for latest ( #7593 )
2021-05-19 19:15:58 +00:00
TOKUNAGA Hiroyuki
20f63377f8
Fix the condition for calling update_learning_rates ( #7032 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-05-17 17:20:42 +02:00
Adrian Wälchli
502adbced3
refactor optimizer loop logic for manual and automatic optimization ( #7526 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
2021-05-17 14:42:01 +02:00
Kaushik B
bf46730d92
Support TPU Pod Training (n/n) ( #7296 )
2021-05-17 11:33:44 +00:00
Nic Eggert
f4f51e0dcf
Add kubeflow cluster environment ( #7300 )
...
* Add kubeflow cluster environment
* Add KubeflowEnvironment to docs
* Add KubeflowEnvironment to the changelog
* break up a long line
* Add method to detect kubeflow environment
* Select Kubeflow environment when available
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Run pre-commit
* task_idx == 0
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-05-17 09:05:24 +01:00
Adrian Wälchli
6e6e29af49
remove trainer hidden state | sanity refactor [2 / n] ( #7507 )
2021-05-17 08:57:15 +01:00
Mauricio Villegas
d0081778f8
Enable fsspec by default for cli config file ( #7521 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-05-17 08:53:00 +01:00
Loic Beheshti
e126649d19
add missing punctuation in lightning_cli.rst ( #7554 )
2021-05-15 00:26:47 +00:00
Alan Du
6ac16ff348
Fix DistribType for `ddp_cpu` (spawn) ( #7492 )
2021-05-14 20:53:26 +01:00
Jirka Borovec
53f8d9a800
update alumni ( #7545 )
2021-05-14 19:06:12 +02:00
Jirka Borovec
233f252bb4
update logo 48px ( #7530 )
2021-05-13 22:33:12 +02:00
Rohit Gupta
7ca41734da
Add `dataloader_idx` to batch transfer hooks ( #6241 )
...
* replace with kwargs
* chlog
* fix
* add test
* fix
* device
* deepspeed
* pep
* optional
* docs
* bc
* comments
* pep
* mypy
* pep
* Apply suggestions from code review
* kwargs
* docs
* .
* .
* 1.3 -> 1.4
* kwargs -> step_kwargs
2021-05-13 23:03:55 +05:30
Carlos Mocholí
a584196abf
Default `seed_everything(workers=True)` in the `LightningCLI` ( #7504 )
2021-05-13 12:18:03 +02:00
Adrian Wälchli
dd1a17b071
Refactor result handling in training loop ( #7506 )
...
* refactor results
* rename dic -> dict
* simplify
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* changelog
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* fix None check
* chlog wording
* move process_closure_result to the end
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-05-13 09:30:34 +01:00