Commit Graph

6278 Commits

Author SHA1 Message Date
jjenniferdai 31f39c9578
Move `CheckpointConnector.fault_tolerant_auto_save_path` out of `CheckpointConnector.hpc_resume_path` (#11092) 2021-12-21 02:24:01 +01:00
Rohit Gupta 787f41eff6
update optimizer_step example in docs (#10420) 2021-12-21 08:19:40 +09:00
Carlos Mocholí 9826de2162
Delete legacy multinode tests (#11175) 2021-12-20 20:01:57 +01:00
Adrian Wälchli 08e661ff72
Rename `restore_checkpoint_after_pre_dispatch` to `restore_checkpoint_after_setup` (#11166)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-12-20 17:16:52 +00:00
Carlos Mocholí e8169bbd46
Fix setter usage for checkpoint io and precision in TTP (#11071)
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-12-20 17:45:32 +01:00
Adrian Wälchli f5c2881b68
3/n Simplify spawn plugins: Merge `pre_dispatch` and `setup` logic (#11137) 2021-12-20 17:41:22 +01:00
Adrian Wälchli 2e47e2f4ae
Set spawn_method on initialization (#11162) 2021-12-20 17:39:54 +01:00
four4fish 0ee78e96ef
Rename `DDPFullyShardedPlugin` to `DDPFullyShardedStrategy` (#11143)
* Rename DDPFullyShardedPlugin to DDPFullyShardedStrategy

* update fsdp_plugin to fsdp_strategy

* update changelog

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-12-20 17:11:20 +01:00
ORippler 86a3c5e2a3
Add required states for resumed ModelCheckpoint GC (#10995)
* Add required states for resumed ModelCheckpoint GC

* Add backwards compatibility with legacy ckpts

Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>

* Add test to check if attrs are written to ckpt

Note that we do not yet check for proper loading/reinstantiation of
ModelCheckpoint based on the ckpt written to disk

* Test if attributes are restored properly from ckpt

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix broken `test_callbacks_state_fit_ckpt_path`

`ModelCheckpoint` is configured to save after every epoch,
but `trainer.fit` is called with `max_steps = 1`

Note there may be a better way of doing this, where `ModelCheckpoint`
is called after `training_step`

* Update test_restore.py

* Update test_restore.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Check that all attributes are restored properly

* revert changes, use fix on master

* Convert to proper unit test

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Refactor `test_mode_checkpoint_saveload_ckpt`

* First save, then load ckpt.
* Instantiate ModelCheckpoint twice.

Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-12-20 17:05:15 +01:00
Danielle Pintz b1baf460d9
Include hook's object name when profiling (#11026) 2021-12-20 15:18:24 +01:00
Adrian Wälchli 29eb9cccf2
Rename the `TrainingTypePlugin` base to `Strategy` (#11120)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: four4fish <88516121+four4fish@users.noreply.github.com>
2021-12-20 12:50:11 +00:00
guyang3532 cc4a978bf6
Safely disable profiler (#11167) 2021-12-20 11:51:46 +00:00
Carlos Mocholí 7ed3dbf191
Fix evaluation logging on epoch end with multiple dataloaders (#11132) 2021-12-19 15:51:01 +01:00
Rohit Gupta 61eb6230c2
Prune EvalModelTemplate (#11153) 2021-12-19 13:08:43 +00:00
Danielle Pintz f95976d602
rename _call_ttp_hook to _call_strategy_hook (#11150) 2021-12-18 17:53:03 -08:00
Adrian Wälchli a3e2ef2be0
Refactor plugin tests whose assertions don't need to run in `on_fit_start` hook (#11149) 2021-12-18 23:38:40 +01:00
Rohit Gupta 3461af0ddb
Add support for returning callback from `LightningModule.configure_callbacks` (#11060) 2021-12-18 10:46:35 +00:00
Kaushik B 2a5d05b562
Fix tpu spawn plugin test (#11131) 2021-12-18 02:53:37 +00:00
Rafał Jankowski 3cc69f992b
Fixed NeptuneLogger when using DDP (#11030)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-12-18 01:40:13 +00:00
Carlos Mocholí 62f1e82e03
Fix CVE-2020-1747 and CVE-2020-14343 (#11099) 2021-12-17 20:27:15 +00:00
Carlos Mocholí 8508cce37d
Mark all result classes as protected (#11130) 2021-12-17 19:35:17 +00:00
Rohit Gupta 860959fb3f
Enable logging hparams only if there are any (#11105) 2021-12-17 19:40:56 +01:00
Carlos Mocholí dbb7f56b35
Deprecate `Trainer.verbose_evaluate` (#10931) 2021-12-17 19:26:32 +01:00
Carlos Mocholí 75d96d9897
Reset the current progress tracking state during double evaluation (#11119) 2021-12-17 19:20:11 +01:00
Rohit Gupta 92d9fc2280
Prune EvalModelTemplate (3/n) (#10971) 2021-12-17 19:10:52 +01:00
Adrian Wälchli 978f5e6ad6
Fix AttributeError when using CombinedLoader in prediction (#11111)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-12-17 18:02:25 +00:00
quancs 179b4dd415
remove redundant methods in RichProgressBar (#11100)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-12-17 17:40:31 +00:00
Carlos Mocholí 7e10f6d41f
Save the loop progress state by default (#10784) 2021-12-17 16:00:27 +00:00
Carlos Mocholí fa6d17c96f
Fix typing for utilities.warnings (#11115) 2021-12-17 15:07:27 +01:00
Adrian Wälchli 6582249a0c
Fix signal teardown outside main thread (#11124) 2021-12-17 14:12:02 +01:00
Carlos Mocholí 5956a0716b
Track the evaluation loop outputs in the loop (#10928) 2021-12-17 14:00:47 +01:00
Adrian Wälchli 210ff845c1
Mark `Trainer.run_stage` as protected (#11000)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-12-17 13:46:03 +01:00
Sean Naren c66cd12445
Remove partitioning of model in ZeRO 3 (#10655) 2021-12-17 12:36:53 +00:00
Carlos Mocholí 4415677994
Add typing for `trainer.logger` (#11114) 2021-12-17 13:34:18 +01:00
Carlos Mocholí 5932f52b2f
Avoid the deprecated `onnx.export(example_outputs=...)` in torch 1.10 (#11116) 2021-12-17 10:11:11 +01:00
Jirka Borovec 4ee01b715c
Merge pull request #11046 from PyTorchLightning/docs/security
Add security contact
2021-12-16 20:31:03 -05:00
Adrian Wälchli 1a7084634a
Remove leftover `clean_logger` call in tests (#11080) 2021-12-17 00:23:32 +00:00
Adrian Wälchli e19d93f69e
Initialize ModelCheckpoint state as early as possible (#11108) 2021-12-17 00:18:29 +01:00
Adrian Wälchli 262aefc8df
Remove obsolete `pre_dispatch` in `DDPSpawnShardedPlugin` (#10988) 2021-12-16 21:43:15 +01:00
Adrian Wälchli 2b0075a47e
Teardown sync-batchnorm after training (#11078)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-12-16 18:58:44 +00:00
Carlos Mocholí 46d6fbf11b
Add `Loop.replace` (#10324)
Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-12-16 17:41:38 +00:00
Adrian Wälchli c335a7891d
Remove redundant special case for disabling the progress bar on TPU (#11061) 2021-12-16 18:02:50 +01:00
Carlos Mocholí f37bd4677d
Update mypy (#11096) 2021-12-16 17:53:12 +01:00
Rohit Gupta cc42aa9401
Improve checkpoint docs (#10916) 2021-12-16 16:21:59 +00:00
Adrian Wälchli dcc55631f9
Update changelog after 1.5.6 release (#11094) 2021-12-16 12:57:03 +00:00
Mauricio Villegas 8bca259d6a
Fix intellisense for LightningCLI (#11075) 2021-12-16 12:38:11 +00:00
four4fish cec2d7946b
3/n Move accelerator into Strategy (#11022)
* remove training_step() from accelerator

* remove test, val, predict step

* move

* wip

* accelerator references

* cpu training

* rename occurrences in tests

* update tests

* pull from adrian's commit

* fix changelog merge pro

* fix accelerator_connector and other updates

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix doc build and some mypy

* fix lite

* fix gpu setup environment

* support customized ttp and accelerator

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix tpu error check

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix precision_plugin initialization to recognize customized plugin

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update bug_report_model.py

* Update accelerator_connector.py

* update changelog

* allow shorthand typing references to pl.Accelerator

* rename helper method and add docstring

* fix typing

* Update pytorch_lightning/trainer/connectors/accelerator_connector.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Update tests/accelerators/test_accelerator_connector.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Update tests/accelerators/test_cpu.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix pre commit complaint

* update typing to long ugly path

* spacing in flow diagram

* remove todo comments

* docformatter

* Update pytorch_lightning/plugins/training_type/training_type_plugin.py

* revert test changes

* improve custom plugin examples

* remove redundant call to ttp attribute

it is no longer a property

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Apply suggestions from code review

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-12-16 04:41:34 +00:00
Carlos Mocholí 9e56290e2a
Support torch 1.10.1 (#11095) 2021-12-15 19:23:31 -08:00
jjenniferdai 01e0dac60f
Deprecate `Trainer.should_rank_save_checkpoint` property (#11068)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-12-16 02:24:49 +01:00
Carlos Mocholí 3c4d06bd42
Update the TQDM progress bar `on_train_epoch_end` (#11069) 2021-12-15 17:48:32 +00:00