47 Commits

Author SHA1 Message Date
jjenniferdai d4a4b77906
[3/3] Update lightning callbacks to `Stateful`, deprecations for old `on_save/load_checkpoint` signatures (#11887) 2022-03-25 00:06:10 +00:00
Rohit Gupta 5b342f14a6
fix to avoid common hook warning if no hook is overridden (#12131) 2022-02-28 18:07:05 +05:30
Rohit Gupta 5d2d9b09df
Avoid patching common `DataHooks` to the `LightningModule` (#10603) 2022-02-25 09:26:59 +01:00
Carlos Mocholí 789fae828d
Fix `current_epoch` value on training end (#8578)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2022-02-10 17:55:59 +01:00
Rohit Gupta 400201712f
added warning for distributedsampler in case of evaluation (#11479) 2022-02-03 18:42:13 +00:00
Rohit Gupta 7948ed703d
Avoid enforcing `shuffle=False` for eval dataloaders (#11575) 2022-02-03 09:35:31 +00:00
Krishna Kalyan 6586dd23b7
Mark `CheckpointConnector` as protected (#11550)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-02-03 02:26:08 +00:00
Maaz Karim 16a04b29eb
Mark SignalConnector as protected (#11513)
Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>
2022-01-20 08:39:59 +01:00
jjenniferdai 4b5761539e
Remove `hpc_save` (#11101) 2022-01-03 12:23:13 +00:00
jjenniferdai 31f39c9578
Move `CheckpointConnector.fault_tolerant_auto_save_path` out of `CheckpointConnector.hpc_resume_path` (#11092) 2021-12-21 02:24:01 +01:00
jjenniferdai 6e21dd3767
Deprecate `on_hpc_{save/load}` hooks (#10911)
* first commit

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update pr #

* test filterwarnings

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a todo comment

* updates

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* `` Update pytorch_lightning/core/saving.py

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* `` Update pytorch_lightning/core/saving.py

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* model --> LightningModule Update pytorch_lightning/core/saving.py

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* model --> LightningModule Update pytorch_lightning/core/saving.py

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-12-08 14:56:15 -08:00
four4fish 629ca09e09
fix TypeError cause failure in signal_connector teardown (#10961) 2021-12-06 21:48:31 +00:00
Danielle Pintz 6043179931
Re-design `call_hook` interface (#10575) 2021-12-04 16:39:55 -05:00
Mauricio Villegas f3b0a06e90
Fix `SignalConnector._has_already_handler` check for callable type (#10483)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-11-30 22:47:52 +00:00
Adrian Wälchli 25473acddb
Restore signals on teardown (#10611)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-11-30 22:07:14 +00:00
thomas chaton 412d507a73
Fault Tolerant: move signal to SIGTERM (#10605) 2021-11-26 13:37:27 +00:00
thomas chaton 7d3ad5b76e
Don't register signal in thread (#10610) 2021-11-19 04:13:35 +01:00
Adrian Wälchli 0f6d89422b
Control automatic resubmission on SLURM (#10601) 2021-11-18 17:48:53 +00:00
Carlos Mocholí dcafc95f2b
Avoid deprecated `progress_bar_refresh_rate` usage (#10520)
Co-authored-by: Danielle Pintz <38207072+daniellepintz@users.noreply.github.com>
2021-11-15 22:04:48 +01:00
Carlos Mocholí 7a9a08c5d3
Drop torch 1.6 testing (#10390)
* Drop torch 1.6 support

* Drop 1.6 support

* Update CHANGELOG

* Fixes

* Split change

* Undo change

* 1.7 -> 1.7.1

https://github.com/pytorch/pytorch/issues/47354

* Force trigger nightly

* Update .github/workflows/events-nightly.yml

Co-authored-by: Aki Nitta <nitta@akihironitta.com>

* Revert 1.7.1 change - try wildcard

* Update adjust versions and test it

* Undo test changes

* Revert "Undo test changes"

This reverts commit 3a6acadd11.

* Update CHANGELOG.md

Co-authored-by: Aki Nitta <nitta@akihironitta.com>
2021-11-13 20:35:03 +00:00
Ross Johnstone c2f25d42ab
Make `monitor` required arg of EarlyStopping callback (#10328)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-11-09 18:08:03 +00:00
Kaushik B 45c45dc7b0
Deprecate `ProgressBar` and rename it to `TQDMProgressBar` (#10134)
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-11-01 11:42:21 +00:00
thomas chaton 255e3edc98
resolve failing test (#10191)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-10-28 15:27:03 +00:00
jjenniferdai 6d79184ec5
Unify checkpoint load paths [redo #9693] (#10061) 2021-10-25 19:05:31 +00:00
jjenniferdai 2d9db211b5
Revert "Support serialized checkpoint loading (#9605)" (#10057)
This reverts commit f0e6f1b58a.
2021-10-21 02:51:22 +02:00
Adrian Wälchli 2c16f1d6b9
remove dataloader patching on the LightningModule (#9764)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-10-20 15:23:20 +02:00
jjenniferdai f0e6f1b58a
Support serialized checkpoint loading (#9605)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-10-20 09:38:35 +01:00
ananthsub 28fc8d2016
Add `enable_model_summary` flag and deprecate `weights_summary` (#9699)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Kaushik B <kaushikbokka@gmail.com>
2021-10-13 17:20:54 +05:30
Rohit Gupta db322f4bbb
Deprecate `checkpoint_callback` from the `Trainer` constructor in favour of `enable_checkpointing` (#9754)
* enable_checkpointing

* update codebase

* chlog

* update tests

* fix warning

* Apply suggestions from code review

Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Apply suggestions from code review

Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>

* Apply suggestions from code review

Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-10-12 07:55:07 +00:00
Rohit Gupta b303b4f895
Fix restoring training state during `trainer.fit` only (#9413)
* reload state on fit

* trainer.state

* add test

* chlog

* revert

* review

* review

* rev and amend

* fix test and logic

* update

* code review

* Apply suggestions from code review

* better assertions

* better assertions

* Apply suggestions from code review

* add loop test

* Apply suggestions from code review

* Split for typing

* review comments

* review comments

* use if_else

* code review

* code review

* code review

* Apply suggestions from code review

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Remove unnecessary pieces from the test

* move test

Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-10-06 14:57:40 +00:00
thomas chaton 5841ca9782
[Feat] Add auto_restart for fault tolerant training (#9722) 2021-10-01 16:37:17 +00:00
Danielle Pintz b3a5c7f442
Add `enable_progress_bar` to Trainer constructor (#9664) 2021-09-24 22:53:31 -07:00
Rohit Gupta 8fcdcb598b
Fix `accumulate_grad_batches` on init (#9652)
* fix accumulate_grad_batches on init

* chlog

* update error

* move to callback connector

* add test with callback

* fix tests

* Update pytorch_lightning/trainer/connectors/callback_connector.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* update ipu logic

* rev

* rev

* rev

* pls work

* code review

Co-authored-by: Rohit Gupta <goku@rmac.local>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-09-24 18:51:54 +00:00
thomas chaton c7451b3ccf
[Feat] Add graceful detection of signal to exit + SignalConnector and merge SlurmConnector. (#9566)
Co-authored-by: Sean Naren <sean@grid.ai>
2021-09-17 19:13:59 +00:00
Kaushik B d773407e59
feat: Add ModelSummary Callback (#9344)
Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-09-10 12:42:42 +00:00
Jirka Borovec 6e124e7207
CI: precommit - docformatter (#8584)
* CI: precommit - docformatter
* fix deprecated

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-09-06 12:49:09 +00:00
Adrian Wälchli b9443a07b9
[2 / 3] improvements to saving and loading callback state (#7187)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-08-24 17:35:19 +00:00
Carlos Mocholí e1442d247e
Always use `trainer.call_hook` (#8498) 2021-08-20 18:22:03 +02:00
Carlos Mocholí ed13040729
Connect the model to the training type plugin at the start of run (#8536) 2021-08-04 17:43:34 +02:00
Adrian Wälchli 8c27fa71fa
[1 / 3] improvements to saving and loading callback state (#6886)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-07-29 00:12:32 +02:00
Carlos Mocholí a64cc37394
Replace `yapf` with `black` (#7783)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-07-26 13:37:35 +02:00
Adrian Wälchli 6b7b40473b
deprecate hpc_load() and integrate it with restore() (#7955)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-06-14 12:20:01 +00:00
Carlos Mocholí 3df02b880a
Add checkpoint parameter to on_save_checkpoint (#6072)
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-02-25 21:18:19 +05:30
Adrian Wälchli b8619a695f
new LightningModule hook "configure_callbacks" (#5621)
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2021-02-12 19:27:44 -05:00
Jirka Borovec a0f7831278
fix misleading imports in tests (#5873)
* fix imports

* .
2021-02-09 05:10:52 -05:00
Jirka Borovec f83cca6107
formatting flake8 & isort (#5824)
* formatting

* isort

* make

* yapf

* isort
2021-02-05 18:33:12 -05:00
Adrian Wälchli 9555043a29
Force ModelCheckpoint callback to run last (#5731) 2021-02-03 16:40:57 -05:00