Commit Graph

208 Commits

Author SHA1 Message Date
Carlos Mocholí 321689f52e
Add `ModelCheckpoint(save_on_train_epoch_end)` (#8389)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-07-13 14:47:59 +00:00
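A minimal usage sketch for the flag added in #8389 (hedged; `val_loss` and `MyModel` are illustrative names, not part of the commit):

```python
from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import ModelCheckpoint

# Defer the checkpoint decision to the end of validation instead of the end
# of the training epoch by setting the new flag to False.
checkpoint_cb = ModelCheckpoint(
    monitor="val_loss",             # assumed metric logged by the LightningModule
    save_on_train_epoch_end=False,  # flag introduced in #8389
)

trainer = Trainer(callbacks=[checkpoint_cb], max_epochs=10)
# trainer.fit(MyModel())  # MyModel is a hypothetical LightningModule
```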
Carlos Mocholí c4353ea702
Remove `dev_debugger.call_count` (#8317) 2021-07-07 19:59:59 +02:00
Carlos Mocholí 441e16f61c
Default `EarlyStopping.check_on_train_epoch_end=True` (#8286) 2021-07-05 15:45:23 +02:00
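With #8286 the check defaults to the end of the training epoch; a sketch of overriding it explicitly (assuming a logged `val_loss` metric):

```python
from pytorch_lightning.callbacks import EarlyStopping

# Pass False to keep evaluating the monitored metric at the end of
# validation rather than at train-epoch end (the new default).
early_stop = EarlyStopping(monitor="val_loss", patience=3, check_on_train_epoch_end=False)
```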
Kaushik B 3a8322deda
Add XLAStatsMonitor Callback (#8235) 2021-07-05 17:09:46 +05:30
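A sketch of attaching the callback added in #8235 (assumes a TPU environment with `torch_xla`; `tpu_cores=8` is illustrative):

```python
from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import XLAStatsMonitor

# Logs XLA device utilisation and memory statistics during training.
trainer = Trainer(tpu_cores=8, callbacks=[XLAStatsMonitor(verbose=True)])
```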
Adrian Wälchli e7139ab9f7
Support `DDPPlugin` to be used on CPU (#6208)
* Skip test due to 'Python bus error'

* Debug NCCL

* Remove NCCL_DEBUG statement

* Revert "Skip test due to 'Python bus error'"

This reverts commit e0a3e8785d.

* fix

* add test

* changelog

* yapf

* patch os environ

* make a special test

* destroy pg

* debug

* revert

* revert

* problematic test

* skip

* try the fixture

* test

* update sensitive test

* update changelog

* remove comment

* update wrong test

* update test name

* parameterization

* Revert "parameterization"

This reverts commit b0542f43f59c5ce66800883b5e2f0c66a97408cc.

* remove conftest

* ignore test

* teardown

* fix merge

* deep speed parameterization

* uncomment test

* update chlog

* update changelog

* split tests

* update test

* update test

* update test

* update test

* update test comments

* unroll test

* unroll test

* unroll test

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* increase shm

* sudo

* unroll ipu

* Revert "sudo"

This reverts commit 6cc68c1478.

* Revert "increase shm"

This reverts commit 8c27163483.

* x

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* find guilty test

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* POPTORCH_WAIT_FOR_IPU=1

* move test

* redo parameterize for ipu

* de-comment test

* move chlog

* Update tests/accelerators/test_accelerator_connector.py

Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>

* Update tests/accelerators/test_accelerator_connector.py

Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>

Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-07-02 12:00:24 +01:00
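Per the #6208 title, `DDPPlugin` can now be combined with CPU processes; a hedged sketch (the exact `Trainer` arguments are an assumption drawn from the PR title, not from its diff):

```python
from pytorch_lightning import Trainer
from pytorch_lightning.plugins import DDPPlugin

# Two CPU worker processes coordinated with DDP; find_unused_parameters=False
# skips the extra graph traversal when every parameter receives a gradient.
trainer = Trainer(
    num_processes=2,
    plugins=[DDPPlugin(find_unused_parameters=False)],
)
```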
Carlos Mocholí a2e41045d2
Mark some loop attributes as protected (#8250) 2021-07-02 11:51:51 +01:00
Justus Schock d6435a5b73
Bugfix/swa iterable dset (#8172)
* add test

* add fix

* Update CHANGELOG.md
2021-06-28 21:18:25 +00:00
Ethan Harris 2a372e3682
Fix module dict in base finetuning (#8170)
* Fix module dict in base finetuning

* Update CHANGELOG.md
2021-06-28 10:55:32 +00:00
deepsource-autofix[bot] e11fe19673
Remove unnecessary use of comprehension (#8149)
Co-authored-by: deepsource-autofix[bot] <62050782+deepsource-autofix[bot]@users.noreply.github.com>
2021-06-27 10:00:02 +01:00
Adrian Wälchli 4becd1cf31
rename old `Trainer.train_loop` -> `Trainer.fit_loop` (#8025) 2021-06-22 11:49:32 +02:00
Carlos Mocholí f1fa4c4727
Update fit with val hook test (#8060) 2021-06-21 17:27:37 +00:00
simran2905 d1efae2e47
Fix checkpointed state for lr_schedulers with step interval (#7877)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-06-21 15:08:07 +00:00
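For context on #7877, a sketch of the configuration it concerns: a scheduler stepped every optimizer step via the `"interval": "step"` entry (class and hyperparameters are illustrative):

```python
import torch
from pytorch_lightning import LightningModule

class MyModule(LightningModule):  # hypothetical module, rest of the class elided
    def configure_optimizers(self):
        optimizer = torch.optim.Adam(self.parameters(), lr=1e-3)
        scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=100)
        # "interval": "step" steps the scheduler after every optimizer step;
        # #7877 fixes how this scheduler state is saved to and restored from checkpoints.
        return {
            "optimizer": optimizer,
            "lr_scheduler": {"scheduler": scheduler, "interval": "step"},
        }
```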
Carlos Mocholí e55f01e665
Update evaluation hook tests (#8013) 2021-06-18 16:41:27 +00:00
Adrian Wälchli eebdc910dd
progressive restoring of trainer state (#7652) 2021-06-17 08:13:53 +00:00
Austin Basye 906de2a7fa
[feat] Named Parameter Groups in `LearningRateMonitor` (#7987)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-06-17 03:13:54 +02:00
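A sketch of the feature in #7987, under the assumption that groups are named via a `"name"` key on each optimizer parameter group, which the monitor then uses in its logged keys:

```python
import torch
from torch import nn
from pytorch_lightning.callbacks import LearningRateMonitor

backbone, head = nn.Linear(8, 8), nn.Linear(8, 2)  # placeholder modules

# Assumed usage: a "name" key per param group lets the monitor log
# per-group learning rates under readable names instead of indices.
optimizer = torch.optim.SGD(
    [
        {"params": backbone.parameters(), "lr": 1e-4, "name": "backbone"},
        {"params": head.parameters(), "lr": 1e-2, "name": "head"},
    ]
)

lr_monitor = LearningRateMonitor(logging_interval="epoch")  # attach via Trainer(callbacks=[...])
```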
Carlos Mocholí 4ffba600c9
Add predict hook test (#7973) 2021-06-16 15:09:24 +02:00
Adrian Wälchli 971908a1aa
Loop Refactor 1/N - Training Loop (#7871)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: Justus Schock <justus.schock@rwth-aachen.de>
Co-authored-by: Justus Schock <justus.schock@posteo.de>
Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
2021-06-15 12:55:06 +00:00
Dan Dale 3a0ed02bd4
Properly handle parent modules w/ parameters in `BaseFinetuning` callback (#7931)
Co-authored-by: Daniel Dale <dan@distributedinsight.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-06-14 16:01:07 +00:00
Adrian Wälchli c1eac483e9
split `restore_training_state` into logical parts [2 / 2] (#7900) 2021-06-10 21:54:21 +02:00
Carlos Mocholí ec4f8856af
Enable logger connector re-design (#7891)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-06-09 14:24:45 +00:00
Carlos Mocholí 5593b6f772
Merge pull request #7872 from PyTorchLightning/refactor/logger-poc-changes
Random fixes for logger connector PoC
2021-06-08 09:04:16 -04:00
thomas chaton ea71cf4a5f
[Test] Add extra test for val_check_interval in distributed scenario (#7863)
* add extra test

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add computation

* Update docs/source/common/trainer.rst

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Update docs/source/common/trainer.rst

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Update tests/trainer/test_dataloaders.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* use tmpdir

* update on comments

* update

* Update tests/callbacks/test_progress_bar.py

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-06-07 10:37:32 +00:00
thomas chaton d1becce4c1
[bugfix] Resolve LearningRateMonitor + BackboneFinetuning (#7835)
* add test + resolve bug

* update changelog

* resolve bug

* resolve bug

* Update pytorch_lightning/callbacks/lr_monitor.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update pytorch_lightning/callbacks/lr_monitor.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* update on comments

* resolve comments

* update

* Update tests/callbacks/test_lr_monitor.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Update pytorch_lightning/callbacks/lr_monitor.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-06-07 10:17:11 +00:00
Sean Naren 10839376e2
[IPU] Add special tests for IPUs 2/n (#7833)
* Add special tests for IPUs, run nvprof only if cuda available

* Add missing min_gpu
2021-06-04 23:23:09 +05:30
Adrian Wälchli 7e6010fc93
fix info message when max training time reached (#7780)
* call time_elapsed

* elapsed formatting

* format

* update test

* changelog
2021-05-31 14:50:16 +02:00
Carlos Mocholí 311d9fe67e
Always run validation inside the training loop epoch (#7357)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-05-26 14:26:48 +02:00
Carlos Mocholí d26953c8bc
Add `ModelPruning(prune_on_train_epoch_end)` to choose when to apply pruning (#7704)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-05-26 00:57:56 +02:00
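A usage sketch for the flag added in #7704 (pruning function and amount are illustrative):

```python
from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import ModelPruning

# The new flag chooses whether pruning runs at the end of the training
# epoch (True) or at the end of validation (False).
pruning = ModelPruning("l1_unstructured", amount=0.2, prune_on_train_epoch_end=True)
trainer = Trainer(callbacks=[pruning])
```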
Carlos Mocholí e2ead9abd7
Refactor some loops code and hook tests (#7682) 2021-05-25 13:27:54 +02:00
Gyeongjae Choi a54bc5dba3
Fix progress bar print error when called before training (#7674)
* Check progress bar existence before printing

* Add tests for predict_progress_bar

* Add tests for progress_bar printing without training

* Update changelog
2021-05-24 17:33:28 +02:00
Yifu Wang ed271905cf
Clear predict_progress_bar in ProgressBar.__getstate__ (#7608)
Co-authored-by: Yifu Wang <yifuwang@2012@gmail.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-05-20 01:38:49 +00:00
Adrian Wälchli a1a655d006
Reduce log output size in special tests (#7481) 2021-05-11 17:36:20 +02:00
Adrian Wälchli ad9118f04a
remove trainer hidden state | sanity refactor [1 / n] (#7437)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-05-11 11:09:08 +02:00
ananthsub 7b45bcfedb
[2/2] Remove outputs from evaluation epoch end hooks (#7338)
* Remove outputs from on_train_epoch_end

* iterate

* Update callback_hook.py

* update

* early stop?

* fix

* Update pytorch_lightning/trainer/training_loop.py

Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>

* Update trainer.py

* update

* Update training_loop.py

* early stop?

* fix

* Remove outputs from evaluation epoch end hooks

* update

* Update test_remove_1-5.py

* fix lints

* Update base.py

* rm-outputs

* Update evaluation_loop.py

* try-save-more-memory

* Update trainer.py

* Update trainer.py

* cache-at-start

* Update evaluation_loop.py

* Update training_loop.py

* Update training_loop.py

Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
2021-05-05 19:50:58 +00:00
ananthsub 6104a6316a
[1/2] Deprecate `outputs` in `on_train_epoch_end` hooks (#7339)
* Remove outputs from on_train_epoch_end

* iterate

* Update callback_hook.py

* update

* Update training_loop.py

* Update test_training_loop.py

* early stop?

* fix

* update tests

* Update test_hooks.py

* Update pytorch_lightning/trainer/callback_hook.py

Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>

* Update pytorch_lightning/trainer/training_loop.py

Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>

* Update trainer.py

* Update pytorch_lightning/trainer/trainer.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-05-05 17:18:16 +02:00
Carlos Mocholí 8c0ea92af2
`TrainerState` refactor [5/5] (#7173)
* `TrainerState` refactor

* flake8

* Update finished check

* Test cleanup

* Fix tests

* Fixes

* Reorder

* flake8

* Update CHANGELOG

* Better docs

* Better docs

* Remove default

* Update tests

* Bad merge
2021-05-04 12:50:56 +02:00
thomas chaton 80b9ca0e38
[bugfix] Add reloading support using BaseFinetuning (#7253)
* update

* wip

* update

* update

* update

* update

* resolve bug

* update on comments

* update on comments

* update

* update

* formatting

* add comments

* update on comments

* update

* Update pytorch_lightning/callbacks/base.py

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* update

* update

* Typing and minor changes

* Refactor

* Fix deprecated test

* Broken commit

* Fix broken commit

* flake8

* Update CHANGELOG

* update on comments

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-04-30 11:14:43 -04:00
ananthsub 14b8dd479a
[2/2] Remove training loop force calling early stopping callback (#7069)
* rebase

* doc

* Update training_loop.py

* Update CHANGELOG.md

* Update CHANGELOG.md

* Update CHANGELOG.md

* Update CHANGELOG.md

* Update CHANGELOG.md
2021-04-29 09:14:53 -07:00
Carlos Mocholí 40f80230fe
Remove `trainer.fit` return value [2/n] (#7237)
* `_fit_impl` refactor and types

* Fix return

* Remove return docstring

* Fixes

* Fixes

* Remove `trainer.fit` return value

* Update CHANGELOG

* flake8

* Undo results change

* Fix test

* Revert changes for a separate PR

* flake8
2021-04-28 19:11:32 +01:00
ananthsub 947d1cb757
[1/2] Add support for early stopping during training epoch end (#6944)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: jirka <jirka.borovec@seznam.cz>
2021-04-28 15:18:56 +02:00
thomas chaton e76ebd640e
[feat] Add BasePredictionWriter 3/3 (#7127)
* wip

* update

* update

* update

* update

* update

* typo

* update on comments

* update

* update

* update

* update

* update changelog

* update

* Fix merge

* Fix merge

* move code

* resolve test

* add extra test

* add an extra test

* update on comments

* add typing

* resolve flake8

* Refactor and Docs

* Fix tests

* Fix tests

* Fix tests

* Duplicate

* Fix tests

* resolve bug

* update

* update on comments

* Update pytorch_lightning/utilities/imports.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Update pytorch_lightning/utilities/device_parser.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* update

* update

* update

* update on comments

* resolve flake8

* update test

* Apply suggestions from code review

* update on comments

* Update pytorch_lightning/callbacks/prediction_writer.py

Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>

* Update pytorch_lightning/callbacks/prediction_writer.py

Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>

* Update pytorch_lightning/callbacks/prediction_writer.py

Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>

* update on comments

* update

* update on comment

* Apply suggestions from code review

* update

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-04-27 20:23:55 +00:00
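A minimal sketch of subclassing the callback added in #7127 (the class name, output path, and saving with `torch.save` are assumptions for illustration):

```python
import os
import torch
from pytorch_lightning.callbacks import BasePredictionWriter

class TorchPredictionWriter(BasePredictionWriter):
    """Hypothetical writer that dumps epoch-level predictions to disk."""

    def __init__(self, output_dir: str, write_interval: str = "epoch"):
        super().__init__(write_interval)
        self.output_dir = output_dir

    def write_on_epoch_end(self, trainer, pl_module, predictions, batch_indices):
        os.makedirs(self.output_dir, exist_ok=True)
        torch.save(predictions, os.path.join(self.output_dir, "predictions.pt"))

# Attach via Trainer(callbacks=[TorchPredictionWriter("predictions/")]) and call trainer.predict(...)
```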
Adrian Wälchli 3b36d81c03
Fixed `num_sanity_val_steps` affecting reproducibility of training data shuffling (#7014)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-04-27 09:51:39 +00:00
Carlos Mocholí 33066f8fd9
Add `on_predict_{batch,epoch}_{start,end}` and `Callback.on_predict_{start,end}` (#7141)
* Update hooks typing and predict hooks

* Update CHANGELOG

* Progress

* Progress

* Add back `on_predict_{start,end}`

* Typing and fix

* Update tests/trainer/logging_/test_logger_connector.py

* Update tests/callbacks/test_lambda_function.py
2021-04-22 10:05:28 -04:00
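A sketch of a callback using the predict-stage hooks added in #7141 (the hook signatures follow the general `Callback` pattern and are stated here as an assumption):

```python
from pytorch_lightning.callbacks import Callback

class PredictLogger(Callback):
    """Hypothetical callback that reports progress during trainer.predict()."""

    def on_predict_epoch_start(self, trainer, pl_module):
        print("starting prediction epoch")

    def on_predict_batch_end(self, trainer, pl_module, outputs, batch, batch_idx, dataloader_idx):
        if batch_idx % 100 == 0:
            print(f"predicted batch {batch_idx}")
```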
Adrian Wälchli d12c6cf2b3
more early stopping options (convergence and divergence threshold) (#6868)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-04-19 16:49:52 +02:00
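A sketch of the two new options from #6868 (threshold values are illustrative):

```python
from pytorch_lightning.callbacks import EarlyStopping

early_stop = EarlyStopping(
    monitor="val_loss",
    stopping_threshold=0.05,    # stop once the metric is at least this good
    divergence_threshold=10.0,  # stop immediately once the metric gets this bad
)
```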
Adrian Wälchli 67d21609c9
Add Trainer max_time argument + Callback (#6823)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
2021-04-16 13:38:57 +02:00
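A sketch of the `max_time` argument from #6823; the accepted forms below (string, dict, `timedelta`) are a best-effort reading of the feature and should be checked against the docs:

```python
from datetime import timedelta
from pytorch_lightning import Trainer

# Cap wall-clock training time; any one of these forms should be equivalent.
trainer = Trainer(max_time="00:12:00:00")        # "DD:HH:MM:SS" string
trainer = Trainer(max_time={"hours": 12})        # keyword dict for timedelta
trainer = Trainer(max_time=timedelta(hours=12))  # datetime.timedelta
```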
shuyingsunshine21 03a73b37bc
Train End Error Handling Fix (#6864)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
2021-04-14 20:35:42 +02:00
Carlos Mocholí 15926b462c
Add SWA warning if not running every epoch (#6987)
* Add SWA warning if not running every epoch

* Typo
2021-04-13 18:34:40 +02:00
Ethan Harris b9bc77293b
Fix inconsistent outputs in `on_*_end` and `*_end` (#6969)
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-04-13 15:16:21 +01:00
scart97 eb15abcd82
Fix finetuning so that complex models correctly unfreeze (#6880)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-04-08 12:59:06 +05:30
Michael Baumgartner 6dc1078822
Enforce an epoch scheduler interval when using SWA (#6588)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-04-06 02:57:33 +00:00
Karthik Prasad c3da7f50bb
Sanitize `None` params during pruning (#6836)
* sanitize none params during pruning

* amend
2021-04-06 01:47:59 +02:00