235 Commits

Author SHA1 Message Date
Eric Wiener cf1a589956
Support infinite training (#8877)
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-09-04 23:33:43 +00:00
Carlos Mocholí d5ee8d8e3f
Disable `{save,check}_on_train_epoch_end` with `check_val_every_n_epoch>1` (#9156)
2021-09-03 14:27:44 +00:00
ananthsub 86a0cb74a4
Check `max_time` when setting defaults for min/max epochs (#9072)
Co-authored-by: tchaton <thomas@grid.ai>
2021-08-27 15:01:12 +00:00
Adrian Wälchli 0abd6e94b5
[3 / 3] improvements to saving and loading callback state (#7161)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-08-26 10:02:49 +02:00
Adrian Wälchli b9443a07b9
[2 / 3] improvements to saving and loading callback state (#7187)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-08-24 17:35:19 +00:00
Kaushik B 538e743f17
feat: Add Rich Progress Bar (#8929)
Co-authored-by: thomas chaton <thomas@grid.ai>
2021-08-24 02:40:36 +00:00
Michele Sanna 9ff0c22e43
Handle the case with no queries in `GPUStatsMonitor` (#9014)
Co-authored-by: Michele Sanna <{ID}+{username}@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-08-21 05:22:33 +02:00
Carlos Mocholí bfeffde8f4
Smart handling of `EarlyStopping.check_on_train_epoch_end` (#8888)
* Smart handling of `EarlyStopping.check_on_train_epoch_end`

* dummy value

* Extra flag
2021-08-14 08:50:39 +02:00
Carlos Mocholí 7d87879350
Fix SWA with a list of learning rates (#8747)
* Fix swa lrs - needs test

* Add test

* Update CHANGELOG
2021-08-14 08:50:08 +02:00
Jirka Borovec 3096ab88eb
Tests: fix deprecated TM mape (#8830)
2021-08-10 09:26:05 +00:00
Carlos Mocholí 4928dc5579
Improve SWA docs (#8717)
2021-08-05 16:07:50 +00:00
Jirka Borovec f67892ea96
CI: yesqa (#8564)
* add yesqa
* fix flake8

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-08-02 16:05:56 +00:00
Rio H ba8053492f
Deprecate LightningModule.model_size (#8495)
Co-authored-by: Caleb Robinson <calebrob6@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-07-30 13:53:40 +00:00
Carlos Mocholí 0dc0472e1f
Use class name in SWA info message (#8602)
2021-07-29 09:39:46 +02:00
Adrian Wälchli 8c27fa71fa
[1 / 3] improvements to saving and loading callback state (#6886)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-07-29 00:12:32 +02:00
Jirka Borovec 0c0b24c031
Prune deprecated metrics (#8586)
* drop metrics

* drop tests

* fix imports

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-07-28 16:57:31 +00:00
Sean Naren aadd2a9d9c
Load ckpt path when model provided in validate/test/predict (#8352)
* Change trainer loading behaviour for validate/test/predict

* Fix

* Fix/add tests

* remove

* Cleanups

* Space

* cleanups

* Add CHANGELOG.md

* Move after setup

* Cleanups on logic

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remve

* fix test

* feedback

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update pytorch_lightning/trainer/properties.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Feedback

* Same fix

* Same fix

* Add test for behaviour, modify based on feedback

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Wording

* Apply suggestions from code review

Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Cleanup docs

* Update pytorch_lightning/trainer/trainer.py

Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>

* feedback

* Fixes to test API

* Add carlos description

* Move logic further

* Move checkpoint connector logic

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-07-28 10:12:46 +00:00
Santiago Castro b256d6acd3
Avoid unnecessary list creation (#8595)
2021-07-28 13:36:45 +05:30
Carlos Mocholí e63968ab88
Add `pyupgrade` to `pre-commit` (#8557)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-07-26 14:38:12 +02:00
Carlos Mocholí a64cc37394
Replace `yapf` with `black` (#7783)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-07-26 13:37:35 +02:00
Elad Segal 07635d0e86
fix restoring finetune callbacks after accelerator setup on training resume (#8501)
Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-07-23 19:49:32 +02:00
Carlos Mocholí f7027a8701
Remove `torch >= 1.6` checks (#8523)
2021-07-23 04:03:20 +00:00
Jirka Borovec b7dbcc3e13
Quant as optional step (#8464)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-07-22 12:44:27 +00:00
thomas chaton ea13f6021c
[bugfix] Prevent deepcopy of dataloaders / Trainer in SWA Callback (#8472)
* resolve deepcopy

* update changelog

* move private

* update on comments

* Update CHANGELOG

* Set skipped attributes to None

* Simplify test

* update

* update changelog

* update

* update on comments

* typo

* update

Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-07-20 18:31:49 +00:00
Xuehai Pan 2c5d94d98b
Fix: handle logical CUDA device IDs for GPUStatsMonitor if `CUDA_VISIBLE_DEVICES` set (#8260)
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-07-19 11:42:43 +00:00
deepsource-autofix[bot] cbf71d0a14
Remove unnecessary comprehension (#8405)
2021-07-19 08:30:24 +00:00
Carlos Mocholí 6ce77a102b
Set minimum PyTorch version to 1.6 (#8288)
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
2021-07-13 17:12:49 +00:00
Carlos Mocholí 321689f52e
Add `ModelCheckpoint(save_on_train_epoch_end)` (#8389)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-07-13 14:47:59 +00:00
Carlos Mocholí c4353ea702
Remove `dev_debugger.call_count` (#8317)
2021-07-07 19:59:59 +02:00
Carlos Mocholí 441e16f61c
Default `EarlyStopping.check_on_train_epoch_end=True` (#8286)
2021-07-05 15:45:23 +02:00
Kaushik B 3a8322deda
Add XLAStatsMonitor Callback (#8235)
2021-07-05 17:09:46 +05:30
Adrian Wälchli e7139ab9f7
Support `DDPPlugin` to be used on CPU (#6208)
* Skip test due to 'Python bus error'

* Debug NCCL

* Remove NCCL_DEBUG statement

* Revert "Skip test due to 'Python bus error'"

This reverts commit e0a3e8785d.

* fix

* add test

* changelog

* yapf

* patch os environ

* make a special test

* destroy pg

* debug

* revert

* revert

* problematic test

* skip

* try the fixture

* test

* update sensitive test

* update changelog

* remove comment

* update wrong test

* update test name

* parameterization

* Revert "parameterization"

This reverts commit b0542f43f59c5ce66800883b5e2f0c66a97408cc.

* remove conftest

* ignore test

* teardown

* fix merge

* deep speed parameterization

* uncomment test

* update chlog

* update changelog

* split tests

* update test

update test

update test

update test

* update test comments

* unroll test

* unroll test

* unroll test

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* increase shm

* sudo

* unroll ipu

* Revert "sudo"

This reverts commit 6cc68c1478.

* Revert "increase shm"

This reverts commit 8c27163483.

* x

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* find guilty test

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* POPTORCH_WAIT_FOR_IPU=1

* move test

* redo parameterize for ipu

* de-comment test

* move chlog

* Update tests/accelerators/test_accelerator_connector.py

Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>

* Update tests/accelerators/test_accelerator_connector.py

Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>

Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-07-02 12:00:24 +01:00
Carlos Mocholí a2e41045d2
Mark some loop attributes as protected (#8250)
2021-07-02 11:51:51 +01:00
Justus Schock d6435a5b73
Bugfix/swa iterable dset (#8172)
* add test

* add fix

* Update CHANGELOG.md
2021-06-28 21:18:25 +00:00
Ethan Harris 2a372e3682
Fix module dict in base finetuning (#8170)
* Fix module dict in base finetuning

* Update CHANGELOG.md
2021-06-28 10:55:32 +00:00
deepsource-autofix[bot] e11fe19673
Remove unnecessary use of comprehension (#8149)
Co-authored-by: deepsource-autofix[bot] <62050782+deepsource-autofix[bot]@users.noreply.github.com>
2021-06-27 10:00:02 +01:00
Adrian Wälchli 4becd1cf31
rename old `Trainer.train_loop` -> `Trainer.fit_loop` (#8025)
2021-06-22 11:49:32 +02:00
Carlos Mocholí f1fa4c4727
Update fit with val hook test (#8060)
2021-06-21 17:27:37 +00:00
simran2905 d1efae2e47
Fix checkpointed state for lr_schedulers with step interval (#7877)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-06-21 15:08:07 +00:00
Carlos Mocholí e55f01e665
Update evaluation hook tests (#8013)
2021-06-18 16:41:27 +00:00
Adrian Wälchli eebdc910dd
progressive restoring of trainer state (#7652)
2021-06-17 08:13:53 +00:00
Austin Basye 906de2a7fa
[feat] Named Parameter Groups in `LearningRateMonitor` (#7987)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-06-17 03:13:54 +02:00
Carlos Mocholí 4ffba600c9
Add predict hook test (#7973)
2021-06-16 15:09:24 +02:00
Adrian Wälchli 971908a1aa
Loop Refactor 1/N - Training Loop (#7871)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: Justus Schock <justus.schock@rwth-aachen.de>
Co-authored-by: Justus Schock <justus.schock@posteo.de>
Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
2021-06-15 12:55:06 +00:00
Dan Dale 3a0ed02bd4
Properly handle parent modules w/ parameters in `BaseFinetuning` callback (#7931)
Co-authored-by: Daniel Dale <dan@distributedinsight.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-06-14 16:01:07 +00:00
Adrian Wälchli c1eac483e9
split `restore_training_state` into logical parts [2 / 2] (#7900)
2021-06-10 21:54:21 +02:00
Carlos Mocholí ec4f8856af
Enable logger connector re-design (#7891)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-06-09 14:24:45 +00:00
Carlos Mocholí 5593b6f772
Merge pull request #7872 from PyTorchLightning/refactor/logger-poc-changes
Random fixes for logger connector PoC
2021-06-08 09:04:16 -04:00
thomas chaton ea71cf4a5f
[Test] Add extra test for val_check_interval in distributed scenario (#7863)
* add extra test

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add computation

* Update docs/source/common/trainer.rst

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Update docs/source/common/trainer.rst

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Update tests/trainer/test_dataloaders.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* use tmpdir

* update on comments

* update

* Update tests/callbacks/test_progress_bar.py

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-06-07 10:37:32 +00:00
thomas chaton d1becce4c1
[bugfix] Resolve LearningRateMonitor + BackboneFinetuning (#7835)
* add test + resolve bug

* update changelog

* resolve bug

* resolve bug

* Update pytorch_lightning/callbacks/lr_monitor.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update pytorch_lightning/callbacks/lr_monitor.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* update on comments

* resolve comments

* update

* Update tests/callbacks/test_lr_monitor.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Update pytorch_lightning/callbacks/lr_monitor.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-06-07 10:17:11 +00:00