Carlos Mocholí
321689f52e
Add `ModelCheckpoint(save_on_train_epoch_end)` ( #8389 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-07-13 14:47:59 +00:00
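A minimal sketch of the flag added in the entry above, assuming a stock pytorch_lightning install of this era; `model` and `dm` are hypothetical placeholders for your own LightningModule and datamodule:

```python
from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import ModelCheckpoint

# Run checkpointing at the end of validation rather than at the end of the
# training epoch (save_on_train_epoch_end=False); the default (None) lets
# Lightning decide automatically.
checkpoint_cb = ModelCheckpoint(monitor="val_loss", mode="min", save_on_train_epoch_end=False)

trainer = Trainer(max_epochs=3, callbacks=[checkpoint_cb])
# trainer.fit(model, datamodule=dm)  # `model` and `dm` are placeholders
```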
Carlos Mocholí
c4353ea702
Remove `dev_debugger.call_count` ( #8317 )
2021-07-07 19:59:59 +02:00
Carlos Mocholí
441e16f61c
Default `EarlyStopping.check_on_train_epoch_end=True` ( #8286 )
2021-07-05 15:45:23 +02:00
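A short sketch of the behaviour this default controls, assuming a metric named `train_loss` is logged from `training_step` (the metric name is illustrative):

```python
from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import EarlyStopping

# With check_on_train_epoch_end=True (the new default), the stopping criterion
# is evaluated at the end of each training epoch; set it to False to evaluate
# at the end of validation instead.
early_stop = EarlyStopping(monitor="train_loss", mode="min", patience=3, check_on_train_epoch_end=True)
trainer = Trainer(callbacks=[early_stop])
```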
Kaushik B
3a8322deda
Add XLAStatsMonitor Callback ( #8235 )
2021-07-05 17:09:46 +05:30
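A hedged sketch of attaching the new callback, assuming a TPU-enabled environment with torch_xla installed:

```python
from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import XLAStatsMonitor

# Logs XLA/TPU utilisation stats (free/peak memory, epoch timing) to the
# Trainer's logger; requires TPU cores to be requested.
trainer = Trainer(tpu_cores=8, callbacks=[XLAStatsMonitor(verbose=True)])
```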
Adrian Wälchli
e7139ab9f7
Support `DDPPlugin` to be used on CPU ( #6208 )
...
* Skip test due to 'Python bus error'
* Debug NCCL
* Remove NCCL_DEBUG statement
* Revert "Skip test due to 'Python bus error'"
This reverts commit e0a3e8785d.
* fix
* add test
* changelog
* yapf
* patch os environ
* make a special test
* destroy pg
* debug
* revert
* revert
* problematic test
* skip
* try the fixture
* test
* update sensitive test
* update changelog
* remove comment
* update wrong test
* update test name
* parameterization
* Revert "parameterization"
This reverts commit b0542f43f59c5ce66800883b5e2f0c66a97408cc.
* remove conftest
* ignore test
* teardown
* fix merge
* deep speed parameterization
* uncomment test
* update chlog
* update changelog
* split tests
* update test
* update test
* update test
* update test
* update test comments
* unroll test
* unroll test
* unroll test
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* increase shm
* sudo
* unroll ipu
* Revert "sudo"
This reverts commit 6cc68c1478.
* Revert "increase shm"
This reverts commit 8c27163483.
* x
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* find guilty test
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* POPTORCH_WAIT_FOR_IPU=1
* move test
* redo parameterize for ipu
* de-comment test
* move chlog
* Update tests/accelerators/test_accelerator_connector.py
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
* Update tests/accelerators/test_accelerator_connector.py
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-07-02 12:00:24 +01:00
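A sketch of the capability described above, assuming the 1.3-era `accelerator="ddp_cpu"` / `num_processes` spelling of multi-process CPU training (the exact Trainer flags are an assumption; later versions use `strategy`/`devices`):

```python
from pytorch_lightning import Trainer
from pytorch_lightning.plugins import DDPPlugin

# Two CPU worker processes driven by the regular (non-spawn) DDP plugin.
trainer = Trainer(
    accelerator="ddp_cpu",
    num_processes=2,
    plugins=[DDPPlugin(find_unused_parameters=False)],
)
```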
Carlos Mocholí
a2e41045d2
Mark some loop attributes as protected ( #8250 )
2021-07-02 11:51:51 +01:00
Justus Schock
d6435a5b73
Bugfix/swa iterable dset ( #8172 )
...
* add test
* add fix
* Update CHANGELOG.md
2021-06-28 21:18:25 +00:00
Ethan Harris
2a372e3682
Fix module dict in base finetuning ( #8170 )
...
* Fix module dict in base finetuning
* Update CHANGELOG.md
2021-06-28 10:55:32 +00:00
deepsource-autofix[bot]
e11fe19673
Remove unnecessary use of comprehension ( #8149 )
...
Co-authored-by: deepsource-autofix[bot] <62050782+deepsource-autofix[bot]@users.noreply.github.com>
2021-06-27 10:00:02 +01:00
Adrian Wälchli
4becd1cf31
rename old `Trainer.train_loop` -> `Trainer.fit_loop` ( #8025 )
2021-06-22 11:49:32 +02:00
Carlos Mocholí
f1fa4c4727
Update fit with val hook test ( #8060 )
2021-06-21 17:27:37 +00:00
simran2905
d1efae2e47
Fix checkpointed state for lr_schedulers with step interval ( #7877 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-06-21 15:08:07 +00:00
Carlos Mocholí
e55f01e665
Update evaluation hook tests ( #8013 )
2021-06-18 16:41:27 +00:00
Adrian Wälchli
eebdc910dd
progressive restoring of trainer state ( #7652 )
2021-06-17 08:13:53 +00:00
Austin Basye
906de2a7fa
[feat] Named Parameter Groups in `LearningRateMonitor` ( #7987 )
...
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-06-17 03:13:54 +02:00
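A minimal sketch of naming parameter groups so the monitor can label them; the tiny two-layer module is a hypothetical example, and the `"name"` key convention follows the callback's documentation:

```python
import torch
from torch import nn
import pytorch_lightning as pl
from pytorch_lightning.callbacks import LearningRateMonitor

class TwoGroupModel(pl.LightningModule):  # hypothetical example module
    def __init__(self):
        super().__init__()
        self.backbone = nn.Linear(32, 16)
        self.head = nn.Linear(16, 2)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return nn.functional.cross_entropy(self.head(self.backbone(x)), y)

    def configure_optimizers(self):
        # the "name" key labels each group in the logged learning rates
        return torch.optim.SGD(
            [
                {"params": self.backbone.parameters(), "lr": 1e-4, "name": "backbone"},
                {"params": self.head.parameters(), "lr": 1e-2, "name": "head"},
            ],
            lr=1e-3,
        )

trainer = pl.Trainer(callbacks=[LearningRateMonitor(logging_interval="epoch")], max_epochs=1)
```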
Carlos Mocholí
4ffba600c9
Add predict hook test ( #7973 )
2021-06-16 15:09:24 +02:00
Adrian Wälchli
971908a1aa
Loop Refactor 1/N - Training Loop ( #7871 )
...
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: Justus Schock <justus.schock@rwth-aachen.de>
Co-authored-by: Justus Schock <justus.schock@posteo.de>
Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
2021-06-15 12:55:06 +00:00
Dan Dale
3a0ed02bd4
Properly handle parent modules w/ parameters in `BaseFinetuning` callback ( #7931 )
...
Co-authored-by: Daniel Dale <dan@distributedinsight.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-06-14 16:01:07 +00:00
Adrian Wälchli
c1eac483e9
split `restore_training_state` into logical parts [2 / 2] ( #7900 )
2021-06-10 21:54:21 +02:00
Carlos Mocholí
ec4f8856af
Enable logger connector re-design ( #7891 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-06-09 14:24:45 +00:00
Carlos Mocholí
5593b6f772
Merge pull request #7872 from PyTorchLightning/refactor/logger-poc-changes
...
Random fixes for logger connector PoC
2021-06-08 09:04:16 -04:00
thomas chaton
ea71cf4a5f
[Test] Add extra test for val_check_interval in distributed scenario ( #7863 )
...
* add extra test
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* add computation
* Update docs/source/common/trainer.rst
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* Update docs/source/common/trainer.rst
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* Update tests/trainer/test_dataloaders.py
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* use tmpdir
* update on comments
* update
* Update tests/callbacks/test_progress_bar.py
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-06-07 10:37:32 +00:00
thomas chaton
d1becce4c1
[bugfix] Resolve LearningRateMonitor + BackboneFinetuning ( #7835 )
...
* add test + resolve bug
* update changelog
* resolve bug
* resolve bug
* Update pytorch_lightning/callbacks/lr_monitor.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Update pytorch_lightning/callbacks/lr_monitor.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* update on comments
* resolve comments
* update
* Update tests/callbacks/test_lr_monitor.py
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* Update pytorch_lightning/callbacks/lr_monitor.py
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-06-07 10:17:11 +00:00
Sean Naren
10839376e2
[IPU] Add special tests for IPUs 2/n ( #7833 )
...
* Add special tests for IPUs, run nvprof only if cuda available
* Add missing min_gpu
2021-06-04 23:23:09 +05:30
Adrian Wälchli
7e6010fc93
fix info message when max training time reached ( #7780 )
...
* call time_elapsed
* elapsed formatting
* format
* update test
* changelog
2021-05-31 14:50:16 +02:00
Carlos Mocholí
311d9fe67e
Always run validation inside the training loop epoch ( #7357 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-05-26 14:26:48 +02:00
Carlos Mocholí
d26953c8bc
Add `ModelPruning(prune_on_train_epoch_end)` to choose when to apply pruning ( #7704 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-05-26 00:57:56 +02:00
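A hedged sketch of the new knob, assuming the built-in `l1_unstructured` pruning function name from `torch.nn.utils.prune`:

```python
from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import ModelPruning

# Apply pruning when validation ends rather than at the end of the training
# epoch by setting prune_on_train_epoch_end=False.
pruning = ModelPruning("l1_unstructured", amount=0.5, prune_on_train_epoch_end=False)
trainer = Trainer(callbacks=[pruning])
```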
Carlos Mocholí
e2ead9abd7
Refactor some loops code and hook tests ( #7682 )
2021-05-25 13:27:54 +02:00
Gyeongjae Choi
a54bc5dba3
Fix progress bar print error when called before training ( #7674 )
...
* Check progress bar existence before printing
* Add tests for predict_progress_bar
* Add tests for progress_bar printing without training
* Update changelog
2021-05-24 17:33:28 +02:00
Yifu Wang
ed271905cf
Clear predict_progress_bar in ProgressBar.__getstate__ ( #7608 )
...
Co-authored-by: Yifu Wang <yifuwang@2012@gmail.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-05-20 01:38:49 +00:00
Adrian Wälchli
a1a655d006
Reduce log output size in special tests ( #7481 )
2021-05-11 17:36:20 +02:00
Adrian Wälchli
ad9118f04a
remove trainer hidden state | sanity refactor [1 / n] ( #7437 )
...
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-05-11 11:09:08 +02:00
ananthsub
7b45bcfedb
[2/2] Remove outputs from evaluation epoch end hooks ( #7338 )
...
* Remove outputs from on_train_epoch_end
* iterate
* Update callback_hook.py
* update
* early stop?
* fix
* Update pytorch_lightning/trainer/training_loop.py
Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
* Update trainer.py
* update
* Update training_loop.py
* early stop?
* fix
* Remove outputs from evaluation epoch end hooks
* update
* Update test_remove_1-5.py
* fix lints
* Update base.py
* rm-outputs
* Update evaluation_loop.py
* try-save-more-memory
* Update trainer.py
* Update trainer.py
* cache-at-start
* Update evaluation_loop.py
* Update training_loop.py
* Update training_loop.py
Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
2021-05-05 19:50:58 +00:00
ananthsub
6104a6316a
[1/2] Deprecate `outputs` in `on_train_epoch_end` hooks ( #7339 )
...
* Remove outputs from on_train_epoch_end
* iterate
* Update callback_hook.py
* update
* Update training_loop.py
* Update test_training_loop.py
* early stop?
* fix
* update tests
* Update test_hooks.py
* Update pytorch_lightning/trainer/callback_hook.py
Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
* Update pytorch_lightning/trainer/training_loop.py
Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
* Update trainer.py
* Update pytorch_lightning/trainer/trainer.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-05-05 17:18:16 +02:00
Carlos Mocholí
8c0ea92af2
`TrainerState` refactor [5/5] ( #7173 )
...
* `TrainerState` refactor
* flake8
* Update finished check
* Test cleanup
* Fix tests
* Fixes
* Reorder
* flake8
* Update CHANGELOG
* Better docs
* Better docs
* Remove default
* Update tests
* Bad merge
2021-05-04 12:50:56 +02:00
thomas chaton
80b9ca0e38
[bugfix] Add reloading support using BaseFinetuning ( #7253 )
...
* update
* wip
* update
* update
* update
* update
* resolve bug
* update on comments
* update on comments
* update
* update
* formatting
* add comments
* update on comments
* update
* Update pytorch_lightning/callbacks/base.py
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* update
* update
* Typing and minor changes
* Refactor
* Fix deprecated test
* Broken commit
* Fix broken commit
* flake8
* Update CHANGELOG
* update on comments
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-04-30 11:14:43 -04:00
ananthsub
14b8dd479a
[2/2] Remove training loop force calling early stopping callback ( #7069 )
...
* rebase
* doc
* Update training_loop.py
* Update CHANGELOG.md
* Update CHANGELOG.md
* Update CHANGELOG.md
* Update CHANGELOG.md
* Update CHANGELOG.md
2021-04-29 09:14:53 -07:00
Carlos Mocholí
40f80230fe
Remove `trainer.fit` return value [2/n] ( #7237 )
...
* `_fit_impl` refactor and types
* Fix return
* Remove return docstring
* Fixes
* Fixes
* Remove `trainer.fit` return value
* Update CHANGELOG
* flake8
* Undo results change
* Fix test
* Revert changes for a separate PR
* flake8
2021-04-28 19:11:32 +01:00
ananthsub
947d1cb757
[1/2] Add support for early stopping during training epoch end ( #6944 )
...
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: jirka <jirka.borovec@seznam.cz>
2021-04-28 15:18:56 +02:00
thomas chaton
e76ebd640e
[feat] Add BasePredictionWriter 3/3 ( #7127 )
...
* wip
* update
* update
* update
* update
* update
* typo
* update on comments
* update
* update
* update
* update
* update changelog
* update
* Fix merge
* Fix merge
* move code
* resolve test
* add extra test
* add an extra test
* update on comments
* add typing
* resolve flake8
* Refactor and Docs
* Fix tests
* Fix tests
* Fix tests
* Duplicate
* Fix tests
* resolve bug
* update
* update on comments
* Update pytorch_lightning/utilities/imports.py
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* Update pytorch_lightning/utilities/device_parser.py
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* update
* update
* update
* update on comments
* resolve flake8
* update test
* Apply suggestions from code review
* update on comments
* Update pytorch_lightning/callbacks/prediction_writer.py
Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
* Update pytorch_lightning/callbacks/prediction_writer.py
Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
* Update pytorch_lightning/callbacks/prediction_writer.py
Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
* update on comments
* update
* update on comment
* Apply suggestions from code review
* update
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-04-27 20:23:55 +00:00
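A sketch of subclassing the callback added above, following the documented hook names; the subclass name and output-directory handling are illustrative only:

```python
import os
import torch
from pytorch_lightning.callbacks import BasePredictionWriter

class TorchPredictionWriter(BasePredictionWriter):  # hypothetical subclass name
    def __init__(self, output_dir: str, write_interval: str = "batch"):
        super().__init__(write_interval)
        self.output_dir = output_dir

    def write_on_batch_end(self, trainer, pl_module, prediction, batch_indices, batch, batch_idx, dataloader_idx):
        # called when write_interval is "batch" (or "batch_and_epoch")
        torch.save(prediction, os.path.join(self.output_dir, f"pred_{dataloader_idx}_{batch_idx}.pt"))

    def write_on_epoch_end(self, trainer, pl_module, predictions, batch_indices):
        # called when write_interval is "epoch" (or "batch_and_epoch")
        torch.save(predictions, os.path.join(self.output_dir, "predictions.pt"))
```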
Adrian Wälchli
3b36d81c03
Fixed `num_sanity_val_steps` affecting reproducibility of training data shuffling ( #7014 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-04-27 09:51:39 +00:00
Carlos Mocholí
33066f8fd9
Add `on_predict_{batch,epoch}_{start,end}` and `Callback.on_predict_{start,end}` ( #7141 )
...
* Update hooks typing and predict hooks
* Update CHANGELOG
* Progress
* Progress
* Add back `on_predict_{start,end}`
* Typing and fix
* Update tests/trainer/logging_/test_logger_connector.py
* Update tests/callbacks/test_lambda_function.py
2021-04-22 10:05:28 -04:00
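A small sketch of a callback using the new prediction hooks; the timing logic is illustrative, and the argument lists follow the Callback API of this era (treat the exact signatures as an assumption):

```python
import time
from pytorch_lightning.callbacks import Callback

class PredictTimer(Callback):  # hypothetical example callback
    def on_predict_start(self, trainer, pl_module):
        self._t0 = time.monotonic()

    def on_predict_batch_end(self, trainer, pl_module, outputs, batch, batch_idx, dataloader_idx):
        print(f"predict batch {batch_idx} done")

    def on_predict_end(self, trainer, pl_module):
        print(f"predict took {time.monotonic() - self._t0:.2f}s")
```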
Adrian Wälchli
d12c6cf2b3
more early stopping options (convergence and divergence threshold) ( #6868 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-04-19 16:49:52 +02:00
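A short sketch of the two new thresholds, assuming a `val_loss` metric that is minimised (the metric name and threshold values are arbitrary examples):

```python
from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import EarlyStopping

# Stop as soon as val_loss converges below 0.02, or immediately if it
# diverges above 10.0, in addition to the usual patience-based criterion.
early_stop = EarlyStopping(
    monitor="val_loss",
    mode="min",
    stopping_threshold=0.02,
    divergence_threshold=10.0,
)
trainer = Trainer(callbacks=[early_stop])
```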
Adrian Wälchli
67d21609c9
Add Trainer max_time argument + Callback ( #6823 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
2021-04-16 13:38:57 +02:00
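A sketch of both spellings added here: the `Trainer(max_time=...)` argument and the underlying `Timer` callback (the duration values are arbitrary examples):

```python
from datetime import timedelta
from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import Timer

# Stop training after 12 hours, whichever epoch or step it is on.
trainer = Trainer(max_time="00:12:00:00")                            # "DD:HH:MM:SS" string
trainer = Trainer(max_time={"hours": 12})                            # dict form
trainer = Trainer(callbacks=[Timer(duration=timedelta(hours=12))])   # explicit callback
```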
shuyingsunshine21
03a73b37bc
Train End Error Handling Fix ( #6864 )
...
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
2021-04-14 20:35:42 +02:00
Carlos Mocholí
15926b462c
Add SWA warning if not running every epoch ( #6987 )
...
* Add SWA warning if not running every epoch
* Typo
2021-04-13 18:34:40 +02:00
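For context on the warning above, a hedged sketch of a typical SWA configuration in this era (the learning-rate and epoch values are arbitrary examples):

```python
from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import StochasticWeightAveraging

# Start averaging at 75% of max_epochs and anneal towards swa_lrs; the new
# warning fires for schedules where SWA would not run every epoch.
swa = StochasticWeightAveraging(swa_epoch_start=0.75, swa_lrs=0.05)
trainer = Trainer(max_epochs=20, callbacks=[swa])
```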
Ethan Harris
b9bc77293b
Fix inconsistent outputs in `on_*_end` and `*_end` ( #6969 )
...
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-04-13 15:16:21 +01:00
scart97
eb15abcd82
Fix finetuning so that complex models correctly unfreeze ( #6880 )
...
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-04-08 12:59:06 +05:30
Michael Baumgartner
6dc1078822
Enforce an epoch scheduler interval when using SWA ( #6588 )
...
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-04-06 02:57:33 +00:00
Karthik Prasad
c3da7f50bb
Sanitize `None` params during pruning ( #6836 )
...
* sanitize none params during pruning
* amend
2021-04-06 01:47:59 +02:00