Commit Graph

217 Commits

Author SHA1 Message Date
Jirka Borovec 6e124e7207
CI: precommit - docformatter (#8584)
* CI: precommit - docformatter
* fix deprecated

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-09-06 12:49:09 +00:00
Kaushik B dc3391beae
Remove deprecation warnings being called for `on_{task}_dataloader` (#9279)
* Avoid deprecation warnings being called when hooks are not implemented
* Update tests & changelog
* Apply suggestions from code review

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-09-06 10:03:30 +02:00
Eric Wiener cf1a589956
Support infinite training (#8877)
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-09-04 23:33:43 +00:00
Adrian Wälchli b91747ef75
remove backward from training batch loop (#9265)
2021-09-03 00:15:40 +00:00
Adrian Wälchli e802f519ea
Tighten the checks for `Trainer.terminate_on_nan` (#9190)
2021-09-02 18:35:22 +02:00
Danielle Pintz b046bd0670
Add on_exception callback hook (#9183)
2021-09-01 10:49:00 +02:00
Danielle Pintz 65be98b5e2
Add mocked function assert to test_error_handling_all_stages (#9182)
2021-08-30 22:58:48 +00:00
Ning 1657588f35
deprecate `on_{train/val/test/predict}_dataloader()` from DataHooks (#9098)
Co-authored-by: Sean Naren <sean@grid.ai>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
2021-08-28 17:27:56 +00:00
Danielle Pintz bd13d392af
Add error handling for all trainer entry points (#8819)
* [lightning] Ensure error handling works across different trainer entry points
2021-08-18 02:04:40 +00:00
Adrian Wälchli 5b143d0264
simplify grad clip tests (#8883)
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-08-13 16:40:20 +05:30
Adrian Wälchli 5fd157eb21
Reduce flakiness of memory test (#8651)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-08-12 12:50:45 +00:00
Carlos Mocholí ed13040729
Connect the model to the training type plugin at the start of run (#8536)
2021-08-04 17:43:34 +02:00
Kaushik B d01d8334b5
Fix `ddp` accelerator choice for cpu (#8645)
* Fix ddp accelerator choice for cpu
2021-08-02 21:24:07 +00:00
Kaushik B 850416f0a0
Fix distributed types support for CPUs (#8667)
2021-08-02 16:42:28 +05:30
Adrian Wälchli 529c42f848
fix collecting training_step outputs (#8613)
2021-07-30 13:03:15 +00:00
Sean Naren aadd2a9d9c
Load ckpt path when model provided in validate/test/predict (#8352)
* Change trainer loading behaviour for validate/test/predict

* Fix

* Fix/add tests

* remove

* Cleanups

* Space

* cleanups

* Add CHANGELOG.md

* Move after setup

* Cleanups on logic

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove

* fix test

* feedback

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update pytorch_lightning/trainer/properties.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Feedback

* Same fix

* Same fix

* Add test for behaviour, modify based on feedback

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Wording

* Apply suggestions from code review

Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Cleanup docs

* Update pytorch_lightning/trainer/trainer.py

Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>

* feedback

* Fixes to test API

* Add carlos description

* Move logic further

* Move checkpoint connector logic

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-07-28 10:12:46 +00:00
Santiago Castro b256d6acd3
Avoid unnecessary list creation (#8595)
2021-07-28 13:36:45 +05:30
Carlos Mocholí e63968ab88
Add `pyupgrade` to `pre-commit` (#8557)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-07-26 14:38:12 +02:00
Carlos Mocholí a64cc37394
Replace `yapf` with `black` (#7783)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-07-26 13:37:35 +02:00
thomas chaton c9af1a7aec
[bugfix] Reduce memory leaks (#8490)
* reduce memory leak

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update changelog

* Apply suggestions from code review

Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>

* resolve flake8

* update on comments

* resolve bug

* update

* Undo whitespace changes

* remove bug

* resolve flake8

* revert change

* update on comments

* delete the ddp wrapper as it holds memory

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* resolve flake8

* update on comments

* update changelog

* resolve test

* Update CHANGELOG

* Refactor teardown

* Fix comment

* Do it for non-gpu too

* remove ref when the model is not a lightning_module

* Fix import error

* move down

* resolve bug

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* resolve assignment

* update

* move above

* Fix device calls to support tpu training

* Update todo

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: Kaushik B <kaushikbokka@gmail.com>
2021-07-21 11:37:05 +02:00
Carlos Mocholí 9877265887
Simplify logger connector access (#8318)
2021-07-07 14:13:30 +02:00
Adrian Wälchli 6db0fe3659
training loop refactor - move val loop (#8120)
* EvaluationDataLoaderLoop -> EvaluationLoop

* proposed rename files

* imports

* bad merge

* update init files

* glue imports together

* rename fit_loop.validation_loop to fit_loop.val_loop

* move loop

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Group imports

* Resolve circular import

* Comment

* fix test

* try to resolve circ import

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-06-29 09:06:44 +00:00
thomas chaton c521624a92
[bugfix] Add mechanism to prevent deadlock for DDP on Exception Trigger (#8167)
* add mechanism to prevent deadlock

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* resolve flake8 + update changelog

* update on comments

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update

* remove space

* resolve bugs

* overwrite config

* update on comments

* update on comments

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update

* update

* update test with comments

* Update pytorch_lightning/plugins/training_type/parallel.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* update on comments

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-06-28 19:26:03 +00:00
Adrian Wälchli 55a90af7fc
`pytorch_lightning.loops` file structure: group by dataloader, epoch, and batch loop (#8077)
2021-06-24 23:40:46 +02:00
thomas chaton f79f0f9de1
[Refactor] Remove _run_evaluation + 3 EvaluationLoop (#8065)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-06-22 16:10:07 +02:00
Adrian Wälchli 4becd1cf31
rename old `Trainer.train_loop` -> `Trainer.fit_loop` (#8025)
2021-06-22 11:49:32 +02:00
Adrian Wälchli 0d6dfd42d8
Merge pull request #7990 from PyTorchLightning/refactor/loops/loops_everywhere_eval
Loop Refactor 3/N - Evaluation Loop
2021-06-18 08:54:59 -04:00
David Chan c6e02e481e
[feat] Allow overriding optimizer_zero_grad and/or optimizer_step when using accumulate_grad_batches (#7980)
2021-06-17 12:50:37 +02:00
Adrian Wälchli 971908a1aa
Loop Refactor 1/N - Training Loop (#7871)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: Justus Schock <justus.schock@rwth-aachen.de>
Co-authored-by: Justus Schock <justus.schock@posteo.de>
Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
2021-06-15 12:55:06 +00:00
Carlos Mocholí ac4eb0a06a
`is_overridden` improvements (#7918)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-06-11 13:47:00 +02:00
Adrian Wälchli c1eac483e9
split `restore_training_state` into logical parts [2 / 2] (#7900)
2021-06-10 21:54:21 +02:00
Carlos Mocholí 5593b6f772
Merge pull request #7872 from PyTorchLightning/refactor/logger-poc-changes
Random fixes for logger connector PoC
2021-06-08 09:04:16 -04:00
ananthsub fa41c588f4
Remove ProfilerConnector class (#7654)
* Remove ProfilerConnector class

* Update trainer.py

* Update CHANGELOG.md

* Update trainer.py

* Update trainer.py

* tests
2021-05-24 08:58:15 -07:00
Yifu Wang 8d6e2ff7b2
Improve argument validation for validate(), test(), and predict() (#7605)
Co-authored-by: Yifu Wang <yifuwang@2012@gmail.com>
2021-05-21 09:03:16 -07:00
Carlos Mocholí 3d4dd28bec
Replace `CallbackHookNameValidator` with `FxValidator` [3/n] (#7627)
* Refactor FxValidator

* Fix tests

* Fix tests

* Class attribute

* Fix tests

* Better error message

* Fix tests

* Update pytorch_lightning/trainer/connectors/logger_connector/fx_validator.py
2021-05-21 11:54:16 +01:00
Carlos Mocholí 901b2bac98
Unify `current_fx_name` and `current_hook_fx_name` [2/n] (#7594)
* Minor logger connector cleanup [1/n]

* Missing line

* Address comments

* Rely on validator

* Unify `current_fx_name` and `current_hook_fx_name`

* Fix test
2021-05-19 20:31:06 +00:00
Alan Du 6ac16ff348
Fix DistribType for `ddp_cpu` (spawn) (#7492)
2021-05-14 20:53:26 +01:00
Carlos Mocholí 072ad52b6b
Add `trainer.predict(ckpt_path)` (#7430)
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-05-13 01:49:58 +02:00
Jirka Borovec d4ec75164c
Prune deprecated trainer attributes (#7501)
* use_single_gpu

* use_horovod

* use_ddp2

* use_ddp

* use_dp

* on_gpu

* use_tpu

* on_tpu

* on_cpu

* cleaning

* chlog

* Apply suggestions from code review

* Apply suggestions from code review
2021-05-12 20:10:15 +00:00
Adrian Wälchli ad9118f04a
remove trainer hidden state | sanity refactor [1 / n] (#7437)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-05-11 11:09:08 +02:00
Carlos Mocholí 7dcddb27f0
Refactor tests to use `BoringModel` (#7401)
2021-05-07 15:59:32 +02:00
Carlos Mocholí 8c0ea92af2
`TrainerState` refactor [5/5] (#7173)
* `TrainerState` refactor

* flake8

* Update finished check

* Test cleanup

* Fix tests

* Fixes

* Reorder

* flake8

* Update CHANGELOG

* Better docs

* Better docs

* Remove default

* Update tests

* Bad merge
2021-05-04 12:50:56 +02:00
Carlos Mocholí 5af086ab9f
Attach data refactor and tuner bugs [4/n] (#7258)
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-04-30 13:54:58 +00:00
ananthsub 14b8dd479a
[2/2] Remove training loop force calling early stopping callback (#7069)
* rebase

* doc

* Update training_loop.py

* Update CHANGELOG.md

* Update CHANGELOG.md

* Update CHANGELOG.md

* Update CHANGELOG.md

* Update CHANGELOG.md
2021-04-29 09:14:53 -07:00
thomas chaton 848288c8d8
[warning] Add a warning with missing callback with resume_from_checkpoint (#7254)
* add a warning

* add changelog
2021-04-29 12:39:45 +00:00
ananthsub 075de9356c
Reset current_fx properties on lightning module in teardown (#7247)
* Update trainer.py

* cleanup module properties in teardown

* Update test_trainer.py

* Update lightning.py

* Formatting

* flake8

* Update pytorch_lightning/trainer/trainer.py

Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-04-28 12:17:20 -07:00
Carlos Mocholí 40f80230fe
Remove `trainer.fit` return value [2/n] (#7237)
* `_fit_impl` refactor and types

* Fix return

* Remove return docstring

* Fixes

* Fixes

* Remove `trainer.fit` return value

* Update CHANGELOG

* flake8

* Undo results change

* Fix test

* Revert changes for a separate PR

* flake8
2021-04-28 19:11:32 +01:00
thomas chaton e76ebd640e
[feat] Add BasePredictionWriter 3/3 (#7127)
* wip

* update

* update

* update

* update

* update

* typo

* update on comments

* update

* update

* update

* update

* update changelog

* update

* Fix merge

* Fix merge

* move code

* resolve test

* add extra test

* add an extra test

* update on comments

* add typing

* resolve flake8

* Refactor and Docs

* Fix tests

* Fix tests

* Fix tests

* Duplicate

* Fix tests

* resolve bug

* update

* update on comments

* Update pytorch_lightning/utilities/imports.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Update pytorch_lightning/utilities/device_parser.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* update

* update

* update

* update on comments

* resolve flake8

* update test

* Apply suggestions from code review

* update on comments

* Update pytorch_lightning/callbacks/prediction_writer.py

Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>

* Update pytorch_lightning/callbacks/prediction_writer.py

Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>

* Update pytorch_lightning/callbacks/prediction_writer.py

Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>

* update on comments

* update

* update on comment

* Apply suggestions from code review

* update

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-04-27 20:23:55 +00:00
thomas chaton e147127c0e
[feat] Add better support for predict + ddp 2/3 (#7215)
* wip

* update

* update

* update

* update

* update

* typo

* update on comments

* update

* update

* update

* update

* update changelog

* update

* Fix merge

* Fix merge

* move code

* resolve test

* add extra test

* add an extra test

* update on comments

* add typing

* resolve flake8

* Refactor and Docs

* Fix tests

* Fix tests

* Fix tests

* Duplicate

* Fix tests

* resolve bug

* update

* update on comments

* update

* update changelog

* update

* update

* remove tpu

* resolve flake8

* update on comments

* update on comments

* update on comment

* resolve flake8

* add a cpu test for predict

* add None test

* update

* Update CHANGELOG.md

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* resolve tests

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-04-27 08:46:45 -04:00
ananthsub dd5ec75e48
Deprecate save_function from model checkpoint callback (#7201)
* Update model_checkpoint.py

* Update CHANGELOG.md

* fix-tests

* deprecate not remove

* Update model_checkpoint.py

* Update test_remove_1-5.py
2021-04-26 17:55:26 +01:00