37 Commits

Author SHA1 Message Date
Arnaud Gelas 2373858b33
Fix pre-commit isort failure on tests/checkpointing/*.py (#5427)
* Remove tests.checkpointing from skipped module in pyproject.toml

* Fix pre-commit isort failure on tests/checkpointing/*.py
2021-01-12 03:31:51 -05:00
Alan Du f6dc354349
Throw MisconfigurationError on unknown mode (#5255)
* Throw MisconfigurationError on unknown mode

* Add tests

* Add match condition for deprecation message
2021-01-12 02:31:26 -05:00
Jirka Borovec 059f4630c8
prune check on Trainer fit result (#5453)
* prune check on Trainer fit result

* flake8

* Apply suggestions from code review

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* .

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-01-11 19:36:48 -05:00
Jirka Borovec beb8cacf1c fix formatting - flake8 + isort 2021-01-06 21:31:48 +01:00
Carlos Mocholí 3ee3c42035 Prepare 1.1.3 release (#5365)
* Prepare 1.1.3 release

* Fix flake8 error

* suppress

* Remove 1.1.4 section

* Add missing commits to CHANGELOG

* Update PR template

* Add missing commit

* fix

* Update CHANGELOG.md

* Apply suggestions from code review

* Apply suggestions from code review

Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

(cherry picked from commit 4d9db866a1)
2021-01-06 15:17:27 +01:00
Jirka Borovec 9610ea817b refactor imports of logger dependencies (#4860)
* refactor imports of logger dependencies

* fix

* fix

* fix

* name

* fix

* mocks

* fix tests

* fix mlflow

* fix test tube

* fix wandb import check

* whitespace

* name

* name

* hack

* hack

* rev

* fix

* update mlflow import check

* try without installing conda dep

* .

* .

* .

* .

* .

* .

* .

* .

* .

Co-authored-by: Adrian Wälchli <adrian.waelchli@inf.unibe.ch>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

(cherry picked from commit ec0fb7a3ec)
2021-01-06 15:16:06 +01:00
chaton 56437e98a6 [bug-fix] Trainer.test points to latest best_model_path (#5161)
* resolve bug

* update code

* add set -e

* Update pytorch_lightning/callbacks/model_checkpoint.py

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* update test

* Update tests/checkpointing/test_trainer_checkpoint.py

Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>

* Update tests/checkpointing/test_trainer_checkpoint.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* update on comments

* resolve test

* convert to set

* update

* add error triggering

* update

* update on comments

* update

* resolve import

* update

* update

* Update pytorch_lightning/plugins/rpc_plugin.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* update

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-62-109.ec2.internal>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

(cherry picked from commit d5b367871f)
2021-01-06 15:14:10 +01:00
Rohit Gupta 9cfbf8d609 Disable checkpointing, earlystopping and logging with fast_dev_run (#5277)
* Disable checkpointing, earlystopping and logger with fast_dev_run

* docs

* chlog

* disable callbacks and enable DummyLogger

* add log

* use dummy logger method

* Apply suggestions from code review

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

(cherry picked from commit f740245521)
2021-01-06 12:57:24 +01:00
Rohit Gupta 81e9d4260e Fix saved filename in ModelCheckpoint if it already exists (#4861)
* disable version if not required

* disable version if not required

* pep

* chlog

* improve test

* improve test

* parametrize test and update del_list

* Update pytorch_lightning/callbacks/model_checkpoint.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* try appending version to already saved ckpt_file

* Revert "try appending version to already saved ckpt_file"

This reverts commit 710e05e01f738d982aabf1f36c09fa59293e5c0c.

* add more assertions

* use BoringModel

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Roger Shieh <sh.rog@protonmail.ch>
2021-01-05 09:57:37 +01:00
Jirka Borovec b72ed71d4e
Refactor: clean trainer device & distrib setters (#5297)
* naive replace

* simplify

* clean

* .

* fix

* .

* fix

* fix
2021-01-04 17:10:13 +00:00
Jirka Borovec af833f673c
drop deprecated TrainResult (#5323)
* drop TrainResult

* .

* .

* .

* .

* .

* .
2021-01-04 09:54:21 +08:00
Jirka Borovec fb90eec515
drop deprecated checkpoint filepath (#5321)
* drop deprecated checkpoint filepath

* tests
2021-01-02 00:08:29 +01:00
Jirka Borovec 35fd6e93c7
refactor - check E501 (#5200) 2020-12-21 14:23:09 +05:30
Carlos Mocholí 398f122a42
Improve some tests (#5049)
* Improve some tests

* Add TrainerState asserts

Co-authored-by: Roger Shieh <sh.rog@protonmail.ch>
2020-12-13 23:04:16 +08:00
Jirka Borovec 05f25f3a54
update usage of deprecated checkpoint_callback (#5006)
* drop usage of deprecated checkpoint_callback

* fix

* fix
2020-12-09 14:14:34 -05:00
Jan-Henrik Lambrechts b00991efd8
Added changeable extension variable for model checkpoints (#4977)
* Added changeable extension variable for model checkpoints

* Removed whitespace

* Removed the last bit of whitespace

* Wrote tests for FILE_EXTENSION

* Fixed formatting issues

* More formatting issues

* Simplify test by just using defaults

* Formatting to PEP8

* Added dummy class that inherits ModelCheckpoint; run only one batch instead of epoch for integration test

* Fixed too much whitespace formatting

* some changes

Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>
2020-12-06 22:58:50 +05:30
chaton c2e6e68c7e
optimizer clean up (#4658)
* add LightningOptimizer

* typo

* add mock closure

* typo

* remove logic in optimizer_step

* update

* update

* update

* deactivate LightningOptimizer for horovod

* resolve flake

* typo

* check optimizer name

* change name

* added backward to LightningOptimizer

* remove use_lightning_optimizer

* move update

* simplify init

* resolve comments

* resolve bug

* update

* update

* resolve bugs

* resolve flake8

* set state

* work manual_optimizer_step

* add doc

* add enable_pl_optimizer

* make optimizer_step

* add make_optimizer_step

* add examples

* resolve test

* add test_optimizer_return_options_enable_pl_optimizer

* add enable_pl_optimizer=True

* update

* update tests

* resolve bugs

* update

* set Trainer to False

* update

* resolve bugs

* update

* remove from doc

* resolve bug

* typo

* update

* set to True

* simplification

* typo

* resolve horovod

* unwrap horovod

* remove Optimizer

* resolve horovod

* move logic to amp_backend

* doesn't seem to be picklable

* update

* add again

* resolve some bugs

* cleanup

* resolve bug with AMP

* change __repr__

* round at -12

* update

* update

* update

* remove from horovod

* typo

* add convert_to_lightning_optimizers in each accelerators

* typo

* forgot

* forgot a convert_to_lightning_optimizers

* update

* update

* update

* increase coverage

* update

* resolve flake8

* update

* remove useless code

* resolve comments + add support for LightningOptimizer base class

* resolve flake

* check optimizer gets wrapped back

* resolve DDPSharded

* reduce code

* lightningoptimizer

* Update pytorch_lightning/core/optimizer.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Update pytorch_lightning/core/lightning.py

* remove reference to step function

* Apply suggestions from code review

* update on comments

* resolve

* Update CHANGELOG.md

* add back training_step in apex and native_amp

* rename optimizer_step

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: William Falcon <waf2107@columbia.edu>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
2020-12-01 00:09:46 +00:00
Jeff Yang 7d96fd1168
[tests/checkpointing] refactor with BoringModel (#4661)
* [tests/checkpointing] refactor with BoringModel

* [tests/checkpointing] refactor with BoringModel

* [tests/checkpointing] refactor with BoringModel

* LessBoringModel -> LogInTwoMethods

* LessBoringModel -> LogInTwoMethods

* LessBoringModel -> TrainingStepCalled

Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Ananya Harsh Jha <ananya@pytorchlightning.ai>
2020-11-24 01:23:12 +01:00
Roger Shieh 42e59c6add
Cast hparams to dict when not using omegaconf (#4770)
* init fix

* init test

* more specific dict assert

* update changelog

* Update tests/checkpointing/test_model_checkpoint.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-11-20 19:53:05 +08:00
Carlos Mocholí 396a46f55f
Add current_score to ModelCheckpoint.on_save_checkpoint (#4721)
* Add current_score to ModelCheckpoint.on_save_checkpoint

* Update CHANGELOG

[ci skip]

* fix

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* fix2

* Add test for NaN

* Fix failing tests

* Simplify line

* Add test docstrings

Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2020-11-18 08:09:44 +00:00
Jirka Borovec e1955e3c89
isolate PL debugger in tests (#4643)
* isolate PL debugger in tests

* miss

Co-authored-by: Jeff Yang <ydcjeff@outlook.com>
2020-11-14 11:22:56 +00:00
Kai Zhang 30ad3e2ad3
Replace a MisconfigurationException with warning in ModelCheckpoint callback (#4560)
* replace MisconfigurationException with warning

* update test

* check raising UserWarning

Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
2020-11-10 10:44:43 +01:00
Rohit Gupta ad2556b669
Disable saving checkpoints if not trained (#4372)
* Disable saving checkpoints if not trained

* chlog

* update test

* fix

Co-authored-by: chaton <thomas@grid.ai>
2020-11-03 11:38:32 +05:30
Jirka Borovec ef03c39ab7
Add step index in checkpoint name (#3807)
* true final value of global step

* ch check

* tests

* save each validation interval

* wip

* add test

* add test

* wip

* fix tests, revert old edits, fix merge conflicts, update doctests

* test + bugfix

* sort files

* format test

* suggestion by ananth

* added changelog

* naming

* docs

* example

* suggestion

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* fix test

* pep

* pep

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2020-11-02 15:05:58 +01:00
Adrian Wälchli 6ae4c6ec85
update docs on checkpoint_callback Trainer argument (#4461)
* docs update

* update callbacks docs

* docs

* notebook examples

* warning

* line length

* update deprecation

Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: Roger Shieh <55400948+s-rog@users.noreply.github.com>
2020-11-02 06:18:20 +01:00
Jeff Yang 0f584faa6b
PyTorch 1.7 Stable support (#3821)
* prepare for 1.7 support [ci skip]

* tpu [ci skip]

* test run 1.7

* all 1.7, needs to fix tests

* couple with torchvision

* windows try

* remove windows

* 1.7 is here

* on purpose fail [ci skip]

* return [ci skip]

* 1.7 docker

* back to normal [ci skip]

* change to some_val [ci skip]

* add seed [ci skip]

* 4 places [ci skip]

* fail on purpose [ci skip]

* verbose=True [ci skip]

* use filename to track

* use filename to track

* monitor epoch + changelog

* Update tests/checkpointing/test_model_checkpoint.py

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2020-10-30 15:42:14 +00:00
Adrian Wälchli d1234c592d
deprecate passing ModelCheckpoint instance to Trainer(checkpoint_callback=...) (#4336)
* first attempt

* update tests

* support multiple

* test bugfix

* changelog

* pep

* pep

* import order

* import

* improve test for resuming

* test

* update test

* add references test

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* docstring suggestion deprecation

Co-authored-by: Jeff Yang <ydcjeff@outlook.com>

* paramref

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jeff Yang <ydcjeff@outlook.com>
2020-10-30 04:47:37 +01:00
Carlos Mocholí 00cc69aed7
Add "monitor" to saved ModelCheckpoints (#4383)
* Add key

* Remove unused variables

* Update CHANGELOG [skip ci]

* best_model_monitor -> monitor

Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2020-10-28 15:21:08 +05:30
chaton 3abfec8962
[HOTFIX] ModelCheckpoint - Don't increase current_epoch and global_step if not trained (#4291)
* add two tests w/wo tempdir

* resolve flake8

* this test is failing

* update bug report

* resolve bug and add test

* remove bug_report

* resolve flake8

* resolve bug

* resolve pep8

* resolve pep8

Co-authored-by: Teddy Koker <teddy.koker@gmail.com>
2020-10-23 11:17:50 +01:00
Rohit Gupta 4c7ebdc32b
Add dirpath and filename parameter in ModelCheckpoint (#4213)
* Add dirpath and filename parameter in ModelCheckpoint

* remove old function

* chlog

* codefactor

* update tests

* docs

* fix doctest and added tests

* pathlib dirpath

* dep version and docs

* try fix doctest

* pep

* suggestions
Co-authored-by: carmocca <carlossmocholi@gmail.com>

* suggestions

* fix test

* pep

* trigger tests

* Apply suggestions from code review

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* suggestions

* try fix windows test

* add and update some tests

* trigger tests

* Apply suggestions from code review

* Apply suggestions from code review

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: William Falcon <waf2107@columbia.edu>
2020-10-23 09:59:12 +05:30
William Falcon 8a20d6af51
make save fx part of model checkpoint cb (#4284) 2020-10-21 10:06:42 -04:00
Sean Naren 98eb736496
Added getstate/setstate method for torch.save serialization (#4127)
* Added getstate/setstate method for torch.save serialization, added additional Optional Typing to results object

* Added tests to ensure torch.save does not fail

* Added flags to ensure compatible ddp cpu environment

* Removed torch version check due to minimum already being 1.3, reduced epochs for speed

* Moved tests to separate file

* Update to accelerator, move to ddp_spawn to prevent hanging ddp
2020-10-13 16:47:23 -04:00
William Falcon 09c2020a93
notices (#4118) 2020-10-13 07:18:07 -04:00
Jirka Borovec 8873750cf0
remove deprecated early_stop_callback (#3982) 2020-10-08 06:30:33 -04:00
Sean Naren 2aebf65241
Test to ensure ckpt filepath contains correct val score (#3933)
* Added test to ensure ckpt filepath contains the correct val score reported from the trainer

* Modified to check all saved ckpt files
2020-10-07 07:43:17 -04:00
Jirka Borovec 6ac0958166
fix init nan for checkpointing (#3863)
* add test for checkpoint nan

* fix

* pep
2020-10-05 07:36:12 -04:00
William Falcon d9656d166c
fixed model checkpoint frequency (#3852)
* fixed model checkpoint frequency

* fixed model checkpoint frequency

* fixed model checkpoint frequency

* fixed model checkpoint frequency

* merged
2020-10-04 21:49:20 -04:00