Commit Graph

753 Commits

William Falcon a38d108a68
add dist lib to enable syncing anything across devices (#3762)
* add dist lib to enable syncing anything across devices
2020-10-01 01:21:38 -04:00
William Falcon cf182e80fc
Finish Allow on_save_checkpoint... (#3688)
* Finish #3562

* Apply suggestions from code review

* Apply suggestions from code review

* fix tests

* Finish #3562

* Apply suggestions from code review

* Apply suggestions from code review

* fix tests

* fix structure

* fix structure

* make save_last test pass

* unnecessary global rank check

* fix test

* update test

* update test

* test

* test

* run save on all

* remove assert

* tracking saves

* check if fails

* test

* clean up

* adjust horovod test

* clean up

* remove unnecessary makedirs

* change

* undo

* debug

* debug

* debug

* debug

* mock

* undo debug code

* add extra assertions

* test

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <adrian.waelchli@inf.unibe.ch>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2020-09-30 16:15:29 -04:00
Adrian Wälchli c73032e39d
Make ModelCheckpoint(save_top_k=-1) track the best models (#3735)
* fix topk=-1 tracking best

* update test

* clean up

* add changelog

* enable loading best topk in trainer.test()

* make trivial

* return right away

* make windows test path happy
2020-09-30 08:34:02 -04:00
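
A minimal sketch of what this fix restores, assuming the era's API where the callback is wired in via `Trainer(checkpoint_callback=...)` and exposes `best_model_path` / `best_k_models` (names taken from that API, not from the commit itself):

```python
from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import ModelCheckpoint

# save_top_k=-1 keeps a checkpoint for every validation epoch instead of only the k best.
checkpoint_cb = ModelCheckpoint(monitor="val_loss", save_top_k=-1)
trainer = Trainer(max_epochs=5, checkpoint_callback=checkpoint_cb)

# After trainer.fit(model), checkpoint_cb.best_model_path and best_k_models are
# expected to be populated even though no checkpoint is ever deleted.
```
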
Jirka Borovec 31a36f04df
define distributed as a type (#3740)
* define type

* miss

* Apply suggestions from code review

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* miss

* warn

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2020-09-30 08:33:01 -04:00
Adrian Wälchli 9405c880af
log/save_interval based on global step (#3667)
* log interval based on global step

* test

* test

* test

* test

* pep

* pep

* added changelog

* pep

* merge

* remove unused arg
2020-09-30 12:26:27 +02:00
William Falcon b3be8022bd
tests for val step flow and logging (#3731)
* ref: test val epoch end

* ref: test val epoch end

* ref: test val epoch end

* ref: test log dict

* ref: test log dict

* ref: test log dict

* ref: test log dict
2020-09-29 22:12:56 -04:00
ananthsub 3dcf7130c5
Support checkpoint hooks on data module (#3563)
* Split out changes from #3563 to make that PR easier to review. This formats the file according to the Black formatter

* Store a reference to the trainer on the datamodule

Fixes #3682

* Update data_connector.py

* Update data_connector.py

* Update test_datamodules.py

* Split out changes from #3563 to make that PR easier to review. This formats the file according to the Black formatter

* support checkpoint hooks for datamodule

refactor on_{save/load}_checkpoint to a separate hook class that both the lightning module and data module inherit
add spots in callback connector to call new datamodule hooks if available

* hooks formatting

* Update hooks.py

* Update checkpoint_connector.py

* Update lightning.py

* update based on upstream/master

checkout upstream/master

* Update checkpoint_connector.py

* add tests

* undo format revert

* Updated CHANGELOG.md

* add checkpoint hooks

* add Dict type

* import CheckpointHooks
2020-09-29 19:51:44 +02:00
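
A sketch of the hooks this PR adds to the data module, assuming they mirror the LightningModule signatures `on_save_checkpoint(checkpoint)` / `on_load_checkpoint(checkpoint)`; the checkpoint key below is purely illustrative:

```python
from pytorch_lightning import LightningDataModule


class MyDataModule(LightningDataModule):
    def __init__(self):
        super().__init__()
        self.vocab = {"<pad>": 0}

    def on_save_checkpoint(self, checkpoint: dict) -> None:
        # Persist datamodule state alongside the model weights.
        checkpoint["my_datamodule_vocab"] = self.vocab  # illustrative key name

    def on_load_checkpoint(self, checkpoint: dict) -> None:
        # Restore that state when the checkpoint is loaded.
        self.vocab = checkpoint.get("my_datamodule_vocab", self.vocab)
```
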
William Falcon c14928a72a
ref: test val flow steps (#3723)
* ref: test val epoch end

* ref: test val epoch end

* ref: test val epoch end
2020-09-29 11:42:38 -04:00
Maxim Grechkin 7bb139816a
Add a more direct test of multi-gpu training working (#2084)
* Add a more direct test of multi-gpu training working

* Update tests/base/develop_pipelines.py

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2020-09-29 15:38:09 +02:00
Carlos Mocholí 3b2efe5b2a
Fix ModelCheckpoint period (#3630)
* Fix ModelCheckpoint period

* Remove comma

* Minor changes

* skip check

* Revert "skip check"

Already pushed to master

This reverts commit 00d9e77b81.

Co-authored-by: Jirka Borovec <jirka@pytorchlightning.ai>
2020-09-29 15:36:45 +02:00
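
For context, a sketch of the `period` argument whose behaviour this PR fixes, assuming the ModelCheckpoint signature of this era:

```python
from pytorch_lightning.callbacks import ModelCheckpoint

# period=2: save a checkpoint only every second validation epoch.
# After #3630 the interval is respected as configured.
checkpoint_cb = ModelCheckpoint(monitor="val_loss", period=2)
```
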
William Falcon f42ea303c9
ref: enable self.log for eval loop metrics (#3715)
* ref: test val epoch end

* ref: test val epoch end

* ref: test val epoch end

* ref: test val epoch end

* ref: test val epoch end

* ref: test val epoch end
2020-09-29 02:00:28 -04:00
William Falcon c41ea86b35
ref: move backends back to individual files (1/5) (ddp_cpu) (#3712)
* ref: make each backend independent for easier debugging and independent debugging

* ref: make each backend independent for easier debugging and independent debugging

* ref: make each backend independent for easier debugging and independent debugging

* ref: make each backend independent for easier debugging and independent debugging

* ref: make each backend independent for easier debugging and independent debugging

* ref: make each backend independent for easier debugging and independent debugging

* ref: test val epoch end

* ref: test val epoch end
2020-09-29 01:59:18 -04:00
Rohit Gupta 783750547d
disable optimizers setup during testing (#3059)
* disable configure_optimizers during testing

* minor changes

* hvd and ddp

* fix precision during testing

* fix ddp

* fix amp

* fix cpu

* update dp

* simplify optimizers

* add test

* codefactor

* ref optimizer setup

* chlog

* suggestions

* isort

* rebased with master
2020-09-29 01:09:04 +02:00
William Falcon 4d5c0fa1bc
ref: separate flow vs log tests (#3704) 2020-09-28 12:01:52 -04:00
William Falcon cdd7266cd8
ref: enable self.log from val step (#3701)
* .log in eval

* ref

* ref: enable self.log in val step
2020-09-28 10:49:07 -04:00
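
A sketch of the call this refactor enables, assuming the `self.log` signature being introduced in this series (the model and metric names are illustrative):

```python
import torch
from pytorch_lightning import LightningModule


class LitModel(LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def validation_step(self, batch, batch_idx):
        x, y = batch
        loss = torch.nn.functional.cross_entropy(self.layer(x), y)
        # self.log can now be called directly inside validation_step (#3701);
        # on_epoch=True aggregates the value across the validation epoch.
        self.log("val_loss", loss, prog_bar=True, on_epoch=True)
```
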
William Falcon 2ecaa2a8be
ref: (2/n) fix no log in epoch end (#3699) 2020-09-28 08:25:44 -04:00
William Falcon ddd11075bd
[WIP] ref: deprecated results obj, added support for simpler comms (1/n) (#3681)
* ref: deprecated results obj, added support for simpler comms. Decouples logging from loops

* ref: deprecated results obj, added support for simpler comms. Decouples logging from loops

* ref: deprecated results obj, added support for simpler comms. Decouples logging from loops

* ref: deprecated results obj, added support for simpler comms. Decouples logging from loops

* ref: deprecated results obj, added support for simpler comms. Decouples logging from loops

* ref: deprecated results obj, added support for simpler comms. Decouples logging from loops

* fix global step err

* fix global step err

* fix global step err

* fix global step err

* fix global step err

* fix typing err

* fix str

* fix typing err
2020-09-27 23:19:46 -04:00
William Falcon ff2bab0996
ref: (results 1/n) enable tracking original metric when step and epoch are both true (#3685)
* enable tracking original metric when step and epoch are both true
2020-09-27 22:08:31 -04:00
William Falcon 931995b55b
remove flake8 (#3687) 2020-09-27 20:40:02 -04:00
Adrian Wälchli f37e9e8a83
Fix global step increment on training_epoch_end (#3673)
* fix

* fix global step err

* fix global step err

* fix global step err

* fix global step err

* fix global step err

* fix global step err

Co-authored-by: William Falcon <waf2107@columbia.edu>
2020-09-27 20:19:51 -04:00
Adrian Wälchli d15fd751c7
change default save_top_k, save_last to None (#3680)
* topk default

* fix test that doesn't have best available

* remove print

* #3680 changes

* fix backward

* temp revert

te

* add warning by carmocca

* format docstring for test

* specify monitor in ES test with top k

* improve docstring for save_last

* remove commented lines

* revert passing model to test

* undo regex mistake

* changelog

* fix test covering case monitor=None and savetopk=-1

* docstring

* fix test for saving all checkpoints

* don't save checkpoints for save_top_k=0

* add test for savetopk=0

Co-authored-by: @carmocca

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2020-09-27 20:05:02 -04:00
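
A sketch of what the new `None` defaults mean in practice; passing the arguments explicitly keeps the previous behaviour (argument names as in the era's API):

```python
from pytorch_lightning.callbacks import ModelCheckpoint

# With save_top_k and save_last defaulting to None, the callback picks a sensible
# behaviour based on whether a monitor is set. Being explicit restores the old semantics:
checkpoint_cb = ModelCheckpoint(
    monitor="val_loss",  # quantity to rank checkpoints by
    save_top_k=1,        # keep only the single best checkpoint
    save_last=True,      # additionally always keep the most recent one
)
```
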
ananthsub 94c79bb3ba
Add a reference to the Trainer on the LightningDataModule (#3684)
* Split out changes from #3563 to make that PR easier to review. This formats the file according to the Black formatter

* Store a reference to the trainer on the datamodule

Fixes #3682

* Update data_connector.py

* Update data_connector.py

* Update test_datamodules.py
2020-09-27 19:48:01 -04:00
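
A sketch of the new back-reference, assuming it is populated once the datamodule is attached to a Trainer (e.g. during `fit`); the attribute access is illustrative:

```python
from pytorch_lightning import LightningDataModule


class MyDataModule(LightningDataModule):
    def setup(self, stage=None):
        # After #3684 the datamodule carries a reference to its trainer.
        trainer = getattr(self, "trainer", None)  # None until attached (assumption)
        if trainer is not None:
            print(f"setup called at epoch {trainer.current_epoch}")
```
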
Pariente Manuel 3d76f604bd
Add ModelCheckpoint.to_yaml method (#3048)
* Add ModelCheckpoint.to_json()

* Add ModelCheckpoint.to_json() test

* Fix W292: Add new line at end of file

* Apply suggestions from code review

* Apply suggestions from code review

Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>

* Fixed tests

* Update pytorch_lightning/callbacks/model_checkpoint.py

* Apply suggestions from code review

* fix test

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2020-09-27 14:39:40 +02:00
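
A sketch of the new method, assuming it dumps the tracked `best_k_models` mapping (checkpoint path -> monitored score) to a YAML file; the output path is illustrative:

```python
from pytorch_lightning.callbacks import ModelCheckpoint

checkpoint_cb = ModelCheckpoint(monitor="val_loss", save_top_k=3)
# ... after training with a Trainer that uses this callback ...
checkpoint_cb.to_yaml("best_k_models.yaml")  # writes {checkpoint_path: score, ...}
```
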
William Falcon d79bce1dff
enable None model checkpoint default (#3669)
* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default
2020-09-26 23:14:04 -04:00
Adrian Wälchli 3ff5327e83
Mocking loggers (part 1, wandb) (#3596)
* mocking for wandb

* remove wandb import in amp test

* mock loggers in sphinx

* check tests

* Update extra.txt

* setup

* dev

* min

* revert

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Jirka Borovec <jirka@pytorchlightning.ai>
2020-09-25 16:00:02 +02:00
Carlos Mocholí e70aea7642
Allow ModelCheckpoint monitor to be None (#3633)
* Fix ModelCheckpoint period

* Test for less epochs
2020-09-25 15:54:04 +02:00
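
A sketch of the newly allowed configuration, assuming that with no monitored quantity the callback simply keeps saving the latest checkpoint:

```python
from pytorch_lightning.callbacks import ModelCheckpoint

# monitor=None is now valid (#3633): nothing is ranked or compared,
# the callback just saves the most recent model each epoch.
checkpoint_cb = ModelCheckpoint(monitor=None)
```
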
Carlos Mocholí ed12e422a4
Fix incorrect "Saving latest checkpoint" warning (#3588)
* Fix incorrect "Saving latest checkpoint" warning

* Replace warning with info. Run PyCharm's optimize imports

* Remove unused class variable. Refactor logic. Improve test

* Fix De Morgan's
2020-09-25 14:18:06 +02:00
Antoine Broyelle 17c8c95fbc
Wrap prepare_data and setup only once inside DataModule (#3654)
Fix #3652
2020-09-25 07:09:50 -04:00
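
A sketch of the guarded behaviour, assuming the wrapper turns repeated calls into no-ops:

```python
from pytorch_lightning import LightningDataModule


class MyDataModule(LightningDataModule):
    def prepare_data(self):
        print("downloading data")  # expensive one-time work

    def setup(self, stage=None):
        print(f"setting up stage={stage}")


dm = MyDataModule()
dm.prepare_data()
dm.prepare_data()  # second call is expected to be skipped after #3654
dm.setup("fit")
dm.setup("fit")    # likewise a no-op instead of running twice
```
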
Carlos Mocholí 908382f196
Split GPUStatsMonitor function (#3644)
* Split function

* Add docstrings

* Add typing annotations

* Minor refactor

* Make static to add a test
2020-09-25 07:30:30 +02:00
Jirka Borovec aa52c930f4
test examples (#3643)
* test examples

* testing

* testing

* typo

* req

* exception

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2020-09-24 17:33:11 +02:00
Adrian Wälchli 3affa0e49a
use tmpdir in tests when writing predictions to disk (#3561)
* save to tmpdir

* path
2020-09-23 07:44:15 -04:00
William Falcon 031274c25d
fix dp issues + update examples and test examples (#3618)
* fix dp

* fix dp

* fix dp

* fix dp

* fix examples

* fix examples

* fix examples

* fix examples

* fix examples

* fix examples

* fix examples

* fix examples

* fix examples

* fix examples

* fix examples

* fix examples

* fix examples

* fix examples

* fix examples

* fix examples

* fix examples

* fix examples
2020-09-23 00:19:46 -04:00
William Falcon c591013708
enable any logged metric to be accessible in callbacks (#3598)
* enable any logged or written metric to be accessible in callbacks

* enable any logged or written metric to be accessible in callbacks

* enable any logged or written metric to be accessible in callbacks

* enable any logged or written metric to be accessible in callbacks

* enable any logged or written metric to be accessible in callbacks

* enable any logged or written metric to be accessible in callbacks

* enable any logged or written metric to be accessible in callbacks

* enable any logged or written metric to be accessible in callbacks

* enable any logged or written metric to be accessible in callbacks

* enable any logged or written metric to be accessible in callbacks

* enable any logged or written metric to be accessible in callbacks

* enable any logged or written metric to be accessible in callbacks

* enable any logged or written metric to be accessible in callbacks

* enable any logged or written metric to be accessible in callbacks

* enable any logged or written metric to be accessible in callbacks

* enable any logged or written metric to be accessible in callbacks

* enable any logged or written metric to be accessible in callbacks

* enable any logged or written metric to be accessible in callbacks

* enable any logged or written metric to be accessible in callbacks

* enable any logged or written metric to be accessible in callbacks

* enable any logged or written metric to be accessible in callbacks

* clarify forward

* clarify forward

* clarify forward

* clarify forward
2020-09-22 18:00:23 -04:00
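
A sketch of what this enables: anything sent through `self.log` becomes visible to callbacks via `trainer.callback_metrics` (the callback and metric names here are illustrative):

```python
from pytorch_lightning import Callback


class PrintValLoss(Callback):
    def on_validation_end(self, trainer, pl_module):
        # A value the LightningModule logged with self.log("val_loss", ...)
        # is now accessible from any callback (#3598).
        val_loss = trainer.callback_metrics.get("val_loss")
        print(f"validation loss: {val_loss}")
```
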
Nicki Skafte 88e6b29bba
faster tests (#3604) 2020-09-22 07:37:34 -04:00
Carlos Mocholí 1223cdbaa1
Add missing line. Add a test (#3594) 2020-09-21 22:17:51 -04:00
Nicki Skafte b1347c956a
[Metrics] AUROC error on multilabel + improved testing (#3350)
* error on multilabel

* fix tests

* fix pep8

* changelog

* update doc test

* fix doctest

* fix doctest

* update from suggestion

* Apply suggestions from code review

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Update test_classification.py

* Update test_classification.py

* retrigger test

* pep8

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2020-09-21 11:46:48 +02:00
William Falcon 21cfdf6874
ref: result 1/n (make monitor default to checkpoint_on to simplify re… (#3571)
* ref: result 1/n (make monitor default to checkpoint_on to simplify result syntax)

* ref: result 1/n (make monitor default to checkpoint_on to simplify result syntax)

* ref: result 1/n (make monitor default to checkpoint_on to simplify result syntax)

* ref: result 1/n (make monitor default to checkpoint_on to simplify result syntax)

* ref: result 1/n (make monitor default to checkpoint_on to simplify result syntax)

* ref: result 1/n (make monitor default to checkpoint_on to simplify result syntax)

* ref: result 1/n (make monitor default to checkpoint_on to simplify result syntax)

* ref: result 1/n (make monitor default to checkpoint_on to simplify result syntax)

* ref: result 1/n (make monitor default to checkpoint_on to simplify result syntax)

* ref: result 1/n (make monitor default to checkpoint_on to simplify result syntax)

* ref: result 1/n (make monitor default to checkpoint_on to simplify result syntax)

* ref: result 1/n (make monitor default to checkpoint_on to simplify result syntax)

* ref: result 1/n (make monitor default to checkpoint_on to simplify result syntax)

* ref: result 1/n (make monitor default to checkpoint_on to simplify result syntax)

* ref: result 1/n (make monitor default to checkpoint_on to simplify result syntax)

* ref: result 1/n (make monitor default to checkpoint_on to simplify result syntax)

* ref: result 1/n (make monitor default to checkpoint_on to simplify result syntax)

* ref: result 1/n (make monitor default to checkpoint_on to simplify result syntax)

* ref: result 1/n (make monitor default to checkpoint_on to simplify result syntax)

* Update pytorch_lightning/callbacks/model_checkpoint.py

Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>

* ref: result 1/n (make monitor default to checkpoint_on to simplify result syntax)

* force crash when max_epochs < epochs in a checkpoint

Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
2020-09-20 22:58:43 -04:00
William Falcon 277538970d
force crash when max_epochs < epochs in a checkpoint (#3580)
* force crash when max_epochs < epochs in a checkpoint

* force crash when max_epochs < epochs in a checkpoint
2020-09-20 22:04:22 -04:00
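
A sketch of the newly enforced check, assuming the era's `resume_from_checkpoint` argument; the checkpoint path is illustrative and the exact exception type is not shown:

```python
from pytorch_lightning import Trainer

# If "epoch=7.ckpt" was saved at epoch 7, resuming it with a smaller max_epochs
# now fails loudly instead of silently doing nothing (#3580).
trainer = Trainer(max_epochs=5, resume_from_checkpoint="epoch=7.ckpt")
# trainer.fit(model)  # expected to raise while restoring the checkpoint
```
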
William Falcon 9acee67c31
fixes 3549 (#3564) 2020-09-19 20:00:50 -04:00
Rohit Gupta 07b857769a
Allow kwargs in Wandb & Neptune + kwargs docstring (#3475)
* Allow kwargs in WandbLogger

* isort

* kwargs docstring

* typo

* kwargs for other loggers

* pep and isort

* formatting

* fix failing test

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2020-09-19 18:51:43 +02:00
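
A sketch of the pass-through, assuming extra keyword arguments are forwarded to `wandb.init` (the `entity` and `tags` arguments are ordinary `wandb.init` parameters used for illustration; wandb must be installed):

```python
from pytorch_lightning.loggers import WandbLogger

logger = WandbLogger(
    name="baseline-run",
    project="my-project",
    # extra kwargs are now forwarded to wandb.init (#3475):
    entity="my-team",
    tags=["baseline", "resnet"],
)
```
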
Jirka Borovec 8eb77cd06a
drop v0.10 deprecated (#3454)
* drop v0.10 deprecated

* import

* missed
2020-09-19 11:47:26 -04:00
Boris Feld e2af4f120e
Improve Comet Logger pickled behavior (#2553)
* Improve Comet Logger pickled behavior

* Delay the creation of the actual experiment object for as long as we can.
* Save the experiment id in case an Experiment object is created so we can
  continue the same experiment in the sub-processes.
* Run pre-commit on the comet file.

* Handle review comment

Make most Comet Logger attributes protected, as they might not reflect the final
Experiment attributes. Also fix the typo in the test name.

* Ensure that CometLogger.name and CometLogger.version always returns str

* Add new test for CometLogger.version behavior

* Add new tests for CometLogger.name and CometLogger.version

* Apply review suggestions

* Apply suggestions from code review

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Remove extraneous comments in Comet logger tests

* Fix lint issues

* Apply suggestions from code review

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2020-09-18 23:26:29 +02:00
Carlos Mocholí 580b04b490
Fix ModelCheckpoints name formatting (#3163)
* Fix ModelCheckpoint's name formatting

* Fix failing tests

* Add dot to CHECKPOINT_SUFFIX

* Set variables to their default values at the end of tests

* Fix logic for filepath='' and filename=None. Add test

* Fix Windows tests

* Fix typo. Remove leading line break and zeroes

* Remove CHECKPOINT_SUFFIX

* Fix typos. Use appropriate f-string format

* Apply suggestions from code review

* Fix broken tests after #3320

* Finish changes suggested by Borda

* Use explicit test var names

* Apply suggestions

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* Apply suggestions

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* Update CHANGELOG

* Apply suggestions from code review

* for

* prepend whitespace in warn msg

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2020-09-18 23:09:11 +02:00
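
A sketch of the formatting this PR fixes, assuming the era's `filepath`-based naming, where names in braces are filled in from logged metrics (the exact rendered filename depends on the version):

```python
from pytorch_lightning.callbacks import ModelCheckpoint

# Placeholders are replaced with metric values, yielding names along the lines
# of "epoch=03-val_loss=0.25.ckpt".
checkpoint_cb = ModelCheckpoint(
    filepath="checkpoints/{epoch:02d}-{val_loss:.2f}",
    monitor="val_loss",
)
```
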
Lucas Steinmann 197acd535f
Fix early stopping with training step's return dict (#3347)
* Fixes the test for early stopping without val step.

The expression that checked whether early stopping was triggered had an off-by-one error and hence was true even if early stopping was not triggered.

Furthermore, patience is set to 0 and max epochs to 10 to ensure the loss has enough time to flatten.

* Fixes early stopping without val step.

The issue was that only the `early_stop_on` key was checked and not an arbitrary monitor key.

* Fixes the branch which checks whether early stopping is done during validation.

Before, only `val_early_stop_on` was checked. Since arbitrary keys can be used, the set of possible validation keys cannot be exhaustive. Hence this disables "early stopping on_train_epoch_end" via an instance attribute if early stopping was executed in on_validation_epoch_end.
Furthermore, adds a test which ensures arbitrary keys work.

* Improve check whether eval results are used.

Only disable early checking with train results if eval results are actually used. Before, they were always disabled in ``on_validation_epoch_end``.
Rename and document the instance variable to make it clearer.

* Remove wrong documentation on the behaviour of early stopping with the train result dict.

* Apply suggestions from code review

* Apply suggestions from code review

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-09-18 23:08:04 +02:00
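
A sketch of the now-working setup: early stopping monitoring a key logged from the training step rather than a hard-coded `early_stop_on` result key (the module and metric names are illustrative, and how the callback is attached to the Trainer differs slightly across versions of this era):

```python
import torch
from pytorch_lightning import LightningModule, Trainer
from pytorch_lightning.callbacks import EarlyStopping


class LitModel(LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = torch.nn.functional.cross_entropy(self.layer(x), y)
        self.log("train_loss", loss)  # arbitrary monitor key, no early_stop_on needed
        return loss

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)


# Attach via callbacks=[...] (or early_stop_callback=... in slightly older versions).
trainer = Trainer(callbacks=[EarlyStopping(monitor="train_loss", patience=3)])
```
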
Jirka Borovec 7b64472ced
fix lib paths after Wandb 0.10 (#3520)
* try

* try

* drop 0.20

* drop 0.19.5

* -U

* Fixed Horovod in CI due to wandb==0.10.0 sys.path modifications (#3525)

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* format

* wb freeze

* types

Co-authored-by: Travis Addair <taddair@uber.com>
2020-09-17 08:37:49 -04:00
Abe Botros 76c4afb840
Fix IoU score for classes not present in target or pred (#3098)
* Fix IoU score for classes not present in target or pred

Fixes #3097

- Allow configurable not_present_score for IoU for classes
  not present in target or pred. Defaults to 1.0.
- Also allow passing `num_classes` parameter through from iou
  metric class down to its underlying functional iou
  call.

* Changelog: move IoU not-present score fix to [unreleased]

* IoU: avoid recomputing class presence in target and pred

Use already-computed support, true positives, and false positives to
determine if a class is not present in either target or pred.

* Test IoU against sklearn jaccard_score

Also add TODO to test our IoU's not_present_score against sklearn's
jaccard_score's zero_division when it becomes available.

* IoU: remove_bg -> ignore_index

Fixes #2736

- Rename IoU metric argument from `remove_bg` -> `ignore_index`.
- Accept an optional int class index to ignore, instead of a bool and
  instead of always assuming the background class has index 0.
- If given, ignore the class index when computing the IoU output,
  regardless of reduction method.

* Improve documentation for IoU not_present_score

* Update default IoU not_present_score to 0.0

* Add note about IoU division by zero

* Rename IoU not_present_score -> absent_score

* Update IoU absent score changelog wording

* Condense IoU absent_score argument docstring

* Remove unnecessary IoU ignore_index comment

* docstrings

* isort

* flake8

* Fix test of IoU against sklearn jaccard

Use macro instead of micro averaging in sklearn's jaccard score, to
match multi-class IoU, which conventionally takes per-class scores
before averaging.

Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>
2020-09-17 10:37:49 +02:00
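
A sketch of the reworked functional IoU described above, assuming the argument names this PR introduces (`ignore_index`, `absent_score`, `num_classes`) on the era's `pytorch_lightning.metrics.functional.iou`:

```python
import torch
from pytorch_lightning.metrics.functional import iou

pred = torch.tensor([0, 1, 2, 2])
target = torch.tensor([0, 1, 2, 1])

score = iou(
    pred,
    target,
    ignore_index=0,    # drop the background class from the result
    absent_score=0.0,  # score assigned to classes missing from both pred and target
    num_classes=3,
)
print(score)
```
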
Jirka Borovec c64520e658
fix tensorboard version (#3132)
* tensorboard version

* WIP test tb hparams logs (#3040)

* optional

* req

* tensorboard>=2.2.0

* data

* data

* TB

Co-authored-by: Rosario Scalise <rosario@cs.washington.edu>
2020-09-15 23:48:48 +02:00
Adrian Wälchli 4ed96b2eb4
fix gradient norm tracking for row_log_interval > 1 (#3489)
* fix + test

* changelog

* Apply suggestions from code review

Co-authored-by: Tim Chard <timchard@hotmail.com>

* improve test

Co-authored-by: Tim Chard <timchard@hotmail.com>
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
2020-09-15 18:41:27 +02:00
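
A sketch of the configuration affected by this fix, assuming the era's Trainer flags:

```python
from pytorch_lightning import Trainer

# Gradient norms are tracked with the 2-norm; after #3489 the tracking also
# behaves as expected when row_log_interval > 1.
trainer = Trainer(track_grad_norm=2, row_log_interval=10)
```
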
Nicki Skafte 28af34bc51
[Metrics] Class reduction similar to sklearn (#3322)
* new class reduce interface

* update docs

* pep8

* update_class_metrics

* fix doctest

* changelog

* fix docs

* fix codefactor

* fix codefactor

* formatting

* fix typo

* fix typo

* typo pr -> per

* update from suggestion

* fix error

* Apply suggestions from code review

* Update CHANGELOG.md

* formatting

* timeouts

* docstring formatting for reg metrics

* pep

* flake8

* revert workflow changes

* suggestions

Co-authored-by: Nicki Skafte <nugginea@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Jirka Borovec <jirka@pytorchlightning.ai>
Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>
2020-09-15 14:36:14 +02:00
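
A sketch of the sklearn-style reduction added here; the specific metric (`precision`) and its signature are assumptions about the era's functional metrics API, while the `class_reduction` values mirror sklearn's `average` options:

```python
import torch
from pytorch_lightning.metrics.functional import precision

pred = torch.tensor([0, 1, 1, 2])
target = torch.tensor([0, 1, 2, 2])

# 'micro', 'macro', 'weighted' and 'none' mirror sklearn's averaging choices.
print(precision(pred, target, class_reduction="macro"))
print(precision(pred, target, class_reduction="none"))  # per-class scores
```
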
Alexander 5732a56560
Pass epoch argument to Comet Logger (#3438)
* Pass epoch argument

* Copy epoch instead of inplace pop

* Remove whitespace

* Add test for epoch logging

* add docstring

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2020-09-15 14:30:42 +02:00
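
A sketch of the new pass-through, assuming an `epoch` entry in the metrics dict is copied (not popped in place) and forwarded to Comet as the epoch argument; offline mode via `save_dir` is used here so no API key is needed, and comet_ml must be installed:

```python
from pytorch_lightning.loggers import CometLogger

logger = CometLogger(save_dir="comet_logs")  # offline mode (assumption for this sketch)

# 'epoch' is forwarded to Comet's experiment as the epoch argument (#3438).
logger.log_metrics({"train_acc": 0.91, "epoch": 3}, step=300)
```
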