Commit Graph

791 Commits

Author SHA1 Message Date
Rohit Gupta 783750547d
disable optimizers setup during testing (#3059)
* disable configure_optimizers during testing

* minor changes

* hvd and ddp

* fix precision during testing

* fix ddp

* fix amp

* fix cpu

* update dp

* simplify optimizers

* add test

* codefactor

* ref optimizer setup

* chlog

* suggestions

* isort

* rebased with master
2020-09-29 01:09:04 +02:00
William Falcon 4d5c0fa1bc
ref: separate flow vs log tests (#3704) 2020-09-28 12:01:52 -04:00
William Falcon cdd7266cd8
ref: enable self.log from val step (#3701)
* .log in eval

* ref

* ref: enable self.log in val step
2020-09-28 10:49:07 -04:00
William Falcon 2ecaa2a8be
ref: (2/n) fix no log in epoch end (#3699) 2020-09-28 08:25:44 -04:00
William Falcon ddd11075bd
[WIP] ref: deprecated results obj, added support for simpler comms (1/n) (#3681)
* ref: deprecated results obj, added support for simpler comms. Decouples logging from loops

* ref: deprecated results obj, added support for simpler comms. Decouples logging from loops

* ref: deprecated results obj, added support for simpler comms. Decouples logging from loops

* ref: deprecated results obj, added support for simpler comms. Decouples logging from loops

* ref: deprecated results obj, added support for simpler comms. Decouples logging from loops

* ref: deprecated results obj, added support for simpler comms. Decouples logging from loops

* fix global step err

* fix global step err

* fix global step err

* fix global step err

* fix global step err

* fix typing err

* fix str

* fix typing err
2020-09-27 23:19:46 -04:00
William Falcon ff2bab0996
ref: (results 1/n) enable tracking original metric when step and epoch are both true (#3685)
* enable tracking original metric when step and epoch are both true
2020-09-27 22:08:31 -04:00
William Falcon 931995b55b
remove flake 8 (#3687) 2020-09-27 20:40:02 -04:00
Adrian Wälchli f37e9e8a83
Fix global step increment on training_epoch_end (#3673)
* fix

* fix global step err

* fix global step err

* fix global step err

* fix global step err

* fix global step err

* fix global step err

Co-authored-by: William Falcon <waf2107@columbia.edu>
2020-09-27 20:19:51 -04:00
Adrian Wälchli d15fd751c7
change default save_top_k, save_last to None (#3680)
* topk default

* fix test that doesn't have best available

* remove print

* #3680 changes

* fix backward

* temp revert

te

* add warning by carmocca

* format docstring for test

* specify monitor in ES test with top k

* improve docstring for save_last

* remove commented lines

* revert passing model to test

* undo regex mistake

* changelog

* fix test covering case monitor=None and savetopk=-1

* docstring

* fix test for saving all checkpoints

* don't save checkpoints for save_top_k=0

* add test for savetopk=0

Co-authored-by @carmocca

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2020-09-27 20:05:02 -04:00
ananthsub 94c79bb3ba
Add a reference to the Trainer on the LightningDataModule (#3684)
* Split out changes from #3563 to make that PR easier to review. This formats the file according to the Black formatter

* Store a reference to the trainer on the datamodule

Fixes #3682

* Update data_connector.py

* Update data_connector.py

* Update test_datamodules.py
2020-09-27 19:48:01 -04:00
Pariente Manuel 3d76f604bd
Add ModelCheckpoint.to_yaml method (#3048)
* Add ModelCheckpoint.to_json()

* Add ModelCheckpoint.to_json() test

* Fix W292: Add new line at end of file

* Apply suggestions from code review

* Apply suggestions from code review

Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>

* Fixed tests

* Update pytorch_lightning/callbacks/model_checkpoint.py

* Apply suggestions from code review

* fix test

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2020-09-27 14:39:40 +02:00
William Falcon d79bce1dff
enable None model checkpoint default (#3669)
* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default
2020-09-26 23:14:04 -04:00
Adrian Wälchli 3ff5327e83
Mocking loggers (part 1, wandb) (#3596)
* mocking for wandb

* remove wandb import in amp test

* mock loggers in sphinx

* check tests

* Update extra.txt

* setup

* dev

* min

* revert

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Jirka Borovec <jirka@pytorchlightning.ai>
2020-09-25 16:00:02 +02:00
Carlos Mocholí e70aea7642
Allow ModelCheckpoint monitor to be None (#3633)
* Fix ModelCheckpoint period

* Test for less epochs
2020-09-25 15:54:04 +02:00
Carlos Mocholí ed12e422a4
Fix incorrect "Saving latest checkpoint" warning (#3588)
* Fix incorrect "Saving latest checkpoint" warning

* Replace warning with info. Run PyCharm's optimize imports

* Remove unused class variable. Refactor logic. Improve test

* Fix De Morgan's
2020-09-25 14:18:06 +02:00
Antoine Broyelle 17c8c95fbc
Wrap prepare_data and setup only once inside DataModule (#3654)
Fix #3652
2020-09-25 07:09:50 -04:00
Carlos Mocholí 908382f196
Split GPUStatsMonitor function (#3644)
* Split function

* Add docstrings

* Add typing annotations

* Minor refactor

* Make static to add a test
2020-09-25 07:30:30 +02:00
Jirka Borovec aa52c930f4
test examples (#3643)
* test examples

* testing

* testing

* typo

* req

* exception

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2020-09-24 17:33:11 +02:00
Adrian Wälchli 3affa0e49a
use tmpdir in tests when writing predictions to disk (#3561)
* save to tmpdir

* path
2020-09-23 07:44:15 -04:00
William Falcon 031274c25d
fix dp issues + update examples and test examples (#3618)
* fix dp

* fix dp

* fix dp

* fix dp

* fix examples

* fix examples

* fix examples

* fix examples

* fix examples

* fix examples

* fix examples

* fix examples

* fix examples

* fix examples

* fix examples

* fix examples

* fix examples

* fix examples

* fix examples

* fix examples

* fix examples

* fix examples
2020-09-23 00:19:46 -04:00
William Falcon c591013708
enable any logged metric to be accessible in callbacks (#3598)
* enable any logged or written metric to be accessible in callbacks

* enable any logged or written metric to be accessible in callbacks

* enable any logged or written metric to be accessible in callbacks

* enable any logged or written metric to be accessible in callbacks

* enable any logged or written metric to be accessible in callbacks

* enable any logged or written metric to be accessible in callbacks

* enable any logged or written metric to be accessible in callbacks

* enable any logged or written metric to be accessible in callbacks

* enable any logged or written metric to be accessible in callbacks

* enable any logged or written metric to be accessible in callbacks

* enable any logged or written metric to be accessible in callbacks

* enable any logged or written metric to be accessible in callbacks

* enable any logged or written metric to be accessible in callbacks

* enable any logged or written metric to be accessible in callbacks

* enable any logged or written metric to be accessible in callbacks

* enable any logged or written metric to be accessible in callbacks

* enable any logged or written metric to be accessible in callbacks

* enable any logged or written metric to be accessible in callbacks

* enable any logged or written metric to be accessible in callbacks

* enable any logged or written metric to be accessible in callbacks

* enable any logged or written metric to be accessible in callbacks

* clarify forward

* clarify forward

* clarify forward

* clarify forward
2020-09-22 18:00:23 -04:00
Nicki Skafte 88e6b29bba
faster tests (#3604) 2020-09-22 07:37:34 -04:00
Carlos Mocholí 1223cdbaa1
Add missing line. Add a test (#3594) 2020-09-21 22:17:51 -04:00
Nicki Skafte b1347c956a
[Metrics] AUROC error on multilabel + improved testing (#3350)
* error on multilabel

* fix tests

* fix pep8

* changelog

* update doc test

* fix doctest

* fix doctest

* update from suggestion

* Apply suggestions from code review

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Update test_classification.py

* Update test_classification.py

* retrigger test

* pep8

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2020-09-21 11:46:48 +02:00
William Falcon 21cfdf6874
ref: result 1/n (make monitor default to checkpoint_on to simplify re… (#3571)
* ref: result 1/n (make monitor default to checkpoint_on to simplify result syntax)

* ref: result 1/n (make monitor default to checkpoint_on to simplify result syntax)

* ref: result 1/n (make monitor default to checkpoint_on to simplify result syntax)

* ref: result 1/n (make monitor default to checkpoint_on to simplify result syntax)

* ref: result 1/n (make monitor default to checkpoint_on to simplify result syntax)

* ref: result 1/n (make monitor default to checkpoint_on to simplify result syntax)

* ref: result 1/n (make monitor default to checkpoint_on to simplify result syntax)

* ref: result 1/n (make monitor default to checkpoint_on to simplify result syntax)

* ref: result 1/n (make monitor default to checkpoint_on to simplify result syntax)

* ref: result 1/n (make monitor default to checkpoint_on to simplify result syntax)

* ref: result 1/n (make monitor default to checkpoint_on to simplify result syntax)

* ref: result 1/n (make monitor default to checkpoint_on to simplify result syntax)

* ref: result 1/n (make monitor default to checkpoint_on to simplify result syntax)

* ref: result 1/n (make monitor default to checkpoint_on to simplify result syntax)

* ref: result 1/n (make monitor default to checkpoint_on to simplify result syntax)

* ref: result 1/n (make monitor default to checkpoint_on to simplify result syntax)

* ref: result 1/n (make monitor default to checkpoint_on to simplify result syntax)

* ref: result 1/n (make monitor default to checkpoint_on to simplify result syntax)

* ref: result 1/n (make monitor default to checkpoint_on to simplify result syntax)

* Update pytorch_lightning/callbacks/model_checkpoint.py

Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>

* ref: result 1/n (make monitor default to checkpoint_on to simplify result syntax)

* force crash when max_epochs < epochs in a checkpoint

Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
2020-09-20 22:58:43 -04:00
William Falcon 277538970d
force crash when max_epochs < epochs in a checkpoint (#3580)
* force crash when max_epochs < epochs in a checkpoint

* force crash when max_epochs < epochs in a checkpoint
2020-09-20 22:04:22 -04:00
William Falcon 9acee67c31
fixes 3549 (#3564) 2020-09-19 20:00:50 -04:00
Rohit Gupta 07b857769a
Allow kwargs in Wandb & Neptune + kwargs docstring (#3475)
* Allow kwargs in WandbLogger

* isort

* kwargs docstring

* typo

* kwargs for other loggers

* pep and isort

* formatting

* fix failing test

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2020-09-19 18:51:43 +02:00
Jirka Borovec 8eb77cd06a
drop v0.10 deprecated (#3454)
* drop v0.10 deprecated

* import

* missed
2020-09-19 11:47:26 -04:00
Boris Feld e2af4f120e
Improve Comet Logger pickled behavior (#2553)
* Improve Comet Logger pickled behavior

* Delay the creation of the actual experiment object for as long as we can.
* Save the experiment id in case an Experiment object is created so we can
  continue the same experiment in the sub-processes.
* Run pre-commit on the comet file.

* Handle review comment

Make most Comet Logger attributes protected, as they might not reflect the final
Experiment attributes. Also fix the typo in the test name.

* Ensure that CometLogger.name and CometLogger.version always return str

* Add new test for CometLogger.version behavior

* Add new tests for CometLogger.name and CometLogger.version

* Apply review suggestions

* Apply suggestions from code review

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Remove extraneous comments in Comet logger tests

* Fix lint issues

* Apply suggestions from code review

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2020-09-18 23:26:29 +02:00
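The pickling behavior described in #2553 (create the experiment as late as possible, keep only its id so sub-processes continue the same experiment) can be sketched as below. This is a minimal illustration, not the CometLogger code: `LazyExperimentLogger` and `_create_experiment` are hypothetical names, and the "experiment" is a plain dict standing in for a remote Comet object, so no Comet API is called.

```python
import pickle
import uuid


class LazyExperimentLogger:
    """Hypothetical logger that defers creating its experiment object."""

    def __init__(self, experiment_key=None):
        self._experiment_key = experiment_key  # reused after unpickling
        self._experiment = None                # created on first access

    def _create_experiment(self):
        # Stand-in for an expensive remote call (assumption: any backend).
        if self._experiment_key is None:
            self._experiment_key = uuid.uuid4().hex
        return {"key": self._experiment_key}

    @property
    def experiment(self):
        if self._experiment is None:
            self._experiment = self._create_experiment()
        return self._experiment

    @property
    def version(self) -> str:
        # Always a str, as the commit requires for name/version.
        return str(self.experiment["key"])

    def __getstate__(self):
        # Drop the live experiment object but keep its key, so the
        # unpickled copy (e.g. in a sub-process) reattaches to the same run.
        state = self.__dict__.copy()
        state["_experiment"] = None
        return state
```

Because `__getstate__` keeps `_experiment_key`, the copy recreates an experiment with the same key on first access instead of starting a new one.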
Carlos Mocholí 580b04b490
Fix ModelCheckpoints name formatting (#3163)
* Fix ModelCheckpoint's name formatting

* Fix failing tests

* Add dot to CHECKPOINT_SUFFIX

* Set variables to their default values at the end of tests

* Fix logic for filepath='' and filename=None. Add test

* Fix Windows tests

* Fix typo. Remove leading line break and zeroes

* Remove CHECKPOINT_SUFFIX

* Fix typos. Use appropriate f-string format

* Apply suggestions from code review

* Fix broken tests after #3320

* Finish changes suggested by Borda

* Use explicit test var names

* Apply suggestions

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* Apply suggestions

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* Update CHANGELOG

* Apply suggestions from code review

* for

* prepend whitespace in warn msg

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2020-09-18 23:09:11 +02:00
Lucas Steinmann 197acd535f
Fix early stopping with training step's return dict (#3347)
* Fixes the test for early stopping without val step.

The expression that checked whether early stopping was triggered had an off-by-one error and hence was true even if early stopping was not triggered.

Furthermore, set patience to 0 and max epochs to 10 to ensure the loss has enough time to flatten.

* Fixes early stopping without val step.

The issue was that only the `early_stop_on` key was checked, not an arbitrary monitor key.

* Fixes branch, which checks whether early stopping is done during validation.

Previously, only `val_early_stop_on` was checked. Since arbitrary keys can be used, no list of possible validation keys can be exhaustive. Hence this disables early stopping in `on_train_epoch_end` via an instance attribute if early stopping was already executed in `on_validation_epoch_end`.
Furthermore adds a test, which ensures arbitrary keys work.

* Improve check whether eval results are used.

Only disable early-stopping checks on train results if eval results are actually used. Previously, they were always disabled in ``on_validation_epoch_end``.
Rename and document the instance variable to make it clearer.

* Remove wrong documentation on behaviour of early stopping with the train result dict.

* Apply suggestions from code review

* Apply suggestions from code review

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-09-18 23:08:04 +02:00
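The fix in #3347 hinges on looking up a user-chosen monitor key rather than a hard-coded `early_stop_on` / `val_early_stop_on`. A minimal patience-based sketch of that idea follows; `EarlyStoppingSketch` is hypothetical and not Lightning's `EarlyStopping` callback. It assumes the convention "stop once the metric has failed to improve for `patience` consecutive checks", so `patience=0` stops on the first non-improving check.

```python
class EarlyStoppingSketch:
    """Hypothetical patience-based early stopping keyed on any metric name."""

    def __init__(self, monitor: str, patience: int = 0, mode: str = "min"):
        self.monitor = monitor
        self.patience = patience
        self.mode = mode
        self.best = None
        self.wait = 0  # consecutive checks without improvement

    def _improved(self, value: float) -> bool:
        if self.best is None:
            return True
        return value < self.best if self.mode == "min" else value > self.best

    def should_stop(self, metrics: dict) -> bool:
        # Look up the configured key instead of a fixed one.
        if self.monitor not in metrics:
            raise KeyError(f"monitored metric {self.monitor!r} not in {sorted(metrics)}")
        value = metrics[self.monitor]
        if self._improved(value):
            self.best = value
            self.wait = 0
            return False
        self.wait += 1
        return self.wait >= self.patience
```

With `patience=1` and losses `[1.0, 0.8, 0.9, ...]`, the first `True` is returned at the third check, the first epoch without improvement.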
Jirka Borovec 7b64472ced
fix lib paths after Wandb 0.10 (#3520)
* try

* try

* drop 0.20

* drop 0.19.5

* -U

* Fixed Horovod in CI due to wandb==0.10.0 sys.path modifications (#3525)

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* format

* wb freeze

* types

Co-authored-by: Travis Addair <taddair@uber.com>
2020-09-17 08:37:49 -04:00
Abe Botros 76c4afb840
Fix IoU score for classes not present in target or pred (#3098)
* Fix IoU score for classes not present in target or pred

Fixes #3097

- Allow configurable not_present_score for IoU for classes
  not present in target or pred. Defaults to 1.0.
- Also allow passing `num_classes` parameter through from iou
  metric class down to its underlying functional iou
  call.

* Changelog: move IoU not-present score fix to [unreleased]

* IoU: avoid recomputing class presence in target and pred

Use already-computed support, true positives, and false positives to
determine if a class is not present in either target or pred.

* Test IoU against sklearn jaccard_score

Also add TODO to test our IoU's not_present_score against sklearn's
jaccard_score's zero_division when it becomes available.

* IoU: remove_bg -> ignore_index

Fixes #2736

- Rename IoU metric argument from `remove_bg` -> `ignore_index`.
- Accept an optional int class index to ignore, instead of a bool and
  instead of always assuming the background class has index 0.
- If given, ignore the class index when computing the IoU output,
  regardless of reduction method.

* Improve documentation for IoU not_present_score

* Update default IoU not_present_score to 0.0

* Add note about IoU division by zero

* Rename IoU not_present_score -> absent_score

* Update IoU absent score changelog wording

* Condense IoU absent_score argument docstring

* Remove unnecessary IoU ignore_index comment

* docstrings

* isort

* flake8

* Fix test of IoU against sklearn jaccard

Use macro instead of micro averaging in sklearn's jaccard score, to
match multi-class IoU, which conventionally takes per-class scores
before averaging.

Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>
2020-09-17 10:37:49 +02:00
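The IoU behavior described in #3098 (a configurable `absent_score` for classes present in neither target nor pred, an optional `ignore_index`, and macro averaging of per-class scores) can be sketched in plain Python as below. `iou_sketch` is a hypothetical function for illustration, not Lightning's `iou`; it works on flat lists of class indices rather than tensors.

```python
from typing import List, Optional


def iou_sketch(pred: List[int], target: List[int], num_classes: int,
               absent_score: float = 0.0,
               ignore_index: Optional[int] = None) -> float:
    """Macro-averaged IoU over classes, skipping `ignore_index` if given."""
    scores = []
    for c in range(num_classes):
        if c == ignore_index:
            continue
        tp = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        fp = sum(1 for p, t in zip(pred, target) if p == c and t != c)
        fn = sum(1 for p, t in zip(pred, target) if p != c and t == c)
        union = tp + fp + fn
        # A class absent from both pred and target has union 0 (0/0),
        # so report absent_score for it instead.
        scores.append(absent_score if union == 0 else tp / union)
    return sum(scores) / len(scores)
```

For `pred=[0, 1, 1, 2]`, `target=[0, 1, 2, 2]` and 4 classes, the per-class scores are 1.0, 0.5, 0.5 and (for the absent class 3) `absent_score`, which is why the result moves from 0.5 to 0.75 when `absent_score=1.0`.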
Jirka Borovec c64520e658
fix tensorboard version (#3132)
* tensorboard version

* WIP test tb hparams logs (#3040)

* optional

* req

* tensorboard>=2.2.0

* data

* data

* TB

Co-authored-by: Rosario Scalise <rosario@cs.washington.edu>
2020-09-15 23:48:48 +02:00
Adrian Wälchli 4ed96b2eb4
fix gradient norm tracking for row_log_interval > 1 (#3489)
* fix + test

* changelog

* Apply suggestions from code review

Co-authored-by: Tim Chard <timchard@hotmail.com>

* improve test

Co-authored-by: Tim Chard <timchard@hotmail.com>
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
2020-09-15 18:41:27 +02:00
Nicki Skafte 28af34bc51
[Metrics] Class reduction similar to sklearn (#3322)
* new class reduce interface

* update docs

* pep8

* update_class_metrics

* fix doctest

* changelog

* fix docs

* fix codefactor

* fix codefactor

* formatting

* fix typo

* fix typo

* typo pr -> per

* update from suggestion

* fix error

* Apply suggestions from code review

* Update CHANGELOG.md

* formatting

* timeouts

* docstring formatting for reg metrics

* pep

* flake8

* revert workflow changes

* suggestions

Co-authored-by: Nicki Skafte <nugginea@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Jirka Borovec <jirka@pytorchlightning.ai>
Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>
2020-09-15 14:36:14 +02:00
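The "class reduction similar to sklearn" in #3322 refers to the familiar `micro`, `macro`, `weighted` and per-class (`none`) averaging modes. A minimal sketch, assuming per-class numerators (e.g. true positives), denominators (e.g. true plus false positives) and weights (e.g. class support); the function name and signature are hypothetical, and mapping a zero denominator to 0.0 is an assumption.

```python
from typing import List, Union


def class_reduce_sketch(num: List[float], denom: List[float],
                        weights: List[float],
                        class_reduction: str = "none") -> Union[float, List[float]]:
    """Reduce per-class fractions num/denom like sklearn's `average` modes."""
    if class_reduction == "micro":
        # Pool counts over all classes before dividing.
        return sum(num) / sum(denom)
    per_class = [n / d if d else 0.0 for n, d in zip(num, denom)]
    if class_reduction == "macro":
        # Unweighted mean of per-class scores.
        return sum(per_class) / len(per_class)
    if class_reduction == "weighted":
        # Mean of per-class scores weighted by e.g. support.
        return sum(w * s for w, s in zip(weights, per_class)) / sum(weights)
    if class_reduction == "none":
        return per_class
    raise ValueError(f"unknown class_reduction: {class_reduction!r}")
```

With `num=[2, 1]`, `denom=[4, 1]`, `weights=[3, 1]` the per-class scores are `[0.5, 1.0]`, giving micro 0.6, macro 0.75 and weighted 0.625.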
Alexander 5732a56560
Pass epoch argument to Comet Logger (#3438)
* Pass epoch argument

* Copy epoch instead of inplace pop

* Remove whitespace

* Add test for epoch logging

* add docstring

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2020-09-15 14:30:42 +02:00
Phil b5dc6998ae
Disable train dataloader shuffle when overfit_batches is active. (#3501)
* Disable train dataloader shuffle when overfit_batches is active.

* pep8

Co-authored-by: William Falcon <waf2107@columbia.edu>
2020-09-15 05:07:27 -04:00
Justus Schock 4dc4c8cfa5
Metric aggregation (#3321)
* metric aggregation

* metric aggregation

* add at_least_1d

* fix output formatting

* add metric tests

* add missing test case

* remove reduce_op from metric classes

* fix reduce_op stuff

* start test fixing

* fix tests due to aggregation

* fix faulty import

* Apply suggestions from code review

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* remove reduce_op docstrings

* add compute

* remove import

* remove collection metric

* update base class

* update tests

* Update metric.py

* Update metric.py

* Apply suggestions from code review

* change default aggregate

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
2020-09-14 07:23:11 -04:00
Cookie_thief a552d4a2d5
fix normalize mode at confusion matrix (replace nans with zeros) (#3465)
* replace nans to 0 at conf. matrix & update tests

* cm.isnan() -> torch.isnan(cm)

* fix row-wise division while normalize

* update tests

* pep8 fix

* Update tests/metrics/test_classification.py

add comment to test

Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>

* Update tests/metrics/functional/test_classification.py

Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>

* Update pytorch_lightning/metrics/functional/classification.py

Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>

* final update

Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
2020-09-14 10:05:51 +02:00
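The normalization fix in #3465 divides each row by its sum and replaces the resulting NaNs (rows with no samples, i.e. 0/0) with zeros. A plain-Python sketch follows, assuming sklearn's convention of rows for the true class and columns for the predicted class; `confusion_matrix_sketch` is hypothetical. Passing `num_classes` explicitly, as #3450 later allows, keeps a row for a class that never appears.

```python
from typing import List


def confusion_matrix_sketch(pred: List[int], target: List[int],
                            num_classes: int,
                            normalize: bool = False) -> List[List[float]]:
    """Rows = true class, columns = predicted class (assumed convention)."""
    cm = [[0.0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        cm[t][p] += 1
    if normalize:
        for row in cm:
            total = sum(row)
            for j in range(num_classes):
                # Row-wise division; an empty row would be 0/0 (NaN),
                # so write 0.0 instead.
                row[j] = row[j] / total if total else 0.0
    return cm
```

Here class 2 never occurs in `target`, so its normalized row is all zeros rather than NaN.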
William Falcon 1d7c615d82
cleaning up stale logger tests + flake8 (#3490)
* cleaning up stale logger tests

* cleaning up stale logger tests

* cleaning up stale logger tests

* cleaning up stale logger tests

* cleaning up stale logger tests

* cleaning up stale logger tests
2020-09-14 00:06:48 -04:00
William Falcon 59d8472548
ref: slurm connector 1/n (#3476)
* ref: slurm connector 1/n

* ref: slurm connector 1/n

* ref: slurm connector 1/n

* ref: slurm connector 1/n
2020-09-12 11:07:15 -04:00
William Falcon cd16aa9854
ref: checkpoint connector methods 4/n (#3474)
* ref: checkpoint connector methods 4/n

* ref: checkpoint connector methods 4/n

* ref: checkpoint connector methods 4/n

* ref: checkpoint connector methods 4/n

* ref: checkpoint connector methods 4/n

* ref: checkpoint connector methods 4/n

* ref: checkpoint connector methods 4/n

* ref: checkpoint connector methods 4/n

* ref: checkpoint connector methods 4/n
2020-09-12 08:42:27 -04:00
William Falcon de99222834
ref: accelerator connector methods x/n (#3469)
* ref: accelerator connector methods x/n

* ref: accelerator connector methods x/n
2020-09-11 21:52:22 -04:00
ananthsub d1d48e2ea1
Fix trivial comparison in model checkpoint test (#3464)
We were comparing keys across the same checkpoint dict instead of ckpt_last vs ckpt_last_epoch

All other changes here are formatting
2020-09-11 20:50:46 +02:00
Adrian Wälchli bd5f53c519
implement fix and test (#3459) 2020-09-11 10:55:58 -04:00
Nicki Skafte 93cf6d0054
[Metrics] class based embedding similarity + tests (#3358)
* embedding similarity class + test

* fix tests

* fix pep8

* add docs

* noindex

* Update docs/source/metrics.rst

* Update pytorch_lightning/metrics/self_supervised.py

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* Update pytorch_lightning/metrics/self_supervised.py

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* suggestions

* changes to init

* move __all__

* fix imports

* Apply suggestions from code review

* assert typo

* change import

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Nicki Skafte <nugginea@gmail.com>
2020-09-11 12:11:50 +02:00
Cookie_thief d05d4c78e1
add num_classes argument to confusion matrix (#3450)
* add num_classes arg to confusion matrix

* update ConfusionMatrix test

* final update
2020-09-10 18:39:04 -04:00
Rohit Gupta a1ea681c47
Fix batch_outputs with optimizer frequencies (#3229)
* Fix batch_outputs with optimizers frequencies

* optimizers

* fix batch_outputs with optimizer frequencies

* clean test

* suggestion

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* chlog

* failing doctest

* failing doctest

* update doctest

* chlog

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2020-09-10 23:01:20 +02:00
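Optimizer frequencies (#3229) mean that only one optimizer is active per batch: assuming the cycling behavior Lightning documents for `configure_optimizers` with a `frequency` key, the first optimizer runs for `f1` batches, the next for `f2` batches, and the pattern repeats. Any per-optimizer bookkeeping such as batch outputs has to follow this schedule. The hypothetical helper below only computes which optimizer is active; it is not Lightning code.

```python
from itertools import accumulate
from typing import List


def optimizer_index_for_batch(batch_idx: int, frequencies: List[int]) -> int:
    """Return the index of the optimizer active at `batch_idx`."""
    if not frequencies or any(f <= 0 for f in frequencies):
        raise ValueError("frequencies must be a non-empty list of positive ints")
    pos = batch_idx % sum(frequencies)  # position within one full cycle
    for opt_idx, boundary in enumerate(accumulate(frequencies)):
        if pos < boundary:
            return opt_idx
    raise AssertionError("unreachable: the last boundary equals the cycle length")
```

With `frequencies=[2, 1]`, batches 0 to 5 map to optimizers `[0, 0, 1, 0, 0, 1]`.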
William Falcon 5abf7d9123
ref: move lr_finder (#3434)
* ref: move lr_finder

* ref: move lr_finder

* ref: move lr_finder

* ref: move lr_finder

* ref: move lr_finder

* ref: move lr_finder

* ref: move lr_finder
2020-09-09 22:12:27 -04:00
William Falcon b36c5e86d0
ref: trainer argparse 1/n (#3421)
* ref: trainer argparse 1/n

* ref: trainer argparse 1/n

* ref: trainer argparse 1/n

* ref: trainer argparse 1/n

* ref: trainer argparse 1/n

* ref: trainer argparse 1/n

* ref: trainer argparse 1/n

* ref: trainer argparse 1/n
2020-09-09 12:31:17 -04:00
Patrick Orlando 656c1af0df
Get experiment_id from MLFlow only once instead of each training loop (#3394)
* Get experiment_id from MLFlow only once instead of each training loop.

* Apply suggestions from code review

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* add test that asserts mlflow client is called to retrieve experiment id only once

* make pep8 happy

* logs

Co-authored-by: Patrick Orlando <patrick.orlando@rea-group.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Jirka Borovec <jirka@pytorchlightning.ai>
2020-09-09 11:38:26 +02:00
Adrian Wälchli e245065fbc
limit auto scaling batch size to the size of the training dataset (#3271)
* fix

* fix and test

* fix merge error

* test for max dataset size

* changelog

* update docs

* fix merge

* unused imports

* imports
2020-09-09 10:51:43 +02:00
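The change in #3271 caps automatic batch-size scaling at the size of the training dataset, since a batch larger than the dataset gains nothing. A minimal doubling search illustrating that cap is sketched below. It is a hypothetical helper, not Lightning's tuner: `fits` stands in for "run a few training steps at this batch size without running out of memory".

```python
from typing import Callable, Optional


def scale_batch_size_sketch(fits: Callable[[int], bool], init_size: int,
                            dataset_size: int, max_trials: int = 25) -> Optional[int]:
    """Double the batch size until it stops fitting or reaches the dataset size."""
    best = None
    size = min(init_size, dataset_size)
    for _ in range(max_trials):
        if not fits(size):
            break
        best = size
        if size >= dataset_size:
            break  # already as large as the dataset: stop searching
        size = min(size * 2, dataset_size)
    return best  # None if even the initial size does not fit
```

With a memory limit of 100 samples, a 1000-sample dataset settles at 64, while a 50-sample dataset stops at 50 instead of trying 64.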
William Falcon 8f6b115511
ref: added model connector (#3407)
* ref: added model connector

* ref: added model connector

* ref: added model connector
2020-09-09 00:24:20 -04:00
William Falcon 722c44c7d0
ref: device to gpus (#3405)
* ref: device to gpus

* ref: device to gpus

* ref: device to gpus

* ref: device to gpus

* ref: device to gpus
2020-09-08 22:14:17 -04:00
Travis Addair 091d37f968
Added check for apex AMP and unit tests for Horovod + AMP (#3404)
* Added check for apex AMP and unit tests for Horovod + AMP

* Changelog

* Fixed order of Horovod and Apex optimizer wrapping
2020-09-08 20:30:57 -04:00
William Falcon aaf26d70c4
ref: device parser (#3400)
* ref: train loop refactors part 2: 1/n

* ref: device parser

* ref: device parser

* ref: device parser

* ref: device parser

* ref: device parser

* ref: device parser

* ref: device parser

* ref: device parser
2020-09-08 18:46:42 -04:00
William Falcon ff5f099cb7
ref: remove inner train loop 1/n (#3397)
* ref: remove inner train loop 1/n

* ref: remove inner train loop 1/n
2020-09-08 12:05:00 -04:00
William Falcon d438ad8a8d
ensure calling test multiple times does not change results (#3391) 2020-09-07 22:25:12 -04:00
William Falcon b76d9e5dd5
Refa22 (#3388)
* ref: inner train loop (intermediate step) 20/n

* ref: inner train loop (intermediate step) 21/n

* ref: inner train loop (intermediate step) 21/n

* ref: inner train loop (intermediate step) 21/n

* ref: inner train loop (intermediate step) 21/n

* ref: inner train loop (intermediate step) 21/n
2020-09-07 16:45:31 -04:00
William Falcon 0b5b70d6c9
ref: inner train loop (intermediate step) 17/n (#3376)
* ref: inner train loop (intermediate step) 17/n

* ref: inner train loop (intermediate step) 17/n

* ref: inner train loop (intermediate step) 17/n
2020-09-07 09:31:42 -04:00
William Falcon 69e3f904df
ref: inner train loop (intermediate step) 16/n (#3375)
* ref: inner train loop (intermediate step) 16/n

* ref: inner train loop (intermediate step) 16/n

* ref: inner train loop (intermediate step) 16/n

* ref: inner train loop (intermediate step) 16/n

* ref: inner train loop (intermediate step) 16/n

* ref: inner train loop (intermediate step) 16/n
2020-09-06 21:57:20 -04:00
William Falcon 7073de8a95
ref: inner train loop (intermediate step) 14/n (#3373)
* ref: inner train loop (intermediate step) 14/n

* ref: inner train loop (intermediate step) 14/n
2020-09-06 19:55:18 -04:00
William Falcon 85421466ab
ref: inner train loop (intermediate step) 10/n (#3369) 2020-09-06 08:59:58 -04:00
Rohit Gupta 24809b0b26
Refactor GPUStatsMonitor to improve training speed (#3257)
* Refactor GPUMonitor to improve training speed

* added gpu ids to monitor

* update tests

* added deprecation warning

* pep

* fix test

* fix docs

* fix log_gpu_memory

* move deprecation check

* chlog

* Update CHANGELOG.md

* suggestions and fix

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-09-04 06:02:16 -04:00
Adrian Wälchli 48c22c8bad
update batch size in DataModule when auto scaling batch size (#3266)
* fix datamodule hasattr

* fix patch check

* fix setattr

* update docs

* revert patch fix

* changelog

* fix datamodule passed in as fit arg

* docs

* set datamodule batch size in lightning_setattr

* fix merge

* check with has_attr

* access datamodule via trainer

* pass fit args down to tuner

* docs

* fix typos in docs

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2020-09-03 22:07:49 +02:00
Adrian Wälchli 4ad5a78dce
to_torchscript method for LightningModule (#3258)
* script

* docs

* simple test

* move test

* fix doctest

* no grad context

* extend tests


test


test

* datamodule test

* clean up test

* docs

* name

* fix import

* update changelog

* fix import

* skip pytorch 1.3 in test

* update codeblock

* skip bugged 1.4

* typehints

* doctest not working on all pytorch versions

* rename TestGAN to prevent pytest interference

* add note about pytorch version

* fix torchscript version inconsistency in tests

* reset training state + tests

* update docstring

* Apply suggestions from code review

Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>

* update docstring, dict return

* add docs to index

* add link

* doc eval mode

* forward

* optional save to file path

* optional

* test torchscript device

* test save load with file path

* pep

* str

* Commit typing suggestion

Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>

* skip test if cuda not available

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
2020-09-03 20:24:44 +02:00
Rohit Gupta 4a22fca524
Changed LearningRateLogger to LearningRateMonitor (#3251)
* Change LearningRateLogger to LearningRateMonitor

* file rename

* docs

* add LearningRateLogger with deprecation warning

* deprecated LearningRateLogger

* move deprecation check

* chlog

Co-authored-by: Jirka Borovec <jirka@pytorchlightning.ai>
2020-09-03 18:17:15 +00:00
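The rename in #3251 keeps `LearningRateLogger` importable as a deprecated alias of `LearningRateMonitor`. A common way to do this is a thin subclass that warns on construction, sketched below. Both classes here are minimal stand-ins, not Lightning's callbacks; the warning text and the `logging_interval` argument are illustrative.

```python
import warnings


class LearningRateMonitor:
    """Stand-in for the renamed callback (not Lightning's implementation)."""

    def __init__(self, logging_interval=None):
        self.logging_interval = logging_interval


class LearningRateLogger(LearningRateMonitor):
    """Deprecated alias kept so existing imports keep working."""

    def __init__(self, *args, **kwargs):
        warnings.warn(
            "`LearningRateLogger` is deprecated, use `LearningRateMonitor` instead.",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller's line
        )
        super().__init__(*args, **kwargs)
```

Subclassing (rather than a bare name alias) is what allows the warning to fire while `isinstance` checks against the new class still pass.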
HT Liu d521c1b178
Fix: gather_all_tensors across GPUs in DDP (#3319)
* Fix: gather_all_tensors across GPUs in metrics

* add a test case for gather_all_tensors_ddp in #3253
2020-09-03 12:27:32 +02:00
William Falcon 0d90d53a81
ref: moving train loop to own object 2/n (intermediate steps) (#3313)
* ref: moving train loop to own object 2/n (intermediate steps)

* ref: moving train loop to own object 2/n (intermediate steps)
2020-09-01 21:06:40 -04:00
Nicki Skafte b66ce88f0d
[metrics] Renaming of precision recall metric (#3308)
* rename metrics

* update docs
2020-09-01 14:59:33 -04:00
William Falcon 7d57f8d407
ref: move prepare_data to data connector (#3307)
* ref: moved argparse code to central class

* ref: moved argparse code to central class

* ref: moved argparse code to central class
2020-09-01 14:59:09 -04:00
Lezwon Castelino 3910ad0330
bugfix/3185 transpose (#3252)
* change t() to transpose() as XLA devices do not support .t() on a 1-dim tensor

* detach tensor before copying

* Revert "detach tensor before copying"

This reverts commit 37cc7bbe

* changed dims

* added test_result_obj_on_tpu

* detach before copying

* replace torch.cat with sum
2020-09-01 09:17:52 -04:00
William Falcon 805ff37e8c
ref: .tune() (temporary) (#3293)
* ref: .tune()
2020-08-31 17:36:09 -04:00
William Falcon caf7893f27
ref: modular is_overridden (#3290)
* ref: modular is_overridden
2020-08-31 12:12:02 -04:00
Carlos Mocholí cc80749c7e
Parse Union[bool, str] arguments (#3235)
* Parse Union[bool, str] arguments

* Address review

Co-authored-by: William Falcon <waf2107@columbia.edu>
2020-08-29 10:39:42 -04:00
Jeremy Jordan a5d1176cf6
callback method for on_save_checkpoint (#2501)
* initial draft

* fix test

* Update pytorch_lightning/trainer/callback_hook.py

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* fix tests

* remove old code

* untested upgrade script

* document limitations

* clean up and add tests

* Update pytorch_lightning/trainer/training_io.py

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* reflect PR comments

* fix formatting

* Update docs/source/callbacks.rst

* clarify docs

* revert change for loading checkpoints

* small edits

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2020-08-28 16:50:52 +02:00
monney d5254ff9df
warn user when dropping unpicklable hparams (#2874)
* refactored clean_namespace

* Update try/except to handle pickling errors

* Consolidated clean_namespace. Added is_picklable

* PEP8

* Change warning to use rank_zero_warn. Added test to ensure proper hparam filtering

* Updated imports

* Corrected Test Case
2020-08-28 09:07:43 +02:00
Rohit Gupta 85cd558a3f
Follow-up to #2892 (#3202)
* Follow-up to #2892

* typo

* iterabledataset
2020-08-27 15:28:29 -04:00
Rohit Gupta f03943ee94
Fix GpuUsageLogger to work on different platforms (#3008)
* Fix GpuUsageLogger

* docstrings

* misconfigexception

* add basic tests

* skip doctest

* fix parameter and docstring

* rm cl

* skip doctest

* cleanup

* chlog

* add suggestions from review

* add test from suggestions

* fix import

* fix test

* rename GpuUsageLogger to GPUStatsMonitor

* doc fix

* Apply suggestions from code review

* update docs format

* update docs

* miss

* merge

* fix title formatting

* unindent

* punctuation

* simplify if statements

* fix test

* suggestions

* pep

* Update CHANGELOG.md

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* fix on_train_batch_*

* use AttributeDict

* usage

* rank zero

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* import

* minor changes

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Adrian Wälchli <adrian.waelchli@inf.unibe.ch>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-08-27 19:50:32 +02:00
William Falcon f3c63f7746
tests to ensure correct dataloader calls (#3221)
* tests to ensure correct dataloading interval and sequence
2020-08-27 09:49:46 -04:00
William Falcon a1705441a9
ref: remove _evaluate fx (#3197)
* remove _evaluate
2020-08-26 12:28:14 -04:00
Lezwon Castelino d9ea25590e
fix ONNX model save on GPU (#3145)
* added to(device)

* added test

* fix test on gpu

* Update pytorch_lightning/core/lightning.py

Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>

* Update pytorch_lightning/core/lightning.py

Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>

* remove multi gpu check

Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>

* updated message

* Update pytorch_lightning/core/lightning.py

Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>

* updated test

* onxx to onnx

* Update pytorch_lightning/core/lightning.py

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* Update tests/models/test_onnx.py

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* add no grad

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* add isinstance back

* chlog

* error if input_sample is not a Tensor

Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Jirka Borovec <jirka@pytorchlightning.ai>
2020-08-26 16:22:19 +00:00
Sordie 888340d17e
Fix RMSLE metric (#3188)
* fix rmsle

* Updated test to match rmsle fix

* Updated RMSLE example result to match functional

* chlog

* add randomized test

* fix pep8

Co-authored-by: Jirka Borovec <jirka@pytorchlightning.ai>
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
2020-08-26 08:02:53 -04:00
Nicki Skafte 17d8773106
New modular metric interface (#2528)
* new base structure

* missing packages

* updated interface

* revert some changes

* fixes

* add changelog

* fix bug

* added description

* test for picklable

* fixing test

* fixing test

* fix pickle issue

* add ReduceOp type hints back

* remove redundant module arg

* add save/load test

* add aggregate method

* text clarification

* fix doctest

* Apply suggestions from code review

* change test to results obj

* fix docs

* formatting

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* formatting

* pep

* Update CHANGELOG.md

* suggestions

* fix tests

* fix pep8

* fix tests

Co-authored-by: Nicki Skafte <nugginea@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2020-08-26 13:01:29 +02:00
William Falcon bda1400225
ref: restore on_eval_start hook (#3183)
* restore eval loop hook
2020-08-26 00:45:43 -04:00
William Falcon 2f6d82e0e6
ref: remove on_eval_start hook (#3176)
* remove on_eval_start hook
2020-08-25 22:28:00 -04:00
William Falcon 6068b29d29
ref: remove obscure forward call in eval + CPU backend ___step (#3123)
* remove obscure forward call in eval
2020-08-24 12:31:40 -04:00
Uladzislau Sazanovich 2d42ec008f
Make trainer.state a read-only property (#3109)
* Make trainer.state a read-only property

* Update states.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-08-24 16:49:33 +02:00
William Falcon 8d7ca5cd2c
ref: refactored gpu backend __step (#3120)
* refactored gpu backend __step
2020-08-24 09:22:05 -04:00
Jirka Borovec 45e7491dcc
drop packaging (#3105) 2020-08-24 05:28:56 -04:00
s-rog 7b054399c6
fix tb hparams logging (#2974)
* log_hyperparams add default metric

also adds scalar support

* fix typos and style

* another typo

* keep original logging implementation

* remove missed line

* fix capitalization

* add step to log_metrics for tests

* disable hp metric none (-1) logging

to pass tests

* initial arg implementation

* add step to log_metrics

* add hp_metric case to log test

* add docs 

and minor formatting

* fix broken else

* pep8 style

* edit tests

* Update pytorch_lightning/loggers/tensorboard.py

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* Update pytorch_lightning/loggers/tensorboard.py

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2020-08-24 06:57:04 +00:00
Rohit Gupta 34c88d127b
Fix log_graph in TensorBoardLogger (#3092) 2020-08-22 06:35:09 -04:00
Rohit Gupta 7cca3859a7
Fix: num_sanity_val_steps is clipped to limit_val_batches (#2917)
* Fix num_sanity_val_steps according to limit_val_batches

* fix test

* add num_sanity_batches

* pep

* update docstring in test

* add more test

* chlog

* update comments and docstring in test

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Adrian Wälchli <adrian.waelchli@inf.unibe.ch>
Co-authored-by: Ananya Harsh Jha <ananya@pytorchlightning.ai>
2020-08-21 20:11:31 +02:00
Jirka Borovec bcdb750976
changelogs clean (#3082)
* clean

* ver
2020-08-20 22:58:53 +00:00
Nathan Raw bab89b8d21
Add transfer_batch_to_device hook to DataModule (#3038)
* add dm to_device logic in trainer

* 🔥 remove unnecessary comment

* add to_device logic to datamodule

* add test

* updated docs

Co-authored-by: William Falcon <waf2107@columbia.edu>
2020-08-20 08:47:11 -04:00
Peter Yu cee5eaf659
flake8 fixes (#3064)
* flake8 fixes

* fix pep8

Co-authored-by: William Falcon <waf2107@columbia.edu>
2020-08-20 07:45:22 -04:00
Peter Yu 88886ace72
More robust way of collecting init argument names for LightningModules (#3066)
When a LightningModule inherits from a class that implements `__new__()`, such as `typing.Generic`, `inspect.signature(cls)` short-circuits and returns the signature of `__new__()` instead of `__init__()`. We therefore need to be more specific and call the inspection directly on the `__init__` function.
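A minimal sketch of this failure mode and the fix, using only the Python standard library. `Model`, `naive_arg_names`, and `init_arg_names` are hypothetical names for illustration, not the code from this PR. How `inspect.signature` resolves an inherited `__new__` depends on the Python version, so the sketch defines `__new__` and `__init__` on the same class, which reproduces the short-circuit consistently.

```python
import inspect
from typing import Any, List


class Model:
    """Hypothetical class whose catch-all __new__ hides the __init__ signature."""

    def __new__(cls, *args: Any, **kwargs: Any) -> "Model":
        # Ignore constructor args here; __init__ consumes them.
        return super().__new__(cls)

    def __init__(self, hidden_dim: int, lr: float = 1e-3) -> None:
        self.hidden_dim = hidden_dim
        self.lr = lr


def naive_arg_names(cls: type) -> List[str]:
    # Inspecting the class picks up __new__ and reports (*args, **kwargs).
    return list(inspect.signature(cls).parameters)


def init_arg_names(cls: type) -> List[str]:
    # Inspecting __init__ directly recovers the real constructor arguments.
    params = inspect.signature(cls.__init__).parameters.values()
    return [
        p.name
        for p in params
        if p.name != "self"
        and p.kind not in (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD)
    ]


print(naive_arg_names(Model))  # ['args', 'kwargs']
print(init_arg_names(Model))   # ['hidden_dim', 'lr']
```

Inspecting `cls.__init__` includes `self`, so the helper drops it; skipping `*args`/`**kwargs` catch-alls as well is a design choice assumed for this sketch.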
2020-08-20 07:19:11 -04:00
William Falcon 3453bba898
re-enabled naming metrics in ckpt name (#3060)
* re-enabled naming metrics in ckpt name
2020-08-19 20:34:09 -04:00