Commit Graph

121 Commits

Author SHA1 Message Date
William Falcon 51de6802ed
added warning when changing monitor and using results obj (#3014)
* added warning when changing monitor and using results obj
2020-08-17 10:29:28 -04:00
Jeff Yang 73ebd1066d
Fix accumulate_grad_batches for last batch (#2853)
* first attempt

* update changelog

* fix pep8 and tests

* Apply suggestions from code review

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* added new tests

* fixed tests

* Apply suggestions from code review

* used num_training_batches

* fixed pep8

* fixed with is_last_batch suggested by @awaelchli

* fixed with num_training_batches

* cleanup

* fix test and update docs

* fixed for alignment, update docs

* minor changes

* update doc

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2020-08-15 15:06:37 -04:00
Nicki Skafte 6a051c887f
Add docs for GpuUsageLogger (#2945)
* add docs

* fix spelling
2020-08-13 18:58:14 -04:00
William Falcon 6c5a0a172f
Resultd (#2947)
* updated docs
2020-08-13 09:58:05 -04:00
Gerardo Roa Dabike f6a3d8fd8d
GPU Usage Logger (#2932)
* GPU utilisation Callback

* Fixing style

* Fixing CodeFactor: partial executable path

* Fix a misspelling in the Class name
2020-08-12 15:09:34 -04:00
Brendan Fahy 56396abe98
fix checkpointing to remote file paths (#2925)
2020-08-12 06:31:17 -04:00
William Falcon d13e5c9e53
document lightiningmodule better (#2920)
* updated docs
2020-08-11 19:39:43 -04:00
Brendan Fahy 97e6f35b34
fix missing return statement. Do not normalize remote paths (#2894)
* fix missing return statement. Do not normalize remote paths

* Update pytorch_lightning/utilities/cloud_io.py

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Add some documentation that we now support s3 and hdfs paths

* suggestion from code review

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2020-08-09 22:38:43 +00:00
Caldera 6c18fd9a24
Update lr_logger.py (#2847)
* Update lr_logger.py

when logging the learning rate, we should offer a choice of logging intervals, including 'step' and 'epoch'

* Update lr_logger.py

add some type annotations and docstrings

* Update lr_logger.py

fixed a bug where `on_train_batch_start()` is never triggered; use `on_batch_start()` instead. Added an `interval` argument so that learning rates can be recorded with respect to `global_step` or `current_epoch`.

* Update lr_logger.py

restore _extract_lr()

* suggestion

* Update lr_logger.py

modify _extract_lr(); it no longer needs the `interval` parameter.

* Update test_lr_logger.py

SkafteNicki's suggestion

* log_interval now supports `None`, `step`, `epoch`

* change `log_interval` to `logging_interval`

* Update test_lr_logger.py

* Update lr_logger.py

* put types check into `on_train_start()`

* cleanup

* docstring typos

* minor changes from suggestions

Co-authored-by: Jirka Borovec <jirka@pytorchlightning.ai>
Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2020-08-09 16:30:43 +00:00
Brendan Fahy 6e77181ec7
Squashed commit of the following: (#2164)
commit 29fb0506cd38a15c359e369cc8bc4435916b0c78
Author: Brendan Fahy <bmfahy@gmail.com>
Date:   Sat Aug 8 19:35:30 2020 +0000

    fix checking for version for docs to build

commit 467fd640db02275972c7111af031c86bb59333e9
Author: Brendan Fahy <bmfahy@gmail.com>
Date:   Sat Aug 8 18:56:05 2020 +0000

    remove no local test

commit a7cc9f88de00feec1a5406874d05313c42bd004c
Author: Brendan Fahy <bmfahy@gmail.com>
Date:   Sat Aug 8 18:46:44 2020 +0000

    fix

commit 3fdbb729da79ae9348c83410a138666bad467951
Author: Brendan Fahy <bmfahy@gmail.com>
Date:   Sat Aug 8 18:23:30 2020 +0000

    revert requirements

commit 9b8686bd83e2bc243cf329e26f1c667c6949cf67
Author: Brendan Fahy <bmfahy@gmail.com>
Date:   Sat Aug 8 18:16:42 2020 +0000

    make it a fixture

commit eec74953d24c8b25268d3b6dde3cc4affdd5cb8f
Author: Brendan Fahy <bmfahy@gmail.com>
Date:   Sat Aug 8 18:01:32 2020 +0000

    fix up the testing

commit 896d94a0e60083d52c81db2a036b7f1e015cad11
Author: Brendan Fahy <bmfahy@gmail.com>
Date:   Sat Aug 8 17:47:28 2020 +0000

    fix some tests

commit 6d22bde19767bf2b71dfd44839b01efdf6888f83
Merge: 6175d4e2 6ebe0d72
Author: Brendan Fahy <bmfahy@gmail.com>
Date:   Sat Aug 8 10:20:47 2020 +0000

    Merge remote-tracking branch 'origin/master' into tb_use_gfile

commit 6175d4e26b15a43c412c26d501762cd0b570616a
Author: Brendan Fahy <bmfahy@gmail.com>
Date:   Fri Aug 7 10:16:36 2020 +0000

    Use tensorboard.compat.gfile to support remote writing
2020-08-09 06:08:44 -04:00
Krzysztof Woś 6ebe0d7266
Fix docstring (#2884)
* Fix docstring

"mean absolute loss" rather than "root mean absolute loss"

* minor docstring fix

Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>
2020-08-08 11:58:18 +00:00
Adrian Wälchli f798cffd02
save last model after saving top_k when save_last=True (#2881)
* save_last should be last

* changelog

* seed, docs

* retrigger ci

* compare filenames

* move constants

* fix test

* epoch, global step

* improve test
2020-08-08 06:02:43 -04:00
Jirka Borovec b7d72706c3
clean imports (#2867)
* clean imports

* miss
2020-08-08 00:33:51 +02:00
William Falcon f82d7feb6c
updated hooks (#2850)
* modified hooks
2020-08-07 09:29:57 -04:00
William Falcon b507c42c47
clarify batch hooks (#2842)
* modified hook
2020-08-05 20:01:30 -04:00
siahuat0727 38d6b2598e
Fix docs typo (#2778)
2020-08-04 17:46:35 +02:00
William Falcon 7da7d2e428
callback docs (#2794)
* added logging docs
2020-08-01 22:56:34 -04:00
siahuat0727 b9381c3258
Fix docs typo (#2747)
2020-07-29 07:11:49 -04:00
Rohit Gupta 84c507c4df
Fix max_batches with fast_dev_run. (#2581)
* Fix fast_dev_run to run for all val_dataloaders

* fast_dev_run check

* changelog

* explicit

* limit_batches with fast_dev_run in init

* add test

* whitespace and comment fix

* comment and assertion

* added tests

* update rtol

* Revert "update rtol"

This reverts commit 4320329540.

* added tests

Co-authored-by: William Falcon <waf2107@columbia.edu>
2020-07-27 17:56:55 -04:00
Adrian Wälchli d03953260d
Fix weights_save_path when logger is used + simplify path handling + better docs (#2681)
* fix weights_save path and drop ckpt_path

* add tests

* unused import

* update docs

* changelog

* pep8

* fix horovod test

* make backward compatible

* perform same test for all loggers

* fix for when logger=False and weights_save_path is set

* update changelog

* update docs

* update tests

* do not set save dir dynamically

* remove duplicate test

* remove duplicated tests

* update tests

* remove remaining ckpt_path references

* move defaults to init as suggested by @Borda

* test deprecation
2020-07-27 12:53:11 -04:00
Adrian Wälchli 1e68968ed7
support num_sanity_val_steps=-1 (#2246)
* support sanity_val_step=-1

* fix list size

* simplification

* simplify

* add test for num_sanity_val_steps=-1

* update test

* update docs

* extend tests to multiple dataloaders

* changelog

* Update tests/trainer/test_trainer.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* improve test

* refactor the sanity check decision

* fix merge

* Update trainer.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: William Falcon <waf2107@columbia.edu>
2020-07-23 07:07:03 -04:00
William Falcon 62ce00f96c
EvalResult support for val loop (PR 3/5) (#2651)
* add EvalResult to support to val/test loops
2020-07-22 13:53:10 -04:00
William Falcon 6d10ac2ac8
Structured results (train loop only. val loop separate PR) (PR 2/5) (#2615)
* r

* patched optimizer closure with sr

* added train step structured result

* added autoreduce for train step

* added auto reduce on train

* added hooks

* finished tests for structured results on train epoch

* cache

* finished tests for structured results on train epoch

* Update pytorch_lightning/callbacks/early_stopping.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update pytorch_lightning/callbacks/model_checkpoint.py

* Update pytorch_lightning/core/step_result.py

* finished tests for structured results on train epoch

* Apply suggestions from code review

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>

* simple

* finished tests for structured results on train epoch

* simple

* revert

* finished tests for structured results on train epoch

* Update tests/base/deterministic_model.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* finished tests for structured results on train epoch

* docstring typos

* finished tests for structured results on train epoch

* Update pytorch_lightning/core/step_result.py

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Update pytorch_lightning/overrides/data_parallel.py

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

Co-authored-by: Jirka <jirka@pytorchlightning.ai>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
2020-07-20 19:00:20 -04:00
Adrian Wälchli f16b4cfc52
save_dir fix for MLflowLogger + save_dir tests for others (#2502)
* mlflow rework

* logger save_dir

* folder

* mlflow

* simplify

* fix test

* add a test for file dir contents

* new line

* changelog

* docs

* Update CHANGELOG.md

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* test for comet logger

* improve mlflow checkpoint test

* prevent comet logger error on pytest exit

* test tensorboard save dir structure

* wandb save dir test

* skip test on windows

* add mlflow to pickle tests

* wandb

* code factor

* remove unused imports

* remove unused setter

* wandb mock

* wip mock

* wandb tests with mocking

* clean up

* comments

* include wandblogger in test

* clean up

* missing argument

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-07-09 07:15:41 -04:00
William Falcon e5a979990e
Hang (#2488)
* added early stop tpu test
2020-07-03 15:16:45 -04:00
William Falcon 020c332ae9
Clean up (#2467)
* Fixes #2455

* added early stop tpu test
2020-07-03 00:38:29 -04:00
Adrian Wälchli 145670f893
fix logging on rank 0 only (#2425)
* fix and test for ddp block logging rank > 0

* rename

* use the dummy logger

* dummy logger test

* set the logger in  model

* decorator for rank zero experiment

* simplify check

* simplify

* fix problem with None in checkpoint path

* revert configure logger

* unused import

* offline

* try rank 0 decorator in checkpoint

* try fix test

* imgs

* add asserts to make sure log zero only saves checkpoints

* fix tpu tests

Co-authored-by: William Falcon <waf2107@columbia.edu>
2020-06-30 18:09:16 -04:00
Oliver Neumann 1a54ed6ad9
Checking ipywidgets is installed for ensure tqdm working (#2417)
* Adding importing ipywidgets before importing tqdm.auto to make sure ipywidgets is installed.

* Updated CHANGELOG.md

* Updated ipywidgets importing checks to @awaelchli comments.

Co-authored-by: William Falcon <waf2107@columbia.edu>
2020-06-30 16:59:35 -04:00
Adrian Wälchli 25ee51bc57
Continue Jeremy's early stopping PR #1504 (#2391)
* add state_dict for early stopping

* move best attr after monitor_op defined

* improve early stopping and model checkpoint callbacks

* fix formatting

* fix attr init order

* clean up setting of default_root_dir attr

* logger needs default root dir set first

* reorg trainer init

* remove direct references to checkpoint callback

* more fixes

* more bugfixes

* run callbacks at epoch end

* update tests to use on epoch end

* PR cleanup

* address failing tests

* refactor for homogeneity

* fix merge conflict

* separate tests

* tests for early stopping bug regressions

* small fixes

* revert model checkpoint change

* typo fix

* fix tests

* update train loop

* cannot pass an int as default_save_path

* refactor log message

* fix test case

* appease the linter

* fix some doctests

* move config to callback

* fixes from rebase

* chlog

* docs

* reformat

* formatting

* fix

* fixes from rebase

* add new test for patience

* Update pytorch_lightning/callbacks/model_checkpoint.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update tests/callbacks/test_early_stopping.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* fix formatting

* remove enable_early_stop attribute

* fix test with new epoch indexing

* fix progress bar totals

* fix off by one error (see #2289) epoch starts at 0 now

* added missing imports

* fix hpc_save folderpath

* fix formatting

* fix tests

* small fixes from a rebase

* fix

* tmpdir

* wandb

* fix merge conflict

* add back evaluation after training

* test_resume_early_stopping_from_checkpoint TODO

* undo the horovod check

* update changelog

* remove a duplicate test from merge error

* try fix dp_resume test

* add the logger fix from master

* try remove default_root_dir

* try mocking numpy

* try import numpy in docs test

* fix wandb test

* pep 8 fix

* skip if no amp

* dont mock when doctesting

* install extra

* fix the resume ES test

* undo conf.py changes

* revert remove comet pickle from test

* Update CHANGELOG.md

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update weights_loading.rst

* renamed flag

* revert the None check in logger experiment name/version

* add the old comments

* _experiment

* test checkpointing on DDP

* skip the ddp test on windows

* cloudpickle

* renamed flag

* parentheses for clarity

* apply suggestion max epochs

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

Co-authored-by: Jeremy Jordan <jtjordan@ncsu.edu>
Co-authored-by: Jirka <jirka@pytorchlightning.ai>
Co-authored-by: Jeremy Jordan <13970565+jeremyjordan@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: William Falcon <waf2107@columbia.edu>
2020-06-28 21:36:46 -04:00
Jirka Borovec f278ac42c8
Revert/Fix: epoch indexing from 1, to be from 0 (#2289)
* Revert "deprecated: epoch indexing from 1 (#2206)"

This reverts commit f94b919b

* chlog

* grad index

* Apply suggestions from code review

* tests

* fix

* test
2020-06-19 23:39:53 -04:00
Paweł Biernat 3256fe4e5a
Update progress.py (#2268)
Fixes a minor bug introduced in #2213
2020-06-19 15:47:39 -04:00
Jirka Borovec a2d3ee80ad
final cleanup for v0.8.0 (#2181)
* final clean for v0.8.0

* chlog

* date

* rename stage

* date

* missing
2020-06-18 07:21:44 -04:00
William Falcon 34816e9ec4
adds setup+teardown hook (#2229)
* allow regression metrics to import
2020-06-17 19:49:58 -04:00
William Falcon 2411c3be70
replace train_percent_check with limit_train_batches (#2220)
* drop train_percent_check

* chlog

* deprecated

* tests

* Apply suggestions from code review

* tests

* hydra support

* tests

* hydra support

* tests

* typo

* Update test_dataloaders.py

* docs

Co-authored-by: Jirka <jirka@pytorchlightning.ai>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-06-17 13:42:28 -04:00
William Falcon 04c794ca72
[WIP] Rename overfit_pct to overfit_batches (and fix) and val_percent_check and test_percent_check (and fix) (#2213)
* fixed percent check for val/test

* overfit_pct now uses train loaders for val and test and does not shuffle

* add on fit_start on fit_end hooks

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-06-17 08:03:28 -04:00
William Falcon e1f238a097
add on fit_start on fit_end hooks (#2217)
* add on fit_start on fit_end hooks
2020-06-17 07:37:16 -04:00
Jirka Borovec f94b919b96
deprecated: epoch indexing from 1 (#2206)
* epoch indexing from 1

* chlog

* fix tests

* self.min_epochs
2020-06-16 06:33:41 -04:00
Jirka Borovec 8870a84aa8
reduce test warnings (#2202)
* reduce test warnings

* Update test_trainer.py

Co-authored-by: William Falcon <waf2107@columbia.edu>
2020-06-15 23:06:17 -04:00
Simon-Martin Schröder fd1693e289
Handle KeyboardInterrupt during training (#2134)
* Handle KeyboardInterrupt during training

Fixes #2079.

* chlog

* Fix whitespace

* Update callback_hook.py

* Update base.py

* Update training_loop.py

* Update test_trainer.py

* Update CHANGELOG.md

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Update CHANGELOG.md

* on_keyboard_interrupt

Co-authored-by: Jirka <jirka@pytorchlightning.ai>
Co-authored-by: William Falcon <waf2107@columbia.edu>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2020-06-15 12:35:26 +02:00
William Falcon 5fd01b0e68
Finish Ananthsub patch 1 (enable prepare_data from correct processes). clarify local vs global rank (#2166)
* [trainer] Call prepare_data once per node in DDP/DDP2 training

* refactored DDP routes

* renamed proc_rank to local_rank

* spawn message

* fixes

* Update trainer.py

Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
2020-06-13 12:00:14 -04:00
Jan Sellner ea6b350b32
Minor warning message fix (#2173) 2020-06-13 10:02:15 -04:00
Nima Sarang 6d74c8484d
get fullpath before splitting (#2153) 2020-06-12 21:25:08 -04:00
William Falcon 479ab49d03
temporarily fixes early stopping bug (#2119)
* fixes early stopping bug

* fix docs

* added test
2020-06-08 19:28:26 -04:00
William Falcon 0be530a427
Revert "Fixes EarlyStopping With Precision=16 (#1996)" (#2032)
This reverts commit bf39cb26c5.
2020-05-31 15:20:18 -04:00
authman bf39cb26c5
Fixes EarlyStopping With Precision=16 (#1996)
* Patch for issue 1815, which will allow EarlyStopping to work on precision=16

* Added a whitespace to the end of the line so CICD can rerun. No reason for the latest macos test to have been cancelled.

* Format.
2020-05-31 15:02:19 -04:00
Fabio Natanael Kepler 8b9b923ca8
Keep track of the best model's path saved by ModelCheckpoint (#1799)
* Add an additional attribute to ModelCheckpoint to keep track of the best model's path

Currently, only the best metric value is directly tracked. This new attribute will help in use cases where the trained model needs to be used or tracked right after training.

* Add small description and usage example to docs

* Fix PEP8 issues

* Fix doctest example

* Fix expected output in doctest

* Apply suggestions from code review

* Show example as code block instead of doctest

* Apply suggestions from code review

* Update CHANGELOG.md

* Rename `ModelCheckpoint.best` to `ModelCheckpoint.best_model_score`

Also rename `ModelCheckpoint.best_model` (added in this PR) to `ModelCheckpoint.best_model_path`, for consistency, and `kth_best_model` to `kth_best_model_path`.

* Update pytorch_lightning/trainer/training_io.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Add warning when loading checkpoint from an old version
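
The renames listed above, together with the warning for old checkpoints, could be handled along these lines. This is only a sketch under stated assumptions: `RENAMED`, `upgrade_state`, and the flat dictionary layout are hypothetical and are not taken from `training_io.py`; only the three old and new names come from the commit message.

```python
import warnings

# Old -> new attribute names, as listed in the commit message.
RENAMED = {
    "best": "best_model_score",
    "best_model": "best_model_path",
    "kth_best_model": "kth_best_model_path",
}


def upgrade_state(state):
    """Return a copy of `state` with old names replaced (hypothetical helper)."""
    upgraded = {}
    for key, value in state.items():
        if key in RENAMED:
            # Tell the user the state was written under an old name.
            warnings.warn(
                f"{key!r} is an old name; use {RENAMED[key]!r}",
                DeprecationWarning,
            )
            key = RENAMED[key]
        upgraded[key] = value
    return upgraded
```

Keys that are not in the mapping pass through unchanged, so a state saved under the new names is returned as an equal copy.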

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-05-31 08:47:13 -04:00
Ivan Nazarov 7c19c373ac
LearningRateLogger in multi-scheduler setting (#1944)
* fixed undesired behaviour due to dict.fromkeys

* a test for log length consistency

* runtime-warn if no schedulers are configured

* chlog

* move
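
For context on the first bullet: a well-known `dict.fromkeys` pitfall is that a mutable default value is shared by every key. The commit message does not say exactly which behaviour was at fault, so the sketch below only assumes this shared-list case; the scheduler names are made up for illustration.

```python
# Hypothetical scheduler names; not taken from the LearningRateLogger source.
names = ["lr-SGD", "lr-Adam"]

# Pitfall: every key points at the *same* list object.
shared = dict.fromkeys(names, [])
shared["lr-SGD"].append(0.1)

# One way around it: build a fresh list per key with a dict comprehension.
separate = {name: [] for name in names}
separate["lr-SGD"].append(0.1)

print(shared)    # both keys show the appended value
print(separate)  # only "lr-SGD" shows it
```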

Co-authored-by: Jirka <jirka@pytorchlightning.ai>
2020-05-27 22:44:46 -04:00
Federico Baldassarre 65b4352930
early stopping checks on_validation_end (#1458)
* Fixes PyTorchLightning/pytorch-lightning#490

`EarlyStopping` should check the metric of interest `on_validation_end` rather than `on_epoch_end`. 
In a normal scenario, this does not cause a problem, but in combination with `check_val_every_n_epoch>1` in the `Trainer` it results in a warning or in a `RuntimeError` depending on `strict`.

* Highlighted that ES callback runs on val epochs in docstring

* Updated EarlyStopping in rst doc

* Update early_stopping.py

* Update early_stopping.rst

* Apply suggestions from code review

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Update docs/source/early_stopping.rst

* fix doctest indentation warning

* Train loop calls early_stop.on_validation_end

* chlog
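
The problem described in the first bullet can be illustrated without Lightning. This is a minimal sketch, not the library's implementation: `ToyEarlyStopping`, `run`, and the metric values are hypothetical, and only the idea (the monitored metric exists only on epochs where validation ran) comes from the commit message.

```python
class ToyEarlyStopping:
    """Hypothetical stand-in for an early-stopping callback."""

    def __init__(self, monitor="val_loss", strict=True):
        self.monitor = monitor
        self.strict = strict
        self.best = float("inf")

    def check(self, metrics):
        # The monitored metric is only present after a validation run.
        if self.monitor not in metrics:
            if self.strict:
                raise RuntimeError(f"metric {self.monitor!r} not available")
            return
        self.best = min(self.best, metrics[self.monitor])


def run(num_epochs, check_val_every_n_epoch, hook):
    """Simulate a training loop; `hook` picks where the callback is checked."""
    es = ToyEarlyStopping()
    for epoch in range(num_epochs):
        metrics = {}
        if (epoch + 1) % check_val_every_n_epoch == 0:
            metrics["val_loss"] = 1.0 / (epoch + 1)  # made-up values
            if hook == "on_validation_end":
                es.check(metrics)
        if hook == "on_epoch_end":
            es.check(metrics)
    return es.best
```

In this sketch, `run(4, 2, "on_epoch_end")` raises on the first epoch because no validation has run yet, while `run(4, 2, "on_validation_end")` only checks the metric when it exists.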

Co-authored-by: William Falcon <waf2107@columbia.edu>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Jirka <jirka@pytorchlightning.ai>
2020-05-25 17:33:00 +00:00
Lucas Vazquez 112dd5c4f6
Adds the option of saving the last model on checkpoint (#1908)
* saves model every epoch

* implement test for save_last

* Update CHANGELOG.md

* changes test description

Co-authored-by: Jeremy Jordan <13970565+jeremyjordan@users.noreply.github.com>
2020-05-25 07:47:44 -04:00
Ashraful Islam e0a5aee3a3
fix progress bar postfix order (#1874) 2020-05-18 20:33:51 -04:00