Commit Graph

226 Commits

Author SHA1 Message Date
William Falcon 0d90d53a81
ref: moving train loop to own object 2/n (intermediate steps) (#3313)
* ref: moving train loop to own object 2/n (intermediate steps)
2020-09-01 21:06:40 -04:00
William Falcon 805ff37e8c
ref: .tune() (temporary) (#3293)
* ref: .tune()
2020-08-31 17:36:09 -04:00
Carlos Mocholí cc80749c7e
Parse Union[bool, str] arguments (#3235)
* Parse Union[bool, str] arguments

* Address review

Co-authored-by: William Falcon <waf2107@columbia.edu>
2020-08-29 10:39:42 -04:00
Rohit Gupta 85cd558a3f
Follow up of #2892 (#3202)
* Follow up of #2892

* typo

* iterabledataset
2020-08-27 15:28:29 -04:00
William Falcon f3c63f7746
tests to ensure correct dataloader calls (#3221)
* tests to ensure correct dataloading interval and sequence
2020-08-27 09:49:46 -04:00
William Falcon bda1400225
ref: restore on_eval_start hook (#3183)
* restore eval loop hook
2020-08-26 00:45:43 -04:00
William Falcon 2f6d82e0e6
ref: remove on_eval_start hook (#3176)
* remove on_eval_start hook
2020-08-25 22:28:00 -04:00
William Falcon 6068b29d29
ref: remove obscure forward call in eval + CPU backend ___step (#3123)
* remove obscure forward call in eval
2020-08-24 12:31:40 -04:00
Uladzislau Sazanovich 2d42ec008f
Make trainer.state a read-only property (#3109)
* Make trainer.state a read-only property

* Update states.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-08-24 16:49:33 +02:00
Jirka Borovec 45e7491dcc
drop packaging (#3105)
2020-08-24 05:28:56 -04:00
Rohit Gupta 7cca3859a7
Fix num_sanity_val_steps is clipped to limit_val_batches (#2917)
* Fix num_sanity_val_steps according to limit_val_steps

* fix test

* add num_sanity_batches

* pep

* update docstring in test

* add more test

* chlog

* update comments and docstring in test

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Adrian Wälchli <adrian.waelchli@inf.unibe.ch>
Co-authored-by: Ananya Harsh Jha <ananya@pytorchlightning.ai>
2020-08-21 20:11:31 +02:00
William Falcon 3453bba898
re-enabled naming metrics in ckpt name (#3060)
* re-enabled naming metrics in ckpt name
2020-08-19 20:34:09 -04:00
Adrian Wälchli 7b917de946
fix setting batch_size attribute in batch_size finder (finishing PR #2523) (#3043)
* lightning attr fix

* revert refactor

* create test

* separate test

* changelog update

* tests

* revert

* Update pytorch_lightning/trainer/training_tricks.py

Co-authored-by: William Falcon <waf2107@columbia.edu>
2020-08-19 19:01:55 -04:00
Adrian Wälchli 89a5d8fee9
fix auto scale batch size not working with precision=16 (#3045)
* add test

* test

* test

* add fix

* changelog

* check batch size changed
2020-08-19 20:41:33 +00:00
William Falcon 8315a65d0a
fix result obj dp auto reduce (#3013)
* fix result for dp

* added warning when changing monitor and using results obj
2020-08-17 10:29:39 -04:00
William Falcon 51de6802ed
added warning when changing monitor and using results obj (#3014)
* added warning when changing monitor and using results obj
2020-08-17 10:29:28 -04:00
William Falcon 465d4ffd2c
added lr scheduler test using dev debugger (#3004)
* added lr scheduler test using dev debugger
2020-08-16 11:37:38 -04:00
William Falcon d702d4d393
removed callback metrics from test results obj (#2994)
* removed callback metrics from test results obj
2020-08-15 21:45:41 -04:00
Jeff Yang 73ebd1066d
Fix accumulate_grad_batches for last batch (#2853)
* first attempt

* update changelog

* fix pep8 and tests

* Apply suggestions from code review

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* added new tests

* fixed tests

* Apply suggestions from code review

* used num_training_batches

* fixed pep8

* fixed with is_last_batch suggested by @awaelchli

* fixed with num_training_batches

* fixed with num_training_batches

* cleanup

* fix test and update docs

* fixed for alignment, update docs

* minor changes

* update doc

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2020-08-15 15:06:37 -04:00
William Falcon b8371fa56c
Fixes #2972 #2946 (#2986)
* add val step arg to metrics

* add step metrics
2020-08-15 08:36:00 -04:00
shijianjian 18d31a3b63
Added strict=False for load_from_checkpoint (#2819)
* Added strict=False and hparams_file accepts dict

* Apply suggestions from code review

Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>

* Type check fix

* Added tests

* Linting & test fix

* Removed redundant code & test

* Apply suggestions from code review

* tests

* tests

* chlog

* Update tests/models/test_restore.py

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* update test comments

* Added docstring for the strict attribute

* Added supplementary tests

* Update saving.py

* Apply suggestions from code review

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* pep8, removed extra func

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: Jirka Borovec <jirka@pytorchlightning.ai>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: ananyahjha93 <ananya@pytorchlightning.ai>
2020-08-13 16:25:43 -04:00
Jirka Borovec 4354690e55
add apex test (#2921)
* add apex test

* rename

* level

* events

* wrap

* evt

* miss

* apex

* Update tests/models/test_amp.py

Co-authored-by: William Falcon <waf2107@columbia.edu>

* notes

* notes

Co-authored-by: William Falcon <waf2107@columbia.edu>
2020-08-13 10:03:13 -04:00
William Falcon 28f79d9f7a
Mapkeys (#2900)
* added a map dict
2020-08-09 18:50:39 -04:00
Uladzislau Sazanovich e9846dd758
Add tracking of basic states in Trainer [wip - to-be-merged after v0.9] (#2541)
* Add initial tracking of states in Trainer.

* Add INTERRUPTED state, improve tests, move state switching from callback to a trainer.

* Move part of a trainer state switching to a decorator.

* Add documentation.

* Fix docs, rename state enum, restore state to previous on exit if None, add tests for decorator only.

* Fix callback typing.

Co-authored-by: William Falcon <waf2107@columbia.edu>
2020-08-09 06:24:09 -04:00
William Falcon 256059a1d0
tracks all outputs including TBPTT and multiple optimizers (#2890)
* pl 0.9 update
2020-08-09 06:00:15 -04:00
Rohit Gupta 4d0406ec8b
deepcopy model state_dict in tests (#2887)
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2020-08-08 16:13:06 +00:00
Jirka Borovec f8c058215f
simplify tests & cleaning (#2588)
* simplify

* tmpdir

* revert

* clean

* accel

* types

* test

* edit test acc

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Update test acc

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2020-08-07 23:22:05 +02:00
William Falcon f82d7feb6c
updated hooks (#2850)
* modified hooks
2020-08-07 09:29:57 -04:00
Rohit Gupta a642349228
Support limit_mode_batches (int) for infinite dataloader (#2840)
* Support limit_mode_batches(int) for infinite dataloader

* flake8

* revert and update

* add and update tests

* pep8

* chlog

* Update CHANGELOG.md

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Add suggestions by @awaelchli

* docs

* Apply suggestions from code review

Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>

* Apply suggestions from code review

* fix

* max

* check

* add and update tests

* max

* check

* chlog

* tests

* update exception message

* Apply suggestions from code review

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
Co-authored-by: Jirka Borovec <jirka@pytorchlightning.ai>
2020-08-07 13:02:36 +02:00
Nicki Skafte 9a402461da
Bugfix: Lr finder and hparams compatibility (#2821)
* fix hparams lr finder bug

* add tests for new functions

* better tests

* fix codefactor

* fix styling

* fix tests

* fix codefactor

* Apply suggestions from code review

* modified hook

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: William Falcon <waf2107@columbia.edu>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2020-08-07 00:34:48 +02:00
Jirka Borovec ed3ee982b3
clean tests imports (#2834)
2020-08-06 16:58:51 +02:00
s-rog 9b997c8616
add test for none checkpoint in ddp_spawn (#2845)
* add test for none checkpoint in ddp_spawn

* fix code style

* make sure checkpoint_callback is none

* Fix tests

Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
2020-08-06 07:11:43 -04:00
William Falcon b507c42c47
clarify batch hooks (#2842)
* modified hook
2020-08-05 20:01:30 -04:00
William Falcon 5d0f0325d8
Revert "Support limit_mode_batches (int) for infinite dataloader" (#2839)
* Revert "Support limit_mode_batches (int) for infinite dataloader (#2787)"

This reverts commit de9c9f0864.

* Update training_tricks.py
2020-08-05 15:57:26 -04:00
Rohit Gupta de9c9f0864
Support limit_mode_batches (int) for infinite dataloader (#2787)
* Support limit_mode_batches(int) for infinite dataloader

* flake8

* revert and update

* add and update tests

* pep8

* chlog

* Update CHANGELOG.md

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Add suggestions by @awaelchli

* docs

* Apply suggestions from code review

Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>

* Apply suggestions from code review

* fix

* max

* check

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
Co-authored-by: Jirka Borovec <jirka@pytorchlightning.ai>
2020-08-05 17:04:49 +00:00
Rohit Gupta 8baec1a191
Fix shuffle for distributed sampler (#2789)
* Fix shuffle for distributed sampler

* add test

* test

* chlog

* update test

* assertions via callback

* define callback outside for pickling

* skip ddp test on windows

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2020-08-01 23:22:57 -04:00
Jirka Borovec 06e8910f06
pytorch 1.6 (#2745)
* pt 1.6

* don't use the new zipfile serialization for now

* quick flake8 fixes

* remove unnecessary f

* coalesce strings

* remove comma

* remove extra commas

* Apply suggestions from code review

Co-authored-by: Peter Yu <2057325+yukw777@users.noreply.github.com>

* set _use_new_zipfile_serialization to False only for pytorch 1.6.0

* remove unnecessary comments

* flake8 fixes

* use pkg_resources instead of packaging

* readme

* format

* version

* chlog

Co-authored-by: Peter Yu <peter@asapp.com>
Co-authored-by: Peter Yu <2057325+yukw777@users.noreply.github.com>
2020-07-31 11:18:32 +02:00
Jirka Borovec 949734489a
remove deprecated in v0.9 (#2760)
* remove deprecated in v0.9

* data_loader

* import

* hook

* args
2020-07-30 23:19:28 +02:00
Jirka Borovec 590e7fb1fd
tests: add default_root_dir=tmpdir (#2392)
* tests: add default_root_dir=tmpdir

* remove duplicate tmpdir args

* add missing fixture

* test requires multi gpu

* typo

* resize

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2020-07-28 09:47:53 -04:00
Jirka Borovec 0fe933e23d
fixing TPU tests (#2632)
* init

* rename

* tpu_core_idx

* idx 8

* idxs

* @pl_multi_process_test

* assert

* assert

* daemon

* no close

* import

* msg

* use_single_gpu

* dataset

* idx

* fix idx

* dataset

* format

* add pickable

* typo

* apex

* typo

* wip

* docs

* typo

* tests

* docs

* docs

* Apply suggestions from code review

Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* Apply suggestions from code review

Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>

* docs

* Apply suggestions from code review

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2020-07-27 19:07:09 -04:00
Rohit Gupta 84c507c4df
Fix max_batches with fast_dev_run. (#2581)
* Fix fast_dev_run to run for all val_dataloaders

* fast_dev_run check

* changelog

* explicit

* limit_batches with fast_dev_run in init

* add test

* whitespace and comment fix

* comment and assertion

* added tests

* update rtol

* Revert "update rtol"

This reverts commit 4320329540.

* added tests

Co-authored-by: William Falcon <waf2107@columbia.edu>
2020-07-27 17:56:55 -04:00
Nathan Raw 9076551aec
Enable val/test loop disabling + datamodule tests (#2692)
* 🎨 warn instead of error out on loaders

* 🐛 test misconfiguration should still fail

* 🚧 .

* updated docs with new result obj

Co-authored-by: William Falcon <waf2107@columbia.edu>
2020-07-25 12:57:40 -04:00
Rohit Gupta cb0c6ad51a
fix setup call while testing (#2624)
* fix setup call while testing

* changelog

* drop if condition

* add test to check setup call

* flake8

* update test to check model stage

Co-authored-by: William Falcon <waf2107@columbia.edu>
2020-07-24 23:57:31 -04:00
Nathan Raw 1caf8beb2c
Datamodule (#2668)
* Add copy of pl_bolts datamodule to lightning

* add datamodule to necessary init files

* 🚧 add datamodule property to LightningModule

* 🚧 .

* 🎨 Let DataModule do its own thing

* 🚧 add back setup and run both hooks implicitly

* 🚧 .

* 🐛 fix add_argparse_args

* 💄 apply black formatting and isort

* 📝 docstrings

* 📝 .

* 📝 .

* 🐛 overwrite cls prepare_data instead of instance

* 📝 .

* add some tests

* Update datamodule.py

Co-authored-by: William Falcon <waf2107@columbia.edu>
2020-07-24 11:42:15 -04:00
Adrian Wälchli 1e68968ed7
support num_sanity_val_steps=-1 (#2246)
* support sanity_val_step=-1

* fix list size

* simplification

* simplify

* add test for num_sanity_val_steps=-1

* update test

* update docs

* extend tests to multiple dataloaders

* changelog

* Update tests/trainer/test_trainer.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* improve test

* refactor the sanity check decision

* fix merge

* Update trainer.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: William Falcon <waf2107@columbia.edu>
2020-07-23 07:07:03 -04:00
William Falcon 62ce00f96c
EvalResult support for val loop (PR 3/5) (#2651)
* add EvalResult to support to val/test loops
2020-07-22 13:53:10 -04:00
William Falcon 6d10ac2ac8
Structured results (train loop only. val loop separate PR) (PR 2/5) (#2615)
* r

* patched optimizer closure with sr

* added train step structured result

* added autoreduce for train step

* added auto reduce on train

* added hooks

* finished tests for structured results on train epoch

* cache

* finished tests for structured results on train epoch

* Update pytorch_lightning/callbacks/early_stopping.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update pytorch_lightning/callbacks/early_stopping.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update pytorch_lightning/callbacks/early_stopping.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update pytorch_lightning/callbacks/model_checkpoint.py

* Update pytorch_lightning/core/step_result.py

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* Apply suggestions from code review

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>

* simple

* finished tests for structured results on train epoch

* simple

* simple

* revert

* finished tests for structured results on train epoch

* finished tests for structured results on train epoch

* Update tests/base/deterministic_model.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* finished tests for structured results on train epoch

* docstring typos

* finished tests for structured results on train epoch

* Update pytorch_lightning/core/step_result.py

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Update pytorch_lightning/overrides/data_parallel.py

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

Co-authored-by: Jirka <jirka@pytorchlightning.ai>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
2020-07-20 19:00:20 -04:00
William Falcon aaa1553e35
tests for val loop flow (#2605)
* add tests for single scalar return from training

* fixing val step only
2020-07-14 14:20:45 -04:00
William Falcon 1d565e175d
add tests for single scalar return from training (#2587)
* add tests for single scalar return from training
2020-07-11 17:43:00 -04:00
William Falcon f35337adba
Fixes .test() for ddp (#2570)
* enable none checkpoint
2020-07-09 18:36:36 -04:00
Hayden Housen 992a7e2a41
Start accumulate gradients schedule at epoch 0 (continued) (#2513)
* Start accumulate gradients schedule at epoch 0

* Undo change in #2375

* Update test_trainer.py::test_gradient_accumulation_scheduling

* Fix pep8 formatting

* Remove 'Datasets/' folder

* Split args for readability

* Fix pep8 formatting
2020-07-09 07:11:07 -04:00
Espen Haugsdal b3ebfec863
Fix argparse default value bug (#2526)
* Add failing test for bug

* Fix bug
2020-07-09 07:10:30 -04:00
William Falcon 11069c8784
Fix ddp tests + .test() (#2512)
* added base tests for tpu

* fix deprecation warnings

* Update pytorch_lightning/trainer/trainer.py

Co-authored-by: Jeremy Jordan <13970565+jeremyjordan@users.noreply.github.com>

Co-authored-by: Jirka <jirka@pytorchlightning.ai>
Co-authored-by: Jeremy Jordan <13970565+jeremyjordan@users.noreply.github.com>
2020-07-07 12:24:56 -04:00
Jeremy Jordan a91b06ed1e
fix worker warning (#2504)
* fix worker warning

* improve tests

* suggestion

Co-authored-by: Jirka <jirka@pytorchlightning.ai>
2020-07-06 15:45:43 +02:00
vr140 96b32bee04
[tiny] Fix training_dataloader usage to be train_dataloader instead. (#2521)
Co-authored-by: Vijay Rajaram <vrajaram3@gatech.edu>
2020-07-06 10:44:44 +02:00
William Falcon 9924c76faa
Amp2 (#2505)
* fix tpu hang
2020-07-04 22:52:49 -04:00
Adrian Wälchli 927f305f7e
Warn user when IterableDataset has __len__ defined (#2437)
* add warning when checking len

* added test

* changelog

* pep

* do not show warning below 1.4

* try version parse

* comments

* xfail

* Update requirements/base.txt

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update pytorch_lightning/trainer/data_loading.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* version

Co-authored-by: William Falcon <waf2107@columbia.edu>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Jirka <jirka@pytorchlightning.ai>
2020-07-01 07:53:19 -04:00
William Falcon a42a0e16dd
Fixes train outputs (#2428)
* fix outputs
2020-06-30 10:03:49 -04:00
Adrian Wälchli 25ee51bc57
Continue Jeremy's early stopping PR #1504 (#2391)
* add state_dict for early stopping

* move best attr after monitor_op defined

* improve early stopping and model checkpoint callbacks

* fix formatting

* fix attr init order

* clean up setting of default_root_dir attr

* logger needs default root dir set first

* reorg trainer init

* remove direct references to checkpoint callback

* more fixes

* more bugfixes

* run callbacks at epoch end

* update tests to use on epoch end

* PR cleanup

* address failing tests

* refactor for homogeneity

* fix merge conflict

* separate tests

* tests for early stopping bug regressions

* small fixes

* revert model checkpoint change

* typo fix

* fix tests

* update train loop

* cannot pass an int as default_save_path

* refactor log message

* fix test case

* appease the linter

* fix some doctests

* move config to callback

* fixes from rebase

* fixes from rebase

* chlog

* docs

* reformat

* formatting

* fix

* fix

* fixes from rebase

* add new test for patience

* Update pytorch_lightning/callbacks/model_checkpoint.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update pytorch_lightning/callbacks/model_checkpoint.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update tests/callbacks/test_early_stopping.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* fix formatting

* remove enable_early_stop attribute

* fix test with new epoch indexing

* fix progress bar totals

* fix off by one error (see #2289) epoch starts at 0 now

* added missing imports

* fix hpc_save folderpath

* fix formatting

* fix tests

* small fixes from a rebase

* fix

* tmpdir

* tmpdir

* tmpdir

* wandb

* fix merge conflict

* add back evaluation after training

* test_resume_early_stopping_from_checkpoint TODO

* undo the horovod check

* update changelog

* remove a duplicate test from merge error

* try fix dp_resume test

* add the logger fix from master

* try remove default_root_dir

* try mocking numpy

* try import numpy in docs test

* fix wandb test

* pep 8 fix

* skip if no amp

* dont mock when doctesting

* install extra

* fix the resume ES test

* undo conf.py changes

* revert remove comet pickle from test

* Update CHANGELOG.md

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update weights_loading.rst

* Update weights_loading.rst

* Update weights_loading.rst

* renamed flag

* renamed flag

* revert the None check in logger experiment name/version

* add the old comments

* _experiment

* test checkpointing on DDP

* skip the ddp test on windows

* cloudpickle

* renamed flag

* renamed flag

* parentheses for clarity

* apply suggestion max epochs

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

Co-authored-by: Jeremy Jordan <jtjordan@ncsu.edu>
Co-authored-by: Jirka <jirka@pytorchlightning.ai>
Co-authored-by: Jeremy Jordan <13970565+jeremyjordan@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: William Falcon <waf2107@columbia.edu>
2020-06-28 21:36:46 -04:00
Jirka Borovec 51711c265a
fix loading model with kwargs (#2387)
* test

* fix

* fix
2020-06-27 16:38:03 -04:00
Jirka Borovec f1c96930b1
repair CI for Win (#2358)
* no cov

* no cov

* ReduceOp

* group

* reduce_op.sum

* Update sklearns.py

* formatting

* horovod

* Apply suggestions from code review

* horovod

* horovod

* horovod

* horovod

* ci

* print

* ci

* timeout

* timeout

* time

* fix

* distributed cpu

* pipes

* time

* cpu

* spawn

* spawn

* spawn

* tp

* separate

* os

* os

* npm

* Fix load_from_checkpoint() not working with URL on Windows

* Update CHANGELOG

* Update CHANGELOG.md

Co-authored-by: Peter Yu <2057325+yukw777@users.noreply.github.com>

* Apply suggestions from code review

* fix

* fix meta tags creating empty lines

* pyright

* node

* fix httpserver address

* drop tutils.default_trainer_options

* imports

* Better fix for load_from_checkpoint() not working with absolute path on Windows (#2294)

* Fix load_from_checkpoint() not working with URL on Windows

* Update CHANGELOG

* Update CHANGELOG.md

Co-authored-by: Peter Yu <2057325+yukw777@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Peter Yu <2057325+yukw777@users.noreply.github.com>

* drop duplicate

Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: airium <airium@outlook.com>
Co-authored-by: Peter Yu <2057325+yukw777@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: AIRIUM <38249940+airium@users.noreply.github.com>
2020-06-26 21:38:25 -04:00
Jirka Borovec a5f45787ea
fix get dataloader size (#2375)
* get dataloader size

* pyright
2020-06-26 15:38:48 -04:00
Thomas Schaaf 7c0a3f4745
Bugfix/_has_len (#2307)
* deal with NotImplementedError raised by torchtext

* deal with NotImplementedError raised by torchtext

* Added tests for dataloader which raise NotImplementedError in __len__()

* Fixed some typos

* enabled tests for dataloader raising NotImplementedError in __len__ and corrected match string for raised exception

* deleted empty line for style compliance

* refactored CustomNotImplementedErrorDataloader to derive from CustomInfDataloader

* enabled reduced number of not_implemented_error dataloader test to reduce runtime for continuous integration

* reduced test number of not_implemented_error dataloader test further to reduce test time

* reduced test number of not_implemented_error dataloader test to one to reduce test time

* disabled all not_implemented_error dataloader test to see if test pass in time

* added __next__ with a reduced number (5) of elements after which CustomNotImplementedErrorDataloader stops to speedup test.

* enabling all not_implemented_error dataloader test

* added brief description of change and relation of torchtext

* CustomNotImplementedErrorDataloader reduced number of batches served to 2.

* Update CHANGELOG.md

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Apply suggestions from code review

* Update CHANGELOG.md

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Disable parallelism in dataloader

Suspect that it might cause pytest to hang more frequently

* added max_steps=None to Trainer in not_implemented_error dataloader tests

* rearranged not_implemented_error test in file to group them together

* disabled parallel data loading
Reason: testing if that stops the test framework from hanging.

* Apply suggestions from code review

* Apply suggestions from code review

Co-authored-by: Thomas Schaaf <tschaaf@cs.cmu.edu>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-06-26 09:31:08 -04:00
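The `_has_len` fix in #2307 above deals with loaders whose `__len__` raises `NotImplementedError` (as the commit notes for torchtext) rather than being absent. A minimal sketch of that kind of check follows. It is an illustration only, not Lightning's actual code: `has_len` and the three loader classes are hypothetical names, and plain Python classes stand in for real dataloaders.

```python
def has_len(loader) -> bool:
    """Return True if len(loader) can be computed (hypothetical helper)."""
    try:
        len(loader)
    except TypeError:
        # The object defines no __len__ at all (e.g. an infinite iterator).
        return False
    except NotImplementedError:
        # __len__ exists but refuses to answer, as some loaders do.
        return False
    return True


class SizedLoader:
    """Stand-in for a loader with a known number of batches."""

    def __len__(self):
        return 3


class InfiniteLoader:
    """Stand-in for a loader that defines no __len__."""

    def __iter__(self):
        while True:
            yield 0


class NotImplementedLenLoader:
    """Stand-in for a loader whose __len__ raises NotImplementedError."""

    def __len__(self):
        raise NotImplementedError
```

Treating both exceptions the same way lets the caller fall back to "length unknown" handling instead of crashing.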
William Falcon 598f5140c5
refactor training loop (#2336)
* refactoring training epoch

* refactored training epoch

* fixes slurm weights saving
2020-06-23 23:38:22 -04:00
Lezwon Castelino 9446390779
fix TPU parsing and TPU tests (#2094)
* added tpu params test

* added tests

* removed xla imports

* added test cases for TPU

* fix pep 8 issues

* refactorings and comments

* add message to MisconfigurationException

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* test if device is set correctly

* added TPU device check
removed mark.spawn

* removed device selection

* remove xla_device call

* readded spawn due to test failures

* add TODO for tpu check

* Apply suggestions from code review

* Apply suggestions from code review

* flake8

* added tpu args to cli tests

* added support for tpu_core selection via cli

* fixed flake formatting

* replaced default_save_path with default_root_dir

* added check for data type for tpu_cores

* fixed flake indent

* protected

* protected

* chlog

* fixed tpu cores error

* rebased with latest changes

* flake fix

* Update pytorch_lightning/trainer/distrib_parts.py

added suggestion

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Jirka <jirka@pytorchlightning.ai>
2020-06-23 12:06:57 -04:00
Adrian Wälchli e085e93dd3
Add missing test for "multiple dataloader + percent_check fix" (#2226)
* Init fix num_batches

* Fix num_batches in case of multiple dataloaders

* Apply suggestions from code review

* Changes based on suggestions

* Flake8

* Add test to check num_batches

* generalize dataloader percent check test

* fix formatting

* remove hparams

* tests

* CHANGELOG

* Update CHANGELOG.md

* max_batches can be int

* conflict and rebase

* add back the test


fix


fix message


0.0 works


Revert "fix message"

This reverts commit 839cacf8b8610f4e697e654ef6f3d2501bf23984.

* update changelog

* Update CHANGELOG.md

* Fix num batches in case of multiple dataloaders and percent_check (#1920)

* git conflict

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* missing union

* doc update suggestion by @rohitgr7

* extend test

* changelog

* docs add note about multiple loaders

* update changelog

* remove unused variable

Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-06-23 11:21:24 -04:00
William Falcon 0f073819d3
refactored training_batch + tests to verify correctness (#2328)
* refactored training_batch
2020-06-23 11:17:10 -04:00
Jirka Borovec 4b90b79080
check omegaconf gpus (#2273)
* check omegaconf gpus

* test

* test

* Apply suggestions from code review

Co-authored-by: Omry Yadan <omry@fb.com>

Co-authored-by: Omry Yadan <omry@fb.com>
2020-06-19 23:42:11 -04:00
Jirka Borovec 7ecb0d2528
test CLI parsing gpus (#2284)
* cli gpus

* test

* test
2020-06-19 23:41:42 -04:00
Jirka Borovec f278ac42c8
Revert/Fix: epoch indexing from 1, to be from 0 (#2289)
* Revert "deprecated: epoch indexing from 1 (#2206)"

This reverts commit f94b919b

* chlog

* grad index

* Apply suggestions from code review

* tests

* fix

* test
2020-06-19 23:39:53 -04:00
thschaaf 554fb4754c
Bugfix/_has_len (#2293)
* deal with NotImplementedError raised by torchtext

* deal with NotImplementedError raised by torchtext

* Added tests for dataloader which raise NotImplementedError in __len__()

* Fixed some typos

Co-authored-by: Thomas Schaaf <tschaaf@cs.cmu.edu>
2020-06-19 23:38:15 -04:00
Vincent Thibault 4903f9ebd4
Fixed the load_from_checkpoint path detected as URL bug (#2244)
* Fixed the load_from_checkpoint path detected as URL bug

* Fixed the load_from_checkpoint path detected as URL bug

* fixed Caps lock typo

* Added .absolute() to checkpoint path to force hard drive prefix in string
2020-06-18 17:53:51 -04:00
William Falcon 2411c3be70
replace train_percent_check with limit_train_batches (#2220)
* drop train_percent_check

* chlog

* deprecated

* deprecated

* deprecated

* tests

* tests

* Apply suggestions from code review

* tests

* hydra support

* tests

* hydra support

* hydra support

* hydra support

* tests

* typo

* typo

* Update test_dataloaders.py

* docs

* docs

* docs

* docs

Co-authored-by: Jirka <jirka@pytorchlightning.ai>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-06-17 13:42:28 -04:00
William Falcon 04c794ca72
[WIP] Rename overfit_pct to overfit_batches (and fix) and val_percent_check and test_percent_check (and fix) (#2213)
* fixed percent check for val/test

* overfit_pct now uses train loaders for val and test and does not shuffle

* add on fit_start on fit_end hooks

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-06-17 08:03:28 -04:00
William Falcon 55fbcc00f6
Metrics docs (#2184)
* fixes

* Apply suggestions from code review

Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>

* add workers fix

* Update docs/source/metrics.rst

Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>

* Update docs/source/metrics.rst

Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>

* Update docs/source/metrics.rst

Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>

* Update docs/source/metrics.rst

Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>

* add workers fix

* add workers fix

* add workers fix

* doctests

* add workers fix

* add workers fix

* fixes

* fix docs

* fixes

* Apply suggestions from code review

Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>

* add workers fix

* Update docs/source/metrics.rst

Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>

* doctests

* add workers fix

* fix docs

* fixes

* fixes

* fix doctests

* Apply suggestions from code review

* fix doctests

* fix examples

* bug

* Update docs/source/metrics.rst

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update docs/source/metrics.rst

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update docs/source/metrics.rst

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* fixes

* fixes

* fixes

* fixes

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
Co-authored-by: Jirka <jirka@pytorchlightning.ai>
Co-authored-by: Nicki Skafte <nugginea@gmail.com>
2020-06-16 07:42:56 -04:00
Jirka Borovec e289e45120
test: save hparams to yaml (#2198)
* save hparams to yaml

* import

* resolves

* req

* Update requirements/base.txt

Co-authored-by: Omry Yadan <omry@fb.com>

Co-authored-by: Omry Yadan <omry@fb.com>
2020-06-16 06:34:55 -04:00
Jirka Borovec f94b919b96
deprecated: epoch indexing from 1 (#2206)
* epoch indexing from 1

* chlog

* fix tests

* fix tests

* self.min_epochs
2020-06-16 06:33:41 -04:00
Jirka Borovec 8870a84aa8
reduce test warnings (#2202)
* reduce test warnings

* Update test_trainer.py

* Update test_trainer.py

Co-authored-by: William Falcon <waf2107@columbia.edu>
2020-06-15 23:06:17 -04:00
Jirka Borovec db7bb4c348
cleaning tests (#2201) 2020-06-15 22:03:40 -04:00
Peter Yu 37e7582486
Add ckpt_path option to LightningModule.test() (#2190)
* Add ckpt_path option to LightningModule.test()

If ckpt_path is "best" (default), it loads the best weights saved by ModelCheckpoint for the test loop.
If ckpt_path is a path to a checkpoint file, it loads the weights from the file for the test loop.
If ckpt_path is None, it uses the weights from the end of training for the test loop.
If model parameter is set, ckpt_path is ignored.

* Update test_set.rst

Co-authored-by: William Falcon <waf2107@columbia.edu>
2020-06-15 08:02:37 -04:00
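The four `ckpt_path` rules listed in #2190 above can be sketched as a small resolver. This is a hedged illustration, not the library's implementation: `resolve_test_weights`, `model_given`, and `best_model_path` are hypothetical names, and raising `ValueError` when no best checkpoint exists is an assumption the commit message does not state.

```python
from typing import Optional


def resolve_test_weights(
    ckpt_path: Optional[str] = "best",
    model_given: bool = False,
    best_model_path: Optional[str] = None,
) -> Optional[str]:
    """Return the checkpoint file to load before testing.

    None means "keep the weights currently in memory".
    """
    # Rule 4: an explicitly passed model wins; ckpt_path is ignored.
    if model_given:
        return None
    # Rule 3: None keeps the weights from the end of training.
    if ckpt_path is None:
        return None
    # Rule 1: "best" maps to whatever the checkpoint callback saved as best.
    if ckpt_path == "best":
        if not best_model_path:
            # Assumption: fail loudly rather than silently test other weights.
            raise ValueError('ckpt_path="best" needs a saved best checkpoint')
        return best_model_path
    # Rule 2: anything else is treated as a path to a checkpoint file.
    return ckpt_path
```

Checking `model_given` first mirrors the last rule in the commit message, where a supplied model overrides every `ckpt_path` value.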
Simon-Martin Schröder fd1693e289
Handle KeyboardInterrupt during training (#2134)
* Handle KeyboardInterrupt during training

Fixes #2079.

* chlog

* Fix whitespace

* Update callback_hook.py

* Update base.py

* Update training_loop.py

* Update test_trainer.py

* Update CHANGELOG.md

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Update CHANGELOG.md

* on_keyboard_interrupt

Co-authored-by: Jirka <jirka@pytorchlightning.ai>
Co-authored-by: William Falcon <waf2107@columbia.edu>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2020-06-15 12:35:26 +02:00
Jirka Borovec c0903b800d
past checkpoints (#2160)
* past checkpoints

* omegaConf save

* enforce type

* resolve=True

Co-authored-by: Omry Yadan <omry@fb.com>

* test omegaconf

* tests

* test past

Co-authored-by: Omry Yadan <omry@fb.com>
2020-06-14 11:36:45 -04:00
Jirka Borovec 2674976f2c
remove deprecated API for v0.8 (#2073)
* remove deprecated API

* chlog

* times

* missed

* formatting check

* missing

* missing

* miss

* fix docs build error

* fix pep whitespace error

* docs

* wip

* amp_level

* amp_level

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2020-06-12 14:37:52 -04:00
Peter Yu 06cd849538
Allow loading checkpoints from urls (#1667)
* allow loading checkpoints from urls

* tmpdir_server fixture

* test cases for loading checkpoints from url

* dir => root_dir

* default map_location to None

* test case for resume_from_checkpoint

* changelog

* doc update

* monkeypatch TORCH_HOME to avoid caching

* Use a threading server with random ports so that it is easier to clean up

* test fixes

* pep8 fix

* ThreadingHTTPServer support in 3.6

* pep8 fix

* fix changelog

* separate tests for urls

* typo

Co-authored-by: Peter Yu <2057325+yukw777@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-06-11 17:12:48 -04:00
Jirka Borovec 16a7326e52
test cloudpickle (#2105)
* cloudpickle

* ci tests
2020-06-09 16:51:30 -04:00
Jirka Borovec d2967d9305
update hparams, allow OmegaConf (#2047)
* DictConf

* inits

* Apply suggestions from code review

Co-authored-by: Omry Yadan <omry@fb.com>

* wip

* atrib

* wip

* wip

* wip

* added hparams test

* wip

* Update test_hparams.py

* added hparams test

* added hparams test

* pep8

* pep8

* pep8

* docs

* wip

* wip

* clean

* review @omry

* Update docs/source/hyperparameters.rst

Co-authored-by: Omry Yadan <omry@fb.com>

Co-authored-by: Omry Yadan <omry@fb.com>
Co-authored-by: William Falcon <waf2107@columbia.edu>
2020-06-08 07:19:34 -04:00
Jirka Borovec c09317e68f
cleaning (#2030)
* cleaning

* optim imports

* fix

* typo

* on

* mergify
2020-06-04 11:25:07 -04:00
Jirka Borovec c438d0dd90
increase acc (#2039)
* increase acc

* try 0.45

* @pytest

* @pytest

* try .50

* duration

* pytest
2020-06-03 08:28:19 -04:00
Devashish Shankar ade3f36b7a
Raise an error when lightning replaces an existing sampler (#2020)
* Raise an error when lightning replaces an existing sampler

Currently, Trainer replaces the existing sampler with DistributedSampler
when running distributed training with `replace_sampler_ddp=True` (the default
behaviour). If a user has configured their own sampler, this can
lead to very different results between distributed and
non-distributed training.

This PR fixes this by raising an error if the user has configured a sampler
and uses `replace_sampler_ddp=True`. The recommended behavior from now
on is to either remove the sampler or set `replace_sampler_ddp=False`.

* Fix tests

* Simpler fix

* Fix tests

* Make inner method protected

* Apply suggestions from code review

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-06-02 18:52:04 -04:00
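
The check described in #2020 above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from the PR: the `Sampler` classes, `DataLoader`, and `resolve_sampler` are hypothetical stand-ins written for this example, and treating the sequential and random samplers as "no user-configured sampler" is an assumption.

```python
from dataclasses import dataclass, field


class Sampler:
    """Hypothetical stand-in for a data sampler (illustration only)."""


class SequentialSampler(Sampler):
    """Stand-in for the default sampler when shuffling is off."""


class RandomSampler(Sampler):
    """Stand-in for the default sampler when shuffling is on."""


class WeightedSampler(Sampler):
    """Stand-in for a sampler the user configured explicitly."""


class DistributedSampler(Sampler):
    """Stand-in for the sampler used in distributed training."""


@dataclass
class DataLoader:
    """Hypothetical loader that only tracks its dataset and sampler."""
    dataset: list
    sampler: Sampler = field(default_factory=SequentialSampler)


def resolve_sampler(loader, use_ddp, replace_sampler_ddp=True):
    """Return the sampler to train with, or raise on a conflicting setup."""
    # Nothing is replaced outside distributed training or when the flag is off.
    if not (use_ddp and replace_sampler_ddp):
        return loader.sampler

    # Assumption: only the two default samplers count as "not user-configured".
    if not isinstance(loader.sampler, (SequentialSampler, RandomSampler)):
        raise ValueError(
            "A custom sampler is set and replace_sampler_ddp=True. "
            "Remove the sampler or set replace_sampler_ddp=False."
        )

    return DistributedSampler()
```

With this sketch, a loader using a default sampler is switched to the distributed stand-in, while a loader with `WeightedSampler()` raises unless `replace_sampler_ddp=False` is passed, which mirrors the two options the commit message recommends.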
William Falcon 82a20296e3
Replaces ddp .spawn with subprocess (#2029)
* replace ddp spawn with subprocess

* hot fix
2020-06-01 11:00:32 -04:00
William Falcon 0e37e8c4d2
hotfix to unblock hparams and OmegaConf - removes auto_register_init_args by default (#2025)
* ogc install

* cleaned up tests

* hot fix
2020-05-31 08:29:51 -04:00
Jirka Borovec df78e84060
unify tests (#1940)
* unify tests

* Apply suggestions from code review

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2020-05-27 22:45:23 -04:00
Jirka Borovec 5e8c5abf63
fix default arg (#1927)
* fix default

* formatting errors

* update

* flake8
2020-05-26 19:04:42 -04:00
Adrian Wälchli 34237cfcaf
handle unknown args passed to Trainer.from_argparse_args (#1932)
* filter valid args

* error on unknown manual args

* added test

* changelog

* update docs and doctest

* simplify

* doctest

* better test with mock check for init call

* fstring

* extend test

* skip test on 3.6 not working

Co-authored-by: William Falcon <waf2107@columbia.edu>
2020-05-25 16:01:29 -04:00
Adrian Wälchli 8ca8336ce5
protect progress bar callback (#1855)
* wip protected progress bar settings

* remove callback attr from LRfinder

* whitespace

* changelog
2020-05-25 07:49:23 -04:00
Lucas Vazquez 112dd5c4f6
Adds the option of saving the last model on checkpoint (#1908)
* saves model every epoch

* implement test for save_last

* Update CHANGELOG.md

* Update CHANGELOG.md

* changes test description

Co-authored-by: Jeremy Jordan <13970565+jeremyjordan@users.noreply.github.com>
2020-05-25 07:47:44 -04:00
Nicki Skafte a34eb9e169
Fix logger bug and prepare data bug (#1933)
* tests, fix logger bug and prepare data bug

* add CHANGELOG.md

Co-authored-by: Nicki Skafte <nugginea@gmail.com>
2020-05-25 07:43:56 -04:00
William Falcon caa9c6760b
replace Hparams by init args (#1896)
* remove the need for hparams

* replace self.hparams

* fixed

* finished moco

* basic

* testing

* todo

* recurse

* hparams

* persist

* hparams

* chlog

* tests

* review

* saving

* tests

* docs

* finished moco

* hparams

* review

* Apply suggestions from code review

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* hparams

* overwrite

* transform

* cleaning

* cleaning

* tests

* examples

* Apply suggestions from code review

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* chp key

* tests

* Apply suggestions from code review

* class

* updated docs

* save

* wip

* fix

* flake8

Co-authored-by: Jirka <jirka@pytorchlightning.ai>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2020-05-24 18:59:08 -04:00
Rohit Gupta ac76dfcf62
Remove NaNs from loss in LRFinder (#1862)
* Remove NaNs from loss in LRFinder

* np.isfinite

* chlog

* add test

* chlog

Co-authored-by: Jirka <jirka@pytorchlightning.ai>
2020-05-19 08:39:19 +02:00
Victor Quach 1a797bdad5
add test for trainer.test() (#1858)
* fix trainer.test()

* Update trainer.py

Co-authored-by: William Falcon <waf2107@columbia.edu>
2020-05-17 16:30:20 -04:00