Commit Graph

562 Commits

Author SHA1 Message Date
William Falcon 3453bba898
re-enabled naming metrics in ckpt name (#3060)
* re-enabled naming metrics in ckpt name

* re-enabled naming metrics in ckpt name

* re-enabled naming metrics in ckpt name

* re-enabled naming metrics in ckpt name

* re-enabled naming metrics in ckpt name

* re-enabled naming metrics in ckpt name
2020-08-19 20:34:09 -04:00
Nicki Skafte cefc7f7c32
Feature/log computational graph (#3003)
* add methods

* log in trainer

* add tests

* changelog

* fix tests

* fix tests

* fix tests

* fix tests

* fix tests

* fix tests

* fix tests

* text

* added argument

* update tests

* fix styling

* improve testing
2020-08-19 19:08:46 -04:00
Adrian Wälchli 7b917de946
fix setting batch_size attribute in batch_size finder (finishing PR #2523) (#3043)
* lightning attr fix

* revert refactor

* create test

* separate test

* changelog update

* tests

* revert

* Update pytorch_lightning/trainer/training_tricks.py

Co-authored-by: William Falcon <waf2107@columbia.edu>
2020-08-19 19:01:55 -04:00
Adrian Wälchli 89a5d8fee9
fix auto scale batch size not working with precision=16 (#3045)
* add test

* test

* test

* add fix

* changelog

* check batch size changed
2020-08-19 20:41:33 +00:00
William Falcon 8315a65d0a
fix result obj dp auto reduce (#3013)
* fix result for dp

* fix result for dp

* fix result for dp

* fix result for dp

* fix result for dp

* fix result for dp

* fix result for dp

* fix result for dp

* fix result for dp

* fix result for dp

* fix result for dp

* fix result for dp

* added warning when changing monitor and using results obj
2020-08-17 10:29:39 -04:00
William Falcon 465d4ffd2c
added lr scheduler test using dev debugger (#3004)
* added lr scheduler test using dev debugger

* added lr scheduler test using dev debugger

* added lr scheduler test using dev debugger
2020-08-16 11:37:38 -04:00
Adrian Wälchli 188e06c261
ddp fix for trainer.test() + add basic ddp tests (#2997)
* add ddp script variations

* add ddp test

* rename

* shell

* test

* test

* try call

* try without subprocess

* test

* display the error

* list all variations

* try string

* try copy env

* debug

* pythonpath

* path

* update test

* change

* simple ddp test

* replace

* remove random port

* random port

* str

* clean up

* check run spawn

* clean up

* docs

* docs

* update test

* docs

* changelog

* changelog
2020-08-16 11:19:57 -04:00
William Falcon 44802f7697 tasks docs 2020-08-15 22:36:53 -04:00
William Falcon d702d4d393
removed callback metrics from test results obj (#2994)
* removed callback metrics from test results obj

* removed callback metrics from test results obj
2020-08-15 21:45:41 -04:00
Jeff Yang 73ebd1066d
Fix accumulate_grad_batches for last batch (#2853)
* first attempt

* update changelog

* fix pep8 and tests

* Apply suggestions from code review

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* added new tests

* fixed tests

* Apply suggestions from code review

* used num_training_batches

* fixed pep8

* fixed with is_last_batch suggested by @awaelchli

* fixed with num_training_batches

* fixed with num_training_batches

* cleanup

* fix test and update docs

* fixed for alignment, update docs

* minor changes

* update doc

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2020-08-15 15:06:37 -04:00
William Falcon 7d36aac138
fix docs (#2987) 2020-08-15 08:36:17 -04:00
William Falcon b8371fa56c
Fixes #2972 #2946 (#2986)
* add val step arg to metrics

* add val step arg to metrics

* add val step arg to metrics

* add val step arg to metrics

* add val step arg to metrics

* add val step arg to metrics

* add val step arg to metrics

* add val step arg to metrics

* add val step arg to metrics

* add step metrics

* add step metrics
2020-08-15 08:36:00 -04:00
Nathan Raw b9695237f1
Save test predictions on multiple GPUs (#2926)
* Save test predictions on multiple GPUs
2020-08-14 17:52:43 -04:00
William Falcon e7794eb79a
Fixes #2407 (#2981)
* fix gpus index error
2020-08-14 16:22:48 -04:00
William Falcon 48f658fbb5
Fixes #2943 (#2970) 2020-08-13 21:44:55 -04:00
William Falcon 639a4cbd25
autoplay (#2968) 2020-08-13 19:06:55 -04:00
Lezwon Castelino cfd06a083b
Bugfix/2956 tpu distrib backend fix (#2959)
* override dist backend when using tpus

* added test

* updated doc string

* drop redundant info...

* more redundant info

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>
2020-08-13 18:57:23 -04:00
William Falcon b7fc805dcf
pep 8 (#2967) 2020-08-13 18:56:02 -04:00
William Falcon 9a503de6af
Replace docs gifs with videos snippets so user can play at own speed (#2966)
* update docs
2020-08-13 18:52:47 -04:00
Jeff Yang 07c023c32f
fix(docs): docstring for amp_backend (#2960)
* fix(docs): docstring for amp_backend

* fix(docs): early_stop_checkpoint -> early_stop_callback

* docs

Co-authored-by: ananyahjha93 <ananya@pytorchlightning.ai>
2020-08-13 23:25:56 +02:00
SiddhantRanade 88bfed371e
Fix enforce_datamodule_dataloader_override() for iterable datasets (#2957)
This function has the if statement `if (train_dataloader or val_dataloaders) and datamodule:`.

The issue is similar to that in https://github.com/PyTorchLightning/pytorch-lightning/pull/1560. The problem is that `if dl` is evaluated as `if bool(dl)`, but the dataloader defines no `__bool__`, so `bool()` falls back to `dataloader.__len__() > 0`. For an IterableDataset, `dataloader.__len__` in turn calls `IterableDataset.__len__`, which is undefined.

The fix is also the same: replace `if dl` with `if dl is not None`.
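
The truthiness pitfall described in this message can be reproduced without PyTorch. The sketch below is only an illustration: `StreamingLoader`, `has_loader_truthy` and `has_loader` are hypothetical names, and the class merely imitates a dataloader whose `__len__` raises because the underlying dataset has no length. It is not the code changed by this commit.

```python
class StreamingLoader:
    """Hypothetical stand-in for a dataloader over an iterable dataset."""

    def __iter__(self):
        return iter(range(3))

    def __len__(self):
        # Imitates the failure mode: the length is undefined.
        raise TypeError("object has no len()")


def has_loader_truthy(dl):
    # Truthiness check: with no __bool__ defined, bool(dl) calls len(dl),
    # so the TypeError from __len__ propagates.
    return bool(dl)


def has_loader(dl):
    # Identity check: only asks whether a loader was passed at all.
    return dl is not None


loader = StreamingLoader()

try:
    has_loader_truthy(loader)
    truthy_check_failed = False
except TypeError:
    truthy_check_failed = True
```

The identity check never touches `__len__`, so it behaves the same for sized and unsized loaders.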

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2020-08-13 23:06:17 +02:00
William Falcon 2c935d048e
track batch size (#2954) 2020-08-13 12:40:54 -04:00
Jirka Borovec 4354690e55
add apex test (#2921)
* add apex test

* rename

* level

* events

* wrap

* evt

* miss

* apex

* apex

* apex

* apex

* apex

* apex

* Update tests/models/test_amp.py

Co-authored-by: William Falcon <waf2107@columbia.edu>

* notes

* notes

Co-authored-by: William Falcon <waf2107@columbia.edu>
2020-08-13 10:03:13 -04:00
William Falcon 6c5a0a172f
Resultd (#2947)
* updated docs
2020-08-13 09:58:05 -04:00
Jirka Borovec 519b97effd
Clean save (#2933)
* thr
deterministic=True

* clean

* clean

* Apply suggestions from code review

Co-authored-by: Vadym Stupakov <vadim.stupakov@gmail.com>

* Apply suggestions from code review

Co-authored-by: Vadym Stupakov <vadim.stupakov@gmail.com>
2020-08-13 07:26:33 -04:00
William Falcon a46130cdc1
add weighted average to results obj (#2930)
* track batch size in result obj
2020-08-12 08:02:00 -04:00
Brendan Fahy 56396abe98
fix checkpointing to remote file paths (#2925) 2020-08-12 06:31:17 -04:00
William Falcon d13e5c9e53
document LightningModule better (#2920)
* updated docs
2020-08-11 19:39:43 -04:00
Brendan Fahy 97e6f35b34
fix missing return statement. Do not normalize remote paths (#2894)
* fix missing return statement. Do not normalize remote paths

* Update pytorch_lightning/utilities/cloud_io.py

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Add some documentation that we now support s3 and hdfs paths

* suggestion from code review

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2020-08-09 22:38:43 +00:00
Uladzislau Sazanovich e9846dd758
Add tracking of basic states in Trainer [wip - to-be-merged after v0.9] (#2541)
* Add initial tracking of states in Trainer.

* Add INTERRUPTED state, improve tests, move state switching from callback to a trainer.

* Move part of a trainer state switching to a decorator.

* Add documentation.

* Fix docs, rename state enum, restore state to previous on exit if None, add tests for decorator only.

* Fix callback typing.

Co-authored-by: William Falcon <waf2107@columbia.edu>
2020-08-09 06:24:09 -04:00
Brendan Fahy 6e77181ec7
Squashed commit of the following: (#2164)
commit 29fb0506cd38a15c359e369cc8bc4435916b0c78
Author: Brendan Fahy <bmfahy@gmail.com>
Date:   Sat Aug 8 19:35:30 2020 +0000

    fix checking for version for docs to build

commit 467fd640db02275972c7111af031c86bb59333e9
Author: Brendan Fahy <bmfahy@gmail.com>
Date:   Sat Aug 8 18:56:05 2020 +0000

    remove no local test

commit a7cc9f88de00feec1a5406874d05313c42bd004c
Author: Brendan Fahy <bmfahy@gmail.com>
Date:   Sat Aug 8 18:46:44 2020 +0000

    fix

commit 3fdbb729da79ae9348c83410a138666bad467951
Author: Brendan Fahy <bmfahy@gmail.com>
Date:   Sat Aug 8 18:23:30 2020 +0000

    revert requirements

commit 9b8686bd83e2bc243cf329e26f1c667c6949cf67
Author: Brendan Fahy <bmfahy@gmail.com>
Date:   Sat Aug 8 18:16:42 2020 +0000

    make it a fixture

commit eec74953d24c8b25268d3b6dde3cc4affdd5cb8f
Author: Brendan Fahy <bmfahy@gmail.com>
Date:   Sat Aug 8 18:01:32 2020 +0000

    fix up the testing

commit 896d94a0e60083d52c81db2a036b7f1e015cad11
Author: Brendan Fahy <bmfahy@gmail.com>
Date:   Sat Aug 8 17:47:28 2020 +0000

    fix some tests

commit 6d22bde19767bf2b71dfd44839b01efdf6888f83
Merge: 6175d4e2 6ebe0d72
Author: Brendan Fahy <bmfahy@gmail.com>
Date:   Sat Aug 8 10:20:47 2020 +0000

    Merge remote-tracking branch 'origin/master' into tb_use_gfile

commit 6175d4e26b15a43c412c26d501762cd0b570616a
Author: Brendan Fahy <bmfahy@gmail.com>
Date:   Fri Aug 7 10:16:36 2020 +0000

    Use tensorboard.compat.gfile to support remote writing
2020-08-09 06:08:44 -04:00
William Falcon 256059a1d0
tracks all outputs including TBPTT and multiple optimizers (#2890)
* pl 0.9 update

* pl 0.9 update

* pl 0.9 update

* pl 0.9 update

* pl 0.9 update

* pl 0.9 update

* pl 0.9 update

* pl 0.9 update

* pl 0.9 update

* pl 0.9 update

* pl 0.9 update

* pl 0.9 update

* pl 0.9 update

* pl 0.9 update

* pl 0.9 update

* pl 0.9 update

* pl 0.9 update

* pl 0.9 update

* pl 0.9 update

* pl 0.9 update

* pl 0.9 update
2020-08-09 06:00:15 -04:00
Adrian Wälchli 1bb268ad8a
Clarify what gpus=0 means in docs (#2876)
* docs clarify what gpus=0 means

* add example suggested by @ydcjeff
2020-08-08 11:50:08 -04:00
Adrian Wälchli f798cffd02
save last model after saving top_k when save_last=True (#2881)
* save_last should be last

* changelog

* seed, docs

* retrigger ci

* compare filenames

* move constants

* fix test

* epoch, global step

* improve test
2020-08-08 06:02:43 -04:00
Jirka Borovec a6e7aa7796
allow using apex with any PT version (#2865)
* wip

* setup

* type

* name

* wip

* docs

* imports

* fix if

* fix if

* use_amp

* Apply suggestions from code review

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Apply suggestions from code review

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* fix tests

* Apply suggestions from code review

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* fix tests

* todos

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2020-08-08 11:07:32 +02:00
Santiago Castro fed0ac838b
Fix Trainer arg name in docs (#2879)
* Fix Trainer arg name in docs

* Fix a PR comment
2020-08-08 07:52:35 +02:00
Jirka Borovec b7d72706c3
clean imports (#2867)
* clean imports

* miss
2020-08-08 00:33:51 +02:00
Jirka Borovec f8c058215f
simplify tests & cleaning (#2588)
* simplify

* tmpdir

* revert

* clean

* accel

* types

* test

* edit test acc

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Update test acc

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2020-08-07 23:22:05 +02:00
Iz Beltagy 2cc60c625e
fix set_epoch on TPUs (#2740)
* fix https://github.com/PyTorchLightning/pytorch-lightning/issues/2622

* Update training_loop.py
2020-08-07 09:31:30 -04:00
William Falcon f82d7feb6c
updated hooks (#2850)
* modified hooks

* modified hooks

* modified hooks

* modified hooks

* modified hooks

* modified hooks

* modified hooks

* modified hooks

* modified hooks
2020-08-07 09:29:57 -04:00
ananthsub b39f4798a6
Add support to Tensorboard logger for OmegaConf hparams (#2846)
* Add support to Tensorboard logger for OmegaConf hparams

Address https://github.com/PyTorchLightning/pytorch-lightning/issues/2844

We check whether omegaconf can be imported and whether the hparams are OmegaConf instances. If so, we use OmegaConf.merge to preserve the typing, so that saving hparams to YAML actually triggers the OmegaConf branch.

* available

* chlog

* test
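
A minimal sketch of the optional-import pattern described in the first bullet, assuming a plain-dict fallback when omegaconf is absent. `OMEGACONF_AVAILABLE` and `merge_hparams` are hypothetical names chosen for illustration and are not taken from the logger's actual code.

```python
from importlib.util import find_spec

# True only when the optional omegaconf package is installed.
OMEGACONF_AVAILABLE = find_spec("omegaconf") is not None


def merge_hparams(existing, new):
    """Merge `new` into `existing`, keeping OmegaConf typing when possible."""
    if OMEGACONF_AVAILABLE:
        from omegaconf import Container, OmegaConf

        if isinstance(new, Container):
            # Wrap a plain dict first so both operands are OmegaConf containers.
            base = existing if isinstance(existing, Container) else OmegaConf.create(existing)
            return OmegaConf.merge(base, new)

    # Fallback for plain mappings: ordinary dict update on a copy.
    merged = dict(existing)
    merged.update(new)
    return merged


plain = merge_hparams({"lr": 0.1}, {"batch_size": 32})
```

With plain dictionaries the function takes the fallback path whether or not omegaconf is installed. The OmegaConf branch is taken only when the new hparams are already OmegaConf containers.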

Co-authored-by: Jirka Borovec <jirka@pytorchlightning.ai>
2020-08-07 09:13:21 -04:00
Rohit Gupta a642349228
Support limit_mode_batches (int) for infinite dataloader (#2840)
* Support limit_mode_batches(int) for infinite dataloader

* flake8

* revert and update

* add and update tests

* pep8

* chlog

* Update CHANGELOG.md

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Add suggestions by @awaelchli

* docs

* Apply suggestions from code review

Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>

* Apply suggestions from code review

* fix

* max

* check

* add and update tests

* max

* check

* check

* check

* chlog

* tests

* update exception message

* Apply suggestions from code review

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
Co-authored-by: Jirka Borovec <jirka@pytorchlightning.ai>
2020-08-07 13:02:36 +02:00
Nima Sarang 793036d29c
Support returning python scalars in DP (#1935)
* Override the default gather method to support scalars

* add computing average of a list

* bug: change if to elif

* add some tests

* change style

* change documentation

* use apply_to_collection in DP gather

* use apply_to_collection in DP gather

* fix warning msg

* override gather method in DP

* add tests for python scalars

* add python scalars to docstring

* Update message

* override gather method in DP

* formatting

* chlog

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Jirka Borovec <jirka@pytorchlightning.ai>
2020-08-07 09:18:29 +02:00
Nicki Skafte 9a402461da
Bugfix: Lr finder and hparams compatibility (#2821)
* fix hparams lr finder bug

* add tests for new functions

* better tests

* fix codefactor

* fix styling

* fix tests

* fix codefactor

* Apply suggestions from code review

* modified hook

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: William Falcon <waf2107@columbia.edu>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2020-08-07 00:34:48 +02:00
William Falcon b507c42c47
clarify batch hooks (#2842)
* modified hook

* modified hook

* modified hook

* modified hook

* modified hook

* modified hook

* modified hook

* modified hook

* modified hook

* modified hook

* modified hook

* modified hook

* modified hook
2020-08-05 20:01:30 -04:00
Ananya Harsh Jha a5f2b89ed0
updated sync bn (#2838)
* updated sync bn

* updated sync bn

* updated sync bn

* updated sync bn

* updated sync bn

* updated sync bn

* updated sync bn

* updated sync bn

* added ddp_spawn test

* updated test

* clean

* clean

Co-authored-by: Jirka Borovec <jirka@pytorchlightning.ai>
2020-08-06 01:12:11 +02:00
William Falcon 5d0f0325d8
Revert "Support limit_mode_batches (int) for infinite dataloader" (#2839)
* Revert "Support limit_mode_batches (int) for infinite dataloader (#2787)"

This reverts commit de9c9f0864.

* Update training_tricks.py
2020-08-05 15:57:26 -04:00
Ruotian(RT) Luo bef27c58ed
save apex scaler states (#2828) 2020-08-05 13:43:50 -04:00
Ruotian(RT) Luo 6034d5e37d
fix apex gradient clipping (#2829) 2020-08-05 13:42:21 -04:00
Ananya Harsh Jha e31c520c21
add support for sync_bn (#2801)
* initial commit for sync_bn

* updated changelog

* tests

* tests

* ddp tests hanging with script tests

* updated trainer

* updated params

* test

* passing tests

* passing tests

* passing tests

* passing tests

* tests

* removed apex

* doc

* doc

* doc

* doc

* docs

* tests

* tests

* tests
2020-08-05 13:29:05 -04:00