Commit Graph

1479 Commits

Author SHA1 Message Date
Justus Schock 7358d456f3
Retrieve last logged val from result by key (#3049)
* return last logged value

* Update test_results.py

* Update step_result.py

* Update step_result.py

* pep8

* pep8
2020-08-19 18:59:14 -04:00
Adrian Wälchli 89a5d8fee9
fix auto scale batch size not working with precision=16 (#3045)
* add test

* test

* test

* add fix

* changelog

* check batch size changed
2020-08-19 20:41:33 +00:00
Adrian Wälchli 9031dc3b81
Fix result gathering with varying tensor shapes (#3020)
* test for gethering results

* fix gather

* document tests

* changelog

* assert dtype

* default to concat

* additional test
2020-08-18 20:27:48 -04:00
Ananya Harsh Jha 9445c800b0
set device to root gpu (#3042) 2020-08-18 19:28:35 -04:00
William Falcon ca18e11f6e
Update __init__.py 2020-08-17 15:06:42 -04:00
William Falcon 8315a65d0a
fix result obj dp auto reduce (#3013)
* fix result for dp

* fix result for dp

* fix result for dp

* fix result for dp

* fix result for dp

* fix result for dp

* fix result for dp

* fix result for dp

* fix result for dp

* fix result for dp

* fix result for dp

* fix result for dp

* added warning when changing monitor and using results obj
2020-08-17 10:29:39 -04:00
William Falcon 51de6802ed
added warning when changing monitor and using results obj (#3014)
* added warning when changing monitor and using results obj

* added warning when changing monitor and using results obj

* added warning when changing monitor and using results obj
2020-08-17 10:29:28 -04:00
William Falcon 465d4ffd2c
added lr scheduler test using dev debugger (#3004)
* added lr scheduler test using dev debugger

* added lr scheduler test using dev debugger

* added lr scheduler test using dev debugger
2020-08-16 11:37:38 -04:00
Adrian Wälchli 188e06c261
ddp fix for trainer.test() + add basic ddp tests (#2997)
* add ddp script variations

* add ddp test

* rename

* shell

* test

* test

* try call

* try without subprocess

* test

* display the error

* list all variations

* try string

* try copy env

* debug

* pythonpath

* path

* update test

* change

* simple ddp test

* replace

* remove random port

* random port

* str

* clean up

* check run spawn

* clean up

* docs

* docs

* update test

* docs

* changelog

* changelog
2020-08-16 11:19:57 -04:00
William Falcon ed231c93ca
updated docs (#2999)
* updated docs

* updated docs
2020-08-16 07:38:18 -04:00
William Falcon eeea59c3da
Update __init__.py 2020-08-16 00:02:55 -04:00
William Falcon 44802f7697 tasks docs 2020-08-15 22:36:53 -04:00
William Falcon d702d4d393
removed callback metrics from test results obj (#2994)
* removed callback metrics from test results obj

* removed callback metrics from test results obj
2020-08-15 21:45:41 -04:00
Jeff Yang 73ebd1066d
Fix accumulate_grad_batches for last batch (#2853)
* first attempt

* update changelog

* fix pep8 and tests

* Apply suggestions from code review

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* added new tests

* fixed tests

* Apply suggestions from code review

* used num_training_batches

* fixed pep8

* fixed with is_last_batch suggested by @awaelchli

* fixed with num_training_batches

* fixed with num_training_batches

* cleanup

* fix test and update docs

* fixed for alignment, update docs

* minor changes

* update doc

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2020-08-15 15:06:37 -04:00
William Falcon 5c35db94fa
Update __init__.py 2020-08-15 09:21:22 -04:00
William Falcon 7d36aac138
fix docs (#2987) 2020-08-15 08:36:17 -04:00
William Falcon b8371fa56c
Fixes #2972 #2946 (#2986)
* add val step arg to metrics

* add val step arg to metrics

* add val step arg to metrics

* add val step arg to metrics

* add val step arg to metrics

* add val step arg to metrics

* add val step arg to metrics

* add val step arg to metrics

* add val step arg to metrics

* add step metrics

* add step metrics
2020-08-15 08:36:00 -04:00
William Falcon 62ddcfcfb1
Update __init__.py 2020-08-14 17:54:25 -04:00
Nathan Raw b9695237f1
Save test predictions on multiple GPUs (#2926)
* Save test predictions on multiple GPUs
2020-08-14 17:52:43 -04:00
William Falcon e7794eb79a
Fixes #2407 (#2981)
* fix gpus index error
2020-08-14 16:22:48 -04:00
Jirka Borovec 5bce06c050
nb. devices (#2973) 2020-08-14 11:37:21 +02:00
William Falcon 0c264689cb
Fixes #2942 (#2969)
* Fixes #2942

* doc fix
2020-08-13 21:54:57 -04:00
William Falcon 48f658fbb5
Fixes #2943 (#2970) 2020-08-13 21:44:55 -04:00
William Falcon 639a4cbd25
autoplay (#2968) 2020-08-13 19:06:55 -04:00
Nicki Skafte 6a051c887f
Add docs for GpuUsageLogger (#2945)
* add docs

* fix spelling
2020-08-13 18:58:14 -04:00
Lezwon Castelino cfd06a083b
Bugfix/2956 tpu distrib backend fix (#2959)
* override dist backend when using tpus

* added test

* updated doc string

* drop redundant info...

* more redundant info

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>
2020-08-13 18:57:23 -04:00
William Falcon b7fc805dcf
pep 8 (#2967) 2020-08-13 18:56:02 -04:00
William Falcon 9a503de6af
Replace docs gifs with videos snippets so user can play at own speed (#2966)
* update docs
2020-08-13 18:52:47 -04:00
Jeff Yang 07c023c32f
fix(docs): docstring for amp_backend (#2960)
* fix(docs): docstring for amp_backend

* fix(docs): early_stop_checkpoint -> early_stop_callback

* docs

Co-authored-by: ananyahjha93 <ananya@pytorchlightning.ai>
2020-08-13 23:25:56 +02:00
SiddhantRanade 88bfed371e
Fix enforce_datamodule_dataloader_override() for iterable datasets (#2957)
This function has the if statement `if (train_dataloader or val_dataloaders) and datamodule:`.


The issue is similar to that in https://github.com/PyTorchLightning/pytorch-lightning/pull/1560. The problem is that the `if(dl)` translates to `if(bool(dl))`, but there's no dataloader.__bool__ so bool() uses dataloader.__len__ > 0. But... dataloader.__len__ uses IterableDataset.__len__ for IterableDatasets for which __len__ is undefined.

The fix is also the same, the `if dl` should be replaced by `if dl is not None`.

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2020-08-13 23:06:17 +02:00
shijianjian 18d31a3b63
Added strict=False for load_from_checkpoint (#2819)
* Added strict=False and hparams_file accepcts dict

* Apply suggestions from code review

Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>

* Type check fix

* Added tests

* Linting & test fix

* Removed redundant code & test

* Added strict=False and hparams_file accepcts dict

* Apply suggestions from code review

Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>

* Type check fix

* Added tests

* Linting & test fix

* Removed redundant code & test

* Apply suggestions from code review

* tests

* tests

* chlog

* Update tests/models/test_restore.py

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* update test comments

* Added docstring for the strict attribute

* Added supplementary tests

* Update saving.py

* Apply suggestions from code review

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* pep8, removed extra func

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: Jirka Borovec <jirka@pytorchlightning.ai>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: ananyahjha93 <ananya@pytorchlightning.ai>
2020-08-13 16:25:43 -04:00
William Falcon 2c935d048e
track batch size (#2954) 2020-08-13 12:40:54 -04:00
Jirka Borovec 4354690e55
add apex test (#2921)
* add apex test

* rename

* level

* events

* wrap

* evt

* miss

* apex

* apex

* apex

* apex

* apex

* apex

* Update tests/models/test_amp.py

Co-authored-by: William Falcon <waf2107@columbia.edu>

* notes

* notes

Co-authored-by: William Falcon <waf2107@columbia.edu>
2020-08-13 10:03:13 -04:00
William Falcon 6c5a0a172f
Resultd (#2947)
* updated docs
2020-08-13 09:58:05 -04:00
Jirka Borovec 519b97effd
Clean save (#2933)
* thr
deterministic=True

* clean

* clean

* Apply suggestions from code review

Co-authored-by: Vadym Stupakov <vadim.stupakov@gmail.com>

* Apply suggestions from code review

Co-authored-by: Vadym Stupakov <vadim.stupakov@gmail.com>
2020-08-13 07:26:33 -04:00
Gerardo Roa Dabike f6a3d8fd8d
GPU Usage Logger (#2932)
* GPU utilisation Callback

* GPU utilisation Callback

* Fixing style

* Fixing style

* Fixing CodeFactor: partial executable path

* Fix a misspelling in the Class name
2020-08-12 15:09:34 -04:00
Adrian Wälchli 411914bd2b
Fix hparams loading for model that accepts *args (#2911)
* fix hparams loading for model that accepts *args

* add test case

* changelog

* pep

* fix test

Co-authored-by: William Falcon <waf2107@columbia.edu>
2020-08-12 09:58:35 -04:00
Rosario Scalise f9d88f8088
Support **DictConfig hparam serialization (#2519)
* change to OmegaConf API

Co-authored-by: Omry Yadan <omry@fb.com>

* Swapped Container for OmegaConf sentinel; Limited ds copying

* Add Namespace check.

* Container removed. Pass local tests.

Co-authored-by: Omry Yadan <omry@fb.com>
2020-08-12 08:10:17 -04:00
William Falcon a46130cdc1
add weighted average to results obj (#2930)
* track batch size in result obj
2020-08-12 08:02:00 -04:00
Phil e3528afae3
Move optimizer creation after device placement for ddp backends. (#2904) 2020-08-12 06:34:59 -04:00
Brendan Fahy 56396abe98
fix checkpointing to remote file paths (#2925) 2020-08-12 06:31:17 -04:00
William Falcon d13e5c9e53
document lightiningmodule better (#2920)
* updated docs
2020-08-11 19:39:43 -04:00
Adrian Wälchli 69d241c82e
Do not pass non_blocking=True if it does not support this argument (#2910)
* add docs

* non blocking only on tensor

* changelog

* add test case

* add test comment

* update changelog


changelog


chlog
2020-08-11 19:28:37 -04:00
William Falcon 28f79d9f7a
Mapkeys (#2900)
* added a map dict

* added a map dict
2020-08-09 18:50:39 -04:00
Brendan Fahy 97e6f35b34
fix missing return statement. Do not normalize remote paths (#2894)
* fix missing return statement. Do not normalize remote paths

* Update pytorch_lightning/utilities/cloud_io.py

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Add some documentation that we now support s3 and hdfs paths

* suggestion from code review

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2020-08-09 22:38:43 +00:00
ananda seelan 4d3dfd43e4
Minor doc fixes (#2893)
* Minor language fixes

* Typo fix
2020-08-09 15:00:08 -04:00
Caldera 6c18fd9a24
Update lr_logger.py (#2847)
* Update lr_logger.py

when logging learning_rate, we should provide different choices to log including 'step' and 'epoch'

* Update lr_logger.py

add some type annotations and docstrings

* Update lr_logger.py

fixed a bug where `on_train_batch_start()` can't be triggered, instead, we should use on_batch_start(); add `interval` args so that we can record learning_rates with respect to `global_step` or `current_epoch`.

* Update lr_logger.py

restore _extract_lr()

* suggestion

* Update lr_logger.py

modify _extract_lr(), it no more need to pass `interval` parameter.

* Update test_lr_logger.py

SkafteNicki 's suggetion

* log_interval now supports `None`, `step`, `epoch`

* change `log_interval` to `logging_interval`

* Update test_lr_logger.py

* Update lr_logger.py

* put types check into `on_train_start()`

* cleanup

* docstring typos

* minor changes from suggestions

Co-authored-by: Jirka Borovec <jirka@pytorchlightning.ai>
Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2020-08-09 16:30:43 +00:00
William Falcon 38c018e4ba
Update __init__.py 2020-08-09 10:05:39 -04:00
Uladzislau Sazanovich e9846dd758
Add tracking of basic states in Trainer [wip - to-be-merged after v0.9] (#2541)
* Add initial tracking of states in Trainer.

* Add INTERRUPTED state, improve tests, move state switching from callback to a trainer.

* Move part of a trainer state switching to a decorator.

* Add documentation.

* Fix docs, rename state enum, restore state to previous on exit if None, add tests for decorator only.

* Fix callback typing.

Co-authored-by: William Falcon <waf2107@columbia.edu>
2020-08-09 06:24:09 -04:00
Brendan Fahy 6e77181ec7
Squashed commit of the following: (#2164)
commit 29fb0506cd38a15c359e369cc8bc4435916b0c78
Author: Brendan Fahy <bmfahy@gmail.com>
Date:   Sat Aug 8 19:35:30 2020 +0000

    fix checking for version for docs to build

commit 467fd640db02275972c7111af031c86bb59333e9
Author: Brendan Fahy <bmfahy@gmail.com>
Date:   Sat Aug 8 18:56:05 2020 +0000

    remove no local test

commit a7cc9f88de00feec1a5406874d05313c42bd004c
Author: Brendan Fahy <bmfahy@gmail.com>
Date:   Sat Aug 8 18:46:44 2020 +0000

    fix

commit 3fdbb729da79ae9348c83410a138666bad467951
Author: Brendan Fahy <bmfahy@gmail.com>
Date:   Sat Aug 8 18:23:30 2020 +0000

    revert requirements

commit 9b8686bd83e2bc243cf329e26f1c667c6949cf67
Author: Brendan Fahy <bmfahy@gmail.com>
Date:   Sat Aug 8 18:16:42 2020 +0000

    make it a fixture

commit eec74953d24c8b25268d3b6dde3cc4affdd5cb8f
Author: Brendan Fahy <bmfahy@gmail.com>
Date:   Sat Aug 8 18:01:32 2020 +0000

    fix up the testing

commit 896d94a0e60083d52c81db2a036b7f1e015cad11
Author: Brendan Fahy <bmfahy@gmail.com>
Date:   Sat Aug 8 17:47:28 2020 +0000

    fix some tests

commit 6d22bde19767bf2b71dfd44839b01efdf6888f83
Merge: 6175d4e2 6ebe0d72
Author: Brendan Fahy <bmfahy@gmail.com>
Date:   Sat Aug 8 10:20:47 2020 +0000

    Merge remote-tracking branch 'origin/master' into tb_use_gfile

commit 6175d4e26b15a43c412c26d501762cd0b570616a
Author: Brendan Fahy <bmfahy@gmail.com>
Date:   Fri Aug 7 10:16:36 2020 +0000

    Use tensorboard.compat.gfile to support remote writing
2020-08-09 06:08:44 -04:00