1158 Commits

Author SHA1 Message Date
William Falcon 460ab5485e
Gen ddp support (#1961)
* updated docs

* added mixed
2020-05-26 19:02:30 -04:00
Rohit Gupta d0ec11b9d6
Remove unused param tpu_core_idx (#1948) 2020-05-25 16:04:53 -04:00
Adrian Wälchli 34237cfcaf
handle unknown args passed to Trainer.from_argparse_args (#1932)
* filter valid args

* error on unknown manual args

* added test

* changelog

* update docs and doctest

* simplify

* doctest

* doctest

* doctest

* better test with mock check for init call

* fstring

* extend test

* skip test on 3.6 not working

Co-authored-by: William Falcon <waf2107@columbia.edu>
2020-05-25 16:01:29 -04:00
William Falcon f46a7bae77
updated docs (#1941) 2020-05-25 15:59:32 -04:00
Federico Baldassarre 65b4352930
early stopping checks on_validation_end (#1458)
* Fixes PyTorchLightning/pytorch-lightning#490

`EarlyStopping` should check the metric of interest `on_validation_end` rather than `on_epoch_end`. 
In a normal scenario, this does not cause a problem, but in combination with `check_val_every_n_epoch>1` in the `Trainer` it results in a warning or in a `RuntimeError` depending on `strict`.

* Highlighted that ES callback runs on val epochs in docstring

* Updated EarlyStopping in rst doc

* Update early_stopping.py

* Update early_stopping.rst

* Update early_stopping.rst

* Update early_stopping.rst

* Update early_stopping.rst

* Apply suggestions from code review

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Update docs/source/early_stopping.rst

* fix doctest indentation warning

* Train loop calls early_stop.on_validation_end

* chlog

Co-authored-by: William Falcon <waf2107@columbia.edu>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Jirka <jirka@pytorchlightning.ai>
2020-05-25 17:33:00 +00:00
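The scheduling issue described in #1458 can be illustrated with a small, framework-free sketch. Everything in it (`ToyEarlyStopping`, `run`, the made-up loss values) is hypothetical and is not PyTorch Lightning's implementation; it only shows why a check tied to every epoch finds no fresh metric when validation runs every n epochs.

```python
class ToyEarlyStopping:
    """Hypothetical stand-in for an early-stopping callback (not the library class)."""

    def __init__(self, patience=2):
        self.patience = patience
        self.best = float("inf")
        self.wait = 0
        self.missing_metric_checks = 0

    def check(self, metrics):
        """Return True when training should stop."""
        if "val_loss" not in metrics:
            # The situation that led to a warning or RuntimeError in the report.
            self.missing_metric_checks += 1
            return False
        if metrics["val_loss"] < self.best:
            self.best = metrics["val_loss"]
            self.wait = 0
        else:
            self.wait += 1
        return self.wait >= self.patience


def run(hook, max_epochs=6, check_val_every_n_epoch=2):
    """Simulate a training loop and count checks that found no metric."""
    stopper = ToyEarlyStopping()
    for epoch in range(max_epochs):
        metrics = {}
        ran_validation = (epoch + 1) % check_val_every_n_epoch == 0
        if ran_validation:
            metrics["val_loss"] = 1.0 / (epoch + 1)  # made-up, improving loss
            if hook == "on_validation_end":
                stopper.check(metrics)
        if hook == "on_epoch_end":
            stopper.check(metrics)
    return stopper.missing_metric_checks
```

With validation every second epoch, checking at the end of every epoch looks for the metric on epochs where none was produced, while checking only after validation never does.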
Adrian Wälchli 8ca8336ce5
protect progress bar callback (#1855)
* wip protected progress bar settings

* remove callback attr from LRfinder

* whitespace

* changelog
2020-05-25 07:49:23 -04:00
Lucas Vazquez 112dd5c4f6
Adds the option of saving the last model on checkpoint (#1908)
* saves model every epoch

* implement test for save_last

* Update CHANGELOG.md

* Update CHANGELOG.md

* changes test description

Co-authored-by: Jeremy Jordan <13970565+jeremyjordan@users.noreply.github.com>
2020-05-25 07:47:44 -04:00
Nicki Skafte a34eb9e169
Fix logger bug and prepare data bug (#1933)
* tests, fix logger bug and prepare data bug

* add CHANGELOG.md

Co-authored-by: Nicki Skafte <nugginea@gmail.com>
2020-05-25 07:43:56 -04:00
Justus Schock 6456247287
Re-Enable Import Errors (#1938)
* update logger imports

* pep8 fixes

* pep8
2020-05-25 07:31:35 -04:00
William Falcon caa9c6760b
replace Hparams by init args (#1896)
* remove the need for hparams

* replace self.hparams

* fixed

* finished moco

* basic

* testing

* todo

* recurse

* hparams

* persist

* hparams

* chlog

* tests

* tests

* tests

* tests

* tests

* tests

* review

* saving

* tests

* tests

* tests

* docs

* finished moco

* hparams

* review

* Apply suggestions from code review

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* hparams

* overwrite

* transform

* transform

* transform

* transform

* cleaning

* cleaning

* tests

* examples

* examples

* examples

* Apply suggestions from code review

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* chp key

* tests

* Apply suggestions from code review

* class

* updated docs

* updated docs

* updated docs

* updated docs

* save

* wip

* fix

* flake8

Co-authored-by: Jirka <jirka@pytorchlightning.ai>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2020-05-24 18:59:08 -04:00
Nicki Skafte 8f6b7a2b4f
Fix user warning produced by apex + scheduler combination (#1873)
* fix user error produced by apex + scheduler combination

* add changelog

* added reinit to every configure_apex call

* fix styling

Co-authored-by: Nicki Skafte <nugginea@gmail.com>
2020-05-22 07:19:37 -04:00
Maxim Grechkin 98f7842970
Allow dataloaders without sampler field present (#1907)
* Allow dataloaders without sampler field present

Custom dataloaders sometimes lack a `sampler` field, so check that the field exists before reading it.

* chlog

Co-authored-by: Jirka <jirka@pytorchlightning.ai>
2020-05-20 20:57:12 +00:00
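A minimal sketch of the defensive check described in #1907. The classes and the `get_sampler` helper are hypothetical illustrations, not the code changed by the commit.

```python
class CustomLoader:
    """Hypothetical iterable loader that defines no `sampler` attribute."""

    def __iter__(self):
        return iter([1, 2, 3])


class LoaderWithSampler(CustomLoader):
    sampler = "sequential"  # placeholder value, for illustration only


def get_sampler(dataloader):
    # Read the field only if it exists; fall back to None otherwise.
    return getattr(dataloader, "sampler", None)
```

Using `getattr` with a default avoids an `AttributeError` for loaders that never define the field.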
Kevin Trebing 3459a54667
Changed order of `update_learning_rates()` and `run_training_teardown()`. (#1891) 2020-05-19 13:16:26 -04:00
Justus Schock 9b629637b8
New metric classes (#1326) (#1877)
* New metric classes (#1326)

* Create metrics package

* Create metric.py

* Create utils.py

* Create __init__.py

* add tests for metric utils

* add docstrings for metrics utils

* add function to recursively apply other function to collection

* add tests for this function

* update test

* Update pytorch_lightning/metrics/metric.py

Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com>

* update metric name

* remove example docs

* fix tests

* add metric tests

* fix to tensor conversion

* fix apply to collection

* Update CHANGELOG.md

* Update pytorch_lightning/metrics/metric.py

Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com>

* remove tests from init

* add missing type annotations

* rename utils to convertors

* Create metrics.rst

* Update index.rst

* Update index.rst

* Update pytorch_lightning/metrics/convertors.py

Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com>

* Update pytorch_lightning/metrics/convertors.py

Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com>

* Update pytorch_lightning/metrics/convertors.py

Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com>

* Update pytorch_lightning/metrics/metric.py

Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com>

* Update tests/utilities/test_apply_to_collection.py

Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com>

* Update tests/utilities/test_apply_to_collection.py

Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com>

* Update tests/metrics/convertors.py

Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com>

* Apply suggestions from code review

Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com>

* add doctest example

* rename file and fix imports

* added parametrized test

* replace lambda with inlined function

* rename apply_to_collection to apply_func

* Separated class description from init args

* Apply suggestions from code review

Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com>

* adjust random values

* suppress output when seeding

* remove gpu from doctest

* Add requested changes and add ellipsis for doctest

* forgot to push these files...

* add explicit check for dtype to convert to

* fix ddp tests

* remove explicit ddp destruction

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* move dtype device mixin to more general place

* refactor to general device dtype mixin

* add initial metric package description

* change default to none for mac os

* pep8

* fix import

* Update index.rst

* Update ci-testing.yml

* Apply suggestions from code review

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Update CHANGELOG.md

* Update pytorch_lightning/metrics/converters.py

* readme

* Update metric.py

* Update pytorch_lightning/metrics/converters.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: William Falcon <waf2107@columbia.edu>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Jirka <jirka@pytorchlightning.ai>
2020-05-19 11:05:07 -04:00
Rohit Gupta ac76dfcf62
Remove NaNs from loss in LRFinder (#1862)
* Remove NaNs from loss in LRFinder

* np.isfinite

* chlog

* add test

* chlog

Co-authored-by: Jirka <jirka@pytorchlightning.ai>
2020-05-19 08:39:19 +02:00
Ashraful Islam e0a5aee3a3
fix progress bar postfix order (#1874) 2020-05-18 20:33:51 -04:00
Ashraful Islam 981169cacc
add warning for shuffling in test/val (#1865) 2020-05-18 09:53:02 -04:00
Lezwon Castelino 7c7e50ca47
Allow user to select individual TPU core to train on (#1729)
* added tpu_id

added tpu_id to mixins

* train on individual tpu

* parallel loader if tpu_id is None

* removed progress_bar_refresh_rate

* chlog

* replaced num_tpu_cores with tpu_cores

* set tpu_id to None if int

* changed num_tpu_cores to tpu_cores in docs

* updated docs

* updated __init__.py
removed self.tpu_id for ParallelLoader

* Update pytorch_lightning/trainer/__init__.py

* check if tpu_cores is a list

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* xla device conditional

* num_tpu_cores deprecation

* removed duplicate warning

* fixed pep8 error

* Revert "removed duplicate warning"

This reverts commit 8adb0a9b

* deprecated api update

* fixed recursion error

* fixed tests

* fixed flake errors

* removed current_tpu_index

* Update CHANGELOG.md

* Update trainer.py

Co-authored-by: Jirka <jirka.borovec@seznam.cz>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: William Falcon <waf2107@columbia.edu>
2020-05-17 16:30:54 -04:00
Fabio Natanael Kepler 8c4c7b105e
Fix `save_weights_only` flag in ModelCheckpoint (#1780)
* Add flag to `dump_checkpoint` for only including weights

`ModelCheckpoint` then passes `self.save_weights_only` to the save function.

* Fix tests and add changelog entry

* Add check and descriptive message when training state is restored from a weights only checkpoint

Also add a test for making sure `ModelCheckpoint.save_weights_only` works as expected.

* Fix weights-only test to properly match expected exception

* Apply suggestions from code review

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-05-17 09:24:17 -04:00
Adrian Wälchli 769a459d27
remove extra kwargs from Trainer init (#1820)
* remove kwargs

* remove useless test

* rename unknown trainer flag

* trainer inheritance and test

* blank line

* test for unknown arg

* changelog
2020-05-17 09:14:54 -04:00
Jirka Borovec 692f302837
continue devel (#1793)
* miss

* miss

* miss

* update

* format
2020-05-17 08:30:45 -04:00
Rohit Gupta 56d521a317
Fix test configuration check and testing (#1804)
* Fix test configuration check and testing

* Fix test configuration check and testing

* Remove check_testing_configuration during test

* Fix docstring

* fix function name

* remove conflicts
2020-05-17 08:22:44 -04:00
Adrian Wälchli 4cdebf9a64
remove obsolete self._device in Trainer (#1849)
* remove unused device attribute

* dtype

* move on_gpu to model
2020-05-17 08:20:51 -04:00
William Falcon b84b02400a
enable any dict and namespace in hparams (#1847) 2020-05-15 15:08:16 -04:00
Jirka Borovec e95e1d71c7
release 0.7.6 (#1813)
* release 0.7.6rc2

* release 0.7.6

* include img

* smaller image

* missing

* miss

* miss

* miss

* up
2020-05-15 08:36:40 -04:00
William Falcon c8c5d33208
Update __init__.py 2020-05-14 18:44:46 -04:00
Justus Schock c05077fae3
Enable non-blocking for gpu device transfer (#1843)
* Update distrib_parts.py

* Update CHANGELOG.md
2020-05-14 17:56:40 -04:00
Jirka Borovec bee0392c37
extend arg parser (#1842)
* extend arg parser

* flake8

* tests

* example

* fix test
2020-05-14 17:56:11 -04:00
Peter Yu a6f6edd07d
Update args, kwargs doc for load_from_checkpoint() (#1839) 2020-05-14 15:43:47 -04:00
Nicki Skafte 88f816ed06
dummy logger (#1836)
Co-authored-by: Nicki Skafte <nugginea@gmail.com>
2020-05-14 10:34:11 -04:00
William Falcon 1265b2fe02
Update __init__.py 2020-05-13 19:51:41 -04:00
William Falcon 53d9316a56
fixes ddp bugs (#1819)
* debug
2020-05-13 19:17:04 -04:00
William Falcon 648d516668
Use store_true for bool args (#1822)
* Use store_true for bool args

* debug

Co-authored-by: Nate Raw <nxr9266@g.rit.edu>
2020-05-13 19:12:06 -04:00
Peter Yu e961f7e344
args should come after the last positional argument (#1807) 2020-05-13 17:29:54 -04:00
Ashwin Bharambe 0e71705a0a
[checkpoint logic] Fix bug which doesn't account for NoneType for `model.hparams` (#1817)
The intention of the code is to output a warning message when `hparams`
is null or not set. Instead, the code currently fails fatally when
`model.hparams = None`. This change prevents that.
2020-05-13 17:14:11 -04:00
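The behaviour described in #1817 (warn instead of crashing when `hparams` is missing or `None`) might look roughly like the sketch below. `collect_hparams` and the warning text are assumptions made for illustration, not the library's checkpoint code.

```python
import warnings
from types import SimpleNamespace


def collect_hparams(model):
    """Hypothetical helper: return hparams as a dict, or None with a warning."""
    hparams = getattr(model, "hparams", None)
    if hparams is None:
        # Covers both "attribute not set" and "attribute explicitly None".
        warnings.warn("model has no hparams; none will be saved")
        return None
    if isinstance(hparams, dict):
        return dict(hparams)
    return dict(vars(hparams))  # Namespace-like objects
```

Testing `hparams is None` after `getattr` handles the unset and the explicit-`None` cases in one branch.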
William Falcon 12138ced7c
Update __init__.py 2020-05-13 14:42:50 -04:00
Nicki Skafte 663b90035c
Bugfix: accumulation and suggestion for learning rate finder (#1801)
* fix suggestion being too naive

* fix accumulation error and added new tests

* fix styling

* update CHANGELOG.md

* update based on review

* fix tests

* Apply suggestions from code review

* Apply suggestions from code review

* Apply suggestions from code review

* Apply suggestions from code review

Co-authored-by: Nicki Skafte <nugginea@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-05-13 14:40:44 -04:00
Ashwin Bharambe aefc5314bc
[ddp] Support multi-node distributed execution under torchelastic (#1811)
The changes are quite local and limited in nature -- viz., checking for
some indicator environment variables. We check for (SLURM_LOCALID,
NODE_RANK, GROUP_RANK) in order. If multiple are found set, a warning is
logged.

This patch also fixes a minor bug with comparing the `WORLD_SIZE`
environment variable. This can be a string type.
2020-05-13 14:06:59 -04:00
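The lookup order described in #1811 can be sketched as follows. The function names, the default rank of 0, and the warning text are assumptions; only the variable names and their order come from the commit message.

```python
import os
import warnings

# Order taken from the commit message above.
RANK_VARS = ("SLURM_LOCALID", "NODE_RANK", "GROUP_RANK")


def determine_node_rank(environ=None):
    """Hypothetical helper: pick the first indicator variable that is set."""
    environ = os.environ if environ is None else environ
    found = [(name, environ[name]) for name in RANK_VARS if name in environ]
    if not found:
        return 0  # assumption: fall back to rank 0 when nothing is set
    if len(found) > 1:
        names = ", ".join(name for name, _ in found)
        warnings.warn(f"multiple rank variables set ({names}); using {found[0][0]}")
    return int(found[0][1])


def world_size_matches(expected, environ=None):
    """Compare WORLD_SIZE as an integer, since environment values are strings."""
    environ = os.environ if environ is None else environ
    raw = environ.get("WORLD_SIZE")
    return raw is not None and int(raw) == expected
```

Passing a plain dict as `environ` keeps the sketch testable without touching the real process environment.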
So Uchida 22d7d03118
Replace meta_tags.csv with hparams.yaml (#1271)
* Add support for hierarchical dict

* Support nested Namespace

* Add docstring

* Migrate hparam flattening to each logger

* Modify URLs in CHANGELOG

* typo

* Simplify the conditional branch about Namespace

Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com>

* Update CHANGELOG.md

Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com>

* added examples section to docstring

* renamed _dict -> input_dict

* meta_tags.csv -> hparams.yaml

* code style fixes

* add pyyaml

* remove unused import

* create the member NAME_HPARAMS_FILE

* improve tests

* Update tensorboard.py

* pass the local tests w/o those related to Horovod

* formatting

* update dependencies

* fix dependencies

* Apply suggestions from code review

* add savings

* warn

* docstrings

* tests

* Apply suggestions from code review

* saving

* Apply suggestions from code review

* use default

* remove logging

* typo fixes

* update docs

* update CHANGELOG

* clean imports

* add blank lines

* Update pytorch_lightning/core/lightning.py

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Update pytorch_lightning/core/lightning.py

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* back to namespace

* add docs

* test fix

* update dependencies

* add space

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2020-05-13 15:05:15 +02:00
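One way to flatten hierarchical dicts and nested `Namespace` objects, as mentioned in #1271, is sketched below. `flatten_params` and the `/` separator are hypothetical choices for illustration and may differ from what the loggers actually do.

```python
from argparse import Namespace


def flatten_params(params, parent_key="", sep="/"):
    """Hypothetical helper: flatten nested dicts/Namespaces into one flat dict."""
    if isinstance(params, Namespace):
        params = vars(params)
    flat = {}
    for key, value in params.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, (dict, Namespace)):
            # Recurse into nested containers, prefixing keys with the parent path.
            flat.update(flatten_params(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat
```

For example, `{"a": {"b": 1}}` becomes `{"a/b": 1}` under these assumptions.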
William Falcon 35fe2efe27
added override for hparams in load_from_ckpt (#1797)
* added override for hparams in load_from_ckpt

* override hparams

* override hparams

* Apply suggestions from code review

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* update doctest

* typo

* chlog

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
2020-05-13 10:27:22 +02:00
Jirka Borovec 10ce1c0256
device property (#1791)
* device property

* add/copy properties

* inherit

* rename

* Apply suggestions from code review

Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>

* dtype

* prop

* pt api

Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
2020-05-12 23:18:39 -04:00
Adrian Wälchli 8978794730
add missing flag (#1805) 2020-05-12 17:06:38 -04:00
Oliver Neumann 9059d21042
Missing profiler attribute in add_argparse_args() ArgumentParser (#1794)
* Fixed typing annotation by adding boolean type. After that Profiler flag will be added to argparse.

* Updated CHANGELOG.md

* Updated get_init_arguments_and_types() to pass doctests.

* Added doctest example to add_argparse_parser()
2020-05-12 08:53:26 -04:00
kumuji 619f984c36
Option to provide seed to random generators to ensure reproducibility (#1572)
* Option to provide seed to random generators to ensure reproducibility

Added a small function in utilities that imports torch, numpy, and Python's
random and sets the seed for all of these libraries to ensure reproducible
results.

* Apply recommendations from core contributors on seeding

1. Moved the seeding code to another file
2. Make deterministic as a parameter for trainer class
3. Add assertions for seeding numpy
4. Added warnings
5. torch.manual_seed should be enough for seeding torch

* Revert "Apply recommendations from core contributors on seeding"

This reverts commit a213c8e6882eec8a9e7408b9418926d2db7c5461.

* Revert "Revert "Apply recommendations from core contributors on seeding""

This reverts commit 59b2da53c62878de7aab0aa3feb3115e105eea06.

* Change in test, for correct seeding

* Allow seed equal to 0

* Allow seed to be uint32.max

* Added deterministic to benchmarks

* Cuda manual seed as in benchmark seeding

* Seeding should be done before model initialization

* cuda manual_seed is not necessary

* Fixing seed test_cpu_lbfgs

With some seeds, lbfgs does not seem to converge,
so the seed is fixed during testing.

* rebasing issue with old reproducibility.py

* Improved documentation and ability to seed before initializing Train
class

* Change in docs

* Removed seed from trainer, update for documentation

* Typo in the docs

* Added seed_everything to `__all__`

* Fixing old changes

* Model initialization should be earlier than Trainer

* Update pytorch_lightning/trainer/__init__.py

From Example to testcode

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Fixing according to the contributors suggestions

* Moving horovod deterministic to Trainer class

* deterministic flag affects horovod docs update

* Improved static typing

* Added deterministic to test runners of horovod

It is failing on some versions, not very predictable

* static seeds for horovod tests

* Change for reset_seed function in tests

* Seeding horovod using reset_seed from tutils

* Update pytorch_lightning/trainer/__init__.py

* chlog

* Update trainer.py

* change "testcode" to "Example" in trainer init documentation

* Update pytorch_lightning/trainer/seed.py, first line in comment

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
Co-authored-by: William Falcon <waf2107@columbia.edu>
2020-05-12 07:53:20 -04:00
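A rough sketch of the kind of seeding helper described in #1572. `seed_all` is a hypothetical name and is not the library's `seed_everything`; numpy and torch are seeded only if they are installed, and the accepted range (0 through the uint32 maximum) follows the commit notes above.

```python
import random


def seed_all(seed):
    """Hypothetical helper: seed Python, and numpy/torch when available."""
    max_seed = 2**32 - 1  # uint32 maximum, the upper bound numpy accepts
    if not 0 <= seed <= max_seed:
        raise ValueError(f"seed must be in [0, {max_seed}], got {seed}")
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # numpy not installed; nothing to seed
    try:
        import torch
        torch.manual_seed(seed)
    except ImportError:
        pass  # torch not installed; nothing to seed
    return seed
```

As the commit notes point out, seeding should happen before the model is initialized so that weight initialization is reproducible as well.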
Justus Schock 5f292390fd
Bug fix hparam logging with metrics (#1647)
* add metric logging

* Use pytorch built-in method

* Update tensorboard.py

* Update tensorboard.py
2020-05-12 07:25:12 -04:00
William Falcon 10b16dbfab
made ddp the default if no backend specified with multiple GPUs (#1789)
* made ddp the default if no backend specified with multiple GPUs

* fix

* spawn

Co-authored-by: Jirka <jirka.borovec@seznam.cz>
2020-05-12 06:54:23 -04:00
Travis Addair acab068c74
Join Horovod workers at the end of trainer.fit() to prevent race conditions following training (#1786)
* Join Horovod workers at the end of trainer.fit() to prevent race conditions following training

* flake8

* flake8

Co-authored-by: Jirka <jirka.borovec@seznam.cz>
2020-05-12 09:15:25 +00:00
William Falcon 7b60d49432
fixed native amp + ddp (#1788)
* fixed native amp + ddp

* fixed native amp + ddp
2020-05-12 00:25:06 -04:00
Jeremy Jordan 1df0d2dc97
set logger level for package (#1718)
* move logging config to trainer class init

* alternate logging config
2020-05-12 00:14:35 -04:00
William Falcon 4b30ef6480
Device (#1790)
* added self.device

* added docs
2020-05-12 00:09:48 -04:00