lightning

Commit Graph

Author	SHA1	Message	Date
William Falcon	460ab5485e	Gen ddp support (#1961 ) * updated docs * added mixed * added mixed	2020-05-26 19:02:30 -04:00
Rohit Gupta	d0ec11b9d6	Remove unused param tpu_core_idx (#1948 )	2020-05-25 16:04:53 -04:00
Adrian Wälchli	34237cfcaf	handle unknown args passed to Trainer.from_argparse_args (#1932 ) * filter valid args * error on unknown manual args * added test * changelog * update docs and doctest * simplify * doctest * doctest * doctest * better test with mock check for init call * fstring * extend test * skip test on 3.6 not working Co-authored-by: William Falcon <waf2107@columbia.edu>	2020-05-25 16:01:29 -04:00
William Falcon	f46a7bae77	updated docs (#1941 )	2020-05-25 15:59:32 -04:00
Federico Baldassarre	65b4352930	early stopping checks on_validation_end (#1458 ) * Fixes PyTorchLightning/pytorch-lightning#490 `EarlyStopping` should check the metric of interest `on_validation_end` rather than `on_epoch_end`. In a normal scenario, this does not cause a problem, but in combination with `check_val_every_n_epoch>1` in the `Trainer` it results in a warning or in a `RuntimeError` depending on `strict`. * Highlighted that ES callback runs on val epochs in docstring * Updated EarlyStopping in rst doc * Update early_stopping.py * Update early_stopping.rst * Update early_stopping.rst * Update early_stopping.rst * Update early_stopping.rst * Apply suggestions from code review Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * Update docs/source/early_stopping.rst * fix doctest indentation warning * Train loop calls early_stop.on_validation_end * chlog Co-authored-by: William Falcon <waf2107@columbia.edu> Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> Co-authored-by: Jirka <jirka@pytorchlightning.ai>	2020-05-25 17:33:00 +00:00
Adrian Wälchli	8ca8336ce5	protect progress bar callback (#1855 ) * wip protected progress bar settings * remove callback attr from LRfinder * whitespace * changelog	2020-05-25 07:49:23 -04:00
Lucas Vazquez	112dd5c4f6	Adds the option of saving the last model on checkpoint (#1908 ) * saves model every epoch * implement test for save_last * Update CHANGELOG.md * Update CHANGELOG.md * changes test description Co-authored-by: Jeremy Jordan <13970565+jeremyjordan@users.noreply.github.com> Co-authored-by: Jeremy Jordan <13970565+jeremyjordan@users.noreply.github.com>	2020-05-25 07:47:44 -04:00
Nicki Skafte	a34eb9e169	Fix logger bug and prepare data bug (#1933 ) * tests, fix logger bug and prepare data bug * add CHANGELOG.md Co-authored-by: Nicki Skafte <nugginea@gmail.com>	2020-05-25 07:43:56 -04:00
Justus Schock	6456247287	Re-Enable Import Errors (#1938 ) * update logger imports * pep8 fixes * pep8	2020-05-25 07:31:35 -04:00
William Falcon	caa9c6760b	replace Hparams by init args (#1896 ) * remove the need for hparams * remove the need for hparams * remove the need for hparams * remove the need for hparams * replace self.hparams * replace self.hparams * replace self.hparams * replace self.hparams * replace self.hparams * replace self.hparams * replace self.hparams * replace self.hparams * replace self.hparams * replace self.hparams * replace self.hparams * replace self.hparams * replace self.hparams * replace self.hparams * replace self.hparams * replace self.hparams * replace self.hparams * replace self.hparams * replace self.hparams * replace self.hparams * replace self.hparams * replace self.hparams * replace self.hparams * replace self.hparams * replace self.hparams * replace self.hparams * replace self.hparams * replace self.hparams * replace self.hparams * replace self.hparams * replace self.hparams * replace self.hparams * fixed * fixed * fixed * fixed * fixed * fixed * fixed * fixed * fixed * fixed * fixed * fixed * fixed * fixed * finished moco * basic * testing * todo * recurse * hparams * persist * hparams * chlog * tests * tests * tests * tests * tests * tests * review * saving * tests * tests * tests * docs * finished moco * hparams * review * Apply suggestions from code review Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * hparams * overwrite * transform * transform * transform * transform * cleaning * cleaning * tests * examples * examples * examples * Apply suggestions from code review Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * chp key * tests * Apply suggestions from code review * class * updated docs * updated docs * updated docs * updated docs * save * wip * fix * flake8 Co-authored-by: Jirka <jirka@pytorchlightning.ai> Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>	2020-05-24 18:59:08 -04:00
Nicki Skafte	8f6b7a2b4f	Fix user warning produced by apex + scheduler combination (#1873 ) * fix user error produced by apex + scheduler combination * add changelog * added reinit to every configure_apex call * fix styling Co-authored-by: Nicki Skafte <nugginea@gmail.com>	2020-05-22 07:19:37 -04:00
Maxim Grechkin	98f7842970	Allow dataloaders without sampler field present (#1907 ) * Allow dataloaders without sampler field present Sometimes we have a custom dataloader that doesn't have a sampler, better to check that the field is there before reading it. * chlog Co-authored-by: Jirka <jirka@pytorchlightning.ai>	2020-05-20 20:57:12 +00:00
Kevin Trebing	3459a54667	Changed order of `update_learning_rates()` and `run_training_teardown()`. (#1891 )	2020-05-19 13:16:26 -04:00
Justus Schock	9b629637b8	New metric classes (#1326 ) (#1877 ) * New metric classes (#1326) * Create metrics package * Create metric.py * Create utils.py * Create __init__.py * add tests for metric utils * add docstrings for metrics utils * add function to recursively apply other function to collection * add tests for this function * update test * Update pytorch_lightning/metrics/metric.py Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com> * update metric name * remove example docs * fix tests * add metric tests * fix to tensor conversion * fix apply to collection * Update CHANGELOG.md * Update pytorch_lightning/metrics/metric.py Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com> * remove tests from init * add missing type annotations * rename utils to convertors * Create metrics.rst * Update index.rst * Update index.rst * Update pytorch_lightning/metrics/convertors.py Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com> * Update pytorch_lightning/metrics/convertors.py Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com> * Update pytorch_lightning/metrics/convertors.py Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com> * Update pytorch_lightning/metrics/metric.py Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com> * Update tests/utilities/test_apply_to_collection.py Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com> * Update tests/utilities/test_apply_to_collection.py Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com> * Update tests/metrics/convertors.py Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com> * Apply suggestions from code review Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com> * add doctest example * rename file and fix imports * added parametrized test * replace lambda with inlined function * rename apply_to_collection to apply_func * Separated class description from init args * Apply suggestions from code review Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com> * adjust random values * suppress output when seeding * remove gpu from doctest * Add requested changes and add ellipsis for doctest * forgot to push these files... * add explicit check for dtype to convert to * fix ddp tests * remove explicit ddp destruction Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> * move dtype device mixin to more general place * refactor to general device dtype mixin * add initial metric package description * change default to none for mac os * pep8 * fix import * Update index.rst * Update ci-testing.yml * Apply suggestions from code review Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * Update CHANGELOG.md * Update pytorch_lightning/metrics/converters.py * readme * Update metric.py * Update pytorch_lightning/metrics/converters.py Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> Co-authored-by: William Falcon <waf2107@columbia.edu> Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> Co-authored-by: Jirka <jirka@pytorchlightning.ai>	2020-05-19 11:05:07 -04:00
Rohit Gupta	ac76dfcf62	Remove NaNs from loss in LRFinder (#1862 ) * Remove NaNs from loss in LRFinder * np.isfinite * chlog * add test * chlog Co-authored-by: Jirka <jirka@pytorchlightning.ai>	2020-05-19 08:39:19 +02:00
Ashraful Islam	e0a5aee3a3	fix porgressbar postfix order (#1874 )	2020-05-18 20:33:51 -04:00
Ashraful Islam	981169cacc	add warning for shuffling in test/val (#1865 )	2020-05-18 09:53:02 -04:00
Lezwon Castelino	7c7e50ca47	Allow user to select individual TPU core to train on (#1729 ) * added tpu_id added tpu_id to mixins * train on individual tpu * parallel loader if tpu_id is None * removed progress_bar_refresh_rate * chlog * replaced num_tpu_cores with tpu_cores * set tpu_id to None if int * changed num_tpu_cores to tpu_cores in docs * updated docs * updated __init__.py removed self.tpu_id for ParallelLoader * Update pytorch_lightning/trainer/__init__.py * check if tpu_cores is a list Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> * xla device conditional * num_tpu_cores deprecation * removed duplicate warning * fixed pep8 error * Revert "removed duplicate warning" This reverts commit `8adb0a9b` * deprecated api update * fixed recursion error * fixed tests * fixed flake errors * removed current_tpu_index * Update CHANGELOG.md * Update trainer.py Co-authored-by: Jirka <jirka.borovec@seznam.cz> Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> Co-authored-by: William Falcon <waf2107@columbia.edu>	2020-05-17 16:30:54 -04:00
Fabio Natanael Kepler	8c4c7b105e	Fix `save_weights_only` flag in ModelCheckpoint (#1780 ) * Add flag to `dump_checkpoint` for only including weights `ModelCheckpoint` then passes `self.save_weights_only` to the save function. * Fix tests and add changelog entry * Add check and descriptive message when training state is restored from a weights only checkpoint Also add a test for making sure `ModelCheckpoint.save_weights_only` works as expected. * Fix weights-only test to properly match expected exception * Apply suggestions from code review Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>	2020-05-17 09:24:17 -04:00
Adrian Wälchli	769a459d27	remove extra kwargs from Trainer init (#1820 ) * remove kwargs * remove useless test * rename unknown trainer flag * trainer inheritance and test * blank line * test for unknown arg * changelog	2020-05-17 09:14:54 -04:00
Jirka Borovec	692f302837	continue devel (#1793 ) * miss * miss * miss * update * format	2020-05-17 08:30:45 -04:00
Rohit Gupta	56d521a317	Fix test configuration check and testing (#1804 ) * Fix test configuration check and testing * Fix test configuration check and testing * Remove check_testing_configuration during test * Fix docstring * fix function name * remove conflicts	2020-05-17 08:22:44 -04:00
Adrian Wälchli	4cdebf9a64	remove obsolete self._device in Trainer (#1849 ) * remove unused device attribute * dtype * move on_gpu to model	2020-05-17 08:20:51 -04:00
William Falcon	b84b02400a	enable any dict and namespace in hparams (#1847 )	2020-05-15 15:08:16 -04:00
Jirka Borovec	e95e1d71c7	release 0.7.6 (#1813 ) * release 0.7.6rc2 * release 0.7.6 * include img * smaller image * missing * miss * miss * miss * up	2020-05-15 08:36:40 -04:00
William Falcon	c8c5d33208	Update __init__.py	2020-05-14 18:44:46 -04:00
Justus Schock	c05077fae3	Enable non-blocking for gpu device transfer (#1843 ) * Update distrib_parts.py * Update CHANGELOG.md	2020-05-14 17:56:40 -04:00
Jirka Borovec	bee0392c37	extend arg parser (#1842 ) * extend arg parser * flake8 * tests * example * fix test	2020-05-14 17:56:11 -04:00
Peter Yu	a6f6edd07d	Update args, kwargs doc for load_from_checkpoint() (#1839 )	2020-05-14 15:43:47 -04:00
Nicki Skafte	88f816ed06	dummy logger (#1836 ) Co-authored-by: Nicki Skafte <nugginea@gmail.com>	2020-05-14 10:34:11 -04:00
William Falcon	1265b2fe02	Update __init__.py	2020-05-13 19:51:41 -04:00
William Falcon	53d9316a56	fixes ddp bugs (#1819 ) * debug * debug * debug * debug * debug * debug * debug * debug * debug * debug * debug * debug * debug * debug * debug * debug * debug * debug * debug * debug * debug * debug * debug * debug * debug * debug * debug * debug * debug * debug * debug * debug * debug * debug * debug * debug * debug	2020-05-13 19:17:04 -04:00
William Falcon	648d516668	✨ Use store_true for bool args (#1822 ) * ✨ Use store_true for bool args * debug Co-authored-by: Nate Raw <nxr9266@g.rit.edu>	2020-05-13 19:12:06 -04:00
Peter Yu	e961f7e344	args should come after the last positional argument (#1807 )	2020-05-13 17:29:54 -04:00
Ashwin Bharambe	0e71705a0a	[checkpoint logic] Fix bug which doesn't account for NoneType for `model.hparams` (#1817 ) The intention of the code is to output a warning message when `hparams` is null or not set. Instead the code now fatals when `model.hparams = None`. Prevent that.	2020-05-13 17:14:11 -04:00
William Falcon	12138ced7c	Update __init__.py	2020-05-13 14:42:50 -04:00
Nicki Skafte	663b90035c	Bugfix: accumulation and suggestion for learning rate finder (#1801 ) * fix suggestion being too naive * fix accumulation error and added new tests * fix styling * update CHANGELOG.md * update based on review * fix tests * Apply suggestions from code review * Apply suggestions from code review * Apply suggestions from code review * Apply suggestions from code review Co-authored-by: Nicki Skafte <nugginea@gmail.com> Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>	2020-05-13 14:40:44 -04:00
Ashwin Bharambe	aefc5314bc	[ddp] Support multi-node distributed execution under torchelastic (#1811 ) The changes are quite local and limited in nature -- viz., checking for some indicator environment variables. We check for (SLURM_LOCALID, NODE_RANK, GROUP_RANK) in order. If multiple are found set, a warning is logged. This patch also fixes a minor bug with comparing the `WORLD_SIZE` environment variable. This can be a string type.	2020-05-13 14:06:59 -04:00
So Uchida	22d7d03118	Replace meta_tags.csv with hparams.yaml (#1271 ) * Add support for hierarchical dict * Support nested Namespace * Add docstring * Migrate hparam flattening to each logger * Modify URLs in CHANGELOG * typo * Simplify the conditional branch about Namespace Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com> * Update CHANGELOG.md Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com> * added examples section to docstring * renamed _dict -> input_dict * mata_tags.csv -> hparams.yaml * code style fixes * add pyyaml * remove unused import * create the member NAME_HPARAMS_FILE * improve tests * Update tensorboard.py * pass the local test w/o relavents of Horovod * formatting * update dependencies * fix dependencies * Apply suggestions from code review * add savings * warn * docstrings * tests * Apply suggestions from code review * saving * Apply suggestions from code review * use default * remove logging * typo fixes * update docs * update CHANGELOG * clean imports * add blank lines * Update pytorch_lightning/core/lightning.py Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * Update pytorch_lightning/core/lightning.py Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * back to namespace * add docs * test fix * update dependencies * add space Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>	2020-05-13 15:05:15 +02:00
William Falcon	35fe2efe27	added override for hparams in load_from_ckpt (#1797 ) * added override for hparams in load_from_ckpt * override hparams * override hparams * Apply suggestions from code review Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * update doctest * typo * chlog Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> Co-authored-by: Jirka <jirka.borovec@seznam.cz>	2020-05-13 10:27:22 +02:00
Jirka Borovec	10ce1c0256	device property (#1791 ) * device property * add/copy properties * inherit * rename * Apply suggestions from code review Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com> * dtype * prop * pt api Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>	2020-05-12 23:18:39 -04:00
Adrian Wälchli	8978794730	add missing flag (#1805 )	2020-05-12 17:06:38 -04:00
Oliver Neumann	9059d21042	Missing profiler attribute in add_argparse_args() ArgumentParser (#1794 ) * Fixed typing annotation by adding boolean type. After that Profiler flag will be added to argparse. * Updated CHANGELOG.md * Updated git_init_arguments_and_types() to pass doctests. * Added doctest example to add_argparse_parser()	2020-05-12 08:53:26 -04:00
kumuji	619f984c36	Option to provide seed to random generators to ensure reproducibility (#1572 ) * Option to provide seed to random generators to ensure reproducibility I added small function in utilities which imports torch, numpy, python random and sets seed for all of the libraries to ensure reproducibility of results. * Apply recommendations from core contributors on seeding 1. Moved the seeding code to another file 2. Make deterministic as a parameter for trainer class 3. Add assertions for seeding numpy 4. Added warnings 5. torch.manual_seed should be enough for seeding torch * Revert "Apply recommendations from core contributors on seeding" This reverts commit a213c8e6882eec8a9e7408b9418926d2db7c5461. * Revert "Revert "Apply recommendations from core contributors on seeding"" This reverts commit 59b2da53c62878de7aab0aa3feb3115e105eea06. * Change in test, for correct seeding * Allow seed equal to 0 * Allow seed to be uint32.max * Added deterministic to benchmarks * Cuda manual seed as in benchmark seeding * Seeding should be done before model initialization * cuda manual_seed is not necessary * Fixing seed test_cpu_lbfgs On some seeds seems like lbfgs doesn't converge. So I fixed the seed during testing. * rebasing issue with old reproducibility.py * Improved documentation and ability to seed before initializing Train class * Change in docs * Removed seed from trainer, update for documentation * Typo in the docs * Added seed_everything to _all_ * Fixing old changes * Model initialization should be earlier then Trainer * Update pytorch_lightning/trainer/__init__.py From Example to testcode Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> * Fixing according to the contributors suggestions * Moving horovod deterministic to Trainer class * deterministic flag affects horovod docs update * Improved static typing * Added deterministic to test runners of horovod It is failing on some versions, not very predictable * static seeds for horovod tests * Change for reset_seed function in tests * Seeding horovod using reset_seed from tutils * Update pytorch_lightning/trainer/__init__.py * chlog * Update trainer.py * change "testcode" to "Example" in trainer init documentation * Update pytorch_lightning/trainer/seed.py, first line in comment Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> Co-authored-by: Jirka <jirka.borovec@seznam.cz> Co-authored-by: William Falcon <waf2107@columbia.edu>	2020-05-12 07:53:20 -04:00
Justus Schock	5f292390fd	Bug fix hparam logging with metrics (#1647 ) * add metric logging * Use pytorch built-in method * Update tensorboard.py * Update tensorboard.py	2020-05-12 07:25:12 -04:00
William Falcon	10b16dbfab	made ddp the default if no backend specified with multiple GPUs (#1789 ) * made ddp the default if no backend specified with multiple GPUs * fix * spawn Co-authored-by: Jirka <jirka.borovec@seznam.cz>	2020-05-12 06:54:23 -04:00
Travis Addair	acab068c74	Join Horovod workers at the end of trainer.fit() to prevent race conditions following training (#1786 ) * Join Horovod workers at the end of trainer.fit() to prevent race conditions following training * flake8 * flake8 Co-authored-by: Jirka <jirka.borovec@seznam.cz>	2020-05-12 09:15:25 +00:00
William Falcon	7b60d49432	fixed native amp + ddp (#1788 ) * fixed native amp + ddp * fixed native amp + ddp	2020-05-12 00:25:06 -04:00
Jeremy Jordan	1df0d2dc97	set logger level for package (#1718 ) * move logging config to trainer class init * alternate logging config	2020-05-12 00:14:35 -04:00
William Falcon	4b30ef6480	Device (#1790 ) * added self.device * added docs	2020-05-12 00:09:48 -04:00


1158 Commits