The speedup is achieved by:
- Moving the `where` out of the loop (and replacing it with a `min` for simplicity).
- Replacing the manual sum and `pow` with `torch.norm`. Even though this does some unnecessary work (`torch.norm` also computes the root), it is still a lot faster.
- Preallocating the output, which gives a slight speedup.

Note that calling `.to` for all parameters incurs a small speed penalty (~4 ms in my case) but allows parameters to live on different devices.
Overall this reduces the time spent on gradient clipping from 206 ms to 74 ms for my model (ResNet-50 plus a few additional variables, all on GPU). A sketch of the resulting routine follows.
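For context, here is a minimal sketch of a clipping routine along these lines. The function name and the exact accumulation details are illustrative, not the verbatim change:

```python
import torch

def clip_grad_norm(parameters, max_norm: float, norm_type: float = 2.0) -> torch.Tensor:
    # Only parameters that actually received gradients participate.
    parameters = [p for p in parameters if p.grad is not None]
    if not parameters:
        return torch.tensor(0.0)
    device = parameters[0].grad.device

    # Preallocate the per-parameter norms instead of growing a Python list,
    # and let torch.norm replace the manual sum/pow.
    norms = torch.empty(len(parameters), device=device)
    for i, p in enumerate(parameters):
        # The .to(device) costs a few ms but allows parameters on different devices.
        norms[i] = torch.norm(p.grad.detach(), norm_type).to(device)
    total_norm = torch.norm(norms, norm_type)

    # The per-parameter `where` moves out of the loop: a single coefficient,
    # capped at 1.0 (the `min`), is applied to every gradient.
    clip_coef = torch.clamp(max_norm / (total_norm + 1e-6), max=1.0)
    for p in parameters:
        p.grad.detach().mul_(clip_coef.to(p.grad.device))
    return total_norm
```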
* refactor into gpu accelerator
* add state_dict for early stopping
* move best attr after monitor_op defined
* improve early stopping and model checkpoint callbacks
* fix formatting
* fix attr init order
* clean up setting of default_root_dir attr
* logger needs default root dir set first
* reorg trainer init
* remove direct references to checkpoint callback
* more fixes
* more bugfixes
* run callbacks at epoch end
* update tests to use on epoch end
* PR cleanup
* address failing tests
* refactor for homogeneity
* fix merge conflict
* separate tests
* tests for early stopping bug regressions
* small fixes
* revert model checkpoint change
* typo fix
* fix tests
* update train loop
* cannot pass an int as default_save_path
* refactor log message
* fix test case
* appease the linter
* fix some doctests
* move config to callback
* fixes from rebase
* chlog
* docs
* reformat
* formatting
* fix
* fixes from rebase
* add new test for patience
* Update pytorch_lightning/callbacks/model_checkpoint.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Update pytorch_lightning/callbacks/model_checkpoint.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Update tests/callbacks/test_early_stopping.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* fix formatting
* remove enable_early_stop attribute
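For reference, the early-stopping persistence that the commits above point at ("add state_dict for early stopping", "move best attr after monitor_op defined") could look roughly like this; the attribute names are assumptions, not the exact Lightning code:

```python
import torch

class EarlyStopping:
    def __init__(self, monitor: str = "val_loss", patience: int = 3, mode: str = "min"):
        self.monitor = monitor
        self.patience = patience
        self.wait = 0
        self.monitor_op = torch.lt if mode == "min" else torch.gt
        # `best` is derived from monitor_op, so it must be assigned *after*
        # monitor_op is defined (the attr init order fix above).
        self.best = torch.tensor(float("inf") if self.monitor_op is torch.lt else float("-inf"))

    def state_dict(self) -> dict:
        # Persist everything needed to resume early stopping from a checkpoint.
        return {"wait": self.wait, "best": self.best, "patience": self.patience}

    def load_state_dict(self, state: dict) -> None:
        self.wait = state["wait"]
        self.best = state["best"]
        self.patience = state["patience"]

    def on_epoch_end(self, current: torch.Tensor) -> bool:
        # Returns True when training should stop.
        if self.monitor_op(current, self.best):
            self.best = current
            self.wait = 0
        else:
            self.wait += 1
        return self.wait >= self.patience
```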
* fix test with new epoch indexing
* fix progress bar totals
* fix off-by-one error (see #2289): epoch starts at 0 now
* added missing imports
* fix hpc_save folderpath
* fix formatting
* fix tests
* small fixes from a rebase
* fix
* tmpdir
* wandb
* fix merge conflict
* add back evaluation after training
* test_resume_early_stopping_from_checkpoint TODO
* undo the horovod check
* update changelog
* remove a duplicate test from merge error
* try fix dp_resume test
* add the logger fix from master
* try remove default_root_dir
* try mocking numpy
* try import numpy in docs test
* fix wandb test
* pep 8 fix
* skip if no amp
* don't mock when doctesting
* install extra
* fix the resume ES test
* undo conf.py changes
* revert remove comet pickle from test
* Update CHANGELOG.md
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Update weights_loading.rst
* renamed flag
* revert the None check in logger experiment name/version
* add the old comments
* _experiment
* test checkpointing on DDP
* skip the ddp test on windows
* cloudpickle
* renamed flag
* parentheses for clarity
* apply suggestion max epochs
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Jeremy Jordan <jtjordan@ncsu.edu>
Co-authored-by: Jirka <jirka@pytorchlightning.ai>
Co-authored-by: Jeremy Jordan <13970565+jeremyjordan@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: William Falcon <waf2107@columbia.edu>
* drop train_percent_check
* chlog
* deprecated
* tests
* Apply suggestions from code review
* tests
* hydra support
* tests
* hydra support
* tests
* typo
* Update test_dataloaders.py
* docs
Co-authored-by: Jirka <jirka@pytorchlightning.ai>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
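The "drop train_percent_check" commits retire that Trainer flag; a generic sketch of the deprecation-shim pattern, assuming `limit_train_batches` as the replacement name:

```python
import warnings

def _resolve_train_limit(limit_train_batches=1.0, train_percent_check=None):
    # Keep the old flag working for one deprecation cycle while steering
    # users toward the replacement.
    if train_percent_check is not None:
        warnings.warn(
            "`train_percent_check` is deprecated; use `limit_train_batches` instead.",
            DeprecationWarning,
        )
        limit_train_batches = train_percent_check
    return limit_train_batches
```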
* Fix pyright member access errors in training module
* Fix Trainer instantiation error due to inheritance order
* Add GH workflow for pyright
* Fix more pyright errors in trainer module
* Add pyrightconfig and setup python environment in type-check workflow
* Exclude pyrightconfig.json
* suggestions
Co-authored-by: Jirka <jirka@pytorchlightning.ai>
* check for nan values
* test nan detection on loss
* sys.exit
* whitespace
* detect nan and inf values in loss and params
* update
* added documentation
* moved detect nan to training loop, remove flag for print
* blank line
* test
* rename
* deprecate print_nan_grads
* deprecated print_nan_grads
* remove unused imports
* update changelog
* fix line too long
* correct deprecated version
Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com>
* raise exception instead of sysexit
Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com>
* Update pytorch_lightning/trainer/training_tricks.py
Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com>
* Update pytorch_lightning/trainer/training_tricks.py
Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com>
* fix test
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
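A minimal sketch of the NaN/Inf detection these commits describe: check both the loss and the parameters, and raise an exception instead of calling sys.exit. The function name and messages are assumptions:

```python
import torch

def detect_nan_tensors(loss: torch.Tensor, model: torch.nn.Module) -> None:
    # A single non-finite loss poisons the whole backward pass.
    if not torch.isfinite(loss).all():
        raise ValueError("The loss returned in `training_step` is nan or inf.")
    # Check parameters too, so the offending weight can be named.
    for name, param in model.named_parameters():
        if not torch.isfinite(param).all():
            raise ValueError(f"Detected nan and/or inf values in `{name}`.")
```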
* Add callback system + associated test
* Add trainer and pl_module args to callback methods
* typing
* typo in docstring
* Switch to on_.*_start()
* fix on_test_start
* fix the mess after rebasing
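Illustratively, the callback interface these commits converge on (on_*_start hooks that receive both the trainer and the LightningModule) might be sketched as follows; the exact hook set here is an assumption:

```python
class Callback:
    # Every hook receives the trainer and the module being trained, so a
    # callback can inspect or mutate either one.
    def on_fit_start(self, trainer, pl_module):
        pass

    def on_train_start(self, trainer, pl_module):
        pass

    def on_test_start(self, trainer, pl_module):
        pass

    def on_epoch_end(self, trainer, pl_module):
        pass


class PrintingCallback(Callback):
    def on_train_start(self, trainer, pl_module):
        print(f"Training {type(pl_module).__name__} is starting")
```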