lightning

Commit Graph

Author	SHA1	Message	Date
Adrian Wälchli	25ee51bc57	Continue Jeremy's early stopping PR #1504 (#2391 ) * add state_dict for early stopping * move best attr after monitor_op defined * improve early stopping and model checkpoint callbacks * fix formatting * fix attr init order * clean up setting of default_root_dir attr * logger needs default root dir set first * reorg trainer init * remove direct references to checkpoint callback * more fixes * more bugfixes * run callbacks at epoch end * update tests to use on epoch end * PR cleanup * address failing tests * refactor for homogeneity * fix merge conflict * separate tests * tests for early stopping bug regressions * small fixes * revert model checkpoint change * typo fix * fix tests * update train loop * cannot pass an int as default_save_path * refactor log message * fix test case * appease the linter * fix some doctests * move config to callback * fixes from rebase * fixes from rebase * chlog * docs * reformat * formatting * fix * fix * fixes from rebase * add new test for patience * Update pytorch_lightning/callbacks/model_checkpoint.py Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> * Update pytorch_lightning/callbacks/model_checkpoint.py Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> * Update tests/callbacks/test_early_stopping.py Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> * fix formatting * remove enable_early_stop attribute * add state_dict for early stopping * move best attr after monitor_op defined * improve early stopping and model checkpoint callbacks * fix formatting * fix attr init order * clean up setting of default_root_dir attr * logger needs default root dir set first * reorg trainer init * remove direct references to checkpoint callback * more fixes * more bugfixes * run callbacks at epoch end * update tests to use on epoch end * PR cleanup * address failing tests * refactor for homogeneity * fix merge conflict * separate tests * tests for early stopping bug regressions * 
small fixes * revert model checkpoint change * typo fix * fix tests * update train loop * fix test case * appease the linter * fix some doctests * move config to callback * fixes from rebase * fixes from rebase * chlog * docs * reformat * formatting * fix * fix * fixes from rebase * add new test for patience * Update pytorch_lightning/callbacks/model_checkpoint.py Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> * Update pytorch_lightning/callbacks/model_checkpoint.py Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> * Update tests/callbacks/test_early_stopping.py Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> * fix formatting * remove enable_early_stop attribute * fix test with new epoch indexing * fix progress bar totals * fix off by one error (see #2289) epoch starts at 0 now * added missing imports * fix hpc_save folderpath * fix formatting * fix tests * small fixes from a rebase * fix * tmpdir * tmpdir * tmpdir * wandb * fix merge conflict * add back evaluation after training * test_resume_early_stopping_from_checkpoint TODO * undo the horovod check * update changelog * remove a duplicate test from merge error * try fix dp_resume test * add the logger fix from master * try remove default_root_dir * try mocking numpy * try import numpy in docs test * fix wandb test * pep 8 fix * skip if no amp * dont mock when doctesting * install extra * fix the resume ES test * undo conf.py changes * revert remove comet pickle from test * Update CHANGELOG.md Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> * Update weights_loading.rst * Update weights_loading.rst * Update weights_loading.rst * renamed flag * renamed flag * revert the None check in logger experiment name/version * add the old comments * _experiment * test chckpointing on DDP * skip the ddp test on windows * cloudpickle * renamed flag * renamed flag * parentheses for clarity * apply suggestion max epochs Co-authored-by: Jirka Borovec 
<Borda@users.noreply.github.com> Co-authored-by: Jeremy Jordan <jtjordan@ncsu.edu> Co-authored-by: Jirka <jirka@pytorchlightning.ai> Co-authored-by: Jeremy Jordan <13970565+jeremyjordan@users.noreply.github.com> Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> Co-authored-by: William Falcon <waf2107@columbia.edu>	2020-06-28 21:36:46 -04:00
Jirka Borovec	1e16681693	fix loading with hparams (#2403 ) * fix #2386 * extra test * extra case * extra test * chlog * fix test	2020-06-28 20:22:03 -04:00
Adrian Wälchli	058c500300	fix when torchtext not installed (#2402 )	2020-06-28 20:03:51 -04:00
Jirka Borovec	861a73be12	fix loading past checpoints (#2405 ) * fix #2334 * chlog	2020-06-28 17:20:33 -04:00
William Falcon	66ffbaddf5	updates teardown to account for ddp (#2389 ) * remove warnings * remove warnings * added doc lines * added doc lines	2020-06-28 07:01:04 -04:00
Adrian Wälchli	d910cc5200	docs: dont mock imports when running sphinx doctest (#2396 ) * skip if no amp * dont mock when doctesting * install extra	2020-06-27 23:31:06 -04:00
Jirka Borovec	75f0a2062c	move torchtext as optional (#2395 ) * torchtext * Update pytorch_lightning/utilities/apply_func.py Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * Update apply_func.py Co-authored-by: William Falcon <waf2107@columbia.edu> Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>	2020-06-27 20:15:10 -04:00
Jirka Borovec	51711c265a	fix loading model with kwargs (#2387 ) * test * fix * fix	2020-06-27 16:38:03 -04:00
Mateusz Pieniak	e82d9cdb66	Support torchtext on a single GPU (#2379 ) * Handle torchtext.data.Batch on GPU * Update CHANGELOG.md * Apply code review requests * Correct the docs * Change requirements	2020-06-27 16:36:45 -04:00
Jirka Borovec	73a78a13c7	CI: partial move from CircleCI (#2378 ) * move from CircleCI * req * tex * tex * sudo * extra * recom * pic * dvipng	2020-06-27 16:25:33 -04:00
William Falcon	90f641af0d	fixes logger crash on ddp (#2388 ) * remove warnings * remove warnings * remove warnings * remove warnings * remove warnings * remove warnings * remove warnings * remove warnings * remove warnings * remove warnings	2020-06-27 15:08:22 -04:00
Jirka Borovec	41f5df18a4	move Trains logger to Bolts (#2384 ) * move Trains logger * chlog	2020-06-27 09:14:05 -04:00
Jirka Borovec	4e13e419ea	add CLI test for examples (#2285 ) * cli examples * ddp * CI * CI * req * tests * skip DDP Co-authored-by: William Falcon <waf2107@columbia.edu>	2020-06-27 09:13:29 -04:00
Jirka Borovec	6673fc9a0b	fix docker builds (#2383 )	2020-06-27 08:49:19 -04:00
Jirka Borovec	2f739f5977	fix key typo (#2374 )	2020-06-26 21:46:08 -04:00
Kshitij09	20d0f53896	Fix ModelCheckpoint example (#2321 ) `save_top_k` should be an `int` and have been mentioned as `save_top_k=True` in the snippet provided under 'Saving and Loading Weights' docs. Changed it to its default value (1) to make it consistent. Signed-off-by: Kshitij Patil <kshitijpatil98@gmail.com>	2020-06-26 21:45:41 -04:00
Jirka Borovec	0be78d13aa	native amp (#2373 ) * native amp * typo * imports * apex	2020-06-26 21:45:13 -04:00
Jirka Borovec	f1c96930b1	repair CI for Win (#2358 ) * no cov * no cov * ReduceOp * group * reduce_op.sum * Update sklearns.py * formatting * horovod * Apply suggestions from code review * horovod * horovod * horovod * horovod * ci * print * ci * timeout * timeout * time * fix * distributed cpu * pipes * time * cpu * spawn * spawn * spawn * tp * separate * os * os * npm * Fix load_from_checkpoint() not working with URL on Windows * Update CHANGELOG * Update CHANGELOG.md Co-authored-by: Peter Yu <2057325+yukw777@users.noreply.github.com> * Apply suggestions from code review * fix * fix meta tags creating empty lines * pyright * node * fix httpserver address * drop tutils.default_trainer_options * imports * Better fix for load_from_checkpoint() not working with absolute path on Windows (#2294) * Fix load_from_checkpoint() not working with URL on Windows * Update CHANGELOG * Update CHANGELOG.md Co-authored-by: Peter Yu <2057325+yukw777@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> Co-authored-by: Peter Yu <2057325+yukw777@users.noreply.github.com> * drop duplicate Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com> Co-authored-by: airium <airium@outlook.com> Co-authored-by: Peter Yu <2057325+yukw777@users.noreply.github.com> Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> Co-authored-by: AIRIUM <38249940+airium@users.noreply.github.com>	2020-06-26 21:38:25 -04:00
Jirka Borovec	a5f45787ea	fix get dataloader size (#2375 ) * get dataloader size * pyright	2020-06-26 15:38:48 -04:00
Thomas Schaaf	7c0a3f4745	Bugfix/_has_len (#2307 ) * deal with NotImplementedError raised by torchtext * deal with NotImplementedError raised by torchtext * Added tests for dataloader which raise NotImplementedError in __len__() * Fixed some typos * enabled tests for dataloader raising NotImplementedError in __len__ and corrected match string for raised exception * deleted empty line for style compliance * refactored CustomNotImplementedErrorDataloader to derive from CustomInfDataloader * enabled reduced number of not_implemented_error dataloader test to reduce runtime for continuous integration * reduced test number of not_implemented_error dataloader test further to reduce test time * reduced test number of not_implemented_error dataloader test to one to reduce test time * disabled all not_implemented_error dataloader test to see if test pass in time * added __next__ with a reduced number (5) of elements after which CustomNotImplementedErrorDataloader stops to speedup test. * enabling all not_implemented_error dataloader test * added brief description of change and relation of torchtext * CustomNotImplementedErrorDataloader reduced number of batches served to 2. * Update CHANGELOG.md Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * Apply suggestions from code review * Update CHANGELOG.md Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> * Disable parallelism in dataloader Suspect that it might cause pytest to hang more frequent * added max_steps=None to Trainer in not_implemented_error dataloader tests * rearranged not_implemented_error test in file to group them together * disabled parallel data loading Reason: testing if that stops the test framework from hanging. * Apply suggestions from code review * Apply suggestions from code review Co-authored-by: Thomas Schaaf <tschaaf@cs.cmu.edu> Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>	2020-06-26 09:31:08 -04:00
William Falcon	cbb2427f0d	changed apex level (#2362 )	2020-06-25 18:54:32 -04:00
William Falcon	0a092f6683	making optimization steps for hooks (#2363 ) *simplified optimizer step and zero grad overriding	2020-06-25 16:02:16 -04:00
William Falcon	d22181714a	fix 2333 (#2360 )	2020-06-25 11:10:17 -04:00
William Falcon	f2710bb500	adds tensorboard hparams logging test (#2342 ) * fixes hparam logging * fixes hparam logging * fixes hparam logging * fixes hparam logging * fixes hparam logging * Apply suggestions from code review * skipif * rename * Update test_tensorboard.py * Update test_tensorboard.py Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> Co-authored-by: Jirka <jirka@pytorchlightning.ai>	2020-06-25 09:22:28 -04:00
William Falcon	c275e1fc91	swaps lr sched order (#2356 ) * swaps lr sched order * Update optimizers.py * added amdim encoder choice	2020-06-25 09:21:41 -04:00
davinnovation	b6ab7ca121	[docs] add community example : pl + ms nni (#2340 ) https://github.com/PyTorchLightning/pytorch-lightning/issues/2329	2020-06-24 23:13:49 -04:00
Adrian Wälchli	220bb6db57	remove wrong annotation (#2349 )	2020-06-24 22:29:26 -04:00
Adrian Wälchli	9b2e60530f	Python logging level docs (#2348 ) * docs about Python logging * add link to Python logging docs	2020-06-24 22:29:01 -04:00
David Waterworth	cc07dcae96	corrected example usage of save_hyperparameters from List[str] to seperate str (#2353 ) Co-authored-by: David Waterworth <david.waterworth@cim.io>	2020-06-24 22:28:38 -04:00
Adrian Wälchli	aab9e77d2d	Fix lost compatibility with custom datatypes implementing `.to` (#2335 ) * generalize data transfer * added test * update docs * fix spelling error * changelog * update docs	2020-06-23 23:41:02 -04:00
William Falcon	598f5140c5	refactor training loop (#2336 ) * refactoring training epoch * refactored training epoch * refactored training epoch * refactored training epoch * refactored training epoch * refactored training epoch * fixes slurm weights saving * fixes slurm weights saving	2020-06-23 23:38:22 -04:00
William Falcon	c09b2ffb91	test (#2341 ) * fixes rank zero issue	2020-06-23 21:57:45 -04:00
William Falcon	a915280427	fixes slurm weights saving (#2339 )	2020-06-23 20:16:34 -04:00
Lezwon Castelino	9446390779	fix TPU parsing and TPU tests (#2094 ) * added tpu params test * added tests * removed xla imports * added test cases for TPU * fix pep 8 issues * refactorings and comments * add message to MisconfigurationException Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * test if device is set correctly * added TPU device check removed mark.spawn * removed device selection * remove xla_device call * readded spawn due to test failures * add TODO for tpu check * Apply suggestions from code review * Apply suggestions from code review * flake8 * added tpu args to cli tests * added support for tpu_core selection via cli * fixed flake formatting * replaced default_save_path with default_root_dir * added check for data type for tpu_cores * fixed flake indent * protected * protected * added tpu params test * added tests * removed xla imports * test if device is set correctly * added support for tpu_core selection via cli * replaced default_save_path with default_root_dir * added check for data type for tpu_cores * chlog * fixed tpu cores error * rebased with latest changes * flake fix * Update pytorch_lightning/trainer/distrib_parts.py added suggesstion Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> Co-authored-by: Jirka <jirka@pytorchlightning.ai>	2020-06-23 12:06:57 -04:00
Adrian Wälchli	e085e93dd3	Add missing test for "multiple dataloader + percent_check fix" (#2226 ) * Init fix num_batches * Fix num_batches in case of multiple dataloaders * Apply suggestions from code review * Changes based on suggestions * Flake8 * Add test to check num_batches * generalize dataloader percent check test * fix formatting * remove hparams * tests * CHANGELOG * Update CHANGELOG.md * max_batches can be int * conflict and rebase * add back the test fix fix message 0.0 works Revert "fix message" This reverts commit 839cacf8b8610f4e697e654ef6f3d2501bf23984. * update changelog * Update CHANGELOG.md * Fix num batches in case of multiple dataloaders and percent_check (#1920) * git conflict Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * missing union * doc update suggestion by @rohitgr7 * extend test * changelog * docs add note about multiple loaders * update changelog * remove unused variable Co-authored-by: rohitgr7 <rohitgr1998@gmail.com> Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>	2020-06-23 11:21:24 -04:00
Siavash Sakhavi	44385bb582	Checking if the parameters are a DictConfig Object (#2216 ) * Checking if the parameters are a DictConfig Object This is in reference to #2058 . To be honest, I have no idea how I should go about writing a test for this. * Update pytorch_lightning/loggers/base.py Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> * fix ... Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> Co-authored-by: Jirka <jirka@pytorchlightning.ai>	2020-06-23 17:20:44 +02:00
Adrian Wälchli	bdee1cd106	update docs for "overfit_batches" (#2324 ) * update docs * Apply suggestions from code review Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>	2020-06-23 11:19:38 -04:00
William Falcon	0f073819d3	refactored training_batch + tests to verify correctness (#2328 ) * refactored training_bath * refactored training_bath * refactored training_bath * refactored training_bath * refactored training_bath * refactored training_bath * refactored training_bath * refactored training_bath * refactored training_bath * refactored training_bath * refactored training_bath	2020-06-23 11:17:10 -04:00
Tri Dao	29179dbfcc	Fix ROC metric for CUDA tensors (#2304 ) * Fix ROC metric for CUDA tensors Previously roc metric (and auroc) errors when passed in CUDA tensors, due to torch.tensor construction without specifying device. This fixes the error by using F.pad instead. * Update test_classification.py * Update test_classification.py * chlog * Update test_classification.py * Update test_classification.py * Update tests/metrics/functional/test_classification.py Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> * Update test_classification.py Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com> Co-authored-by: Jirka <jirka@pytorchlightning.ai> Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>	2020-06-23 15:19:16 +02:00
elias-ramzi	92f122e0df	Fix average_precision metric (#2319 ) * Fixed average_precision metric, parenthesis were missing. Added test test that failed with the old implementation * Modified CHANGELOG.md * Update CHANGELOG.md Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>	2020-06-23 13:21:00 +02:00
Rezyapkin-Vyacheslav	63bd0582e3	fix typo in forward return (#2301 )	2020-06-21 15:54:17 -04:00
Adrian Wälchli	f972ab3a82	Fix summary hook handles not getting removed (#2298 ) * detach hooks after completion * detach hook * update docs * add test * docs * changelog	2020-06-20 07:38:47 -04:00
Jirka Borovec	c7f8367650	devel version (#2292 )	2020-06-19 23:42:57 -04:00
Jirka Borovec	4b90b79080	check omegaconf gpus (#2273 ) * check omegaconf gpus * test * test * Apply suggestions from code review Co-authored-by: Omry Yadan <omry@fb.com> Co-authored-by: Omry Yadan <omry@fb.com>	2020-06-19 23:42:11 -04:00
Jirka Borovec	7ecb0d2528	test CLI parsing gpus (#2284 ) * cli gpus * test * test	2020-06-19 23:41:42 -04:00
Rohit Gupta	b96dd21d69	Update new project code sample (#2287 )	2020-06-19 23:41:03 -04:00
Jirka Borovec	f278ac42c8	Revert/Fix: epoch indexing from 1, to be from 0 (#2289 ) * Revert "deprecated: epoch indexing from 1 (#2206)" This reverts commit `f94b919b` * chlog * grad index * Apply suggestions from code review * tests * fix * test	2020-06-19 23:39:53 -04:00
thschaaf	554fb4754c	Bugfix/_has_len (#2293 ) * deal with NotImplementedError raised by torchtext * deal with NotImplementedError raised by torchtext * Added tests for dataloader which raise NotImplementedError in __len__() * Fixed some typos Co-authored-by: Thomas Schaaf <tschaaf@cs.cmu.edu>	2020-06-19 23:38:15 -04:00
Paweł Biernat	3256fe4e5a	Update progress.py (#2268 ) Fixes a minor bug introduced in #2213	2020-06-19 15:47:39 -04:00
Jirka Borovec	e0b7fed92e	deprecated Trainer proc_rank (#2269 ) * deprecated * test	2020-06-19 15:46:27 -04:00

2605 Commits