Commit Graph

753 Commits

William Falcon a38d108a68
add dist lib to enable syncing anything across devices (#3762)
* add dist lib to enable syncing anything across devices
2020-10-01 01:21:38 -04:00
William Falcon cf182e80fc
Finish Allow on_save_checkpoint... (#3688)
* Finish #3562

* Apply suggestions from code review

* Apply suggestions from code review

* fix tests

* Finish #3562

* Apply suggestions from code review

* Apply suggestions from code review

* fix tests

* fix structure

* fix structure

* make save_last test pass

* unnecessary global rank check

* fix test

* update test

* update test

* test

* test

* run save on all

* remove assert

* tracking saves

* check if fails

* test

* clean up

* adjust horovod test

* clean up

* remove unnecessary makedirs

* change

* undo

* debug

* debug

* debug

* debug

* mock

* undo debug code

* add extra assertions

* test

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <adrian.waelchli@inf.unibe.ch>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2020-09-30 16:15:29 -04:00
Adrian Wälchli c73032e39d
Make ModelCheckpoint(save_top_k=-1) track the best models (#3735)
* fix topk=-1 tracking best

* update test

* clean up

* add changelog

* enable loading best topk in trainer.test()

* make trivial

* return right away

* make windows test path happy
2020-09-30 08:34:02 -04:00
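
A minimal sketch of what this fix restores, assuming the era's API where the callback is wired in via `Trainer(checkpoint_callback=...)` and exposes `best_model_path` / `best_k_models` (names taken from that API, not from the commit itself):

```python
from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import ModelCheckpoint

# save_top_k=-1 keeps a checkpoint for every validation epoch instead of only the k best.
checkpoint_cb = ModelCheckpoint(monitor="val_loss", save_top_k=-1)
trainer = Trainer(max_epochs=5, checkpoint_callback=checkpoint_cb)

# After trainer.fit(model), checkpoint_cb.best_model_path and best_k_models are
# expected to be populated even though no checkpoint is ever deleted.
```
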
Jirka Borovec 31a36f04df
define distributed as a type (#3740)
* define type

* miss

* Apply suggestions from code review

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* miss

* warn

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2020-09-30 08:33:01 -04:00
Adrian Wälchli 9405c880af
log/save_interval based on global step (#3667)
* log interval based on global step

* test

* test

* test

* test

* pep

* pep

* added changelog

* pep

* merge

* remove unused arg
2020-09-30 12:26:27 +02:00
William Falcon b3be8022bd
tests for val step flow and logging (#3731)
* ref: test val epoch end

* ref: test val epoch end

* ref: test val epoch end

* ref: test log dict

* ref: test log dict

* ref: test log dict

* ref: test log dict
2020-09-29 22:12:56 -04:00
ananthsub 3dcf7130c5
Support checkpoint hooks on data module (#3563)
* Split out changes from #3563 to make that PR easier to review. This formats the file according to the Black formatter

* Store a reference to the trainer on the datamodule

Fixes #3682

* Update data_connector.py

* Update data_connector.py

* Update test_datamodules.py

* Split out changes from #3563 to make that PR easier to review. This formats the file according to the Black formatter

* support checkpoint hooks for datamodule

refactor on_{save/load}_checkpoint to a separate hook class that both the lightning module and data module inherit
add spots in callback connector to call new datamodule hooks if available

* hooks formatting

* Update hooks.py

* Update checkpoint_connector.py

* Update lightning.py

* update based on upstream/master

checkout upstream/master

* Update checkpoint_connector.py

* add tests

* undo format revert

* Updated CHANGELOG.md

* add checkpoint hooks

* add Dict type

* import CheckpointHooks
2020-09-29 19:51:44 +02:00
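
A sketch of the hooks this PR adds to the data module, assuming they mirror the LightningModule signatures `on_save_checkpoint(checkpoint)` / `on_load_checkpoint(checkpoint)`; the checkpoint key below is purely illustrative:

```python
from pytorch_lightning import LightningDataModule


class MyDataModule(LightningDataModule):
    def __init__(self):
        super().__init__()
        self.vocab = {"<pad>": 0}

    def on_save_checkpoint(self, checkpoint: dict) -> None:
        # Persist datamodule state alongside the model weights.
        checkpoint["my_datamodule_vocab"] = self.vocab  # illustrative key name

    def on_load_checkpoint(self, checkpoint: dict) -> None:
        # Restore that state when the checkpoint is loaded.
        self.vocab = checkpoint.get("my_datamodule_vocab", self.vocab)
```
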
William Falcon c14928a72a
ref: test val flow steps (#3723)
* ref: test val epoch end

* ref: test val epoch end

* ref: test val epoch end
2020-09-29 11:42:38 -04:00
Maxim Grechkin 7bb139816a
Add a more direct test of multi-gpu training working (#2084)
* Add a more direct test of multi-gpu training working

* Update tests/base/develop_pipelines.py

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2020-09-29 15:38:09 +02:00
Carlos Mocholí 3b2efe5b2a
Fix ModelCheckpoint period (#3630)
* Fix ModelCheckpoint period

* Remove comma

* Minor changes

* skip check

* Revert "skip check"

Already pushed to master

This reverts commit 00d9e77b81.

Co-authored-by: Jirka Borovec <jirka@pytorchlightning.ai>
2020-09-29 15:36:45 +02:00
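
For context, a sketch of the `period` argument whose behaviour this PR fixes, assuming the ModelCheckpoint signature of this era:

```python
from pytorch_lightning.callbacks import ModelCheckpoint

# period=2: save a checkpoint only every second validation epoch.
# After #3630 the interval is respected as configured.
checkpoint_cb = ModelCheckpoint(monitor="val_loss", period=2)
```
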
William Falcon f42ea303c9
ref: enable self.log for eval loop metrics (#3715)
* ref: test val epoch end

* ref: test val epoch end

* ref: test val epoch end

* ref: test val epoch end

* ref: test val epoch end

* ref: test val epoch end
2020-09-29 02:00:28 -04:00
William Falcon c41ea86b35
ref: move backends back to individual files (1/5) (ddp_cpu) (#3712)
* ref: make each backend independent for easier debugging and independent debugging

* ref: make each backend independent for easier debugging and independent debugging

* ref: make each backend independent for easier debugging and independent debugging

* ref: make each backend independent for easier debugging and independent debugging

* ref: make each backend independent for easier debugging and independent debugging

* ref: make each backend independent for easier debugging and independent debugging

* ref: test val epoch end

* ref: test val epoch end
2020-09-29 01:59:18 -04:00
Rohit Gupta 783750547d
disable optimizers setup during testing (#3059)
* disable configure_optimizers during testing

* minor changes

* hvd and ddp

* fix precision during testing

* fix ddp

* fix amp

* fix cpu

* update dp

* simplify optimizers

* add test

* codefactor

* ref optimizer setup

* chlog

* suggestions

* isort

* rebased with master
2020-09-29 01:09:04 +02:00
William Falcon 4d5c0fa1bc
ref: separate flow vs log tests (#3704) 2020-09-28 12:01:52 -04:00
William Falcon cdd7266cd8
ref: enable self.log from val step (#3701)
* .log in eval

* ref

* ref: enable self.log in val step
2020-09-28 10:49:07 -04:00
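
A sketch of the call this refactor enables, assuming the `self.log` signature being introduced in this series (the model and metric names are illustrative):

```python
import torch
from pytorch_lightning import LightningModule


class LitModel(LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def validation_step(self, batch, batch_idx):
        x, y = batch
        loss = torch.nn.functional.cross_entropy(self.layer(x), y)
        # self.log can now be called directly inside validation_step (#3701);
        # on_epoch=True aggregates the value across the validation epoch.
        self.log("val_loss", loss, prog_bar=True, on_epoch=True)
```
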
William Falcon 2ecaa2a8be
ref: (2/n) fix no log in epoch end (#3699) 2020-09-28 08:25:44 -04:00
William Falcon ddd11075bd
[WIP] ref: deprecated results obj, added support for simpler comms (1/n) (#3681)
* ref: deprecated results obj, added support for simpler comms. Decouples logging from loops

* ref: deprecated results obj, added support for simpler comms. Decouples logging from loops

* ref: deprecated results obj, added support for simpler comms. Decouples logging from loops

* ref: deprecated results obj, added support for simpler comms. Decouples logging from loops

* ref: deprecated results obj, added support for simpler comms. Decouples logging from loops

* ref: deprecated results obj, added support for simpler comms. Decouples logging from loops

* fix global step err

* fix global step err

* fix global step err

* fix global step err

* fix global step err

* fix typing err

* fix str

* fix typing err
2020-09-27 23:19:46 -04:00
William Falcon ff2bab0996
ref: (results 1/n) enable tracking original metric when step and epoch are both true (#3685)
* enable tracking original metric when step and epoch are both true
2020-09-27 22:08:31 -04:00
William Falcon 931995b55b
remove flake8 (#3687) 2020-09-27 20:40:02 -04:00
Adrian Wälchli f37e9e8a83
Fix global step increment on training_epoch_end (#3673)
* fix

* fix global step err

* fix global step err

* fix global step err

* fix global step err

* fix global step err

* fix global step err

Co-authored-by: William Falcon <waf2107@columbia.edu>
2020-09-27 20:19:51 -04:00
Adrian Wälchli d15fd751c7
change default save_top_k, save_last to None (#3680)
* topk default

* fix test that doesn't have best available

* remove print

* #3680 changes

* fix backward

* temp revert

te

* add warning by carmocca

* format docstring for test

* specify monitor in ES test with top k

* improve docstring for save_last

* remove commented lines

* revert passing model to test

* undo regex mistake

* changelog

* fix test covering case monitor=None and savetopk=-1

* docstring

* fix test for saving all checkpoints

* don't save checkpoints for save_top_k=0

* add test for savetopk=0

Co-authored-by: @carmocca

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2020-09-27 20:05:02 -04:00
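
A sketch of what the new `None` defaults mean in practice; passing the arguments explicitly keeps the previous behaviour (argument names as in the era's API):

```python
from pytorch_lightning.callbacks import ModelCheckpoint

# With save_top_k and save_last defaulting to None, the callback picks a sensible
# behaviour based on whether a monitor is set. Being explicit restores the old semantics:
checkpoint_cb = ModelCheckpoint(
    monitor="val_loss",  # quantity to rank checkpoints by
    save_top_k=1,        # keep only the single best checkpoint
    save_last=True,      # additionally always keep the most recent one
)
```
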
ananthsub 94c79bb3ba
Add a reference to the Trainer on the LightningDataModule (#3684)
* Split out changes from #3563 to make that PR easier to review. This formats the file according to the Black formatter

* Store a reference to the trainer on the datamodule

Fixes #3682

* Update data_connector.py

* Update data_connector.py

* Update test_datamodules.py
2020-09-27 19:48:01 -04:00
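
A sketch of the new back-reference, assuming it is populated once the datamodule is attached to a Trainer (e.g. during `fit`); the attribute access is illustrative:

```python
from pytorch_lightning import LightningDataModule


class MyDataModule(LightningDataModule):
    def setup(self, stage=None):
        # After #3684 the datamodule carries a reference to its trainer.
        trainer = getattr(self, "trainer", None)  # None until attached (assumption)
        if trainer is not None:
            print(f"setup called at epoch {trainer.current_epoch}")
```
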
Pariente Manuel 3d76f604bd
Add ModelCheckpoint.to_yaml method (#3048)
* Add ModelCheckpoint.to_json()

* Add ModelCheckpoint.to_json() test

* Fix W292: Add new line at end of file

* Apply suggestions from code review

* Apply suggestions from code review

Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>

* Fixed tests

* Update pytorch_lightning/callbacks/model_checkpoint.py

* Apply suggestions from code review

* fix test

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2020-09-27 14:39:40 +02:00
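
A sketch of the new method, assuming it dumps the tracked `best_k_models` mapping (checkpoint path -> monitored score) to a YAML file; the output path is illustrative:

```python
from pytorch_lightning.callbacks import ModelCheckpoint

checkpoint_cb = ModelCheckpoint(monitor="val_loss", save_top_k=3)
# ... after training with a Trainer that uses this callback ...
checkpoint_cb.to_yaml("best_k_models.yaml")  # writes {checkpoint_path: score, ...}
```
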
William Falcon d79bce1dff
enable None model checkpoint default (#3669)
* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default

* enable None model checkpoint default
2020-09-26 23:14:04 -04:00
Adrian Wälchli 3ff5327e83
Mocking loggers (part 1, wandb) (#3596)
* mocking for wandb

* remove wandb import in amp test

* mock loggers in sphinx

* check tests

* Update extra.txt

* setup

* dev

* min

* revert

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Jirka Borovec <jirka@pytorchlightning.ai>
2020-09-25 16:00:02 +02:00
Carlos Mocholí e70aea7642
Allow ModelCheckpoint monitor to be None (#3633)
* Fix ModelCheckpoint period

* Test for less epochs
2020-09-25 15:54:04 +02:00
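
A sketch of the newly allowed configuration, assuming that with no monitored quantity the callback simply keeps saving the latest checkpoint:

```python
from pytorch_lightning.callbacks import ModelCheckpoint

# monitor=None is now valid (#3633): nothing is ranked or compared,
# the callback just saves the most recent model each epoch.
checkpoint_cb = ModelCheckpoint(monitor=None)
```
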
Carlos Mocholí ed12e422a4
Fix incorrect "Saving latest checkpoint" warning (#3588)
* Fix incorrect "Saving latest checkpoint" warning

* Replace warning with info. Run PyCharm's optimize imports

* Remove unused class variable. Refactor logic. Improve test

* Fix De Morgan's
2020-09-25 14:18:06 +02:00
Antoine Broyelle 17c8c95fbc
Wrap prepare_data and setup only once inside DataModule (#3654)
Fix #3652
2020-09-25 07:09:50 -04:00
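
A sketch of the guarded behaviour, assuming the wrapper turns repeated calls into no-ops:

```python
from pytorch_lightning import LightningDataModule


class MyDataModule(LightningDataModule):
    def prepare_data(self):
        print("downloading data")  # expensive one-time work

    def setup(self, stage=None):
        print(f"setting up stage={stage}")


dm = MyDataModule()
dm.prepare_data()
dm.prepare_data()  # second call is expected to be skipped after #3654
dm.setup("fit")
dm.setup("fit")    # likewise a no-op instead of running twice
```
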
Carlos Mocholí 908382f196
Split GPUStatsMonitor function (#3644)
* Split function

* Add docstrings

* Add typing annotations

* Minor refactor

* Make static to add a test
2020-09-25 07:30:30 +02:00
Jirka Borovec aa52c930f4
test examples (#3643)
* test examples

* testing

* testing

* typo

* req

* exception

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2020-09-24 17:33:11 +02:00
Adrian Wälchli 3affa0e49a
use tmpdir in tests when writing predictions to disk (#3561)
* save to tmpdir

* path
2020-09-23 07:44:15 -04:00
William Falcon 031274c25d
fix dp issues + update examples and test examples (#3618)
* fix dp

* fix dp

* fix dp

* fix dp

* fix examples

* fix examples

* fix examples

* fix examples

* fix examples

* fix examples

* fix examples

* fix examples

* fix examples

* fix examples

* fix examples

* fix examples

* fix examples

* fix examples

* fix examples

* fix examples

* fix examples

* fix examples
2020-09-23 00:19:46 -04:00
William Falcon c591013708
enable any logged metric to be accessible in callbacks (#3598)
* enable any logged or written metric to be accessible in callbacks

* enable any logged or written metric to be accessible in callbacks

* enable any logged or written metric to be accessible in callbacks

* enable any logged or written metric to be accessible in callbacks

* enable any logged or written metric to be accessible in callbacks

* enable any logged or written metric to be accessible in callbacks

* enable any logged or written metric to be accessible in callbacks

* enable any logged or written metric to be accessible in callbacks

* enable any logged or written metric to be accessible in callbacks

* enable any logged or written metric to be accessible in callbacks

* enable any logged or written metric to be accessible in callbacks

* enable any logged or written metric to be accessible in callbacks

* enable any logged or written metric to be accessible in callbacks

* enable any logged or written metric to be accessible in callbacks

* enable any logged or written metric to be accessible in callbacks

* enable any logged or written metric to be accessible in callbacks

* enable any logged or written metric to be accessible in callbacks

* enable any logged or written metric to be accessible in callbacks

* enable any logged or written metric to be accessible in callbacks

* enable any logged or written metric to be accessible in callbacks

* enable any logged or written metric to be accessible in callbacks

* clarify forward

* clarify forward

* clarify forward

* clarify forward
2020-09-22 18:00:23 -04:00
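
A sketch of what this enables: anything sent through `self.log` becomes visible to callbacks via `trainer.callback_metrics` (the callback and metric names here are illustrative):

```python
from pytorch_lightning import Callback


class PrintValLoss(Callback):
    def on_validation_end(self, trainer, pl_module):
        # A value the LightningModule logged with self.log("val_loss", ...)
        # is now accessible from any callback (#3598).
        val_loss = trainer.callback_metrics.get("val_loss")
        print(f"validation loss: {val_loss}")
```
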
Nicki Skafte 88e6b29bba
faster tests (#3604) 2020-09-22 07:37:34 -04:00
Carlos Mocholí 1223cdbaa1
Add missing line. Add a test (#3594) 2020-09-21 22:17:51 -04:00
Nicki Skafte b1347c956a
[Metrics] AUROC error on multilabel + improved testing (#3350)
* error on multilabel

* fix tests

* fix pep8

* changelog

* update doc test

* fix doctest

* fix doctest

* update from suggestion

* Apply suggestions from code review

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Update test_classification.py

* Update test_classification.py

* retrigger test

* pep8

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2020-09-21 11:46:48 +02:00
William Falcon 21cfdf6874
ref: result 1/n (make monitor default to checkpoint_on to simplify re… (#3571)
* ref: result 1/n (make monitor default to checkpoint_on to simplify result syntax)

* ref: result 1/n (make monitor default to checkpoint_on to simplify result syntax)

* ref: result 1/n (make monitor default to checkpoint_on to simplify result syntax)

* ref: result 1/n (make monitor default to checkpoint_on to simplify result syntax)

* ref: result 1/n (make monitor default to checkpoint_on to simplify result syntax)

* ref: result 1/n (make monitor default to checkpoint_on to simplify result syntax)

* ref: result 1/n (make monitor default to checkpoint_on to simplify result syntax)

* ref: result 1/n (make monitor default to checkpoint_on to simplify result syntax)

* ref: result 1/n (make monitor default to checkpoint_on to simplify result syntax)

* ref: result 1/n (make monitor default to checkpoint_on to simplify result syntax)

* ref: result 1/n (make monitor default to checkpoint_on to simplify result syntax)

* ref: result 1/n (make monitor default to checkpoint_on to simplify result syntax)

* ref: result 1/n (make monitor default to checkpoint_on to simplify result syntax)

* ref: result 1/n (make monitor default to checkpoint_on to simplify result syntax)

* ref: result 1/n (make monitor default to checkpoint_on to simplify result syntax)

* ref: result 1/n (make monitor default to checkpoint_on to simplify result syntax)

* ref: result 1/n (make monitor default to checkpoint_on to simplify result syntax)

* ref: result 1/n (make monitor default to checkpoint_on to simplify result syntax)

* ref: result 1/n (make monitor default to checkpoint_on to simplify result syntax)

* Update pytorch_lightning/callbacks/model_checkpoint.py

Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>

* ref: result 1/n (make monitor default to checkpoint_on to simplify result syntax)

* force crash when max_epochs < epochs in a checkpoint

Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
2020-09-20 22:58:43 -04:00
William Falcon 277538970d
force crash when max_epochs < epochs in a checkpoint (#3580)
* force crash when max_epochs < epochs in a checkpoint

* force crash when max_epochs < epochs in a checkpoint
2020-09-20 22:04:22 -04:00
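
A sketch of the newly enforced check, assuming the era's `resume_from_checkpoint` argument; the checkpoint path is illustrative and the exact exception type is not shown:

```python
from pytorch_lightning import Trainer

# If "epoch=7.ckpt" was saved at epoch 7, resuming it with a smaller max_epochs
# now fails loudly instead of silently doing nothing (#3580).
trainer = Trainer(max_epochs=5, resume_from_checkpoint="epoch=7.ckpt")
# trainer.fit(model)  # expected to raise while restoring the checkpoint
```
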
William Falcon 9acee67c31
fixes 3549 (#3564) 2020-09-19 20:00:50 -04:00
Rohit Gupta 07b857769a
Allow kwargs in Wandb & Neptune + kwargs docstring (#3475)
* Allow kwargs in WandbLogger

* isort

* kwargs docstring

* typo

* kwargs for other loggers

* pep and isort

* formatting

* fix failing test

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2020-09-19 18:51:43 +02:00
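
A sketch of the pass-through, assuming extra keyword arguments are forwarded to `wandb.init` (the `entity` and `tags` arguments are ordinary `wandb.init` parameters used for illustration; wandb must be installed):

```python
from pytorch_lightning.loggers import WandbLogger

logger = WandbLogger(
    name="baseline-run",
    project="my-project",
    # extra kwargs are now forwarded to wandb.init (#3475):
    entity="my-team",
    tags=["baseline", "resnet"],
)
```
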
Jirka Borovec 8eb77cd06a
drop v0.10 deprecated (#3454)
* drop v0.10 deprecated

* import

* missed
2020-09-19 11:47:26 -04:00
Boris Feld e2af4f120e
Improve Comet Logger pickled behavior (#2553)
* Improve Comet Logger pickled behavior

* Delay the creation of the actual experiment object for as long as we can.
* Save the experiment id in case an Experiment object is created so we can
  continue the same experiment in the sub-processes.
* Run pre-commit on the comet file.

* Handle review comment

Make most Comet Logger attributes protected, as they might not reflect the final
Experiment attributes. Also fix the typo in the test name.

* Ensure that CometLogger.name and CometLogger.version always returns str

* Add new test for CometLogger.version behavior

* Add new tests for CometLogger.name and CometLogger.version

* Apply review suggestions

* Apply suggestions from code review

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Remove extraneous comments in Comet logger tests

* Fix lint issues

* Apply suggestions from code review

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2020-09-18 23:26:29 +02:00
Carlos Mocholí 580b04b490
Fix ModelCheckpoints name formatting (#3163)
* Fix ModelCheckpoint's name formatting

* Fix failing tests

* Add dot to CHECKPOINT_SUFFIX

* Set variables to their default values at the end of tests

* Fix logic for filepath='' and filename=None. Add test

* Fix Windows tests

* Fix typo. Remove leading line break and zeroes

* Remove CHECKPOINT_SUFFIX

* Fix typos. Use appropriate f-string format

* Apply suggestions from code review

* Fix broken tests after #3320

* Finish changes suggested by Borda

* Use explicit test var names

* Apply suggestions

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* Apply suggestions

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* Update CHANGELOG

* Apply suggestions from code review

* for

* prepend whitespace in warn msg

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2020-09-18 23:09:11 +02:00
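
A sketch of the formatting this PR fixes, assuming the era's `filepath`-based naming, where names in braces are filled in from logged metrics (the exact rendered filename depends on the version):

```python
from pytorch_lightning.callbacks import ModelCheckpoint

# Placeholders are replaced with metric values, yielding names along the lines
# of "epoch=03-val_loss=0.25.ckpt".
checkpoint_cb = ModelCheckpoint(
    filepath="checkpoints/{epoch:02d}-{val_loss:.2f}",
    monitor="val_loss",
)
```
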
Lucas Steinmann 197acd535f
Fix early stopping with training step's return dict (#3347)
* Fixes the test for early stopping without val step.

The expression that checked whether early stopping was triggered had an off-by-one error and hence was true even if early stopping was not triggered.

Furthermore, patience is set to 0 and max epochs to 10 to ensure the loss has enough time to flatten.

* Fixes early stopping without val step.

The issue was that only the `early_stop_on` key was checked and not an arbitrary monitor key.

* Fixes the branch which checks whether early stopping is done during validation.

Before, only `val_early_stop_on` was checked. Since arbitrary keys can be used, the set of possible validation keys cannot be exhaustive. Hence this disables "early stopping on_train_epoch_end" via an instance attribute if early stopping was executed in on_validation_epoch_end.
Furthermore, adds a test which ensures arbitrary keys work.

* Improve check whether eval results are used.

Only disable early checking with train results if eval results are actually used. Before, they were always disabled in ``on_validation_epoch_end``.
Rename and document the instance variable to make it clearer.

* Remove wrong documentation on the behaviour of early stopping with the train result dict.

* Apply suggestions from code review

* Apply suggestions from code review

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-09-18 23:08:04 +02:00
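
A sketch of the now-working setup: early stopping monitoring a key logged from the training step rather than a hard-coded `early_stop_on` result key (the module and metric names are illustrative, and how the callback is attached to the Trainer differs slightly across versions of this era):

```python
import torch
from pytorch_lightning import LightningModule, Trainer
from pytorch_lightning.callbacks import EarlyStopping


class LitModel(LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = torch.nn.functional.cross_entropy(self.layer(x), y)
        self.log("train_loss", loss)  # arbitrary monitor key, no early_stop_on needed
        return loss

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)


# Attach via callbacks=[...] (or early_stop_callback=... in slightly older versions).
trainer = Trainer(callbacks=[EarlyStopping(monitor="train_loss", patience=3)])
```
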
Jirka Borovec 7b64472ced
fix lib paths after Wandb 0.10 (#3520)
* try

* try

* drop 0.20

* drop 0.19.5

* -U

* Fixed Horovod in CI due to wandb==0.10.0 sys.path modifications (#3525)

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* format

* wb freeze

* types

Co-authored-by: Travis Addair <taddair@uber.com>
2020-09-17 08:37:49 -04:00
Abe Botros 76c4afb840
Fix IoU score for classes not present in target or pred (#3098)
* Fix IoU score for classes not present in target or pred

Fixes #3097

- Allow configurable not_present_score for IoU for classes
  not present in target or pred. Defaults to 1.0.
- Also allow passing `num_classes` parameter through from iou
  metric class down to its underlying functional iou
  call.

* Changelog: move IoU not-present score fix to [unreleased]

* IoU: avoid recomputing class presence in target and pred

Use already-computed support, true positives, and false positives to
determine if a class is not present in either target or pred.

* Test IoU against sklearn jaccard_score

Also add TODO to test our IoU's not_present_score against sklearn's
jaccard_score's zero_division when it becomes available.

* IoU: remove_bg -> ignore_index

Fixes #2736

- Rename IoU metric argument from `remove_bg` -> `ignore_index`.
- Accept an optional int class index to ignore, instead of a bool and
  instead of always assuming the background class has index 0.
- If given, ignore the class index when computing the IoU output,
  regardless of reduction method.

* Improve documentation for IoU not_present_score

* Update default IoU not_present_score to 0.0

* Add note about IoU division by zero

* Rename IoU not_present_score -> absent_score

* Update IoU absent score changelog wording

* Condense IoU absent_score argument docstring

* Remove unnecessary IoU ignore_index comment

* docstrings

* isort

* flake8

* Fix test of IoU against sklearn jaccard

Use macro instead of micro averaging in sklearn's jaccard score, to
match multi-class IoU, which conventionally takes per-class scores
before averaging.

Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>
2020-09-17 10:37:49 +02:00
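
A sketch of the reworked functional IoU described above, assuming the argument names this PR introduces (`ignore_index`, `absent_score`, `num_classes`) on the era's `pytorch_lightning.metrics.functional.iou`:

```python
import torch
from pytorch_lightning.metrics.functional import iou

pred = torch.tensor([0, 1, 2, 2])
target = torch.tensor([0, 1, 2, 1])

score = iou(
    pred,
    target,
    ignore_index=0,    # drop the background class from the result
    absent_score=0.0,  # score assigned to classes missing from both pred and target
    num_classes=3,
)
print(score)
```
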
Jirka Borovec c64520e658
fix tensorboard version (#3132)
* tensorboard version

* WIP test tb hparams logs (#3040)

* optional

* req

* tensorboard>=2.2.0

* data

* data

* TB

Co-authored-by: Rosario Scalise <rosario@cs.washington.edu>
2020-09-15 23:48:48 +02:00
Adrian Wälchli 4ed96b2eb4
fix gradient norm tracking for row_log_interval > 1 (#3489)
* fix + test

* changelog

* Apply suggestions from code review

Co-authored-by: Tim Chard <timchard@hotmail.com>

* improve test

Co-authored-by: Tim Chard <timchard@hotmail.com>
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
2020-09-15 18:41:27 +02:00
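
A sketch of the configuration affected by this fix, assuming the era's Trainer flags:

```python
from pytorch_lightning import Trainer

# Gradient norms are tracked with the 2-norm; after #3489 the tracking also
# behaves as expected when row_log_interval > 1.
trainer = Trainer(track_grad_norm=2, row_log_interval=10)
```
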
Nicki Skafte 28af34bc51
[Metrics] Class reduction similar to sklearn (#3322)
* new class reduce interface

* update docs

* pep8

* update_class_metrics

* fix doctest

* changelog

* fix docs

* fix codefactor

* fix codefactor

* formatting

* fix typo

* fix typo

* typo pr -> per

* update from suggestion

* fix error

* Apply suggestions from code review

* Update CHANGELOG.md

* formatting

* timeouts

* docstring formatting for reg metrics

* pep

* flake8

* revert workflow changes

* suggestions

Co-authored-by: Nicki Skafte <nugginea@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Jirka Borovec <jirka@pytorchlightning.ai>
Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>
2020-09-15 14:36:14 +02:00
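
A sketch of the sklearn-style reduction added here; the specific metric (`precision`) and its signature are assumptions about the era's functional metrics API, while the `class_reduction` values mirror sklearn's `average` options:

```python
import torch
from pytorch_lightning.metrics.functional import precision

pred = torch.tensor([0, 1, 1, 2])
target = torch.tensor([0, 1, 2, 2])

# 'micro', 'macro', 'weighted' and 'none' mirror sklearn's averaging choices.
print(precision(pred, target, class_reduction="macro"))
print(precision(pred, target, class_reduction="none"))  # per-class scores
```
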
Alexander 5732a56560
Pass epoch argument to Comet Logger (#3438)
* Pass epoch argument

* Copy epoch instead of inplace pop

* Remove whitespace

* Add test for epoch logging

* add docstring

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2020-09-15 14:30:42 +02:00
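
A sketch of the new pass-through, assuming an `epoch` entry in the metrics dict is copied (not popped in place) and forwarded to Comet as the epoch argument; offline mode via `save_dir` is used here so no API key is needed, and comet_ml must be installed:

```python
from pytorch_lightning.loggers import CometLogger

logger = CometLogger(save_dir="comet_logs")  # offline mode (assumption for this sketch)

# 'epoch' is forwarded to Comet's experiment as the epoch argument (#3438).
logger.log_metrics({"train_acc": 0.91, "epoch": 3}, step=300)
```
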