lightning

Commit Graph

Author	SHA1	Message	Date
William Falcon	b34c7add23	Fixes #3668, #3887 as a bonus (#3888) * Fixes #3668, #3887 as a bonus * Fixes #3668, #3887 as a bonus	2020-10-05 21:30:41 -04:00
Nathan Raw	1954d7c87a	Write predictions in LightningModule instead of EvalResult (#3882 ) * ✨ add self.write_prediction * ✨ add self.write_prediction_dict to lightning module	2020-10-05 18:04:02 -04:00
Jean-Baptiste SCHIRATTI	cea5f1f538	Fix for `load_from_checkpoint` (#2776 ) * Fix. * Fix #2550: allow to load model from checkpoint if self.save_hyperparameters() was not called. * Fix? Cleaner way of not calling self.save_hyperparameters in EvalModelTemplate. * Fix? `_load_model_state` cleanup * Fix? * Fix #2550: allow to load model from checkpoint if self.save_hyperparameters() was not called. * Fix. * Fix? Cleaner way of not calling self.save_hyperparameters in EvalModelTemplate. * Fix? `_load_model_state` cleanup * Fixed side effect in `test_load_model_from_checkpoint_extra_args`. * Apply suggestions from code review * fix * try * fixed missing arg in evalmodel * fixed missing arg in evalmodel * fix * update * fix loading * add test * prune Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> Co-authored-by: Jirka Borovec <jirka@pytorchlightning.ai> Co-authored-by: William Falcon <waf2107@columbia.edu>	2020-10-05 12:44:23 -04:00
Nrupatunga	7d47ed178b	[Bug-Fix]:properties `current_epoch` and `global_step` between model and trainer same always (#3785 ) * make current_epoch and global_step to be same as trainer, after model restore. * remove assignment here * test * minor modification * Update pytorch_lightning/core/lightning.py type check, better clarity Co-authored-by: ananthsub <ananth.subramaniam@gmail.com> * Update pytorch_lightning/core/lightning.py type check, better clarity Co-authored-by: ananthsub <ananth.subramaniam@gmail.com> * comments for current_epoch and global_step properties * Update tests/models/test_restore.py Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> * update comments according to the changes made * Update tests/models/test_restore.py * add current_epoch, global_step to jit ignore list * Add comments to CHANGELOG * Update CHANGELOG.md * Update tests/models/test_restore.py Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> Co-authored-by: ananthsub <ananth.subramaniam@gmail.com> Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>	2020-10-05 11:10:40 -04:00
Jirka Borovec	6ac0958166	fix init nan for checkpointing (#3863 ) * add test for checkpoint nan * fix * pep	2020-10-05 07:36:12 -04:00
William Falcon	b014223f72	Fixes #2678 - enables training_step to return None (#3862 ) * Fixes #2678 - enables training_step to return None * Fixes #2678 - enables training_step to return None	2020-10-05 07:33:46 -04:00
William Falcon	d787208e76	Fixes #2792 (#3857 )	2020-10-04 23:25:02 -04:00
Adrian Wälchli	ab5e9496d0	refactor (#3851 )	2020-10-04 23:23:58 -04:00
William Falcon	f58c760409	Fixes #2551 (#3858 )	2020-10-04 23:02:35 -04:00
William Falcon	97e62b38cf	Fixed #2143 and many more :) (#3855 )	2020-10-04 22:18:49 -04:00
William Falcon	d9656d166c	fixed model checkpoint frequency (#3852 ) * fixed model checkpoint frequency * fixed model checkpoint frequency * fixed model checkpoint frequency * fixed model checkpoint frequency * merged	2020-10-04 21:49:20 -04:00
Adrian Wälchli	e0f8505394	Mocking loggers (part 2, neptune) (#3617 ) * mock neptune base tests * neptune doctest * remove extra * mock loggers * typo * mock import * neptune not compatible with multigpu * add back experiment	2020-10-04 21:20:06 -04:00
William Falcon	2bca89a752	added tbptt test for logging (#3850 ) * added tbptt test for logging * added tbptt test for logging	2020-10-04 19:38:42 -04:00
William Falcon	00f0d19a61	fixes #3798 (#3849 ) * fix #3798 * added tbptt test for logging	2020-10-04 19:36:51 -04:00
Adrian Wälchli	cc9781a0ad	Deprecate early_stop_callback Trainer argument (part 2) (#3845 ) * update tests with EarlyStopping default * imports * revert legacy tests * fix test * revert * revert	2020-10-04 17:36:47 -04:00
Carlos Mocholí	89cc12311f	Fix tbptt_reduce_fx when non-floating tensors are logged (#3796 ) * Add failing test * force all tbptt vals to be floats for reduce Co-authored-by: William Falcon <waf2107@columbia.edu>	2020-10-04 17:10:25 -04:00
Rohit Gupta	d3696052cf	Add back sanity checks (#3846 ) * Add back sanity checks * pep	2020-10-04 17:05:26 -04:00
William Falcon	70e792344a	test selecting the correct backend. temp backends while slurm and TE are decoupled (#3848 ) * test selecting the correct backend. tem backends while slurm and TE are decoupled * test selecting the correct backend. tem backends while slurm and TE are decoupled	2020-10-04 15:44:50 -04:00
William Falcon	1aa9d39506	Eval epoch can now log independently (#3843 ) * ref: routed epoch outputs to logger * ref: routed epoch outputs to logger * ref: routed epoch outputs to logger * ref: routed epoch outputs to logger	2020-10-04 13:36:35 -04:00
Rohit Gupta	a628d181ee	Fix val_progress_bar total with num_sanity_val_steps (#3751 ) * Fix val_progress_bar total with num_sanity_val_steps * chlog * Fix val_progress_bar total with num_sanity_val_steps * move test * replaced with sanity flag and suggestions	2020-10-04 08:32:18 -04:00
Lezwon Castelino	4da240ea1b	added broadcast option to tpu (#3814 ) * added broadcast option to tpu * add device * moved tpu broadcast to tpu_backend * removed Lightning dist * decode bytes * pep8 fix * fix bug * test for broadcast * updated changelog	2020-10-04 07:47:33 -04:00
William Falcon	66aef10239	verified epoch logging (#3830 ) * ref: fix epoch logging * verified epoch logging * verified epoch logging * verified epoch logging * verified epoch logging * verified epoch logging * verified epoch logging * verified epoch logging * verified epoch logging	2020-10-03 21:17:24 -04:00
William Falcon	35d1111994	[WIP] ref: decoupled ddp, ddp spawn (finish 3733) (#3819 ) * ref: finish #3733 * remove deprecated test * remove deprecated test * remove deprecated test * remove deprecated test * remove deprecated test * remove deprecated test * remove deprecated test * remove deprecated test * remove deprecated test * remove deprecated test * remove deprecated test * remove deprecated test * remove deprecated test * Update pytorch_lightning/accelerators/ddp_backend.py Co-authored-by: ananthsub <ananth.subramaniam@gmail.com> * remove deprecated test * remove deprecated test * remove deprecated test Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>	2020-10-03 14:05:31 -04:00
William Falcon	3903cf63c6	ref: training flag tests (val_check_interval) (#3825 ) * added test_val_check_interval tests * added test_val_check_interval tests * added test_val_check_interval tests	2020-10-03 14:05:01 -04:00
William Falcon	0fb8c54fda	remove deprecated test (#3820 )	2020-10-03 13:21:10 -04:00
William Falcon	d9bc95f83e	ref: bug fix with logging val epoch end + monitor (#3812 ) * ref: fix metric err * ref: fix metric err * ref: fix metric err * ref: merge * ref: merge * ref: merge * ref: merge * ref: decoupled ddp2 * ref: decoupled ddp2 * ref: decoupled ddp2 * ref: decoupled ddp2 * ref: decoupled ddp2 * ref: clean up ddp before final fix * ref: clean up ddp before final fix * ref: clean up ddp before final fix * ref: clean up ddp before final fix * ref: clean up ddp before final fix * ref: clean up ddp before final fix * ref: clean up ddp before final fix * ref: clean up ddp before final fix * ref: clean up ddp before final fix * ref: clean up ddp before final fix * ref: clean up ddp before final fix	2020-10-03 12:33:29 -04:00
Jeff Yang	9942f3ebdf	Fix `on_train_batch_start` hook to end epoch early (#3700 ) * init * add test * changelog and docs * fix test * Apply suggestion from code review Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>	2020-10-02 21:46:46 +02:00
Jirka Borovec	62eabdd535	revert backend types (#3788 ) * revert backend types * todo * todo	2020-10-02 06:18:44 -04:00
Jirka Borovec	1160270882	fix path in CI for release & python version in all dockers & duplicated badges (#3765 ) * typo * path * check * trigger * fix conda * pip ver * fix cuda * fix XLA * fix xla * ci * docker * BIULD * unBIULD * update * py 3.8 * apex * apex	2020-10-02 05:26:21 -04:00
Akihiro Nitta	ebc1b23fa3	Use `raise .. from ..` to explicitly chain exceptions (#3750 ) * Fix exception chaining * names * Change exception names for consistency Co-authored-by: Nicki Skafte <skaftenicki@gmail.com> * Change exception names for consistency Co-authored-by: Nicki Skafte <skaftenicki@gmail.com> Co-authored-by: Jirka Borovec <jirka@pytorchlightning.ai> Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>	2020-10-01 21:45:44 +02:00
William Falcon	e17712e5c3	part 5 of #3733 (#3774 ) * ref: part 4 of #3733 * ref: part 4 of #3733 * ref: part 4 of #3733	2020-10-01 12:34:12 -04:00
William Falcon	622c5c3982	ref: part 4 of #3733 (#3773 ) * ref: part 4 of #3733 * ref: part 4 of #3733 * ref: part 4 of #3733 * ref: part 4 of #3733	2020-10-01 11:26:58 -04:00
Nicki Skafte	fe290280be	Metric aggregation testing (#3517 ) * aggregation testing * add more tests * mse * more tests * fix tests * fix doctest * fix codefactor * fix import error * fix doctest * revert docfix * test for model integration * fix integration test * added test cases * fix rmsle * aggregation testing * add more tests * mse * more tests * fix tests * fix doctest * fix codefactor * fix import error * fix doctest * revert docfix * test for model integration * fix integration test * fix psnr * add warning/valueerror to embedding similarity * fixed f scores * disable some test * fix tests * fixing codefactor * fix pep8 * changelog * fix doctest * cleaning test * fix pickle error * pickle fix * fix pickle error * Apply suggestions from code review Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * code cleanup + changes based on suggestions * update based on suggestion * update based on suggestions * Apply suggestions from code review Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> Co-authored-by: Nicki Skafte <nugginea@gmail.com> Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>	2020-10-01 15:37:51 +02:00
William Falcon	ac2b0f0f06	ref: continue #3733 (#3767 ) * ref: #3733 part 2 * ref: #3733 part 2	2020-10-01 09:25:33 -04:00
William Falcon	440f837f6d	ref: part a of #3733 (#3766 ) * ref: part a of #3733 * ref: part a of #3733	2020-10-01 08:15:23 -04:00
Nicki Skafte	9a7d1a1876	[metrics] Accuracy num_classes error fix (#3764 ) * change accuracy error to warning * changelog	2020-10-01 13:00:42 +02:00
GimmickNG	e4e60e9b82	Add datamodule parameter to lr_find() (#3425 ) * Add datamodule parameter to lr_find() * Fixed missing import * Move datamodule parameter to end * Add datamodule parameter test with auto_lr_find * Change test for datamodule parameter * Apply suggestions from code review Co-authored-by: Nicki Skafte <skaftenicki@gmail.com> * Fix lr_find documentation Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> * formatting * Add description to datamodule param in lr_find * pep8: remove trailing whitespace on line 105 * added changelog Co-authored-by: Nicki Skafte <skaftenicki@gmail.com> Co-authored-by: Nicki Skafte <nugginea@gmail.com> Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>	2020-10-01 10:33:12 +02:00
Teddy Koker	5ec00ccd28	Added gradient clip test for native AMP (#3754 ) * added gradient clip test for fp16 * pep8	2020-10-01 01:36:34 -04:00
William Falcon	a38d108a68	add dist lib to enable syncing anything across devices (#3762 ) * add dist lib to enable syncing anything across devices	2020-10-01 01:21:38 -04:00
William Falcon	cf182e80fc	Finish Allow on_save_checkpoint... (#3688 ) * Finish #3562 * Apply suggestions from code review * Apply suggestions from code review * fix tests * Finish #3562 * Apply suggestions from code review * Apply suggestions from code review * fix tests * fix structure * fix structure * make save_last test pass * unnecessary global rank check * fix test * update test * update test * test * test * run save on all * remove assert * tracking saves * check if fails * test * clean up * adjust horovod test * clean up * remove unnecessary makdirs * change * undo * debug * debug * debug * debug * mock * undo debug code * add extra assertions * test Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> Co-authored-by: Adrian Wälchli <adrian.waelchli@inf.unibe.ch> Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>	2020-09-30 16:15:29 -04:00
Adrian Wälchli	c73032e39d	Make ModelCheckpoint(save_top_k=-1) track the best models (#3735 ) * fix topk=-1 tracking best * update test * clean up * add changelog * enable loading best topk in trainer.test() * make trivial * return right away * make windows test path happy	2020-09-30 08:34:02 -04:00
Jirka Borovec	31a36f04df	define distributed as a type (#3740 ) * define type * miss * Apply suggestions from code review Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com> * miss * warn Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>	2020-09-30 08:33:01 -04:00
Adrian Wälchli	9405c880af	log/save_interval based on global step (#3667 ) * log interval based on global step * test * test * test * test * pep * pep * added changelog * pep * merge * remove unused arg	2020-09-30 12:26:27 +02:00
William Falcon	b3be8022bd	tests for val step flow and logging (#3731 ) * ref: test val epoch end * ref: test val epoch end * ref: test val epoch end * ref: test log dict * ref: test log dict * ref: test log dict * ref: test log dict	2020-09-29 22:12:56 -04:00
ananthsub	3dcf7130c5	Support checkpoint hooks on data module (#3563 ) * Split out changes from #3563 to make that PR easier to review. This formats the file according to the Black formatter * Store a reference to the trainer on the datamodule Fixes #3682 * Update data_connector.py * Update data_connector.py * Update test_datamodules.py * Split out changes from #3563 to make that PR easier to review. This formats the file according to the Black formatter * support checkpoint hooks for datamodule refactor on_{save/load}_checkpoint to a separate hook class that both the lightning module and data module inherit add spots in callback connector to call new datamodule hooks if available * hooks formatting * Update hooks.py * Update checkpoint_connector.py * Update lightning.py * update based on upstream/master checkout upstream/master * Update checkpoint_connector.py * add tests * undo format revert * Updated CHANGELOG.md * add checkpoint hooks * add Dict type * import CheckpointHooks	2020-09-29 19:51:44 +02:00
William Falcon	c14928a72a	ref: test val flow steps (#3723 ) * ref: test val epoch end * ref: test val epoch end * ref: test val epoch end	2020-09-29 11:42:38 -04:00
Maxim Grechkin	7bb139816a	Add a more direct test of multi-gpu training working (#2084 ) * Add a more direct test of multi-gpu training working * Update tests/base/develop_pipelines.py Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>	2020-09-29 15:38:09 +02:00
Carlos Mocholí	3b2efe5b2a	Fix ModelCheckpoint period (#3630 ) * Fix ModelCheckpoint period * Remove comma * Minor changes * skip check * Revert "skip check" Already pushed to master This reverts commit `00d9e77b81`. Co-authored-by: Jirka Borovec <jirka@pytorchlightning.ai>	2020-09-29 15:36:45 +02:00
William Falcon	f42ea303c9	ref: enable self.log for eval loop metrics (#3715 ) * ref: test val epoch end * ref: test val epoch end * ref: test val epoch end * ref: test val epoch end * ref: test val epoch end * ref: test val epoch end	2020-09-29 02:00:28 -04:00
William Falcon	c41ea86b35	ref: move backends back to individual files (1/5) (ddp_cpu) (#3712 ) * ref: make each backend independent for easier debugging and independent debugging * ref: make each backend independent for easier debugging and independent debugging * ref: make each backend independent for easier debugging and independent debugging * ref: make each backend independent for easier debugging and independent debugging * ref: make each backend independent for easier debugging and independent debugging * ref: make each backend independent for easier debugging and independent debugging * ref: test val epoch end * ref: test val epoch end	2020-09-29 01:59:18 -04:00

791 Commits