lightning

Commit Graph

Author	SHA1	Message	Date
William Falcon	1aa9d39506	Eval epoch can now log independently (#3843 ) * ref: routed epoch outputs to logger * ref: routed epoch outputs to logger * ref: routed epoch outputs to logger * ref: routed epoch outputs to logger	2020-10-04 13:36:35 -04:00
Jeff Yang	b76fc5bae5	use docker for conda CI (#3841 ) * use docker in conda CI * update env if needed * update with pip * remove setting pytorch	2020-10-04 13:18:20 -04:00
Adrian Wälchli	1906867fd4	deprecation warning (#3844 )	2020-10-04 13:17:09 -04:00
William Falcon	2c21f7d7e2	ref: adding compute environments (2/n) (#3842 ) * ref: adding compute environments (2/n) * ref: adding compute environments (2/n) * ref: adding compute environments (2/n) * ref: adding compute environments (2/n)	2020-10-04 08:48:46 -04:00
Rohit Gupta	a628d181ee	Fix val_progress_bar total with num_sanity_val_steps (#3751 ) * Fix val_progress_bar total with num_sanity_val_steps * chlog * Fix val_progress_bar total with num_sanity_val_steps * move test * replaced with sanity flag and suggestions	2020-10-04 08:32:18 -04:00
Lezwon Castelino	4da240ea1b	added broadcast option to tpu (#3814 ) * added broadcast option to tpu * add device * moved tpu broadcast to tpu_backend * removed Lightning dist * decode bytes * pep8 fix * fix bug * test for broadcast * updated changelog	2020-10-04 07:47:33 -04:00
William Falcon	093535d433	ref: adding compute environments (1/n) (#3837 ) * ref: adding compute environments (1/n) * ref: adding compute environments (1/n) * ref: adding compute environments (1/n)	2020-10-04 07:31:19 -04:00
Daniel Li	a3503ce3fd	Explicitly point out where should we set the random seed (#3839 ) * Explicitly point out where should we set the random seed * Update docs/source/multi_gpu.rst Co-authored-by: Jeff Yang <ydcjeff@outlook.com> Co-authored-by: Qinru Li <q4li@eng.ucsd.edu> Co-authored-by: William Falcon <waf2107@columbia.edu> Co-authored-by: Jeff Yang <ydcjeff@outlook.com>	2020-10-04 07:30:45 -04:00
William Falcon	1f8ff7c48c	ref: callback system and init ddp (1/n) (#3836 ) * refactored callback system and init ddp * refactored callback system and init ddp * refactored callback system and init ddp * refactored callback system and init ddp	2020-10-03 23:39:17 -04:00
ananthsub	b8a6408a11	Update trainer.py (#3834 )	2020-10-03 22:18:05 -04:00
William Falcon	66aef10239	verified epoch logging (#3830 ) * ref: fix epoch logging * verified epoch logging * verified epoch logging * verified epoch logging * verified epoch logging * verified epoch logging * verified epoch logging * verified epoch logging * verified epoch logging	2020-10-03 21:17:24 -04:00
William Falcon	35d1111994	[WIP] ref: decoupled ddp, ddp spawn (finish 3733) (#3819 ) * ref: finish #3733 * remove deprecated test * remove deprecated test * remove deprecated test * remove deprecated test * remove deprecated test * remove deprecated test * remove deprecated test * remove deprecated test * remove deprecated test * remove deprecated test * remove deprecated test * remove deprecated test * remove deprecated test * Update pytorch_lightning/accelerators/ddp_backend.py Co-authored-by: ananthsub <ananth.subramaniam@gmail.com> * remove deprecated test * remove deprecated test * remove deprecated test Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>	2020-10-03 14:05:31 -04:00
William Falcon	3903cf63c6	ref: training flag tests (val_check_interval) (#3825 ) * added test_val_check_interval tests * added test_val_check_interval tests * added test_val_check_interval tests	2020-10-03 14:05:01 -04:00
ananthsub	8dd37e7c4a	Use fsspec in load to resolve more paths/URLs from storage backends (#3692 ) * special case http for torch hub load * Update CHANGELOG.md * Update test.txt	2020-10-03 13:29:03 -04:00
William Falcon	0fb8c54fda	remove deprecated test (#3820 )	2020-10-03 13:21:10 -04:00
William Falcon	d9bc95f83e	ref: bug fix with logging val epoch end + monitor (#3812 ) * ref: fix metric err * ref: fix metric err * ref: fix metric err * ref: merge * ref: merge * ref: merge * ref: merge * ref: decoupled ddp2 * ref: decoupled ddp2 * ref: decoupled ddp2 * ref: decoupled ddp2 * ref: decoupled ddp2 * ref: clean up ddp before final fix * ref: clean up ddp before final fix * ref: clean up ddp before final fix * ref: clean up ddp before final fix * ref: clean up ddp before final fix * ref: clean up ddp before final fix * ref: clean up ddp before final fix * ref: clean up ddp before final fix * ref: clean up ddp before final fix * ref: clean up ddp before final fix * ref: clean up ddp before final fix	2020-10-03 12:33:29 -04:00
William Falcon	ed1450a293	ref: clean up ddp before final fix (#3817 ) * ref: clean up ddp before final fix * ref: clean up ddp before final fix * ref: clean up ddp before final fix * ref: clean up ddp before final fix * ref: clean up ddp before final fix	2020-10-03 12:01:02 -04:00
William Falcon	0838c6bfce	ref: decoupled ddp2 (#3816 )	2020-10-03 09:02:35 -04:00
Jeff Yang	62320632d4	Some docs update (#3794 ) * docs update * docs update * suggestions * Update docs/source/introduction_guide.rst Co-authored-by: William Falcon <waf2107@columbia.edu>	2020-10-03 08:15:07 -04:00
William Falcon	a677833f84	ref: separate slurm from ddp (#3809 ) * ref: separate slurm from ddp * ref: separate te from ddp * ref: merge * ref: merge * ref: merge	2020-10-02 23:08:34 -04:00
Brendan Fahy	b14c4d4c70	handle fsspec inconsistency in PyArrowHDFS (#3805 )	2020-10-02 22:35:42 -04:00
William Falcon	74484edecd	ref: separate te from ddp (#3810 ) * ref: separate te from ddp * ref: separate te from ddp * ref: separate te from ddp	2020-10-02 21:00:51 -04:00
William Falcon	a28528cc8b	ref: remove weight loading hack for ddp_cpu (#3808 )	2020-10-02 19:28:50 -04:00
William Falcon	afa43837a4	ref: part 8 of #3733 (#3806 )	2020-10-02 18:46:18 -04:00
Jeff Yang	9942f3ebdf	Fix `on_train_batch_start` hook to end epoch early (#3700 ) * init * add test * changelog and docs * fix test * Apply suggestion from code review Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>	2020-10-02 21:46:46 +02:00
ananthsub	3ab730e316	Swap torch.load for fsspec load in ddp spawn backend (#3787 ) * Update ddp_spawn_backend.py * Update ddp_cpu_spawn_backend.py * log Co-authored-by: Jirka Borovec <jirka@pytorchlightning.ai>	2020-10-02 21:00:01 +02:00
ananthsub	192fc018f3	Update model_checkpoint.py (#3801 )	2020-10-02 14:49:46 -04:00
William Falcon	7c6ed1fa28	ref: part 7 of #3733 (#3802 ) * ref: part 7 of #3733 * ref: part 7 of #3733	2020-10-02 14:23:27 -04:00
Jirka Borovec	22efce8f40	fix warning (#3800 )	2020-10-02 13:51:02 -04:00
zcain117	0c12065efd	[TPU CI] Use timestamp+pythonVersion to form the docker image tag. (#3779 ) * Use timestamp+pythonVersion to form the docker image tag. * Remove temporary step to check new env var.	2020-10-02 16:22:47 +02:00
ananthsub	88ad4513c1	Use fsspec with OmegaConf saving in saving.py (#3782 )	2020-10-02 15:37:37 +02:00
Nathan Raw	698f90164c	remove torch<1.3.0 warning from tb logger (#3784 )	2020-10-02 15:36:55 +02:00
Jirka Borovec	62eabdd535	revert backend types (#3788 ) * revert backend types * todo * todo	2020-10-02 06:18:44 -04:00
edenlightning	ab7d9bd1a5	Add link to PL forum in GH questions template (#3708 ) * Update how-to-question.md * Update how-to-question.md * Apply suggestions from code review * typo Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com> Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>	2020-10-02 12:05:46 +02:00
Jirka Borovec	1160270882	fix path in CI for release & python version in all dockers & duplicated badges (#3765 ) * typo * path * check * trigger * fix conda * pip ver * fix cuda * fix XLA * fix xla * ci * docker * BIULD * unBIULD * update * py 3.8 * apex * apex	2020-10-02 05:26:21 -04:00
Akihiro Nitta	ebc1b23fa3	Use `raise .. from ..` to explicitly chain exceptions (#3750 ) * Fix exception chaining * names * Change exception names for consistency Co-authored-by: Nicki Skafte <skaftenicki@gmail.com> * Change exception names for consistency Co-authored-by: Nicki Skafte <skaftenicki@gmail.com> Co-authored-by: Jirka Borovec <jirka@pytorchlightning.ai> Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>	2020-10-01 21:45:44 +02:00
William Falcon	e17712e5c3	part 5 of #3733 (#3774 ) * ref: part 4 of #3733 * ref: part 4 of #3733 * ref: part 4 of #3733	2020-10-01 12:34:12 -04:00
William Falcon	622c5c3982	ref: part 4 of #3733 (#3773 ) * ref: part 4 of #3733 * ref: part 4 of #3733 * ref: part 4 of #3733 * ref: part 4 of #3733	2020-10-01 11:26:58 -04:00
Jeff Yang	128f9ee931	Fix for PyTorch 1.7 CI (#3768 ) * changed to __jit_unsed_properties__	2020-10-01 16:37:00 +02:00
Nicki Skafte	fe290280be	Metric aggregation testing (#3517 ) * aggregation testing * add more tests * mse * more tests * fix tests * fix doctest * fix codefactor * fix import error * fix doctest * revert docfix * test for model integration * fix integration test * added test cases * fix rmsle * aggregation testing * add more tests * mse * more tests * fix tests * fix doctest * fix codefactor * fix import error * fix doctest * revert docfix * test for model integration * fix integration test * fix psnr * add warning/valueerror to embedding similarity * fixed f scores * disable some test * fix tests * fixing codefactor * fix pep8 * changelog * fix doctest * cleaning test * fix pickle error * pickle fix * fix pickle error * Apply suggestions from code review Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * code cleanup + changes based on suggestions * update based on suggestion * update based on suggestions * Apply suggestions from code review Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> Co-authored-by: Nicki Skafte <nugginea@gmail.com> Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>	2020-10-01 15:37:51 +02:00
William Falcon	ac2b0f0f06	ref: continue #3733 (#3767 ) * ref: #3733 part 2 * ref: #3733 part 2	2020-10-01 09:25:33 -04:00
William Falcon	440f837f6d	ref: part a of #3733 (#3766 ) * ref: part a of #3733 * ref: part a of #3733	2020-10-01 08:15:23 -04:00
Nicki Skafte	9a7d1a1876	[metrics] Accuracy num_classes error fix (#3764 ) * change accuracy error to warning * changelog	2020-10-01 13:00:42 +02:00
Lezwon Castelino	8be002ccc7	skip best_model_path if checkpoint_callback is None (#2962 ) * skip best_model_path if checkpoint_callback is None * removed test	2020-10-01 06:57:26 -04:00
GimmickNG	e4e60e9b82	Add datamodule parameter to lr_find() (#3425 ) * Add datamodule parameter to lr_find() * Fixed missing import * Move datamodule parameter to end * Add datamodule parameter test with auto_lr_find * Change test for datamodule parameter * Apply suggestions from code review Co-authored-by: Nicki Skafte <skaftenicki@gmail.com> * Fix lr_find documentation Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> * formatting * Add description to datamodule param in lr_find * pep8: remove trailing whitespace on line 105 * added changelog Co-authored-by: Nicki Skafte <skaftenicki@gmail.com> Co-authored-by: Nicki Skafte <nugginea@gmail.com> Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>	2020-10-01 10:33:12 +02:00
William Falcon	7c61fc7c27	ref: fixes logging for eval steps (#3763 ) * fixes logging for eval steps	2020-10-01 02:31:11 -04:00
Teddy Koker	5ec00ccd28	Added gradient clip test for native AMP (#3754 ) * added gradient clip test for fp16 * pep8	2020-10-01 01:36:34 -04:00
William Falcon	a38d108a68	add dist lib to enable syncing anything across devices (#3762 ) * add dist lib to enable syncing anything across devices	2020-10-01 01:21:38 -04:00
William Falcon	cf182e80fc	Finish Allow on_save_checkpoint... (#3688 ) * Finish #3562 * Apply suggestions from code review * Apply suggestions from code review * fix tests * Finish #3562 * Apply suggestions from code review * Apply suggestions from code review * fix tests * fix structure * fix structure * make save_last test pass * unnecessary global rank check * fix test * update test * update test * test * test * run save on all * remove assert * tracking saves * check if fails * test * clean up * adjust horovod test * clean up * remove unnecessary makdirs * change * undo * debug * debug * debug * debug * mock * undo debug code * add extra assertions * test Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> Co-authored-by: Adrian Wälchli <adrian.waelchli@inf.unibe.ch> Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>	2020-09-30 16:15:29 -04:00
ananthsub	1eb1d17e25	Add trainer attribute to datamodule (#3749 ) * Split out changes from #3563 to make that PR easier to review. This formats the file according to the Black formatter * Store a reference to the trainer on the datamodule Fixes #3682 * Update data_connector.py * Update data_connector.py * Update test_datamodules.py * Add attribute to datamodule for trainer	2020-10-01 00:41:19 +05:30

3447 Commits
