lightning

Commit Graph

Author	SHA1	Message	Date
Adrian Wälchli	cc9781a0ad	Deprecate early_stop_callback Trainer argument (part 2) (#3845 ) * update tests with EarlyStopping default * imports * revert legacy tests * fix test * revert * revert	2020-10-04 17:36:47 -04:00
Rohit Gupta	a628d181ee	Fix val_progress_bar total with num_sanity_val_steps (#3751 ) * Fix val_progress_bar total with num_sanity_val_steps * chlog * Fix val_progress_bar total with num_sanity_val_steps * move test * replaced with sanity flag and suggestions	2020-10-04 08:32:18 -04:00
William Falcon	d9bc95f83e	ref: bug fix with logging val epoch end + monitor (#3812 ) * ref: fix metric err * ref: merge * ref: decoupled ddp2 * ref: clean up ddp before final fix	2020-10-03 12:33:29 -04:00
William Falcon	a38d108a68	add dist lib to enable syncing anything across devices (#3762 ) * add dist lib to enable syncing anything across devices	2020-10-01 01:21:38 -04:00
William Falcon	cf182e80fc	Finish Allow on_save_checkpoint... (#3688 ) * Finish #3562 * Apply suggestions from code review * Apply suggestions from code review * fix tests * Finish #3562 * Apply suggestions from code review * Apply suggestions from code review * fix tests * fix structure * fix structure * make save_last test pass * unnecessary global rank check * fix test * update test * update test * test * test * run save on all * remove assert * tracking saves * check if fails * test * clean up * adjust horovod test * clean up * remove unnecessary makdirs * change * undo * debug * debug * debug * debug * mock * undo debug code * add extra assertions * test Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> Co-authored-by: Adrian Wälchli <adrian.waelchli@inf.unibe.ch> Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>	2020-09-30 16:15:29 -04:00
Adrian Wälchli	c73032e39d	Make ModelCheckpoint(save_top_k=-1) track the best models (#3735 ) * fix topk=-1 tracking best * update test * clean up * add changelog * enable loading best topk in trainer.test() * make trivial * return right away * make windows test path happy	2020-09-30 08:34:02 -04:00
Carlos Mocholí	3b2efe5b2a	Fix ModelCheckpoint period (#3630 ) * Fix ModelCheckpoint period * Remove comma * Minor changes * skip check * Revert "skip check" Already pushed to master This reverts commit `00d9e77b81`. Co-authored-by: Jirka Borovec <jirka@pytorchlightning.ai>	2020-09-29 15:36:45 +02:00
William Falcon	931995b55b	remove flake 8 (#3687 )	2020-09-27 20:40:02 -04:00
Adrian Wälchli	d15fd751c7	change default save_top_k, save_last to None (#3680 ) * topk default * fix test that doesn't have best available * remove print * #3680 changes * fix backward * temp revert te * add warning by carmocca * format docstring for test * specify monitor in ES test with top k * improve docstring for save_last * remove commented lines * revert passing model to test * undo regex mistake * changelog * fix test covering case monitor=None and savetopk=-1 * docstring * fix test for saving all checkpoints * don't save checkpoints for save_top_k=0 * add test for savetopk=0 Co-authored-by @carmocca Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>	2020-09-27 20:05:02 -04:00
Pariente Manuel	3d76f604bd	Add ModelCheckpoint.to_yaml method (#3048 ) * Add ModelCheckpoint.to_json() * Add ModelCheckpoint.to_json() test * Fix W292: Add new line at end of file * Apply suggestions from code review * Apply suggestions from code review Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com> * Fixed tests * Update pytorch_lightning/callbacks/model_checkpoint.py * Apply suggestions from code review * fix test Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com> Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>	2020-09-27 14:39:40 +02:00
William Falcon	d79bce1dff	enable None model checkpoint default (#3669 ) * enable None model checkpoint default	2020-09-26 23:14:04 -04:00
Carlos Mocholí	e70aea7642	Allow ModelCheckpoint monitor to be None (#3633 ) * Fix ModelCheckpoint period * Test for less epochs	2020-09-25 15:54:04 +02:00
Carlos Mocholí	ed12e422a4	Fix incorrect "Saving latest checkpoint" warning (#3588 ) * Fix incorrect "Saving latest checkpoint" warning * Replace warning with info. Run PyCharm's optimize imports * Remove unused class variable. Refactor logic. Improve test * Fix De Morgan's	2020-09-25 14:18:06 +02:00
Carlos Mocholí	908382f196	Split GPUStatsMonitor function (#3644 ) * Split function * Add docstrings * Add typing annotations * Minor refactor * Make static to add a test	2020-09-25 07:30:30 +02:00
Carlos Mocholí	1223cdbaa1	Add missing line. Add a test (#3594 )	2020-09-21 22:17:51 -04:00
William Falcon	21cfdf6874	ref: result 1/n (make monitor default to checkpoint_on to simplify re… (#3571 ) * ref: result 1/n (make monitor default to checkpoint_on to simplify result syntax) * Update pytorch_lightning/callbacks/model_checkpoint.py Co-authored-by: ananthsub <ananth.subramaniam@gmail.com> * ref: result 1/n (make monitor default to checkpoint_on to simplify result syntax) * force crash when max_epochs < epochs in a checkpoint Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>	2020-09-20 22:58:43 -04:00
William Falcon	277538970d	force crash when max_epochs < epochs in a checkpoint (#3580 ) * force crash when max_epochs < epochs in a checkpoint	2020-09-20 22:04:22 -04:00
Jirka Borovec	8eb77cd06a	drop v0.10 deprecated (#3454 ) * drop v0.10 deprecated * import * missed	2020-09-19 11:47:26 -04:00
Carlos Mocholí	580b04b490	Fix ModelCheckpoints name formatting (#3163 ) * Fix ModelCheckpoint's name formatting * Fix failing tests * Add dot to CHECKPOINT_SUFFIX * Set variables to their default values at the end of tests * Fix logic for filepath='' and filename=None. Add test * Fix Windows tests * Fix typo. Remove leading line break and zeroes * Remove CHECKPOINT_SUFFIX * Fix typos. Use appropriate f-string format * Apply suggestions from code review * Fix broken tests after #3320 * Finish changes suggested by Borda * Use explicit test var names * Apply suggestions Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com> * Apply suggestions Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com> * Update CHANGELOG * Apply suggestions from code review * for * prepend whitespace in warn msg Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com> Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>	2020-09-18 23:09:11 +02:00
Lucas Steinmann	197acd535f	Fix early stopping with training step's return dict (#3347 ) * Fixes the test for early stopping without val step. The expression which checked, if early stopping was triggered, had an off-by-one error and hence was true even if early stopping was not triggered. Furthermore set patience to 0 and max epochs to 10, to ensure loss has enough time to flatten. * Fixes early stopping without val step. The issue has been, that only `early_stop_on` key was checked and not an arbitrary monitor key. * Fixes branch, which checks whether early stopping is done during validation. Before only `val_early_stop_on` was checked. Since arbitrary keys can be used, the set of possible validation keys cannot be exhaustive. Hence this disables "early stopping on_train_epoch_end" via an instance attribute if early stopping was executed in on_validation_epoch_end. Furthermore adds a test, which ensures arbitrary keys work. * Improve check whether eval results are used. Only disable early checking with train results if eval results are actually used. Before they were always disabled in ``on_validation_epoch_end``. Rename and document instance variable, to make it more clear. * Remove wrong documentation on behaviour of early stopping with train result' dict. * Apply suggestions from code review * Apply suggestions from code review Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>	2020-09-18 23:08:04 +02:00
ananthsub	d1d48e2ea1	Fix trivial comparison in model checkpoint test (#3464 ) We were comparing keys across the same checkpoint dict instead of ckpt_last vs ckpt_last_epoch All other changes here are formatting	2020-09-11 20:50:46 +02:00
Rohit Gupta	24809b0b26	Refactor GPUStatsMonitor to improve training speed (#3257 ) * Refactor GPUMonitor to improve training speed * added gpu ids to monitor * update tests * added deprecation warning * pep * fix test * fix docs * fix log_gpu_memory * move deprecation check * chlog * Update CHANGELOG.md * suggestions and fix Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>	2020-09-04 06:02:16 -04:00
Rohit Gupta	4a22fca524	Changed LearningRateLogger to LearningRateMonitor (#3251 ) * Change LearningRateLogger to LearningRateMonitor * file rename * docs * add LearningRateLogger with deprecation warning * deprecated LearningRateLogger * move deprecation check * chlog Co-authored-by: Jirka Borovec <jirka@pytorchlightning.ai>	2020-09-03 18:17:15 +00:00
Jeremy Jordan	a5d1176cf6	callback method for on_save_checkpoint (#2501 ) * initial draft * fix test * Update pytorch_lightning/trainer/callback_hook.py Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * fix tests * remove old code * untested upgrade script * document limitations * clean up and add tests * Update pytorch_lightning/trainer/training_io.py Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * reflect PR comments * fix formatting * Update docs/source/callbacks.rst * clarify docs * revert change for loading checkpoints * small edits Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>	2020-08-28 16:50:52 +02:00
Rohit Gupta	f03943ee94	Fix GpuUsageLogger to work on different platforms (#3008 ) * Fix GpuUsageLogger * docstrings * misconfigexception * add basic tests * skip doctest * fix parameter and docstring * rm cl * skip doctest * cleanup * chlog * add suggestions from review * add test from suggestions * fix import * fix test * fix test * fix test * fix test * rename GpuUsageLogger to GPUStatsMonitor * doc fix * Apply suggestions from code review * update docs format * update docs * miss * merge * fix title formatting * unindent * punctuation * simplify if statements * fix test * suggestions * pep * Update CHANGELOG.md Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> * fix on_train_batch_* * use AttributeDict * usage * rank zero Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * import * minor changes Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> Co-authored-by: Adrian Wälchli <adrian.waelchli@inf.unibe.ch> Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>	2020-08-27 19:50:32 +02:00
William Falcon	a1705441a9	ref: remove _evaluate fx (#3197 ) * remove _evaluate	2020-08-26 12:28:14 -04:00
William Falcon	bda1400225	ref: restore on_eval_start hook (#3183 ) * restore eval loop hook	2020-08-26 00:45:43 -04:00
William Falcon	2f6d82e0e6	ref: remove on_eval_start hook (#3176 ) * remove on_eval_start hook	2020-08-25 22:28:00 -04:00
William Falcon	3453bba898	re-enabled naming metrics in ckpt name (#3060 ) * re-enabled naming metrics in ckpt name	2020-08-19 20:34:09 -04:00
Caldera	6c18fd9a24	Update lr_logger.py (#2847 ) * Update lr_logger.py when logging learning_rate, we should provide different choices to log including 'step' and 'epoch' * Update lr_logger.py add some type annotations and docstrings * Update lr_logger.py fixed a bug where `on_train_batch_start()` can't be triggered, instead, we should use on_batch_start(); add `interval` args so that we can record learning_rates with respect to `global_step` or `current_epoch`. * Update lr_logger.py restore _extract_lr() * suggestion * Update lr_logger.py modify _extract_lr(), it no more need to pass `interval` parameter. * Update test_lr_logger.py SkafteNicki 's suggetion * log_interval now supports `None`, `step`, `epoch` * change `log_interval` to `logging_interval` * Update test_lr_logger.py * Update lr_logger.py * put types check into `on_train_start()` * cleanup * docstring typos * minor changes from suggestions Co-authored-by: Jirka Borovec <jirka@pytorchlightning.ai> Co-authored-by: rohitgr7 <rohitgr1998@gmail.com> Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>	2020-08-09 16:30:43 +00:00
Rohit Gupta	4d0406ec8b	deepcopy model state_dict in tests (#2887 ) Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>	2020-08-08 16:13:06 +00:00
Adrian Wälchli	f798cffd02	save last model after saving top_k when save_last=True (#2881 ) * save_last should be last * changelog * seed, docs * retrigger ci * compare filenames * move constants * fix test * epoch, global step * improve test	2020-08-08 06:02:43 -04:00
William Falcon	f82d7feb6c	updated hooks (#2850 ) * modified hooks	2020-08-07 09:29:57 -04:00
Jirka Borovec	ed3ee982b3	clean tests imports (#2834 )	2020-08-06 16:58:51 +02:00
William Falcon	b507c42c47	clarify batch hooks (#2842 ) * modified hook	2020-08-05 20:01:30 -04:00
Jirka Borovec	06e8910f06	pytorch 1.6 (#2745 ) * pt 1.6 * don't use the new zipfile serialization for now * quick flake8 fixes * remove unnecessary f * coalesce strings * remove comma * remove extra commas * Apply suggestions from code review Co-authored-by: Peter Yu <2057325+yukw777@users.noreply.github.com> * set _use_new_zipfile_serialization to False only for pytorch 1.6.0 * remove unnecessary comments * flake8 fixes * use pkg_resources instead of packaging * readme * format * version * chlog Co-authored-by: Peter Yu <peter@asapp.com> Co-authored-by: Peter Yu <2057325+yukw777@users.noreply.github.com>	2020-07-31 11:18:32 +02:00
Rohit Gupta	84c507c4df	Fix max_batches with fast_dev_run. (#2581 ) * Fix fast_dev_run to run for all val_dataloaders * fast_dev_run check * changelog * explicit * limit_batches with fast_dev_run in init * add test * whitespace and comment fix * comment and assertion * added tests * Fix fast_dev_run to run for all val_dataloaders * fast_dev_run check * changelog * explicit * limit_batches with fast_dev_run in init * add test * whitespace and comment fix * comment and assertion * added tests * added tests * added tests * added tests * update rtol * Revert "update rtol" This reverts commit `4320329540`. * added tests Co-authored-by: William Falcon <waf2107@columbia.edu>	2020-07-27 17:56:55 -04:00
Adrian Wälchli	d03953260d	Fix weights_save_path when logger is used + simplify path handling + better docs (#2681 ) * fix weights_save path and drop ckpt_path * add tests * unused import * update docs * changelog * pep8 * fix horovod test * make backward compatible * perform same test for all loggers * fix for when logger=False and weights_save_path is set * update changelog * update docs * update tests * do not set save dir dynamically * remove duplicate test * remove duplicated tests * update tests * update tests * remove remaining ckpt_path references * move defaults to init as suggested by @Borda * test deprecation	2020-07-27 12:53:11 -04:00
Adrian Wälchli	938ec5a6c1	remove duplicate tests (#2685 ) * remove duplicate test * remove duplicated tests	2020-07-24 08:15:40 -04:00
William Falcon	6d10ac2ac8	Structured results (train loop only. val loop separate PR) (PR 2/5) (#2615 ) * r * patched optimizer closure with sr * added train step structured result * added autoreduce for train step * added auto reduce on train * added hooks * finished tests for structured results on train epoch * cache * finished tests for structured results on train epoch * Update pytorch_lightning/callbacks/early_stopping.py Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> * Update pytorch_lightning/callbacks/model_checkpoint.py * Update pytorch_lightning/core/step_result.py * finished tests for structured results on train epoch * Apply suggestions from code review Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com> * simple * finished tests for structured results on train epoch * simple * revert * finished tests for structured results on train epoch * Update tests/base/deterministic_model.py Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> * finished tests for structured results on train epoch * docstring typos * finished tests for structured results on train epoch * Update pytorch_lightning/core/step_result.py Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * Update pytorch_lightning/overrides/data_parallel.py Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> Co-authored-by: Jirka <jirka@pytorchlightning.ai> Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>	2020-07-20 19:00:20 -04:00
William Falcon	b73812648f	don't pass tpu weights back on test (#2566 ) * enable none checkpoint	2020-07-09 12:11:56 -04:00
William Falcon	11069c8784	Fix ddp tests + .test() (#2512 ) * added base tests for tpu * fix deprecation warnings * added base tests for tpu * Update pytorch_lightning/trainer/trainer.py Co-authored-by: Jeremy Jordan <13970565+jeremyjordan@users.noreply.github.com> * added base tests for tpu Co-authored-by: Jirka <jirka@pytorchlightning.ai> Co-authored-by: Jeremy Jordan <13970565+jeremyjordan@users.noreply.github.com>	2020-07-07 12:24:56 -04:00
Adrian Wälchli	25ee51bc57	Continue Jeremy's early stopping PR #1504 (#2391 ) * add state_dict for early stopping * move best attr after monitor_op defined * improve early stopping and model checkpoint callbacks * fix formatting * fix attr init order * clean up setting of default_root_dir attr * logger needs default root dir set first * reorg trainer init * remove direct references to checkpoint callback * more fixes * more bugfixes * run callbacks at epoch end * update tests to use on epoch end * PR cleanup * address failing tests * refactor for homogeneity * fix merge conflict * separate tests * tests for early stopping bug regressions * small fixes * revert model checkpoint change * typo fix * fix tests * update train loop * cannot pass an int as default_save_path * refactor log message * fix test case * appease the linter * fix some doctests * move config to callback * fixes from rebase * fixes from rebase * chlog * docs * reformat * formatting * fix * fix * fixes from rebase * add new test for patience * Update pytorch_lightning/callbacks/model_checkpoint.py Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> * Update pytorch_lightning/callbacks/model_checkpoint.py Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> * Update tests/callbacks/test_early_stopping.py Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> * fix formatting * remove enable_early_stop attribute * add state_dict for early stopping * move best attr after monitor_op defined * improve early stopping and model checkpoint callbacks * fix formatting * fix attr init order * clean up setting of default_root_dir attr * logger needs default root dir set first * reorg trainer init * remove direct references to checkpoint callback * more fixes * more bugfixes * run callbacks at epoch end * update tests to use on epoch end * PR cleanup * address failing tests * refactor for homogeneity * fix merge conflict * separate tests * tests for early stopping bug regressions * 
small fixes * revert model checkpoint change * typo fix * fix tests * update train loop * fix test case * appease the linter * fix some doctests * move config to callback * fixes from rebase * fixes from rebase * chlog * docs * reformat * formatting * fix * fix * fixes from rebase * add new test for patience * Update pytorch_lightning/callbacks/model_checkpoint.py Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> * Update pytorch_lightning/callbacks/model_checkpoint.py Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> * Update tests/callbacks/test_early_stopping.py Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> * fix formatting * remove enable_early_stop attribute * fix test with new epoch indexing * fix progress bar totals * fix off by one error (see #2289) epoch starts at 0 now * added missing imports * fix hpc_save folderpath * fix formatting * fix tests * small fixes from a rebase * fix * tmpdir * tmpdir * tmpdir * wandb * fix merge conflict * add back evaluation after training * test_resume_early_stopping_from_checkpoint TODO * undo the horovod check * update changelog * remove a duplicate test from merge error * try fix dp_resume test * add the logger fix from master * try remove default_root_dir * try mocking numpy * try import numpy in docs test * fix wandb test * pep 8 fix * skip if no amp * dont mock when doctesting * install extra * fix the resume ES test * undo conf.py changes * revert remove comet pickle from test * Update CHANGELOG.md Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> * Update weights_loading.rst * Update weights_loading.rst * Update weights_loading.rst * renamed flag * renamed flag * revert the None check in logger experiment name/version * add the old comments * _experiment * test chckpointing on DDP * skip the ddp test on windows * cloudpickle * renamed flag * renamed flag * parentheses for clarity * apply suggestion max epochs Co-authored-by: Jirka Borovec 
<Borda@users.noreply.github.com> Co-authored-by: Jeremy Jordan <jtjordan@ncsu.edu> Co-authored-by: Jirka <jirka@pytorchlightning.ai> Co-authored-by: Jeremy Jordan <13970565+jeremyjordan@users.noreply.github.com> Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> Co-authored-by: William Falcon <waf2107@columbia.edu>	2020-06-28 21:36:46 -04:00
Jirka Borovec	f1c96930b1	repair CI for Win (#2358 ) * no cov * no cov * ReduceOp * group * reduce_op.sum * Update sklearns.py * formatting * horovod * Apply suggestions from code review * horovod * horovod * horovod * horovod * ci * print * ci * timeout * timeout * time * fix * distributed cpu * pipes * time * cpu * spawn * spawn * spawn * tp * separate * os * os * npm * Fix load_from_checkpoint() not working with URL on Windows * Update CHANGELOG * Update CHANGELOG.md Co-authored-by: Peter Yu <2057325+yukw777@users.noreply.github.com> * Apply suggestions from code review * fix * fix meta tags creating empty lines * pyright * node * fix httpserver address * drop tutils.default_trainer_options * imports * Better fix for load_from_checkpoint() not working with absolute path on Windows (#2294) * Fix load_from_checkpoint() not working with URL on Windows * Update CHANGELOG * Update CHANGELOG.md Co-authored-by: Peter Yu <2057325+yukw777@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> Co-authored-by: Peter Yu <2057325+yukw777@users.noreply.github.com> * drop duplicate Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com> Co-authored-by: airium <airium@outlook.com> Co-authored-by: Peter Yu <2057325+yukw777@users.noreply.github.com> Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> Co-authored-by: AIRIUM <38249940+airium@users.noreply.github.com>	2020-06-26 21:38:25 -04:00
Jirka Borovec	f278ac42c8	Revert/Fix: epoch indexing from 1, to be from 0 (#2289 ) * Revert "deprecated: epoch indexing from 1 (#2206)" This reverts commit `f94b919b` * chlog * grad index * Apply suggestions from code review * tests * fix * test	2020-06-19 23:39:53 -04:00
William Falcon	03ab574b0f	decrease some training times (#2256 )	2020-06-18 23:30:16 -04:00
William Falcon	79e1426161	Docs clean-up (#2234 ) * update docs	2020-06-18 08:29:18 -04:00
William Falcon	34816e9ec4	adds setup+teardown hook (#2229 ) * allow regression metrics to import	2020-06-17 19:49:58 -04:00
William Falcon	2411c3be70	replace train_percent_check with limit_train_batches (#2220 ) * drop train_percent_check * chlog * deprecated * tests * Apply suggestions from code review * tests * hydra support * tests * hydra support * tests * typo * Update test_dataloaders.py * docs Co-authored-by: Jirka <jirka@pytorchlightning.ai> Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>	2020-06-17 13:42:28 -04:00
William Falcon	04c794ca72	[WIP] Rename overfit_pct to overfit_batches (and fix) and val_percent_check and test_percent_check (and fix) (#2213 ) * fixed percent check for val/test * overfit_pct now uses train loaders for val and test and does not shuffle * add on fit_start on fit_end hooks Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com> Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>	2020-06-17 08:03:28 -04:00

72 Commits