Commit Graph

208 Commits

Author SHA1 Message Date
Carlos Mocholí 321689f52e
Add `ModelCheckpoint(save_on_train_epoch_end)` (#8389)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-07-13 14:47:59 +00:00
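A minimal usage sketch for the flag added in #8389 (hedged; `val_loss` and `MyModel` are illustrative names, not part of the commit):

```python
from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import ModelCheckpoint

# Defer the checkpoint decision to the end of validation instead of the end
# of the training epoch by setting the new flag to False.
checkpoint_cb = ModelCheckpoint(
    monitor="val_loss",             # assumed metric logged by the LightningModule
    save_on_train_epoch_end=False,  # flag introduced in #8389
)

trainer = Trainer(callbacks=[checkpoint_cb], max_epochs=10)
# trainer.fit(MyModel())  # MyModel is a hypothetical LightningModule
```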
Carlos Mocholí c4353ea702
Remove `dev_debugger.call_count` (#8317) 2021-07-07 19:59:59 +02:00
Carlos Mocholí 441e16f61c
Default `EarlyStopping.check_on_train_epoch_end=True` (#8286) 2021-07-05 15:45:23 +02:00
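With #8286 the check defaults to the end of the training epoch; a sketch of overriding it explicitly (assuming a logged `val_loss` metric):

```python
from pytorch_lightning.callbacks import EarlyStopping

# Pass False to keep evaluating the monitored metric at the end of
# validation rather than at train-epoch end (the new default).
early_stop = EarlyStopping(monitor="val_loss", patience=3, check_on_train_epoch_end=False)
```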
Kaushik B 3a8322deda
Add XLAStatsMonitor Callback (#8235) 2021-07-05 17:09:46 +05:30
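A sketch of attaching the callback added in #8235 (assumes a TPU environment with `torch_xla`; `tpu_cores=8` is illustrative):

```python
from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import XLAStatsMonitor

# Logs XLA device utilisation and memory statistics during training.
trainer = Trainer(tpu_cores=8, callbacks=[XLAStatsMonitor(verbose=True)])
```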
Adrian Wälchli e7139ab9f7
Support `DDPPlugin` to be used on CPU (#6208)
* Skip test due to 'Python bus error'

* Debug NCCL

* Remove NCCL_DEBUG statement

* Revert "Skip test due to 'Python bus error'"

This reverts commit e0a3e8785d.

* fix

* add test

* changelog

* yapf

* patch os environ

* make a special test

* destroy pg

* debug

* revert

* revert

* problematic test

* skip

* try the fixture

* test

* update sensitive test

* update changelog

* remove comment

* update wrong test

* update test name

* parameterization

* Revert "parameterization"

This reverts commit b0542f43f59c5ce66800883b5e2f0c66a97408cc.

* remove conftest

* ignore test

* teardown

* fix merge

* deep speed parameterization

* uncomment test

* update chlog

* update changelog

* split tests

* update test

* update test

* update test

* update test

* update test comments

* unroll test

* unroll test

* unroll test

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* increase shm

* sudo

* unroll ipu

* Revert "sudo"

This reverts commit 6cc68c1478.

* Revert "increase shm"

This reverts commit 8c27163483.

* x

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* find guilty test

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* POPTORCH_WAIT_FOR_IPU=1

* move test

* redo parameterize for ipu

* de-comment test

* move chlog

* Update tests/accelerators/test_accelerator_connector.py

Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>

* Update tests/accelerators/test_accelerator_connector.py

Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>

Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-07-02 12:00:24 +01:00
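Per the #6208 title, `DDPPlugin` can now be combined with CPU processes; a hedged sketch (the exact `Trainer` arguments are an assumption drawn from the PR title, not from its diff):

```python
from pytorch_lightning import Trainer
from pytorch_lightning.plugins import DDPPlugin

# Two CPU worker processes coordinated with DDP; find_unused_parameters=False
# skips the extra graph traversal when every parameter receives a gradient.
trainer = Trainer(
    num_processes=2,
    plugins=[DDPPlugin(find_unused_parameters=False)],
)
```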
Carlos Mocholí a2e41045d2
Mark some loop attributes as protected (#8250) 2021-07-02 11:51:51 +01:00
Justus Schock d6435a5b73
Bugfix/swa iterable dset (#8172)
* add test

* add fix

* Update CHANGELOG.md
2021-06-28 21:18:25 +00:00
Ethan Harris 2a372e3682
Fix module dict in base finetuning (#8170)
* Fix module dict in base finetuning

* Update CHANGELOG.md
2021-06-28 10:55:32 +00:00
deepsource-autofix[bot] e11fe19673
Remove unnecessary use of comprehension (#8149)
Co-authored-by: deepsource-autofix[bot] <62050782+deepsource-autofix[bot]@users.noreply.github.com>
2021-06-27 10:00:02 +01:00
Adrian Wälchli 4becd1cf31
rename old `Trainer.train_loop` -> `Trainer.fit_loop` (#8025) 2021-06-22 11:49:32 +02:00
Carlos Mocholí f1fa4c4727
Update fit with val hook test (#8060) 2021-06-21 17:27:37 +00:00
simran2905 d1efae2e47
Fix checkpointed state for lr_schedulers with step interval (#7877)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-06-21 15:08:07 +00:00
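For context on #7877, a sketch of the configuration it concerns: a scheduler stepped every optimizer step via the `"interval": "step"` entry (class and hyperparameters are illustrative):

```python
import torch
from pytorch_lightning import LightningModule

class MyModule(LightningModule):  # hypothetical module, rest of the class elided
    def configure_optimizers(self):
        optimizer = torch.optim.Adam(self.parameters(), lr=1e-3)
        scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=100)
        # "interval": "step" steps the scheduler after every optimizer step;
        # #7877 fixes how this scheduler state is saved to and restored from checkpoints.
        return {
            "optimizer": optimizer,
            "lr_scheduler": {"scheduler": scheduler, "interval": "step"},
        }
```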
Carlos Mocholí e55f01e665
Update evaluation hook tests (#8013) 2021-06-18 16:41:27 +00:00
Adrian Wälchli eebdc910dd
progressive restoring of trainer state (#7652) 2021-06-17 08:13:53 +00:00
Austin Basye 906de2a7fa
[feat] Named Parameter Groups in `LearningRateMonitor` (#7987)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-06-17 03:13:54 +02:00
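A sketch of the feature in #7987, under the assumption that groups are named via a `"name"` key on each optimizer parameter group, which the monitor then uses in its logged keys:

```python
import torch
from torch import nn
from pytorch_lightning.callbacks import LearningRateMonitor

backbone, head = nn.Linear(8, 8), nn.Linear(8, 2)  # placeholder modules

# Assumed usage: a "name" key per param group lets the monitor log
# per-group learning rates under readable names instead of indices.
optimizer = torch.optim.SGD(
    [
        {"params": backbone.parameters(), "lr": 1e-4, "name": "backbone"},
        {"params": head.parameters(), "lr": 1e-2, "name": "head"},
    ]
)

lr_monitor = LearningRateMonitor(logging_interval="epoch")  # attach via Trainer(callbacks=[...])
```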
Carlos Mocholí 4ffba600c9
Add predict hook test (#7973) 2021-06-16 15:09:24 +02:00
Adrian Wälchli 971908a1aa
Loop Refactor 1/N - Training Loop (#7871)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: Justus Schock <justus.schock@rwth-aachen.de>
Co-authored-by: Justus Schock <justus.schock@posteo.de>
Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
2021-06-15 12:55:06 +00:00
Dan Dale 3a0ed02bd4
Properly handle parent modules w/ parameters in `BaseFinetuning` callback (#7931)
Co-authored-by: Daniel Dale <dan@distributedinsight.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-06-14 16:01:07 +00:00
Adrian Wälchli c1eac483e9
split `restore_training_state` into logical parts [2 / 2] (#7900) 2021-06-10 21:54:21 +02:00
Carlos Mocholí ec4f8856af
Enable logger connector re-design (#7891)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-06-09 14:24:45 +00:00
Carlos Mocholí 5593b6f772
Merge pull request #7872 from PyTorchLightning/refactor/logger-poc-changes
Random fixes for logger connector PoC
2021-06-08 09:04:16 -04:00
thomas chaton ea71cf4a5f
[Test] Add extra test for val_check_interval in distributed scenario (#7863)
* add extra test

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add computation

* Update docs/source/common/trainer.rst

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Update docs/source/common/trainer.rst

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Update tests/trainer/test_dataloaders.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* use tmpdir

* update on comments

* update

* Update tests/callbacks/test_progress_bar.py

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-06-07 10:37:32 +00:00
thomas chaton d1becce4c1
[bugfix] Resolve LearningRateMonitor + BackboneFinetuning (#7835)
* add test + resolve bug

* update changelog

* resolve bug

* resolve bug

* Update pytorch_lightning/callbacks/lr_monitor.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update pytorch_lightning/callbacks/lr_monitor.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* update on comments

* resolve comments

* update

* Update tests/callbacks/test_lr_monitor.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Update pytorch_lightning/callbacks/lr_monitor.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-06-07 10:17:11 +00:00
Sean Naren 10839376e2
[IPU] Add special tests for IPUs 2/n (#7833)
* Add special tests for IPUs, run nvprof only if cuda available

* Add missing min_gpu
2021-06-04 23:23:09 +05:30
Adrian Wälchli 7e6010fc93
fix info message when max training time reached (#7780)
* call time_elapsed

* elapsed formatting

* format

* update test

* changelog
2021-05-31 14:50:16 +02:00
Carlos Mocholí 311d9fe67e
Always run validation inside the training loop epoch (#7357)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-05-26 14:26:48 +02:00
Carlos Mocholí d26953c8bc
Add `ModelPruning(prune_on_train_epoch_end)` to choose when to apply pruning (#7704)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-05-26 00:57:56 +02:00
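A usage sketch for the flag added in #7704 (pruning function and amount are illustrative):

```python
from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import ModelPruning

# The new flag chooses whether pruning runs at the end of the training
# epoch (True) or at the end of validation (False).
pruning = ModelPruning("l1_unstructured", amount=0.2, prune_on_train_epoch_end=True)
trainer = Trainer(callbacks=[pruning])
```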
Carlos Mocholí e2ead9abd7
Refactor some loops code and hook tests (#7682) 2021-05-25 13:27:54 +02:00
Gyeongjae Choi a54bc5dba3
Fix progress bar print error when called before training (#7674)
* Check progress bar existence before printing

* Add tests for predict_progress_bar

* Add tests for progress_bar printing without training

* Update changelog
2021-05-24 17:33:28 +02:00
Yifu Wang ed271905cf
Clear predict_progress_bar in ProgressBar.__getstate__ (#7608)
Co-authored-by: Yifu Wang <yifuwang@2012@gmail.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-05-20 01:38:49 +00:00
Adrian Wälchli a1a655d006
Reduce log output size in special tests (#7481) 2021-05-11 17:36:20 +02:00
Adrian Wälchli ad9118f04a
remove trainer hidden state | sanity refactor [1 / n] (#7437)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-05-11 11:09:08 +02:00
ananthsub 7b45bcfedb
[2/2] Remove outputs from evaluation epoch end hooks (#7338)
* Remove outputs from on_train_epoch_end

* iterate

* Update callback_hook.py

* update

* early stop?

* fix

* Update pytorch_lightning/trainer/training_loop.py

Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>

* Update trainer.py

* update

* Update training_loop.py

* early stop?

* fix

* Remove outputs from evaluation epoch end hooks

* update

* Update test_remove_1-5.py

* fix lints

* Update base.py

* rm-outputs

* Update evaluation_loop.py

* try-save-more-memory

* Update trainer.py

* Update trainer.py

* cache-at-start

* Update evaluation_loop.py

* Update training_loop.py

* Update training_loop.py

Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
2021-05-05 19:50:58 +00:00
ananthsub 6104a6316a
[1/2] Deprecate `outputs` in `on_train_epoch_end` hooks (#7339)
* Remove outputs from on_train_epoch_end

* iterate

* Update callback_hook.py

* update

* Update training_loop.py

* Update test_training_loop.py

* early stop?

* fix

* update tests

* Update test_hooks.py

* Update pytorch_lightning/trainer/callback_hook.py

Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>

* Update pytorch_lightning/trainer/training_loop.py

Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>

* Update trainer.py

* Update pytorch_lightning/trainer/trainer.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-05-05 17:18:16 +02:00
Carlos Mocholí 8c0ea92af2
`TrainerState` refactor [5/5] (#7173)
* `TrainerState` refactor

* flake8

* Update finished check

* Test cleanup

* Fix tests

* Fixes

* Reorder

* flake8

* Update CHANGELOG

* Better docs

* Better docs

* Remove default

* Update tests

* Bad merge
2021-05-04 12:50:56 +02:00
thomas chaton 80b9ca0e38
[bugfix] Add reloading support using BaseFinetuning (#7253)
* update

* wip

* update

* update

* update

* update

* resolve bug

* update on comments

* update on comments

* update

* update

* formatting

* add comments

* update on comments

* update

* Update pytorch_lightning/callbacks/base.py

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* update

* update

* Typing and minor changes

* Refactor

* Fix deprecated test

* Broken commit

* Fix broken commit

* flake8

* Update CHANGELOG

* update on comments

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-04-30 11:14:43 -04:00
ananthsub 14b8dd479a
[2/2] Remove training loop force calling early stopping callback (#7069)
* rebase

* doc

* Update training_loop.py

* Update CHANGELOG.md

* Update CHANGELOG.md

* Update CHANGELOG.md

* Update CHANGELOG.md

* Update CHANGELOG.md
2021-04-29 09:14:53 -07:00
Carlos Mocholí 40f80230fe
Remove `trainer.fit` return value [2/n] (#7237)
* `_fit_impl` refactor and types

* Fix return

* Remove return docstring

* Fixes

* Fixes

* Remove `trainer.fit` return value

* Update CHANGELOG

* flake8

* Undo results change

* Fix test

* Revert changes for a separate PR

* flake8
2021-04-28 19:11:32 +01:00
ananthsub 947d1cb757
[1/2] Add support for early stopping during training epoch end (#6944)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: jirka <jirka.borovec@seznam.cz>
2021-04-28 15:18:56 +02:00
thomas chaton e76ebd640e
[feat] Add BasePredictionWriter 3/3 (#7127)
* wip

* update

* update

* update

* update

* update

* typo

* update on comments

* update

* update

* update

* update

* update changelog

* update

* Fix merge

* Fix merge

* move code

* resolve test

* add extra test

* add an extra test

* update on comments

* add typing

* resolve flake8

* Refactor and Docs

* Fix tests

* Fix tests

* Fix tests

* Duplicate

* Fix tests

* resolve bug

* update

* update on comments

* Update pytorch_lightning/utilities/imports.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Update pytorch_lightning/utilities/device_parser.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* update

* update

* update

* update on comments

* resolve flake8

* update test

* Apply suggestions from code review

* update on comments

* Update pytorch_lightning/callbacks/prediction_writer.py

Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>

* Update pytorch_lightning/callbacks/prediction_writer.py

Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>

* Update pytorch_lightning/callbacks/prediction_writer.py

Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>

* update on comments

* update

* update on comment

* Apply suggestions from code review

* update

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-04-27 20:23:55 +00:00
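A minimal sketch of subclassing the callback added in #7127 (the class name, output path, and saving with `torch.save` are assumptions for illustration):

```python
import os
import torch
from pytorch_lightning.callbacks import BasePredictionWriter

class TorchPredictionWriter(BasePredictionWriter):
    """Hypothetical writer that dumps epoch-level predictions to disk."""

    def __init__(self, output_dir: str, write_interval: str = "epoch"):
        super().__init__(write_interval)
        self.output_dir = output_dir

    def write_on_epoch_end(self, trainer, pl_module, predictions, batch_indices):
        os.makedirs(self.output_dir, exist_ok=True)
        torch.save(predictions, os.path.join(self.output_dir, "predictions.pt"))

# Attach via Trainer(callbacks=[TorchPredictionWriter("predictions/")]) and call trainer.predict(...)
```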
Adrian Wälchli 3b36d81c03
Fixed `num_sanity_val_steps` affecting reproducibility of training data shuffling (#7014)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-04-27 09:51:39 +00:00
Carlos Mocholí 33066f8fd9
Add `on_predict_{batch,epoch}_{start,end}` and `Callback.on_predict_{start,end}` (#7141)
* Update hooks typing and predict hooks

* Update CHANGELOG

* Progress

* Progress

* Add back `on_predict_{start,end}`

* Typing and fix

* Update tests/trainer/logging_/test_logger_connector.py

* Update tests/callbacks/test_lambda_function.py
2021-04-22 10:05:28 -04:00
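A sketch of a callback using the predict-stage hooks added in #7141 (the hook signatures follow the general `Callback` pattern and are stated here as an assumption):

```python
from pytorch_lightning.callbacks import Callback

class PredictLogger(Callback):
    """Hypothetical callback that reports progress during trainer.predict()."""

    def on_predict_epoch_start(self, trainer, pl_module):
        print("starting prediction epoch")

    def on_predict_batch_end(self, trainer, pl_module, outputs, batch, batch_idx, dataloader_idx):
        if batch_idx % 100 == 0:
            print(f"predicted batch {batch_idx}")
```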
Adrian Wälchli d12c6cf2b3
more early stopping options (convergence and divergence threshold) (#6868)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-04-19 16:49:52 +02:00
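A sketch of the two new options from #6868 (threshold values are illustrative):

```python
from pytorch_lightning.callbacks import EarlyStopping

early_stop = EarlyStopping(
    monitor="val_loss",
    stopping_threshold=0.05,    # stop once the metric is at least this good
    divergence_threshold=10.0,  # stop immediately once the metric gets this bad
)
```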
Adrian Wälchli 67d21609c9
Add Trainer max_time argument + Callback (#6823)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
2021-04-16 13:38:57 +02:00
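A sketch of the `max_time` argument from #6823; the accepted forms below (string, dict, `timedelta`) are a best-effort reading of the feature and should be checked against the docs:

```python
from datetime import timedelta
from pytorch_lightning import Trainer

# Cap wall-clock training time; any one of these forms should be equivalent.
trainer = Trainer(max_time="00:12:00:00")        # "DD:HH:MM:SS" string
trainer = Trainer(max_time={"hours": 12})        # keyword dict for timedelta
trainer = Trainer(max_time=timedelta(hours=12))  # datetime.timedelta
```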
shuyingsunshine21 03a73b37bc
Train End Error Handling Fix (#6864)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
2021-04-14 20:35:42 +02:00
Carlos Mocholí 15926b462c
Add SWA warning if not running every epoch (#6987)
* Add SWA warning if not running every epoch

* Typo
2021-04-13 18:34:40 +02:00
Ethan Harris b9bc77293b
Fix inconsistent outputs in `on_*_end` and `*_end` (#6969)
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-04-13 15:16:21 +01:00
scart97 eb15abcd82
Fix finetuning so that complex models correctly unfreeze (#6880)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-04-08 12:59:06 +05:30
Michael Baumgartner 6dc1078822
Enforce an epoch scheduler interval when using SWA (#6588)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-04-06 02:57:33 +00:00
Karthik Prasad c3da7f50bb
Sanitize `None` params during pruning (#6836)
* sanitize none params during pruning

* amend
2021-04-06 01:47:59 +02:00