lightning

Author	SHA1	Message	Date
jjenniferdai	6d79184ec5	Unify checkpoint load paths [redo #9693 ] (#10061 )	2021-10-25 19:05:31 +00:00
Carlos Mocholí	b376799430	Minor fixes related to clipping (#10130 ) Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>	2021-10-25 16:40:22 +00:00
Kaushik B	56bc55db71	Update strategy flag in docs (#10000 ) Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>	2021-10-20 21:02:53 +05:30
Kaushik B	5e8829b97d	(1/n) tests: Use strategy flag instead of accelerator for training strategies (#9931 ) Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>	2021-10-16 20:40:25 +05:30
Rohit Gupta	23e8b59ae7	Add `configure_gradient_clipping` hook in `LightningModule` (#9584 ) * init hook * docs * dep train args * update tests * doc * doc * .gitignore * not dep * add trainer args * add & update tests * fix tests * pre-commit * docs * add docs * add exception * code review * deepspeed * update tests * not * try fix * Apply suggestions from code review * update deepspeed * disable some tests * disable some tests * enable all tests	2021-10-13 20:15:13 +05:30
ananthsub	28fc8d2016	Add `enable_model_summary` flag and deprecate `weights_summary` (#9699 ) Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Kaushik B <kaushikbokka@gmail.com>	2021-10-13 17:20:54 +05:30
Sean Naren	83acb8671d	Update DeepSpeed version, fix failing tests (#9898 )	2021-10-11 22:35:33 +00:00
Rohit Gupta	4decbc0d95	Deprecate `dataloader_idx` from `on_train_batch_start/end` (#9816 ) * deprecate hooks * dep todo * explicit * Apply suggestions from code review * Apply suggestions from code review * code review * base	2021-10-07 10:18:11 +00:00
Carlos Mocholí	0ddd6a8c19	Remove `_NATIVE_AMP_AVAILABLE` checks (#9747 )	2021-09-29 15:34:26 +02:00
Danielle Pintz	b3a5c7f442	Add `enable_progress_bar` to Trainer constructor (#9664 )	2021-09-24 22:53:31 -07:00
Danielle Pintz	160e7e1289	Deprecate LightningModule.get_progress_bar_dict (#8985 ) * Move get_progress_bar_dict from lightning module to progress bar callback	2021-09-09 20:53:47 +00:00
Carlos Mocholí	6892d533ea	Run plugin closure before `on_before_optimizer_step` [1/2] (#9288 ) Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>	2021-09-07 11:52:20 +00:00
Jirka Borovec	6e124e7207	CI: precommit - docformatter (#8584 ) * CI: precommit - docformatter * fix deprecated Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2021-09-06 12:49:09 +00:00
Carlos Mocholí	d0efb55b0f	Delete `TrainingEpochLoop._dataloader_idx` which always equals 0 (#8911 )	2021-08-16 13:34:42 +02:00
Carlos Mocholí	c99e2fe0d2	Test `Callback.on_load_checkpoint` order (#8588 )	2021-07-29 12:28:29 +02:00
Carlos Mocholí	47c47faeae	Remove `outputs` in `on_train_epoch_end` hooks (#8587 )	2021-07-28 18:27:54 +02:00
Sean Naren	aadd2a9d9c	Load ckpt path when model provided in validate/test/predict (#8352 ) * Change trainer loading behaviour for validate/test/predict * Fix * Fix/add tests * remove * Cleanups * Space * cleanups * Add CHANGELOG.md * Move after setup * Cleanups on logic * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remve * fix test * feedback * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update pytorch_lightning/trainer/properties.py Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> * Feedback * Same fix * Same fix * Add test for behaviour, modify based on feedback * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Wording * Apply suggestions from code review Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com> Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> * Cleanup docs * Update pytorch_lightning/trainer/trainer.py Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com> * feedback * Fixes to test API * Add carlos description * Move logic further * Move checkpoint connector logic Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>	2021-07-28 10:12:46 +00:00
Carlos Mocholí	a64cc37394	Replace `yapf` with `black` (#7783 ) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2021-07-26 13:37:35 +02:00
Carlos Mocholí	321689f52e	Add `ModelCheckpoint(save_on_train_epoch_end)` (#8389 ) Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>	2021-07-13 14:47:59 +00:00
Dusan Drevicky	1b06edf2f2	Add the `on_before_optimizer_step` hook (#8048 ) Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>	2021-07-09 13:30:52 +02:00
thomas chaton	1c825a2a9c	Add the `on_before_backward` hook (#7865 ) * Add callback to hook tests and add predict test * Fix lambda callback test * Simplify lambda call test * Use LambdaCallback * Dynamically append to called for the model * Remove print * Consistency * Consistency * Prepare args/kwargs testing * yapf doesn't like dict literals * Add arguments for fit no val test * Add arguments for fit no val test * add before_backward_hook * add test * resolve flake8 * resolve tests * update changelog * add on_before_backward to LightningModule * update on comments * Test arguments * Datamodule refactor * Fix eval test * remove extra file * resolve bug * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * move to hooks * update * resolve flake8 * update on comments * Update full fit + val test * Update test * Remove FIXME * Remove FIXME * Undo change * Fix * Parametrize fit hook test * Comment * Parametrize fit hook test with different precision plugins * Fix tests * Parametrize fit hook test with manual optimization * Unnecessary parenthesis * WIP * Comments * Fix message * Test CI error * Revert "Test CI error" This reverts commit `39c4a85a83`. * Add ddp training type teardown * Update CHANGELOG * Adrian's fix * Use destructor * Update CHANGELOG.md * RPC destructor * Update pytorch_lightning/plugins/training_type/ddp.py * Why do you not work :( * Missing condition * Fix deepspeed test * GC collect in conftest * Do not show warnings for special tests * Needs to run on 1.8 To avoid: "RuntimeError: NCCL error in: /pytorch/torch/lib/c10d/ProcessGroupNCCL.cpp:32, unhandled cuda error, NCCL version 2.4.8" * Run torch 1.8 * Skip test due to 'Python bus error' * Debug NCCL * shm size * Disable warnings for special tests * Remove NCCL_DEBUG statement * Try smaller shm size * Revert "Skip test due to 'Python bus error'" This reverts commit `e0a3e8785d`. 
* README and adjust versions * Avoid self.on_gpu call * empty cache cleanup * More garbage collection * Unroll parametrizations * Do not reuse mock * Undo changes * Undo notebooks modification * resolve test * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * delete file * Undo * Fix test * Revert "WIP" This reverts commit `f5828a8c42`. * Rename * Remove optimizers * Fix bug with LightningOptimizer * Add optimizers * update * update * Update CHANGELOG * On after backward refactor * Do not call super * Fixes * Remove should_accumulate * pre/post backward refactor * Call the LM backward hook * Update tests * Remove dev debug patch * Fix test * Remove optimizer arguments and typing * Docs fixes * Fix comment * Undo changes * Split manual and auto * Undo change * Deepsource * Remove optimizers * Undo changes * Call the hook * Docs * Docs Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>	2021-07-09 06:15:57 +00:00
Carlos Mocholí	ea88105b88	Parametrize fit hook test with different precision plugins (#8070 ) Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>	2021-07-05 10:50:01 +00:00
Adrian Wälchli	ea5cfd2005	move batch to device before sending it to hooks (#7378 ) * update train step * test * x * limits * val * typeo * x * x * step * min gpus * run all loops * x * limit test * profiler * clean up accelerator code * move files * rename * move tests * changelog * reorder callbacks and model hooks * add test description * replace unneccessary method * fix chlog * adjust batch_to_device for DP Plugin * update tests for dataloader idx * unused imports * hook change * switch None * clear memory * change to None * None * None * memory savings * remove redundant todo * hack * cheat * Revert "cheat" This reverts commit `a8433bd0b4`. * Revert "hack" This reverts commit `43a6d1edeb`. * update new epoch loop * remove from old loop code * update chlog * update hook test * changelog * teardown * integrate changes in new eval loop * fix hook calls * add prediction step * bad merge * Revert "bad merge" This reverts commit `488080863c`. * fix train batch hook test * rm -rf _notebooks * update chlog * release memory * fix type * notebooks mess * debug * Revert "debug" This reverts commit `eec4ee2f77`. * teardown * fix teardown bug * debug * x * debug * Revert "debug" This reverts commit `a6e6101946`. Revert "debug" This reverts commit `5ddeaec069`. debug debug Revert "debug" This reverts commit 605be746f7daedf265b2c05a1c153ce543394435. Revert "Revert "debug"" This reverts commit a7612d5410409ed886cfb609457349ecf44cbfa8. debug x x x s tol x tol * Fix changelog Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>	2021-07-05 09:31:39 +01:00
thomas chaton	24db914093	Support state restoration of logged results 2/2(#7966 ) Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com> Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk> Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>	2021-06-25 19:16:11 +00:00
Carlos Mocholí	54ac4e03cb	Update fit with no validation hook test (#7738 ) * Add callback to hook tests and add predict test * Fix lambda callback test * Simplify lambda call test * Use LambdaCallback * Dynamically append to called for the model * Remove print * Consistency * Consistency * Prepare args/kwargs testing * yapf doesn't like dict literals * Add arguments for fit no val test * Add arguments for fit no val test * Test arguments * Datamodule refactor * Fix eval test * Update full fit + val test * Update test * Update resume test * Remove changes * Fix	2021-06-23 09:34:00 +02:00
Carlos Mocholí	f1fa4c4727	Update fit with val hook test (#8060 )	2021-06-21 17:27:37 +00:00
Carlos Mocholí	e55f01e665	Update evaluation hook tests (#8013 )	2021-06-18 16:41:27 +00:00
Adrian Wälchli	eebdc910dd	progressive restoring of trainer state (#7652 )	2021-06-17 08:13:53 +00:00
Carlos Mocholí	4ffba600c9	Add predict hook test (#7973 )	2021-06-16 15:09:24 +02:00
Carlos Mocholí	03e7bdf8d5	Improve `LightningModule` hook tests (#7944 )	2021-06-14 18:16:42 +02:00
Carlos Mocholí	436fc53c89	Improve `LightningDataModule` hook test and fix `dataloader_idx` argument (#7941 ) Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>	2021-06-14 12:42:13 +00:00
Carlos Mocholí	906c067b07	Update hooks pseudocode (#7713 )	2021-05-27 12:27:26 +02:00
Carlos Mocholí	311d9fe67e	Always run validation inside the training loop epoch (#7357 ) Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>	2021-05-26 14:26:48 +02:00
Carlos Mocholí	e2ead9abd7	Refactor some loops code and hook tests (#7682 )	2021-05-25 13:27:54 +02:00
Carlos Mocholí	fe1c4ca273	Move test_hooks.py code (#7689 )	2021-05-24 22:26:32 +00:00
Carlos Mocholí	8b01497e42	Fix global step update when the epoch is skipped (#7677 ) * Fix global step update when the epoch is skipped * Update CHANGELOG * Move test	2021-05-24 17:36:56 +01:00
Rohit Gupta	7ca41734da	Add `dataloader_idx` to batch transfer hooks (#6241 ) * replace with kwargs * chlog * fix * add test * fix * device * deepspeed * pep * optional * docs * bc * comments * pep * mypy * pep * Apply suggestions from code review * kwargs * docs * . * . * 1.3 -> 1.4 * kwargs -> step_kwargs	2021-05-13 23:03:55 +05:30
Adrian Wälchli	ad9118f04a	remove trainer hidden state | sanity refactor [1 / n] (#7437 ) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2021-05-11 11:09:08 +02:00
ananthsub	6104a6316a	[1/2] Deprecate `outputs` in `on_train_epoch_end` hooks (#7339 ) * Remove outputs from on_train_epoch_end * iterate * Update callback_hook.py * update * Update training_loop.py * Update test_training_loop.py * early stop? * fix * update tests * Update test_hooks.py * Update pytorch_lightning/trainer/callback_hook.py Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk> * Update pytorch_lightning/trainer/training_loop.py Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk> * Update trainer.py * Update pytorch_lightning/trainer/trainer.py Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk> Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>	2021-05-05 17:18:16 +02:00
Carlos Mocholí	8c0ea92af2	`TrainerState` refactor [5/5] (#7173 ) * `TrainerState` refactor * flake8 * Update finished check * Test cleanup * Fix tests * Fixes * Reorder * flake8 * Update CHANGELOG * Better docs * Better docs * Remove default * Update tests * Bad merge	2021-05-04 12:50:56 +02:00
Ethan Harris	b9bc77293b	Fix inconsistent outputs in `on_*_end` and `*_end` (#6969 ) Co-authored-by: ananthsub <ananth.subramaniam@gmail.com> Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>	2021-04-13 15:16:21 +01:00
thomas chaton	1302766f83	DeepSpeed ZeRO Update (#6546 ) * Add context to call hook to handle all modules defined within the hook * Expose some additional parameters * Added docs, exposed parameters * Make sure we only configure if necessary * Setup activation checkpointing regardless, saves the user having to do it manually * Add some tests that fail currently * update * update * update * add tests * change docstring * resolve accumulate_grad_batches * resolve flake8 * Update DeepSpeed to use latest version, add some comments * add metrics * update * Small formatting fixes, clean up some code * Few cleanups * No need for default state * Fix tests, add some boilerplate that should move eventually * Add hook removal * Add a context manager to handle hook * Small naming cleanup * wip * move save_checkpoint responsability to accelerator * resolve flake8 * add BC * Change recommended scale to 16 * resolve flake8 * update test * update install * update * update test * update * update * update test * resolve flake8 * update * update * update on comments * Push * pull * Update pytorch_lightning/plugins/training_type/deepspeed.py Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * Update pytorch_lightning/plugins/training_type/deepspeed.py Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * update * Apply suggestions from code review * Swap to using world size defined by plugin * update * update todo * Remove deepspeed from extra, keep it in the base cuda docker install * Push * pull * update * update * update * update * Minor changes * duplicate * format * format2 Co-authored-by: SeanNaren <sean@grid.ai> Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> Co-authored-by: Sean Naren <sean.narenthiran@gmail.com> Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com> Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>	2021-03-30 13:39:02 -04:00
Rohit Gupta	9be092dbdb	Add on_epoch_start to run at the beginning of every loop irrespective of train/val/test (#6498 ) * update docs * add hook and update docs * update tests * chlog * Update CHANGELOG.md Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * chlog Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>	2021-03-25 14:20:49 +01:00
ananthsub	40976e4eba	Support teardown hook on DataModule (#4673 ) Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com> Co-authored-by: chaton <thomas@grid.ai>	2021-03-25 07:51:55 -05:00
Kaushik B	b190403e28	Add outputs param for `on_val/test_epoch_end` hooks (#6120 ) * add outputs param for on_val/test_epoch_end hooks * update changelog * fix warning message * add custom call hook * cache logged metrics * add args to docstrings * use warning cache * add utility method for param in sig check * Update CHANGELOG.md Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> * update docstring * add test for eval epoch end hook * add types and replace model ref * add deprecation test * fix test fx name * add model hooks warning * add old signature model to tests * add clear warning cache * sopport args param * update tests * add tests for model hooks * code suggestions * add signature utils * fix pep8 issues * fix pep8 issues * fix outputs issue * fix tests * code fixes * fix validate test * test Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>	2021-03-16 12:15:16 -04:00
Elia Cereda	f4cc7451a9	Add Trainer.validate(…) method to run one validation epoch (#4948 ) Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com> Co-authored-by: chaton <thomas@grid.ai> Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>	2021-03-11 03:46:37 +01:00
Carlos Mocholí	efd272a3ca	Pass {fit,validate,test,predict} to setup() and teardown() (#6386 )	2021-03-08 15:27:07 +01:00
Jirka Borovec	d1a03153f3	Refactor: runif for spec 6/6 (#6307 ) * special * rpc	2021-03-02 18:57:13 +00:00
Jirka Borovec	0f9134e043	Refactor: skipif for Windows 2/n (#6268 ) * win * isort * flake8	2021-03-02 09:36:01 +00:00
Jirka Borovec	eb815000f6	Refactor: skipif for multi - gpus 1/n (#6266 ) * ngpus * gpu * isort * pt * flake8	2021-03-02 09:03:32 +01:00

88 Commits