Commit Graph

120 Commits

Author | SHA1 | Message | Date
ananthsub aad86423f7
Remove more deprecated methods from base `Accelerator` class (#10448) 2021-11-10 12:58:24 +05:30
puhuk f9b9cdb0d1
Remove deprecated accelerator pass through functions in Accelerator (#10403)
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-11-08 17:36:37 +00:00
Adrian Wälchli a270a79ed9
Rename "master" methods to "main" in ClusterEnvironment plugins (#10103)
* rename occurrences of master port, master address, master node, master process

* rename properties

* add property decorators

* occurrences in docs

* update changelog

* update changelog

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add lost method

* create deprecation

* add changelog

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix typo (but it was already there!!!)

* Apply suggestions from code review

Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>

* add todo

* update more occurrences

* add types

* add missing import

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
2021-11-08 12:32:58 +00:00
Carlos Mocholí 9237106451
Clip before step (#10248) 2021-10-30 11:27:49 +01:00
Kaushik B cedaebfcbb
Add `auto_device_count` method to `Accelerators` (#10222)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-10-29 22:31:32 +02:00
Carlos Mocholí 81d15c5986
Implement double optimizer closure for hook structure consistency (#10167) 2021-10-29 13:03:04 +00:00
Carlos Mocholí 03f01fb5ec
Fix gradient norm tracking and gradient clipping (#9287)
* WIP

* Progress

* Undo test change

* Fix plugin closure execution order

* Update CHANGELOG

* Fix manual optimization on AMP and skipping backward

* Fix for deepspeed

* Typo

* Hook test for manual closure

* Add skipping test with AMP

* You are hideous, apex

* Add deepspeed test

* Update CHANGELOG

* Fix for broken master

* Add RunIf

* FIXMEs

* Rename

* Fix grad norm

* add a simple test

* update test

* update test

* update test

* fix merge conflicts

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Sea of changes

* Undo change

* Introduce TPUPrecisionPlugin

* Undo changes

* Undo changes

* Resolve FIXME

* Undo change

* Undo change

* Undo change

* Fix FIXMEs

* Fix FIXME

* Correct value

* Bad merge

* Fix circular imports

* WIP

* Fixing clipping

* Fixes

* Bad merge

* Move optimizer step and clipping into the `PrecisionPlugin`

* Fix AMP

* Update CHANGELOG

* Fix tests

* Underscore

* Progress

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove pre_optimizer_step

* Missed one

* Progress

* Progress

* Fix test

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update FIXMEs

* Fix test

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix test

* DeepSpeed warning. mypy

* Rename

* Finish tests

* Update CHANGELOG

* Dumb fixes

* accelerator=auto

* Apply suggestions from code review

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* Update on comments

* Use ClassifModule

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-10-28 15:23:27 +00:00
Carlos Mocholí 48b6292cf0
Move optimizer step and clipping into the `PrecisionPlugin` (#10143) 2021-10-26 17:26:26 +02:00
Rohit Gupta 93266e2c22
Avoid deprecated warnings from accelerator and checkpoint connector (#10142) 2021-10-26 14:10:30 +02:00
Carlos Mocholí b376799430
Minor fixes related to clipping (#10130)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-10-25 16:40:22 +00:00
Adrian Wälchli d41902883a
Update `optimizer_step` methods in accelerator and plugins (#10023)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-10-20 21:36:27 +01:00
Carlos Mocholí ef5a12212a
Isolate optimizer step logic to the `PrecisionPlugin` (#10029) 2021-10-20 15:43:08 +00:00
four4fish a002f872ea
[2/n] Directly call TrainingTypePlugin APIs instead of going through the Accelerator (#9901)
Co-authored-by: tchaton <thomas@grid.ai>
2021-10-14 17:38:22 +02:00
Rohit Gupta 4decbc0d95
Deprecate `dataloader_idx` from `on_train_batch_start/end` (#9816)
* deprecate hooks

* dep todo

* explicit

* Apply suggestions from code review

* Apply suggestions from code review

* code review

* base
2021-10-07 10:18:11 +00:00
Carlos Mocholí 0ddd6a8c19
Remove `_NATIVE_AMP_AVAILABLE` checks (#9747) 2021-09-29 15:34:26 +02:00
Carlos Mocholí 9ebfbbc349
Remove unused `post_optimizer_step` (#9746) 2021-09-29 13:09:22 +00:00
four4fish 15cd6ad45b
Call TrainingTypePlugin collective functions directly instead of going through the Accelerator (#9677)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-09-27 14:52:57 +02:00
Danielle Pintz ab069876cb
[1/4] Add get_device_stats to accelerator interface (#9586) 2021-09-26 21:09:16 -07:00
ananthsub 41e3be197f
Remove `call_configure_sharded_model` lifecycle property (#9612) 2021-09-24 03:57:53 +02:00
Carlos Mocholí b1ed1db089
Keep global step update in the loop (#8856) 2021-09-14 19:21:39 +05:30
Kaushik B b294c5760e
Fix type hint for filepath (#9434) 2021-09-10 21:38:54 +00:00
Danielle Pintz cc2ac02dd1
Move add_to_queue/get_from_queue to DDPSpawnPlugin (#9118)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-09-10 20:58:02 +00:00
Carlos Mocholí 3070a9ea6e
Fix hiddens type annotation (#9377) 2021-09-09 08:45:52 +01:00
Jirka Borovec 6e124e7207
CI: precommit - docformatter (#8584)
* CI: precommit - docformatter
* fix deprecated

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-09-06 12:49:09 +00:00
four4fish f01a9a6cd2
Remove `BasePlugin` (#9066)
* Remove BasePlugin

Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-08-25 19:10:28 +00:00
four4fish c912ebf889
Remove TrainingTypePlugin.on_save and Accelerator.on_save (#9023)
* Remove TrainingTypePlugin.on_save and Accelerator.on_save
2021-08-23 10:11:00 -07:00
ananthsub 8a931732ae
Remove unused `on_train_epoch_end` hook in accelerator (#9035) 2021-08-23 00:20:10 +05:30
four4fish 13e64e6a80
Remove deprecated functions from accelerator.py (#9019) 2021-08-22 00:25:42 +02:00
Carlos Mocholí d0efb55b0f
Delete `TrainingEpochLoop._dataloader_idx` which always equals 0 (#8911) 2021-08-16 13:34:42 +02:00
Carlos Mocholí 93ab24d1ee
Replace DataLoader sampler once for IPUs (#8858) 2021-08-16 11:28:05 +02:00
Carlos Mocholí ed13040729
Connect the model to the training type plugin at the start of run (#8536) 2021-08-04 17:43:34 +02:00
Sean Naren 07b7dc9c17
[Fix] Add delay property for checkpointing, refactor loading checkpoint (DeepSpeed Checkpointing Fix 1/n) (#8627)
* Add property to delay checkpointing, move loading checkpoint file into the run function to allow deepspeed engine to be loaded

* Add a small test

* Apply suggestions from code review

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Update pytorch_lightning/accelerators/accelerator.py

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Address review

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-07-30 11:31:08 +01:00
Carlos Mocholí a64cc37394
Replace `yapf` with `black` (#7783)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-07-26 13:37:35 +02:00
thomas chaton c9af1a7aec
[bugfix] Reduce memory leaks (#8490)
* reduce memory leak

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update changelog

* Apply suggestions from code review

Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>

* resolve flake8

* update on comments

* resolve bug

* update

* Undo whitespace changes

* remove bug

* resolve flake8

* revert change

* update on comments

* delete the ddp wrapper as it hold memory

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* resolve flake8

* update on comments

* update changelog

* resolve test

* Update CHANGELOG

* Refactor teardown

* Fix comment

* Do it for non-gpu too

* remove ref when the model is not a lightning_module

* Fix import error

* move down

* resolve bug

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* resolve assignment

* update

* move above

* Fix device calls to support tpu training

* Update todo

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: Kaushik B <kaushikbokka@gmail.com>
2021-07-21 11:37:05 +02:00
Carlos Mocholí c5a120ed9d
Update to Mypy>0.9 (#8386) 2021-07-13 08:23:36 +02:00
Carlos Mocholí eb6d991218
Refactor plugins backward (#8328) 2021-07-08 16:02:09 +02:00
Adrian Wälchli ea5cfd2005
move batch to device before sending it to hooks (#7378)
* update train step

* test

* x

* limits

* val

* typo

* x

* x

* step

* min gpus

* run all loops

* x

* limit test

* profiler

* clean up accelerator code

* move files

* rename

* move tests

* changelog

* reorder callbacks and model hooks

* add test description

* replace unnecessary method

* fix chlog

* adjust batch_to_device for DP Plugin

* update tests for dataloader idx

* unused imports

* hook change

* switch None

* clear memory

* change to None

* None

* None

* memory savings

* remove redundant todo

* hack

* cheat

* Revert "cheat"

This reverts commit a8433bd0b4.

* Revert "hack"

This reverts commit 43a6d1edeb.

* update new epoch loop

* remove from old loop code

* update chlog

* update hook test

* changelog

* teardown

* integrate changes in new eval loop

* fix hook calls

* add prediction step

* bad merge

* Revert "bad merge"

This reverts commit 488080863c.

* fix train batch hook test

* rm -rf _notebooks

* update chlog

* release memory

* fix type

* notebooks mess

* debug

* Revert "debug"

This reverts commit eec4ee2f77.

* teardown

* fix teardown bug

* debug

* x

* debug

* Revert "debug"

This reverts commit a6e6101946.

Revert "debug"

This reverts commit 5ddeaec069.

debug


debug


Revert "debug"

This reverts commit 605be746f7daedf265b2c05a1c153ce543394435.

Revert "Revert "debug""

This reverts commit a7612d5410409ed886cfb609457349ecf44cbfa8.

debug


x


x


x


s


tol


x


tol

* Fix changelog

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-07-05 09:31:39 +01:00
deepsource-autofix[bot] 03154eb30a
Refactor unnecessary `else` / `elif` when `if` block has a `return` statement (#8156)
Co-authored-by: deepsource-autofix[bot] <62050782+deepsource-autofix[bot]@users.noreply.github.com>
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
2021-06-28 15:27:41 +05:30
Carlos Mocholí 4d9b72b8a9
Nuke RPC (#8101) 2021-06-23 18:31:13 +00:00
Sean Naren 41be61c6f2
[IPU] Add hooks for IPU lifecycle 4/5 (#7864) 2021-06-07 12:06:41 +00:00
Sean Naren 6388c29e87
[IPU] Add reset dataloader hooks to training type plugin 3/n (#7861)
* Add hooks

* Add tests for hooks

* Add changelog

* Test changes, add typing
2021-06-07 10:37:09 +00:00
shuyingsunshine21 2242423b75
refactor accelerator teardown -> training type plugin teardown (#7579) 2021-05-22 13:19:24 -07:00
Rohit Gupta 7ca41734da
Add `dataloader_idx` to batch transfer hooks (#6241)
* replace with kwargs

* chlog

* fix

* add test

* fix

* device

* deepspeed

* pep

* optional

* docs

* bc

* comments

* pep

* mypy

* pep

* Apply suggestions from code review

* kwargs

* docs

* .

* .

* 1.3 -> 1.4

* kwargs -> step_kwargs
2021-05-13 23:03:55 +05:30
shuyingsunshine21 8538c1f61e
Accelerator model state dict (#7474)
* Fix some test errors

* checkpoint consolidation

* Update ddp_spawn.py

* Update test_metric_result_integration.py

* Update test_results.py

* Update utils.py

* Update utils.py

* Update test_all_gather_grad.py

* Update test_all_gather_grad.py

* Update test_results.py

* Revert "Update test_results.py"

This reverts commit 9d4a2b891d.

* Revert "Merge pull request #1 from shuyingsunshine21/shuyingsunshine21-checkpoint_consolidate"

This reverts commit c5053da789, reversing
changes made to 0d23d75bc9.

* Revert "Update test_all_gather_grad.py"

This reverts commit 0d23d75bc9.

* Revert "Update utils.py"

This reverts commit 70fe5da9c6.

* Revert "Update utils.py"

This reverts commit a9aae99f6e.

* Revert "Update test_results.py"

This reverts commit ea74906878.

* Revert "Update test_metric_result_integration.py"

This reverts commit bf70e431b3.

* Revert "Update ddp_spawn.py"

This reverts commit f17210183b.

* Revert "checkpoint consolidation"

This reverts commit 536c1323b0.

* Revert "Revert "checkpoint consolidation""

This reverts commit 3a9fde915a.

* Revert "Revert "Revert "checkpoint consolidation"""

This reverts commit 7a369f47e1.

* Revert "Revert "Update ddp_spawn.py""

This reverts commit 8222dc98ea.

* Revert "Revert "Update test_metric_result_integration.py""

This reverts commit 6c095b2370.

* Revert "Revert "Update test_results.py""

This reverts commit 250d0aaaa2.

* Revert "Revert "Update utils.py""

This reverts commit 8651d54d79.

* Revert "Revert "Update test_all_gather_grad.py""

This reverts commit dcdcd29731.

* modify distributed environment to make test pass

* modify model state dict to training type plugin

* remove changes

* add changelog

* fixing isort for pre-commit failure

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Address code review

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: SeanNaren <sean@grid.ai>
2021-05-11 16:39:04 +01:00
ananthsub 6104a6316a
[1/2] Deprecate `outputs` in `on_train_epoch_end` hooks (#7339)
* Remove outputs from on_train_epoch_end

* iterate

* Update callback_hook.py

* update

* Update training_loop.py

* Update test_training_loop.py

* early stop?

* fix

* update tests

* Update test_hooks.py

* Update pytorch_lightning/trainer/callback_hook.py

Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>

* Update pytorch_lightning/trainer/training_loop.py

Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>

* Update trainer.py

* Update pytorch_lightning/trainer/trainer.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-05-05 17:18:16 +02:00
ananthsub 98670c83a9
Deprecate `truncated_bptt_steps` flag on Trainer in favor of same setting on the LightningModule (#7323)
* deprecate-tbptt-trainer

* Update CHANGELOG.md

* Update lightning.py

* test

* Update lightning.py

* Update training_loop.py

* Update training_loop.py

* Update lightning.py

* Update training_loop.py

* Update training_loop.py

* update docs

* Update accelerator.py

* Update accelerator.py

* more docs

* tweaks

* chlog

* comments

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-05-05 11:21:00 +01:00
Carlos Mocholí 8c0ea92af2
`TrainerState` refactor [5/5] (#7173)
* `TrainerState` refactor

* flake8

* Update finished check

* Test cleanup

* Fix tests

* Fixes

* Reorder

* flake8

* Update CHANGELOG

* Better docs

* Better docs

* Remove default

* Update tests

* Bad merge
2021-05-04 12:50:56 +02:00
ananthsub 39274273a4
Update accelerator.py (#7318) 2021-05-03 11:17:26 -04:00
Adrian Wälchli e0c64f0ef6
Fix Adagrad optimizer not working with DDP/GPU (#7277)
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
2021-05-03 03:57:17 +05:30
thomas chaton 16d6c9828d
[bugfix] Apex never instantiated. (#7274)
* update

* update

* update apex

* update

* update

* update

* remove test.py

* update

* update

* update on comments

* update changelog

* update

* update

* typo
2021-04-30 13:16:28 -04:00