Adrian Wälchli
321502fe31
Update backward hook for `PrecisionPlugin` ( #10008 )
...
Co-authored-by: thomas chaton <thomas@grid.ai>
2021-10-19 10:51:45 +00:00
Adrian Wälchli
10d0b41977
Introduce `PrecisionPlugin.forward_context()` ( #9988 )
...
Co-authored-by: thomas chaton <thomas@grid.ai>
2021-10-18 12:58:19 +00:00
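As a rough sketch of the idea behind `forward_context()`: the precision plugin exposes a context manager that wraps forward passes, so mixed-precision variants can enter autocast there. The class names and bodies below are illustrative assumptions, not Lightning's exact implementation.

```python
import contextlib
from typing import Generator

import torch


class SketchPrecisionPlugin:
    """Illustrative stand-in for a precision plugin (hypothetical class)."""

    @contextlib.contextmanager
    def forward_context(self) -> Generator[None, None, None]:
        # Base plugin: full precision, so the context is a no-op.
        yield


class SketchNativeMixedPrecisionPlugin(SketchPrecisionPlugin):
    @contextlib.contextmanager
    def forward_context(self) -> Generator[None, None, None]:
        # Mixed precision: run the wrapped forward pass under autocast.
        with torch.cuda.amp.autocast():
            yield
```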
Carlos Mocholí
0ddd6a8c19
Remove `_NATIVE_AMP_AVAILABLE` checks ( #9747 )
2021-09-29 15:34:26 +02:00
Carlos Mocholí
44aed17aff
Remove duplicated native AMP + LBFGS check ( #9748 )
2021-09-29 13:14:03 +00:00
Carlos Mocholí
9ebfbbc349
Remove unused `post_optimizer_step` ( #9746 )
2021-09-29 13:09:22 +00:00
Carlos Mocholí
6892d533ea
Run plugin closure before `on_before_optimizer_step` [1/2] ( #9288 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-09-07 11:52:20 +00:00
Jirka Borovec
6e124e7207
CI: precommit - docformatter ( #8584 )
...
* CI: precommit - docformatter
* fix deprecated
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-09-06 12:49:09 +00:00
John St. John
c30d9b9fae
Update call to `amp.autocast` from `fast_dtype` to `dtype` ( #9211 )
...
Co-authored-by: SeanNaren <sean@grid.ai>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-09-04 02:59:11 +00:00
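For reference, the renamed keyword as it exists in PyTorch's public API (1.10+); a usage sketch that assumes a CUDA device is available:

```python
import torch

x = torch.randn(4, 4, device="cuda")
w = torch.randn(4, 4, device="cuda")

# The released keyword is `dtype`, not the pre-release `fast_dtype` spelling.
with torch.cuda.amp.autocast(dtype=torch.bfloat16):
    y = x @ w  # matmul runs in bfloat16 under autocast
```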
four4fish
f01a9a6cd2
Remove `BasePlugin` ( #9066 )
...
* Remove BasePlugin
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-08-25 19:10:28 +00:00
Sean Naren
bac8b1be81
Add support for CPU AMP autocast ( #9084 )
2021-08-25 12:18:00 +00:00
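A minimal sketch of CPU autocast as exposed by PyTorch 1.10+, the underlying mechanism this commit builds on; the model and batch here are placeholders:

```python
import torch

model = torch.nn.Linear(8, 8)
batch = torch.randn(2, 8)

# CPU autocast currently targets bfloat16.
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    out = model(batch)

print(out.dtype)  # torch.bfloat16
```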
Sean Naren
1bab0a17a9
Fix torch bfloat import version ( #9089 )
2021-08-24 19:18:12 +00:00
Sean Naren
1feec8c601
Add bfloat16 support to Lightning Trainer ( #9049 )
2021-08-24 09:47:21 +00:00
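If the flag follows the spelling Lightning adopted in the 1.5 line, enabling bfloat16 is a single `Trainer` argument; treat the exact literal as version-dependent:

```python
import pytorch_lightning as pl

# bf16 mixed precision via the `precision` flag (1.5-era spelling).
trainer = pl.Trainer(precision="bf16", max_epochs=1)
```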
Carlos Mocholí
e63968ab88
Add `pyupgrade` to `pre-commit` ( #8557 )
...
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-07-26 14:38:12 +02:00
Carlos Mocholí
a64cc37394
Replace `yapf` with `black` ( #7783 )
...
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-07-26 13:37:35 +02:00
Dusan Drevicky
1b06edf2f2
Add the `on_before_optimizer_step` hook ( #8048 )
...
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-07-09 13:30:52 +02:00
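A hedged sketch of overriding the new hook in a `LightningModule`; the `(optimizer, optimizer_idx)` signature matches this era of the API, and the gradient-norm logging is just one illustrative use:

```python
import torch
import pytorch_lightning as pl


class LitModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(8, 1)

    def on_before_optimizer_step(self, optimizer, optimizer_idx):
        # Called after backward and before optimizer.step(), so gradients
        # are available for inspection here.
        norms = [p.grad.norm() for p in self.parameters() if p.grad is not None]
        if norms:
            self.log("grad_norm", torch.stack(norms).norm())
```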
thomas chaton
1c825a2a9c
Add the `on_before_backward` hook ( #7865 )
...
* Add callback to hook tests and add predict test
* Fix lambda callback test
* Simplify lambda call test
* Use LambdaCallback
* Dynamically append to called for the model
* Remove print
* Consistency
* Consistency
* Prepare args/kwargs testing
* yapf doesn't like dict literals
* Add arguments for fit no val test
* Add arguments for fit no val test
* add before_backward_hook
* add test
* resolve flake8
* resolve tests
* update changelog
* add on_before_backward to LightningModule
* update on comments
* Test arguments
* Datamodule refactor
* Fix eval test
* remove extra file
* resolve bug
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* move to hooks
* update
* resolve flake8
* update on comments
* Update full fit + val test
* Update test
* Remove FIXME
* Remove FIXME
* Undo change
* Fix
* Parametrize fit hook test
* Comment
* Parametrize fit hook test with different precision plugins
* Fix tests
* Parametrize fit hook test with manual optimization
* Unnecessary parenthesis
* WIP
* Comments
* Fix message
* Test CI error
* Revert "Test CI error"
This reverts commit 39c4a85a83.
* Add ddp training type teardown
* Update CHANGELOG
* Adrian's fix
* Use destructor
* Update CHANGELOG.md
* RPC destructor
* Update pytorch_lightning/plugins/training_type/ddp.py
* Why do you not work :(
* Missing condition
* Fix deepspeed test
* GC collect in conftest
* Do not show warnings for special tests
* Needs to run on 1.8
To avoid: "RuntimeError: NCCL error in: /pytorch/torch/lib/c10d/ProcessGroupNCCL.cpp:32, unhandled cuda error, NCCL version 2.4.8"
* Run torch 1.8
* Skip test due to 'Python bus error'
* Debug NCCL
* shm size
* Disable warnings for special tests
* Remove NCCL_DEBUG statement
* Try smaller shm size
* Revert "Skip test due to 'Python bus error'"
This reverts commit e0a3e8785d.
* README and adjust versions
* Avoid self.on_gpu call
* empty cache cleanup
* More garbage collection
* Unroll parametrizations
* Do not reuse mock
* Undo changes
* Undo notebooks modification
* resolve test
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* update
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* delete file
* Undo
* Fix test
* Revert "WIP"
This reverts commit f5828a8c42.
* Rename
* Remove optimizers
* Fix bug with LightningOptimizer
* Add optimizers
* update
* update
* Update CHANGELOG
* On after backward refactor
* Do not call super
* Fixes
* Remove should_accumulate
* pre/post backward refactor
* Call the LM backward hook
* Update tests
* Remove dev debug patch
* Fix test
* Remove optimizer arguments and typing
* Docs fixes
* Fix comment
* Undo changes
* Split manual and auto
* Undo change
* Deepsource
* Remove optimizers
* Undo changes
* Call the hook
* Docs
* Docs
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-07-09 06:15:57 +00:00
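A minimal sketch of the hook this PR adds, assuming the `on_before_backward(loss)` signature from this era; the NaN check is an illustrative use, not part of the PR:

```python
import torch
import pytorch_lightning as pl


class LitModel(pl.LightningModule):
    def on_before_backward(self, loss: torch.Tensor) -> None:
        # Invoked with the (possibly scaled) loss right before backward,
        # a convenient place for sanity checks such as catching NaNs early.
        if not torch.isfinite(loss):
            raise RuntimeError(f"non-finite loss detected: {loss}")
```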
Carlos Mocholí
eb6d991218
Refactor plugins backward ( #8328 )
2021-07-08 16:02:09 +02:00
Carlos Mocholí
c4353ea702
Remove `dev_debugger.call_count` ( #8317 )
2021-07-07 19:59:59 +02:00
Carlos Mocholí
ea88105b88
Parametrize fit hook test with different precision plugins ( #8070 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-07-05 10:50:01 +00:00
deepsource-autofix[bot]
7e2f84e050
Remove methods with unnecessary super delegation. ( #8148 )
...
* Remove methods with unnecessary super delegation.
* Update fully_sharded.py
* replace init in test
Co-authored-by: deepsource-autofix[bot] <62050782+deepsource-autofix[bot]@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Ethan Harris <ethanwharris@gmail.com>
2021-07-02 08:00:55 +00:00
Ethan Harris
57dce7244c
Fix double precision casting complex buffers ( #8208 )
...
* Fix double precision casting complex buffers
* Update CHANGELOG.md
* Fixes
* Fixes
* Fix
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-06-30 10:57:42 +01:00
thomas chaton
24db914093
Support state restoration of logged results 2/2 ( #7966 )
...
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-06-25 19:16:11 +00:00
Edgar Riba
b378806b6c
Add `add_to_queue`/`get_from_queue` for DDP spawn ( #7916 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-06-23 03:19:37 +02:00
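A hedged sketch of the two `LightningModule` methods the title names, used with DDP spawn to carry picklable extras from the spawned worker back to the main process; `best_metric` is a hypothetical attribute:

```python
import pytorch_lightning as pl


class LitModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.best_metric = 0.0

    def add_to_queue(self, queue) -> None:
        # Called in the spawned worker: push picklable state for the
        # main process to pick up after the worker exits.
        queue.put(self.best_metric)

    def get_from_queue(self, queue) -> None:
        # Called in the main process after spawn joins: pop it back.
        self.best_metric = queue.get()
```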
Yifu Wang
b71aa55b9e
Make optimizers skippable when using amp ( #7975 )
...
Co-authored-by: Yifu Wang <yifuwang@2012@gmail.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-06-16 00:23:30 +00:00
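In Lightning's automatic optimization, returning `None` from `training_step` skips that batch's optimizer step; the fix above is, to my understanding, what lets native AMP tolerate such skips. A sketch, with `compute_loss` as a hypothetical helper:

```python
import torch
import pytorch_lightning as pl


class LitModel(pl.LightningModule):
    def training_step(self, batch, batch_idx):
        loss = self.compute_loss(batch)  # hypothetical helper
        # Returning None skips the optimizer step for this batch.
        if not torch.isfinite(loss):
            return None
        return loss
```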
Sean Naren
96433d03ea
IPU Integration 5/5 ( #7867 )
...
* Initial changes
* Add broken example for now
* Fix reference
* Fix format
* Code runs
* Fixes
* Clear up files
* Add tests, helpers, fixes
* Small cleanups
* Refactors based on review
* Swap to special tests
* Add special tests
* Add source
* Cleanups
* Add logic to attach/detach model from devices
* Fixes for tests
* Fixes for tests
* Move earlier
* Cleanups
* Add check for nvcc
* Add tests, cleanups
* Fix errors
* fix
* Try condition
* Add missing annotation
* Clearer
* Clearer message
* Fix variable
* Cleanups
* Add comment
* CHANGELOG.md
* Add simple selection test
* Remove special=True to see what happens
* Fix test
* Update tests/accelerators/test_ipu.py
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
* Convert ipu_cores -> ipus
* Add typing, fail earlier
* simplify precision
* Add test, add helper
* fix accum
* Update pytorch_lightning/plugins/training_type/ipu.py
Co-authored-by: thomas chaton <thomas@grid.ai>
* Use stages
* Make sure warning message returned
* throw error
* Add more tests, use fs
* add comment
* Clean
* Address feedback, add IPU tests
* Fixes
* Fix signature
* Add types
* Remove autoround
* Add docstring
* ipu_cores -> ipus
* Add test, remove unnecessary precision set
* Add optimizer test
* Add precision back with test
* Address code review
* Change to probs
* Move some of the asserts earlier
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
2021-06-11 15:07:04 +00:00
Adrian Wälchli
cfd01d7f8d
move amp checkpoint state management to precision plugin ( #7831 )
2021-06-07 07:45:01 +00:00
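Presumably this amounts to the precision plugin serializing its `GradScaler`; a sketch under that assumption, with a hypothetical plugin class (the checkpoint key mirrors the one Lightning used at the time, but verify against your version):

```python
import torch


class SketchNativeAMPPlugin:
    """Hypothetical plugin that owns the AMP grad scaler."""

    def __init__(self):
        self.scaler = torch.cuda.amp.GradScaler()

    def on_save_checkpoint(self, checkpoint: dict) -> None:
        # Persist the scaler so loss scaling resumes where it left off.
        checkpoint["native_amp_scaling_state"] = self.scaler.state_dict()

    def on_load_checkpoint(self, checkpoint: dict) -> None:
        if "native_amp_scaling_state" in checkpoint:
            self.scaler.load_state_dict(checkpoint["native_amp_scaling_state"])
```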
Ethan Harris
03bb389b21
Fix double precision + ddp_spawn ( #6924 )
...
* Initial fix
* Initial fix
* Initial fix
* Updates
* Updates
* Update typing and docs
* Undo accidental refactor
* Remove unused imports
* Add DDP double precision test
* Remove unused variable
* Update CHANGELOG.md
* Fix test
* Update tests
* Formatting
* Revert bad change
* Add back changes
* Correct wrapping order
* Improve unwrapping
* Correct wrapping order
* Fix... finally
* Respond to comments
* Drop ddp test
* Simplify ddp spawn test
* Simplify ddp spawn test
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-06-01 15:21:17 +00:00
Carlos Mocholí
d47173bb72
Use typing forward references ( #7770 )
...
* Use typing forward references
* Update pytorch_lightning/core/lightning.py
2021-05-31 09:54:28 +02:00
shuyingsunshine21
299f2c481b
FSDP with full state dict ( #7487 )
...
* Fix some test errors
* checkpoint consolidation
* Update ddp_spawn.py
* Update test_metric_result_integration.py
* Update test_results.py
* Update utils.py
* Update utils.py
* Update test_all_gather_grad.py
* Update test_all_gather_grad.py
* Update test_results.py
* Revert "Update test_results.py"
This reverts commit 9d4a2b891d
.
* Revert "Merge pull request #1 from shuyingsunshine21/shuyingsunshine21-checkpoint_consolidate"
This reverts commit c5053da789
, reversing
changes made to 0d23d75bc9
.
* Revert "Update test_all_gather_grad.py"
This reverts commit 0d23d75bc9
.
* Revert "Update utils.py"
This reverts commit 70fe5da9c6
.
* Revert "Update utils.py"
This reverts commit a9aae99f6e
.
* Revert "Update test_results.py"
This reverts commit ea74906878
.
* Revert "Update test_metric_result_integration.py"
This reverts commit bf70e431b3
.
* Revert "Update ddp_spawn.py"
This reverts commit f17210183b
.
* Revert "checkpoint consolidation"
This reverts commit 536c1323b0
.
* Revert "Revert "checkpoint consolidation""
This reverts commit 3a9fde915a
.
* Revert "Revert "Revert "checkpoint consolidation"""
This reverts commit 7a369f47e1
.
* Revert "Revert "Update ddp_spawn.py""
This reverts commit 8222dc98ea
.
* Revert "Revert "Update test_metric_result_integration.py""
This reverts commit 6c095b2370
.
* Revert "Revert "Update test_results.py""
This reverts commit 250d0aaaa2
.
* Revert "Revert "Update utils.py""
This reverts commit 8651d54d79
.
* Revert "Revert "Update test_all_gather_grad.py""
This reverts commit dcdcd29731
.
* modify distributed environment to make test pass
* fix version for ddp plugin test
* fix
* fix
* changelog
* Update CHANGELOG.md
* fsdp with full state dict
* fix missing import
* modify unitest
* fix
* fix
* fix typo
* modify test and add changelog
* fix
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* limit max_epoch to 1 for testing
* test
* fix
* update
* testing remove special for multi gpu
* assert gpu
* add assertion for gpu
* fix
* Re-enable special test, use ModelCheckpoint
* Fix paths
* Fix path passing
* test
* test
* fix test
* fix
* pre-commit format
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: SeanNaren <sean@grid.ai>
2021-05-24 08:11:45 +01:00
Carlos Mocholí
8208c330eb
Use `torch.nn.utils.clip_grad_norm_` and add `clip_grad_by_value` support for TPU ( #7025 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-05-07 16:41:39 +00:00
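The two `torch.nn.utils` functions behind this title, shown directly (both are stable PyTorch APIs; running both on the same gradients, as here, is purely for illustration):

```python
import torch

model = torch.nn.Linear(8, 1)
loss = model(torch.randn(4, 8)).sum()
loss.backward()

# Norm-based clipping: rescales gradients so their total norm is <= 1.0.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)

# Value-based clipping: clamps each gradient element into [-0.5, 0.5].
torch.nn.utils.clip_grad_value_(model.parameters(), clip_value=0.5)
```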
thomas chaton
16d6c9828d
[bugfix] Apex never instantiated. ( #7274 )
...
* update
* update
* update apex
* update
* update
* update
* remove test.py
* update
* update
* update on comments
* update changelog
* update
* update
* typo
2021-04-30 13:16:28 -04:00
Carlos Mocholí
ca6c87ffbe
Add back `clip_gradients(model)` ( #7231 )
2021-04-27 11:34:02 +00:00
ananthsub
3f1a08ab00
Fix mypy checks for double precision plugin ( #7151 )
2021-04-22 11:29:38 +01:00
thomas chaton
013756404b
[bugfix] Add `set_default_tensor_type(torch.DoubleTensor)` with precision=64 ( #7108 )
...
* update
* Update pytorch_lightning/plugins/precision/double.py
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* Update pytorch_lightning/plugins/precision/double.py
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* Update pytorch_lightning/plugins/precision/double.py
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* resolve tests
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-04-20 15:25:37 +00:00
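A sketch of the mechanism the title names: wrapping execution so tensors created inside default to float64, then restoring the float32 default. The context-manager packaging is an assumption about the approach, not Lightning's exact code:

```python
import contextlib

import torch


@contextlib.contextmanager
def double_precision_context():
    # Tensors created inside default to float64; restore float32 after.
    torch.set_default_tensor_type(torch.DoubleTensor)
    try:
        yield
    finally:
        torch.set_default_tensor_type(torch.FloatTensor)


with double_precision_context():
    assert torch.zeros(1).dtype == torch.float64
assert torch.zeros(1).dtype == torch.float32
```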
Carlos Mocholí
898ec8a94a
Create pytorch_lightning/utilities/types.py ( #7048 )
2021-04-19 14:43:16 +02:00
Carlos Mocholí
f29ecbfd90
Typing for accelerators and plugins ( #7022 )
2021-04-15 16:48:16 +00:00
Ethan Harris
f645df5e9a
Add typings for evaluation_loop.py and remove some dead code ( #7015 )
2021-04-15 07:36:04 +00:00
Adrian Wälchli
d3f73a0a74
Plugin Docs ( #6952 )
...
Co-authored-by: edenlightning <66261195+edenlightning@users.noreply.github.com>
Co-authored-by: William Falcon <waf2107@columbia.edu>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
2021-04-14 20:53:21 +00:00
Anthony Kim
7f6154fcad
Add `Trainer(gradient_clip_algorithm='value'|'norm')` ( #6123 )
...
* add changelog
* add clip by value
* fix bug in training tricks.rst
* fix bug in trainer.rst
* Update trainer.rst
* Update trainer.rst
* Update CHANGELOG.md
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Update pytorch_lightning/plugins/precision/deepspeed_precision.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Update pytorch_lightning/utilities/enums.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* yapf formatting
* update training tricks
* update based on comment
* update based on comment
* Update pytorch_lightning/trainer/trainer.py
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
* update based on comment
* pep8
* mypy
* mypy
* Update docs/source/advanced/training_tricks.rst
Co-authored-by: thomas chaton <thomas@grid.ai>
* Update sharded_native_amp.py
* Update test_sharded_parity.py
* update test codes
* Update test_tpu.py
* Update pytorch_lightning/trainer/connectors/training_trick_connector.py
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* Update test_trainer.py
* Update enums.py
* Update enums.py
* add super-class initialization to precision plugins.
* add clip_grad horovod cpu test
* add clip_grad horovod cpu test
* use subprocess check_call
* change order of horovod tests
* set max_epochs 2 in horovod test
* remove clip_grad_val test from horovod-cpu
* remove "type: ignore"
* divide clip grad val test in horovod
* update based on comments
* add super-class initialization to precision plugins.
* bugfix
* bugfix
* revert some changes
* revert some changes
* Update tests/models/test_horovod.py
* merge master
* Delete signature test
No point in testing a signature
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
2021-04-06 08:27:37 -05:00
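Usage of the new argument, as spelled in the title:

```python
import pytorch_lightning as pl

# Clip each gradient element to [-0.5, 0.5] instead of rescaling by norm.
trainer = pl.Trainer(gradient_clip_val=0.5, gradient_clip_algorithm="value")

# The default remains norm-based clipping.
trainer = pl.Trainer(gradient_clip_val=1.0, gradient_clip_algorithm="norm")
```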
Kaushik B
a72a7992a2
Update clip gradients signature for precision plugins ( #6764 )
2021-03-31 17:06:48 +05:30
thomas chaton
1302766f83
DeepSpeed ZeRO Update ( #6546 )
...
* Add context to call hook to handle all modules defined within the hook
* Expose some additional parameters
* Added docs, exposed parameters
* Make sure we only configure if necessary
* Setup activation checkpointing regardless, saves the user having to do it manually
* Add some tests that fail currently
* update
* update
* update
* add tests
* change docstring
* resolve accumulate_grad_batches
* resolve flake8
* Update DeepSpeed to use latest version, add some comments
* add metrics
* update
* Small formatting fixes, clean up some code
* Few cleanups
* No need for default state
* Fix tests, add some boilerplate that should move eventually
* Add hook removal
* Add a context manager to handle hook
* Small naming cleanup
* wip
* move save_checkpoint responsibility to accelerator
* resolve flake8
* add BC
* Change recommended scale to 16
* resolve flake8
* update test
* update install
* update
* update test
* update
* update
* update test
* resolve flake8
* update
* update
* update on comments
* Push
* pull
* Update pytorch_lightning/plugins/training_type/deepspeed.py
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* Update pytorch_lightning/plugins/training_type/deepspeed.py
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* update
* Apply suggestions from code review
* Swap to using world size defined by plugin
* update
* update todo
* Remove deepspeed from extra, keep it in the base cuda docker install
* Push
* pull
* update
* update
* update
* update
* Minor changes
* duplicate
* format
* format2
Co-authored-by: SeanNaren <sean@grid.ai>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
2021-03-30 13:39:02 -04:00
Ethan Harris
d02fe342c1
Feature/double precision ( #6595 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
2021-03-24 15:47:58 +05:30
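Enabling full float64 training is then a single flag (per the convention this PR follows):

```python
import pytorch_lightning as pl

# Run model parameters, inputs, and optimizer state in float64.
trainer = pl.Trainer(precision=64)
```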
Justus Schock
634d83134f
Add AMP for validation, prediction and testing ( #6565 )
...
* Add Tests for val and test-steps
* Add native AMP
* pep8 tests
* pep8 plugin
* changelog
2021-03-20 23:15:49 +00:00
Kaushik B
87c03b1038
Update Gradient Clipping for TPU Accelerator ( #6576 )
2021-03-20 01:02:57 +05:30
thomas chaton
0544efd453
[bug] Update broadcast + reduce decision [ModelCheckpoint] ( #6410 )
...
* resolve bug
* update
* update changelog
* update PR
* Update pytorch_lightning/trainer/connectors/logger_connector/epoch_result_store.py
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* add todo
* resolve issues
* resolve flake8
* update
* add coverage for reduce
* wip
* restore back to broadcast
* remove test.py
* resolve flake8
* update
* check world size
* resolve test
* update
* use pytorch version when defined
* update on comments
* update on comments
* flake8
* resolve bugs
* Update CHANGELOG.md
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* update
* update
* update
* update
* remove test
* update
* resolve flake8
* update
* update
* update
* proxy
* update
* update
* resolve typo
* prune
* update parallel
* update
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-03-14 17:14:27 +00:00
Sean Naren
39231aee1a
[Fix] Call clip gradients if clip val greater than 0 ( #6330 )
...
* Call clip gradients if clip val greater than 0
* format
* Format
* Move to top of file
2021-03-04 19:45:58 +00:00
Jirka Borovec
dcec4efe03
Simplify test for AMP plugins ( #6311 )
...
* AMP
* fuse
* yapf
2021-03-03 08:56:57 +01:00
Jirka Borovec
58a6d59784
simplify skip-if tests >> 0/n ( #5920 )
...
* skipif + yapf + isort
* tests
* docs
* pp
2021-03-01 12:17:09 +00:00
Justus Schock
0647340f3b
Add mypy typing to precision plugins. ( #6149 )
...
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
2021-02-26 14:27:16 +01:00
Kaushik B
e7298b5d38
fix parallel devices return type & add copyright ( #6215 )
2021-02-26 11:09:08 +01:00