Commit Graph

310 Commits

Author SHA1 Message Date
Carlos Mocholí ed13040729
Connect the model to the training type plugin at the start of run (#8536) 2021-08-04 17:43:34 +02:00
Caleb Robinson 9ca02f58ae
Fix an import deprecation warning (#8687)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-08-03 22:17:28 +00:00
Jirka Borovec f67892ea96
CI: yesqa (#8564)
* add yesqa
* fix flake8

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-08-02 16:05:56 +00:00
Sean Naren 07b7dc9c17
[Fix] Add delay property for checkpointing, refactor loading checkpoint (DeepSpeed Checkpointing Fix 1/n) (#8627)
* Add property to delay checkpointing, move loading checkpoint file into the run function to allow deepspeed engine to be loaded

* Add a small test

* Apply suggestions from code review

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Update pytorch_lightning/accelerators/accelerator.py

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Address review

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-07-30 11:31:08 +01:00
Santiago Castro b256d6acd3
Avoid unnecessary list creation (#8595) 2021-07-28 13:36:45 +05:30
Carlos Mocholí a64cc37394
Replace `yapf` with `black` (#7783)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-07-26 13:37:35 +02:00
thomas chaton c9af1a7aec
[bugfix] Reduce memory leaks (#8490)
* reduce memory leak

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update changelog

* Apply suggestions from code review

Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>

* resolve flake8

* update on comments

* resolve bug

* update

* Undo whitespace changes

* remove bug

* resolve flake8

* revert change

* update on comments

* delete the ddp wrapper as it holds memory

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* resolve flake8

* update on comments

* update changelog

* resolve test

* Update CHANGELOG

* Refactor teardown

* Fix comment

* Do it for non-gpu too

* remove ref when the model is not a lightning_module

* Fix import error

* move down

* resolve bug

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* resolve assignment

* update

* move above

* Fix device calls to support tpu training

* Update todo

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: Kaushik B <kaushikbokka@gmail.com>
2021-07-21 11:37:05 +02:00
Carlos Mocholí 6ce77a102b
Set minimum PyTorch version to 1.6 (#8288)
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
2021-07-13 17:12:49 +00:00
Carlos Mocholí c5a120ed9d
Update to Mypy>0.9 (#8386) 2021-07-13 08:23:36 +02:00
Carlos Mocholí eb6d991218
Refactor plugins backward (#8328) 2021-07-08 16:02:09 +02:00
Adrian Wälchli d73c32ab51
move `torch.cuda.set_device()` to enable collective calls earlier in setup (#8312) 2021-07-07 13:15:41 +02:00
Adrian Wälchli ea5cfd2005
move batch to device before sending it to hooks (#7378)
* update train step

* test

* x

* limits

* val

* typo

* x

* x

* step

* min gpus

* run all loops

* x

* limit test

* profiler

* clean up accelerator code

* move files

* rename

* move tests

* changelog

* reorder callbacks and model hooks

* add test description

* replace unnecessary method

* fix chlog

* adjust batch_to_device for DP Plugin

* update tests for dataloader idx

* unused imports

* hook change

* switch None

* clear memory

* change to None

* None

* None

* memory savings

* remove redundant todo

* hack

* cheat

* Revert "cheat"

This reverts commit a8433bd0b4.

* Revert "hack"

This reverts commit 43a6d1edeb.

* update new epoch loop

* remove from old loop code

* update chlog

* update hook test

* changelog

* teardown

* integrate changes in new eval loop

* fix hook calls

* add prediction step

* bad merge

* Revert "bad merge"

This reverts commit 488080863c.

* fix train batch hook test

* rm -rf _notebooks

* update chlog

* release memory

* fix type

* notebooks mess

* debug

* Revert "debug"

This reverts commit eec4ee2f77.

* teardown

* fix teardown bug

* debug

* x

* debug

* Revert "debug"

This reverts commit a6e6101946.

Revert "debug"

This reverts commit 5ddeaec069.

debug


debug


Revert "debug"

This reverts commit 605be746f7daedf265b2c05a1c153ce543394435.

Revert "Revert "debug""

This reverts commit a7612d5410409ed886cfb609457349ecf44cbfa8.

debug


x


x


x


s


tol


x


tol

* Fix changelog

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-07-05 09:31:39 +01:00
Carlos Mocholí 74eb6cc7e9
Clean `cuda.empty_cache` usage (#8199) 2021-06-30 13:04:24 +02:00
deepsource-autofix[bot] 03154eb30a
Refactor unnecessary `else` / `elif` when `if` block has a `return` statement (#8156)
Co-authored-by: deepsource-autofix[bot] <62050782+deepsource-autofix[bot]@users.noreply.github.com>
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
2021-06-28 15:27:41 +05:30
Carlos Mocholí 4d9b72b8a9
Nuke RPC (#8101) 2021-06-23 18:31:13 +00:00
Sean Naren 96433d03ea
IPU Integration 5/5 (#7867)
* Initial changes

* Add broken example for now

* Fix reference

* Fix format

* Code runs

* Fixes

* Clear up files

* Add tests, helpers, fixes

* Small cleanups

* Refactors based on review

* Swap to special tests

* Add special tests

* Add source

* Cleanups

* Add logic to attach/detach model from devices

* Fixes for tests

* Fixes for tests

* Move earlier

* Cleanups

* Add check for nvcc

* Add tests, cleanups

* Fix errors

* fix

* Try condition

* Add missing annotation

* Clearer

* Clearer message

* Fix variable

* Cleanups

* Add comment

* CHANGELOG.md

* Add simple selection test

* Remove special=True to see what happens

* Fix test

* Update tests/accelerators/test_ipu.py

Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>

* Convert ipu_cores -> ipus

* Add typing, fail earlier

* simplify precision

* Add test, add helper

* fix accum

* Update pytorch_lightning/plugins/training_type/ipu.py

Co-authored-by: thomas chaton <thomas@grid.ai>

* Use stages

* Make sure warning message returned

* throw error

* Add more tests, use fs

* add comment

* Clean

* Address feedback, add IPU tests

* Fixes

* Fix signature

* Add types

* Remove autoround

* Add docstring

* ipu_cores -> ipus

* Add test, remove unnecessary precision set

* Add optimizer test

* Add precision back with test

* Address code review

* Change to probs

* Move some of the asserts earlier

Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
2021-06-11 15:07:04 +00:00
Sean Naren 41be61c6f2
[IPU] Add hooks for IPU lifecycle 4/5 (#7864) 2021-06-07 12:06:41 +00:00
Sean Naren 6388c29e87
[IPU] Add reset dataloader hooks to training type plugin 3/n (#7861)
* Add hooks

* Add tests for hooks

* Add changelog

* Test changes, add typing
2021-06-07 10:37:09 +00:00
shuyingsunshine21 2242423b75
refactor accelerator teardown -> training type plugin teardown (#7579) 2021-05-22 13:19:24 -07:00
Rohit Gupta 7ca41734da
Add `dataloader_idx` to batch transfer hooks (#6241)
* replace with kwargs

* chlog

* fix

* add test

* fix

* device

* deepspeed

* pep

* optional

* docs

* bc

* comments

* pep

* mypy

* pep

* Apply suggestions from code review

* kwargs

* docs

* .

* .

* 1.3 -> 1.4

* kwargs -> step_kwargs
2021-05-13 23:03:55 +05:30
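
Note: a minimal sketch of the hook this change extends; the override shown is hypothetical and the exact signature shifted slightly across releases:

    import pytorch_lightning as pl


    class MyModel(pl.LightningModule):
        # after #6241 the batch-transfer hooks also receive the dataloader
        # index, so multi-dataloader setups can route batches per loader
        def transfer_batch_to_device(self, batch, device, dataloader_idx):
            # default behaviour: move every tensor in the batch to `device`
            return super().transfer_batch_to_device(batch, device, dataloader_idx)
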
shuyingsunshine21 8538c1f61e
Accelerator model state dict (#7474)
* Fix some test errors

* checkpoint consolidation

* Update ddp_spawn.py

* Update test_metric_result_integration.py

* Update test_results.py

* Update utils.py

* Update utils.py

* Update test_all_gather_grad.py

* Update test_all_gather_grad.py

* Update test_results.py

* Revert "Update test_results.py"

This reverts commit 9d4a2b891d.

* Revert "Merge pull request #1 from shuyingsunshine21/shuyingsunshine21-checkpoint_consolidate"

This reverts commit c5053da789, reversing
changes made to 0d23d75bc9.

* Revert "Update test_all_gather_grad.py"

This reverts commit 0d23d75bc9.

* Revert "Update utils.py"

This reverts commit 70fe5da9c6.

* Revert "Update utils.py"

This reverts commit a9aae99f6e.

* Revert "Update test_results.py"

This reverts commit ea74906878.

* Revert "Update test_metric_result_integration.py"

This reverts commit bf70e431b3.

* Revert "Update ddp_spawn.py"

This reverts commit f17210183b.

* Revert "checkpoint consolidation"

This reverts commit 536c1323b0.

* Revert "Revert "checkpoint consolidation""

This reverts commit 3a9fde915a.

* Revert "Revert "Revert "checkpoint consolidation"""

This reverts commit 7a369f47e1.

* Revert "Revert "Update ddp_spawn.py""

This reverts commit 8222dc98ea.

* Revert "Revert "Update test_metric_result_integration.py""

This reverts commit 6c095b2370.

* Revert "Revert "Update test_results.py""

This reverts commit 250d0aaaa2.

* Revert "Revert "Update utils.py""

This reverts commit 8651d54d79.

* Revert "Revert "Update test_all_gather_grad.py""

This reverts commit dcdcd29731.

* modify distributed environment to make test pass

* modify model state dict to training type plugin

* remove changes

* add changelog

* fixing isort for pre-commit failure

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Address code review

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: SeanNaren <sean@grid.ai>
2021-05-11 16:39:04 +01:00
Adrian Wälchli 6bc616d78f
fix display bug (#7395) 2021-05-10 11:26:15 +08:00
Carlos Mocholí 8208c330eb
Use `torch.nn.utils.clip_grad_norm_` and add `clip_grad_by_value` support for TPU (#7025)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-05-07 16:41:39 +00:00
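
Note: the two PyTorch utilities involved are standard; a small standalone sketch, independent of the TPU-specific plumbing in this PR:

    import torch
    from torch.nn.utils import clip_grad_norm_, clip_grad_value_

    model = torch.nn.Linear(4, 2)
    loss = model(torch.randn(8, 4)).sum()
    loss.backward()

    # clip by total gradient norm (what gradient_clip_algorithm="norm" maps to)
    clip_grad_norm_(model.parameters(), max_norm=1.0)
    # clip each gradient element into [-0.5, 0.5] ("value" mode)
    clip_grad_value_(model.parameters(), clip_value=0.5)
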
Jirka Borovec b181b8c646
release 1.3.0 (#7404)
* v1.3.0

* ci event

* chlog

* badge

* formatting
2021-05-06 15:05:35 -04:00
Martin Kristiansen c3fc0313ef
Updating docs and error message: half precision not available on CPU (#7384)
* Updating docs and error message to specify that half precision is not available on CPU

* update messages

Co-authored-by: Martin Kristiansen <martinkristiansen@sixgill.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: jirka <jirka.borovec@seznam.cz>
2021-05-06 09:05:50 +00:00
ananthsub 6104a6316a
[1/2] Deprecate `outputs` in `on_train_epoch_end` hooks (#7339)
* Remove outputs from on_train_epoch_end

* iterate

* Update callback_hook.py

* update

* Update training_loop.py

* Update test_training_loop.py

* early stop?

* fix

* update tests

* Update test_hooks.py

* Update pytorch_lightning/trainer/callback_hook.py

Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>

* Update pytorch_lightning/trainer/training_loop.py

Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>

* Update trainer.py

* Update pytorch_lightning/trainer/trainer.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-05-05 17:18:16 +02:00
ananthsub 98670c83a9
Deprecate`truncated_bptt_steps` flag on Trainer in favor of same setting on the LightningModule (#7323)
* deprecate-tbptt-trainer

* Update CHANGELOG.md

* Update lightning.py

* test

* Update lightning.py

* Update training_loop.py

* Update training_loop.py

* Update lightning.py

* Update training_loop.py

* Update training_loop.py

* update docs

* Update accelerator.py

* Update accelerator.py

* more docs

* tweaks

* chlog

* comments

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-05-05 11:21:00 +01:00
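
Note: a minimal sketch of the replacement API, assuming the attribute name matches the deprecation message — the setting moves onto the LightningModule instead of Trainer(truncated_bptt_steps=...):

    import pytorch_lightning as pl


    class TBPTTModel(pl.LightningModule):
        def __init__(self):
            super().__init__()
            # new home for the setting deprecated on the Trainer by this PR
            self.truncated_bptt_steps = 2
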
Carlos Mocholí 8c0ea92af2
`TrainerState` refactor [5/5] (#7173)
* `TrainerState` refactor

* flake8

* Update finished check

* Test cleanup

* Fix tests

* Fixes

* Reorder

* flake8

* Update CHANGELOG

* Better docs

* Better docs

* Remove default

* Update tests

* Bad merge
2021-05-04 12:50:56 +02:00
ananthsub 39274273a4
Update accelerator.py (#7318) 2021-05-03 11:17:26 -04:00
Adrian Wälchli e0c64f0ef6
Fix Adagrad optimizer not working with DDP/GPU (#7277)
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
2021-05-03 03:57:17 +05:30
thomas chaton 16d6c9828d
[bugfix] Apex never instantiated. (#7274)
* update

* update

* update apex

* update

* update

* update

* remove test.py

* update

* update

* update on comments

* update changelog

* update

* update

* typo
2021-04-30 13:16:28 -04:00
Kaushik B 94fcaaf5d7
Add `debug` flag to TPU Training Plugins (PT_XLA_DEBUG) (#7219) 2021-04-27 20:34:25 +00:00
Carlos Mocholí ca6c87ffbe
Add back `clip_gradients(model)` (#7231) 2021-04-27 11:34:02 +00:00
Kaushik B 6be0a859db
Update teardown for TPU acc (#7211) 2021-04-26 13:30:46 +01:00
ananthsub bc3f08b0e3
[fix] Add barrier to accelerator's teardown (#6814) 2021-04-26 09:23:29 +00:00
thomas chaton f58865aada
Properly set `LightningModule.device` after model replacement (#7188)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-04-23 16:36:52 +02:00
thomas chaton 013756404b
[bugfix] Add set_default_tensor_type to torch.DoubleTensor with precision=64 (#7108)
* update

* Update pytorch_lightning/plugins/precision/double.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Update pytorch_lightning/plugins/precision/double.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Update pytorch_lightning/plugins/precision/double.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* resolve tests

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-04-20 15:25:37 +00:00
Kaushik B f168a535ca
Add MpModelWrapper in TPU Spawn (#7045)
Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-04-20 13:05:27 +00:00
Carlos Mocholí 898ec8a94a
Create pytorch_lightning/utilities/types.py (#7048) 2021-04-19 14:43:16 +02:00
thomas chaton 7b0b0d2844
update (#7056) 2021-04-16 21:22:19 +01:00
Carlos Mocholí f29ecbfd90
Typing for accelerators and plugins (#7022) 2021-04-15 16:48:16 +00:00
Ethan Harris f645df5e9a
Add typings for evaluation_loop.py and remove some dead code (#7015) 2021-04-15 07:36:04 +00:00
CeShine Lee 24d0295ff1
Fix the `gradient_clip_algorithm` has no effect issue. (#6928) 2021-04-14 14:17:06 +05:30
Kaushik B 1b3e4f9fb9
Fix sync_dist for tpus (#6950) 2021-04-13 14:17:15 +05:30
Adrian Wälchli 20ff50caa6
Accelerator API docs (#6936)
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: William Falcon <waf2107@columbia.edu>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-04-10 12:25:07 +05:30
Kaushik B 55525031c6
Fix TPU Spawn gather (#6896) 2021-04-09 18:30:59 +05:30
Anthony Kim 7f6154fcad
Add `Trainer(gradient_clip_algorithm='value'|'norm')` (#6123)
* add changelog

* add clip by value

* fix bug in training tricks.rst

* fix bug in trainer.rst

* Update trainer.rst

* Update trainer.rst

* Update CHANGELOG.md

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update pytorch_lightning/plugins/precision/deepspeed_precision.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update pytorch_lightning/utilities/enums.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* yapf formatting

* update training tricks

* update based on comment

* update based on comment

* Update pytorch_lightning/trainer/trainer.py

Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>

* update based on comment

* pep8

* mypy

* mypy

* Update docs/source/advanced/training_tricks.rst

Co-authored-by: thomas chaton <thomas@grid.ai>

* Update sharded_native_amp.py

* Update test_sharded_parity.py

* update test codes

* Update test_tpu.py

* Update pytorch_lightning/trainer/connectors/training_trick_connector.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Update test_trainer.py

* Update enums.py

* Update enums.py

* add super-class initialization to precision plugins.

* add clip_grad horovod cpu test

* add clip_grad horovod cpu test

* use subprocess check_call

* change order of horovod tests

* set max_epochs 2 in horovod test

* remove clip_grad_val test from horovod-cpu

* remove "type: ignore"

* divide clip grad val test in horovod

* update based on comments

* add super-class initialization to precision plugins.

* bugfix

* bugfix

* revert some changes

* revert some changes

* Update tests/models/test_horovod.py

* merge master

* Delete signature test

No point in testing a signature

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
2021-04-06 08:27:37 -05:00
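
Note: a usage sketch of the new flag (values hypothetical):

    import pytorch_lightning as pl

    # element-wise clipping into [-0.5, 0.5] instead of the default norm-based mode
    trainer = pl.Trainer(gradient_clip_val=0.5, gradient_clip_algorithm="value")
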
ananthsub bb9ace4333
[typing] Add typehint for broadcast in training type plugin (#6777)
* Update training_type_plugin.py

* Update accelerator.py

* Update pytorch_lightning/plugins/training_type/training_type_plugin.py

Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>

Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
2021-04-02 20:55:34 +02:00
Kaushik B a72a7992a2
Update clip gradients signature for precision plugins (#6764) 2021-03-31 17:06:48 +05:30
thomas chaton 1302766f83
DeepSpeed ZeRO Update (#6546)
* Add context to call hook to handle all modules defined within the hook

* Expose some additional parameters

* Added docs, exposed parameters

* Make sure we only configure if necessary

* Setup activation checkpointing regardless, saves the user having to do it manually

* Add some tests that fail currently

* update

* update

* update

* add tests

* change docstring

* resolve accumulate_grad_batches

* resolve flake8

* Update DeepSpeed to use latest version, add some comments

* add metrics

* update

* Small formatting fixes, clean up some code

* Few cleanups

* No need for default state

* Fix tests, add some boilerplate that should move eventually

* Add hook removal

* Add a context manager to handle hook

* Small naming cleanup

* wip

* move save_checkpoint responsibility to accelerator

* resolve flake8

* add BC

* Change recommended scale to 16

* resolve flake8

* update test

* update install

* update

* update test

* update

* update

* update test

* resolve flake8

* update

* update

* update on comments

* Push

* pull

* Update pytorch_lightning/plugins/training_type/deepspeed.py

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Update pytorch_lightning/plugins/training_type/deepspeed.py

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* update

* Apply suggestions from code review

* Swap to using world size defined by plugin

* update

* update todo

* Remove deepspeed from extra, keep it in the base cuda docker install

* Push

* pull

* update

* update

* update

* update

* Minor changes

* duplicate

* format

* format2

Co-authored-by: SeanNaren <sean@grid.ai>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
2021-03-30 13:39:02 -04:00
Kaushik B f79a13e495
[Model Parallel] Add configure sharded model hook (#6679)
* Add base hook for model parallel

* fix callback signature

* Simplify hook

* Add hook logic

* add tests

* add property setter

* add logic for being called once

* Update changelog

* Fix

* fix return type

* fix lambda callback test

* Fix tests

* Apply code suggestions

* add logic for setup_optimizers_predispatch

* add common dummy model

* Swap call order

* Remove test that isn't needed anymore

* Update tests

* Add a bit more doc

* Few code review fixes

* Update pytorch_lightning/accelerators/accelerator.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Change hook name

* Fix test

* Test setup hook, refactor names

* Swap call order of callbacks and model initialization

* Change name of context manager

Co-authored-by: SeanNaren <sean@grid.ai>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-03-29 14:50:51 -06:00
thomas chaton 646cf2f7d4
[refactor] Move save_function to accelerator 1/n [DeepSpeed] (#6689)
* move save_checkpoint responsibility to accelerator

* update
2021-03-29 21:02:37 +02:00
Jirka Borovec d471fa30b3
add copyr (#6661) 2021-03-24 14:29:46 +01:00
thomas chaton fd5cb7fcc3
Add PyTorch 1.8 Profiler 5/5 (#6618)
* Refactor profilers

* Update PassThrough

* WIP - This is broken and will change

* Update pytorch_lightning/profiler/pytorch.py

Co-authored-by: thomas chaton <thomas@grid.ai>

* resolve tests

* resolve tests

* find output

* try something

* update

* add support for test and predict

* update

* update

* use getattr

* test

* test

* update

* tests

* update

* update

* update

* update

* update

* remove file

* update

* update

* update

* update

* update

* test

* update#

* update

* update tests

* update

* add support for 1.8

* rename records

* add support for 1.8

* update

* resolve flake8

* resolve test

* Refactor basic profilers

* Fixes

* Unused import

* Introduce setup

* Profile on all ranks. Print to stdout on 0

* Introduce dirpath + filename

* CHANGELOG

* Add tests. Address comments

* add `on_run_stage_setup`

* add on_run_stage_setup function

* update

* add test for RegisterRecordFunction

* update lightning flow direction

* move variable to private

* remove trace

* Undo code that should be in 3/4

* Multi-stage multi-rank

* 2/5 changes

* Pass stage in __del__

* Remove TODOs

* Describe on_evaluation_end. Add tests

* Typo

* Address comments

* deepcopy tests

* Advanced teardown

* Fix teardown test

* Fix tests

* Minor change

* Update CHANGELOG.md

* Fix test

* Quick fixes

* Fix 6522

* resolve ddp tests

* resolve tests

* resolve some tests

* update tests

* resolve tests

* update

* resolve tests

* resolve some tests

* Missed fixes from 3/5

* Fixes

* resolve some tests

* resolve test for 1.7.1

* Broken refactor

* Missed stage

* Minor changes

* resolve tests

* Update CHANGELOG

* resolve bug

* remove print

* Typo

* Cleanup

* resolve ddp test

* remove barrier

* update profiler

* update

* Smaller model

* update

* resolve tests

* update

* Minor changes. CHANGELOG

* Minimize diff

* update to 1.8.1

* RunIf. Extra code. Check segfault

* resolve tests

* Typo. Bad merge

* Fixing a bad merge

* replace for kineto

* Update pytorch_lightning/profiler/pytorch.py

Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>

* Update pytorch_lightning/profiler/pytorch.py

Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>

* Minor changes

* Bad merge

* Use lists for flexibility

* Use sets

* predict_step

* Ananth's suggestion

* update

* Docs

* Update pl_examples/basic_examples/profiler_example.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* update example

* update example

Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-03-23 20:43:21 +00:00
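
Note: a usage sketch for the profiler this series wires up; the dirpath/filename arguments come from the basic-profiler refactor listed in the bullets above, and the exact keyword names are hedged:

    import pytorch_lightning as pl
    from pytorch_lightning.profiler import PyTorchProfiler

    # string shorthand
    trainer = pl.Trainer(profiler="pytorch")

    # or construct directly to control where the report is written
    profiler = PyTorchProfiler(dirpath=".", filename="perf_logs")
    trainer = pl.Trainer(profiler=profiler)
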
Jirka Borovec 3cf0c3117a
fix back-compatibility for Accel (#6655) 2021-03-23 17:41:36 +01:00
thomas chaton 0995d30fab
Flash predict step (#6577)
* add predict_step

* Update predict_loop.py

* Update trainer.py

* Update trainer.py

* resolve bugs

* update

* update

* update

* resolve bug

* resolve some failing tests

* update tests

* update

* resolve tests

* add a test

* remove typo

* add a test for attachment

* update

* changed to on_train_dataloader

* remove __flash_special_attr__

* resolve tests

* update

* update

* update

* update on comments

* Update pytorch_lightning/trainer/data_loading.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-03-23 11:13:13 -04:00
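
Note: a sketch of the predict path this PR adds, using a BoringModel-style stand-in; the dataloader_idx argument is hedged, as the hook signature shifted across releases:

    import torch
    from torch.utils.data import DataLoader, TensorDataset
    import pytorch_lightning as pl


    class TinyModel(pl.LightningModule):
        def __init__(self):
            super().__init__()
            self.layer = torch.nn.Linear(4, 2)

        def predict_step(self, batch, batch_idx, dataloader_idx=None):
            # batch arrives as a one-element list from the TensorDataset
            (x,) = batch
            return self.layer(x)

    loader = DataLoader(TensorDataset(torch.randn(16, 4)), batch_size=4)
    predictions = pl.Trainer().predict(TinyModel(), dataloaders=loader)
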
Sean Naren 58c9fa7edb
Allow training type plugin to delay optimizer creation (FSDP 2/n) (#6331)
* Allow training_type_plugin to delay optimizer configure

* Add missing references to trainer, add a CPU accelerator based test
2021-03-22 11:43:53 +00:00
Kaushik B 87c03b1038
Update Gradient Clipping for TPU Accelerator (#6576) 2021-03-20 01:02:57 +05:30
Ethan Harris 983a888f49
Fix all_gather for tpu_cores=8 (#6587) 2021-03-19 21:56:58 +05:30
Sean Naren 4e9b453854
[Fix] Move init dist connection into the setup function (#6506)
* Move connection setup into the setup function. Call setup hook after we set up the accelerator

* Added CHANGELOG.md

* fix setup order in callback test

* fix input arguments in test

* Mock distributed function, remove protection to turn into training type hook

* Remove import

* Add missing mock, ensure custom plugin does not create children process

* Skip test on windows

* Update deepspeed to init connection in setup

* Do not initialize distributed module

* Move DeepSpeed tests to special tests since dist communication is being set up

* Special the test to see if this fixes CI

* Delete accelerator connector test to see if its causing build to fail

* Delete deepspeed test

* Revert "Delete accelerator connector test to see if its causing build to fail"

This reverts commit edde60b8

* Revert "Delete deepspeed test"

This reverts commit 9d317429

* Reverse hook

* Reverse setup hooks to debug again

* Add todo so i know where i left off

* For single device move in pre_dispatch after setup function

* Add additional model to device hook if any additional parameters have been set

* See if we can enable deepspeed tests

* Revert "See if we can enable deepspeed tests"

This reverts commit b5450def

* See if this hook approach works

* Introduce new granular hooks

* Remove import, fix tpu spawn by moving the function to setup

* Added missing special test

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-03-18 14:33:39 -07:00
Jirka Borovec 6453091b8a
Prune metrics base classes 2/n (#6530)
* base class

* extensions

* chlog

* _stable_1d_sort

* _check_same_shape

* _input_format_classification_one_hot

* utils

* to_onehot

* select_topk

* to_categorical

* get_num_classes

* reduce

* class_reduce

* tests
2021-03-15 19:28:18 +00:00
thomas chaton 0544efd453
[bug] Update broadcast + reduce decision [ModelCheckpoint] (#6410)
* resolve bug

* update

* update changelog

* update PR

* Update pytorch_lightning/trainer/connectors/logger_connector/epoch_result_store.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* add todo

* resolve issues

* resolve flake8

* update

* add coverage for reduce

* wip

* restore back to broadcast

* remove test.py

* resolve flake8

* update

* check world size

* resolve test

* update

* use pytorch version when defined

* update on comments

* update on comments

* flake8

* resolve bugs

* Update CHANGELOG.md

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* update

* update

* update

* update

* remove test

* update

* resolve flake8

* update

* update

* update

* proxy

* update

* update

* resolve typo

* prune

* update parallel

* update

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-03-14 17:14:27 +00:00
Rohit Gupta c53edce1a1
Disable batch transfer in DP mode (#6098)
* add exceptions and test

* hook

* fix

* clean up

* clean up

* regex

* regex

* docs

* rev

* comment and docs

* chlog

* Apply suggestions from code review

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Apply suggestions from code review

Co-authored-by: chaton <thomas@grid.ai>

* Monkey-patch device count

* docs

* pep

* api_change

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: chaton <thomas@grid.ai>
2021-03-11 10:51:10 -05:00
Elia Cereda d0596fac94
Refactor RunningStage usage in advance of implementing Trainer.validate() (#4945)
* Update code

Co-authored-by: EliaCereda

* More property updates

* Move properties. Introduce trainer._fitting

* Use trainer.fitting

* Fix reset dataloaders

* Unused code

* RunningStage.SANITY_CHECKING

* Use setters

* Fix bugs

* Fix bugs

* TrainerState.{FITTING,VALIDATING,TESTING,PREDICTING,TUNING}

* Fix bugs

* Fix bugs

* Fix tests

* Update CHANGELOG. Add deprecation warning. Fix tests

* Unused imports

* Optional trainer

* More deprecation. More refactoring

* Correct version

* Use properties

* Address comments

* flake8

* Missed renamings

* Typo

* is -> ==

It is recommended to use `is` for Enums since they are singletons; however, since LightningEnum subclasses str, it's not a good idea in case a user sets the state/stage with a str

* Also for tests

* Typo

* Address @tchaton's comments

* PEP8

* Correct property

* Update CHANGELOG

* Apply suggestions from code review

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Update pytorch_lightning/trainer/trainer.py

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Remove called sanity check

Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-03-06 12:40:19 +00:00
Kunal Mundada 49c579f1f0
docstring changes in accelerators (#6327)
* docstring changes in accelerators

* docstrings moved

* whitespaces removed

* PEP8 correction[1]
2021-03-04 20:21:53 +00:00
Sean Naren d01e8fdc86
[fix] Use training type plugin hook when saving (FSDP 1/n) (#6321)
* Rely on training type plugin when saving

* Add better typing to training type plugin
2021-03-04 18:09:33 +00:00
thomas chaton 484dce11ec
[bugfix] TPU + all_gather + SingleTPU shouldn't call xm.all_gather (#6296)
* resolve an issue with TPU

* update

* add changelog
2021-03-03 08:54:20 -05:00
thomas chaton 1aac481957
[bugfix] TPU test hangs to barrier on 1 process (#6272)
* update

* resolve flake8

* update

* update

* update changelog

* update

* resolve flake8

Co-authored-by: Your Name <you@example.com>
2021-03-02 18:01:35 -05:00
Jirka Borovec 58a6d59784
simplify skip-if tests >> 0/n (#5920)
* skipif + yapf + isort

* tests

* docs

* pp
2021-03-01 12:17:09 +00:00
Justus Schock 0647340f3b
Add mypy typing to precision plugins. (#6149)
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
2021-02-26 14:27:16 +01:00
Justus Schock 3ed8ef8af9
type accelerators (#6148) 2021-02-25 06:42:23 +00:00
Adrian Wälchli ae6ce17598
fix amp/apex misconfiguration error for cpu (#6107)
* fix weird test

* fix apex plugin test

* fix raise

* cpu test

* fix type

* add changelog
2021-02-22 01:02:31 +01:00
Adrian Wälchli fc9bb53e13
fix flake8 for new plugins (#5951)
* flake8

* fix cyclic import

* isort
2021-02-18 18:28:23 +00:00
Rohit Gupta bcc0004955
Add before_batch_transfer and after_batch_transfer hooks (#3671)
* add hooks

* comment

* docs

* add tests

* make it private

* fix tests

* docs

* chlog

* testcode

* codefactor

* fix doctest

* fix doctest

* suggestions

* is always overridden

* pep and BoringModel

* BoringModel

* docs

* docs

* docs

* fix

* rebase

* rebase

* suggestions

* docs

* suggestions

* try fix docs

* docs

* update name

* yapf

* docs

* rebase

* yapf
2021-02-18 06:58:12 -05:00
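
Note: a sketch of the two hooks added here, with the names they ended up with in the public API; the dataloader_idx argument shown was only added later by #6241 (listed above):

    import pytorch_lightning as pl


    class AugmentingModel(pl.LightningModule):
        def on_before_batch_transfer(self, batch, dataloader_idx):
            # runs on the CPU batch, before the move to the target device
            return batch

        def on_after_batch_transfer(self, batch, dataloader_idx):
            # runs on the device batch, e.g. for GPU-side augmentations
            return batch
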
Sean Naren b019c25dde
Add descriptions to accelerator broadcast function/clean up all_gather (#6044)
* Add descriptions to accelerator broadcast function/clean up all_gather

* Remove todo
2021-02-18 06:12:52 -05:00
Sean Naren b7c2e0a80e
Trainer only references accelerator (#6039)
* Trainer only references accelerator where it can

* Move teardown to the trainer, as it is responsible for the accelerator
2021-02-17 21:41:18 +01:00
Sean Naren 7189d673f6
DeepSpeed Integration (#5954)
* Add initial deepspeed changes

* Address code review

* Move static method outside of function

* Fixes

* Add missing annotation

* Remove seed setting

* Doc changes

* Doc changes, add address reviews

* Fix docs

* Try fixing issue by moving to torch adam

* Clean up check

* Changes, better APIs!

* Add wrapper, swap to git install revision

* Add special test

* Add warning

* Address review

* Add better disclaimer

* Turn off ZeRO for testing due to compilation

* Add description on modifying parameters via the plugin

* Doc strings clear

* Small doc fixes

* Fix hash, reduce test

* Added CI change

* Move to azure pipeline

* Fix test name

* Add missing flag

* Remove sudo...

* Try conda instead

* Swap to conda base

* Try suggested install

* Apply suggestions from code review

* Apply suggestions from code review

* Revert "Apply suggestions from code review"

This reverts commit 41cca05a

* Revert "Apply suggestions from code review"

This reverts commit e06ec29e

* Remove setter

* Address most review

* Move out function, remove DeepSpeed from requirements

* Install deepspeed/mpi4py within container

* Use special tests, move to master commit for deepspeed

* Export path

* Force compile to happen first

* Remove!

* Debugging ninja

* Fix error in optimizer step logic

* Attempt to fix symbolic link

* Reverse to aid debugging

* Export path again

* Clean up mess

* var

* Revert "var"

This reverts commit 3450eaca

* Address review, add todo

* Add note about unsupported functionality

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
2021-02-17 15:23:42 -05:00
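
Note: enabling the integration at the time of this PR looked roughly like the following sketch; the plugins= argument was renamed in later releases (strategy=):

    import pytorch_lightning as pl

    trainer = pl.Trainer(gpus=2, plugins="deepspeed", precision=16)
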
Adrian Wälchli e0bb33c8cc
move accelerator_connector.py to the connectors subfolder (#6033)
* move accelerator connector

* rename BackendConnector -> AcceleratorConnector
2021-02-17 18:38:08 +00:00
chaton 6bc4490d01
[HotFix] Resolve TPU Training (#6027)
* fix tpus

* update

* add back reduction in val_loss

* resolve some bugs with TPUs

* update changelog

* update on comments

* forgot status

* Fix train_bn arg

* resolve comments

* update on comments

Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-02-17 16:40:13 +00:00
chaton e982800b81
Add PredictLoop (#5752)
* integrate distrib_type

* sync changes

* sync

* fixes

* add forgotten generators

* add missing logic

* update

* import

* missed imports

* import fixes

* isort

* mv f

* changelog

* format

* move helper to parallel plugin

* d

* add world size

* clean up

* duplicate

* activate ddp_sharded and tpu

* set nvidia flags

* remove unused colab var

* use_tpu <-> on_tpu attrs

* make some ddp_cpu and clusterplugin tests pass

* Ref/accelerator connector (#5742)

* final cleanup

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* connector cleanup

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* trainer cleanup

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* accelerator cleanup + missing logic in accelerator connector

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add missing changes to callbacks

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* reflect accelerator changes to lightning module

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* clean cluster envs

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* cleanup plugins

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add broadcasting

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* yapf

* remove plugin connector

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* plugins

* add predict_loop

* manual optimization

* clean predictloop

* update optimizer routing

* add predict loop on new accelerator

* resolve a bug

* add rank to torchelastic

* add predict_loop

* add predict loop on new accelerator

* resolve a bug

* fix memory mixed precision

* update

* setstate on trainer for pickling in ddp spawn

* add predict_loop

* clean predictloop

* add predict loop on new accelerator

* resolve a bug

* add predict_loop

* add predict loop on new accelerator

* resolve a bug

* add predict_loop

* add predict loop on new accelerator

* resolve a bug

* add predict_loop

* add predict loop on new accelerator

* resolve a bug

* add predict_loop

* clean predictloop

* add predict loop on new accelerator

* resolve a bug

* add predict_loop

* add predict loop on new accelerator

* resolve a bug

* resolve tests

* add predict method

* add back commented accelerator code

* adapt test for sync_batch_norm to new plugin

* fix deprecated tests

* fix ddp cpu choice when no num_processes are given

* yapf format

* skip a memory test that cannot pass anymore

* remove sanetize

* rename train to run_train

* remove useless hooks

* add misconfigurationException

* remove wrong naming

* resolve some legacy

* update docstring

* fix pickle error in spawn plugin

* x

* avoid

* x

* fix cyclic import in docs build

* add support for sharded

* update typing

* add sharded and sharded_spawn to distributed types

* make unwrap model default

* refactor LightningShardedDataParallel similar to LightningDistributedDataParallel

* update sharded spawn to reflect changes

* update sharded to reflect changes

* Merge 1.1.5 changes

* fix merge

* fix merge

* yapf isort

* fix merge

* yapf isort

* fix indentation in test

* copy over reinit scheduler implementation from dev1.2

* fix apex tracking calls with dev_debugger

* reduce diff to dev1.2, clean up

* fix trainer config test  when gpus>0 and num_processes >0 and ddp_cpu

* sort plugin tests legacy/new

* fix error handling for amp on cpu

* fix merge


* fix merge

* fix merge

* [Feat] Resolve manual_backward (#5837)

* resolve manual_backward

* resolve flake8

* update

* resolve for ddp_spawn

* resolve flake8

* resolve flake8

* resolve flake8

Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>

* fix tests/accelerator tests on cpu

* [BugFix] Resolve manual optimization (#5852)

* resolve manual_optimization

* update

* update

Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>

* Remove copy trainer parameters to happen earlier within the loop and add safe guard to get ref model (#5856)

* resolve a bug

* Accelerator refactor sharded rpc (#5854)

* rpc branch

* merge

* update handling of rpc

* make devices etc. Optional in RPC

* set devices etc. later if necessary

* remove devices from sequential

* make devices optional in rpc

* fix import

* uncomment everything

* fix cluster selection

Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>

* resolve bug

* fix assert in rpc test

* resolve a test

* fix docs compilation

* accelerator refactor - fix for sharded parity test (#5866)

* fix memory issue with ddp_spawn

* x

* x

* x

* x

* x

* x

* x

* x

* x

* x

* Remove DDP2 as this does not apply

* Add missing pre optimizer hook to ensure lambda closure is called

* fix apex docstring

* [accelerator][BugFix] Resolve some test for 1 gpu (#5863)

* update

* revert init

* resolve a bug

* update

* resolve flake8

* update

* update

* update

* revert init

* resolve a bug

* update

* resolve flake8

* update

* update

* update

* update

* update

* revert init

* resolve a bug

* update

* resolve flake8

* update

* update

* update

* revert init

* update

* resolve flake8

* update

* update

* update

* update

* update

* all_gather

* update

* make plugins work, add misconfig for RPC

* update

* update

* remove breaking test

* resolve some tests

* resolve flake8

* revert to ddp_spawn

Co-authored-by: root <root@ip-172-31-88-60.ec2.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>
Co-authored-by: Justus Schock <justus.schock@rwth-aachen.de>

* yapf isort

* resolve flake8

* fix apex doctests

* fix apex doctests 2

* resolve docs

* update drone

* clean env

* update

* update

* update

* update

* merge

* Fix RPC related tests, clean out old API, update for new accelerator API [skip ci] (#5881)

* Fix RPC related tests, clean out old API, update for new accelerator API

* Move tests out of legacy folder, update paths and names

* Update test_remove_1-4.py

* Expose properties for tpu cores/gpus/num_gpus

* Add root GPU property

* Move properties to properties.py

* move tests that were previously in drone

* Fix root GPU property (#5908)

* Move root GPU to property, remove horovod set as this is handled in horovod plugin, ensure we mock correctly to set GPU accelerator

* Add missing tests back

* fix best model path transfer when no checkpoint callback available

* Fix setup hook order [wip] (#5858)

* Call trainer setup hook before accelerator setup

* Add test case

* add new test

* typo

* fix callback order in test

Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* rename ddp sequential -> rpc sequential for special test

* revert

* fix stupid merge problem

* Use property in connector for sampler (#5913)

* merge the import conflicts

* fix spawning of processes in slurm

* [wip] Fix some bugs for TPU [skip ci] (#5878)

* fixed for single tpu

* fixed spawn

* fixed spawn

* update

* update

* wip

* resolve bugs

* resolve bug

* update on comment

* removed decorator

* resolve comments

* set to 4

* update

* update

* need cleaning

* update

* update

* update

* resolve flake8

* resolve bugs

* exclude broadcast

* resolve bugs

* change test

* update

* update

* skip if meet fails

* properly raise trace

* update

* add catch

* wrap test

* resolve typo

* update

* typo

Co-authored-by: Lezwon Castelino <lezwon@gmail.com>
Co-authored-by: Your Name <you@example.com>

* resolve some tests

* update

* fix imports

* update

* resolve flake8

* update azure pipeline

* skip a sharded test on cpu that requires a gpu

* resolve tpus

* resolve bug

* resolve flake8

* update

* update utils

* revert permission change on files

* suggestions from carlos

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* remove unrelated formatting changes

* remove incomplete comment

* Update pytorch_lightning/accelerators/__init__.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* remove unrelated formatting change

* add types

* warn 1.7 ddp manual backward only if ddp kwarg unset

* yapf + isort

* pep8 unused imports

* fix cyclic import in docs

* Apply suggestions from code review

* typo in accelerator.py

* typo

* resolve flake8

* update code

* update

* Update pytorch_lightning/trainer/predict_loop.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Update pytorch_lightning/trainer/predict_loop.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* fix merge

* fix merge

* reset legacy accelerator

* add missing rename dispatch

* rename post training

* update code

* resolved comments

* typo

* typo

* add flow description

* resolve comments

* update on comments

* update flow

* add backticks

* resolve tpu

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: justusschock <justus.schock@posteo.de>
Co-authored-by: Justus Schock <justus.schock@rwth-aachen.de>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: SeanNaren <sean@grid.ai>
Co-authored-by: root <root@ip-172-31-88-60.ec2.internal>
Co-authored-by: Lezwon Castelino <lezwon@gmail.com>
Co-authored-by: Your Name <you@example.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2021-02-16 17:11:56 -05:00
chaton a52be5bb07
[Hot Fix] Ensure process_dataloader is called when tpu_cores > 1 to use Parallel DataLoader (#6015)
* hotfix for tpu

* update changelog

* Update CHANGELOG.md

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
2021-02-16 17:02:25 -05:00
chaton 6e79bef996
[accelerator][FeatBugFix] Improve manual optimization API (#5771)
* fix trainer.model access

* move properties

* fix test_transfer_batch_hook

* fix auto_select_gpus

* fix omegaconf test

* fix test that needs to simulate slurm ddp

* add horovod plugin

* fix test with named arguments

* clean up whitespace

* fix datamodules test

* remove old accelerators

* fix naming

* move old plugins

* move to plugins

* create precision subpackage

* create training_type subpackage

* fix all new import errors

* fix wrong arguments order passed to test

* fix LR finder

* Added sharded training type and amp plugin

* Move clip grad to precision plugin

* Added sharded spawn, select accelerators based on distributed_backend + enable custom fp16 plugin automatically

* Fix import issue, attempting to fix tests

* Fix initial test

* Reflect hook logic from master, should wrap model after move to device

* Optional state consolidation, since master has optimizers not wrapped

* change attribute for instance test

* reset optimizers

optimizers are not used in main process, so state would be wrong.

* legacy

* imports in accel

* legacy2

* trainer imports

* fix import errors after rebase

* move hook to new setup location

* provide unwrapping logic

* fix trainer callback system

* added ddp2 implementation

* fix imports .legacy

* move plugins

* restore legacy

* drop test.py from root

* add tpu accelerator and plugins

* fixes

* fix lightning optimizer merge

* reset bugreportmodel

* unwrapping

* step routing forward

* model access

* unwrap

* opt

* integrate distrib_type

* sync changes

* sync

* fixes

* add forgotten generators

* add missing logic

* update

* import

* missed imports

* import fixes

* isort

* mv f

* changelog

* format

* move helper to parallel plugin

* d

* add world size

* clean up

* duplicate

* activate ddp_sharded and tpu

* set nvidia flags

* remove unused colab var

* use_tpu <-> on_tpu attrs

* make some ddp_cpu and clusterplugin tests pass

* Ref/accelerator connector (#5742)

* final cleanup

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* connector cleanup

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* trainer cleanup

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* accelerator cleanup + missing logic in accelerator connector

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add missing changes to callbacks

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* reflect accelerator changes to lightning module

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* clean cluster envs

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* cleanup plugins

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add broadcasting

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* yapf

* remove plugin connector

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* plugins

* manual optimization

* update optimizer routing

* add rank to torchelastic

* fix memory mixed precision

* setstate on trainer for pickling in ddp spawn

* add predict method

* add back commented accelerator code

* adapt test for sync_batch_norm to new plugin

* fix deprecated tests

* fix ddp cpu choice when no num_processes are given

* yapf format

* skip a memory test that cannot pass anymore

* update on comments

* fix pickle error in spawn plugin

* x

* avoid

* x

* fix cyclic import in docs build

* add support for sharded

* update typing

* add sharded and sharded_spawn to distributed types

* make unwrap model default

* refactor LightningShardedDataParallel similar to LightningDistributedDataParallel

* update sharded spawn to reflect changes

* update sharded to reflect changes

* Merge 1.1.5 changes

* fix merge

* fix merge

* yapf isort

* fix merge

* yapf isort

* fix indentation in test

* copy over reinit scheduler implementation from dev1.2

* fix apex tracking calls with dev_debugger

* reduce diff to dev1.2, clean up

* fix trainer config test  when gpus>0 and num_processes >0 and ddp_cpu

* sort plugin tests legacy/new

* fix error handling for amp on cpu

* fix merge


* fix merge

* fix merge

* [Feat] Resolve manual_backward (#5837)

* resolve manual_backward

* resolve flake8

* update

* resolve for ddp_spawn

* resolve flake8

* resolve flake8

* resolve flake8

Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>

* fix tests/accelerator tests on cpu

* [BugFix] Resolve manual optimization (#5852)

* resolve manual_optimization

* update

* update

Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>

* Remove copy trainer parameters to happen earlier within the loop and add safe guard to get ref model (#5856)

* resolve a bug

* Accelerator refactor sharded rpc (#5854)

* rpc branch

* merge

* update handling of rpc

* make devices etc. Optional in RPC

* set devices etc. later if necessary

* remove devices from sequential

* make devices optional in rpc

* fix import

* uncomment everything

* fix cluster selection

Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>

* resolve bug

* fix assert in rpc test

* resolve a test

* fix docs compilation

* accelerator refactor - fix for sharded parity test (#5866)

* fix memory issue with ddp_spawn

* x

* x

* x

* x

* x

* x

* x

* x

* x

* x

* Remove DDP2 as this does not apply

* Add missing pre optimizer hook to ensure lambda closure is called

* fix apex docstring

* [accelerator][BugFix] Resolve some test for 1 gpu (#5863)

* update

* revert init

* resolve a bug

* update

* resolve flake8

* update

* update

* update

* revert init

* resolve a bug

* update

* resolve flake8

* update

* update

* update

* update

* update

* revert init

* resolve a bug

* update

* resolve flake8

* update

* update

* update

* revert init

* update

* resolve flake8

* update

* update

* update

* update

* update

* all_gather

* update

* make plugins work, add misconfig for RPC

* update

* update

* remove breaking test

* resolve some tests

* resolve flake8

* revert to ddp_spawn

Co-authored-by: root <root@ip-172-31-88-60.ec2.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>
Co-authored-by: Justus Schock <justus.schock@rwth-aachen.de>

* yapf isort

* resolve flake8

* fix apex doctests

* fix apex doctests 2

* resolve docs

* update drone

* clean env

* update

* update

* update

* update

* merge

* Fix RPC related tests, clean out old API, update for new accelerator API [skip ci] (#5881)

* Fix RPC related tests, clean out old API, update for new accelerator API

* Move tests out of legacy folder, update paths and names

* Update test_remove_1-4.py

* Expose properties for tpu cores/gpus/num_gpus

* Add root GPU property

* Move properties to properties.py

* move tests that were previously in drone

* Fix root GPU property (#5908)

* Move root GPU to property, remove horovod set as this is handled in horovod plugin, ensure we mock correctly to set GPU accelerator

* Add missing tests back

* fix best model path transfer when no checkpoint callback available

* Fix setup hook order [wip] (#5858)

* Call trainer setup hook before accelerator setup

* Add test case

* add new test

* typo

* fix callback order in test

Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* rename ddp sequential -> rpc sequential for special test

* revert

* fix stupid merge problem

* Use property in connector for sampler (#5913)

* merge the import conflicts

* fix spawning of processes in slurm

* [wip] Fix some bugs for TPU [skip ci] (#5878)

* fixed for single tpu

* fixed spawn

* fixed spawn

* update

* update

* wip

* resolve bugs

* resolve bug

* update on comment

* removed decorator

* resolve comments

* set to 4

* update

* update

* need cleaning

* update

* update

* update

* resolve flake8

* resolve bugs

* exclude broadcast

* resolve bugs

* change test

* update

* update

* skip if meet fails

* properly raise trace

* update

* add catch

* wrap test

* resolve typo

* update

* typo

Co-authored-by: Lezwon Castelino <lezwon@gmail.com>
Co-authored-by: Your Name <you@example.com>

* resolve some tests

* update

* fix imports

* update

* resolve flake8

* update azure pipeline

* skip a sharded test on cpu that requires a gpu

* resolve tpus

* resolve bug

* resolve flake8

* update

* update utils

* revert permission change on files

* suggestions from carlos

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* remove unrelated formatting changes

* remove incomplete comment

* Update pytorch_lightning/accelerators/__init__.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* remove unrelated formatting change

* add types

* warn 1.7 ddp manual backward only if ddp kwarg unset

* yapf + isort

* pep8 unused imports

* fix cyclic import in docs

* Apply suggestions from code review

* typo in accelerator.py

* typo

* Apply suggestions from code review

* formatting

* update on comments

* update typo

* Update pytorch_lightning/trainer/properties.py

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* update

* update on comments

* resolve some comments

* update on comments

* resolve test

* add toggle_model

* update

* update on comments

* update doc

* typo

* update

* typo

* remove space

* update

* update on comments

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: justusschock <justus.schock@posteo.de>
Co-authored-by: SeanNaren <sean@grid.ai>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
Co-authored-by: Justus Schock <justus.schock@rwth-aachen.de>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: root <root@ip-172-31-88-60.ec2.internal>
Co-authored-by: Lezwon Castelino <lezwon@gmail.com>
Co-authored-by: Your Name <you@example.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2021-02-16 16:00:35 -05:00
Jirka Borovec 936f42aa1c
clean AMP logic (#5994)
* clean AMP logic

* cleaning

* ...

* ...

* Even apex
2021-02-16 19:06:47 +00:00
Adrian Wälchli aa60c08641
move device-specific teardown logic from training loop to accelerator (#5973)
* on train end

* switch order
2021-02-15 17:38:03 -05:00
Adrian Wälchli c912c4b729
remove legacy accelerators (#5949)
* remove legacy accelerators

* update imports

* formatting

Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
2021-02-14 16:03:45 +00:00
Justus Schock da6dbc8d1d
PoC: Accelerator refactor (#5743)
* restoring the result from subprocess

* fix queue.get() order for results

* add missing "block_backward_sync" context manager

* add missing "block_backward_sync" context manager

* fix sync_batchnorm

* fix supported gpu-ids for tuple

* fix clip gradients and inf recursion

* accelerator selection: added cluster_environment plugin

* fix torchelastic test

* fix reduce early stopping decision for DDP

* fix tests: callbacks, conversion to lightning optimizer

* fix lightning optimizer does not pickle

* fix setting benchmark and deterministic option

* fix slurm amp test

* fix prepare_data test and determine node_rank

* fix retrieving last path when testing

* remove obsolete plugin argument

* fix test: test_trainer_config

* fix torchscript tests

* fix trainer.model access

* move properties

* fix test_transfer_batch_hook

* fix auto_select_gpus

* fix omegaconf test

* fix test that needs to simulate slurm ddp

* add horovod plugin

* fix test with named arguments

* clean up whitespace

* fix datamodules test

* remove old accelerators

* fix naming

* move old plugins

* move to plugins

* create precision subpackage

* create training_type subpackage

* fix all new import errors

* fix wrong arguments order passed to test

* fix LR finder

* Added sharded training type and amp plugin

* Move clip grad to precision plugin

* Added sharded spawn, select accelerators based on distributed_backend + enable custom fp16 plugin automatically

* Fix import issue, attempting to fix tests

* Fix initial test

* Reflect hook logic from master, should wrap model after move to device

* Optional state consolidation, since master has optimizers not wrapped

* change attribute for instance test

* reset optimizers

optimizers are not used in main process, so state would be wrong.

* legacy

* imports in accel

* legacy2

* trainer imports

* fix import errors after rebase

* move hook to new setup location

* provide unwrapping logic

* fix trainer callback system

* added ddp2 implementation

* fix imports .legacy

* move plugins

* restore legacy

* drop test.py from root

* add tpu accelerator and plugins

* fixes

* fix lightning optimizer merge

* reset bugreportmodel

* unwrapping

* step routing forward

* model access

* unwrap

* opt

* integrate distrib_type

* sync changes

* sync

* fixes

* add forgotten generators

* add missing logic

* update

* import

* missed imports

* import fixes

* isort

* mv f

* changelog

* format

* move helper to parallel plugin

* d

* add world size

* clean up

* duplicate

* activate ddp_sharded and tpu

* set nvidia flags

* remove unused colab var

* use_tpu <-> on_tpu attrs

* make some ddp_cpu and clusterplugin tests pass

* Ref/accelerator connector (#5742)

* final cleanup

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* connector cleanup

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* trainer cleanup

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* accelerator cleanup + missing logic in accelerator connector

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add missing changes to callbacks

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* reflect accelerator changes to lightning module

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* clean cluster envs

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* cleanup plugins

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add broadcasting

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* yapf

* remove plugin connector

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* plugins

* manual optimization

* update optimizer routing

* add rank to torchelastic

* fix memory mixed precision

* setstate on trainer for pickling in ddp spawn

* add predict method

* add back commented accelerator code

* adapt test for sync_batch_norm to new plugin

* fix deprecated tests

* fix ddp cpu choice when no num_processes are given

* yapf format

* skip a memory test that cannot pass anymore

* fix pickle error in spawn plugin

* x

* avoid

* x

* fix cyclic import in docs build

* add support for sharded

* update typing

* add sharded and sharded_spawn to distributed types

* make unwrap model default

* refactor LightningShardedDataParallel similar to LightningDistributedDataParallel

* update sharded spawn to reflect changes

* update sharded to reflect changes

* Merge 1.1.5 changes

* fix merge

* fix merge

* yapf isort

* fix merge

* yapf isort

* fix indentation in test

* copy over reinit scheduler implementation from dev1.2

* fix apex tracking calls with dev_debugger

* reduce diff to dev1.2, clean up

* fix trainer config test when gpus>0 and num_processes>0 and ddp_cpu

* sort plugin tests legacy/new

* fix error handling for amp on cpu

* fix merge

* [Feat] Resolve manual_backward (#5837)

* resolve manual_backward

* resolve flake8

* update

* resolve for ddp_spawn

* resolve flake8

* resolve flake8

* resolve flake8

Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>

* fix tests/accelerator tests on cpu

* [BugFix] Resolve manual optimization (#5852)

* resolve manual_optimization

* update

* update

Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>

* Move the copy of trainer parameters to happen earlier within the loop and add a safeguard to get the ref model (#5856)

* resolve a bug

* Accelerator refactor sharded rpc (#5854)

* rpc branch

* merge

* update handling of rpc

* make devices etc. Optional in RPC

* set devices etc. later if necessary

* remove devices from sequential

* make devices optional in rpc

* fix import

* uncomment everything

* fix cluster selection

Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>

* resolve bug

* fix assert in rpc test

* resolve a test

* fix docs compilation

* accelerator refactor - fix for sharded parity test (#5866)

* fix memory issue with ddp_spawn

* x

* Remove DDP2 as this does not apply

* Add missing pre-optimizer hook to ensure the lambda closure is called

* fix apex docstring

* [accelerator][BugFix] Resolve some test for 1 gpu (#5863)

* update

* revert init

* resolve a bug

* update

* resolve flake8

* update

* update

* update

* revert init

* resolve a bug

* update

* resolve flake8

* update

* update

* update

* update

* update

* revert init

* resolve a bug

* update

* resolve flake8

* update

* update

* update

* revert init

* update

* resolve flake8

* update

* update

* update

* update

* update

* all_gather

* update

* make plugins work, add misconfig for RPC

* update

* update

* remove breaking test

* resolve some tests

* resolve flake8

* revert to ddp_spawn

Co-authored-by: root <root@ip-172-31-88-60.ec2.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>
Co-authored-by: Justus Schock <justus.schock@rwth-aachen.de>

* yapf isort

* resolve flake8

* fix apex doctests

* fix apex doctests 2

* resolve docs

* update drone

* clean env

* update

* update

* update

* update

* merge

* Fix RPC related tests, clean out old API, update for new accelerator API [skip ci] (#5881)

* Fix RPC related tests, clean out old API, update for new accelerator API

* Move tests out of legacy folder, update paths and names

* Update test_remove_1-4.py

* Expose properties for tpu cores/gpus/num_gpus

* Add root GPU property

* Move properties to properties.py

* move tests that were previously in drone

* Fix root GPU property (#5908)

* Move root GPU to property, remove horovod set as this is handled in horovod plugin, ensure we mock correctly to set GPU accelerator

* Add missing tests back

* fix best model path transfer when no checkpoint callback available

* Fix setup hook order [wip] (#5858)

* Call trainer setup hook before accelerator setup

* Add test case

* add new test

* typo

* fix callback order in test

Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* rename ddp sequential -> rpc sequential for special test

* revert

* fix stupid merge problem

* Use property in connector for sampler (#5913)

* merge the import conflicts

* fix spawning of processes in slurm

* [wip] Fix some bugs for TPU [skip ci] (#5878)

* fixed for single tpu

* fixed spawn

* fixed spawn

* update

* update

* wip

* resolve bugs

* resolve bug

* update on comment

* removed decorator

* resolve comments

* set to 4

* update

* update

* need cleaning

* update

* update

* update

* resolve flake8

* resolve bugs

* exclude broadcast

* resolve bugs

* change test

* update

* update

* skip if meet fails

* properly raise trace

* update

* add catch

* wrap test

* resolve typo

* update

* typo

Co-authored-by: Lezwon Castelino <lezwon@gmail.com>
Co-authored-by: Your Name <you@example.com>

* resolve some tests

* update

* fix imports

* update

* resolve flake8

* update azure pipeline

* skip a sharded test on cpu that requires a gpu

* resolve tpus

* resolve bug

* resolve flake8

* update

* update utils

* revert permission change on files

* suggestions from Carlos

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* remove unrelated formatting changes

* remove incomplete comment

* Update pytorch_lightning/accelerators/__init__.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* remove unrelated formatting change

* add types

* warn 1.7 ddp manual backward only if ddp kwarg unset

* yapf + isort

* pep8 unused imports

* fix cyclic import in docs

* Apply suggestions from code review

* typo in accelerator.py

* typo

* Apply suggestions from code review

* formatting

* update on comments

* update typo

* Update pytorch_lightning/trainer/properties.py

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* update

* suggestion from code review

* suggestion from code review

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: SeanNaren <sean@grid.ai>
Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: root <root@ip-172-31-88-60.ec2.internal>
Co-authored-by: Lezwon Castelino <lezwon@gmail.com>
Co-authored-by: Your Name <you@example.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2021-02-12 15:48:56 -05:00
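The lasting outcome of this PR is the three-way split the later 1.2-era commits build on: an Accelerator composed from a training-type plugin (how processes and devices are organized) and a precision plugin (how numerics are handled). A minimal composition sketch; the class names appear in the PR, but the exact constructor keywords are assumptions:

    from pytorch_lightning.accelerators import GPUAccelerator
    from pytorch_lightning.plugins import DDPPlugin, NativeMixedPrecisionPlugin

    # hardware x distribution x numerics, combined by composition instead of
    # one subclass per combination as in the legacy accelerators
    accelerator = GPUAccelerator(
        training_type_plugin=DDPPlugin(),
        precision_plugin=NativeMixedPrecisionPlugin(),
    )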
Adrian Wälchli bb7d188318 Fix ModelCheckpoint race condition in file existence check (#5155)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
2021-02-05 21:40:39 +01:00
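The race being fixed: several DDP ranks probing the filesystem for the next free checkpoint version at once. An illustrative pattern for the repair, assuming a plugin-level broadcast helper (not the literal patch):

    import os


    def resolve_ckpt_version(ckpt_dir: str, name: str, trainer) -> int:
        """Pick a version on rank 0 only, then share it with every rank."""
        version = 0
        if trainer.is_global_zero:
            while os.path.exists(os.path.join(ckpt_dir, f"{name}-v{version}.ckpt")):
                version += 1
        # all ranks agree on one answer instead of racing the filesystem
        return trainer.training_type_plugin.broadcast(version)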
Sean Naren 524675705a Sync working directory with all processes when using Hydra (#5629)
* Append current work dir (hydra dir) to ensure all processes reference the same directory

* Add changelog

* Add different job name for DDP child processes to log to

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

(cherry picked from commit ca68cac57a)
2021-02-04 20:55:41 +01:00
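Background: Hydra switches the working directory per run, so DDP children spawned from rank 0 could otherwise start in a different cwd. A hedged sketch of the forwarded overrides (the helper name and exact override strings are assumptions):

    import os
    from typing import List


    def ddp_child_command(base_cmd: List[str], local_rank: int) -> List[str]:
        # the parent's cwd is the Hydra-chosen run dir; forwarding it keeps
        # every rank in one directory, and a per-rank job name keeps their
        # logs from colliding
        cwd = os.getcwd()
        return base_cmd + [
            f"hydra.run.dir={cwd}",
            f"hydra.job.name=ddp_process_{local_rank}",
        ]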
rohitgr7 a37416843b Fix sync
resolve wrong merge

tpu

yapf
2021-02-03 20:11:35 +01:00
ananthsub ad4e25fde5 [bugfix] Fix signature mismatch in DDPCPUHPCAccelerator's model_to_device (#5505)
* Update ddp_cpu_hpc_accelerator.py

* Update CHANGELOG.md
2021-02-03 19:41:45 +01:00
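The bug class here is an override whose signature drifted from its base. Schematically (class names and method bodies are placeholders, not the shipped code):

    import torch


    class BaseAcceleratorSketch:
        def model_to_device(self, model: torch.nn.Module, process_idx: int) -> None:
            raise NotImplementedError


    class CPUHPCAcceleratorSketch(BaseAcceleratorSketch):
        # the fix: accept the same arguments as the base class, so call
        # sites that pass process_idx do not crash on the CPU/HPC path
        def model_to_device(self, model: torch.nn.Module, process_idx: int) -> None:
            model.cpu()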
Rohit Gupta 293984bbc0 fix reinit_schedulers with correct optimizer (#5519)
* update test

* syntax

* fix

* update test

* scheduler

* only apex

* fix

* rev drone

* chlog
2021-02-03 19:41:45 +01:00
lacrosse91 43e926b880 Fix wrong exception message (#5492)
* Update accelerator_connector.py

* Apply suggestions from code review

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-02-03 19:41:44 +01:00
Rohit Gupta 10c7dbe6a1 Refactor setup_training and remove test_mode (#5388)
* ref and fix call for on_pretrained_routine

* avoid failing tests

* unnecessary_call

* unnecessary call in accelerators

* tmpdir

* rm test_mode

* pep

* updates

* more ref

* Revert "more ref"

This reverts commit 5d9e95f873.

* more refac

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-02-03 19:41:42 +01:00
Adrian Wälchli 8943d8bca0
add missing logic to new plugins and accelerator (#5734)
* add missing logic

* missed imports

* import fixes

* isort

* mv f

* changelog

* format

* move helper to parallel plugin

* d
2021-02-01 13:23:53 -05:00
Justus Schock b3ebc18bcb
Hardware specific parts of Accelerator Refactoring (#5719)
* add basic accelerator class.
Co-Authored with @awaelchi

* pep8

Co-authored-by: @awaelchi

* add cpu accelerator

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add gpu accelerator

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add tpu accelerator

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add accelerator connector

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add single device training

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add single tpu

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add tpu spawn

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* make on_colab_kaggle utility func

* add basic accelerator class.
Co-Authored with @awaelchi

* pep8

Co-authored-by: @awaelchi

* add cpu accelerator

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add gpu accelerator

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add tpu accelerator

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add accelerator connector

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add single device training

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add single tpu

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add tpu spawn

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* make on_colab_kaggle utility func

* fixes

* move

* yapf

* .

* .

* .

* flake8

* sync accelerator connector changes from dev1.2

* changelog

* fix tpu handling

* tpu

* aval

* yapf

* Update pytorch_lightning/plugins/training_type/tpu_spawn.py

Co-authored-by: chaton <thomas@grid.ai>

* Update pytorch_lightning/accelerators/accelerator_connector.py

Co-authored-by: chaton <thomas@grid.ai>

* Update pytorch_lightning/plugins/training_type/tpu_spawn.py

Co-authored-by: chaton <thomas@grid.ai>

* Update tpu_spawn.py

* Update pytorch_lightning/accelerators/accelerator_connector.py

Co-authored-by: chaton <thomas@grid.ai>

* indentation

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
Co-authored-by: chaton <thomas@grid.ai>
2021-02-01 08:34:59 -05:00
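This PR carves the hardware-specific half out of the shared Accelerator: one class per device family (CPU/GPU/TPU) plus single-device and TPU-spawn training types. A compact, illustrative sketch of what a per-hardware class owns (the body is an assumption):

    import torch


    class GPUAcceleratorSketch:
        """Illustrative: device validation and setup live on the accelerator."""

        def __init__(self, root_device: torch.device):
            self.root_device = root_device

        def setup(self) -> None:
            if self.root_device.type != "cuda":
                raise ValueError("GPU accelerator placed on a non-CUDA device")
            torch.cuda.set_device(self.root_device)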
Justus Schock 069ae27cef
Accelerator Refactor: Precision Plugins (#5718)
* add basic accelerator class.
Co-Authored with @awaelchi

* add basic training type plugin.
Co-Authored with @awaelchi

* pep8

Co-authored-by: @awaelchi

* update copyright

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add apex_amp

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add mixed base class

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add native amp

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add native amp sharded

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add tpu bfloat

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add inits

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Update precision_plugin.py

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-01-31 13:12:02 -05:00
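What one of these plugins encapsulates is easiest to see with native AMP; a minimal sketch where the method names mirror the plugin family but the exact signatures are assumptions:

    import torch


    class NativeAMPPluginSketch:
        def __init__(self) -> None:
            self.scaler = torch.cuda.amp.GradScaler()

        def backward(self, loss: torch.Tensor) -> None:
            # scale the loss before backward so fp16 gradients do not underflow
            self.scaler.scale(loss).backward()

        def optimizer_step(self, optimizer: torch.optim.Optimizer) -> None:
            # unscale + step, then update the scale factor for the next batch
            self.scaler.step(optimizer)
            self.scaler.update()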
Adrian Wälchli 692f77b8a7
Refactor LightningDataParallel (#5670)
* module

* fix model access

* scalar conversion

* refactor

* kwargs

* auto unsqueeze

* refactor code duplication

* clean up

* docs

* update dp docs

* changelog

* generalize test

* test

* rename

* warning cache

* isort

* unsqueezing test

* device

* device

* scalar test

* device

* device

* include coverage of overrides

* clear

* add deprecation test

* docs

* improve coverage

* increase coverage

* fix merge

* extend test

* rename base class

* mention the predict method in docs

* combine iteration over collection

* remove override

* move

* line

* Apply suggestions from code review

* fix running stage

* f401

* fix cyclic import

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-01-31 06:08:16 -05:00
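The bullets above (wrapper module, scalar conversion, auto unsqueeze) boil down to one idea: the DP wrapper should call the LightningModule's step methods rather than bare forward. A hedged sketch of that routing, not the shipped class:

    import torch


    class ParallelWrapperSketch(torch.nn.Module):
        """Illustrative: route each DP replica to training_step."""

        def __init__(self, pl_module):
            super().__init__()
            self.module = pl_module

        def forward(self, *args, **kwargs):
            output = self.module.training_step(*args, **kwargs)
            # DP gathers outputs across replicas, so a 0-dim scalar (e.g.
            # the loss) must be unsqueezed to a 1-dim tensor first
            if isinstance(output, torch.Tensor) and output.dim() == 0:
                output = output.unsqueeze(0)
            return output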
Justus Schock 5d239ccd70
Base classes for accelerator refactoring (#5715)
* add basic accelerator class.
Co-Authored with @awaelchi

* Add base plugin class.
Co-authored with @awaelchi

* add basic training type plugin.
Co-Authored with @awaelchi

* add basic precision plugin.
Co-Authored with @awaelchi

* Add missing inits.
Co-authored with @awaelchi

* pep8

Co-authored-by: @awaelchi

* ignore flake8

* coverage omit

* imports in init

* lost

* imports

* flake8

* .

* .

* chlog

* Update pytorch_lightning/plugins/training_type/training_type_plugin.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update pytorch_lightning/plugins/training_type/training_type_plugin.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update pytorch_lightning/plugins/training_type/training_type_plugin.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update pytorch_lightning/plugins/training_type/training_type_plugin.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update pytorch_lightning/plugins/training_type/training_type_plugin.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update pytorch_lightning/plugins/training_type/training_type_plugin.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update pytorch_lightning/plugins/training_type/training_type_plugin.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-01-30 14:55:28 -05:00
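The base layer introduced here, reduced to its skeleton (abridged and illustrative; only the abstract surface is hinted at, not the real class contents):

    from abc import ABC, abstractmethod

    import torch


    class TrainingTypePluginSketch(ABC):
        """Illustrative skeleton of the new plugin base layer."""

        @property
        @abstractmethod
        def root_device(self) -> torch.device:
            ...

        @abstractmethod
        def model_to_device(self) -> None:
            ...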
chaton 3da28fd634
[feat] 1/2 Add trainer.predict (#5579)
* start adding predict

* add predict

* resolve test

* add predict

* remove limit_predict

* update

* add test for predict

* typo

* update on comments

* remove predict_step

* update ddp_sharded

* check ddp_sharded

* resolve on comments

* resolve isort

* update dp

* add test dp 1 gpu

* made default forward

* resolve path

* resolve bug

* update on comments

* resolve doc

* resolve bug

* update

* resolve bug

* update on comments

* resolve pep8

* update test doc

* update on comments

* solve special tests

* resolve bug

* resolve flake8

* Update pytorch_lightning/callbacks/progress.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Update pytorch_lightning/trainer/trainer.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* add predict to LightningModule

* missing predict

* typo

* rename is_prediction to _predicting

* add

* update

* update

* update doc

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2021-01-27 11:38:14 -05:00
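Usage of the new entry point, as a runnable minimal sketch (the tiny model, the defensive batch unpacking, and the loader are placeholders, not part of the PR):

    import torch
    from torch.utils.data import DataLoader, TensorDataset

    from pytorch_lightning import LightningModule, Trainer


    class TinyModel(LightningModule):
        def __init__(self):
            super().__init__()
            self.layer = torch.nn.Linear(4, 1)

        def forward(self, batch):
            # predict feeds the raw batch; unpack if it arrives as a list
            x = batch[0] if isinstance(batch, (list, tuple)) else batch
            return self.layer(x)


    model = TinyModel()
    loader = DataLoader(TensorDataset(torch.randn(8, 4)), batch_size=4)
    # no optimization, no logging loop: just collected forward outputs
    predictions = Trainer().predict(model, dataloaders=loader)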
Jirka Borovec 7e2e874d95
Refactor: legacy accelerators and plugins (#5645)
* tests: legacy

* legacy: accel

* legacy: plug

* fix imports

* mypy

* flake8
2021-01-26 20:04:36 -05:00