Commit Graph

263 Commits

Author SHA1 Message Date
ananthsub fa41c588f4
Remove ProfilerConnector class (#7654)
* Remove ProfilerConnector class

* Update trainer.py

* Update CHANGELOG.md

* Update trainer.py

* Update trainer.py

* tests
2021-05-24 08:58:15 -07:00
shuyingsunshine21 299f2c481b
FSDP with full state dict (#7487)
* Fix some test errors

* checkpoint consolidation

* Update ddp_spawn.py

* Update test_metric_result_integration.py

* Update test_results.py

* Update utils.py

* Update utils.py

* Update test_all_gather_grad.py

* Update test_all_gather_grad.py

* Update test_results.py

* Revert "Update test_results.py"

This reverts commit 9d4a2b891d.

* Revert "Merge pull request #1 from shuyingsunshine21/shuyingsunshine21-checkpoint_consolidate"

This reverts commit c5053da789, reversing
changes made to 0d23d75bc9.

* Revert "Update test_all_gather_grad.py"

This reverts commit 0d23d75bc9.

* Revert "Update utils.py"

This reverts commit 70fe5da9c6.

* Revert "Update utils.py"

This reverts commit a9aae99f6e.

* Revert "Update test_results.py"

This reverts commit ea74906878.

* Revert "Update test_metric_result_integration.py"

This reverts commit bf70e431b3.

* Revert "Update ddp_spawn.py"

This reverts commit f17210183b.

* Revert "checkpoint consolidation"

This reverts commit 536c1323b0.

* Revert "Revert "checkpoint consolidation""

This reverts commit 3a9fde915a.

* Revert "Revert "Revert "checkpoint consolidation"""

This reverts commit 7a369f47e1.

* Revert "Revert "Update ddp_spawn.py""

This reverts commit 8222dc98ea.

* Revert "Revert "Update test_metric_result_integration.py""

This reverts commit 6c095b2370.

* Revert "Revert "Update test_results.py""

This reverts commit 250d0aaaa2.

* Revert "Revert "Update utils.py""

This reverts commit 8651d54d79.

* Revert "Revert "Update test_all_gather_grad.py""

This reverts commit dcdcd29731.

* modify distributed environment to make test pass

* fix version for ddp plugin test

* fix

* fix

* changelog

* Update CHANGELOG.md

* fsdp with full state dict

* fix missing import

* modify unit test

* fix

* fix

* fix typo

* modify test and add changelog

* fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* limit max_epoch to 1 for testing

* test

* fix

* update

* testing remove special for multi gpu

* assert gpu

* add assertion for gpu

* fix

* Re-enable special test, use ModelCheckpoint

* Fix paths

* Fix path passing

* test

* test

* fix test

* fix

* pre-commit format

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: SeanNaren <sean@grid.ai>
2021-05-24 08:11:45 +01:00
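For context on "full state dict" here, a minimal sketch, assuming fairscale's `FullyShardedDataParallel` (the library backing Lightning's fully sharded plugin at this time) and an already-initialized process group: calling `state_dict()` on the wrapped module gathers the full, unsharded parameters so an ordinary checkpoint can be saved.

```python
# Sketch only: assumes torch.distributed is initialized (e.g. via a
# distributed launcher) and fairscale is installed; the path is illustrative.
import torch
from fairscale.nn import FullyShardedDataParallel as FSDP

model = FSDP(torch.nn.Linear(32, 2).cuda())
full_state = model.state_dict()  # gathers the full (unsharded) parameters
if torch.distributed.get_rank() == 0:
    torch.save(full_state, "full_checkpoint.pt")
```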
Carlos Mocholí 3d4dd28bec
Replace `CallbackHookNameValidator` with `FxValidator` [3/n] (#7627)
* Refactor FxValidator

* Fix tests

* Fix tests

* Class attribute

* Fix tests

* Better error message

* Fix tests

* Update pytorch_lightning/trainer/connectors/logger_connector/fx_validator.py
2021-05-21 11:54:16 +01:00
Carlos Mocholí 901b2bac98
Unify `current_fx_name` and `current_hook_fx_name` [2/n] (#7594)
* Minor logger connector cleanup [1/n]

* Missing line

* Address comments

* Rely on validator

* Unify `current_fx_name` and `current_hook_fx_name`

* Fix test
2021-05-19 20:31:06 +00:00
ananthsub b4e28e7169
[feat] Add stronger validation for checkpoint_callback argument (#7539)
* [feat] Add stronger validation for checkpoint_callback configuration

* chlog

* Update callback_connector.py

* Update test_model_checkpoint.py

* Update pytorch_lightning/trainer/connectors/callback_connector.py

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Update pytorch_lightning/trainer/connectors/callback_connector.py

* Update tests/checkpointing/test_model_checkpoint.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Update CHANGELOG.md

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-05-19 19:38:08 +00:00
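The stronger validation targets contradictory configurations like the one sketched below; per the PR title this should now fail fast (presumably with Lightning's usual `MisconfigurationException`) rather than being silently ignored:

```python
from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import ModelCheckpoint

# checkpointing disabled, yet a ModelCheckpoint callback is passed:
# a contradictory setup the new validation is meant to reject
trainer = Trainer(checkpoint_callback=False, callbacks=[ModelCheckpoint()])
```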
Carlos Mocholí 76ff600898
Minor logger connector cleanup [1/n] (#7590)
* Minor logger connector cleanup [1/n]

* Missing line

* Address comments

* Rely on validator
2021-05-19 19:25:32 +00:00
Nic Eggert f4f51e0dcf
Add kubeflow cluster environment (#7300)
* Add kubeflow cluster environment

* Add KubeflowEnvironment to docs

* Add KubeflowEnvironment to the changelog

* break up a long line

* Add method to detect kubeflow environment

* Select Kubeflow environment when available

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Run pre-commit

* task_idx == 0

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-05-17 09:05:24 +01:00
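A rough sketch (not the actual implementation) of what a cluster environment like the new `KubeflowEnvironment` does: derive the rendezvous settings from the environment variables a Kubeflow `PyTorchJob` sets on each replica. Method names below only loosely mirror Lightning's `ClusterEnvironment` interface.

```python
import os

class KubeflowLikeEnvironment:
    # illustrative only; not guaranteed to match the real class exactly
    def master_address(self) -> str:
        return os.environ["MASTER_ADDR"]

    def master_port(self) -> int:
        return int(os.environ["MASTER_PORT"])

    def world_size(self) -> int:
        return int(os.environ["WORLD_SIZE"])

    def global_rank(self) -> int:
        return int(os.environ["RANK"])
```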
Adrian Wälchli 6e6e29af49
remove trainer hidden state | sanity refactor [2 / n] (#7507) 2021-05-17 08:57:15 +01:00
Alan Du 6ac16ff348
Fix DistribType for `ddp_cpu` (spawn) (#7492) 2021-05-14 20:53:26 +01:00
Rohit Gupta 7ca41734da
Add `dataloader_idx` to batch transfer hooks (#6241)
* replace with kwargs

* chlog

* fix

* add test

* fix

* device

* deepspeed

* pep

* optional

* docs

* bc

* comments

* pep

* mypy

* pep

* Apply suggestions from code review

* kwargs

* docs

* .

* .

* 1.3 -> 1.4

* kwargs -> step_kwargs
2021-05-13 23:03:55 +05:30
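Per the PR title, each batch transfer hook now also receives the index of the dataloader the batch came from; a sketch of the updated signatures (argument order assumed):

```python
import pytorch_lightning as pl

class MultiLoaderModel(pl.LightningModule):
    def on_before_batch_transfer(self, batch, dataloader_idx):
        return batch  # e.g. per-dataloader CPU-side augmentation

    def transfer_batch_to_device(self, batch, device, dataloader_idx):
        return super().transfer_batch_to_device(batch, device, dataloader_idx)

    def on_after_batch_transfer(self, batch, dataloader_idx):
        return batch  # e.g. per-dataloader GPU-side normalization
```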
Adrian Wälchli dd1a17b071
Refactor result handling in training loop (#7506)
* refactor results

* rename dic -> dict

* simplify

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* changelog

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix None check

* chlog wording

* move process_closure_result to the end

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-05-13 09:30:34 +01:00
shuyingsunshine21 8538c1f61e
Accelerator model state dict (#7474)
* Fix some test errors

* checkpoint consolidation

* Update ddp_spawn.py

* Update test_metric_result_integration.py

* Update test_results.py

* Update utils.py

* Update utils.py

* Update test_all_gather_grad.py

* Update test_all_gather_grad.py

* Update test_results.py

* Revert "Update test_results.py"

This reverts commit 9d4a2b891d.

* Revert "Merge pull request #1 from shuyingsunshine21/shuyingsunshine21-checkpoint_consolidate"

This reverts commit c5053da789, reversing
changes made to 0d23d75bc9.

* Revert "Update test_all_gather_grad.py"

This reverts commit 0d23d75bc9.

* Revert "Update utils.py"

This reverts commit 70fe5da9c6.

* Revert "Update utils.py"

This reverts commit a9aae99f6e.

* Revert "Update test_results.py"

This reverts commit ea74906878.

* Revert "Update test_metric_result_integration.py"

This reverts commit bf70e431b3.

* Revert "Update ddp_spawn.py"

This reverts commit f17210183b.

* Revert "checkpoint consolidation"

This reverts commit 536c1323b0.

* Revert "Revert "checkpoint consolidation""

This reverts commit 3a9fde915a.

* Revert "Revert "Revert "checkpoint consolidation"""

This reverts commit 7a369f47e1.

* Revert "Revert "Update ddp_spawn.py""

This reverts commit 8222dc98ea.

* Revert "Revert "Update test_metric_result_integration.py""

This reverts commit 6c095b2370.

* Revert "Revert "Update test_results.py""

This reverts commit 250d0aaaa2.

* Revert "Revert "Update utils.py""

This reverts commit 8651d54d79.

* Revert "Revert "Update test_all_gather_grad.py""

This reverts commit dcdcd29731.

* modify distributed environment to make test pass

* move model state dict to training type plugin

* remove changes

* add changelog

* fixing isort for pre-commit failure

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Address code review

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: SeanNaren <sean@grid.ai>
2021-05-11 16:39:04 +01:00
Adrian Wälchli ad9118f04a
remove trainer hidden state | sanity refactor [1 / n] (#7437)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-05-11 11:09:08 +02:00
shuyingsunshine21 987530cd38
Set `num_nodes` and `sync_batchnorm` from Trainer for manually passed training type plugin (#7026)
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-05-08 11:25:51 +00:00
Ethan Harris 45143fd825
Improve val step logging (#7351)
* Fix val step logging

* Add a type

* Fix

* Update CHANGELOG.md
2021-05-07 22:58:03 +00:00
ananthsub 98670c83a9
Deprecate `truncated_bptt_steps` flag on Trainer in favor of same setting on the LightningModule (#7323)
* deprecate-tbptt-trainer

* Update CHANGELOG.md

* Update lightning.py

* test

* Update lightning.py

* Update training_loop.py

* Update training_loop.py

* Update lightning.py

* Update training_loop.py

* Update training_loop.py

* update docs

* Update accelerator.py

* Update accelerator.py

* more docs

* tweaks

* chlog

* comments

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-05-05 11:21:00 +01:00
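A sketch of the replacement API implied by the deprecation: the setting moves from the `Trainer` flag onto the `LightningModule` itself.

```python
import pytorch_lightning as pl

class SequenceModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        # replaces Trainer(truncated_bptt_steps=2); each batch is split
        # into chunks of 2 time steps for truncated backpropagation
        self.truncated_bptt_steps = 2
```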
Carlos Mocholí 374ff750f5
Pass `current_epoch`/`global_step` as monitor candidates [1/2] (#7344)
* Pass `current_epoch`/`global_step` as monitor candidates

* Formatting

* Fix deprecated test

* Update CHANGELOG
2021-05-04 16:05:40 -04:00
Carlos Mocholí 8c0ea92af2
`TrainerState` refactor [5/5] (#7173)
* `TrainerState` refactor

* flake8

* Update finished check

* Test cleanup

* Fix tests

* Fixes

* Reorder

* flake8

* Update CHANGELOG

* Better docs

* Better docs

* Remove default

* Update tests

* Bad merge
2021-05-04 12:50:56 +02:00
Hemil Desai 82c19e1444
Update LR schedulers only when their corresponding Optimizer is being used (#4868)
* Update LR schedulers only when their corresponding Optimizer is being used.

In the case when optimizer frequencies are specified,
the LR scheduler corresponding to a particular optimizer is updated
only when that optimizer is being used in the training loop or epoch.

* pep8speaks fixes

* Fix failing tests

* Add docs

* PR Feedback

* Apply suggestions from code review

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* formatting fix

* PR Feedback - part 2

* More PR feedback

* Apply suggestions from code review

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* Add typing imports

* Stronger tests and fixes related to that

* Add more tests plus PR feedback

* Make optimizer_freq_cumsum a cached property

@cached_property is only available from Python 3.8 onward, so the caching had to be implemented manually.

* Fix tests

* Apply suggestions from code review

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Avoid mutable defaults

* Parametrize lr scheduling tests

* PR feedback

* Apply suggestions from code review

* spell

* Apply suggestions from code review

* flake8

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2021-05-04 09:37:40 +00:00
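A sketch of the configuration this change affects, assuming a GAN-style module with placeholder `generator`/`discriminator` sub-modules: when `"frequency"` entries are given, each scheduler should now step only on the batches where its own optimizer is active.

```python
import torch
import pytorch_lightning as pl

class GAN(pl.LightningModule):
    def configure_optimizers(self):
        gen_opt = torch.optim.Adam(self.generator.parameters(), lr=1e-3)
        dis_opt = torch.optim.Adam(self.discriminator.parameters(), lr=1e-3)
        gen_sched = torch.optim.lr_scheduler.StepLR(gen_opt, step_size=10)
        dis_sched = torch.optim.lr_scheduler.StepLR(dis_opt, step_size=10)
        # gen_opt runs on 1 of every 6 batches, dis_opt on the other 5;
        # gen_sched now advances only when gen_opt actually stepped
        return (
            {"optimizer": gen_opt, "lr_scheduler": gen_sched, "frequency": 1},
            {"optimizer": dis_opt, "lr_scheduler": dis_sched, "frequency": 5},
        )
```

The `optimizer_freq_cumsum` bullet above refers to caching the cumulative sum by hand; a minimal pre-3.8-compatible sketch of that pattern (the surrounding class and attribute names are illustrative):

```python
import numpy as np

class LoopWithFrequencies:
    def __init__(self, optimizer_frequencies):
        self.optimizer_frequencies = optimizer_frequencies
        self._optimizer_freq_cumsum = None

    @property
    def optimizer_freq_cumsum(self):
        # manual stand-in for functools.cached_property (Python 3.8+ only)
        if self._optimizer_freq_cumsum is None:
            self._optimizer_freq_cumsum = np.cumsum(self.optimizer_frequencies)
        return self._optimizer_freq_cumsum
```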
Kaushik B 6d7c6d6403
Update Accelerator Connector for Registry (#7214) 2021-05-03 21:03:21 +00:00
Carlos Mocholí 5af086ab9f
Attach data refactor and tuner bugs [4/n] (#7258)
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-04-30 13:54:58 +00:00
Adrian Wälchli b9b3fa371f
fix case where an IterableDataset doesn't produce a batch for an epoch (#7294)
* wip

* fix

* add test

* refactor + test

* rm

* formatting

* update changelog

* doc

* docstring

* remove unused import

* Update CHANGELOG.md

Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>

Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-04-30 12:45:55 +00:00
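A minimal reproduction sketch of the fixed edge case: an `IterableDataset` whose iterator can be exhausted immediately, yielding zero batches for an epoch.

```python
from torch.utils.data import IterableDataset

class PossiblyEmpty(IterableDataset):
    def __init__(self, items):
        self.items = items  # may be empty on some epochs/ranks

    def __iter__(self):
        return iter(self.items)  # iter([]) produces no batches at all
```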
Carlos Mocholí a5ac3f8a16
Code cleaning in preparation for #7258 [3/n] (#7262) 2021-04-29 14:40:51 +02:00
Carlos Mocholí bdc4272e99
`_launch` refactor and types [1/n] (#7232) 2021-04-28 17:41:08 +02:00
Vaibhav Balloli ccd87cadfc
Changes resume_from_checkpoint warning to error (#7075)
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-04-28 15:03:29 +02:00
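Illustration of the behavior change: pointing `resume_from_checkpoint` at a missing file previously only warned and trained from scratch; after this PR it is expected to raise instead.

```python
from pytorch_lightning import Trainer

# a path that does not exist now triggers an error up front rather
# than a warning followed by silent training from scratch
trainer = Trainer(resume_from_checkpoint="lightning_logs/missing.ckpt")
```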
thomas chaton e147127c0e
[feat] Add better support for predict + ddp 2/3 (#7215)
* wip

* update

* update

* update

* update

* update

* typo

* update on comments

* update

* update

* update

* update

* update changelog

* update

* Fix merge

* Fix merge

* move code

* resolve test

* add extra test

* add an extra test

* update on comments

* add typing

* resolve flake8

* Refactor and Docs

* Fix tests

* Fix tests

* Fix tests

* Duplicate

* Fix tests

* resolve bug

* update

* update on comments

* update

* update changelog

* update

* update

* remove tpu

* resolve flake8

* update on comments

* update on comments

* update on comment

* resolve flake8

* add a cpu test for predict

* add None test

* update

* Update CHANGELOG.md

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* resolve tests

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-04-27 08:46:45 -04:00
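A usage sketch for the distributed predict support, where `model` and `predict_loader` are placeholders for a `LightningModule` and a `DataLoader`:

```python
import pytorch_lightning as pl

trainer = pl.Trainer(gpus=2, accelerator="ddp")
predictions = trainer.predict(model, dataloaders=predict_loader)
```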
ananthsub 68eac4d948
Enforce Lightning module as source of truth for automatic optimization (#7130)
* make lightning module source of truth for automatic optimization

* Update configuration_validator.py

* Update model_connector.py

* rm-references

* Update CHANGELOG.md

* Update CHANGELOG.md

Co-authored-by: jirka <jirka.borovec@seznam.cz>
2021-04-26 05:36:26 +00:00
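With the module as the source of truth, the toggle lives on the `LightningModule` rather than on the `Trainer`; a minimal sketch:

```python
import pytorch_lightning as pl

class ManualOptModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        # the module property, not a Trainer argument, now decides this
        self.automatic_optimization = False
```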
Kaushik B 44d775fccf
Update error message for ProfilerConnector (#7204)
* Update error message for ProfilerConnector

* Update test
2021-04-25 11:37:21 -07:00
ananthsub b3fe836656
Move metrics_to_scalars to a dedicated utilities file (#7180)
* rm-trainer-logging

* Update CHANGELOG.md

* Update metrics.py

* Update logging.py

* Update metrics.py
2021-04-24 10:25:33 +01:00
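A rough sketch of what the relocated `metrics_to_scalars` utility does; the real implementation may differ (e.g. recursing into nested dicts and validating tensor sizes):

```python
import torch

def metrics_to_scalars_sketch(metrics: dict) -> dict:
    # assumed behavior: unwrap 0-d tensors into plain Python numbers
    return {
        key: value.item() if isinstance(value, torch.Tensor) else value
        for key, value in metrics.items()
    }
```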
Jirka Borovec aa7d3dc6cc
Fix `torchmetrics` compatibility (#7131)
* get_num_classes

* tmp

* fix one test

* fix deprecated tests

* fix deprecate

* pep8

* deprecate 0.3

* wip

* wip

* hack

* branch

* branch

* format

* Apply suggestions from code review

* prune

* rev

* multilabel

* Apply suggestions from code review

* master

* rev

* .

Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
2021-04-22 20:45:46 +00:00
Carlos Mocholí 33066f8fd9
Add `on_predict_{batch,epoch}_{start,end}` and `Callback.on_predict_{start,end}` (#7141)
* Update hooks typing and predict hooks

* Update CHANGELOG

* Progress

* Progress

* Add back `on_predict_{start,end}`

* Typing and fix

* Update tests/trainer/logging_/test_logger_connector.py

* Update tests/callbacks/test_lambda_function.py
2021-04-22 10:05:28 -04:00
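A sketch of the new callback hooks named in the title; the extra arguments on the batch-level hook are assumed to mirror the existing train/validation variants:

```python
from pytorch_lightning.callbacks import Callback

class PredictMonitor(Callback):
    def on_predict_start(self, trainer, pl_module):
        print("prediction started")

    def on_predict_batch_end(self, trainer, pl_module, outputs, batch,
                             batch_idx, dataloader_idx):
        print(f"finished predict batch {batch_idx}")

    def on_predict_end(self, trainer, pl_module):
        print("prediction finished")
```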
thomas chaton 99b9dfa883
[bugfix] Remove warning for distributed values (#7132)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: jirka <jirka.borovec@seznam.cz>
2021-04-22 02:14:46 +02:00
Akihiro Nitta 0302b8be32
Disable `lr_scheduler.step()` in manual optimization (#6825)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
2021-04-20 13:00:45 +02:00
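In manual optimization, Lightning no longer steps schedulers for you; both the optimizer and the scheduler are stepped explicitly in `training_step`. A sketch, with the loss computation left as a placeholder helper:

```python
import pytorch_lightning as pl

class ManualModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.automatic_optimization = False

    def training_step(self, batch, batch_idx):
        opt = self.optimizers()
        loss = self.compute_loss(batch)  # placeholder helper
        opt.zero_grad()
        self.manual_backward(loss)
        opt.step()
        self.lr_schedulers().step()  # now entirely user-controlled
```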
Nicki Skafte fbee5a86e7
Correctly reset metric objects in self.log (#7055)
* reset

* fix tests

* fix tests

* Apply suggestions from code review

Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>

* move logic

* chglog

* pep8

* Add test

* Improve test

Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
2021-04-19 14:48:48 +01:00
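The pattern this fix concerns: logging a `torchmetrics` metric object directly through `self.log`, which Lightning must compute and reset at the epoch boundary (sketch written against the torchmetrics API of this era):

```python
import torchmetrics
import pytorch_lightning as pl

class LitClassifier(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.val_acc = torchmetrics.Accuracy()

    def validation_step(self, batch, batch_idx):
        x, y = batch
        self.val_acc(self(x), y)
        # the Metric object itself is logged; correct epoch-boundary
        # resets of this object are what the fix addresses
        self.log("val_acc", self.val_acc, on_epoch=True)
```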
Carlos Mocholí 898ec8a94a
Create pytorch_lightning/utilities/types.py (#7048) 2021-04-19 14:43:16 +02:00
Kaushik B 832a03af7c
Add Training Type Plugins Registry (#6982)
Co-authored-by: Sean Naren <sean@grid.ai>
Co-authored-by: thomas chaton <thomas@grid.ai>
2021-04-16 18:01:56 +05:30
Adrian Wälchli 67d21609c9
Add Trainer max_time argument + Callback (#6823)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
2021-04-16 13:38:57 +02:00
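Usage sketch for the new stopping condition:

```python
from datetime import timedelta
from pytorch_lightning import Trainer

trainer = Trainer(max_time="00:12:00:00")        # "DD:HH:MM:SS" string form
trainer = Trainer(max_time=timedelta(hours=12))  # equivalent timedelta form
```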
Ethan Harris f645df5e9a
Add typings for evaluation_loop.py and remove some dead code (#7015) 2021-04-15 07:36:04 +00:00
CeShine Lee 24d0295ff1
Fix the issue where `gradient_clip_algorithm` had no effect (#6928) 2021-04-14 14:17:06 +05:30
Adrian Wälchli 33cc9fe138
Clean up environment access in plugins (#6941)
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-04-13 20:07:40 +02:00
Ethan Harris b9bc77293b
Fix inconsistent outputs in `on_*_end` and `*_end` (#6969)
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-04-13 15:16:21 +01:00
ananthsub e891ceb836
Remove evaluation loop legacy dict returns for `*_epoch_end` hooks (#6973)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-04-13 12:37:54 +01:00
ananthsub 968ac091c0
Remove hardcoding of rank_zero_only.rank in accelerator connector (#6878) 2021-04-08 12:56:59 +05:30
Kaushik B a17c027ea1
Update sync_dist warning for multiple processes (#6790) 2021-04-06 16:57:43 +02:00
Anthony Kim 7f6154fcad
Add `Trainer(gradient_clip_algorithm='value'|'norm')` (#6123)
* add changelog

* add clip by value

* fix bug in training tricks.rst

* fix bug in trainer.rst

* Update trainer.rst

* Update trainer.rst

* Update CHANGELOG.md

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update pytorch_lightning/plugins/precision/deepspeed_precision.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update pytorch_lightning/utilities/enums.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* yapf formatting

* update training tricks

* update based on comment

* update based on comment

* Update pytorch_lightning/trainer/trainer.py

Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>

* update based on comment

* pep8

* mypy

* mypy

* Update docs/source/advanced/training_tricks.rst

Co-authored-by: thomas chaton <thomas@grid.ai>

* Update sharded_native_amp.py

* Update test_sharded_parity.py

* update test codes

* Update test_tpu.py

* Update pytorch_lightning/trainer/connectors/training_trick_connector.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Update test_trainer.py

* Update enums.py

* Update enums.py

* add super-class initialization to precision plugins.

* add clip_grad horovod cpu test

* add clip_grad horovod cpu test

* use subprocess check_call

* change order of horovod tests

* set max_epochs 2 in horovod test

* remove clip_grad_val test from horovod-cpu

* remove "type: ignore"

* divide clip grad val test in horovod

* update based on comments

* add super-class initialization to precision plugins.

* bugfix

* bugfix

* revert some changes

* revert some changes

* Update tests/models/test_horovod.py

* merge master

* Delete signature test

No point in testing a signature

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
2021-04-06 08:27:37 -05:00
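Usage sketch combining the new flag with the existing clip value:

```python
from pytorch_lightning import Trainer

# the default clips gradients by norm; the new flag selects clip-by-value
trainer = Trainer(gradient_clip_val=0.5, gradient_clip_algorithm="value")
```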
Kaushik B cf8e828559
[Fix] TPU Training Type Plugin (#6816) 2021-04-06 15:02:44 +05:30
Carlos Mocholí 0dd2deebea
Remove legacy support for the magic `log`/`progress_bar` keys in dict returns (#6734) 2021-03-31 00:28:04 +02:00
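The supported replacement for the removed dict keys is `self.log`; a sketch inside a `LightningModule`, with the loss computation as a placeholder:

```python
def training_step(self, batch, batch_idx):
    loss = self.compute_loss(batch)  # placeholder helper
    # replaces the removed `return {"loss": loss, "log": {...},
    # "progress_bar": {...}}` pattern
    self.log("train_loss", loss, prog_bar=True)
    return loss
```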
thomas chaton 1302766f83
DeepSpeed ZeRO Update (#6546)
* Add context to call hook to handle all modules defined within the hook

* Expose some additional parameters

* Added docs, exposed parameters

* Make sure we only configure if necessary

* Set up activation checkpointing regardless, saving the user from having to do it manually

* Add some tests that fail currently

* update

* update

* update

* add tests

* change docstring

* resolve accumulate_grad_batches

* resolve flake8

* Update DeepSpeed to use latest version, add some comments

* add metrics

* update

* Small formatting fixes, clean up some code

* Few cleanups

* No need for default state

* Fix tests, add some boilerplate that should move eventually

* Add hook removal

* Add a context manager to handle hook

* Small naming cleanup

* wip

* move save_checkpoint responsibility to accelerator

* resolve flake8

* add BC

* Change recommended scale to 16

* resolve flake8

* update test

* update install

* update

* update test

* update

* update

* update test

* resolve flake8

* update

* update

* update on comments

* Push

* pull

* Update pytorch_lightning/plugins/training_type/deepspeed.py

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Update pytorch_lightning/plugins/training_type/deepspeed.py

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* update

* Apply suggestions from code review

* Swap to using world size defined by plugin

* update

* update todo

* Remove deepspeed from extra, keep it in the base cuda docker install

* Push

* pull

* update

* update

* update

* update

* Minor changes

* duplicate

* format

* format2

Co-authored-by: SeanNaren <sean@grid.ai>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
2021-03-30 13:39:02 -04:00
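A hedged usage sketch for the updated DeepSpeed integration; the argument names (`stage`, `cpu_offload`) reflect the plugin of this era and may have changed since:

```python
from pytorch_lightning import Trainer
from pytorch_lightning.plugins import DeepSpeedPlugin

trainer = Trainer(
    gpus=4,
    precision=16,  # ZeRO offloading is typically paired with 16-bit precision
    plugins=DeepSpeedPlugin(stage=2, cpu_offload=True),
)
```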
Carlos Mocholí 90444706b2
Remove logger_connector legacy code (#6733) 2021-03-30 12:33:33 +02:00
Kaushik B f79a13e495
[Model Parallel] Add configure sharded model hook (#6679)
* Add base hook for model parallel

* fix callback signature

* Simplify hook

* Add hook logic

* add tests

* add property setter

* add logic for being called once

* Update changelog

* Fix

* fix return type

* fix lambda callback test

* Fix tests

* Apply code suggestions

* add logic for setup_optimizers_predispatch

* add common dummy model

* Swap call order

* Remove test that isn't needed anymore

* Update tests

* Add a bit more doc

* Few code review fixes

* Update pytorch_lightning/accelerators/accelerator.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Change hook name

* Fix test

* Test setup hook, refactor names

* Swap call order of callbacks and model initialization

* Change name of context manager

Co-authored-by: SeanNaren <sean@grid.ai>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-03-29 14:50:51 -06:00
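A sketch of the new hook: layer construction moves out of `__init__` into `configure_sharded_model`, so a model-parallel plugin can instantiate the layers directly under its sharding context.

```python
import torch
import pytorch_lightning as pl

class BigModel(pl.LightningModule):
    def configure_sharded_model(self):
        # created here (rather than in __init__) so the plugin can shard
        # the layers as they are instantiated
        self.block = torch.nn.Sequential(
            torch.nn.Linear(32, 32),
            torch.nn.ReLU(),
            torch.nn.Linear(32, 2),
        )
```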