ananthsub
fa41c588f4
Remove ProfilerConnector class ( #7654 )
...
* Remove ProfilerConnector class
* Update trainer.py
* Update CHANGELOG.md
* Update trainer.py
* Update trainer.py
* tests
2021-05-24 08:58:15 -07:00
shuyingsunshine21
299f2c481b
FSDP with full state dict ( #7487 )
...
* Fix some test errors
* checkpoint consolidation
* Update ddp_spawn.py
* Update test_metric_result_integration.py
* Update test_results.py
* Update utils.py
* Update utils.py
* Update test_all_gather_grad.py
* Update test_all_gather_grad.py
* Update test_results.py
* Revert "Update test_results.py"
This reverts commit 9d4a2b891d.
* Revert "Merge pull request #1 from shuyingsunshine21/shuyingsunshine21-checkpoint_consolidate"
This reverts commit c5053da789, reversing changes made to 0d23d75bc9.
* Revert "Update test_all_gather_grad.py"
This reverts commit 0d23d75bc9.
* Revert "Update utils.py"
This reverts commit 70fe5da9c6.
* Revert "Update utils.py"
This reverts commit a9aae99f6e.
* Revert "Update test_results.py"
This reverts commit ea74906878.
* Revert "Update test_metric_result_integration.py"
This reverts commit bf70e431b3.
* Revert "Update ddp_spawn.py"
This reverts commit f17210183b.
* Revert "checkpoint consolidation"
This reverts commit 536c1323b0.
* Revert "Revert "checkpoint consolidation""
This reverts commit 3a9fde915a.
* Revert "Revert "Revert "checkpoint consolidation"""
This reverts commit 7a369f47e1.
* Revert "Revert "Update ddp_spawn.py""
This reverts commit 8222dc98ea.
* Revert "Revert "Update test_metric_result_integration.py""
This reverts commit 6c095b2370.
* Revert "Revert "Update test_results.py""
This reverts commit 250d0aaaa2.
* Revert "Revert "Update utils.py""
This reverts commit 8651d54d79.
* Revert "Revert "Update test_all_gather_grad.py""
This reverts commit dcdcd29731.
* modify distributed environment to make test pass
* fix version for ddp plugin test
* fix
* fix
* changelog
* Update CHANGELOG.md
* fsdp with full state dict
* fix missing import
* modify unittest
* fix
* fix
* fix typo
* modify test and add changelog
* fix
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* limit max_epoch to 1 for testing
* test
* fix
* update
* testing remove special for multi gpu
* assert gpu
* add assertion for gpu
* fix
* Re-enable special test, use ModelCheckpoint
* Fix paths
* Fix path passing
* test
* test
* fix test
* fix
* pre-commit format
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: SeanNaren <sean@grid.ai>
2021-05-24 08:11:45 +01:00
Carlos Mocholí
3d4dd28bec
Replace `CallbackHookNameValidator` with `FxValidator` [3/n] ( #7627 )
...
* Refactor FxValidator
* Fix tests
* Fix tests
* Class attribute
* Fix tests
* Better error message
* Fix tests
* Update pytorch_lightning/trainer/connectors/logger_connector/fx_validator.py
2021-05-21 11:54:16 +01:00
Carlos Mocholí
901b2bac98
Unify `current_fx_name` and `current_hook_fx_name` [2/n] ( #7594 )
...
* Minor logger connector cleanup [1/n]
* Missing line
* Address comments
* Rely on validator
* Unify `current_fx_name` and `current_hook_fx_name`
* Fix test
2021-05-19 20:31:06 +00:00
ananthsub
b4e28e7169
[feat] Add stronger validation for checkpoint_callback argument ( #7539 )
...
* [feat] Add stronger validation for checkpoint_callback configuration
* chlog
* Update callback_connector.py
* Update test_model_checkpoint.py
* Update pytorch_lightning/trainer/connectors/callback_connector.py
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* Update pytorch_lightning/trainer/connectors/callback_connector.py
* Update tests/checkpointing/test_model_checkpoint.py
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* Update CHANGELOG.md
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-05-19 19:38:08 +00:00
Carlos Mocholí
76ff600898
Minor logger connector cleanup [1/n] ( #7590 )
...
* Minor logger connector cleanup [1/n]
* Missing line
* Address comments
* Rely on validator
2021-05-19 19:25:32 +00:00
Nic Eggert
f4f51e0dcf
Add kubeflow cluster environment ( #7300 )
...
* Add kubeflow cluster environment
* Add KubeflowEnvironment to docs
* Add KubeflowEnvironment to the changelog
* break up a long line
* Add method to detect kubeflow environment
* Select Kubeflow environment when available
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Run pre-commit
* task_idx == 0
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-05-17 09:05:24 +01:00
Adrian Wälchli
6e6e29af49
remove trainer hidden state | sanity refactor [2 / n] ( #7507 )
2021-05-17 08:57:15 +01:00
Alan Du
6ac16ff348
Fix DistribType for `ddp_cpu` (spawn) ( #7492 )
2021-05-14 20:53:26 +01:00
Rohit Gupta
7ca41734da
Add `dataloader_idx` to batch transfer hooks ( #6241 )
...
* replace with kwargs
* chlog
* fix
* add test
* fix
* device
* deepspeed
* pep
* optional
* docs
* bc
* comments
* pep
* mypy
* pep
* Apply suggestions from code review
* kwargs
* docs
* .
* .
* 1.3 -> 1.4
* kwargs -> step_kwargs
2021-05-13 23:03:55 +05:30
Adrian Wälchli
dd1a17b071
Refactor result handling in training loop ( #7506 )
...
* refactor results
* rename dic -> dict
* simplify
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* changelog
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* fix None check
* chlog wording
* move process_closure_result to the end
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-05-13 09:30:34 +01:00
shuyingsunshine21
8538c1f61e
Accelerator model state dict ( #7474 )
...
* Fix some test errors
* checkpoint consolidation
* Update ddp_spawn.py
* Update test_metric_result_integration.py
* Update test_results.py
* Update utils.py
* Update utils.py
* Update test_all_gather_grad.py
* Update test_all_gather_grad.py
* Update test_results.py
* Revert "Update test_results.py"
This reverts commit 9d4a2b891d.
* Revert "Merge pull request #1 from shuyingsunshine21/shuyingsunshine21-checkpoint_consolidate"
This reverts commit c5053da789, reversing changes made to 0d23d75bc9.
* Revert "Update test_all_gather_grad.py"
This reverts commit 0d23d75bc9.
* Revert "Update utils.py"
This reverts commit 70fe5da9c6.
* Revert "Update utils.py"
This reverts commit a9aae99f6e.
* Revert "Update test_results.py"
This reverts commit ea74906878.
* Revert "Update test_metric_result_integration.py"
This reverts commit bf70e431b3.
* Revert "Update ddp_spawn.py"
This reverts commit f17210183b.
* Revert "checkpoint consolidation"
This reverts commit 536c1323b0.
* Revert "Revert "checkpoint consolidation""
This reverts commit 3a9fde915a.
* Revert "Revert "Revert "checkpoint consolidation"""
This reverts commit 7a369f47e1.
* Revert "Revert "Update ddp_spawn.py""
This reverts commit 8222dc98ea.
* Revert "Revert "Update test_metric_result_integration.py""
This reverts commit 6c095b2370.
* Revert "Revert "Update test_results.py""
This reverts commit 250d0aaaa2.
* Revert "Revert "Update utils.py""
This reverts commit 8651d54d79.
* Revert "Revert "Update test_all_gather_grad.py""
This reverts commit dcdcd29731.
* modify distributed environment to make test pass
* modify model state dict to training type plugin
* remove changes
* add changelog
* fixing isort for pre-commit failure
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Address code review
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: SeanNaren <sean@grid.ai>
2021-05-11 16:39:04 +01:00
Adrian Wälchli
ad9118f04a
remove trainer hidden state | sanity refactor [1 / n] ( #7437 )
...
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-05-11 11:09:08 +02:00
shuyingsunshine21
987530cd38
Set `num_nodes` and `sync_batchnorm` From Trainer for Manually Passed Training Type Plugin ( #7026 )
...
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-05-08 11:25:51 +00:00
Ethan Harris
45143fd825
Improve val step logging ( #7351 )
...
* Fix val step logging
* Add a type
* Fix
* Update CHANGELOG.md
2021-05-07 22:58:03 +00:00
ananthsub
98670c83a9
Deprecate `truncated_bptt_steps` flag on Trainer in favor of same setting on the LightningModule ( #7323 )
...
* deprecate-tbptt-trainer
* Update CHANGELOG.md
* Update lightning.py
* test
* Update lightning.py
* Update training_loop.py
* Update training_loop.py
* Update lightning.py
* Update training_loop.py
* Update training_loop.py
* update docs
* Update accelerator.py
* Update accelerator.py
* more docs
* tweaks
* chlog
* comments
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-05-05 11:21:00 +01:00
Carlos Mocholí
374ff750f5
Pass `current_epoch`/`global_step` as monitor candidates [1/2] ( #7344 )
...
* Pass `current_epoch`/`global_step` as monitor candidates
* Formatting
* Fix deprecated test
* Update CHANGELOG
2021-05-04 16:05:40 -04:00
Carlos Mocholí
8c0ea92af2
`TrainerState` refactor [5/5] ( #7173 )
...
* `TrainerState` refactor
* flake8
* Update finished check
* Test cleanup
* Fix tests
* Fixes
* Reorder
* flake8
* Update CHANGELOG
* Better docs
* Better docs
* Remove default
* Update tests
* Bad merge
2021-05-04 12:50:56 +02:00
Hemil Desai
82c19e1444
Update LR schedulers only when their corresponding Optimizer is being… ( #4868 )
...
* Update LR schedulers only when their corresponding Optimizer is being used.
In the case when optimizer frequencies are specified,
the LR scheduler corresponding to a particular optimizer is updated
only when that optimizer is being used in the training loop or epoch.
* pep8speak fixes
* Fix failing tests
* Add docs
* PR Feedback
* Apply suggestions from code review
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
* formatting fix
* PR Feedback - part 2
* More PR feedback
* Apply suggestions from code review
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
* Add typing imports
* Stronger tests and fixes related to that
* Add more tests plus PR feedback
* Make optimizer_freq_cumsum a cached property
@cached_property is only available from Python 3.8 onward, so the caching had to be done manually.
* Fix tests
* Apply suggestions from code review
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* Avoid mutable defaults
* Parametrize lr scheduling tests
* PR feedback
* Apply suggestions from code review
* spell
* Apply suggestions from code review
* flake8
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2021-05-04 09:37:40 +00:00
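Note on #4868: the rule described in that commit (with optimizer frequencies set, a scheduler is stepped only while its own optimizer is the active one) can be sketched in plain Python. This is an illustrative sketch, not the code merged in the PR: `OptimizerSchedule`, `active_optimizer` and `schedulers_to_step` are hypothetical names, and the mapping from batch index to active optimizer (cycling through the cumulative frequencies) is an assumption about how frequencies are interpreted. The cumulative sum is cached by hand, in the spirit of the "cached property" bullet above.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import List, Optional


class OptimizerSchedule:
    """Hypothetical helper; not part of PyTorch Lightning's API."""

    def __init__(self, frequencies: List[int]) -> None:
        self.frequencies = frequencies
        self._freq_cumsum: Optional[List[int]] = None

    @property
    def freq_cumsum(self) -> List[int]:
        # Manual caching: compute the cumulative sum once and reuse it,
        # which avoids relying on functools.cached_property (Python 3.8+).
        if self._freq_cumsum is None:
            self._freq_cumsum = list(accumulate(self.frequencies))
        return self._freq_cumsum

    def active_optimizer(self, batch_idx: int) -> int:
        # Position inside one full cycle over all optimizers.
        place = batch_idx % self.freq_cumsum[-1]
        # First optimizer whose cumulative frequency exceeds that position.
        return bisect_right(self.freq_cumsum, place)


def schedulers_to_step(
    schedule: OptimizerSchedule, scheduler_opt_idx: List[int], batch_idx: int
) -> List[int]:
    """Return indices of schedulers whose optimizer is active on this batch.

    scheduler_opt_idx[i] is the optimizer index that scheduler i belongs to.
    """
    active = schedule.active_optimizer(batch_idx)
    return [i for i, opt_idx in enumerate(scheduler_opt_idx) if opt_idx == active]


schedule = OptimizerSchedule([2, 1])  # optimizer 0 for 2 batches, then optimizer 1 for 1
for batch_idx in range(6):
    print(batch_idx, schedule.active_optimizer(batch_idx),
          schedulers_to_step(schedule, [0, 1], batch_idx))
```

With frequencies `[2, 1]`, batches 0 and 1 use optimizer 0 and batch 2 uses optimizer 1, after which the cycle repeats; only the scheduler attached to the active optimizer is selected on each batch.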
Kaushik B
6d7c6d6403
Update Accelerator Connector for Registry ( #7214 )
2021-05-03 21:03:21 +00:00
Carlos Mocholí
5af086ab9f
Attach data refactor and tuner bugs [4/n] ( #7258 )
...
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-04-30 13:54:58 +00:00
Adrian Wälchli
b9b3fa371f
fix case where an IterableDataset doesn't produce a batch for an epoch ( #7294 )
...
* wip
* fix
* add test
* refactor + test
* rm
* formatting
* update changelog
* doc
* docstring
* remove unused import
* Update CHANGELOG.md
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-04-30 12:45:55 +00:00
Carlos Mocholí
a5ac3f8a16
Code cleaning in preparation for #7258 [3/n] ( #7262 )
2021-04-29 14:40:51 +02:00
Carlos Mocholí
bdc4272e99
`_launch` refactor and types [1/n] ( #7232 )
2021-04-28 17:41:08 +02:00
Vaibhav Balloli
ccd87cadfc
Changes resume_from_checkpoint warning to error ( #7075 )
...
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-04-28 15:03:29 +02:00
thomas chaton
e147127c0e
[feat] Add better support for predict + ddp 2/3 ( #7215 )
...
* wip
* update
* update
* update
* update
* update
* typo
* update on comments
* update
* update
* update
* update
* update changelog
* update
* Fix merge
* Fix merge
* move code
* resolve test
* add extra test
* add an extra test
* update on comments
* add typing
* resolve flake8
* Refactor and Docs
* Fix tests
* Fix tests
* Fix tests
* Duplicate
* Fix tests
* resolve bug
* update
* update on comments
* update
* update changelog
* update
* update
* remove tpu
* resolve flake8
* update on comments
* update on comments
* update on comment
* resolve flake8
* add a cpu test for predict
* add None test
* update
* Update CHANGELOG.md
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* resolve tests
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-04-27 08:46:45 -04:00
ananthsub
68eac4d948
Enforce Lightning module as source of truth for automatic optimization ( #7130 )
...
* make lightning module source of truth for automatic optimization
* Update configuration_validator.py
* Update model_connector.py
* rm-references
* Update CHANGELOG.md
* Update CHANGELOG.md
Co-authored-by: jirka <jirka.borovec@seznam.cz>
2021-04-26 05:36:26 +00:00
Kaushik B
44d775fccf
Update Error message for ProfileConnector ( #7204 )
...
* Update Error message for ProfileConnector
* Update test
2021-04-25 11:37:21 -07:00
ananthsub
b3fe836656
Move metrics_to_scalars to a dedicated utilities file ( #7180 )
...
* rm-trainer-logging
* Update CHANGELOG.md
* Update metrics.py
* Update logging.py
* Update metrics.py
2021-04-24 10:25:33 +01:00
Jirka Borovec
aa7d3dc6cc
Fix `torchmetrics` compatibility ( #7131 )
...
* get_num_classes
* tmp
* fix one test
* fix deprecated tests
* fix deprecate
* pep8
* deprecate 0.3
* wip
* wip
* HaCK
* brnch
* brnch
* format
* Apply suggestions from code review
* prune
* rev
* mltilabel
* Apply suggestions from code review
* master
* rev
* .
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
2021-04-22 20:45:46 +00:00
Carlos Mocholí
33066f8fd9
Add `on_predict_{batch,epoch}_{start,end}` and `Callback.on_predict_{start,end}` ( #7141 )
...
* Update hooks typing and predict hooks
* Update CHANGELOG
* Progress
* Progress
* Add back `on_predict_{start,end}`
* Typing and fix
* Update tests/trainer/logging_/test_logger_connector.py
* Update tests/callbacks/test_lambda_function.py
2021-04-22 10:05:28 -04:00
thomas chaton
99b9dfa883
[bugfix] Remove warning for distributed values ( #7132 )
...
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: jirka <jirka.borovec@seznam.cz>
2021-04-22 02:14:46 +02:00
Akihiro Nitta
0302b8be32
Disable `lr_scheduler.step()` in manual optimization ( #6825 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
2021-04-20 13:00:45 +02:00
Nicki Skafte
fbee5a86e7
Correctly reset metric objects in self.log ( #7055 )
...
* reset
* fix tests
* fix tests
* Apply suggestions from code review
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
* move logic
* chglog
* pep8
* Add test
* Improve test
Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
2021-04-19 14:48:48 +01:00
Carlos Mocholí
898ec8a94a
Create pytorch_lightning/utilities/types.py ( #7048 )
2021-04-19 14:43:16 +02:00
Kaushik B
832a03af7c
Add Training Type Plugins Registry ( #6982 )
...
Co-authored-by: Sean Naren <sean@grid.ai>
Co-authored-by: thomas chaton <thomas@grid.ai>
2021-04-16 18:01:56 +05:30
Adrian Wälchli
67d21609c9
Add Trainer max_time argument + Callback ( #6823 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
2021-04-16 13:38:57 +02:00
Ethan Harris
f645df5e9a
Add typings for evaluation_loop.py and remove some dead code ( #7015 )
2021-04-15 07:36:04 +00:00
CeShine Lee
24d0295ff1
Fix the `gradient_clip_algorithm` has no effect issue. ( #6928 )
2021-04-14 14:17:06 +05:30
Adrian Wälchli
33cc9fe138
Clean up environment access in plugins ( #6941 )
...
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-04-13 20:07:40 +02:00
Ethan Harris
b9bc77293b
Fix inconsistent outputs in `on_*_end` and `*_end` ( #6969 )
...
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-04-13 15:16:21 +01:00
ananthsub
e891ceb836
Remove evaluation loop legacy dict returns for `*_epoch_end` hooks ( #6973 )
...
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-04-13 12:37:54 +01:00
ananthsub
968ac091c0
Remove hardcoding of rank_zero_only.rank in accelerator connector ( #6878 )
2021-04-08 12:56:59 +05:30
Kaushik B
a17c027ea1
Update sync_dist warning for multiple processes ( #6790 )
2021-04-06 16:57:43 +02:00
Anthony Kim
7f6154fcad
Add `Trainer(gradient_clip_algorithm='value'|'norm')` ( #6123 )
...
* add changelog
* add clip by value
* fix bug in training tricks.rst
* fix bug in trainer.rst
* Update trainer.rst
* Update trainer.rst
* Update CHANGELOG.md
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Update pytorch_lightning/plugins/precision/deepspeed_precision.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Update pytorch_lightning/utilities/enums.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* yapf formatting
* update training tricks
* update based on comment
* update based on comment
* Update pytorch_lightning/trainer/trainer.py
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
* update based on comment
* pep8
* mypy
* mypy
* Update docs/source/advanced/training_tricks.rst
Co-authored-by: thomas chaton <thomas@grid.ai>
* Update sharded_native_amp.py
* Update test_sharded_parity.py
* update test codes
* Update test_tpu.py
* Update pytorch_lightning/trainer/connectors/training_trick_connector.py
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* Update test_trainer.py
* Update enums.py
* Update enums.py
* add super-class initialization to precision plugins.
* add clip_grad horovod cpu test
* add clip_grad horovod cpu test
* use subprocess check_call
* change order of horovod tests
* set max_epochs 2 in horovod test
* remove clip_grad_val test from horovod-cpu
* remove "type: ignore"
* divide clip grad val test in horovod
* update based on comments
* add super-class initialization to precision plugins.
* bugfix
* bugfix
* revert some changes
* revert some changes
* Update tests/models/test_horovod.py
* merge master
* Delete signature test
No point in testing a signature
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
2021-04-06 08:27:37 -05:00
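Note on #6123: the two modes named in the commit title, `'value'` and `'norm'`, differ in what they bound. The sketch below illustrates that difference on plain Python lists; it is not the implementation added by the PR, and `clip_by_value` / `clip_by_norm` are hypothetical helper names. It assumes the usual definitions: value clipping clamps each gradient entry independently, while norm clipping rescales the whole gradient so its L2 norm does not exceed a threshold.

```python
import math
from typing import List


def clip_by_value(grads: List[float], clip_val: float) -> List[float]:
    # Clamp every entry into [-clip_val, clip_val]; the direction may change.
    return [max(-clip_val, min(clip_val, g)) for g in grads]


def clip_by_norm(grads: List[float], max_norm: float) -> List[float]:
    # Rescale all entries by one factor so the L2 norm is at most max_norm;
    # the direction of the gradient is preserved.
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm <= max_norm:
        return list(grads)
    scale = max_norm / total_norm
    return [g * scale for g in grads]


grads = [3.0, -4.0, 0.5]
print(clip_by_value(grads, 1.0))
print(clip_by_norm(grads, 1.0))
```

Value clipping treats large entries individually, whereas norm clipping shrinks all entries together, so the choice between the two affects whether the relative sizes of gradient components are kept.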
Kaushik B
cf8e828559
[Fix] TPU Training Type Plugin ( #6816 )
2021-04-06 15:02:44 +05:30
Carlos Mocholí
0dd2deebea
Remove legacy support for the magic `log`/`progress_bar` keys in dict returns ( #6734 )
2021-03-31 00:28:04 +02:00
thomas chaton
1302766f83
DeepSpeed ZeRO Update ( #6546 )
...
* Add context to call hook to handle all modules defined within the hook
* Expose some additional parameters
* Added docs, exposed parameters
* Make sure we only configure if necessary
* Setup activation checkpointing regardless, saves the user having to do it manually
* Add some tests that fail currently
* update
* update
* update
* add tests
* change docstring
* resolve accumulate_grad_batches
* resolve flake8
* Update DeepSpeed to use latest version, add some comments
* add metrics
* update
* Small formatting fixes, clean up some code
* Few cleanups
* No need for default state
* Fix tests, add some boilerplate that should move eventually
* Add hook removal
* Add a context manager to handle hook
* Small naming cleanup
* wip
* move save_checkpoint responsibility to accelerator
* resolve flake8
* add BC
* Change recommended scale to 16
* resolve flake8
* update test
* update install
* update
* update test
* update
* update
* update test
* resolve flake8
* update
* update
* update on comments
* Push
* pull
* Update pytorch_lightning/plugins/training_type/deepspeed.py
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* Update pytorch_lightning/plugins/training_type/deepspeed.py
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* update
* Apply suggestions from code review
* Swap to using world size defined by plugin
* update
* update todo
* Remove deepspeed from extra, keep it in the base cuda docker install
* Push
* pull
* update
* update
* update
* update
* Minor changes
* duplicate
* format
* format2
Co-authored-by: SeanNaren <sean@grid.ai>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
2021-03-30 13:39:02 -04:00
Carlos Mocholí
90444706b2
Remove logger_connector legacy code ( #6733 )
2021-03-30 12:33:33 +02:00
Kaushik B
f79a13e495
[Model Parallel] Add configure sharded model hook ( #6679 )
...
* Add base hook for model parallel
* fix callback signature
* Simplify hook
* Add hook logic
* add tests
* add property setter
* add logic for being called once
* Update changelog
* Fix
* fix return type
* fix lambda callback test
* Fix tests
* Apply code suggestions
* add logic for setup_optimizers_predispatch
* add common dummy model
* Swap call order
* Remove test that isn't needed anymore
* Update tests
* Add a bit more doc
* Few code review fixes
* Update pytorch_lightning/accelerators/accelerator.py
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* Change hook name
* Fix test
* Test setup hook, refactor names
* Swap call order of callbacks and model initialization
* Change name of context manager
Co-authored-by: SeanNaren <sean@grid.ai>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-03-29 14:50:51 -06:00