Commit Graph

18 Commits

chaton f2f4a49271 [bug-fix] Call transfer_batch_to_device in DDPPlugin (#5195)
* hacking out

* update

* remove useless on_before_forward

* update

* remove overridden

* remove os

* use on_before_forward

* resolve flake8

* add test

* update

* add single_process_per_device

* resolve flake8

* update

* resolve

* update

* update

* update

* add comment

* resolve bug with sharded

* update

* remove property

* update

* resolve test

* resolve bug

* update on comments

* update doc

* Update pytorch_lightning/core/hooks.py

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* update on comments

* Update pytorch_lightning/plugins/ddp_plugin.py

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* Update pytorch_lightning/plugins/ddp_plugin.py

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* resolve pep8

* add device_ids to pipe

* update on comments

* update

* resolve

* update

* update

* update

Co-authored-by: Ubuntu <ubuntu@ip-172-31-62-109.ec2.internal>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>

(cherry picked from commit d510707bc9)
2021-01-26 14:28:45 +01:00
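
A minimal sketch of the fix this commit describes, assuming the 1.1-era plugin API: the DDP plugin's on_before_forward (the hook added in #4686, further down this log) routes the step input through the LightningModule's transfer_batch_to_device rather than leaving it on the wrong device. The body is illustrative, not the actual Lightning source.

```python
import torch


class DDPPlugin:
    def on_before_forward(self, model, *args):
        # `model` is the LightningModule; with one process per device, the
        # whole batch belongs on that module's device before DDP.forward runs.
        device = getattr(model, "device", torch.device("cpu"))
        return model.transfer_batch_to_device(args, device)
```
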
Jirka Borovec 2846322f60
fix docs render (#5610) 2021-01-25 20:21:00 -05:00
Adrian Wälchli e806bb77fa
Refactor LightningDistributedDataParallel (#5185)
* add wrapper

* add squeeze

* replace LightningDistributedDP

* update import

* module access

* inputs

* refactor warning

* update

* resolve flake8

* remove old class

* set find unused params to False

* update docstrings

* update docs

* update docs

* add changelog

* deprecation

* rename wrapper -> module

* rename pl_module

* add unit tests

* Revert "add changelog"

This reverts commit 02ec0a6864f4ba2ace3bb6fc6ebc364e1a80ffd7.

* Revert "set find unused params to False"

This reverts commit 8e451515e6ba3227d00f4a5cb63f332cfedb7b30.

Co-authored-by: Ubuntu <thomas@grid.ai>
2021-01-13 14:35:42 -05:00
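
The shape of the refactor, per the bullets above ("add wrapper", "replace LightningDistributedDP", "rename wrapper -> module"): instead of subclassing DistributedDataParallel, a small nn.Module wraps the LightningModule and dispatches to the right *_step, and stock DDP wraps that. A hedged sketch, with the dispatch condition simplified:

```python
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel


class LightningDistributedModule(nn.Module):
    def __init__(self, pl_module):
        super().__init__()
        self.module = pl_module

    def forward(self, *args, **kwargs):
        # The real wrapper switches on the trainer's running stage; training
        # mode stands in for that here.
        if self.module.training:
            return self.module.training_step(*args, **kwargs)
        return self.module.validation_step(*args, **kwargs)


# ddp_model = DistributedDataParallel(LightningDistributedModule(pl_module))
```
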
Adrian Wälchli 61308138c3
set find_unused_parameters=False in DDP as in PyTorch (#5435)
* set find unused params to False

* add changelog

* fix changelog

* fix test

* update docs

* update changelog

Co-authored-by: chaton <thomas@grid.ai>
2021-01-13 10:13:40 -05:00
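
In effect, Lightning stops forcing find_unused_parameters=True and inherits PyTorch's own default of False, skipping the extra graph traversal DDP otherwise performs on every backward pass. Restoring the old behavior should be a one-liner thanks to the kwargs pass-through from #4382 below; a sketch against the 1.1-era Trainer arguments (assumed, not copied from the docs):

```python
from pytorch_lightning import Trainer
from pytorch_lightning.plugins.ddp_plugin import DDPPlugin

# Opt back in explicitly if the model really does have unused parameters.
trainer = Trainer(
    gpus=2,
    accelerator="ddp",
    plugins=[DDPPlugin(find_unused_parameters=True)],
)
```
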
Jirka Borovec 54d20dc596
Refactor: clean trainer device & distrib getters (#5300)
* warnings

* .

* .

* flake8

* .

* .

* .

* use_tpu

* use_dp

* .

* use_ddp

* .

* use_horovod

* .

* .

* .
2021-01-12 05:22:37 -05:00
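
Judging by the bullets (warnings, use_tpu, use_dp, use_ddp, use_horovod), the cleanup turns the old boolean getters into thin deprecation shims. A sketch of the assumed pattern, with the internal field name invented purely for illustration:

```python
import warnings


class Trainer:
    _distrib_type = "ddp"  # hypothetical internal state, for illustration only

    @property
    def use_ddp(self) -> bool:
        warnings.warn(
            "`trainer.use_ddp` is deprecated; query the distributed backend "
            "setting directly instead.",
            DeprecationWarning,
        )
        return self._distrib_type == "ddp"
```
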
Jirka Borovec 0f36525e8f
fix/enable - check F401 (#5201)
* refactor - check F401

* missed

* fix
2020-12-21 10:15:04 +01:00
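
F401 is flake8's "imported but unused" check. Enabling it means dead imports get deleted, while deliberate re-exports are kept and annotated, for example:

```python
from os import path  # noqa: F401  -- re-exported on purpose, so F401 is suppressed
```
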
Jirka Borovec 2d54116baa
annotate unused vars (#5017)
* annotate all unused vars

* rank_zero_warn

* Apply suggestions from code review

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* f1 fixed

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2020-12-19 13:53:06 +01:00
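
The convention being applied, roughly: values that must be accepted but are never read get bound to `_`, so linters (and readers) know the discard is intentional. Illustrative only:

```python
def fetch() -> tuple:
    return "dropped", "used"


_, kept = fetch()   # unpack while explicitly discarding the first item

for _ in range(3):  # loop index intentionally unused
    pass
```
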
Sean Naren ee9b3fe574
[feat] pp 1/n (#5016)
* Added changes for RPC plugin

* Add missing kwargs

* Fix code format

* Refactor loading by introducing an is_distributed var, fix optimizer step flow

* Add rpc guard

* Added docstrings and typing

* resolve comments

* Add additional rpc hook, refactor name of exit process hook for clarity

* remove annotation

* Modify behaviour to allow optional return, add test for rpc plugin

* resolve tests

* rename is_ddp_based

* update

* update for windows

* update

* resolve test

* code smell

* Revert back to init_ddp_connection for backwards compat

* Swap to explicit name for property

* Add missing speed parity increase for CI variability, fix call counts for child process

Co-authored-by: tchaton <thomas@grid.ai>
2020-12-08 22:02:10 +00:00
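
A sketch of the "rpc guard" idea from the bullets: the RPC plugin only works where torch.distributed.rpc is importable (hence the Windows updates above), and it gains an explicit exit hook. Method names are assumed from the commit description, not copied from the source:

```python
try:
    from torch.distributed import rpc
    RPC_AVAILABLE = True
except ImportError:  # e.g. builds without RPC support
    RPC_AVAILABLE = False


class RPCPlugin:
    def init_rpc_connection(self, global_rank: int, world_size: int) -> None:
        if not RPC_AVAILABLE:
            raise RuntimeError("torch.distributed.rpc is not available")
        rpc.init_rpc(f"worker{global_rank}", rank=global_rank, world_size=world_size)

    def exit_rpc_process(self) -> None:
        # The "exit process hook" renamed for clarity in one of the bullets.
        if RPC_AVAILABLE:
            rpc.shutdown()
```
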
chaton 2393474350
[hotfix] ddp + manual_optimisation (#4976)
* Rely on ddp plugin for blocking sync behaviour, and skip if we're using manual optimization

* debug

* Revert "debug"

This reverts commit ccca6b6b

* Expose manual reduce for automatic optimization

* Add input arguments

* Enable parity test

* clean imports

* Expose hook after to ensure we reset

* Fix naming

* add

* fix test

* resolve on comments

* typo

* Update tests/trainer/optimization/test_manual_optimization.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update tests/trainer/optimization/test_manual_optimization.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* update on comments

* resolve comments

Co-authored-by: SeanNaren <sean@grid.ai>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-12-07 19:31:54 +00:00
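
The core of "rely on ddp plugin for blocking sync behaviour": during gradient accumulation the plugin suppresses DDP's all-reduce with the stock no_sync() context manager, and the trainer skips this path entirely under manual optimization, where the user owns the backward call. A hedged sketch:

```python
from contextlib import contextmanager


class DDPPlugin:
    @contextmanager
    def block_backward_sync(self, model):
        # `model` is the DistributedDataParallel wrapper; no_sync() is
        # PyTorch's own context manager that skips gradient all-reduce.
        with model.no_sync():
            yield
```

The trainer would enter this context only on accumulation steps with automatic optimization enabled; with manual optimization it stays out of the way.
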
chaton 02152c1729
Simplify optimization logic (#4984)
* Rely on ddp plugin for blocking sync behaviour, and skip if we're using manual optimization

* debug

* Revert "debug"

This reverts commit ccca6b6b

* Expose manual reduce for automatic optimization

* Add input arguments

* Enable parity test

* clean imports

* Expose hook after to ensure we reset

* Fix naming

* add

* fix test

* unify optimizer logic

* resolve test

* resolve flake8

* resolve amp bug

* update tests

* remove bug

* remove optimizer_step in accelerators

* typo

* update lightning optimizer

* set doesn't work with ddp_spawn

* resolve flake8

* update threshold

* ignore pyright

* correct codeFactor

* remove useless if

* remove zero_grad function

* simplify step

* remove typo

* resolve bug

* Apply suggestions from code review

* update on comments

* resolve bugs

* remove tests

* Update pytorch_lightning/trainer/configuration_validator.py

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* simplify testing

* add more tests

Co-authored-by: SeanNaren <sean@grid.ai>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2020-12-07 12:55:49 +00:00
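
The "update lightning optimizer" and "remove optimizer_step in accelerators" bullets suggest the simplification funnels every step through one wrapper instead of per-accelerator overrides. A sketch of that single choke point (assumed shape, not the actual source):

```python
class LightningOptimizer:
    def __init__(self, optimizer):
        self._optimizer = optimizer

    def step(self, closure=None):
        # One place for precision/accelerator logic to hook into the step.
        if closure is None:
            self._optimizer.step()
        else:
            self._optimizer.step(closure=closure)

    def __getattr__(self, name):
        # Everything else (zero_grad, param_groups, ...) passes straight through.
        return getattr(self._optimizer, name)
```
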
Sean Naren e952dee292
Allow string plugins (#4888)
* Allow plugin to be chosen via string

* Fix implementation, add tests

* Fix codefactor issues

* Added missing env patch

* Skip test for windows

* Reword reason

* Add skip to invalid test

* Create required_plugins function, move sharded amp requirement to plugin

* Pass AMPType, fix setter for apex

* Better doc strings

* Add exception when using apex

* Add trainer available_plugins function, warn user when plugins have been added automatically with option to override behaviour

* Fixed pep8 indent

* Fix codefactor issues

* Add env variables

* Update pytorch_lightning/cluster_environments/cluster_environment.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Addressed code review

* Update pytorch_lightning/plugins/plugin_connector.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update pytorch_lightning/plugins/plugin_connector.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update pytorch_lightning/plugins/plugin_connector.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Addressed more code review feedback

* Fixed docstrings

* Swapped to verbose runtime error

* Apply suggestions from code review

* Apply suggestions from code review

* Update pytorch_lightning/plugins/sharded_plugin.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Change name

* Pass trainer to plugins that may require it

* Fix sharded plugin

* Added test to ensure string sharded works

* Removed trainer typing as this breaks pep8

* Fixed doc issues

* Fixed tests

Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-12-01 20:30:49 +00:00
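
What "string plugins" amounts to, sketched: a registry maps names to plugin classes, and the connector (pytorch_lightning/plugins/plugin_connector.py, per the review threads above) instantiates on demand, failing with the verbose runtime error one bullet mentions. Class and key names here are illustrative:

```python
class DDPPlugin: ...


class DDPShardedPlugin(DDPPlugin): ...


PLUGIN_REGISTRY = {"ddp_sharded": DDPShardedPlugin}


def resolve_plugin(plugin):
    """Accept a plugin instance, or the string name it was registered under."""
    if isinstance(plugin, str):
        if plugin not in PLUGIN_REGISTRY:
            raise RuntimeError(
                f"Unknown plugin {plugin!r}; choose from {sorted(PLUGIN_REGISTRY)}"
            )
        return PLUGIN_REGISTRY[plugin]()
    return plugin
```
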
Sean Naren 404af43cde
5/n: Extract reference model call to plugins/accelerators (#4773)
* Encapsulate extracting reference model within the plugin to allow custom wrapper logic to live within the plugin/accelerators

* Add missing new lines

* Fix call to accelerator

* Removed double blank

* Use accelerator backend

* Handle case where wrapper has not been initialized within the plugin

* Added basic get model tests, add better typing

* Change model name

* Split GPU/DDP test

* Add stronger typing, skip ddp test on windows

* Fix import

* Fix import in dp

* Fixed PEP8 definition

* Add ddp launcher for ddp testing

* Modify accelerator reference model to property, change name to reflect func

* Revert property as this is incorrect.

* Revert across accelerators

* Modified name to get_model_from_plugin

* Code review changes, fix issue with dp

* Add verb to function getter

Co-authored-by: chaton <thomas@grid.ai>
2020-11-23 17:21:47 +00:00
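
The gist of extracting the reference-model call: callers stop reaching for `.module` themselves and ask the plugin, which knows its own wrapper and handles the not-yet-wrapped case mentioned above. A sketch, with the method name taken from the "get_model_from_plugin" bullet:

```python
from torch.nn.parallel import DistributedDataParallel


class DDPPlugin:
    def get_model_from_plugin(self, model):
        # Handle the case where the DDP wrapper has not been initialized yet.
        if isinstance(model, DistributedDataParallel):
            return model.module
        return model
```
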
ananthsub 45c57600af
Move init_ddp_connection to DDP Plugin (#4407)
* Move init_ddp_connection to DDP Plugin

* cluster-env

* trainer?

* imports

* Update ddp_plugin.py

Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
2020-11-18 15:49:22 -05:00
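
Moving init_ddp_connection onto the plugin means the plugin, fed by the cluster environment, performs the process-group rendezvous itself. A minimal sketch, assuming env-var rendezvous and an NCCL backend:

```python
import os

import torch.distributed


class DDPPlugin:
    def init_ddp_connection(self, global_rank: int, world_size: int) -> None:
        # The cluster environment would normally supply these values.
        os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
        os.environ.setdefault("MASTER_PORT", "29500")
        if not torch.distributed.is_initialized():
            torch.distributed.init_process_group(
                backend="nccl", rank=global_rank, world_size=world_size
            )
```
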
Sean Naren e7134a9135
Sharded Plugin 2/n: Allow ddp plugin to modify optimizer state saving (#4675)
* Allow ddp plugin to modify optimizer state saving

* Rely on the accelerator for optimizer states

* Ensure we init the accelerator for the saving function

* Better comment for optim state dump

* Revert "Ensure we init the accelerator for the saving function"

This reverts commit af65effa

* Added accelerator check to initialize tuner before saving model checkpoint

* Simplify comment

* Revert "Added accelerator check to initialize tuner before saving model checkpoint"

This reverts commit f9929c0c

* Return single optimizer state to reduce duplication

* Fixed docstring

* Fixed typing

* Fixed comment

* Added CHANGELOG.md

Co-authored-by: chaton <thomas@grid.ai>
2020-11-18 16:38:35 +00:00
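
The hook this adds, sketched: checkpointing asks the plugin for each optimizer's state rather than calling state_dict() directly, which is exactly the seam a sharded plugin needs to consolidate partitioned state first. Assumed shape:

```python
class DDPPlugin:
    def optimizer_state(self, optimizer) -> dict:
        # Default: plain PyTorch behavior. A sharded plugin overrides this to
        # gather the full, de-partitioned state before it is written out.
        return optimizer.state_dict()
```
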
Sean Naren 8283680aa0
Sharded Plugin 3/n: Expose step input to DDP plugin (#4686)
* Allow ddp plugin to move the input to a different device if needed

* Swapped name to on_before_forward to align with hooks in the future

* Update pytorch_lightning/plugins/ddp_plugin.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Pass variable arg type to hook, add example

* Remove blank space (pep check)

* Added blank line

Co-authored-by: William Falcon <waf2107@columbia.edu>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-11-18 15:45:30 +00:00
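
The hook introduced here is a pass-through by default; plugins override it to move the step input to another device, per the PR description. A sketch of the assumed signature (variable arg type, as one bullet notes):

```python
class DDPPlugin:
    def on_before_forward(self, model, *args):
        """Override to move or reshape inputs before forward() sees them."""
        return args  # default: hand the batch through unchanged
```

Commit f2f4a49271 at the top of this log later builds on this same hook to call transfer_batch_to_device.
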
ananthsub 6878f3bf4e
Enable DDP Plugin to pass through args to LightningDistributedDataParallel (#4382)
* Update ddp_plugin.py

* Update ddp_plugin.py

* Update ddp_plugin.py

* Update test_ddp_plugin.py

* Update pytorch_lightning/plugins/ddp_plugin.py

* Update pytorch_lightning/plugins/ddp_plugin.py

* Fixed imports, make ddp_kwargs protected

Co-authored-by: SeanNaren <sean.narenthiran@gmail.com>
2020-10-27 12:27:59 +00:00
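
The pass-through in a nutshell: whatever keywords the user gives the plugin are stored (protected as _ddp_kwargs, per the last bullet) and forwarded verbatim when the model is wrapped. A sketch using stock DDP in place of LightningDistributedDataParallel:

```python
from torch.nn.parallel import DistributedDataParallel


class DDPPlugin:
    def __init__(self, **kwargs):
        self._ddp_kwargs = kwargs  # e.g. find_unused_parameters=False

    def configure_ddp(self, model, device_ids):
        # Every plugin kwarg lands on the DistributedDataParallel constructor.
        return DistributedDataParallel(
            model, device_ids=device_ids, **self._ddp_kwargs
        )
```
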
ananthsub 7166171962
Update ddp_plugin.py (#4363)
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
2020-10-26 12:34:35 +00:00
William Falcon 753362d0a4
enable ddp as a plugin (#4285)
* enable custom ddp plugin

* enable custom ddp plugin

* enable custom ddp plugin

* enable custom ddp plugin

* enable custom ddp plugin

* enable custom ddp plugin

* enable custom ddp plugin

* enable custom ddp plugin

* enable custom ddp plugin

* enable custom ddp plugin

* enable custom ddp plugin

Co-authored-by: chaton <thomas@grid.ai>
2020-10-22 05:15:51 -04:00
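
This is the oldest commit in the log, and the capability it introduces is what everything above builds on: subclass the plugin, override how the model gets wrapped, and hand an instance to the Trainer. A sketch against the 1.0-era Trainer arguments (assumed, not copied from the docs):

```python
from pytorch_lightning import Trainer
from pytorch_lightning.plugins.ddp_plugin import DDPPlugin


class MyDDPPlugin(DDPPlugin):
    def configure_ddp(self, model, device_ids):
        # Customize how the model is wrapped for distributed training.
        return super().configure_ddp(model, device_ids)


trainer = Trainer(gpus=2, accelerator="ddp", plugins=[MyDDPPlugin()])
```
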