Commit Graph

9 Commits

Author SHA1 Message Date
puhuk af0bb96f0f
Remove the "_precision" suffix from some precision plugin files (#10052) 2021-11-19 17:37:39 +00:00
Carlos Mocholí d45897d522
Rename `TPUHalfPrecisionPlugin` to `TPUBf16PrecisionPlugin` (#10026)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-10-19 21:09:37 +00:00
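
For #10026 above, a minimal sketch of the renamed plugin in use, assuming a 1.5-era release where `TPUBf16PrecisionPlugin` is exported from `pytorch_lightning.plugins` and `precision="bf16"` is accepted by the Trainer:

```python
from pytorch_lightning import Trainer
from pytorch_lightning.plugins import TPUBf16PrecisionPlugin

# Pass the renamed plugin explicitly; the old TPUHalfPrecisionPlugin name is gone.
trainer = Trainer(tpu_cores=8, plugins=[TPUBf16PrecisionPlugin()])

# Equivalent shortcut (assumed for this era): let the Trainer pick the plugin.
trainer = Trainer(tpu_cores=8, precision="bf16")
```

The new name reflects that half precision on TPU is bfloat16 rather than IEEE fp16.
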
Carlos Mocholí e8beceb631
Add `TPUPrecisionPlugin` (#10020)
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-10-19 17:48:57 +00:00
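
For #10020 above, a short sketch of the new base plugin, assuming it is exported from `pytorch_lightning.plugins` in this era:

```python
from pytorch_lightning import Trainer
from pytorch_lightning.plugins import TPUPrecisionPlugin

# Passing the base TPU precision plugin keeps the default 32-bit behaviour on TPU;
# the bf16 plugin from the commit above builds on this class.
trainer = Trainer(tpu_cores=8, plugins=[TPUPrecisionPlugin()])
```
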
shuyingsunshine21 299f2c481b
FSDP with full state dict (#7487)
* Fix some test errors

* checkpoint consolidation

* Update ddp_spawn.py

* Update test_metric_result_integration.py

* Update test_results.py

* Update utils.py

* Update utils.py

* Update test_all_gather_grad.py

* Update test_all_gather_grad.py

* Update test_results.py

* Revert "Update test_results.py"

This reverts commit 9d4a2b891d.

* Revert "Merge pull request #1 from shuyingsunshine21/shuyingsunshine21-checkpoint_consolidate"

This reverts commit c5053da789, reversing
changes made to 0d23d75bc9.

* Revert "Update test_all_gather_grad.py"

This reverts commit 0d23d75bc9.

* Revert "Update utils.py"

This reverts commit 70fe5da9c6.

* Revert "Update utils.py"

This reverts commit a9aae99f6e.

* Revert "Update test_results.py"

This reverts commit ea74906878.

* Revert "Update test_metric_result_integration.py"

This reverts commit bf70e431b3.

* Revert "Update ddp_spawn.py"

This reverts commit f17210183b.

* Revert "checkpoint consolidation"

This reverts commit 536c1323b0.

* Revert "Revert "checkpoint consolidation""

This reverts commit 3a9fde915a.

* Revert "Revert "Revert "checkpoint consolidation"""

This reverts commit 7a369f47e1.

* Revert "Revert "Update ddp_spawn.py""

This reverts commit 8222dc98ea.

* Revert "Revert "Update test_metric_result_integration.py""

This reverts commit 6c095b2370.

* Revert "Revert "Update test_results.py""

This reverts commit 250d0aaaa2.

* Revert "Revert "Update utils.py""

This reverts commit 8651d54d79.

* Revert "Revert "Update test_all_gather_grad.py""

This reverts commit dcdcd29731.

* modify distributed environment to make test pass

* fix version for ddp plugin test

* fix

* fix

* changelog

* Update CHANGELOG.md

* fsdp with full state dict

* fix missing import

* modify unit test

* fix

* fix

* fix typo

* modify test and add changelog

* fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* limit max_epoch to 1 for testing

* test

* fix

* update

* testing remove special for multi gpu

* assert gpu

* add assertion for gpu

* fix

* Re-enable special test, use ModelCheckpoint

* Fix paths

* Fix path passing

* test

* test

* fix test

* fix

* pre-commit format

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: SeanNaren <sean@grid.ai>
2021-05-24 08:11:45 +01:00
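
For #7487 above, a minimal end-to-end sketch of fully sharded training with a consolidated ("full") state dict on save. The `'fsdp'` plugin alias, the 1.4-era Trainer arguments, and the presence of fairscale on a multi-GPU machine are assumptions; `TinyModel` is a hypothetical stand-in module.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from pytorch_lightning import LightningModule, Trainer


class TinyModel(LightningModule):
    """Hypothetical stand-in model used only to exercise the plugin."""

    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return torch.nn.functional.cross_entropy(self.layer(x), y)

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)


if __name__ == "__main__":
    data = DataLoader(
        TensorDataset(torch.randn(64, 32), torch.randint(0, 2, (64,))), batch_size=16
    )
    # 'fsdp' selects the fairscale fully sharded plugin (assumed alias for this era).
    trainer = Trainer(gpus=2, precision=16, plugins="fsdp", max_epochs=1)
    trainer.fit(TinyModel(), data)
    # This PR makes checkpoint saving gather the full (unsharded) state dict.
    trainer.save_checkpoint("model.ckpt")
```
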
Ethan Harris d02fe342c1
Feature/double precision (#6595)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
2021-03-24 15:47:58 +05:30
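
The double-precision feature above (#6595) hangs off the existing `precision` flag; a minimal sketch:

```python
from pytorch_lightning import Trainer

# precision=64 runs the LightningModule, and the batches it receives, in torch.float64.
trainer = Trainer(precision=64)
```
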
Sean Naren 7189d673f6
DeepSpeed Integration (#5954)
* Add initial deepspeed changes

* Address code review

* Move static method outside of function

* Fixes

* Add missing annotation

* Remove seed setting

* Doc changes

* Doc changes, address reviews

* Fix docs

* Try fixing issue by moving to torch adam

* Clean up check

* Changes, better APIs!

* Add wrapper, swap to git install revision

* Add special test

* Add warning

* Address review

* Add better disclaimer

* Turn off ZeRO for testing due to compilation

* Add description on modifying parameters via the plugin

* Make docstrings clear

* Small doc fixes

* Fix hash, reduce test

* Added CI change

* Move to azure pipeline

* Fix test name

* Add missing flag

* Remove sudo...

* Try conda instead

* Swap to conda base

* Try suggested install

* Apply suggestions from code review

* Apply suggestions from code review

* Revert "Apply suggestions from code review"

This reverts commit 41cca05a

* Revert "Apply suggestions from code review"

This reverts commit e06ec29e

* Remove setter

* Address most review

* Move out function, remove DeepSpeed from requirements

* Install deepspeed/mpi4py within container

* Use special tests, move to master commit for deepspeed

* Export path

* Force compile to happen first

* Remove!

* Debugging ninja

* Fix error in optimizer step logic

* Attempt to fix symbolic link

* Reverse to aid debugging

* Export path again

* Clean up mess

* var

* Revert "var"

This reverts commit 3450eaca

* Address review, add todo

* Add note about unsupported functionality

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
2021-02-17 15:23:42 -05:00
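
For the DeepSpeed integration above (#5954), a minimal sketch; the `'deepspeed'` plugin alias and the need for the `deepspeed` package on a CUDA machine are assumptions about the 1.2-era API:

```python
from pytorch_lightning import Trainer

# ZeRO-style sharded training via the new DeepSpeed plugin; parameters such as
# the ZeRO stage can be adjusted through the plugin or a DeepSpeed config file
# (see the PR's note on modifying parameters via the plugin).
trainer = Trainer(gpus=4, precision=16, plugins="deepspeed")
```
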
Justus Schock b3ebc18bcb
Hardware specific parts of Accelerator Refactoring (#5719)
* add basic accelerator class.
Co-Authored with @awaelchi

* pep8

Co-authored-by: @awaelchi

* add cpu accelerator

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add gpu accelerator

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add tpu accelerator

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add accelerator connector

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add single device training

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add single tpu

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add tpu spawn

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* make on_colab_kaggle utility func

* fixes

* move

* yapf

* .

* .

* .

* flake8

* sync accelerator connector changes from dev1.2

* changelog

* fix tpu handling

* tpu

* aval

* yapf

* Update pytorch_lightning/plugins/training_type/tpu_spawn.py

Co-authored-by: chaton <thomas@grid.ai>

* Update pytorch_lightning/accelerators/accelerator_connector.py

Co-authored-by: chaton <thomas@grid.ai>

* Update pytorch_lightning/plugins/training_type/tpu_spawn.py

Co-authored-by: chaton <thomas@grid.ai>

* Update tpu_spawn.py

* Update pytorch_lightning/accelerators/accelerator_connector.py

Co-authored-by: chaton <thomas@grid.ai>

* indentation

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
Co-authored-by: chaton <thomas@grid.ai>
2021-02-01 08:34:59 -05:00
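
The refactor above (#5719) splits hardware handling into per-device accelerators (CPU, GPU, TPU) that each wrap a training-type plugin and a precision plugin, wired together by an accelerator connector. The classes below are a plain-Python schematic of that composition only; every name and signature here is hypothetical, not the library's API.

```python
class SchematicPrecisionPlugin:
    """Hypothetical stand-in for a precision plugin (e.g. native AMP)."""

    def backward(self, loss):
        print(f"running backward with precision handling, loss={loss}")


class SchematicTrainingTypePlugin:
    """Hypothetical stand-in for a training-type plugin (e.g. DDP, TPU spawn)."""

    def setup(self, model):
        print(f"placing {model!r} on the target device(s)")


class SchematicAccelerator:
    """Per-hardware accelerator: owns one training-type and one precision plugin."""

    def __init__(self, training_type, precision):
        self.training_type = training_type
        self.precision = precision

    def setup(self, model):
        self.training_type.setup(model)


class SchematicGPUAccelerator(SchematicAccelerator):
    """Hardware-specific subclass, analogous to the CPU/GPU/TPU accelerators added here."""


accelerator = SchematicGPUAccelerator(SchematicTrainingTypePlugin(), SchematicPrecisionPlugin())
accelerator.setup(model="my_model")
accelerator.precision.backward(loss=0.5)
```
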
Justus Schock 069ae27cef
Accelerator Refactor: Precision Plugins (#5718)
* add basic accelerator class.
Co-Authored with @awaelchi

* add basic training type plugin.
Co-Authored with @awaelchi

* pep8

Co-authored-by: @awaelchi

* update copyright

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add apex_amp

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add mixed base class

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add native amp

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add native amp sharded

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add tpu bfloat

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add inits

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Update precision_plugin.py

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-01-31 13:12:02 -05:00
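
#5718 above moves AMP handling into dedicated precision plugins (native AMP, Apex AMP, sharded native AMP, TPU bfloat16) selected behind the existing Trainer flags; a minimal sketch, assuming the 1.2-era `precision` and `amp_backend` arguments:

```python
from pytorch_lightning import Trainer

# Routed to the native mixed-precision plugin.
trainer_native = Trainer(gpus=1, precision=16, amp_backend="native")

# Routed to the apex-based plugin (requires NVIDIA apex to be installed).
trainer_apex = Trainer(gpus=1, precision=16, amp_backend="apex")
```
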
Justus Schock 5d239ccd70
Base classes for accelerator refactoring (#5715)
* add basic accelerator class.
Co-Authored with @awaelchi

* Add base plugin class.
Co-authored with @awaelchi

* add basic training type plugin.
Co-Authored with @awaelchi

* add basic precision plugin.
Co-Authored with @awaelchi

* Add missing inits.
Co-authored with @awaelchi

* pep8

Co-authored-by: @awaelchi

* ignore flake8

* coverage omit

* imports in init

* lost

* imports

* flake8

* .

* .

* chlog

* Update pytorch_lightning/plugins/training_type/training_type_plugin.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update pytorch_lightning/plugins/training_type/training_type_plugin.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update pytorch_lightning/plugins/training_type/training_type_plugin.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update pytorch_lightning/plugins/training_type/training_type_plugin.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update pytorch_lightning/plugins/training_type/training_type_plugin.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update pytorch_lightning/plugins/training_type/training_type_plugin.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update pytorch_lightning/plugins/training_type/training_type_plugin.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-01-30 14:55:28 -05:00