Commit Graph

4 Commits

Author SHA1 Message Date
Sean Naren 7189d673f6
DeepSpeed Integration (#5954)
* Add initial deepspeed changes

* Address code review

* Move static method outside of function

* Fixes

* Add missing annotation

* Remove seed setting

* Doc changes

* Doc changes, add address reviews

* Fix docs

* Try fixing issue by moving to torch adam

* Clean up check

* Changes, better APIs!

* Add wrapper, swap to git install revision

* Add special test

* Add warning

* Address review

* Add better disclaimer

* Turn off ZeRO for testing due to compilation

* Add description on modifying parameters via the plugin

* Doc strings clear

* Small doc fixes

* Fix hash, reduce test

* Added CI change

* Move to azure pipeline

* Fix test name

* Add missing flag

* Remove sudo...

* Try conda instead

* Swap to conda base

* Try suggested install

* Apply suggestions from code review

* Apply suggestions from code review

* Revert "Apply suggestions from code review"

This reverts commit 41cca05a

* Revert "Apply suggestions from code review"

This reverts commit e06ec29e

* Remove setter

* Address most review

* Move out function, remove DeepSpeed from requirements

* Install deepspeed/mpi4py within container

* Use special tests, move to master commit for deepspeed

* Export path

* Force compile to happen first

* Remove!

* Debugging ninja

* Fix error in optimizer step logic

* Attempt to fix symbolic link

* Reverse to aid debugging

* Export path again

* Clean up mess

* var

* Revert "var"

This reverts commit 3450eaca

* Address review, add todo

* Add note about unsupported functionality

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
2021-02-17 15:23:42 -05:00
Justus Schock b3ebc18bcb
Hardware specific parts of Accelerator Refactoring (#5719)
* add basic accelerator class.
Co-Authored with @awaelchi

* pep8

Co-authored-by: @awaelchi

* add cpu accelerator

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add gpu accelerator

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add tpu accelerator

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add accelerator connector

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add single device training

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add single tpu

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add tpu spawn

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* make on_colab_kaggle utility func

* add basic accelerator class.
Co-Authored with @awaelchi

* pep8

Co-authored-by: @awaelchi

* add cpu accelerator

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add gpu accelerator

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add tpu accelerator

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add accelerator connector

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add single device training

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add single tpu

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add tpu spawn

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* make on_colab_kaggle utility func

* fixes

* move

* yapf

* .

* .

* .

* flake8

* sync accelerator connector changes from dev1.2

* changelog

* fix tpu handling

* tpu

* aval

* yapf

* Update pytorch_lightning/plugins/training_type/tpu_spawn.py

Co-authored-by: chaton <thomas@grid.ai>

* Update pytorch_lightning/accelerators/accelerator_connector.py

Co-authored-by: chaton <thomas@grid.ai>

* Update pytorch_lightning/plugins/training_type/tpu_spawn.py

Co-authored-by: chaton <thomas@grid.ai>

* Update tpu_spawn.py

* Update pytorch_lightning/accelerators/accelerator_connector.py

Co-authored-by: chaton <thomas@grid.ai>

* indentation

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
Co-authored-by: chaton <thomas@grid.ai>
2021-02-01 08:34:59 -05:00
Justus Schock 069ae27cef
Accelerator Refactor: Precision Plugins (#5718)
* add basic accelerator class.
Co-Authored with @awaelchi

* add basic trainign type plugin.
Co-Authored with @awaelchi

* pep8

Co-authored-by: @awaelchi

* update copyright

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add apex_amp

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add mixed base class

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add native amp

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add native amp sharded

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add tpu bfloat

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add inits

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Update precision_plugin.py

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-01-31 13:12:02 -05:00
Justus Schock 5d239ccd70
Base classes for accelerator refactoring (#5715)
* add basic accelerator class.
Co-Authored with @awaelchi

* Add base plugin class.
Co-authored with @awaelchi

* add basic trainign type plugin.
Co-Authored with @awaelchi

* add basic precision plugin.
Co-Authored with @awaelchi

* Add missing inits.
Co-authored with @awaelchi

* pep8

Co-authored-by: @awaelchi

* ignore  flake8

* coverage omit

* imports in init

* lost

* imports

* flake8

* .

* .

* chlog

* Update pytorch_lightning/plugins/training_type/training_type_plugin.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update pytorch_lightning/plugins/training_type/training_type_plugin.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update pytorch_lightning/plugins/training_type/training_type_plugin.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update pytorch_lightning/plugins/training_type/training_type_plugin.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update pytorch_lightning/plugins/training_type/training_type_plugin.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update pytorch_lightning/plugins/training_type/training_type_plugin.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update pytorch_lightning/plugins/training_type/training_type_plugin.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-01-30 14:55:28 -05:00