puhuk | af0bb96f0f
Remove the "_precision" suffix from some precision plugin files (#10052)
2021-11-19 17:37:39 +00:00
Carlos Mocholí | d45897d522
Rename `TPUHalfPrecisionPlugin` to `TPUBf16PrecisionPlugin` (#10026)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-10-19 21:09:37 +00:00
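For context on the rename: `bf16` (bfloat16) keeps float32's full 8-bit exponent range but truncates the mantissa to 7 bits, which is why it behaves so differently from IEEE float16. A stdlib-only sketch of that truncation (illustrative only, not Lightning or XLA code):

```python
import struct

def to_bf16(x: float) -> float:
    # bfloat16 keeps float32's 8 exponent bits but only 7 mantissa bits:
    # emulate it by zeroing the low 16 bits of the float32 representation
    bits = struct.unpack("I", struct.pack("f", x))[0]
    return struct.unpack("f", struct.pack("I", bits & 0xFFFF0000))[0]

print(to_bf16(1.0))      # 1.0 — exactly representable
print(to_bf16(3.14159))  # 3.140625 — coarser mantissa
print(to_bf16(1e38))     # large magnitudes survive (unlike float16, which overflows)
```

The last line is the point of the format: values far beyond float16's ~6.5e4 limit stay finite, so loss scaling is usually unnecessary.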
Carlos Mocholí | e8beceb631
Add `TPUPrecisionPlugin` (#10020)
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-10-19 17:48:57 +00:00
shuyingsunshine21 | 299f2c481b
FSDP with full state dict (#7487)
* Fix some test errors
Summary:
Test Plan:
Reviewers:
Subscribers:
Tasks:
Tags:
* checkpoint consolidation
* Update ddp_spawn.py
* Update test_metric_result_integration.py
* Update test_results.py
* Update utils.py
* Update utils.py
* Update test_all_gather_grad.py
* Update test_all_gather_grad.py
* Update test_results.py
* Revert "Update test_results.py"
This reverts commit 9d4a2b891d .
* Revert "Merge pull request #1 from shuyingsunshine21/shuyingsunshine21-checkpoint_consolidate"
This reverts commit c5053da789 , reversing
changes made to 0d23d75bc9 .
* Revert "Update test_all_gather_grad.py"
This reverts commit 0d23d75bc9 .
* Revert "Update utils.py"
This reverts commit 70fe5da9c6 .
* Revert "Update utils.py"
This reverts commit a9aae99f6e .
* Revert "Update test_results.py"
This reverts commit ea74906878 .
* Revert "Update test_metric_result_integration.py"
This reverts commit bf70e431b3 .
* Revert "Update ddp_spawn.py"
This reverts commit f17210183b .
* Revert "checkpoint consolidation"
This reverts commit 536c1323b0 .
* Revert "Revert "checkpoint consolidation""
This reverts commit 3a9fde915a .
* Revert "Revert "Revert "checkpoint consolidation"""
This reverts commit 7a369f47e1 .
* Revert "Revert "Update ddp_spawn.py""
This reverts commit 8222dc98ea .
* Revert "Revert "Update test_metric_result_integration.py""
This reverts commit 6c095b2370 .
* Revert "Revert "Update test_results.py""
This reverts commit 250d0aaaa2 .
* Revert "Revert "Update utils.py""
This reverts commit 8651d54d79 .
* Revert "Revert "Update test_all_gather_grad.py""
This reverts commit dcdcd29731 .
* modify distributed environment to make test pass
* fix version for ddp plugin test
* fix
* fix
* changelog
* Update CHANGELOG.md
* fsdp with full state dict
* fix missing import
* modify unittest
* fix
* fix
* fix typo
* modify test and add changelog
* fix
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* limit max_epoch to 1 for testing
* test
* fix
* update
* testing remove special for multi gpu
* assert gpu
* add assertion for gpu
* fix
* Re-enable special test, use ModelCheckpoint
* Fix paths
* Fix path passing
* test
* test
* fix test
* fix
* pre-commit format
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: SeanNaren <sean@grid.ai>
2021-05-24 08:11:45 +01:00
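The "full state dict" behavior this commit adds means gathering every rank's parameter shard back into one complete checkpoint. A toy, framework-free sketch of that gather (hypothetical names, not the actual Lightning/FairScale API):

```python
def shard(flat_params, world_size):
    # split a flat parameter list into roughly equal per-rank shards
    size = (len(flat_params) + world_size - 1) // world_size
    return [flat_params[r * size:(r + 1) * size] for r in range(world_size)]

def full_state_dict(shards):
    # "all-gather": concatenate every rank's shard back into one full tensor-like list
    return {"layer.weight": [p for s in shards for p in s]}

params = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
shards = shard(params, 3)            # each of 3 ranks holds 2 values
restored = full_state_dict(shards)
assert restored["layer.weight"] == params
```

During training each rank only ever stores its own shard; the concatenation happens once, on checkpoint save.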
Ethan Harris | d02fe342c1
Feature/double precision (#6595)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
2021-03-24 15:47:58 +05:30
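A double-precision (64-bit) mode exists because float32 accumulation error can matter for numerically sensitive models. A stdlib-only demonstration of the effect (Python floats are float64; float32 is emulated here by a pack/unpack round-trip):

```python
import struct

def to_f32(x: float) -> float:
    # round a Python double (float64) to float32 precision
    return struct.unpack("f", struct.pack("f", x))[0]

total32, total64 = 0.0, 0.0
for _ in range(10_000):
    total32 = to_f32(total32 + to_f32(0.1))
    total64 += 0.1

err32 = abs(total32 - 1000.0)
err64 = abs(total64 - 1000.0)
print(err32 > err64)  # True: float32 drifts far more from the exact sum
```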
Sean Naren | 7189d673f6
DeepSpeed Integration (#5954)
* Add initial deepspeed changes
* Address code review
* Move static method outside of function
* Fixes
* Add missing annotation
* Remove seed setting
* Doc changes
* Doc changes, add address reviews
* Fix docs
* Try fixing issue by moving to torch adam
* Clean up check
* Changes, better APIs!
* Add wrapper, swap to git install revision
* Add special test
* Add warning
* Address review
* Add better disclaimer
* Turn off ZeRO for testing due to compilation
* Add description on modifying parameters via the plugin
* Doc strings clear
* Small doc fixes
* Fix hash, reduce test
* Added CI change
* Move to azure pipeline
* Fix test name
* Add missing flag
* Remove sudo...
* Try conda instead
* Swap to conda base
* Try suggested install
* Apply suggestions from code review
* Apply suggestions from code review
* Revert "Apply suggestions from code review"
This reverts commit 41cca05a
* Revert "Apply suggestions from code review"
This reverts commit e06ec29e
* Remove setter
* Address most review
* Move out function, remove DeepSpeed from requirements
* Install deepspeed/mpi4py within container
* Use special tests, move to master commit for deepspeed
* Export path
* Force compile to happen first
* Remove!
* Debugging ninja
* Fix error in optimizer step logic
* Attempt to fix symbolic link
* Reverse to aid debugging
* Export path again
* Clean up mess
* var
* Revert "var"
This reverts commit 3450eaca
* Address review, add todo
* Add note about unsupported functionality
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
2021-02-17 15:23:42 -05:00
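DeepSpeed's headline technique, ZeRO, partitions optimizer state across ranks instead of replicating it on every GPU. A framework-free sketch of the bookkeeping behind that idea (purely illustrative, not DeepSpeed's actual API):

```python
def partition(num_params: int, world_size: int) -> dict:
    # assign each parameter index an owning rank, as evenly as possible;
    # each rank then stores optimizer state only for the indices it owns
    base, rem = divmod(num_params, world_size)
    owners, start = {}, 0
    for rank in range(world_size):
        count = base + (1 if rank < rem else 0)
        for i in range(start, start + count):
            owners[i] = rank
        start += count
    return owners

owners = partition(10, 4)
per_rank = {r: sum(1 for o in owners.values() if o == r) for r in range(4)}
print(per_rank)  # {0: 3, 1: 3, 2: 2, 3: 2}
```

Per-rank optimizer memory drops by roughly a factor of `world_size`, which is what makes very large models trainable.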
Justus Schock | b3ebc18bcb
Hardware specific parts of Accelerator Refactoring (#5719)
* add basic accelerator class.
Co-Authored with @awaelchi
* pep8
Co-authored-by: @awaelchi
* add cpu accelerator
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* add gpu accelerator
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* add tpu accelerator
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* add accelerator connector
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* add single device training
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* add single tpu
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* add tpu spawn
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* make on_colab_kaggle utility func
* fixes
* move
* yapf
* .
* .
* .
* flake8
* sync accelerator connector changes from dev1.2
* changelog
* fix tpu handling
* tpu
* aval
* yapf
* Update pytorch_lightning/plugins/training_type/tpu_spawn.py
Co-authored-by: chaton <thomas@grid.ai>
* Update pytorch_lightning/accelerators/accelerator_connector.py
Co-authored-by: chaton <thomas@grid.ai>
* Update pytorch_lightning/plugins/training_type/tpu_spawn.py
Co-authored-by: chaton <thomas@grid.ai>
* Update tpu_spawn.py
* Update pytorch_lightning/accelerators/accelerator_connector.py
Co-authored-by: chaton <thomas@grid.ai>
* indentation
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
Co-authored-by: chaton <thomas@grid.ai>
2021-02-01 08:34:59 -05:00
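The refactor above moves hardware-specific concerns (CPU, GPU, TPU) behind a common accelerator interface. A condensed sketch of the pattern (simplified, not the actual Lightning class hierarchy):

```python
from abc import ABC, abstractmethod

class Accelerator(ABC):
    """Hardware-agnostic base: device placement is the only per-hardware part."""

    @abstractmethod
    def root_device(self) -> str: ...

    def setup(self, batch):
        # shared logic lives in the base; subclasses only say *where* data goes
        return {"batch": batch, "device": self.root_device()}

class CPUAccelerator(Accelerator):
    def root_device(self) -> str:
        return "cpu"

class GPUAccelerator(Accelerator):
    def __init__(self, index: int = 0):
        self.index = index

    def root_device(self) -> str:
        return f"cuda:{self.index}"

print(CPUAccelerator().setup([1, 2])["device"])   # cpu
print(GPUAccelerator(1).setup([1, 2])["device"])  # cuda:1
```

The payoff is that training loops call only the base interface, so adding a new backend means subclassing rather than touching the trainer.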
Justus Schock | 069ae27cef
Accelerator Refactor: Precision Plugins (#5718)
* add basic accelerator class.
Co-Authored with @awaelchi
* add basic training type plugin.
Co-Authored with @awaelchi
* pep8
Co-authored-by: @awaelchi
* update copyright
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* add apex_amp
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* add mixed base class
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* add native amp
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* add native amp sharded
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* add tpu bfloat
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* add inits
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* Update precision_plugin.py
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-01-31 13:12:02 -05:00
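The precision plugins split out in this commit (apex amp, native amp, sharded amp, TPU bfloat) all share one shape: hooks that wrap each training step. A minimal sketch of that hook pattern (illustrative names, not Lightning's actual signatures):

```python
class PrecisionPlugin:
    """Base plugin: no-op hooks around the step."""

    def pre_forward(self, x: float) -> float:
        return x

    def post_forward(self, x: float) -> float:
        return x

class HalfPrecisionPlugin(PrecisionPlugin):
    """Toy 'cast down before compute' behavior standing in for fp16 autocast."""

    def pre_forward(self, x: float) -> float:
        return round(x, 3)  # stand-in for casting to a lower-precision dtype

def run_step(plugin: PrecisionPlugin, x: float) -> float:
    # the trainer calls only the hook interface, never a concrete plugin
    return plugin.post_forward(plugin.pre_forward(x) * 2.0)

print(run_step(PrecisionPlugin(), 1.23456))      # 2.46912
print(run_step(HalfPrecisionPlugin(), 1.23456))  # 2.47 — precision lost before compute
```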
Justus Schock | 5d239ccd70
Base classes for accelerator refactoring (#5715)
* add basic accelerator class.
Co-Authored with @awaelchi
* Add base plugin class.
Co-authored with @awaelchi
* add basic training type plugin.
Co-Authored with @awaelchi
* add basic precision plugin.
Co-Authored with @awaelchi
* Add missing inits.
Co-authored with @awaelchi
* pep8
Co-authored-by: @awaelchi
* ignore flake8
* coverage omit
* imports in init
* lost
* imports
* flake8
* .
* .
* chlog
* Update pytorch_lightning/plugins/training_type/training_type_plugin.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Update pytorch_lightning/plugins/training_type/training_type_plugin.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Update pytorch_lightning/plugins/training_type/training_type_plugin.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Update pytorch_lightning/plugins/training_type/training_type_plugin.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Update pytorch_lightning/plugins/training_type/training_type_plugin.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Update pytorch_lightning/plugins/training_type/training_type_plugin.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Update pytorch_lightning/plugins/training_type/training_type_plugin.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-01-30 14:55:28 -05:00