Adrian Wälchli
321502fe31
Update backward hook for `PrecisionPlugin` ( #10008 )
...
Co-authored-by: thomas chaton <thomas@grid.ai>
2021-10-19 10:51:45 +00:00
Adrian Wälchli
10d0b41977
Introduce `PrecisionPlugin.forward_context()` ( #9988 )
...
Co-authored-by: thomas chaton <thomas@grid.ai>
2021-10-18 12:58:19 +00:00
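As a rough sketch of the idea behind `forward_context()`: the precision plugin exposes a context manager that wraps forward passes, so mixed-precision variants can enter autocast there. The class names and bodies below are illustrative assumptions, not Lightning's exact implementation.

```python
import contextlib
from typing import Generator

import torch


class SketchPrecisionPlugin:
    """Illustrative stand-in for a precision plugin (hypothetical class)."""

    @contextlib.contextmanager
    def forward_context(self) -> Generator[None, None, None]:
        # Base plugin: full precision, so the context is a no-op.
        yield


class SketchNativeMixedPrecisionPlugin(SketchPrecisionPlugin):
    @contextlib.contextmanager
    def forward_context(self) -> Generator[None, None, None]:
        # Mixed precision: run the wrapped forward pass under autocast.
        with torch.cuda.amp.autocast():
            yield
```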
Carlos Mocholí
0ddd6a8c19
Remove `_NATIVE_AMP_AVAILABLE` checks ( #9747 )
2021-09-29 15:34:26 +02:00
Carlos Mocholí
44aed17aff
Remove duplicated native AMP + LBFGS check ( #9748 )
2021-09-29 13:14:03 +00:00
Carlos Mocholí
9ebfbbc349
Remove unused `post_optimizer_step` ( #9746 )
2021-09-29 13:09:22 +00:00
Carlos Mocholí
6892d533ea
Run plugin closure before `on_before_optimizer_step` [1/2] ( #9288 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-09-07 11:52:20 +00:00
Jirka Borovec
6e124e7207
CI: precommit - docformatter ( #8584 )
...
* CI: precommit - docformatter
* fix deprecated
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-09-06 12:49:09 +00:00
John St. John
c30d9b9fae
Update call to `amp.autocast` from `fast_dtype` to `dtype` ( #9211 )
...
Co-authored-by: SeanNaren <sean@grid.ai>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-09-04 02:59:11 +00:00
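For reference, the renamed keyword as it exists in PyTorch's public API (1.10+); a usage sketch that assumes a CUDA device is available:

```python
import torch

x = torch.randn(4, 4, device="cuda")
w = torch.randn(4, 4, device="cuda")

# The released keyword is `dtype`, not the pre-release `fast_dtype` spelling.
with torch.cuda.amp.autocast(dtype=torch.bfloat16):
    y = x @ w  # matmul runs in bfloat16 under autocast
```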
four4fish
f01a9a6cd2
Remove `BasePlugin` ( #9066 )
...
* Remove BasePlugin
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-08-25 19:10:28 +00:00
Sean Naren
bac8b1be81
Add support for CPU AMP autocast ( #9084 )
2021-08-25 12:18:00 +00:00
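A minimal sketch of CPU autocast as exposed by PyTorch 1.10+, the underlying mechanism this commit builds on; the model and batch here are placeholders:

```python
import torch

model = torch.nn.Linear(8, 8)
batch = torch.randn(2, 8)

# CPU autocast currently targets bfloat16.
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    out = model(batch)

print(out.dtype)  # torch.bfloat16
```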
Sean Naren
1bab0a17a9
Fix torch bfloat import version ( #9089 )
2021-08-24 19:18:12 +00:00
Sean Naren
1feec8c601
Add bfloat16 support to Lightning Trainer ( #9049 )
2021-08-24 09:47:21 +00:00
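If the flag follows the spelling Lightning adopted in the 1.5 line, enabling bfloat16 is a single `Trainer` argument; treat the exact literal as version-dependent:

```python
import pytorch_lightning as pl

# bf16 mixed precision via the `precision` flag (1.5-era spelling).
trainer = pl.Trainer(precision="bf16", max_epochs=1)
```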
Carlos Mocholí
e63968ab88
Add `pyupgrade` to `pre-commit` ( #8557 )
...
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-07-26 14:38:12 +02:00
Carlos Mocholí
a64cc37394
Replace `yapf` with `black` ( #7783 )
...
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-07-26 13:37:35 +02:00
Dusan Drevicky
1b06edf2f2
Add the `on_before_optimizer_step` hook ( #8048 )
...
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-07-09 13:30:52 +02:00
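A hedged sketch of overriding the new hook in a `LightningModule`; the `(optimizer, optimizer_idx)` signature matches this era of the API, and the gradient-norm logging is just one illustrative use:

```python
import torch
import pytorch_lightning as pl


class LitModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(8, 1)

    def on_before_optimizer_step(self, optimizer, optimizer_idx):
        # Called after backward and before optimizer.step(), so gradients
        # are available for inspection here.
        norms = [p.grad.norm() for p in self.parameters() if p.grad is not None]
        if norms:
            self.log("grad_norm", torch.stack(norms).norm())
```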
thomas chaton
1c825a2a9c
Add the `on_before_backward` hook ( #7865 )
...
* Add callback to hook tests and add predict test
* Fix lambda callback test
* Simplify lambda call test
* Use LambdaCallback
* Dynamically append to called for the model
* Remove print
* Consistency
* Consistency
* Prepare args/kwargs testing
* yapf doesn't like dict literals
* Add arguments for fit no val test
* Add arguments for fit no val test
* add before_backward_hook
* add test
* resolve flake8
* resolve tests
* update changelog
* add on_before_backward to LightningModule
* update on comments
* Test arguments
* Datamodule refactor
* Fix eval test
* remove extra file
* resolve bug
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* move to hooks
* update
* resolve flake8
* update on comments
* Update full fit + val test
* Update test
* Remove FIXME
* Remove FIXME
* Undo change
* Fix
* Parametrize fit hook test
* Comment
* Parametrize fit hook test with different precision plugins
* Fix tests
* Parametrize fit hook test with manual optimization
* Unnecessary parenthesis
* WIP
* Comments
* Fix message
* Test CI error
* Revert "Test CI error"
This reverts commit 39c4a85a83.
* Add ddp training type teardown
* Update CHANGELOG
* Adrian's fix
* Use destructor
* Update CHANGELOG.md
* RPC destructor
* Update pytorch_lightning/plugins/training_type/ddp.py
* Why do you not work :(
* Missing condition
* Fix deepspeed test
* GC collect in conftest
* Do not show warnings for special tests
* Needs to run on 1.8
To avoid: "RuntimeError: NCCL error in: /pytorch/torch/lib/c10d/ProcessGroupNCCL.cpp:32, unhandled cuda error, NCCL version 2.4.8"
* Run torch 1.8
* Skip test due to 'Python bus error'
* Debug NCCL
* shm size
* Disable warnings for special tests
* Remove NCCL_DEBUG statement
* Try smaller shm size
* Revert "Skip test due to 'Python bus error'"
This reverts commit e0a3e8785d.
* README and adjust versions
* Avoid self.on_gpu call
* empty cache cleanup
* More garbage collection
* Unroll parametrizations
* Do not reuse mock
* Undo changes
* Undo notebooks modification
* resolve test
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* update
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* delete file
* Undo
* Fix test
* Revert "WIP"
This reverts commit f5828a8c42.
* Rename
* Remove optimizers
* Fix bug with LightningOptimizer
* Add optimizers
* update
* update
* Update CHANGELOG
* On after backward refactor
* Do not call super
* Fixes
* Remove should_accumulate
* pre/post backward refactor
* Call the LM backward hook
* Update tests
* Remove dev debug patch
* Fix test
* Remove optimizer arguments and typing
* Docs fixes
* Fix comment
* Undo changes
* Split manual and auto
* Undo change
* Deepsource
* Remove optimizers
* Undo changes
* Call the hook
* Docs
* Docs
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-07-09 06:15:57 +00:00
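A minimal sketch of the hook this PR adds, assuming the `on_before_backward(loss)` signature from this era; the NaN check is an illustrative use, not part of the PR:

```python
import torch
import pytorch_lightning as pl


class LitModel(pl.LightningModule):
    def on_before_backward(self, loss: torch.Tensor) -> None:
        # Invoked with the (possibly scaled) loss right before backward,
        # a convenient place for sanity checks such as catching NaNs early.
        if not torch.isfinite(loss):
            raise RuntimeError(f"non-finite loss detected: {loss}")
```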
Carlos Mocholí
eb6d991218
Refactor plugins backward ( #8328 )
2021-07-08 16:02:09 +02:00
Carlos Mocholí
c4353ea702
Remove `dev_debugger.call_count` ( #8317 )
2021-07-07 19:59:59 +02:00
Carlos Mocholí
ea88105b88
Parametrize fit hook test with different precision plugins ( #8070 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-07-05 10:50:01 +00:00
deepsource-autofix[bot]
7e2f84e050
Remove methods with unnecessary super delegation. ( #8148 )
...
* Remove methods with unnecessary super delegation.
* Update fully_sharded.py
* replace init in test
Co-authored-by: deepsource-autofix[bot] <62050782+deepsource-autofix[bot]@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Ethan Harris <ethanwharris@gmail.com>
2021-07-02 08:00:55 +00:00
Ethan Harris
57dce7244c
Fix double precision casting complex buffers ( #8208 )
...
* Fix double precision casting complex buffers
* Update CHANGELOG.md
* Fixes
* Fixes
* Fix
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-06-30 10:57:42 +01:00
thomas chaton
24db914093
Support state restoration of logged results 2/2 ( #7966 )
...
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-06-25 19:16:11 +00:00
Edgar Riba
b378806b6c
Add `add_to_queue`/`get_from_queue` for DDP spawn ( #7916 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-06-23 03:19:37 +02:00
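A hedged sketch of the two `LightningModule` methods the title names, used with DDP spawn to carry picklable extras from the spawned worker back to the main process; `best_metric` is a hypothetical attribute:

```python
import pytorch_lightning as pl


class LitModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.best_metric = 0.0

    def add_to_queue(self, queue) -> None:
        # Called in the spawned worker: push picklable state for the
        # main process to pick up after the worker exits.
        queue.put(self.best_metric)

    def get_from_queue(self, queue) -> None:
        # Called in the main process after spawn joins: pop it back.
        self.best_metric = queue.get()
```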
Yifu Wang
b71aa55b9e
Make optimizers skippable when using amp ( #7975 )
...
Co-authored-by: Yifu Wang <yifuwang@2012@gmail.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-06-16 00:23:30 +00:00
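In Lightning's automatic optimization, returning `None` from `training_step` skips that batch's optimizer step; the fix above is, to my understanding, what lets native AMP tolerate such skips. A sketch, with `compute_loss` as a hypothetical helper:

```python
import torch
import pytorch_lightning as pl


class LitModel(pl.LightningModule):
    def training_step(self, batch, batch_idx):
        loss = self.compute_loss(batch)  # hypothetical helper
        # Returning None skips the optimizer step for this batch.
        if not torch.isfinite(loss):
            return None
        return loss
```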
Sean Naren
96433d03ea
IPU Integration 5/5 ( #7867 )
...
* Initial changes
* Add broken example for now
* Fix reference
* Fix format
* Code runs
* Fixes
* Clear up files
* Add tests, helpers, fixes
* Small cleanups
* Refactors based on review
* Swap to special tests
* Add special tests
* Add source
* Cleanups
* Add logic to attach/detach model from devices
* Fixes for tests
* Fixes for tests
* Move earlier
* Cleanups
* Add check for nvcc
* Add tests, cleanups
* Fix errors
* fix
* Try condition
* Add missing annotation
* Clearer
* Clearer message
* Fix variable
* Cleanups
* Add comment
* CHANGELOG.md
* Add simple selection test
* Remove special=True to see what happens
* Fix test
* Update tests/accelerators/test_ipu.py
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
* Convert ipu_cores -> ipus
* Add typing, fail earlier
* simplify precision
* Add test, add helper
* fix accum
* Update pytorch_lightning/plugins/training_type/ipu.py
Co-authored-by: thomas chaton <thomas@grid.ai>
* Use stages
* Make sure warning message returned
* throw error
* Add more tests, use fs
* add comment
* Clean
* Address feedback, add IPU tests
* Fixes
* Fix signature
* Add types
* Remove autoround
* Add docstring
* ipu_cores -> ipus
* Add test, remove unnecessary precision set
* Add optimizer test
* Add precision back with test
* Address code review
* Change to probs
* Move some of the asserts earlier
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
2021-06-11 15:07:04 +00:00
Adrian Wälchli
cfd01d7f8d
move amp checkpoint state management to precision plugin ( #7831 )
2021-06-07 07:45:01 +00:00
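Presumably this amounts to the precision plugin serializing its `GradScaler`; a sketch under that assumption, with a hypothetical plugin class (the checkpoint key mirrors the one Lightning used at the time, but verify against your version):

```python
import torch


class SketchNativeAMPPlugin:
    """Hypothetical plugin that owns the AMP grad scaler."""

    def __init__(self):
        self.scaler = torch.cuda.amp.GradScaler()

    def on_save_checkpoint(self, checkpoint: dict) -> None:
        # Persist the scaler so loss scaling resumes where it left off.
        checkpoint["native_amp_scaling_state"] = self.scaler.state_dict()

    def on_load_checkpoint(self, checkpoint: dict) -> None:
        if "native_amp_scaling_state" in checkpoint:
            self.scaler.load_state_dict(checkpoint["native_amp_scaling_state"])
```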
Ethan Harris
03bb389b21
Fix double precision + ddp_spawn ( #6924 )
...
* Initial fix
* Initial fix
* Initial fix
* Updates
* Updates
* Update typing and docs
* Undo accidental refactor
* Remove unused imports
* Add DDP double precision test
* Remove unused variable
* Update CHANGELOG.md
* Fix test
* Update tests
* Formatting
* Revert bad change
* Add back changes
* Correct wrapping order
* Improve unwrapping
* Correct wrapping order
* Fix... finally
* Respond to comments
* Drop ddp test
* Simplify ddp spawn test
* Simplify ddp spawn test
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-06-01 15:21:17 +00:00
Carlos Mocholí
d47173bb72
Use typing forward references ( #7770 )
...
* Use typing forward references
* Update pytorch_lightning/core/lightning.py
2021-05-31 09:54:28 +02:00
shuyingsunshine21
299f2c481b
FSDP with full state dict ( #7487 )
...
* Fix some test errors
* checkpoint consolidation
* Update ddp_spawn.py
* Update test_metric_result_integration.py
* Update test_results.py
* Update utils.py
* Update utils.py
* Update test_all_gather_grad.py
* Update test_all_gather_grad.py
* Update test_results.py
* Revert "Update test_results.py"
This reverts commit 9d4a2b891d
.
* Revert "Merge pull request #1 from shuyingsunshine21/shuyingsunshine21-checkpoint_consolidate"
This reverts commit c5053da789
, reversing
changes made to 0d23d75bc9
.
* Revert "Update test_all_gather_grad.py"
This reverts commit 0d23d75bc9
.
* Revert "Update utils.py"
This reverts commit 70fe5da9c6
.
* Revert "Update utils.py"
This reverts commit a9aae99f6e
.
* Revert "Update test_results.py"
This reverts commit ea74906878
.
* Revert "Update test_metric_result_integration.py"
This reverts commit bf70e431b3
.
* Revert "Update ddp_spawn.py"
This reverts commit f17210183b
.
* Revert "checkpoint consolidation"
This reverts commit 536c1323b0
.
* Revert "Revert "checkpoint consolidation""
This reverts commit 3a9fde915a
.
* Revert "Revert "Revert "checkpoint consolidation"""
This reverts commit 7a369f47e1
.
* Revert "Revert "Update ddp_spawn.py""
This reverts commit 8222dc98ea
.
* Revert "Revert "Update test_metric_result_integration.py""
This reverts commit 6c095b2370
.
* Revert "Revert "Update test_results.py""
This reverts commit 250d0aaaa2
.
* Revert "Revert "Update utils.py""
This reverts commit 8651d54d79
.
* Revert "Revert "Update test_all_gather_grad.py""
This reverts commit dcdcd29731
.
* modify distributed environment to make test pass
* fix version for ddp plugin test
* fix
* fix
* changelog
* Update CHANGELOG.md
* fsdp with full state dict
* fix missing import
* modify unitest
* fix
* fix
* fix typo
* modify test and add changelog
* fix
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* limit max_epoch to 1 for testing
* test
* fix
* update
* testing remove special for multi gpu
* assert gpu
* add assertion for gpu
* fix
* Re-enable special test, use ModelCheckpoint
* Fix paths
* Fix path passing
* test
* test
* fix test
* fix
* pre-commit format
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: SeanNaren <sean@grid.ai>
2021-05-24 08:11:45 +01:00
Carlos Mocholí
8208c330eb
Use `torch.nn.utils.clip_grad_norm_` and add `clip_grad_by_value` support for TPU ( #7025 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-05-07 16:41:39 +00:00
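The two `torch.nn.utils` functions behind this title, shown directly (both are stable PyTorch APIs; running both on the same gradients, as here, is purely for illustration):

```python
import torch

model = torch.nn.Linear(8, 1)
loss = model(torch.randn(4, 8)).sum()
loss.backward()

# Norm-based clipping: rescales gradients so their total norm is <= 1.0.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)

# Value-based clipping: clamps each gradient element into [-0.5, 0.5].
torch.nn.utils.clip_grad_value_(model.parameters(), clip_value=0.5)
```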
thomas chaton
16d6c9828d
[bugfix] Apex never instantiated. ( #7274 )
...
* update
* update
* update apex
* update
* update
* update
* remove test.py
* update
* update
* update on comments
* update changelog
* update
* update
* typo
2021-04-30 13:16:28 -04:00
Carlos Mocholí
ca6c87ffbe
Add back `clip_gradients(model)` ( #7231 )
2021-04-27 11:34:02 +00:00
ananthsub
3f1a08ab00
Fix mypy checks for double precision plugin ( #7151 )
2021-04-22 11:29:38 +01:00
thomas chaton
013756404b
[bugfix] Add `set_default_tensor_type(torch.DoubleTensor)` with precision=64 ( #7108 )
...
* update
* Update pytorch_lightning/plugins/precision/double.py
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* Update pytorch_lightning/plugins/precision/double.py
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* Update pytorch_lightning/plugins/precision/double.py
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* resolve tests
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-04-20 15:25:37 +00:00
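A sketch of the mechanism the title names: wrapping execution so tensors created inside default to float64, then restoring the float32 default. The context-manager packaging is an assumption about the approach, not Lightning's exact code:

```python
import contextlib

import torch


@contextlib.contextmanager
def double_precision_context():
    # Tensors created inside default to float64; restore float32 after.
    torch.set_default_tensor_type(torch.DoubleTensor)
    try:
        yield
    finally:
        torch.set_default_tensor_type(torch.FloatTensor)


with double_precision_context():
    assert torch.zeros(1).dtype == torch.float64
assert torch.zeros(1).dtype == torch.float32
```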
Carlos Mocholí
898ec8a94a
Create pytorch_lightning/utilities/types.py ( #7048 )
2021-04-19 14:43:16 +02:00
Carlos Mocholí
f29ecbfd90
Typing for accelerators and plugins ( #7022 )
2021-04-15 16:48:16 +00:00
Ethan Harris
f645df5e9a
Add typings for evaluation_loop.py and remove some dead code ( #7015 )
2021-04-15 07:36:04 +00:00
Adrian Wälchli
d3f73a0a74
Plugin Docs ( #6952 )
...
Co-authored-by: edenlightning <66261195+edenlightning@users.noreply.github.com>
Co-authored-by: William Falcon <waf2107@columbia.edu>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
2021-04-14 20:53:21 +00:00
Anthony Kim
7f6154fcad
Add `Trainer(gradient_clip_algorithm='value'|'norm')` ( #6123 )
...
* add changelog
* add clip by value
* fix bug in training tricks.rst
* fix bug in trainer.rst
* Update trainer.rst
* Update trainer.rst
* Update CHANGELOG.md
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Update pytorch_lightning/plugins/precision/deepspeed_precision.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Update pytorch_lightning/utilities/enums.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* yapf formatting
* update training tricks
* update based on comment
* update based on comment
* Update pytorch_lightning/trainer/trainer.py
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
* update based on comment
* pep8
* mypy
* mypy
* Update docs/source/advanced/training_tricks.rst
Co-authored-by: thomas chaton <thomas@grid.ai>
* Update sharded_native_amp.py
* Update test_sharded_parity.py
* update test codes
* Update test_tpu.py
* Update pytorch_lightning/trainer/connectors/training_trick_connector.py
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* Update test_trainer.py
* Update enums.py
* Update enums.py
* add super-class initialization to precision plugins.
* add clip_grad horovod cpu test
* add clip_grad horovod cpu test
* use subprocess check_call
* change order of horovod tests
* set max_epochs 2 in horovod test
* remove clip_grad_val test from horovod-cpu
* remove "type: ignore"
* divide clip grad val test in horovod
* update based on comments
* add super-class initialization to precision plugins.
* bugfix
* bugfix
* revert some changes
* revert some changes
* Update tests/models/test_horovod.py
* merge master
* Delete signature test
No point in testing a signature
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
2021-04-06 08:27:37 -05:00
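Usage of the new argument, as spelled in the title:

```python
import pytorch_lightning as pl

# Clip each gradient element to [-0.5, 0.5] instead of rescaling by norm.
trainer = pl.Trainer(gradient_clip_val=0.5, gradient_clip_algorithm="value")

# The default remains norm-based clipping.
trainer = pl.Trainer(gradient_clip_val=1.0, gradient_clip_algorithm="norm")
```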
Kaushik B
a72a7992a2
Update clip gradients signature for precision plugins ( #6764 )
2021-03-31 17:06:48 +05:30
thomas chaton
1302766f83
DeepSpeed ZeRO Update ( #6546 )
...
* Add context to call hook to handle all modules defined within the hook
* Expose some additional parameters
* Added docs, exposed parameters
* Make sure we only configure if necessary
* Setup activation checkpointing regardless, saves the user having to do it manually
* Add some tests that fail currently
* update
* update
* update
* add tests
* change docstring
* resolve accumulate_grad_batches
* resolve flake8
* Update DeepSpeed to use latest version, add some comments
* add metrics
* update
* Small formatting fixes, clean up some code
* Few cleanups
* No need for default state
* Fix tests, add some boilerplate that should move eventually
* Add hook removal
* Add a context manager to handle hook
* Small naming cleanup
* wip
* move save_checkpoint responsibility to accelerator
* resolve flake8
* add BC
* Change recommended scale to 16
* resolve flake8
* update test
* update install
* update
* update test
* update
* update
* update test
* resolve flake8
* update
* update
* update on comments
* Push
* pull
* Update pytorch_lightning/plugins/training_type/deepspeed.py
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* Update pytorch_lightning/plugins/training_type/deepspeed.py
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* update
* Apply suggestions from code review
* Swap to using world size defined by plugin
* update
* update todo
* Remove deepspeed from extra, keep it in the base cuda docker install
* Push
* pull
* update
* update
* update
* update
* Minor changes
* duplicate
* format
* format2
Co-authored-by: SeanNaren <sean@grid.ai>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
2021-03-30 13:39:02 -04:00
Ethan Harris
d02fe342c1
Feature/double precision ( #6595 )
...
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
2021-03-24 15:47:58 +05:30
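Enabling full float64 training is then a single flag (per the convention this PR follows):

```python
import pytorch_lightning as pl

# Run model parameters, inputs, and optimizer state in float64.
trainer = pl.Trainer(precision=64)
```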
Justus Schock
634d83134f
Add AMP for validation, prediction and testing ( #6565 )
...
* Add Tests for val and test-steps
* Add native AMP
* pep8 tests
* pep8 plugin
* changelog
2021-03-20 23:15:49 +00:00
Kaushik B
87c03b1038
Update Gradient Clipping for TPU Accelerator ( #6576 )
2021-03-20 01:02:57 +05:30
thomas chaton
0544efd453
[bug] Update broadcast + reduce decision [ModelCheckpoint] ( #6410 )
...
* resolve bug
* update
* update changelog
* update PR
* Update pytorch_lightning/trainer/connectors/logger_connector/epoch_result_store.py
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* add todo
* resolve issues
* resolve flake8
* update
* add coverage for reduce
* wip
* restore back to broadcast
* remove test.py
* resolve flake8
* update
* check world size
* resolve test
* update
* use pytorch version when defined
* update on comments
* update on comments
* flake8
* resolve bugs
* Update CHANGELOG.md
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
* update
* update
* update
* update
* remove test
* update
* resolve flake8
* update
* update
* update
* proxy
* update
* update
* resolve typo
* prune
* update parallel
* update
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-03-14 17:14:27 +00:00
Sean Naren
39231aee1a
[Fix] Call clip gradients if clip val greater than 0 ( #6330 )
...
* Call clip gradients if clip val greater than 0
* format
* Format
* Move to top of file
2021-03-04 19:45:58 +00:00
Jirka Borovec
dcec4efe03
Simplify test for AMP plugins ( #6311 )
...
* AMP
* fuse
* yapf
2021-03-03 08:56:57 +01:00
Jirka Borovec
58a6d59784
simplify skip-if tests >> 0/n ( #5920 )
...
* skipif + yapf + isort
* tests
* docs
* pp
2021-03-01 12:17:09 +00:00
Justus Schock
0647340f3b
Add mypy typing to precision plugins. ( #6149 )
...
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
2021-02-26 14:27:16 +01:00
Kaushik B
e7298b5d38
fix parallel devices return type & add copyright ( #6215 )
2021-02-26 11:09:08 +01:00