ananthsub
aad86423f7
Remove more deprecated methods from base `Accelerator` class ( #10448 )
2021-11-10 12:58:24 +05:30
puhuk
f9b9cdb0d1
Remove deprecated accelerator pass through functions in Accelerator ( #10403 )
...
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-11-08 17:36:37 +00:00
Adrian Wälchli
a270a79ed9
Rename "master" methods to "main" in ClusterEnvironment plugins ( #10103 )
...
* rename occurrences of master port, master address, master node, master process
* rename properties
* add property decorators
* occurrences in docs
* update changelog
* update changelog
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* add lost method
* create deprecation
* add changelog
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* fix typo (but it was already there!!!)
* Apply suggestions from code review
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
* add todo
* update more occurrences
* add types
* add missing import
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
2021-11-08 12:32:58 +00:00
Carlos Mocholí
9237106451
Clip before step ( #10248 )
2021-10-30 11:27:49 +01:00
Kaushik B
cedaebfcbb
Add `auto_device_count` method to `Accelerators` ( #10222 )
...
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-10-29 22:31:32 +02:00
Carlos Mocholí
81d15c5986
Implement double optimizer closure for hook structure consistency ( #10167 )
2021-10-29 13:03:04 +00:00
Carlos Mocholí
03f01fb5ec
Fix gradient norm tracking and gradient clipping ( #9287 )
...
* WIP
* Progress
* Undo test change
* Fix plugin closure execution order
* Update CHANGELOG
* Fix manual optimization on AMP and skipping backward
* Fix for deepspeed
* Typo
* Hook test for manual closure
* Add skipping test with AMP
* You are hideous, apex
* Add deepspeed test
* Update CHANGELOG
* Fix for broken master
* Add RunIf
* FIXMEs
* Rename
* Fix grad norm
* add a simple test
* update test
* update test
* update test
* fix merge conflicts
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Sea of changes
* Undo change
* Introduce TPUPrecisionPlugin
* Undo changes
* Undo changes
* Resolve FIXME
* Undo change
* Undo change
* Undo change
* Fix FIXMEs
* Fix FIXME
* Correct value
* Bad merge
* Fix circular imports
* WIP
* Fixing clipping
* Fixes
* Bad merge
* Move optimizer step and clipping into the `PrecisionPlugin`
* Fix AMP
* Update CHANGELOG
* Fix tests
* Underscore
* Progress
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Remove pre_optimizer_step
* Missed one
* Progress
* Progress
* Fix test
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Update FIXMEs
* Fix test
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Fix test
* DeepSpeed warning. mypy
* Rename
* Finish tests
* Update CHANGELOG
* Dumb fixes
* accelerator=auto
* Apply suggestions from code review
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
* Update on comments
* Use ClassifModule
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-10-28 15:23:27 +00:00
Carlos Mocholí
48b6292cf0
Move optimizer step and clipping into the `PrecisionPlugin` ( #10143 )
2021-10-26 17:26:26 +02:00
Rohit Gupta
93266e2c22
Avoid deprecated warnings from accelerator and checkpoint connector #10142
2021-10-26 14:10:30 +02:00
Carlos Mocholí
b376799430
Minor fixes related to clipping ( #10130 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-10-25 16:40:22 +00:00
Adrian Wälchli
d41902883a
Update `optimizer_step` methods in accelerator and plugins ( #10023 )
...
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-10-20 21:36:27 +01:00
Carlos Mocholí
ef5a12212a
Isolate optimizer step logic to the `PrecisionPlugin` ( #10029 )
2021-10-20 15:43:08 +00:00
four4fish
a002f872ea
[2/n] Directly call TrainingTypePlugin APIs instead of going through the Accelerator ( #9901 )
...
Co-authored-by: tchaton <thomas@grid.ai>
2021-10-14 17:38:22 +02:00
Rohit Gupta
4decbc0d95
Deprecate `dataloader_idx` from `on_train_batch_start/end` ( #9816 )
...
* deprecate hooks
* dep todo
* explicit
* Apply suggestions from code review
* Apply suggestions from code review
* code review
* base
2021-10-07 10:18:11 +00:00
Carlos Mocholí
0ddd6a8c19
Remove `_NATIVE_AMP_AVAILABLE` checks ( #9747 )
2021-09-29 15:34:26 +02:00
Carlos Mocholí
9ebfbbc349
Remove unused `post_optimizer_step` ( #9746 )
2021-09-29 13:09:22 +00:00
four4fish
15cd6ad45b
Call TrainingTypePlugin collective functions directly instead of going through the Accelerator ( #9677 )
...
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-09-27 14:52:57 +02:00
Danielle Pintz
ab069876cb
[1/4] Add get_device_stats to accelerator interface ( #9586 )
2021-09-26 21:09:16 -07:00
ananthsub
41e3be197f
Remove `call_configure_sharded_model` lifecycle property ( #9612 )
2021-09-24 03:57:53 +02:00
Carlos Mocholí
b1ed1db089
Keep global step update in the loop ( #8856 )
2021-09-14 19:21:39 +05:30
Kaushik B
b294c5760e
Fix type hint for filepath ( #9434 )
2021-09-10 21:38:54 +00:00
Danielle Pintz
cc2ac02dd1
Move add_to_queue/get_from_queue to DDPSpawnPlugin ( #9118 )
...
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-09-10 20:58:02 +00:00
Carlos Mocholí
3070a9ea6e
Fix hiddens type annotation ( #9377 )
2021-09-09 08:45:52 +01:00
Jirka Borovec
6e124e7207
CI: precommit - docformatter ( #8584 )
...
* CI: precommit - docformatter
* fix deprecated
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-09-06 12:49:09 +00:00
four4fish
f01a9a6cd2
Remove `BasePlugin` ( #9066 )
...
* Remove BasePlugin
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-08-25 19:10:28 +00:00
four4fish
c912ebf889
Remove TrainingTypePlugin.on_save and Accelerator.on_save ( #9023 )
...
* Remove TrainingTypePlugin.on_save and Accelerator.on_save
2021-08-23 10:11:00 -07:00
ananthsub
8a931732ae
Remove unused `on_train_epoch_end` hook in accelerator ( #9035 )
2021-08-23 00:20:10 +05:30
four4fish
13e64e6a80
Remove deprecated functions from accelerator.py ( #9019 )
2021-08-22 00:25:42 +02:00
Carlos Mocholí
d0efb55b0f
Delete `TrainingEpochLoop._dataloader_idx` which always equals 0 ( #8911 )
2021-08-16 13:34:42 +02:00
Carlos Mocholí
93ab24d1ee
Replace DataLoader sampler once for IPUs ( #8858 )
2021-08-16 11:28:05 +02:00
Carlos Mocholí
ed13040729
Connect the model to the training type plugin at the start of run ( #8536 )
2021-08-04 17:43:34 +02:00
Sean Naren
07b7dc9c17
[Fix] Add delay property for checkpointing, refactor loading checkpoint (DeepSpeed Checkpointing Fix 1/n) ( #8627 )
...
* Add property to delay checkpointing, move loading checkpoint file into the run function to allow deepspeed engine to be loaded
* Add a small test
* Apply suggestions from code review
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* Update pytorch_lightning/accelerators/accelerator.py
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* Address review
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-07-30 11:31:08 +01:00
Carlos Mocholí
a64cc37394
Replace `yapf` with `black` ( #7783 )
...
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-07-26 13:37:35 +02:00
thomas chaton
c9af1a7aec
[bugfix] Reduce memory leaks ( #8490 )
...
* reduce memory leak
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* update changelog
* Apply suggestions from code review
Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
* resolve flake8
* update on comments
* resolve bug
* update
* Undo whitespace changes
* remove bug
* resolve flake8
* revert change
* update on comments
* delete the ddp wrapper as it holds memory
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* resolve flake8
* update on comments
* update changelog
* resolve test
* Update CHANGELOG
* Refactor teardown
* Fix comment
* Do it for non-gpu too
* remove ref when the model is not a lightning_module
* Fix import error
* move down
* resolve bug
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* resolve assignment
* update
* move above
* Fix device calls to support tpu training
* Update todo
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: Kaushik B <kaushikbokka@gmail.com>
2021-07-21 11:37:05 +02:00
Carlos Mocholí
c5a120ed9d
Update to Mypy>0.9 ( #8386 )
2021-07-13 08:23:36 +02:00
Carlos Mocholí
eb6d991218
Refactor plugins backward ( #8328 )
2021-07-08 16:02:09 +02:00
Adrian Wälchli
ea5cfd2005
move batch to device before sending it to hooks ( #7378 )
...
* update train step
* test
* x
* limits
* val
* typeo
* x
* x
* step
* min gpus
* run all loops
* x
* limit test
* profiler
* clean up accelerator code
* move files
* rename
* move tests
* changelog
* reorder callbacks and model hooks
* add test description
* replace unnecessary method
* fix chlog
* adjust batch_to_device for DP Plugin
* update tests for dataloader idx
* unused imports
* hook change
* switch None
* clear memory
* change to None
* None
* None
* memory savings
* remove redundant todo
* hack
* cheat
* Revert "cheat"
This reverts commit a8433bd0b4.
* Revert "hack"
This reverts commit 43a6d1edeb.
* update new epoch loop
* remove from old loop code
* update chlog
* update hook test
* changelog
* teardown
* integrate changes in new eval loop
* fix hook calls
* add prediction step
* bad merge
* Revert "bad merge"
This reverts commit 488080863c.
* fix train batch hook test
* rm -rf _notebooks
* update chlog
* release memory
* fix type
* notebooks mess
* debug
* Revert "debug"
This reverts commit eec4ee2f77.
* teardown
* fix teardown bug
* debug
* x
* debug
* Revert "debug"
This reverts commit a6e6101946.
Revert "debug"
This reverts commit 5ddeaec069.
debug
debug
Revert "debug"
This reverts commit 605be746f7daedf265b2c05a1c153ce543394435.
Revert "Revert "debug""
This reverts commit a7612d5410409ed886cfb609457349ecf44cbfa8.
debug
x
x
x
s
tol
x
tol
* Fix changelog
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-07-05 09:31:39 +01:00
deepsource-autofix[bot]
03154eb30a
Refactor unnecessary `else` / `elif` when `if` block has a `return` statement ( #8156 )
...
Co-authored-by: deepsource-autofix[bot] <62050782+deepsource-autofix[bot]@users.noreply.github.com>
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
2021-06-28 15:27:41 +05:30
Carlos Mocholí
4d9b72b8a9
Nuke RPC ( #8101 )
2021-06-23 18:31:13 +00:00
Sean Naren
41be61c6f2
[IPU] Add hooks for IPU lifecycle 4/5 ( #7864 )
2021-06-07 12:06:41 +00:00
Sean Naren
6388c29e87
[IPU] Add reset dataloader hooks to training type plugin 3/n ( #7861 )
...
* Add hooks
* Add tests for hooks
* Add changelog
* Test changes, add typing
2021-06-07 10:37:09 +00:00
shuyingsunshine21
2242423b75
refactor accelerator teardown -> training type plugin teardown ( #7579 )
2021-05-22 13:19:24 -07:00
Rohit Gupta
7ca41734da
Add `dataloader_idx` to batch transfer hooks ( #6241 )
...
* replace with kwargs
* chlog
* fix
* add test
* fix
* device
* deepspeed
* pep
* optional
* docs
* bc
* comments
* pep
* mypy
* pep
* Apply suggestions from code review
* kwargs
* docs
* .
* .
* 1.3 -> 1.4
* kwargs -> step_kwargs
2021-05-13 23:03:55 +05:30
shuyingsunshine21
8538c1f61e
Accelerator model state dict ( #7474 )
...
* Fix some test errors
Summary:
Test Plan:
Reviewers:
Subscribers:
Tasks:
Tags:
* checkpoint consolidation
* Update ddp_spawn.py
* Update test_metric_result_integration.py
* Update test_results.py
* Update utils.py
* Update utils.py
* Update test_all_gather_grad.py
* Update test_all_gather_grad.py
* Update test_results.py
* Revert "Update test_results.py"
This reverts commit 9d4a2b891d.
* Revert "Merge pull request #1 from shuyingsunshine21/shuyingsunshine21-checkpoint_consolidate"
This reverts commit c5053da789, reversing changes made to 0d23d75bc9.
* Revert "Update test_all_gather_grad.py"
This reverts commit 0d23d75bc9.
* Revert "Update utils.py"
This reverts commit 70fe5da9c6.
* Revert "Update utils.py"
This reverts commit a9aae99f6e.
* Revert "Update test_results.py"
This reverts commit ea74906878.
* Revert "Update test_metric_result_integration.py"
This reverts commit bf70e431b3.
* Revert "Update ddp_spawn.py"
This reverts commit f17210183b.
* Revert "checkpoint consolidation"
This reverts commit 536c1323b0.
* Revert "Revert "checkpoint consolidation""
This reverts commit 3a9fde915a.
* Revert "Revert "Revert "checkpoint consolidation"""
This reverts commit 7a369f47e1.
* Revert "Revert "Update ddp_spawn.py""
This reverts commit 8222dc98ea.
* Revert "Revert "Update test_metric_result_integration.py""
This reverts commit 6c095b2370.
* Revert "Revert "Update test_results.py""
This reverts commit 250d0aaaa2.
* Revert "Revert "Update utils.py""
This reverts commit 8651d54d79.
* Revert "Revert "Update test_all_gather_grad.py""
This reverts commit dcdcd29731.
* modify distributed environment to make test pass
* modify model state dict to training type plugin
* remove changes
* add changelog
* fixing isort for pre-commit failure
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Address code review
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: SeanNaren <sean@grid.ai>
2021-05-11 16:39:04 +01:00
ananthsub
6104a6316a
[1/2] Deprecate `outputs` in `on_train_epoch_end` hooks ( #7339 )
...
* Remove outputs from on_train_epoch_end
* iterate
* Update callback_hook.py
* update
* Update training_loop.py
* Update test_training_loop.py
* early stop?
* fix
* update tests
* Update test_hooks.py
* Update pytorch_lightning/trainer/callback_hook.py
Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
* Update pytorch_lightning/trainer/training_loop.py
Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
* Update trainer.py
* Update pytorch_lightning/trainer/trainer.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-05-05 17:18:16 +02:00
ananthsub
98670c83a9
Deprecate `truncated_bptt_steps` flag on Trainer in favor of same setting on the LightningModule ( #7323 )
...
* deprecate-tbptt-trainer
* Update CHANGELOG.md
* Update lightning.py
* test
* Update lightning.py
* Update training_loop.py
* Update training_loop.py
* Update lightning.py
* Update training_loop.py
* Update training_loop.py
* update docs
* Update accelerator.py
* Update accelerator.py
* more docs
* tweaks
* chlog
* comments
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-05-05 11:21:00 +01:00
Carlos Mocholí
8c0ea92af2
`TrainerState` refactor [5/5] ( #7173 )
...
* `TrainerState` refactor
* flake8
* Update finished check
* Test cleanup
* Fix tests
* Fixes
* Reorder
* flake8
* Update CHANGELOG
* Better docs
* Better docs
* Remove default
* Update tests
* Bad merge
2021-05-04 12:50:56 +02:00
ananthsub
39274273a4
Update accelerator.py ( #7318 )
2021-05-03 11:17:26 -04:00
Adrian Wälchli
e0c64f0ef6
Fix Adagrad optimizer not working with DDP/GPU ( #7277 )
...
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
2021-05-03 03:57:17 +05:30
thomas chaton
16d6c9828d
[bugfix] Apex never instantiated. ( #7274 )
...
* update
* update
* update apex
* update
* update
* update
* remove test.py
* update
* update
* update on comments
* update changelog
* update
* update
* typo
2021-04-30 13:16:28 -04:00