Commit Graph

120 Commits

Author | SHA1 | Message | Date
ananthsub aad86423f7
Remove more deprecated methods from base `Accelerator` class (#10448) 2021-11-10 12:58:24 +05:30
puhuk f9b9cdb0d1
Remove deprecated accelerator pass through functions in Accelerator (#10403)
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-11-08 17:36:37 +00:00
Adrian Wälchli a270a79ed9
Rename "master" methods to "main" in ClusterEnvironment plugins (#10103)
* rename occurrences of master port, master address, master node, master process

* rename properties

* add property decorators

* occurrences in docs

* update changelog

* update changelog

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add lost method

* create deprecation

* add changelog

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix typo (but it was already there!!!)

* Apply suggestions from code review

Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>

* add todo

* update more occurrences

* add types

* add missing import

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
2021-11-08 12:32:58 +00:00
Carlos Mocholí 9237106451
Clip before step (#10248) 2021-10-30 11:27:49 +01:00
Kaushik B cedaebfcbb
Add `auto_device_count` method to `Accelerators` (#10222)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-10-29 22:31:32 +02:00
Carlos Mocholí 81d15c5986
Implement double optimizer closure for hook structure consistency (#10167) 2021-10-29 13:03:04 +00:00
Carlos Mocholí 03f01fb5ec
Fix gradient norm tracking and gradient clipping (#9287)
* WIP

* Progress

* Undo test change

* Fix plugin closure execution order

* Update CHANGELOG

* Fix manual optimization on AMP and skipping backward

* Fix for deepspeed

* Typo

* Hook test for manual closure

* Add skipping test with AMP

* You are hideous, apex

* Add deepspeed test

* Update CHANGELOG

* Fix for broken master

* Add RunIf

* FIXMEs

* Rename

* Fix grad norm

* add a simple test

* update test

* update test

* update test

* fix merge conflicts

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Sea of changes

* Undo change

* Introduce TPUPrecisionPlugin

* Undo changes

* Undo changes

* Resolve FIXME

* Undo change

* Undo change

* Undo change

* Fix FIXMEs

* Fix FIXME

* Correct value

* Bad merge

* Fix circular imports

* WIP

* Fixing clipping

* Fixes

* Bad merge

* Move optimizer step and clipping into the `PrecisionPlugin`

* Fix AMP

* Update CHANGELOG

* Fix tests

* Underscore

* Progress

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove pre_optimizer_step

* Missed one

* Progress

* Progress

* Fix test

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update FIXMEs

* Fix test

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix test

* DeepSpeed warning. mypy

* Rename

* Finish tests

* Update CHANGELOG

* Dumb fixes

* accelerator=auto

* Apply suggestions from code review

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* Update on comments

* Use ClassifModule

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-10-28 15:23:27 +00:00
Carlos Mocholí 48b6292cf0
Move optimizer step and clipping into the `PrecisionPlugin` (#10143) 2021-10-26 17:26:26 +02:00
Rohit Gupta 93266e2c22
Avoid deprecated warnings from accelerator and checkpoint connector (#10142) 2021-10-26 14:10:30 +02:00
Carlos Mocholí b376799430
Minor fixes related to clipping (#10130)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-10-25 16:40:22 +00:00
Adrian Wälchli d41902883a
Update `optimizer_step` methods in accelerator and plugins (#10023)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-10-20 21:36:27 +01:00
Carlos Mocholí ef5a12212a
Isolate optimizer step logic to the `PrecisionPlugin` (#10029) 2021-10-20 15:43:08 +00:00
four4fish a002f872ea
[2/n] Directly call TrainingTypePlugin APIs instead of going through the Accelerator (#9901)
Co-authored-by: tchaton <thomas@grid.ai>
2021-10-14 17:38:22 +02:00
Rohit Gupta 4decbc0d95
Deprecate `dataloader_idx` from `on_train_batch_start/end` (#9816)
* deprecate hooks

* dep todo

* explicit

* Apply suggestions from code review

* Apply suggestions from code review

* code review

* base
2021-10-07 10:18:11 +00:00
Carlos Mocholí 0ddd6a8c19
Remove `_NATIVE_AMP_AVAILABLE` checks (#9747) 2021-09-29 15:34:26 +02:00
Carlos Mocholí 9ebfbbc349
Remove unused `post_optimizer_step` (#9746) 2021-09-29 13:09:22 +00:00
four4fish 15cd6ad45b
Call TrainingTypePlugin collective functions directly instead of going through the Accelerator (#9677)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-09-27 14:52:57 +02:00
Danielle Pintz ab069876cb
[1/4] Add get_device_stats to accelerator interface (#9586) 2021-09-26 21:09:16 -07:00
ananthsub 41e3be197f
Remove `call_configure_sharded_model` lifecycle property (#9612) 2021-09-24 03:57:53 +02:00
Carlos Mocholí b1ed1db089
Keep global step update in the loop (#8856) 2021-09-14 19:21:39 +05:30
Kaushik B b294c5760e
Fix type hint for filepath (#9434) 2021-09-10 21:38:54 +00:00
Danielle Pintz cc2ac02dd1
Move add_to_queue/get_from_queue to DDPSpawnPlugin (#9118)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-09-10 20:58:02 +00:00
Carlos Mocholí 3070a9ea6e
Fix hiddens type annotation (#9377) 2021-09-09 08:45:52 +01:00
Jirka Borovec 6e124e7207
CI: precommit - docformatter (#8584)
* CI: precommit - docformatter
* fix deprecated

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-09-06 12:49:09 +00:00
four4fish f01a9a6cd2
Remove `BasePlugin` (#9066)
* Remove BasePlugin

Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-08-25 19:10:28 +00:00
four4fish c912ebf889
Remove TrainingTypePlugin.on_save and Accelerator.on_save (#9023)
* Remove TrainingTypePlugin.on_save and Accelerator.on_save
2021-08-23 10:11:00 -07:00
ananthsub 8a931732ae
Remove unused `on_train_epoch_end` hook in accelerator (#9035) 2021-08-23 00:20:10 +05:30
four4fish 13e64e6a80
Remove deprecated functions from accelerator.py (#9019) 2021-08-22 00:25:42 +02:00
Carlos Mocholí d0efb55b0f
Delete `TrainingEpochLoop._dataloader_idx` which always equals 0 (#8911) 2021-08-16 13:34:42 +02:00
Carlos Mocholí 93ab24d1ee
Replace DataLoader sampler once for IPUs (#8858) 2021-08-16 11:28:05 +02:00
Carlos Mocholí ed13040729
Connect the model to the training type plugin at the start of run (#8536) 2021-08-04 17:43:34 +02:00
Sean Naren 07b7dc9c17
[Fix] Add delay property for checkpointing, refactor loading checkpoint (DeepSpeed Checkpointing Fix 1/n) (#8627)
* Add property to delay checkpointing, move loading checkpoint file into the run function to allow deepspeed engine to be loaded

* Add a small test

* Apply suggestions from code review

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Update pytorch_lightning/accelerators/accelerator.py

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Address review

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-07-30 11:31:08 +01:00
Carlos Mocholí a64cc37394
Replace `yapf` with `black` (#7783)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-07-26 13:37:35 +02:00
thomas chaton c9af1a7aec
[bugfix] Reduce memory leaks (#8490)
* reduce memory leak

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update changelog

* Apply suggestions from code review

Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>

* resolve flake8

* update on comments

* resolve bug

* update

* Undo whitespace changes

* remove bug

* resolve flake8

* revert change

* update on comments

* delete the ddp wrapper as it hold memory

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* resolve flake8

* update on comments

* update changelog

* resolve test

* Update CHANGELOG

* Refactor teardown

* Fix comment

* Do it for non-gpu too

* remove ref when the model is not a lightning_module

* Fix import error

* move down

* resolve bug

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* resolve assignment

* update

* move above

* Fix device calls to support tpu training

* Update todo

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: Kaushik B <kaushikbokka@gmail.com>
2021-07-21 11:37:05 +02:00
Carlos Mocholí c5a120ed9d
Update to Mypy>0.9 (#8386) 2021-07-13 08:23:36 +02:00
Carlos Mocholí eb6d991218
Refactor plugins backward (#8328) 2021-07-08 16:02:09 +02:00
Adrian Wälchli ea5cfd2005
move batch to device before sending it to hooks (#7378)
* update train step

* test

* x

* limits

* val

* typo

* x

* x

* step

* min gpus

* run all loops

* x

* limit test

* profiler

* clean up accelerator code

* move files

* rename

* move tests

* changelog

* reorder callbacks and model hooks

* add test description

* replace unnecessary method

* fix chlog

* adjust batch_to_device for DP Plugin

* update tests for dataloader idx

* unused imports

* hook change

* switch None

* clear memory

* change to None

* None

* None

* memory savings

* remove redundant todo

* hack

* cheat

* Revert "cheat"

This reverts commit a8433bd0b4.

* Revert "hack"

This reverts commit 43a6d1edeb.

* update new epoch loop

* remove from old loop code

* update chlog

* update hook test

* changelog

* teardown

* integrate changes in new eval loop

* fix hook calls

* add prediction step

* bad merge

* Revert "bad merge"

This reverts commit 488080863c.

* fix train batch hook test

* rm -rf _notebooks

* update chlog

* release memory

* fix type

* notebooks mess

* debug

* Revert "debug"

This reverts commit eec4ee2f77.

* teardown

* fix teardown bug

* debug

* x

* debug

* Revert "debug"

This reverts commit a6e6101946.

Revert "debug"

This reverts commit 5ddeaec069.

debug


debug


Revert "debug"

This reverts commit 605be746f7daedf265b2c05a1c153ce543394435.

Revert "Revert "debug""

This reverts commit a7612d5410409ed886cfb609457349ecf44cbfa8.

debug


x


x


x


s


tol


x


tol

* Fix changelog

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-07-05 09:31:39 +01:00
deepsource-autofix[bot] 03154eb30a
Refactor unnecessary `else` / `elif` when `if` block has a `return` statement (#8156)
Co-authored-by: deepsource-autofix[bot] <62050782+deepsource-autofix[bot]@users.noreply.github.com>
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
2021-06-28 15:27:41 +05:30
Carlos Mocholí 4d9b72b8a9
Nuke RPC (#8101) 2021-06-23 18:31:13 +00:00
Sean Naren 41be61c6f2
[IPU] Add hooks for IPU lifecycle 4/5 (#7864) 2021-06-07 12:06:41 +00:00
Sean Naren 6388c29e87
[IPU] Add reset dataloader hooks to training type plugin 3/n (#7861)
* Add hooks

* Add tests for hooks

* Add changelog

* Test changes, add typing
2021-06-07 10:37:09 +00:00
shuyingsunshine21 2242423b75
refactor accelerator teardown -> training type plugin teardown (#7579) 2021-05-22 13:19:24 -07:00
Rohit Gupta 7ca41734da
Add `dataloader_idx` to batch transfer hooks (#6241)
* replace with kwargs

* chlog

* fix

* add test

* fix

* device

* deepspeed

* pep

* optional

* docs

* bc

* comments

* pep

* mypy

* pep

* Apply suggestions from code review

* kwargs

* docs

* .

* .

* 1.3 -> 1.4

* kwargs -> step_kwargs
2021-05-13 23:03:55 +05:30
shuyingsunshine21 8538c1f61e
Accelerator model state dict (#7474)
* Fix some test errors

* checkpoint consolidation

* Update ddp_spawn.py

* Update test_metric_result_integration.py

* Update test_results.py

* Update utils.py

* Update utils.py

* Update test_all_gather_grad.py

* Update test_all_gather_grad.py

* Update test_results.py

* Revert "Update test_results.py"

This reverts commit 9d4a2b891d.

* Revert "Merge pull request #1 from shuyingsunshine21/shuyingsunshine21-checkpoint_consolidate"

This reverts commit c5053da789, reversing
changes made to 0d23d75bc9.

* Revert "Update test_all_gather_grad.py"

This reverts commit 0d23d75bc9.

* Revert "Update utils.py"

This reverts commit 70fe5da9c6.

* Revert "Update utils.py"

This reverts commit a9aae99f6e.

* Revert "Update test_results.py"

This reverts commit ea74906878.

* Revert "Update test_metric_result_integration.py"

This reverts commit bf70e431b3.

* Revert "Update ddp_spawn.py"

This reverts commit f17210183b.

* Revert "checkpoint consolidation"

This reverts commit 536c1323b0.

* Revert "Revert "checkpoint consolidation""

This reverts commit 3a9fde915a.

* Revert "Revert "Revert "checkpoint consolidation"""

This reverts commit 7a369f47e1.

* Revert "Revert "Update ddp_spawn.py""

This reverts commit 8222dc98ea.

* Revert "Revert "Update test_metric_result_integration.py""

This reverts commit 6c095b2370.

* Revert "Revert "Update test_results.py""

This reverts commit 250d0aaaa2.

* Revert "Revert "Update utils.py""

This reverts commit 8651d54d79.

* Revert "Revert "Update test_all_gather_grad.py""

This reverts commit dcdcd29731.

* modify distributed environment to make test pass

* modify model state dict to training type plugin

* remove changes

* add changelog

* fixing isort for pre-commit failure

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Address code review

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: SeanNaren <sean@grid.ai>
2021-05-11 16:39:04 +01:00
ananthsub 6104a6316a
[1/2] Deprecate `outputs` in `on_train_epoch_end` hooks (#7339)
* Remove outputs from on_train_epoch_end

* iterate

* Update callback_hook.py

* update

* Update training_loop.py

* Update test_training_loop.py

* early stop?

* fix

* update tests

* Update test_hooks.py

* Update pytorch_lightning/trainer/callback_hook.py

Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>

* Update pytorch_lightning/trainer/training_loop.py

Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>

* Update trainer.py

* Update pytorch_lightning/trainer/trainer.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-05-05 17:18:16 +02:00
ananthsub 98670c83a9
Deprecate `truncated_bptt_steps` flag on Trainer in favor of same setting on the LightningModule (#7323)
* deprecate-tbptt-trainer

* Update CHANGELOG.md

* Update lightning.py

* test

* Update lightning.py

* Update training_loop.py

* Update training_loop.py

* Update lightning.py

* Update training_loop.py

* Update training_loop.py

* update docs

* Update accelerator.py

* Update accelerator.py

* more docs

* tweaks

* chlog

* comments

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-05-05 11:21:00 +01:00
Carlos Mocholí 8c0ea92af2
`TrainerState` refactor [5/5] (#7173)
* `TrainerState` refactor

* flake8

* Update finished check

* Test cleanup

* Fix tests

* Fixes

* Reorder

* flake8

* Update CHANGELOG

* Better docs

* Better docs

* Remove default

* Update tests

* Bad merge
2021-05-04 12:50:56 +02:00
ananthsub 39274273a4
Update accelerator.py (#7318) 2021-05-03 11:17:26 -04:00
Adrian Wälchli e0c64f0ef6
Fix Adagrad optimizer not working with DDP/GPU (#7277)
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
2021-05-03 03:57:17 +05:30
thomas chaton 16d6c9828d
[bugfix] Apex never instantiated. (#7274)
* update

* update

* update apex

* update

* update

* update

* remove test.py

* update

* update

* update on comments

* update changelog

* update

* update

* typo
2021-04-30 13:16:28 -04:00