295 Commits

Author SHA1 Message Date
ananthsub aad86423f7
Remove more deprecated methods from base `Accelerator` class (#10448) 2021-11-10 12:58:24 +05:30
puhuk f9b9cdb0d1
Remove deprecated accelerator pass through functions in Accelerator (#10403)
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-11-08 17:36:37 +00:00
Adrian Wälchli a270a79ed9
Rename "master" methods to "main" in ClusterEnvironment plugins (#10103)
* rename occurrences of master port, master address, master node, master process

* rename properties

* add property decorators

* occurrences in docs

* update changelog

* update changelog

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add lost method

* create deprecation

* add changelog

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix typo (but it was already there!!!)

* Apply suggestions from code review

Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>

* add todo

* update more occurrences

* add types

* add missing import

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
2021-11-08 12:32:58 +00:00
Carlos Mocholí 9237106451
Clip before step (#10248) 2021-10-30 11:27:49 +01:00
Kaushik B cedaebfcbb
Add `auto_device_count` method to `Accelerators` (#10222)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-10-29 22:31:32 +02:00
Carlos Mocholí 81d15c5986
Implement double optimizer closure for hook structure consistency (#10167) 2021-10-29 13:03:04 +00:00
Carlos Mocholí 03f01fb5ec
Fix gradient norm tracking and gradient clipping (#9287)
* WIP

* Progress

* Undo test change

* Fix plugin closure execution order

* Update CHANGELOG

* Fix manual optimization on AMP and skipping backward

* Fix for deepspeed

* Typo

* Hook test for manual closure

* Add skipping test with AMP

* You are hideous, apex

* Add deepspeed test

* Update CHANGELOG

* Fix for broken master

* Add RunIf

* FIXMEs

* Rename

* Fix grad norm

* add a simple test

* update test

* update test

* update test

* fix merge conflicts

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Sea of changes

* Undo change

* Introduce TPUPrecisionPlugin

* Undo changes

* Undo changes

* Resolve FIXME

* Undo change

* Undo change

* Undo change

* Fix FIXMEs

* Fix FIXME

* Correct value

* Bad merge

* Fix circular imports

* WIP

* Fixing clipping

* Fixes

* Bad merge

* Move optimizer step and clipping into the `PrecisionPlugin`

* Fix AMP

* Update CHANGELOG

* Fix tests

* Underscore

* Progress

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove pre_optimizer_step

* Missed one

* Progress

* Progress

* Fix test

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update FIXMEs

* Fix test

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix test

* DeepSpeed warning. mypy

* Rename

* Finish tests

* Update CHANGELOG

* Dumb fixes

* accelerator=auto

* Apply suggestions from code review

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* Update on comments

* Use ClassifModule

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-10-28 15:23:27 +00:00
Carlos Mocholí 48b6292cf0
Move optimizer step and clipping into the `PrecisionPlugin` (#10143) 2021-10-26 17:26:26 +02:00
Rohit Gupta 93266e2c22
Avoid deprecated warnings from accelerator and checkpoint connector (#10142) 2021-10-26 14:10:30 +02:00
Carlos Mocholí b376799430
Minor fixes related to clipping (#10130)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-10-25 16:40:22 +00:00
Adrian Wälchli d41902883a
Update `optimizer_step` methods in accelerator and plugins (#10023)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-10-20 21:36:27 +01:00
Carlos Mocholí ef5a12212a
Isolate optimizer step logic to the `PrecisionPlugin` (#10029) 2021-10-20 15:43:08 +00:00
Carlos Mocholí e8beceb631
Add `TPUPrecisionPlugin` (#10020)
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-10-19 17:48:57 +00:00
Carlos Mocholí e5dfdf34f9
Avoid deprecation warning after #9901 (#9951) 2021-10-16 17:36:25 +01:00
four4fish a002f872ea
[2/n] Directly call TrainingTypePlugin APIs instead of going through the Accelerator (#9901)
Co-authored-by: tchaton <thomas@grid.ai>
2021-10-14 17:38:22 +02:00
Danielle Pintz 940b910d27
[2/4] Add DeviceStatsMonitor callback (#9712)
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Kaushik B <kaushikbokka@gmail.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-10-13 18:29:36 +00:00
Rohit Gupta 4decbc0d95
Deprecate `dataloader_idx` from `on_train_batch_start/end` (#9816)
* deprecate hooks

* dep todo

* explicit

* Apply suggestions from code review

* Apply suggestions from code review

* code review

* base
2021-10-07 10:18:11 +00:00
Carlos Mocholí 0ddd6a8c19
Remove `_NATIVE_AMP_AVAILABLE` checks (#9747) 2021-09-29 15:34:26 +02:00
Carlos Mocholí 9ebfbbc349
Remove unused `post_optimizer_step` (#9746) 2021-09-29 13:09:22 +00:00
four4fish 15cd6ad45b
Call TrainingTypePlugin collective functions directly instead of going through the Accelerator (#9677)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-09-27 14:52:57 +02:00
Danielle Pintz ab069876cb
[1/4] Add get_device_stats to accelerator interface (#9586) 2021-09-26 21:09:16 -07:00
ananthsub 41e3be197f
Remove `call_configure_sharded_model` lifecycle property (#9612) 2021-09-24 03:57:53 +02:00
Aki Nitta f5608e90d6
Document exceptions in accelerators (#9558)
* Document exceptions in ipu.py

* Document exceptions in tpu.py

* Document exceptions in gpu.py
2021-09-18 15:14:08 +09:00
Carlos Mocholí b1ed1db089
Keep global step update in the loop (#8856) 2021-09-14 19:21:39 +05:30
Kaushik B b294c5760e
Fix type hint for filepath (#9434) 2021-09-10 21:38:54 +00:00
Danielle Pintz cc2ac02dd1
Move add_to_queue/get_from_queue to DDPSpawnPlugin (#9118)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-09-10 20:58:02 +00:00
Carlos Mocholí 3070a9ea6e
Fix hiddens type annotation (#9377) 2021-09-09 08:45:52 +01:00
Jirka Borovec 6e124e7207
CI: precommit - docformatter (#8584)
* CI: precommit - docformatter
* fix deprecated

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-09-06 12:49:09 +00:00
four4fish f01a9a6cd2
Remove `BasePlugin` (#9066)
* Remove BasePlugin

Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-08-25 19:10:28 +00:00
Sean Naren bac8b1be81
Add support for CPU AMP autocast (#9084) 2021-08-25 12:18:00 +00:00
four4fish c912ebf889
Remove TrainingTypePlugin.on_save and Accelerator.on_save (#9023)
* Remove TrainingTypePlugin.on_save and Accelerator.on_save
2021-08-23 10:11:00 -07:00
ananthsub 8a931732ae
Remove unused `on_train_epoch_end` hook in accelerator (#9035) 2021-08-23 00:20:10 +05:30
four4fish 13e64e6a80
Remove deprecated functions from accelerator.py (#9019) 2021-08-22 00:25:42 +02:00
Carlos Mocholí d0efb55b0f
Delete `TrainingEpochLoop._dataloader_idx` which always equals 0 (#8911) 2021-08-16 13:34:42 +02:00
Carlos Mocholí 93ab24d1ee
Replace DataLoader sampler once for IPUs (#8858) 2021-08-16 11:28:05 +02:00
Carlos Mocholí ed13040729
Connect the model to the training type plugin at the start of run (#8536) 2021-08-04 17:43:34 +02:00
Caleb Robinson 9ca02f58ae
Fix an import deprecation warning (#8687)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-08-03 22:17:28 +00:00
Jirka Borovec f67892ea96
CI: yesqa (#8564)
* add yesqa
* fix flake8

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-08-02 16:05:56 +00:00
Sean Naren 07b7dc9c17
[Fix] Add delay property for checkpointing, refactor loading checkpoint (DeepSpeed Checkpointing Fix 1/n) (#8627)
* Add property to delay checkpointing, move loading checkpoint file into the run function to allow deepspeed engine to be loaded

* Add a small test

* Apply suggestions from code review

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Update pytorch_lightning/accelerators/accelerator.py

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Address review

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-07-30 11:31:08 +01:00
Santiago Castro b256d6acd3
Avoid unnecessary list creation (#8595) 2021-07-28 13:36:45 +05:30
Carlos Mocholí a64cc37394
Replace `yapf` with `black` (#7783)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-07-26 13:37:35 +02:00
thomas chaton c9af1a7aec
[bugfix] Reduce memory leaks (#8490)
* reduce memory leak

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update changelog

* Apply suggestions from code review

Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>

* resolve flake8

* update on comments

* resolve bug

* update

* Undo whitespace changes

* remove bug

* resolve flake8

* revert change

* update on comments

* delete the ddp wrapper as it holds memory

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* resolve flake8

* update on comments

* update changelog

* resolve test

* Update CHANGELOG

* Refactor teardown

* Fix comment

* Do it for non-gpu too

* remove ref when the model is not a lightning_module

* Fix import error

* move down

* resolve bug

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* resolve assignment

* update

* move above

* Fix device calls to support tpu training

* Update todo

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: Kaushik B <kaushikbokka@gmail.com>
2021-07-21 11:37:05 +02:00
Carlos Mocholí 6ce77a102b
Set minimum PyTorch version to 1.6 (#8288)
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
2021-07-13 17:12:49 +00:00
Carlos Mocholí c5a120ed9d
Update to Mypy>0.9 (#8386) 2021-07-13 08:23:36 +02:00
Carlos Mocholí eb6d991218
Refactor plugins backward (#8328) 2021-07-08 16:02:09 +02:00
Adrian Wälchli d73c32ab51
move `torch.cuda.set_device()` to enable collective calls earlier in setup (#8312) 2021-07-07 13:15:41 +02:00
Adrian Wälchli ea5cfd2005
move batch to device before sending it to hooks (#7378)
* update train step

* test

* x

* limits

* val

* typeo

* x

* x

* step

* min gpus

* run all loops

* x

* limit test

* profiler

* clean up accelerator code

* move files

* rename

* move tests

* changelog

* reorder callbacks and model hooks

* add test description

* replace unnecessary method

* fix chlog

* adjust batch_to_device for DP Plugin

* update tests for dataloader idx

* unused imports

* hook change

* switch None

* clear memory

* change to None

* None

* None

* memory savings

* remove redundant todo

* hack

* cheat

* Revert "cheat"

This reverts commit a8433bd0b4.

* Revert "hack"

This reverts commit 43a6d1edeb.

* update new epoch loop

* remove from old loop code

* update chlog

* update hook test

* changelog

* teardown

* integrate changes in new eval loop

* fix hook calls

* add prediction step

* bad merge

* Revert "bad merge"

This reverts commit 488080863c.

* fix train batch hook test

* rm -rf _notebooks

* update chlog

* release memory

* fix type

* notebooks mess

* debug

* Revert "debug"

This reverts commit eec4ee2f77.

* teardown

* fix teardown bug

* debug

* x

* debug

* Revert "debug"

This reverts commit a6e6101946.

* Revert "debug"

This reverts commit 5ddeaec069.

* debug

* debug

* Revert "debug"

This reverts commit 605be746f7daedf265b2c05a1c153ce543394435.

* Revert "Revert "debug""

This reverts commit a7612d5410409ed886cfb609457349ecf44cbfa8.

* debug

* x

* x

* x

* s

* tol

* x

* tol

* Fix changelog

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-07-05 09:31:39 +01:00
Carlos Mocholí 74eb6cc7e9
Clean `cuda.empty_cache` usage (#8199) 2021-06-30 13:04:24 +02:00
deepsource-autofix[bot] 03154eb30a
Refactor unnecessary `else` / `elif` when `if` block has a `return` statement (#8156)
Co-authored-by: deepsource-autofix[bot] <62050782+deepsource-autofix[bot]@users.noreply.github.com>
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
2021-06-28 15:27:41 +05:30
Carlos Mocholí 4d9b72b8a9
Nuke RPC (#8101) 2021-06-23 18:31:13 +00:00