Commit Graph

85 Commits

Author · SHA1 · Message · Date
Adrian Wälchli 4dc08e4035
Loop Refactor 6/N - Remove Old Predict Loop (#8094) 2021-06-23 14:05:06 +02:00
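
For orientation: the loop removed here was superseded by the new prediction loop (5/N below), which drives the public `Trainer.predict` entry point. A minimal usage sketch; `TinyModel` and the random dataset are illustrative, not from the PR:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from pytorch_lightning import LightningModule, Trainer

class TinyModel(LightningModule):
    """Illustrative module; any LightningModule with a predict_step works."""

    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def forward(self, x):
        return self.layer(x)

    def predict_step(self, batch, batch_idx, dataloader_idx=None):
        x, = batch  # TensorDataset yields 1-tuples
        return self(x)

dataloader = DataLoader(TensorDataset(torch.randn(64, 32)), batch_size=8)
# Trainer.predict is now served by the new prediction loop;
# this commit deletes the superseded implementation.
predictions = Trainer().predict(TinyModel(), dataloaders=dataloader)
```
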
Adrian Wälchli a45ab00b30
Loop Refactor 5/N - Prediction Loop (#7700)
* integrate d180bb2

* Minor changes

* Refactor loop logic into logger connector

* Refactor test

* Tighter fx validator

* Add back split idx

* Typing

* update

* Conflict

* Fix tests

* resolve grad_norm

* update

* move to train loop

* Bye grad_norm_dict parameter

* Fix sync test

* update

* Fix bug when validation is run mid-epoch

* fix grad_norm_dict test

* Fix fx_validator test

* fix grad_norm_dict test

* Fix order bug

* Detach tensors in test

* resolve some tests

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove pdb

* resolve flake8

* Update test

* more tests

* Revert Thomas' last changes

* resolve 1 test

* [pre-commit.ci] auto fixes from pre-commit.com hooks

* Refactor context restoration

* integrate latest changes from logger connector refactor poc

* [pre-commit.ci] auto fixes from pre-commit.com hooks

* integrate latest changes from logger connector refactor poc

* Minor changes

* update changelog

* Remove unused argument

* Update CHANGELOG

* Copy call_hook changes

* Docs

* Fix ref

* move to cpu

* Bad merge

* [pre-commit.ci] auto fixes from pre-commit.com hooks

* remove pdb

* Refactor to

* Avoid partial

* trigger ci

* Bad merge

* integrate latest logger connector changes

* [pre-commit.ci] auto fixes from pre-commit.com hooks

* remove grad norm dicts list

* Diff

* properties first

* Bad merge

* Reuse metrics_to_scalars

* Use active loop

* Move to device

* resolve test

* integrate latest changes from logger connector poc

* define union

* Update logger connector

* Update result

* Update imports

* Update after rename

* Refactor reduce_fx and op

* Fix test after rename

* mypy

* integrate latest logger connector refactor poc changes

* Fix test

* Refactor test

* Deprecate `self.log(sync_dist_op)` in favor of `self.log(reduce_fx)` (see the sketch after this commit)

* Undo field

* add redundant return

* rename

rename files and classes

* [pre-commit.ci] auto fixes from pre-commit.com hooks

* rename

* Replace code

* Fix names and imports

* Remove metric_attribute

* imports

* loop hygiene

* yapf on loops

* protected new loop trigger

* rename NEW LOOP guard

* integrate latest logger connector changes

* integrate latest logger connector changes (eval loop)

* resolve todo dataloading reset

* re-add notebooks

* add missing init

* bad merge

* remove NEW_LOOP guard

* [pre-commit.ci] auto fixes from pre-commit.com hooks

* flake8

* exclude coverage

* integrate #7917, remove teardown from training loop

* [pre-commit.ci] auto fixes from pre-commit.com hooks

* update "accumulated_batches_reached" condition

 based on if iter count was updated  or not

* remove public loop properties

* make skip backward protected again

* typing base loop

* typing fit loop

* typing training_batch_loop

* typing evaluation loop

* typing prediction loop

* typing training epoch loop

* dataloader_loop

* evaluation_dataloader_loop

* prediction_dataloader_loop

* [pre-commit.ci] auto fixes from pre-commit.com hooks

* integrate train loop changes from master

* integrate eval loop changes from master

* [pre-commit.ci] auto fixes from pre-commit.com hooks

* fix tpipes moving model to cpu and leaving it there.

* don't reset fit loop

* fix test iteration count <-> batch_idx reset

* replace torch.Tensor -> Tensor

* fix attribute error to block_ddp_sync_behaviour

* fix flake8 and yapf conflict

* remove redundant override

* add classes

Co-authored-by: Justus Schock <justus.schock@rwth-aachen.de>
Co-authored-by: Justus Schock <justus.schock@posteo.de>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>

* trainer changes

* connect

* clean up

* [pre-commit.ci] auto fixes from pre-commit.com hooks

* update test renaming

* rename evaluation loop to evaluation epoch loop

* minor docstring improvements

* update chlog

* [pre-commit.ci] auto fixes from pre-commit.com hooks

* try ci fix

* update code owners for pl/loops

* update mock path

* re-order

* simplify dataloader reset

* simplify get_dataloaders()

* save predictions on_run_end()

* improve skip condition re-routing

* re-order

* remove unused type import

* check which assert is failing

* pig

* hobbit

* teardown for evaluation

* Revert "hobbit"

This reverts commit e81b0dbee3.

* Revert "pig"

This reverts commit 33d89e0720.

* Revert "check which assert is failing"

This reverts commit b7483b425c.

* free memory in fit loop teardown

* update docstring

* period

* remove dead code

* else carlos

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Update pytorch_lightning/loops/dataloader/evaluation_dataloader_loop.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* update chlog

* unused import

* move default construction in run_evaluation

* add something for lawyer to read

* switch typehint for eval loop trainer property

* add missing imports

* remove a todo that needs more discussion

* combine _get_num_dataloaders with the property

* Update pytorch_lightning/loops/dataloader/dataloader_loop.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* black + yapf

* avoid coverage on old unused eval loop

* empty space in docstring

Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>

* resolve todo for args forwarding

* weakproxy trainer

* fix check for num dataloaders kwargs

* clean up num prediction dataloaders property

* free memory

* rm notebooks folder

* rm old file

* revert changes to old eval loop

* bad merge

* undo teardown

* setup signature

* remove file for notes

* free memory

* chlog

* Revert "weekproxy trainer"

This reverts commit d4e6969170.

* connect trainer

* [pre-commit.ci] auto fixes from pre-commit.com hooks

* clean up max batches and dataloaders

* max batches handling

* no grad handling

* unused argument

* protected attrs

* unused imports

* undo unintentional rename

* consistent naming

* capitalization in docstring

* list all args

* Update pytorch_lightning/loops/prediction_epoch_loop.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Update pytorch_lightning/loops/prediction_epoch_loop.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Update pytorch_lightning/loops/dataloader/prediction_dataloader_loop.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Update pytorch_lightning/loops/dataloader/prediction_dataloader_loop.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Update pytorch_lightning/loops/prediction_epoch_loop.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Justus Schock <justus.schock@posteo.de>
Co-authored-by: Justus Schock <justus.schock@rwth-aachen.de>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
2021-06-23 10:17:04 +01:00
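
One bullet in the message above deprecates `self.log(sync_dist_op=...)` in favor of `reduce_fx`. A hedged migration sketch inside a user module (`LoggingModule` is illustrative, not from the PR):

```python
import torch
from pytorch_lightning import LightningModule

class LoggingModule(LightningModule):
    """Illustrative module showing only the new logging keyword."""

    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 1)

    def validation_step(self, batch, batch_idx):
        x, = batch
        loss = self.layer(x).mean()
        # Before (deprecated by this PR):
        #   self.log("val_loss", loss, sync_dist=True, sync_dist_op="mean")
        # After: reduce_fx names the reduction applied to the logged value.
        self.log("val_loss", loss, sync_dist=True, reduce_fx="mean")
        return loss
```
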
Adrian Wälchli 9a64e534c7
Loop Refactor 4/N - Remove Old Evaluation Loop (#8056) 2021-06-22 11:57:37 +02:00
Adrian Wälchli 0d6dfd42d8
Merge pull request #7990 from PyTorchLightning/refactor/loops/loops_everywhere_eval
Loop Refactor 3/N - Evaluation Loop
2021-06-18 08:54:59 -04:00
Adrian Wälchli 341adad819
Loop Refactor 2/N - Remove Old Training Loop (#7985)
Co-authored-by: Justus Schock <justus.schock@rwth-aachen.de>
Co-authored-by: Justus Schock <justus.schock@posteo.de>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
2021-06-16 09:00:33 +01:00
Mauricio Villegas 0004216f2f
Easier configurability of callbacks that should always be present in LightningCLI (#7964)
Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-06-16 02:03:37 +02:00
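
The change above concerns callbacks that should always be attached when using `LightningCLI`. A sketch of the intended usage, assuming a placeholder model class (`Model` is not from the PR):

```python
from pytorch_lightning import LightningModule
from pytorch_lightning.callbacks import LearningRateMonitor
from pytorch_lightning.utilities.cli import LightningCLI

class Model(LightningModule):
    """Placeholder for a real user model."""

if __name__ == "__main__":
    # Callbacks passed via trainer_defaults are meant to always be present,
    # even when the user's config file supplies its own callback list.
    LightningCLI(Model, trainer_defaults={"callbacks": [LearningRateMonitor()]})
```
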
Adrian Wälchli 971908a1aa
Loop Refactor 1/N - Training Loop (#7871)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: Justus Schock <justus.schock@rwth-aachen.de>
Co-authored-by: Justus Schock <justus.schock@posteo.de>
Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
2021-06-15 12:55:06 +00:00
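
The series 1/N through 6/N above reorganizes the Trainer internals around one loop abstraction. A hedged sketch of that pattern, assuming the `reset`/`advance`/`done` interface these PRs describe (the shipped class carries more hooks):

```python
from abc import ABC, abstractmethod
from typing import Any

class Loop(ABC):
    """Sketch of the loop pattern: run() drives advance() until done."""

    @property
    @abstractmethod
    def done(self) -> bool:
        """Whether the loop has finished."""

    @abstractmethod
    def reset(self) -> None:
        """Reset internal state so the loop can be reused."""

    @abstractmethod
    def advance(self, *args: Any, **kwargs: Any) -> None:
        """Run one iteration (e.g. one batch, one epoch, one dataloader)."""

    def run(self, *args: Any, **kwargs: Any) -> Any:
        self.reset()
        while not self.done:
            self.advance(*args, **kwargs)
        return self.on_run_end()

    def on_run_end(self) -> Any:
        return None
```

Loops compose: a dataloader-level loop's advance() calls run() on a child epoch-level loop, which is how the evaluation and prediction dataloader loops named in these commits are structured.
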
Carlos Mocholí b45a89a256
Clean-up after logger connector redesign 2/2 (#7631)
Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-06-10 12:09:01 +00:00
Carlos Mocholí dbea5bb710
Add typing to `ModelPruning` callback (#7529)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-05-19 22:01:42 +02:00
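
For reference, the callback typed in this commit is attached like any other; a short sketch (the pruning function and amount are arbitrary choices):

```python
from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import ModelPruning

# Apply L1 unstructured pruning, removing 20% of the remaining weights
# each time the callback fires.
trainer = Trainer(callbacks=[ModelPruning("l1_unstructured", amount=0.2)])
```
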
Jirka Borovec 298f9e5c2d
Prune deprecated utils modules (#7503)
* argparse_utils

* model_utils

* warning_utils

* xla_device_utils

* chlog

* [pre-commit.ci] auto fixes from pre-commit.com hooks

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-05-13 07:24:42 +00:00
Carlos Mocholí f29ecbfd90
Typing for accelerators and plugins (#7022) 2021-04-15 16:48:16 +00:00
Ethan Harris f645df5e9a
Add typings for evaluation_loop.py and remove some dead code (#7015) 2021-04-15 07:36:04 +00:00
Akihiro Nitta ac60536818
Follow E231 [flake8] (#6110)
* Remove E231 from ignore list

* Follow E231

* Update pytorch_lightning/trainer/data_loading.py
2021-03-24 12:50:50 +01:00
Jirka Borovec 64d0fa4472
update coverage config (#6524)
* update coverage config

* parallel

* Apply suggestions from code review

* combine

* rev

* cb

* drop
2021-03-23 23:05:04 +01:00
Jirka Borovec 156847bea7
CI: resume testing with py3.8 (#6516)
* testing on python 3.8

* req
2021-03-15 12:07:23 +01:00
thomas chaton 0544efd453
[bug] Update broadcast + reduce decision [ModelCheckpoint] (#6410)
* resolve bug

* update

* update changelog

* update PR

* Update pytorch_lightning/trainer/connectors/logger_connector/epoch_result_store.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* add todo

* resolve issues

* resolve flake8

* update

* add coverage for reduce

* wip

* restore back to broadcast

* remove test.py

* resolve flake8

* update

* check world size

* resolve test

* update

* use pytorch version when defined

* update on comments

* flake8

* resolve bugs

* Update CHANGELOG.md

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* update

* remove test

* update

* resolve flake8

* update

* proxy

* update

* resolve typo

* prune

* update parallel

* update

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-03-14 17:14:27 +00:00
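
The bug above is about DDP ranks agreeing on a single checkpoint/early-stop decision. A minimal sketch of that kind of reduction using raw `torch.distributed` (this shows the idea, not the PR's internal code):

```python
import torch
import torch.distributed as dist

def reduce_decision(decision: bool, device: torch.device) -> bool:
    """Return the same verdict on every rank: True iff any rank voted True."""
    if not (dist.is_available() and dist.is_initialized()):
        return decision  # single-process runs need no reduction
    votes = torch.tensor(int(decision), device=device)
    dist.all_reduce(votes, op=dist.ReduceOp.SUM)  # total yes-votes across ranks
    return bool(votes.item())
```
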
Carlos Mocholí 8dabc30bfc
Run CI (#6402) 2021-03-08 08:12:33 +01:00
Adrian Wälchli ec8d46e02b
introduce default cluster environment for lightning-specific ddp (#5915)
* handle distributed_sampler_kwargs

* move emptying cache to accelerator

* fix a few tests

* restoring the result from subprocess

* fix queue.get() order for results

* add missing "block_backward_sync" context manager

* add missing "block_backward_sync" context manager

* fix sync_batchnorm

* fix supported gpu-ids for tuple

* fix clip gradients and inf recursion

* accelerator selection: added cluster_environment plugin

* fix torchelastic test

* fix reduce early stopping decision for DDP

* fix tests: callbacks, conversion to lightning optimizer

* fix lightning optimizer does not pickle

* fix setting benchmark and deterministic option

* fix slurm amp test

* fix prepare_data test and determine node_rank

* fix retrieving last path when testing

* remove obsolete plugin argument

* fix test: test_trainer_config

* fix torchscript tests

* fix trainer.model access

* move properties

* fix test_transfer_batch_hook

* fix auto_select_gpus

* fix omegaconf test

* fix test that needs to simulate slurm ddp

* add horovod plugin

* fix test with named arguments

* clean up whitespace

* fix datamodules test

* remove old accelerators

* fix naming

* move old plugins

* move to plugins

* create precision subpackage

* create training_type subpackage

* fix all new import errors

* fix wrong arguments order passed to test

* fix LR finder

* Added sharded training type and amp plugin

* Move clip grad to precision plugin

* Added sharded spawn, select accelerators based on distributed_backend + enable custom fp16 plugin automatically

* Fix import issue, attempting to fix tests

* Fix initial test

* Reflect hook logic from master, should wrap model after move to device

* Optional state consolidation, since master has optimizers not wrapped

* change attribute for instance test

* reset optimizers

optimizers are not used in main process, so state would be wrong.

* legacy

* imports in accel

* legacy2

* trainer imports

* fix import errors after rebase

* move hook to new setup location

* provide unwrapping logic

* fix trainer callback system

* added ddp2 implementation

* fix imports .legacy

* move plugins

* restore legacy

* drop test.py from root

* add tpu accelerator and plugins

* fixes

* fix lightning optimizer merge

* reset bugreportmodel

* unwrapping

* step routing forward

* model access

* unwrap

* opt

* integrate distrib_type

* sync changes

* sync

* fixes

* add forgotten generators

* add missing logic

* update

* import

* missed imports

* import fixes

* isort

* mv f

* changelog

* format

* move helper to parallel plugin

* d

* add world size

* clean up

* duplicate

* activate ddp_sharded and tpu

* set nvidia flags

* remove unused colab var

* use_tpu <-> on_tpu attrs

* make some ddp_cpu and clusterplugin tests pass

* Ref/accelerator connector (#5742)

* final cleanup

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* connector cleanup

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* trainer cleanup

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* accelerator cleanup + missing logic in accelerator connector

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add missing changes to callbacks

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* reflect accelerator changes to lightning module

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* clean cluster envs

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* cleanup plugins

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add broadcasting

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* yapf

* remove plugin connector

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* plugins

* manual optimization

* update optimizer routing

* add rank to torchelastic

* fix memory mixed precision

* setstate on trainer for pickling in ddp spawn

* add predict method

* add back commented accelerator code

* adapt test for sync_batch_norm to new plugin

* fix deprecated tests

* fix ddp cpu choice when no num_processes are given

* yapf format

* skip a memory test that cannot pass anymore

* fix pickle error in spawn plugin

* x

* avoid

* x

* fix cyclic import in docs build

* add support for sharded

* update typing

* add sharded and sharded_spawn to distributed types

* make unwrap model default

* refactor LightningShardedDataParallel similar to LightningDistributedDataParallel

* update sharded spawn to reflect changes

* update sharded to reflect changes

* Merge 1.1.5 changes

* fix merge

* yapf isort

* fix merge

* yapf isort

* fix indentation in test

* copy over reinit scheduler implementation from dev1.2

* fix apex tracking calls with dev_debugger

* reduce diff to dev1.2, clean up

* fix trainer config test when gpus > 0 and num_processes > 0 and ddp_cpu

* sort plugin tests legacy/new

* fix error handling for amp on cpu

* fix merge

* [Feat] Resolve manual_backward (#5837)

* resolve manual_backward

* resolve flake8

* update

* resolve for ddp_spawn

* resolve flake8

Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>

* fix tests/accelerator tests on cpu

* [BugFix] Resolve manual optimization (#5852)

* resolve manual_optimization

* update

Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>

* Remove copy trainer parameters to happen earlier within the loop and add safe guard to get ref model (#5856)

* resolve a bug

* Accelerator refactor sharded rpc (#5854)

* rpc branch

* merge

* update handling of rpc

* make devices etc. Optional in RPC

* set devices etc. later if necessary

* remove devices from sequential

* make devices optional in rpc

* fix import

* uncomment everything

* fix cluster selection

Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>

* resolve bug

* fix assert in rpc test

* resolve a test

* fix docs compilation

* accelerator refactor - fix for sharded parity test (#5866)

* fix memory issue with ddp_spawn

* x

* Remove DDP2 as this does not apply

* Add missing pre optimizer hook to ensure lambda closure is called

* fix apex docstring

* [accelerator][BugFix] Resolve some test for 1 gpu (#5863)

* update

* revert init

* resolve a bug

* resolve flake8

* all_gather

* update

* make plugins work, add misconfig for RPC

* update

* remove breaking test

* resolve some tests

* resolve flake8

* revert to ddp_spawn

Co-authored-by: root <root@ip-172-31-88-60.ec2.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>
Co-authored-by: Justus Schock <justus.schock@rwth-aachen.de>

* yapf isort

* resolve flake8

* fix apex doctests

* fix apex doctests 2

* resolve docs

* update drone

* clean env

* update

* merge

* Fix RPC related tests, clean out old API, update for new accelerator API [skip ci] (#5881)

* Fix RPC related tests, clean out old API, update for new accelerator API

* Move tests out of legacy folder, update paths and names

* Update test_remove_1-4.py

* Expose properties for tpu cores/gpus/num_gpus

* Add root GPU property

* Move properties to properties.py

* move tests that were previously in drone

* Fix root GPU property (#5908)

* Move root GPU to property, remove horovod set as this is handled in horovod plugin, ensure we mock correctly to set GPU accelerator

* Add missing tests back

* fix best model path transfer when no checkpoint callback available

* Fix setup hook order [wip] (#5858)

* Call trainer setup hook before accelerator setup

* Add test case

* add new test

* typo

* fix callback order in test

Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* rename ddp sequential -> rpc sequential for special test

* revert

* fix stupid merge problem

* abstract the cluster plugins

* default plugin

* integrate default environment

* fix property

* adapt tests

* adjust test

* fix world size access

* base cluster env

* revert rebase errors

* missing import

* revert unrelated change

* remove unused cluster local rank

* remove unrelated changes

* fix unrelated changes

* fix pep8

* remove unused var

* reset permissions

* yapf

* test default environment

* test torchelastic environment

* world size as int

* tests for slurm environment

* changelog

* test comments

* remove unintended change

* keep master port fixed after it is generated

* test random master port

* yapf

* add missing default environment

* move helper function

* rename default environment

* rename

* yapf

* Update pytorch_lightning/plugins/environments/lightning_environment.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Update CHANGELOG.md

Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>

* spawn -> create

Co-authored-by: justusschock <justus.schock@posteo.de>
Co-authored-by: SeanNaren <sean@grid.ai>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
Co-authored-by: Justus Schock <justus.schock@rwth-aachen.de>
Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: root <root@ip-172-31-88-60.ec2.internal>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-03-05 01:47:29 +00:00
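
The headline change introduces a default cluster environment for Lightning-managed DDP. A hedged sketch of the plugin interface as it stood around this release (method names follow the `ClusterEnvironment` base class; the subclass and its return values are illustrative):

```python
from pytorch_lightning.plugins.environments import ClusterEnvironment

class LocalEnvironment(ClusterEnvironment):
    """Illustrative environment where Lightning launches the workers itself."""

    def creates_children(self) -> bool:
        return False  # Lightning spawns the worker processes

    def master_address(self) -> str:
        return "127.0.0.1"

    def master_port(self) -> int:
        return 12910  # fixed here; the default environment picks a random free port

    def world_size(self) -> int:
        return 1

    def local_rank(self) -> int:
        return 0

    def node_rank(self) -> int:
        return 0
```

An environment like this is handed to the `Trainer` through its `plugins` argument; the `LightningEnvironment` added by this PR fills the same interface for runs without an external cluster manager.
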
Jirka Borovec e038e747a0
hotfix for PT1.6 and torchtext (#6323)
* ci: azure reinstall torchtext

* move

* todos

* 0.6.0

* skip examples

* formatter

* skip

* todo

* Apply suggestions from code review
2021-03-04 17:48:17 +01:00
Justus Schock 0647340f3b
Add mypy typing to precision plugins. (#6149)
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
2021-02-26 14:27:16 +01:00
Justus Schock 3ed8ef8af9
type accelerators (#6148) 2021-02-25 06:42:23 +00:00
Adrian Wälchli fc9bb53e13
fix flake8 for new plugins (#5951)
* flake8

* fix cyclic import

* isort
2021-02-18 18:28:23 +00:00
Jirka Borovec 1c87f1f6cd
remove legacy plugins (#5950)
* remove legacy plugins

* imports

* formatting

* fix docs references

* fix cluster environment inheritance

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-02-16 19:20:58 +00:00
Jirka Borovec f83cca6107
formatting flake8 & isort (#5824)
* formatting

* isort

* make

* yapf

* isort
2021-02-05 18:33:12 -05:00
Carlos Mocholí aa03b73e60
Remove psf/black references (#5739)
* Update pyproject.toml

* Update setup.cfg

* Update test.txt

* Update CONTRIBUTING.md

* Update requirements/test.txt
2021-02-03 08:37:06 +00:00
Justus Schock b3ebc18bcb
Hardware specific parts of Accelerator Refactoring (#5719)
* add basic accelerator class.
Co-Authored with @awaelchi

* pep8

Co-authored-by: @awaelchi

* add cpu accelerator

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add gpu accelerator

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add tpu accelerator

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add accelerator connector

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add single device training

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add single tpu

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add tpu spawn

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* make on_colab_kaggle utility func

* fixes

* move

* yapf

* .

* flake8

* sync accelerator connector changes from dev1.2

* changelog

* fix tpu handling

* tpu

* aval

* yapf

* Update pytorch_lightning/plugins/training_type/tpu_spawn.py

Co-authored-by: chaton <thomas@grid.ai>

* Update pytorch_lightning/accelerators/accelerator_connector.py

Co-authored-by: chaton <thomas@grid.ai>

* Update pytorch_lightning/plugins/training_type/tpu_spawn.py

Co-authored-by: chaton <thomas@grid.ai>

* Update tpu_spawn.py

* Update pytorch_lightning/accelerators/accelerator_connector.py

Co-authored-by: chaton <thomas@grid.ai>

* indentation

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
Co-authored-by: chaton <thomas@grid.ai>
2021-02-01 08:34:59 -05:00
Jirka Borovec 963c17b669
formatting 6/n: metrics (#5722)
* yapf metrics

* op
2021-02-01 09:24:07 +01:00
Justus Schock 069ae27cef
Accelerator Refactor: Precision Plugins (#5718)
* add basic accelerator class.
Co-Authored with @awaelchi

* add basic training type plugin.
Co-Authored with @awaelchi

* pep8

Co-authored-by: @awaelchi

* update copyright

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add apex_amp

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add mixed base class

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add native amp

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add native amp sharded

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add tpu bfloat

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add inits

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Update precision_plugin.py

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-01-31 13:12:02 -05:00
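
From the user's side these plugins stay behind the existing flags; `precision` and `amp_backend` select which one is built. A sketch:

```python
from pytorch_lightning import Trainer

# precision=16 with amp_backend="native" routes to the native AMP plugin;
# amp_backend="apex" would select the apex-based plugin instead, and
# precision=32 skips mixed precision entirely.
trainer = Trainer(gpus=1, precision=16, amp_backend="native")
```
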
Adrian Wälchli 692f77b8a7
Refactor LightningDataParallel (#5670)
* module

* fix model access

* scalar conversion

* refactor

* kwargs

* auto unsqueeze

* refactor code duplication

* clean up

* docs

* update dp docs

* changelog

* generalize test

* test

* rename

* warning cache

* isort

* unsqueezing test

* device

* scalar test

* device

* include coverage of overrides

* clear

* add deprecation test

* docs

* improve coverage

* increase coverage

* fix merge

* extend test

* rename base class

* mention the predict method in docs

* combine iteration over collection

* remove override

* move

* line

* Apply suggestions from code review

* fix running stage

* f401

* fix cyclic import

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-01-31 06:08:16 -05:00
Justus Schock 5d239ccd70
Base classes for accelerator refactoring (#5715)
* add basic accelerator class.
Co-Authored with @awaelchi

* Add base plugin class.
Co-authored with @awaelchi

* add basic training type plugin.
Co-Authored with @awaelchi

* add basic precision plugin.
Co-Authored with @awaelchi

* Add missing inits.
Co-authored with @awaelchi

* pep8

Co-authored-by: @awaelchi

* ignore  flake8

* coverage omit

* imports in init

* lost

* imports

* flake8

* .

* chlog

* Update pytorch_lightning/plugins/training_type/training_type_plugin.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-01-30 14:55:28 -05:00
Jirka Borovec 99ea2a3b35
define Yapf config (#5591)
* define YAPF

* add check

* add check

* add temp ignore

* apply yapf

* ex
2021-01-27 21:58:33 -05:00
Jirka Borovec 7e2e874d95
Refactor: legacy accelerators and plugins (#5645)
* tests: legacy

* legacy: accel

* legacy: plug

* fix imports

* mypy

* flake8
2021-01-26 20:04:36 -05:00
Jirka Borovec cb58fdeb3d
fix: freeze mypy (#5634)
* update mypy for tests

* freeze
2021-01-24 20:09:08 -05:00
Alan Du 1c8ad3a94b Tighten up mypy config (#5237) 2021-01-05 09:58:37 +01:00
Jirka Borovec 0f36525e8f
fix/enable - check F401 (#5201)
* refactor - check F401

* missed

* fix
2020-12-21 10:15:04 +01:00
Jirka Borovec 35fd6e93c7
refactor - check E501 (#5200) 2020-12-21 14:23:09 +05:30
Jirka Borovec 6d2c564bc6
refactor - check F841 (#5202) 2020-12-21 11:10:55 +05:30
Jirka Borovec 2c11d96012
replace pyright by mypy (#5021)
* drop pyright & add mypy

* detail

* name

* fix

* flake8

* ver

Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
2020-12-09 10:57:11 +08:00
chaton c2e6e68c7e
optimizer clean up (#4658)
* add LightningOptimizer

* typo

* add mock closure

* typo

* remove logic in optimizer_step

* update

* deactivate LightningOptimizer for horovod

* resolve flake

* typo

* check optimizer name

* change name

* added backward to LightningOptimizer

* remove use_lightning_optimizer

* move update

* simplify init

* resolve comments

* resolve bug

* update

* resolve bugs

* resolve flake8

* set state

* work manual_optimizer_step

* add doc

* add enable_pl_optimizer

* make optimizer_step

* add make_optimizer_step

* add examples

* resolve test

* add test_optimizer_return_options_enable_pl_optimizer

* add enable_pl_optimizer=True

* update

* update tests

* resolve bugs

* update

* set Trainer to False

* update

* resolve bugs

* update

* remove from doc

* resolve bug

* typo

* update

* set to True

* simplification

* typo

* resolve horovod

* unwrap horovod

* remove Optimizer

* resolve horovod

* move logic to amp_backend

* doesn't seem to be picklable

* update

* add again

* resolve some bugs

* cleanup

* resolve bug with AMP

* change __repr__

* round at -12

* update

* remove from horovod

* typo

* add convert_to_lightning_optimizers in each accelerators

* typo

* forgot

* forgot a convert_to_lightning_optimizers

* update

* increase coverage

* update

* resolve flake8

* update

* remove useless code

* resolve comments + add support for LightningOptimizer base class

* resolve flake

* check optimizer get wrapped back

* resolve DDPSharded

* reduce code

* lightningoptimizer

* Update pytorch_lightning/core/optimizer.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Update pytorch_lightning/core/lightning.py

* remove reference to step function

* Apply suggestions from code review

* update on comments

* resolve

* Update CHANGELOG.md

* add back training_step in apex and native_amp

* rename optimizer_step

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: William Falcon <waf2107@columbia.edu>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
2020-12-01 00:09:46 +00:00
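
The `LightningOptimizer` introduced here wraps whatever `configure_optimizers` returns so that `optimizer.step(closure)` runs through Lightning's accelerator and AMP logic. A hedged sketch of the era's opt-in flag and the wrapped object a module sees (`enable_pl_optimizer` later became the default behavior and the flag was removed):

```python
import torch
from pytorch_lightning import LightningModule, Trainer

class Model(LightningModule):
    """Illustrative module; only the optimizer-related hooks are shown."""

    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 1)

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)

    def optimizer_step(self, epoch, batch_idx, optimizer, optimizer_idx,
                       optimizer_closure, **kwargs):
        # With enable_pl_optimizer=True, `optimizer` arrives wrapped as a
        # LightningOptimizer; step() executes the closure via the accelerator.
        optimizer.step(closure=optimizer_closure)

trainer = Trainer(enable_pl_optimizer=True)  # opt-in flag at the time of this commit
```
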
Jirka Borovec bddc6cd77a
pytest default color (#4703)
* pytest default color

* time

Co-authored-by: chaton <thomas@grid.ai>
2020-11-18 10:53:44 +00:00
Nathan Painchaud 2d78d9b84a
CI: Added isort import check for the code on pull-request (#4242)
* added isort CI job and updated isort config

* changed CI check output from files to full diff

* added isort pre-commit hook

* Added missing first party and restricted files affected by isort

* Applied isort to root-level, docs and benchmarks

* Apply suggestions from code review

Co-authored-by: Nathan Painchaud <nathanpainchaud@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: chaton <thomas@grid.ai>
2020-11-13 22:57:46 +01:00
Nicki Skafte edcb6e49b9
Speedup of metric tests (#4122)
* speedup

* something working

* update the rest

* more desc

* recurse tests/metrics again

* pep8

Co-authored-by: Teddy Koker <teddy.koker@gmail.com>
2020-10-14 13:51:58 -04:00
Jirka Borovec 2c3b512c50
reverted "temporary drop metrics tests while speeding them up" and SKIP (#4115)
* Revert "temporary drop metrics tests while speeding them up (#4071)"

This reverts commit 86c70622fb.

* skip metrics tests

* skipping
2020-10-14 19:01:43 +02:00
William Falcon 5b645d713e
Covv1 (#4072)
* temporary drop metrics tests while speeding them up

* cov

* docs
2020-10-11 10:21:53 -04:00
William Falcon 2b255a3df4
ref: enable custom clusters (1/n) (#4048)
* enable cluster plugins

* enable cluster plugins + test backend choices
2020-10-10 08:09:29 -04:00
Jirka Borovec c77073f040
skip files in coverage (#3944) 2020-10-07 12:37:01 -04:00
William Falcon f43028f3ae
added copyright notices (#3062) 2020-08-19 22:03:22 -04:00
Simon-Martin Schröder a5736d244f
Configure isort (#2136)
* Configure isort

* Fix whitespace

* Line length, make THIRDPARTY the default
2020-06-16 14:36:19 -04:00
Jirka Borovec 2674976f2c
remove deprecated API for v0.8 (#2073)
* remove deprecated API

* chlog

* times

* missed

* formatting check

* missing

* miss

* fix docs build error

* fix pep whitespace error

* docs

* wip

* amp_level

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2020-06-12 14:37:52 -04:00
kumuji fd7814d287
Added black formatter for the code with code-checker on pull (#1610)
* black

Added through black.toml; other options are hard so far

No caching for black github action

Moved from black.toml to pyproject.toml

Exclude not only yml but also yaml

Update pyproject.toml

Co-authored-by: Thomas Johansen <thomasjo@gmail.com>

Update .github/workflows/code-formatting-check.yml

mergify

Remove formatting check

E231 error ignored because of black formatting

Updated CONTRIBUTING to the master

* Update .github/workflows/code-formatting-check.yml

* Bump black to 19.10b0 version

* resolved incorrect merge of CONTRIBUTING,

Black skipping string normalization

* Minor fixes in CONTRIBUTING, two typos

* Update setup.cfg

* chlog

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Jirka <jirka@pytorchlightning.ai>
2020-06-03 18:23:14 +02:00