Commit Graph

106 Commits

Author SHA1 Message Date
Adrian Wälchli 8af85eeaaf
Ignore _notebooks when running flake8 (#13990) 2022-08-03 06:22:47 -04:00
Jirka Borovec 558658abb0
setup: set default metadata (#13571) 2022-07-11 17:47:06 -04:00
Mansy 767fe6eefa
Add CI for app examples (#13495)
* Add CI for app examples

Co-authored-by: manskx <mansy@lightning.ai>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
2022-07-02 05:05:16 +00:00
Jirka Borovec bb51e2a55b
Merge pull request #12723 from PyTorchLightning/req/strategies
Separate strategies' requirements
2022-05-04 10:06:02 -04:00
puhuk 663216fe3e
Remove pytorch_lightning.core.memory.get_gpu_memory_map (#12644)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2022-04-12 15:55:56 +00:00
Carlos Mocholí 5f920dc088
Refactor Horovod NCCL check (#11948) 2022-02-28 10:45:32 +00:00
ananthsub 8d23f6287a
Update module path for `LightningDeprecationWarning` in setup.cfg (#11793) 2022-02-10 08:59:32 +05:30
Aki Nitta 3c3ba39a06
Tests: Fail on FutureWarning (#11541) 2022-01-20 12:52:34 +00:00
Carlos Mocholí 5d9df39b1c
Remove legacy configurations (#10829) 2021-11-30 13:35:55 +00:00
Carlos Mocholí 3d2d0f2536
MANIFEST.in and setup.py clean-up (#7614) 2021-11-19 15:38:42 +01:00
Carlos Mocholí 0fa07da987
Fail the test when a `DeprecationWarning` is raised (#9940) 2021-11-17 23:41:50 +01:00
Aki Nitta 7ad0ac5509
Exclude docs/ from default pytest dirs (#10078)
Co-authored-by: tchaton <thomas@grid.ai>
2021-11-01 16:30:23 +00:00
Aki Nitta 8f14e77d76
Make pytest not run .github/* (#10012) 2021-10-19 09:57:29 +02:00
Adrian Wälchli b054d21493
disable warnings summary for pytest (#9743) 2021-09-30 10:51:56 +02:00
Carlos Mocholí 32003159f0
Remove legacy pytest markers (#9761) 2021-09-29 17:08:26 +00:00
four4fish ddf6967421
Deprecate LightningDistributed and keep logic in ddp/ddpSpawn directly (#9691)
* Deprecate LightningDistributed and keep logic in ddp/ddpSpawn directly

* Deprecate LightningDistributed and keep logic in ddp/ddpSpawn directly

* Deprecate LightningDistributed and keep logic in ddp/ddpSpawn directly

* Deprecate LightningDistributed and keep logic in ddp/ddpSpawn directly

* Apply suggestions from code review

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Apply suggestions from code review

Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>

* Deprecate LightningDistributed and keep logic in ddp/ddpSpawn directly

* Update pytorch_lightning/distributed/dist.py

Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>

* Deprecate LightningDistributed and keep logic in ddp/ddpSpawn directly

* Apply suggestions from code review

Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>

* Apply suggestions from code review

* Apply suggestions from code review

* Deprecate LightningDistributed and keep logic in ddp/ddpSpawn directly

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
2021-09-25 08:39:15 -07:00
Carlos Mocholí a64cc37394
Replace `yapf` with `black` (#7783)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-07-26 13:37:35 +02:00
Carlos Mocholí c5a120ed9d
Update to Mypy>0.9 (#8386) 2021-07-13 08:23:36 +02:00
Daniel Stancl 91d98c8345
Fix mypy in utilities.device_dtype_mixin (#8127) 2021-07-12 18:56:06 +02:00
Daniel Stancl 667def8d89
Fix mypy in `utilities.parsing` (#8132) 2021-07-07 23:32:12 +00:00
Daniel Stancl 34efadd5b8
Fix mypy in `utilities.device_parser` (#8136)
* Fix mypy for utilities.device_parser

* Fix remaining mypy issues + disable ignoring mypy errors

* Return one Optional type annotation back

* Fix annotation for the parse_tpu_cores method

* Remove unused import

* Include carmocca's suggestion and fix mypy issue

* include carmocca's suggestion

* add `else` statement to `parse_gpu_ids` to inform mypy `gpus` is a type of `List[int]`

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-07-06 10:39:57 +02:00
Adrian Wälchli 4dc08e4035
Loop Refactor 6/N - Remove Old Predict Loop (#8094) 2021-06-23 14:05:06 +02:00
Adrian Wälchli a45ab00b30
Loop Refactor 5/N - Prediction Loop (#7700)
* integrate d180bb2

* Minor changes

* Refactor loop logic into logger connector

* Refactor test

* Tighter fx validator

* Add back split idx

* Typing

* update

* Conflict

* Fix tests

* resolve grad_norm

* update

* move to train loop

* Bye grad_norm_dict parameter

* Fix sync test

* update

* Fix bug when validation is run mid epoch

* fix grad_norm_dict test

* Fix fx_validator test

* fix grad_norm_dict test

* Fix order bug

* Detach tensors in test

* resolve some tests

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove pdb

* resolve flake8

* Update test

* more tests

* Revert last thomas' changes

* resolve 1 test

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Refactor context restoration

* integrate latest changes from logger connector refactor poc

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* integrate latest changes from logger connector refactor poc

* Minor changes

* update changelog

* Remove unused argument

* Update CHANGELOG

* Copy call_hook changes

* Docs

* Fix ref

* move to cpu

* Bad merge

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove pdb

* remove pdb

* Refactor to

* Avoid partial

* trigger ci

* Bad merge

* integrate latest logger connector changes

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove grad norm dicts list

* Diff

* properties first

* Bad merge

* Reuse metrics_to_scalars

* Use active loop

* Move to device

* resolve test

* integrate latest changes from logger connector poc

* define union

* define union

* Update logger connector

* Update result

* Update imports

* Update after rename

* Refactor reduce_fx and op

* Fix test after rename

* mypy

* integrate latest logger connector refactor poc changes

* Fix test

* Refactor test

* Deprecate `self.log(sync_dist_op)` in favor of `self.log(reduce_fx)`

* Undo field

* add redundant return

* rename

rename files and classes

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* rename

* Replace code

* Fix names and imports

* Remove metric_attribute

* imports

* loop hygiene

* yapf on loops

* protected new loop trigger

* rename NEW LOOP guard

* integrate latest logger connector changes

* integrate latest logger connector changes (eval loop)

* resolve todo dataloading reset

* re-add notebooks

* add missing init

* bad merge

* remove NEW_LOOP guard

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* flake8

* exclude coverage


coverage

* integrate #7917, remove teardown from training loop

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update "accumulated_batches_reached" condition

 based on if iter count was updated  or not

* remove public loop properties

* make skip backward protected again

* typing base loop

* typing fit loop

* typing training_batch_loop

* typing evaluation loop

* typing prediction loop

* typing training epoch loop

* dataloader_loop

* evaluation_dataloader_loop

* prediction_dataloader_loop

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* integrate train loop changes from master

* integrate eval loop changes from master

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix tpipes moving model to cpu and leaving it there.

* don't reset fit loop


don't reset fit loop

* fix test iteration count <-> batch_idx reset

* replace torch.Tensor -> Tensor

* fix attribute error to block_ddp_sync_behaviour

* fix flake8 and yapf conflict

* remove redundant override

* add classes

Co-authored-by: Justus Schock <justus.schock@rwth-aachen.de>
Co-authored-by: Justus Schock <justus.schock@posteo.de>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>

* trainer changes

* connect

* clean up

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update test renaming

* rename evaluation loop to evaluation epoch loop

* minor docstring improvements

* update chlog

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* try ci fix

* update code owners for pl/loops

* update mock path

* re-order

* simplify dataloader reset

* simplify get_dataloaders()

* save predictions on_run_end()

* improve skip condition re-routing

* re-order

* remove unused type import

* check which assert is failing

* pig

* hobbit

* teardown for evaluation

* Revert "hobbit"

This reverts commit e81b0dbee3.

* Revert "pig"

This reverts commit 33d89e0720.

* Revert "check which assert is failing"

This reverts commit b7483b425c.

* free memory in fit loop teardown

* update docstring

* period

* remove dead code

* else carlos

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Update pytorch_lightning/loops/dataloader/evaluation_dataloader_loop.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* update chlog

* unused imp

* move default construction in run_evaluation

* add something for lawyer to read

* switch typehint for eval loop trainer property

* add missing imports

* remove a todo that needs more discussion

* combine _get_num_dataloaders with the property

* Update pytorch_lightning/loops/dataloader/dataloader_loop.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* black + yapf

* avoid coverage on old unused eval loop

* empty space in docstring

Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>

* resolve todo for args forwarding

* weekproxy trainer

* fix check for num dataloaders kwargs

* clean up num prediction dataloaders property

* free memory

* rm notebooks folder

* rm old file

* revert changes to old eval loop

* bad merge

* undo teardown

* setup signature

* remove file for notes

* free memory

* chlog

* Revert "weekproxy trainer"

This reverts commit d4e6969170.

* connect trainer

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* clean up max batches and dataloaders

* max batches handling

* no grad handling

* unused argument

* protected attrs

* unused imports

* undo unintentional rename

* consistent naming

* capitalization in docstring

* list all args

* Update pytorch_lightning/loops/prediction_epoch_loop.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Update pytorch_lightning/loops/prediction_epoch_loop.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Update pytorch_lightning/loops/dataloader/prediction_dataloader_loop.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Update pytorch_lightning/loops/dataloader/prediction_dataloader_loop.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Update pytorch_lightning/loops/prediction_epoch_loop.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Justus Schock <justus.schock@posteo.de>
Co-authored-by: Justus Schock <justus.schock@rwth-aachen.de>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
2021-06-23 10:17:04 +01:00
Adrian Wälchli 9a64e534c7
Loop Refactor 4/N - Remove Old Evaluation Loop (#8056) 2021-06-22 11:57:37 +02:00
Adrian Wälchli 0d6dfd42d8
Merge pull request #7990 from PyTorchLightning/refactor/loops/loops_everywhere_eval
Loop Refactor 3/N - Evaluation Loop
2021-06-18 08:54:59 -04:00
Adrian Wälchli 341adad819
Loop Refactor 2/N - Remove Old Training Loop (#7985)
Co-authored-by: Justus Schock <justus.schock@rwth-aachen.de>
Co-authored-by: Justus Schock <justus.schock@posteo.de>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
2021-06-16 09:00:33 +01:00
Mauricio Villegas 0004216f2f
Easier configurability of callbacks that should always be present in LightningCLI (#7964)
Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-06-16 02:03:37 +02:00
Adrian Wälchli 971908a1aa
Loop Refactor 1/N - Training Loop (#7871)
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: Justus Schock <justus.schock@rwth-aachen.de>
Co-authored-by: Justus Schock <justus.schock@posteo.de>
Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
2021-06-15 12:55:06 +00:00
Carlos Mocholí b45a89a256
Clean-up after logger connector redesign 2/2 (#7631)
Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-06-10 12:09:01 +00:00
Carlos Mocholí dbea5bb710
Add typing to `ModelPruning` callback (#7529)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-05-19 22:01:42 +02:00
Jirka Borovec 298f9e5c2d
Prune deprecated utils modules (#7503)
* argparse_utils

* model_utils

* warning_utils

* xla_device_utils

* chlog

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2021-05-13 07:24:42 +00:00
Carlos Mocholí f29ecbfd90
Typing for accelerators and plugins (#7022) 2021-04-15 16:48:16 +00:00
Ethan Harris f645df5e9a
Add typings for evaluation_loop.py and remove some dead code (#7015) 2021-04-15 07:36:04 +00:00
Akihiro Nitta ac60536818
Follow E231 [flake8] (#6110)
* Remove E231 from ignore list

* Follow E231

* Update pytorch_lightning/trainer/data_loading.py
2021-03-24 12:50:50 +01:00
Jirka Borovec 64d0fa4472
update coverage config (#6524)
* update coverage config

* parallel

* parallel

* Apply suggestions from code review

* Apply suggestions from code review

* paralel

* paralel

* paralel

* combine

* combine

* .

* ..

* ..

* ..

* rev

* cb

* cb

* drop

* drop

* .

* ..

* ...

* ...

* ...

* .
2021-03-23 23:05:04 +01:00
Jirka Borovec 156847bea7
CI: resume testing with py3.8 (#6516)
* testing on python 3.8

* req
2021-03-15 12:07:23 +01:00
thomas chaton 0544efd453
[bug] Update broadcast + reduce decision ModelCheckpoint] (#6410)
* resolve bug

* update

* update changelog

* update PR

* Update pytorch_lightning/trainer/connectors/logger_connector/epoch_result_store.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* add todo

* resolve issues

* resolve flake8

* update

* add coverage for reduce

* wip

* restore back to brodbact

* remove test.py

* resolve flake8

* update

* check world size

* resolve test

* update

* use pytorch version when defined

* update on comments

* update on comments

* flake8

* resolve bugs

* Update CHANGELOG.md

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* update

* update

* update

* update

* remove test

* update

* resolve flake8

* update

* update

* update

* proxy

* update

* update

* resolve typo

* prune

* update parallel

* update

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-03-14 17:14:27 +00:00
Carlos Mocholí 8dabc30bfc
Run CI (#6402) 2021-03-08 08:12:33 +01:00
Adrian Wälchli ec8d46e02b
introduce default cluster environment for lightning-specific ddp (#5915)
* handle distributed_sampler_kwargs

* move emptying cache to accelertor

* fix a few tests

* restoring the result from subprocess

* fix queue.get() order for results

* add missing "block_backward_sync" context manager

* add missing "block_backward_sync" context manager

* fix sync_batchnorm

* fix supported gpu-ids for tuple

* fix clip gradients and inf recursion

* accelerator selection: added cluster_environment plugin

* fix torchelastic test

* fix reduce early stopping decision for DDP

* fix tests: callbacks, conversion to lightning optimizer

* fix lightning optimizer does not pickle

* fix setting benchmark and deterministic option

* fix slurm amp test

* fix prepare_data test and determine node_rank

* fix retrieving last path when testing

* remove obsolete plugin argument

* fix test: test_trainer_config

* fix torchscript tests

* fix trainer.model access

* move properties

* fix test_transfer_batch_hook

* fix auto_select_gpus

* fix omegaconf test

* fix test that needs to simulate slurm ddp

* add horovod plugin

* fix test with named arguments

* clean up whitespace

* fix datamodules test

* remove old accelerators

* fix naming

* move old plugins

* move to plugins

* create precision subpackage

* create training_type subpackage

* fix all new import errors

* fix wrong arguments order passed to test

* fix LR finder

* Added sharded training type and amp plugin

* Move clip grad to precision plugin

* Added sharded spawn, select accelerators based on distributed_backend + enable custom fp16 plugin automatically

* Fix import issue, attempting to fix tests

* Fix initial test

* Reflect hook logic from master, should wrap model after move to device

* Optional state consolidation, since master has optimizers not wrapped

* change attribute for instance test

* reset optimizers

optimizers are not used in main process, so state would be wrong.

* legacy

* imports in accel

* legacy2

* trainer imports

* fix import errors after rebase

* move hook to new setup location

* provide unwrapping logic

* fix trainer callback system

* added ddp2 implementation

* fix imports .legacy

* move plugins

* restore legacy

* drop test.py from root

* add tpu accelerator and plugins

* fixes

* fix lightning optimizer merge

* reset bugreportmodel

* unwrapping

* step routing forward

* model access

* unwrap

* opt

* integrate distrib_type

* sync changes

* sync

* fixes

* add forgotten generators

* add missing logic

* update

* import

* missed imports

* import fixes

* isort

* mv f

* changelog

* format

* move helper to parallel plugin

* d

* add world size

* clean up

* duplicate

* activate ddp_sharded and tpu

* set nvidia flags

* remove unused colab var

* use_tpu <-> on_tpu attrs

* make some ddp_cpu and clusterplugin tests pass

* Ref/accelerator connector (#5742)

* final cleanup

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* connector cleanup

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* trainer cleanup

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* accelerator cleanup + missing logic in accelerator connector

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add missing changes to callbacks

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* reflect accelerator changes to lightning module

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* clean cluster envs

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* cleanup plugins

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add broadcasting

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* yapf

* remove plugin connector

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* plugins

* manual optimization

* update optimizer routing

* add rank to torchelastic

* fix memory mixed precision

* setstate on trainer for pickling in ddp spawn

* add predict method

* add back commented accelerator code

* adapt test for sync_batch_norm to new plugin

* fix deprecated tests

* fix ddp cpu choice when no num_processes are given

* yapf format

* skip a memory test that cannot pass anymore

* fix pickle error in spawn plugin

* x

* avoid

* x

* fix cyclic import in docs build

* add support for sharded

* update typing

* add sharded and sharded_spawn to distributed types

* make unwrap model default

* refactor LightningShardedDataParallel similar to LightningDistributedDataParallel

* update sharded spawn to reflect changes

* update sharded to reflect changes

* Merge 1.1.5 changes

* fix merge

* fix merge

* yapf isort

* fix merge

* yapf isort

* fix indentation in test

* copy over reinit scheduler implementation from dev1.2

* fix apex tracking calls with dev_debugger

* reduce diff to dev1.2, clean up

* fix trainer config test  when gpus>0 and num_processes >0 and ddp_cpu

* sort plugin tests legacy/new

* fix error handling for amp on cpu

* fix merge


fix merge


fix merge

* [Feat] Resolve manual_backward (#5837)

* resolve manual_backward

* resolve flake8

* update

* resolve for ddp_spawn

* resolve flake8

* resolve flake8

* resolve flake8

Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>

* fix tests/accelerator tests on cpu

* [BugFix] Resolve manual optimization (#5852)

* resolve manual_optimization

* update

* update

Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>

* Remove copy trainer parameters to happen earlier within the loop and add safe guard to get ref model (#5856)

* resovle a bug

* Accelerator refactor sharded rpc (#5854)

* rpc branch

* merge

* update handling of rpc

* make devices etc. Optional in RPC

* set devices etc. later if necessary

* remove devices from sequential

* make devices optional in rpc

* fix import

* uncomment everything

* fix cluster selection

Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>

* resolve bug

* fix assert in rpc test

* resolve a test

* fix docs compilation

* accelerator refactor - fix for sharded parity test (#5866)

* fix memory issue with ddp_spawn

* x


x


x


x


x


x


x


x


x

* x

* Remove DDP2 as this does not apply

* Add missing pre optimizer hook to ensure lambda closure is called

* fix apex docstring

* [accelerator][BugFix] Resolve some test for 1 gpu (#5863)

* update

* revert init

* resolve a bug

* update

* resolve flake8

* update

* update

* update

* revert init

* resolve a bug

* update

* resolve flake8

* update

* update

* update

* update

* update

* revert init

* resolve a bug

* update

* resolve flake8

* update

* update

* update

* revert init

* update

* resolve flake8

* update

* update

* update

* update

* update

* all_gather

* update

* make plugins work, add misconfig for RPC

* update

* update

* remove breaking test

* resolve some tests

* resolve flake8

* revert to ddp_spawn

Co-authored-by: root <root@ip-172-31-88-60.ec2.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>
Co-authored-by: Justus Schock <justus.schock@rwth-aachen.de>

* yapf isort

* resolve flake8

* fix apex doctests

* fix apex doctests 2

* resolve docs

* update drone

* clean env

* update

* update

* update

* update

* merge

* Fix RPC related tests, clean out old API, update for new accelerator API [skip ci] (#5881)

* Fix RPC related tests, clean out old API, update for new accelerator API

* Move tests out of legacy folder, update paths and names

* Update test_remove_1-4.py

* Expose properties for tpu cores/gpus/num_gpus

* Add root GPU property

* Move properties to properties.py

* move tests that were previously in drone

* Fix root GPU property (#5908)

* Move root GPU to property, remove horovod set as this is handled in horovod plugin, ensure we mock correctly to set GPU accelerator

* Add missing tests back

* fix best model path transfer when no checkpoint callback available

* Fix setup hook order [wip] (#5858)

* Call trainer setup hook before accelerator setup

* Add test case

* add new test

* typo

* fix callback order in test

Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* rename ddp sequential -> rpc sequential for special test

* revert

* fix stupid merge problem

* abstract the cluster plugins

* default plugin

* integrate default environment

* fix property

* adapt tests

* adjust test

* fix world size access

* base cluster env

* revert rebase errors

* revert rebase errors

* missing import

* revert unrelated change

* remove unused cluster local rank

* remove unrelated changes

* fix unrelated changes

* fix pep8

* remove unused var

* reset permissions

* ypaf

* test default environment

* test torchelastic environment

* world  size as int

* tests for slurm environment

* changelog

* test comments

* remove unintended change

* keep master port fixed after it is generated

* test random master port

* yapf

* add missing default environment

* move helper function

* rename default environment

* rename

* rename

* yapf

* Update pytorch_lightning/plugins/environments/lightning_environment.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Update CHANGELOG.md

Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>

* spawn -> create

Co-authored-by: justusschock <justus.schock@posteo.de>
Co-authored-by: SeanNaren <sean@grid.ai>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
Co-authored-by: Justus Schock <justus.schock@rwth-aachen.de>
Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: root <root@ip-172-31-88-60.ec2.internal>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-03-05 01:47:29 +00:00
Jirka Borovec e038e747a0
hotfix for PT1.6 and torchtext (#6323)
* ci: azure reinstall torchtext

* move

* todos

* 0.6.0

* skip examples

* formatter

* skip

* todo

* Apply suggestions from code review
2021-03-04 17:48:17 +01:00
Justus Schock 0647340f3b
Add mypy typing to precision plugins. (#6149)
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
2021-02-26 14:27:16 +01:00
Justus Schock 3ed8ef8af9
type accelerators (#6148) 2021-02-25 06:42:23 +00:00
Adrian Wälchli fc9bb53e13
fix flake8 for new plugins (#5951)
* flake8

* fix cyclic import

* isort
2021-02-18 18:28:23 +00:00
Jirka Borovec 1c87f1f6cd
remove legacy plugins (#5950)
* remove legacy plugins

* imports

* formatting

* fix docs references

* fix cluster environment inheritance

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-02-16 19:20:58 +00:00
Jirka Borovec f83cca6107
formatting flake8 & isort (#5824)
* formatting

* isort

* make

* yapf

* isort
2021-02-05 18:33:12 -05:00
Carlos Mocholí aa03b73e60
Remove psf/black references (#5739)
* Update pyproject.toml

* Update setup.cfg

* Update test.txt

* Update CONTRIBUTING.md

* Update requirements/test.txt
2021-02-03 08:37:06 +00:00
Justus Schock b3ebc18bcb
Hardware specific parts of Accelerator Refactoring (#5719)
* add basic accelerator class.
Co-Authored with @awaelchi

* pep8

Co-authored-by: @awaelchi

* add cpu accelerator

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add gpu accelerator

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add tpu accelerator

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add accelerator connector

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add single device training

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add single tpu

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add tpu spawn

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* make on_colab_kaggle utility func

* add basic accelerator class.
Co-Authored with @awaelchi

* pep8

Co-authored-by: @awaelchi

* add cpu accelerator

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add gpu accelerator

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add tpu accelerator

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add accelerator connector

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add single device training

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add single tpu

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add tpu spawn

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* make on_colab_kaggle utility func

* fixes

* move

* yapf

* .

* .

* .

* flake8

* sync accelerator connector changes from dev1.2

* changelog

* fix tpu handling

* tpu

* aval

* yapf

* Update pytorch_lightning/plugins/training_type/tpu_spawn.py

Co-authored-by: chaton <thomas@grid.ai>

* Update pytorch_lightning/accelerators/accelerator_connector.py

Co-authored-by: chaton <thomas@grid.ai>

* Update pytorch_lightning/plugins/training_type/tpu_spawn.py

Co-authored-by: chaton <thomas@grid.ai>

* Update tpu_spawn.py

* Update pytorch_lightning/accelerators/accelerator_connector.py

Co-authored-by: chaton <thomas@grid.ai>

* indentation

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
Co-authored-by: chaton <thomas@grid.ai>
2021-02-01 08:34:59 -05:00
Jirka Borovec 963c17b669
formatting 6/n: metrics (#5722)
* yapf metrics

* op
2021-02-01 09:24:07 +01:00
Justus Schock 069ae27cef
Accelerator Refactor: Precision Plugins (#5718)
* add basic accelerator class.
Co-Authored with @awaelchi

* add basic trainign type plugin.
Co-Authored with @awaelchi

* pep8

Co-authored-by: @awaelchi

* update copyright

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add apex_amp

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add mixed base class

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add native amp

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add native amp sharded

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add tpu bfloat

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add inits

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Update precision_plugin.py

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-01-31 13:12:02 -05:00
Adrian Wälchli 692f77b8a7
Refactor LightningDataParallel (#5670)
* module

* fix model access

* scalar conversion

* refactor

* kwargs

* auto unsqueeze

* refactor code duplication

* clean up

* docs

* update dp docs

* changelog

* generalize test

* test

* rename

* warning cache

* isort

* unsqueezing test

* device

* device

* scalar test

* device

* device

* include coverage of overrides

* clear

* add deprecation test

* docs

* improve coverage

* increase coverage

* fix merge

* extend test

* rename base class

* mention the predict method in docs

* combine iteration over collection

* remove override

* move

* line

* Apply suggestions from code review

* fix running stage

* f401

* fix cyclic import

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-01-31 06:08:16 -05:00