Commit Graph

102 Commits

Author SHA1 Message Date
Adrian Wälchli 6cc1a06078
rename accelerator_backend -> accelerator (#6034)
* rename accelerator backend

* rename new additions from master

* add proper deprecation

* pep8

* warning match

* add missing warning type
2021-02-18 15:54:12 +00:00
Sean Naren b7c2e0a80e
Trainer only references accelerator (#6039)
* Trainer only references accelerator where it can

* Move teardown to the trainer, as it is reponsible for the accelerator
2021-02-17 21:41:18 +01:00
chaton e982800b81
Add PredictLoop (#5752)
* integrate distrib_type

* sync changes

* sync

* fixes

* add forgotten generators

* add missing logic

* update

* import

* missed imports

* import fixes

* isort

* mv f

* changelog

* format

* move helper to parallel plugin

* d

* add world size

* clean up

* duplicate

* activate ddp_sharded and tpu

* set nvidia flags

* remove unused colab var

* use_tpu <-> on_tpu attrs

* make some ddp_cpu and clusterplugin tests pass

* Ref/accelerator connector (#5742)

* final cleanup

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* connector cleanup

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* trainer cleanup

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* accelerator cleanup + missing logic in accelerator connector

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add missing changes to callbacks

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* reflect accelerator changes to lightning module

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* clean cluster envs

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* cleanup plugins

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add broadcasting

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* yapf

* remove plugin connector

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* plugins

* add predict_loop

* manual optimization

* clean predictloop

* update optimizer routing

* add predict loop on new accelerator

* resolve a bug

* add rank to torchelastic

* add predict_loop

* add predict loop on new accelerator

* resolve a bug

* fix memory mixed precision

* update

* setstate on trainer for pickling in ddp spawn

* add predict_loop

* clean predictloop

* add predict loop on new accelerator

* resolve a bug

* add predict_loop

* add predict loop on new accelerator

* resolve a bug

* add predict_loop

* add predict loop on new accelerator

* resolve a bug

* add predict_loop

* add predict loop on new accelerator

* resolve a bug

* add predict_loop

* clean predictloop

* add predict loop on new accelerator

* resolve a bug

* add predict_loop

* add predict loop on new accelerator

* resolve a bug

* resolve tests

* add predict method

* add back commented accelerator code

* adapt test for sync_batch_norm to new plugin

* fix deprecated tests

* fix ddp cpu choice when no num_processes are given

* yapf format

* skip a memory test that cannot pass anymore

* remove sanetize

* rename train to run_train

* remove useless hooks

* add misconfigurationException

* remove wrong naming

* resolve some legacy

* udpate docstring

* fix pickle error in spawn plugin

* x

* avoid

* x

* fix cyclic import in docs build

* add support for sharded

* update typing

* add sharded and sharded_spawn to distributed types

* make unwrap model default

* refactor LightningShardedDataParallel similar to LightningDistributedDataParallel

* update sharded spawn to reflect changes

* update sharded to reflect changes

* Merge 1.1.5 changes

* fix merge

* fix merge

* yapf isort

* fix merge

* yapf isort

* fix indentation in test

* copy over reinit scheduler implementation from dev1.2

* fix apex tracking calls with dev_debugger

* reduce diff to dev1.2, clean up

* fix trainer config test  when gpus>0 and num_processes >0 and ddp_cpu

* sort plugin tests legacy/new

* fix error handling for amp on cpu

* fix merge


fix merge


fix merge

* [Feat] Resolve manual_backward (#5837)

* resolve manual_backward

* resolve flake8

* update

* resolve for ddp_spawn

* resolve flake8

* resolve flake8

* resolve flake8

Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>

* fix tests/accelerator tests on cpu

* [BugFix] Resolve manual optimization (#5852)

* resolve manual_optimization

* update

* update

Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>

* Remove copy trainer parameters to happen earlier within the loop and add safe guard to get ref model (#5856)

* resovle a bug

* Accelerator refactor sharded rpc (#5854)

* rpc branch

* merge

* update handling of rpc

* make devices etc. Optional in RPC

* set devices etc. later if necessary

* remove devices from sequential

* make devices optional in rpc

* fix import

* uncomment everything

* fix cluster selection

Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>

* resolve bug

* fix assert in rpc test

* resolve a test

* fix docs compilation

* accelerator refactor - fix for sharded parity test (#5866)

* fix memory issue with ddp_spawn

* x


x


x


x


x


x


x


x


x

* x

* Remove DDP2 as this does not apply

* Add missing pre optimizer hook to ensure lambda closure is called

* fix apex docstring

* [accelerator][BugFix] Resolve some test for 1 gpu (#5863)

* update

* revert init

* resolve a bug

* update

* resolve flake8

* update

* update

* update

* revert init

* resolve a bug

* update

* resolve flake8

* update

* update

* update

* update

* update

* revert init

* resolve a bug

* update

* resolve flake8

* update

* update

* update

* revert init

* update

* resolve flake8

* update

* update

* update

* update

* update

* all_gather

* update

* make plugins work, add misconfig for RPC

* update

* update

* remove breaking test

* resolve some tests

* resolve flake8

* revert to ddp_spawn

Co-authored-by: root <root@ip-172-31-88-60.ec2.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>
Co-authored-by: Justus Schock <justus.schock@rwth-aachen.de>

* yapf isort

* resolve flake8

* fix apex doctests

* fix apex doctests 2

* resolve docs

* update drone

* clean env

* update

* update

* update

* update

* merge

* Fix RPC related tests, clean out old API, update for new accelerator API [skip ci] (#5881)

* Fix RPC related tests, clean out old API, update for new accelerator API

* Move tests out of legacy folder, update paths and names

* Update test_remove_1-4.py

* Expose properties for tpu cores/gpus/num_gpus

* Add root GPU property

* Move properties to properties.py

* move tests that were previously in drone

* Fix root GPU property (#5908)

* Move root GPU to property, remove horovod set as this is handled in horovod plugin, ensure we mock correctly to set GPU accelerator

* Add missing tests back

* fix best model path transfer when no checkpoint callback available

* Fix setup hook order [wip] (#5858)

* Call trainer setup hook before accelerator setup

* Add test case

* add new test

* typo

* fix callback order in test

Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* rename ddp sequential -> rpc sequential for special test

* revert

* fix stupid merge problem

* Use property in connector for sampler (#5913)

* merge the import conflicts

* fix spawning of processes in slurm

* [wip] Fix some bugs for TPU [skip ci] (#5878)

* fixed for single tpu

* fixed spawn

* fixed spawn

* update

* update

* wip

* resolve bugs

* resolve bug

* update on comment

* removed decorator

* resolve comments

* set to 4

* update

* update

* need cleaning

* update

* update

* update

* resolve flake8

* resolve bugs

* exclude broadcast

* resolve bugs

* change test

* update

* update

* skip if meet fails

* properly raise trace

* update

* add catch

* wrap test

* resolve typo

* update

* typo

Co-authored-by: Lezwon Castelino <lezwon@gmail.com>
Co-authored-by: Your Name <you@example.com>

* resolve some tests

* update

* fix imports

* update

* resolve flake8

* update azure pipeline

* skip a sharded test on cpu that requires a gpu

* resolve tpus

* resolve bug

* resolve flake8

* update

* updat utils

* revert permission change on files

* suggestions from carlos

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* remove unrelated formatting changes

* remove incomplete comment

* Update pytorch_lightning/accelerators/__init__.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* remove unrelated formatting change

* add types

* warn 1.7 ddp manual backward only if ddp kwarg unset

* yapf + isort

* pep8 unused imports

* fix cyclic import in docs

* Apply suggestions from code review

* typer in accelerator.py

* typo

* resolve flake8

* update code

* update

* Update pytorch_lightning/trainer/predict_loop.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Update pytorch_lightning/trainer/predict_loop.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* fix merge

* fix merge

* reset legacy accelerator

* add missing rename dispatch

* rename post traning

* update code

* resolved comments

* typo

* typo

* add flow description

* resolve comments

* update on comments

* update flow

* add backticks

* resolve tpu

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: justusschock <justus.schock@posteo.de>
Co-authored-by: Justus Schock <justus.schock@rwth-aachen.de>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: SeanNaren <sean@grid.ai>
Co-authored-by: root <root@ip-172-31-88-60.ec2.internal>
Co-authored-by: Lezwon Castelino <lezwon@gmail.com>
Co-authored-by: Your Name <you@example.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2021-02-16 17:11:56 -05:00
Adrian Wälchli c912c4b729
remove legacy accelerators (#5949)
* remove legacy accelerators

* update imports

* formatting

Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
2021-02-14 16:03:45 +00:00
Justus Schock da6dbc8d1d
PoC: Accelerator refactor (#5743)
* restoring the result from subprocess

* fix queue.get() order for results

* add missing "block_backward_sync" context manager

* add missing "block_backward_sync" context manager

* fix sync_batchnorm

* fix supported gpu-ids for tuple

* fix clip gradients and inf recursion

* accelerator selection: added cluster_environment plugin

* fix torchelastic test

* fix reduce early stopping decision for DDP

* fix tests: callbacks, conversion to lightning optimizer

* fix lightning optimizer does not pickle

* fix setting benchmark and deterministic option

* fix slurm amp test

* fix prepare_data test and determine node_rank

* fix retrieving last path when testing

* remove obsolete plugin argument

* fix test: test_trainer_config

* fix torchscript tests

* fix trainer.model access

* move properties

* fix test_transfer_batch_hook

* fix auto_select_gpus

* fix omegaconf test

* fix test that needs to simulate slurm ddp

* add horovod plugin

* fix test with named arguments

* clean up whitespace

* fix datamodules test

* remove old accelerators

* fix naming

* move old plugins

* move to plugins

* create precision subpackage

* create training_type subpackage

* fix all new import errors

* fix wrong arguments order passed to test

* fix LR finder

* Added sharded training type and amp plugin

* Move clip grad to precision plugin

* Added sharded spawn, select accelerators based on distributed_backend + enable custom fp16 plugin automatically

* Fix import issue, attempting to fix tests

* Fix initial test

* Reflect hook logic from master, should wrap model after move to device

* Optional state consolidation, since master has optimizers not wrapped

* change attribute for instance test

* reset optimizers

optimizers are not used in main process, so state would be wrong.

* legacy

* imports in accel

* legacy2

* trainer imports

* fix import errors after rebase

* move hook to new setup location

* provide unwrapping logic

* fix trainer callback system

* added ddp2 implementation

* fix imports .legacy

* move plugins

* restore legacy

* drop test.py from root

* add tpu accelerator and plugins

* fixes

* fix lightning optimizer merge

* reset bugreportmodel

* unwrapping

* step routing forward

* model access

* unwrap

* opt

* integrate distrib_type

* sync changes

* sync

* fixes

* add forgotten generators

* add missing logic

* update

* import

* missed imports

* import fixes

* isort

* mv f

* changelog

* format

* move helper to parallel plugin

* d

* add world size

* clean up

* duplicate

* activate ddp_sharded and tpu

* set nvidia flags

* remove unused colab var

* use_tpu <-> on_tpu attrs

* make some ddp_cpu and clusterplugin tests pass

* Ref/accelerator connector (#5742)

* final cleanup

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* connector cleanup

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* trainer cleanup

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* accelerator cleanup + missing logic in accelerator connector

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add missing changes to callbacks

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* reflect accelerator changes to lightning module

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* clean cluster envs

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* cleanup plugins

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add broadcasting

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* yapf

* remove plugin connector

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* plugins

* manual optimization

* update optimizer routing

* add rank to torchelastic

* fix memory mixed precision

* setstate on trainer for pickling in ddp spawn

* add predict method

* add back commented accelerator code

* adapt test for sync_batch_norm to new plugin

* fix deprecated tests

* fix ddp cpu choice when no num_processes are given

* yapf format

* skip a memory test that cannot pass anymore

* fix pickle error in spawn plugin

* x

* avoid

* x

* fix cyclic import in docs build

* add support for sharded

* update typing

* add sharded and sharded_spawn to distributed types

* make unwrap model default

* refactor LightningShardedDataParallel similar to LightningDistributedDataParallel

* update sharded spawn to reflect changes

* update sharded to reflect changes

* Merge 1.1.5 changes

* fix merge

* fix merge

* yapf isort

* fix merge

* yapf isort

* fix indentation in test

* copy over reinit scheduler implementation from dev1.2

* fix apex tracking calls with dev_debugger

* reduce diff to dev1.2, clean up

* fix trainer config test  when gpus>0 and num_processes >0 and ddp_cpu

* sort plugin tests legacy/new

* fix error handling for amp on cpu

* fix merge


fix merge


fix merge

* [Feat] Resolve manual_backward (#5837)

* resolve manual_backward

* resolve flake8

* update

* resolve for ddp_spawn

* resolve flake8

* resolve flake8

* resolve flake8

Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>

* fix tests/accelerator tests on cpu

* [BugFix] Resolve manual optimization (#5852)

* resolve manual_optimization

* update

* update

Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>

* Remove copy trainer parameters to happen earlier within the loop and add safe guard to get ref model (#5856)

* resovle a bug

* Accelerator refactor sharded rpc (#5854)

* rpc branch

* merge

* update handling of rpc

* make devices etc. Optional in RPC

* set devices etc. later if necessary

* remove devices from sequential

* make devices optional in rpc

* fix import

* uncomment everything

* fix cluster selection

Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>

* resolve bug

* fix assert in rpc test

* resolve a test

* fix docs compilation

* accelerator refactor - fix for sharded parity test (#5866)

* fix memory issue with ddp_spawn

* x


x


x


x


x


x


x


x


x

* x

* Remove DDP2 as this does not apply

* Add missing pre optimizer hook to ensure lambda closure is called

* fix apex docstring

* [accelerator][BugFix] Resolve some test for 1 gpu (#5863)

* update

* revert init

* resolve a bug

* update

* resolve flake8

* update

* update

* update

* revert init

* resolve a bug

* update

* resolve flake8

* update

* update

* update

* update

* update

* revert init

* resolve a bug

* update

* resolve flake8

* update

* update

* update

* revert init

* update

* resolve flake8

* update

* update

* update

* update

* update

* all_gather

* update

* make plugins work, add misconfig for RPC

* update

* update

* remove breaking test

* resolve some tests

* resolve flake8

* revert to ddp_spawn

Co-authored-by: root <root@ip-172-31-88-60.ec2.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>
Co-authored-by: Justus Schock <justus.schock@rwth-aachen.de>

* yapf isort

* resolve flake8

* fix apex doctests

* fix apex doctests 2

* resolve docs

* update drone

* clean env

* update

* update

* update

* update

* merge

* Fix RPC related tests, clean out old API, update for new accelerator API [skip ci] (#5881)

* Fix RPC related tests, clean out old API, update for new accelerator API

* Move tests out of legacy folder, update paths and names

* Update test_remove_1-4.py

* Expose properties for tpu cores/gpus/num_gpus

* Add root GPU property

* Move properties to properties.py

* move tests that were previously in drone

* Fix root GPU property (#5908)

* Move root GPU to property, remove horovod set as this is handled in horovod plugin, ensure we mock correctly to set GPU accelerator

* Add missing tests back

* fix best model path transfer when no checkpoint callback available

* Fix setup hook order [wip] (#5858)

* Call trainer setup hook before accelerator setup

* Add test case

* add new test

* typo

* fix callback order in test

Co-authored-by: tchaton <thomas@grid.ai>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* rename ddp sequential -> rpc sequential for special test

* revert

* fix stupid merge problem

* Use property in connector for sampler (#5913)

* merge the import conflicts

* fix spawning of processes in slurm

* [wip] Fix some bugs for TPU [skip ci] (#5878)

* fixed for single tpu

* fixed spawn

* fixed spawn

* update

* update

* wip

* resolve bugs

* resolve bug

* update on comment

* removed decorator

* resolve comments

* set to 4

* update

* update

* need cleaning

* update

* update

* update

* resolve flake8

* resolve bugs

* exclude broadcast

* resolve bugs

* change test

* update

* update

* skip if meet fails

* properly raise trace

* update

* add catch

* wrap test

* resolve typo

* update

* typo

Co-authored-by: Lezwon Castelino <lezwon@gmail.com>
Co-authored-by: Your Name <you@example.com>

* resolve some tests

* update

* fix imports

* update

* resolve flake8

* update azure pipeline

* skip a sharded test on cpu that requires a gpu

* resolve tpus

* resolve bug

* resolve flake8

* update

* updat utils

* revert permission change on files

* suggestions from carlos

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* remove unrelated formatting changes

* remove incomplete comment

* Update pytorch_lightning/accelerators/__init__.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* remove unrelated formatting change

* add types

* warn 1.7 ddp manual backward only if ddp kwarg unset

* yapf + isort

* pep8 unused imports

* fix cyclic import in docs

* Apply suggestions from code review

* typer in accelerator.py

* typo

* Apply suggestions from code review

* formatting

* update on comments

* update typo

* Update pytorch_lightning/trainer/properties.py

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* update

* suggestion from code review

* suggestion from code review

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: SeanNaren <sean@grid.ai>
Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: root <root@ip-172-31-88-60.ec2.internal>
Co-authored-by: Lezwon Castelino <lezwon@gmail.com>
Co-authored-by: Your Name <you@example.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2021-02-12 15:48:56 -05:00
Jirka Borovec b434c479e7
Quantisation (#5706)
* empty

* sq

* obs


* int

* ts

* helpers

* chlog

* yapf

* avg

* dupl

* Apply suggestions from code review

Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>

* Apply suggestions from code review

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* fixes

* Apply suggestions from code review

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* fixes

* note

* warn

* 45

* link

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Apply suggestions from code review

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* yapf

* flake8

* Apply suggestions from code review

* Apply suggestions from code review

Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-02-11 07:04:57 -05:00
Jirka Borovec f83cca6107
formatting flake8 & isort (#5824)
* formatting

* isort

* make

* yapf

* isort
2021-02-05 18:33:12 -05:00
chaton d8f2d8e15a
[Feat-BugFix] Resolve custom DataLoader (#5745)
* resolve custom dataloader

* update changelog

* fix tests

* update on comments

* resolve comments

* add support for custom batch_sampler

* Update tests/trainer/test_data_loading.py

* resolve test

* resolve flake8

* resolve yapf

Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-02-05 09:03:18 +00:00
Jirka Borovec aba212341a
formatting 4/n: Trainer (#5720)
* yapf trainer

* Apply suggestions from code review

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* .

* fix

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-02-03 09:25:42 +00:00
Jirka Borovec 7e2e874d95
Refactor: legacy accelerators and plugins (#5645)
* tests: legacy

* legacy: accel

* legacy: plug

* fix imports

* mypy

* flake8
2021-01-26 20:04:36 -05:00
Arnaud Gelas a9d9f33a86
Fix isort failures in trainer (#5529)
Remove from skipped module in pyproject.toml and fix failures on:
- pytorch_lightning/trainer/*.py
2021-01-18 13:42:50 -05:00
Jirka Borovec 54d20dc596
Refactor: clean trainer device & distrib getters (#5300)
* warnings

* .

* .

* flake8

* .

* .

* .

* use_tpu

* use_dp

* .

* use_ddp

* .

* use_horovod

* .

* .

* .
2021-01-12 05:22:37 -05:00
Justus Schock d88cf4a652
Add Support for multiple train loaders (#1959)
* add support for wrong dtype in apply_func

* apply loader resetting to possible collection of loaders

* add combined loader iter class

* integrate combined loader iter to training loop

* fix imports

* fix imports

* finish supporters

* add tests for supporters

* add test for model with multiple loaders

* fix trainer integration

* fix instance check

* Train loaders (#4032)

* patch for issues discussed in #1959, encapsulating underlying datastructures returned from train_dataloader

* update data_loading.py to it uses patch discussed in #1959

* rename class

* Separate CombinedLoaderIterator into two classes, and update related tests. (#4606)

* Fix the bugs after rebasing.

* Add custom get_len for apply_to_collection

* Refactor MultiIterator to be as CombinedLoaderIterator

* To get the right num_training_batches. Call the wrapper for multi trainloader in data_loading.py, instead of training_loop.py

* Reload _loader_iters when calling __iter__

* Don't transform DataLoader to CombinedLoaderIterator when it's along

* Updates test_fit_multiple_train_loaders for testing num_training_batches

* Seperate CombinedLoaderIterator into CombinedLoaderIterator and CombinedDataLoader. Add CombinedDataset for unified DataLoader format.

* Initialize CombinedDataLoader before calculating num_training_batches. Also updating self._worker_check for multiple loaders

* Update tests for supporters

* Update tests for multiple trainloaders. Add tests about few_workers for multiple loaders.

* Fix pep8 issues

* Add tests for train_loader_patch.py

* Add descriptions to multiple_trainloader_mode

* Remove unused variables

* Add docstrings and typing

* Add more tests for better converage

* Remove unused commented codes

* Add sampler property

* Remove extract_dataset

* Update typing

* pep8

* Update train_loader_patch.py

* Apply suggestions from code review

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update pytorch_lightning/trainer/supporters.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* reviewer comments

* fix stupid import

* add docs

* add back line separator

* fix line sep

* pep8

* Apply suggestions from code review

* fix

* fix

* Apply suggestions from code review

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Apply suggestions from code review

Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>

* flake8

Co-authored-by: Justus Schock <justusschock@justuss-mbp.fritz.box>
Co-authored-by: Christofer Fransson <christofer_fransson@yahoo.com>
Co-authored-by: YI-LIN SUNG <r06942076@ntu.edu.tw>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz>
2021-01-04 19:57:53 +00:00
Jirka Borovec a884866ff0
Unify names in Utils (#5199)
* warnings

* argparse

* mutils

* xla device

* deprecated

* tests

* simple

* flake8

* fix

* flake8

* 1.4
2020-12-22 00:23:33 +01:00
Jirka Borovec 53d7c9555c
drop usage of deprecated distributed_backend (#5009)
Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Roger Shieh <sh.rog@protonmail.ch>
2020-12-09 09:18:23 +01:00
Sean Naren ee9b3fe574
[feat] pp 1/n (#5016)
* Added changes for RPC plugin

* Add missing kwargs

* Fix code format

* Loading refactors by introducing is_distributed var, fix optimizer step flow

* Add rpc guard

* Added docstrings and typing

* resolve comments

* Add additional rpc hook, refactor name of exit process hook for clarity

* remove annotation

* Modify behaviour to allow optional return, add test for rpc plugin

* resolve tests

* rename is_ddp_based

* update

* update for windows

* update

* resolve test

* code smell

* Revert back to init_ddp_connection for backwards compat

* Swap to explicit name for property

* Add missing speed parity increase for CI variability, fix call counts for child process

Co-authored-by: tchaton <thomas@grid.ai>
2020-12-08 22:02:10 +00:00
Jirka Borovec 3976db597d
refactor imports of optional dependencies (#4859)
* refactor imports of optional dependencies

* fix

* fix

* fix

* fix

* fix

* flake8

* flake8

Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: chaton <thomas@grid.ai>
2020-12-04 10:26:10 +01:00
chaton 6ba77c2611
Merge branch 'master' into better_simple_profiler 2020-11-27 18:43:01 +00:00
Jirka Borovec 042152cd61
ref: fix & simplify test callback (#4009)
* simplify test callback

* update

* use mock

* flake8
2020-11-27 19:12:56 +01:00
tchaton 290d74b40e resolve test 2020-11-27 16:47:13 +00:00
chaton dee968f20b
[bug] Replace_sampler attach previous multiprocessing_context (#4742)
* resolve bug

* add test docstring

* Update tests/trainer/test_dataloaders.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* update test

Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-11-27 12:57:25 +00:00
Jirka Borovec 442d57f1e9
simplify imports xla / TPU (#4872)
* xla

* tpu

* fix

* fix

* flake8
2020-11-27 00:37:48 +01:00
Jirka Borovec 11e73ceaa6
fix import and typo in AMP (#4871)
* fix import and typo

* docs

* apex

* fix

* typo
2020-11-26 23:45:52 +01:00
ananthsub f6efb712ed
Skip replacing dataloader sampler if it's already a distributed sampler (#4273)
* Update data_loading.py

* Update data_loading.py

* add test + update flag description

* add to changelog

* Update test_dataloaders.py

* fix-pickle

* Update test_dataloaders.py

* Added missing reference calls

* Update tests/trainer/test_dataloaders.py

* Apply suggestions from code review

* Update data_loading.py

* Update test_dataloaders.py

Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2020-10-23 17:34:07 +01:00
William Falcon 7ffe05a3d1
ref: accelerator names (#4066)
* ref: accelerator names

* docs
2020-10-11 01:05:14 -04:00
William Falcon 05e0b4e5a1
Revert "Remove limitation of batch scaler (#4006)" (#4040)
This reverts commit 7e756ca11f.
2020-10-09 21:03:23 -04:00
Nicki Skafte 7e756ca11f
Remove limitation of batch scaler (#4006)
* working code

* add tests

* fix scaling

* move patch dataloader to utils

* renaming

* fix tests

* add changelog

* update docs

* pep8
2020-10-09 14:53:01 -04:00
William Falcon 6044cf9003
Fixes #3945 (#3947) 2020-10-07 13:46:27 -04:00
William Falcon b922409624
clean and organize fit (#3938)
* clean and organize fit

* clean and organize fit

* clean and organize fit

* clean and organize fit

* clean and organize fit
2020-10-07 11:04:10 -04:00
William Falcon 575e01be82
tests for multiple optimizers and dataloader combinations (#3937)
* added tests for multiple optimizers and dataloaders

* added tests for multiple optimizers and dataloaders

* added tests for multiple optimizers and dataloaders
2020-10-07 10:13:57 -04:00
Lezwon Castelino 69833dad5b
Added check to verify xla device is TPU (#3274)
* tpu device check

* replaced with xmp spawn

* Revert "replaced with xmp spawn"

This reverts commit 6835380f

* replaced all instances of XLA_AVAILABLE

* moved inner_f to global scope

* made refactors

* added changelog

* added TPU_AVAILABLE variable

* fix codefactor issues

* removed form trainer and early stopping

* add TORCHXLA_AVAILABLE check

* added tests

* refactoring

* Update pytorch_lightning/utilities/xla_device_utils.py

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* updated function names

* fixed bug

* updated CHANGELOG.md

* added todo

* added type hints

* isort and black

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: William Falcon <waf2107@columbia.edu>
2020-10-06 19:54:37 +02:00
William Falcon d787208e76
Fixes #2792 (#3857) 2020-10-04 23:25:02 -04:00
Jirka Borovec 62eabdd535
revert backend types (#3788)
* revert backend types

* todo

* todo
2020-10-02 06:18:44 -04:00
William Falcon ac2b0f0f06
ref: continue #3733 (#3767)
* ref: #3733 part 2

* ref: #3733 part 2
2020-10-01 09:25:33 -04:00
Jirka Borovec 31a36f04df
define distributed as a type (#3740)
* define type

* miss

* Apply suggestions from code review

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* miss

* warn

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2020-09-30 08:33:01 -04:00
Adrian Wälchli 0b222fc6cf
fix warning (#3538) 2020-09-22 09:46:00 -04:00
Adrian Wälchli e6c7548b30
Fix overfit_batches > 0 on distributed_backend = "ddp" (#3534)
* example

* ex

* example

* sampler

* fix

* fix

* remove example

* changelog
2020-09-19 19:00:58 -04:00
Phil b5dc6998ae
Disable train dataloader shuffle when overfit_batches is active. (#3501)
* Disable train dataloader shuffle when overfit_batches is active.

* pep8

Co-authored-by: William Falcon <waf2107@columbia.edu>
2020-09-15 05:07:27 -04:00
William Falcon 541c4ab01d
ref: organize args 3/n (#3447)
* ref: organize args 2/n

* ref: organize args 2/n

* ref: organize args 2/n

* ref: organize args 2/n
2020-09-10 08:55:30 -04:00
William Falcon caf7893f27
ref: modular is_overridden (#3290)
* ref: modular is_overridden

* ref: modular is_overridden

* ref: modular is_overridden

* ref: modular is_overridden
2020-08-31 12:12:02 -04:00
Rohit Gupta 85cd558a3f
Follow up of #2892 (#3202)
* Follow up of #2892

* typo

* iterabledataset
2020-08-27 15:28:29 -04:00
William Falcon f3c63f7746
tests to ensure correct dataloader calls (#3221)
* tests to ensure correct dataloading interval and sequence

* tests to ensure correct dataloading interval and sequence

* tests to ensure correct dataloading interval and sequence

* tests to ensure correct dataloading interval and sequence

* tests to ensure correct dataloading interval and sequence
2020-08-27 09:49:46 -04:00
William Falcon f43028f3ae
added copyright notices (#3062) 2020-08-19 22:03:22 -04:00
Jirka Borovec a6e7aa7796
allow using apex with any PT version (#2865)
* wip

* setup

* type

* name

* wip

* docs

* imports

* fix if

* fix if

* use_amp

* Apply suggestions from code review

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Apply suggestions from code review

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* fix tests

* Apply suggestions from code review

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* fix tests

* todos

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2020-08-08 11:07:32 +02:00
Rohit Gupta a642349228
Support limit_mode_batches (int) for infinite dataloader (#2840)
* Support limit_mode_batches(int) for infinite dataloader

* flake8

* revert and update

* add and update tests

* pep8

* chlog

* Update CHANGELOG.md

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Add suggestions by @awaelchli

* docs

* Apply suggestions from code review

Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>

* Apply suggestions from code review

* fix

* max

* check

* add and update tests

* max

* check

* check

* check

* chlog

* tests

* update exception message

* Apply suggestions from code review

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
Co-authored-by: Jirka Borovec <jirka@pytorchlightning.ai>
2020-08-07 13:02:36 +02:00
William Falcon 5d0f0325d8
Revert "Support limit_mode_batches (int) for infinite dataloader" (#2839)
* Revert "Support limit_mode_batches (int) for infinite dataloader (#2787)"

This reverts commit de9c9f0864.

* Update training_tricks.py
2020-08-05 15:57:26 -04:00
Rohit Gupta de9c9f0864
Support limit_mode_batches (int) for infinite dataloader (#2787)
* Support limit_mode_batches(int) for infinite dataloader

* flake8

* revert and update

* add and update tests

* pep8

* chlog

* Update CHANGELOG.md

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Add suggestions by @awaelchli

* docs

* Apply suggestions from code review

Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>

* Apply suggestions from code review

* fix

* max

* check

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
Co-authored-by: Jirka Borovec <jirka@pytorchlightning.ai>
2020-08-05 17:04:49 +00:00
Rohit Gupta 8baec1a191
Fix shuffle for distributed sampler (#2789)
* Fix shuffle for distributed sampler

* add test

* test

* chlog

* update test

* update test

* update test

* assertions via callback

* define callback outside for pickling

* skip ddp test on windows

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2020-08-01 23:22:57 -04:00
Nathan Raw 9076551aec
Enable val/test loop disabling + datamodule tests (#2692)
* 🎨 warn instead of error out on loaders

* 🐛 test misconfiguration should still fail

* 🚧 .

* updated docs with new result obj

* updated docs with new result obj

* updated docs with new result obj

* updated docs with new result obj

* updated docs with new result obj

* updated docs with new result obj

* updated docs with new result obj

* updated docs with new result obj

* updated docs with new result obj

* updated docs with new result obj

* updated docs with new result obj

* updated docs with new result obj

* updated docs with new result obj

* updated docs with new result obj

* updated docs with new result obj

* updated docs with new result obj

* updated docs with new result obj

* updated docs with new result obj

* updated docs with new result obj

* updated docs with new result obj

* updated docs with new result obj

* updated docs with new result obj

* updated docs with new result obj

* updated docs with new result obj

* updated docs with new result obj

* updated docs with new result obj

* updated docs with new result obj

* updated docs with new result obj

* updated docs with new result obj

* updated docs with new result obj

* updated docs with new result obj

* updated docs with new result obj

* updated docs with new result obj

* updated docs with new result obj

* updated docs with new result obj

Co-authored-by: William Falcon <waf2107@columbia.edu>
2020-07-25 12:57:40 -04:00
William Falcon 11069c8784
Fix ddp tests + .test() (#2512)
* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* fix deprecation warnings

* added base tests for tpu

* added base tests for tpu

* Update pytorch_lightning/trainer/trainer.py

Co-authored-by: Jeremy Jordan <13970565+jeremyjordan@users.noreply.github.com>

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

* added base tests for tpu

Co-authored-by: Jirka <jirka@pytorchlightning.ai>
Co-authored-by: Jeremy Jordan <13970565+jeremyjordan@users.noreply.github.com>
2020-07-07 12:24:56 -04:00