.. testsetup:: *

    import torch
    from torch.utils.data import DataLoader
    from pytorch_lightning.core.lightning import LightningModule

.. _multiple_loaders:

Multiple Datasets
=================

Lightning supports multiple dataloaders in a few ways.

1. Create a dataloader that iterates multiple datasets under the hood.
2. In the training loop, you can pass multiple loaders as a dict or list/tuple and Lightning
   will automatically combine the batches from the different loaders.
3. In the validation and test loop, you also have the option to return multiple dataloaders,
   which Lightning will call sequentially.

----------

Multiple training dataloaders
-----------------------------

For training, the usual way to use multiple datasets is to create a ``Dataset`` class that
wraps them and then pass that wrapper to a single ``DataLoader`` (this of course also works
for the validation and test datasets).
(`reference <https://discuss.pytorch.org/t/train-simultaneously-on-two-datasets/649/2>`_)

.. testcode::

    class ConcatDataset(torch.utils.data.Dataset):
        def __init__(self, *datasets):
            self.datasets = datasets

        def __getitem__(self, i):
            return tuple(d[i] for d in self.datasets)

        def __len__(self):
            return min(len(d) for d in self.datasets)


    class LitModel(LightningModule):

        def train_dataloader(self):
            concat_dataset = ConcatDataset(
                datasets.ImageFolder(traindir_A),
                datasets.ImageFolder(traindir_B)
            )

            loader = torch.utils.data.DataLoader(
                concat_dataset,
                batch_size=args.batch_size,
                shuffle=True,
                num_workers=args.workers,
                pin_memory=True
            )
            return loader

        def val_dataloader(self):
            # SAME
            ...

        def test_dataloader(self):
            # SAME
            ...

However, with Lightning you can also return multiple loaders and Lightning will take care of batch combination.
For more details, please have a look at :attr:`~pytorch_lightning.trainer.trainer.Trainer.multiple_trainloader_mode`.
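
The combination behaviour can be configured on the Trainer. Below is a minimal sketch, assuming
the documented mode names (``'max_size_cycle'`` / ``'min_size'``) are available in your version:

.. code-block:: python

    from pytorch_lightning import Trainer

    # assumption: 'max_size_cycle' keeps cycling the shorter loaders
    # until the longest loader is exhausted
    trainer = Trainer(multiple_trainloader_mode='max_size_cycle')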

.. testcode::

    class LitModel(LightningModule):

        def train_dataloader(self):

            loader_a = torch.utils.data.DataLoader(range(6), batch_size=4)
            loader_b = torch.utils.data.DataLoader(range(15), batch_size=5)

            # pass loaders as a dict. This will create batches like this:
            # {'a': batch from loader_a, 'b': batch from loader_b}
            loaders = {'a': loader_a,
                       'b': loader_b}

            # OR:
            # pass loaders as sequence. This will create batches like this:
            # [batch from loader_a, batch from loader_b]
            loaders = [loader_a, loader_b]

            return loaders
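
For illustration, here is a minimal sketch of how the combined batch could be consumed in
``training_step``, assuming the dict form from above (the actual loss computation is elided):

.. testcode::

    def training_step(self, batch, batch_idx):
        # with the dict form above, ``batch`` is itself a dict of batches:
        # batch['a'] comes from loader_a, batch['b'] comes from loader_b
        batch_a = batch['a']
        batch_b = batch['b']
        ...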

----------

Test/Val dataloaders
--------------------

For validation and test dataloaders, Lightning also gives you the additional
option of returning multiple dataloaders from each call.

See the following for more details:

- :meth:`~pytorch_lightning.core.datamodule.LightningDataModule.val_dataloader`
- :meth:`~pytorch_lightning.core.datamodule.LightningDataModule.test_dataloader`

.. testcode::

    def val_dataloader(self):
        loader_1 = DataLoader(...)
        loader_2 = DataLoader(...)
        return [loader_1, loader_2]
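
When you return multiple validation loaders, Lightning calls them sequentially and passes the
loader index to each step. Below is a minimal sketch of the matching ``validation_step``,
assuming the standard ``dataloader_idx`` argument:

.. testcode::

    def validation_step(self, batch, batch_idx, dataloader_idx):
        # dataloader_idx is 0 for batches from loader_1 and 1 for batches from loader_2
        ...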