From d273271b4be4d778210635327ada4c7acc5d8372 Mon Sep 17 00:00:00 2001
From: William Falcon
Date: Sun, 21 Jul 2019 08:29:12 -0400
Subject: [PATCH] updated docs

---
 docs/Trainer/Distributed training.md | 33 ++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/docs/Trainer/Distributed training.md b/docs/Trainer/Distributed training.md
index 42f42d1bff..5a04252ba8 100644
--- a/docs/Trainer/Distributed training.md
+++ b/docs/Trainer/Distributed training.md
@@ -3,6 +3,26 @@ Lightning makes multi-gpu training and 16 bit training trivial.
 *Note:* None of the flags below require changing anything about your lightningModel definition.
 
+---
+#### Choosing a backend
+Lightning supports two backends: DataParallel and DistributedDataParallel. Both can be used for single-node multi-GPU training.
+For multi-node training you must use DistributedDataParallel.
+
+You can toggle between the two modes by setting this flag.
+``` {.python}
+# DEFAULT uses DataParallel
+trainer = Trainer(distributed_backend='dp')
+
+# change to DistributedDataParallel
+trainer = Trainer(distributed_backend='ddp')
+```
+
+If you request multiple nodes, the backend will auto-switch to ddp.
+We recommend you use DistributedDataParallel even for single-node multi-GPU training. It is MUCH faster than DP but *may*
+have configuration issues depending on your cluster.
+
+For a deeper understanding of what Lightning is doing, feel free to read [this guide](https://medium.com/@_willfalcon/9-tips-for-training-lightning-fast-neural-networks-in-pytorch-8e63a502f565).
+
 ---
 #### 16-bit mixed precision
 16 bit precision can cut your memory footprint by half.
 If using volta architecture GPUs it can give a dramatic training speed-up as well.
@@ -67,6 +87,19 @@ cluster.per_experiment_nb_gpus = 8
 cluster.add_slurm_cmd(cmd='ntasks-per-node', value=8, comment='1 task per gpu')
 ```
 
+Finally, make sure to add a distributed sampler to your DataLoader.
+
+```python
+# i.e., this:
+dataset = MyDataset()
+dataloader = DataLoader(dataset)
+
+# becomes:
+dataset = MyDataset()
+dist_sampler = torch.utils.data.distributed.DistributedSampler(dataset)
+dataloader = DataLoader(dataset, sampler=dist_sampler)
+```
+
 ---
 #### Self-balancing architecture
 Here lightning distributes parts of your module across available GPUs to optimize for speed and memory.
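
For reference, here is the DistributedSampler pattern from the second hunk expanded into a self-contained sketch. The `TensorDataset` stand-in, the batch size, and the explicit `num_replicas`/`rank` values are illustrative assumptions (`MyDataset()` in the patch is a user-defined placeholder); in a real ddp or SLURM job the rank and world size would come from the initialized process group or the environment.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

# Stand-in for the user-defined dataset the patch refers to as MyDataset().
dataset = TensorDataset(torch.randn(1000, 32), torch.randint(0, 10, (1000,)))

# Plain loader: every process would iterate over every sample.
dataloader = DataLoader(dataset, batch_size=64)

# Distributed loader: each process iterates over its own shard of the dataset.
# num_replicas/rank are passed explicitly here so the snippet runs standalone;
# inside an initialized torch.distributed process group they can be omitted.
dist_sampler = DistributedSampler(dataset, num_replicas=2, rank=0)
dataloader = DataLoader(dataset, batch_size=64, sampler=dist_sampler)

for x, y in dataloader:
    pass  # training step goes here
```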