updated docs

This commit is contained in:
William Falcon 2019-07-21 08:29:12 -04:00
parent babaa088d7
commit d273271b4b
1 changed file with 33 additions and 0 deletions

@@ -3,6 +3,26 @@ Lightning makes multi-GPU training and 16-bit training trivial.
*Note:*
None of the flags below require changing anything about your LightningModule definition.
---
#### Choosing a backend
Lightning supports two backends: DataParallel and DistributedDataParallel. Both can be used for single-node multi-GPU training.
For multi-node training you must use DistributedDataParallel.
You can toggle between the two modes by setting this flag.
``` {.python}
# DEFAULT uses DataParallel
trainer = Trainer(distributed_backend='dp')
# change to distributed data parallel
trainer = Trainer(distributed_backend='ddp')
```
If you request multiple nodes, the backend will auto-switch to ddp.
We recommend using DistributedDataParallel even for single-node multi-GPU training. It is MUCH faster than DP but *may*
have configuration issues depending on your cluster.
For a deeper understanding of what Lightning is doing, feel free to read [this guide](https://medium.com/@_willfalcon/9-tips-for-training-lightning-fast-neural-networks-in-pytorch-8e63a502f565).
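For example, a multi-node run might look like the sketch below. The `gpus` and `nb_gpu_nodes` argument names are assumptions about the Trainer interface of this era and may differ between versions.
```python
# a sketch of a multi-node request; argument names are assumptions
trainer = Trainer(
    distributed_backend='ddp',  # required (and auto-selected) for multi-node
    gpus=8,                     # GPUs per node
    nb_gpu_nodes=4              # number of nodes
)
```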
---
#### 16-bit mixed precision
16-bit precision can cut your memory footprint roughly in half. On Volta-architecture GPUs it can also give a dramatic training speed-up.
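As a rough sketch of how this is enabled: Lightning at this time delegated mixed precision to NVIDIA apex, and the `use_amp` and `amp_level` flags below are assumptions about that era's Trainer interface, not a confirmed API.
```python
# a sketch of turning on 16-bit training; use_amp and amp_level are
# assumed flag names for the apex-based Trainer of this era
trainer = Trainer(use_amp=True, amp_level='O2')
```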
@@ -67,6 +87,19 @@ cluster.per_experiment_nb_gpus = 8
cluster.add_slurm_cmd(cmd='ntasks-per-node', value=8, comment='1 task per gpu')
```
Finally, make sure to add a distributed sampler to your DataLoader.
```python
import torch
from torch.utils.data import DataLoader

# ie: this (MyDataset is your own Dataset subclass):
dataset = MyDataset()
dataloader = DataLoader(dataset)

# becomes:
dataset = MyDataset()
dist_sampler = torch.utils.data.distributed.DistributedSampler(dataset)
dataloader = DataLoader(dataset, sampler=dist_sampler)
```
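With ddp each process runs its own copy of the training loop, so the DistributedSampler is what guarantees every process draws a distinct shard of the dataset instead of duplicating work.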
---
#### Self-balancing architecture
Here, Lightning distributes parts of your module across the available GPUs to optimize for speed and memory.
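The snippet below is not Lightning's API; it is a minimal plain-PyTorch sketch of the underlying idea, splitting a hypothetical two-part model across two GPUs by hand.
```python
import torch
from torch import nn

# a hand-rolled sketch of model parallelism (not Lightning's API):
# the encoder lives on GPU 0, the decoder on GPU 1, and activations
# are moved between devices inside forward()
class TwoDeviceModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(784, 256), nn.ReLU()).to('cuda:0')
        self.decoder = nn.Linear(256, 10).to('cuda:1')

    def forward(self, x):
        x = self.encoder(x.to('cuda:0'))
        x = self.decoder(x.to('cuda:1'))
        return x
```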