added single gpu train doc

William Falcon 2019-08-07 14:14:23 -04:00
parent 73b50abb57
commit cca6d2c65d
1 changed file with 13 additions and 0 deletions


@@ -23,6 +23,19 @@ have configuration issues depending on your cluster.
For a deeper understanding of what Lightning is doing, feel free to read [this guide](https://medium.com/@_willfalcon/9-tips-for-training-lightning-fast-neural-networks-in-pytorch-8e63a502f565).
---
#### Distributed and 16-bit precision
Due to an issue between apex and DataParallel (a PyTorch and NVIDIA issue), Lightning does
not allow 16-bit precision with DP training. We tried to get this to work, but it's an issue on their end.
The table below shows the supported combinations; a short usage sketch follows it.
| 1 GPU | 1+ GPUs | DP | DDP | 16-bit | command |
|---|---|---|---|---|---|
| Y | | | | Y | ```Trainer(gpus=[0])``` |
| | Y | Y | | | ```Trainer(gpus=[0, ...])``` |
| | Y | | Y | | ```Trainer(gpus=[0, ...], distributed_backend='ddp')``` |
| | Y | | Y | Y | ```Trainer(gpus=[0, ...], distributed_backend='ddp', use_amp=True)``` |
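For concreteness, here is a minimal sketch of the last row above. `MyLightningModule` is a hypothetical placeholder for your own LightningModule subclass; the Trainer flags are exactly those from the table.

```python
from pytorch_lightning import Trainer

# hypothetical module -- substitute your own LightningModule subclass
model = MyLightningModule()

# DDP across GPUs 0 and 1 with 16-bit (amp) enabled, per the last table row
trainer = Trainer(gpus=[0, 1], distributed_backend='ddp', use_amp=True)
trainer.fit(model)
```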
---
#### CUDA flags
CUDA flags make certain GPUs visible to your script.
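As a sketch, GPU visibility is typically restricted with the standard CUDA environment variables, set before the first CUDA call (e.g., at the very top of your script):

```python
import os

# must run before CUDA is initialized (i.e., before any CUDA call in torch)
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"  # order GPUs by PCI bus ID
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"      # expose only GPUs 0 and 1 to this script
```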