From cca6d2c65dab5dd7e3c415f159faddd3432d035e Mon Sep 17 00:00:00 2001
From: William Falcon
Date: Wed, 7 Aug 2019 14:14:23 -0400
Subject: [PATCH] added single gpu train doc

---
 docs/Trainer/Distributed training.md | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/docs/Trainer/Distributed training.md b/docs/Trainer/Distributed training.md
index cafc719df0..b59f26c890 100644
--- a/docs/Trainer/Distributed training.md
+++ b/docs/Trainer/Distributed training.md
@@ -23,6 +23,19 @@
 have configuration issues depending on your cluster.
 For a deeper understanding of what lightning is doing, feel free to read [this guide](https://medium.com/@_willfalcon/9-tips-for-training-lightning-fast-neural-networks-in-pytorch-8e63a502f565).
 
+---
+#### Distributed and 16-bit precision
+Due to an issue between apex and DataParallel (a PyTorch/NVIDIA issue), Lightning does
+not allow 16-bit and DP training together. We tried to get this to work, but it's an issue on their end.
+
+| 1 GPU | 1+ GPUs | DP | DDP | 16-bit | command |
+|---|---|---|---|---|---|
+| Y | | | | Y | ```Trainer(gpus=[0], use_amp=True)``` |
+| | Y | Y | | | ```Trainer(gpus=[0, ...])``` |
+| | Y | | Y | | ```Trainer(gpus=[0, ...], distributed_backend='ddp')``` |
+| | Y | | Y | Y | ```Trainer(gpus=[0, ...], distributed_backend='ddp', use_amp=True)``` |
+
+
 ---
 #### CUDA flags
 CUDA flags make certain GPUs visible to your script.
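
As a usage note for the table in this patch: below is a minimal sketch of how those ``Trainer`` flags are typically combined in a training script. The arguments (``gpus``, ``distributed_backend``, ``use_amp``) are the ones listed in the table; ``MyLightningModule`` and the ``trainer.fit(model)`` boilerplate are assumed context, not part of this patch.

```python
# Sketch only: assumes a LightningModule subclass named MyLightningModule
# is defined elsewhere; the Trainer arguments mirror the table above.
from pytorch_lightning import Trainer

model = MyLightningModule()  # hypothetical model, not defined in this patch

# Multi-GPU DDP with 16-bit (apex/amp) precision -- the last row of the table.
trainer = Trainer(gpus=[0, 1], distributed_backend='ddp', use_amp=True)
trainer.fit(model)
```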