lightning/docs/Trainer/Distributed training.md

Lightning makes multi-GPU training and 16-bit training trivial.

*Note:*   
None of the flags below require changing anything about your lightningModel definition. 

---
#### 16-bit mixed precision
16-bit precision can cut your memory footprint by up to half. On Volta-architecture GPUs it can also give a dramatic training speed-up.  
First, install apex (if install fails, look [here](https://github.com/NVIDIA/apex)):
```bash
$ git clone https://github.com/NVIDIA/apex
$ cd apex
$ pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
```

Then set `use_amp` to `True` (the snippet below shows the defaults).
```python
# DEFAULT
trainer = Trainer(amp_level='O2', use_amp=False)
```
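
To make the memory claim above concrete, here is a back-of-the-envelope sketch using only the standard library. It compares the storage for the same number of values in 32-bit and 16-bit floats. The model size (`num_params`) and the helper `tensor_megabytes` are assumptions chosen for illustration; real mixed-precision training may keep some tensors in 32 bits, so actual savings can be smaller.

```python
import struct

# Bytes per element, taken from the C struct sizes for IEEE 754 floats.
BYTES_FP32 = struct.calcsize("f")  # single precision
BYTES_FP16 = struct.calcsize("e")  # half precision


def tensor_megabytes(num_elements, bytes_per_element):
    """Return the storage needed for num_elements values, in MiB."""
    return num_elements * bytes_per_element / (1024 ** 2)


# Hypothetical model size, chosen only for illustration.
num_params = 100_000_000

fp32_mb = tensor_megabytes(num_params, BYTES_FP32)
fp16_mb = tensor_megabytes(num_params, BYTES_FP16)
print(f"fp32: {fp32_mb:.1f} MiB, fp16: {fp16_mb:.1f} MiB")
```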

---
#### Single-GPU
Make sure you're on a GPU machine. 
```python
import os

# set these flags before CUDA is initialized
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

# DEFAULT
trainer = Trainer(gpus=[0])
```
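
The two environment variables follow the same pattern whether you use one GPU or several, so they can be wrapped in a small helper. The sketch below is one possible way to do it; `expose_gpus` is a hypothetical name, not part of Lightning, and it assumes you want GPUs numbered from 0 upward. The variables should be set before anything initializes CUDA in the process.

```python
import os


def expose_gpus(num_gpus):
    """Hypothetical helper: expose GPUs 0..num_gpus-1 to this process.

    Returns the list of ids in the form the examples pass to Trainer(gpus=...).
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    gpu_ids = list(range(num_gpus))
    # Order devices by PCI bus id so the indices match what nvidia-smi reports.
    os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    return gpu_ids


gpu_ids = expose_gpus(1)
```

The returned list can then be passed as `Trainer(gpus=gpu_ids)`.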

---
#### Multi-GPU
Make sure you're on a GPU machine. You can use as many GPUs as the machine has.
In the example below, the model runs on all 8 GPUs at once, using DataParallel under the hood.
```python
import os

# set these flags before CUDA is initialized
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2,3,4,5,6,7"

# DEFAULT
trainer = Trainer(gpus=[0,1,2,3,4,5,6,7])
```
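
Conceptually, DataParallel replicates the model on every GPU and splits each input batch along the batch dimension, so each GPU processes one slice. The pure-Python sketch below models only the splitting step; `split_batch` is a hypothetical helper written for illustration, not the actual PyTorch implementation, and the batch size of 32 is an assumption.

```python
import math


def split_batch(batch, num_devices):
    """Split a batch into at most num_devices contiguous chunks.

    Simplified model of a DataParallel-style scatter: each chunk holds
    ceil(len(batch) / num_devices) items, except possibly the last.
    """
    if num_devices < 1:
        raise ValueError("num_devices must be at least 1")
    if not batch:
        return []
    chunk_size = math.ceil(len(batch) / num_devices)
    return [batch[i:i + chunk_size] for i in range(0, len(batch), chunk_size)]


# Hypothetical batch of 32 sample indices spread over 8 devices.
batch = list(range(32))
chunks = split_batch(batch, 8)
print([len(c) for c in chunks])
```

Because each GPU only sees a slice, a batch size that is a multiple of the GPU count keeps the work evenly balanced.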

---
#### Multi-node
COMING SOON.

---
#### Self-balancing architecture
Here Lightning distributes parts of your module across the available GPUs to optimize for speed and memory.

COMING SOON.

debugging and gpu guide 2019-06-27 18:22:00 +00:00