lightning/docs/Trainer/Distributed training.md


Lightning makes multi-GPU training and 16-bit precision training trivial.

Note: none of the flags below require changing anything about your LightningModule definition.


### 16-bit mixed precision

16-bit precision can cut your memory footprint roughly in half. On Volta-architecture GPUs it can also give a dramatic training speed-up.
First, install NVIDIA apex (if the install fails, check the apex repository for troubleshooting):

```bash
$ git clone https://github.com/NVIDIA/apex
$ cd apex
$ pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
```

Then set the `use_amp` flag to `True` when constructing the `Trainer` (it defaults to `False`):

```python
# DEFAULT (16-bit off)
trainer = Trainer(amp_level='O2', use_amp=False)
```
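
To turn 16-bit training on, flip the flag using the same arguments shown above:

```python
# enable 16-bit mixed precision training
trainer = Trainer(amp_level='O2', use_amp=True)
```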

### Single GPU

Make sure you're on a GPU machine.

```python
import os

# set these flags to expose only the first GPU
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

# train on GPU 0
trainer = Trainer(gpus=[0])
```
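
A minimal usage sketch, assuming `model` is an instance of your own LightningModule and `Trainer` is imported from the package as in the rest of these docs:

```python
# assumption: Trainer is importable from the top-level package
from pytorch_lightning import Trainer

trainer = Trainer(gpus=[0])

# model is assumed to be your own LightningModule instance
trainer.fit(model)
```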

### Multi-GPU

Make sure you're on a GPU machine. You can request as many GPUs as you want; in the example below, the model runs on all 8 GPUs at once using DataParallel under the hood.

```python
import os

# set these flags to expose all 8 GPUs
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2,3,4,5,6,7"

# train on 8 GPUs
trainer = Trainer(gpus=[0, 1, 2, 3, 4, 5, 6, 7])
```
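
You don't have to use every visible GPU; pass only the device indices you want to the same `gpus` argument (the indices below are just for illustration):

```python
# train on only the first two of the visible GPUs
trainer = Trainer(gpus=[0, 1])
```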

### Multi-node

COMING SOON.


### Self-balancing architecture

Here, Lightning distributes parts of your module across the available GPUs to optimize for speed and memory.

COMING SOON.