From 9317fbfc259e820af49e3eeb323ba88b362adbfe Mon Sep 17 00:00:00 2001
From: Tobias
Date: Tue, 5 Oct 2021 09:12:26 +0200
Subject: [PATCH] Make DDP and Horovod batch_size scaling examples explicit (#9813)

Co-authored-by: Rohit Gupta
---
 docs/source/advanced/multi_gpu.rst | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/docs/source/advanced/multi_gpu.rst b/docs/source/advanced/multi_gpu.rst
index 7497344bc6..ee689e1611 100644
--- a/docs/source/advanced/multi_gpu.rst
+++ b/docs/source/advanced/multi_gpu.rst
@@ -611,16 +611,17 @@ Let's say you have a batch size of 7 in your dataloader.
     def train_dataloader(self):
         return Dataset(..., batch_size=7)
 
-In (DDP, Horovod) your effective batch size will be 7 * gpus * num_nodes.
+In DDP or Horovod your effective batch size will be 7 * gpus * num_nodes.
 
 .. code-block:: python
 
     # effective batch size = 7 * 8
-    Trainer(gpus=8, accelerator="ddp|horovod")
+    Trainer(gpus=8, accelerator="ddp")
+    Trainer(gpus=8, accelerator="horovod")
 
     # effective batch size = 7 * 8 * 10
-    Trainer(gpus=8, num_nodes=10, accelerator="ddp|horovod")
-
+    Trainer(gpus=8, num_nodes=10, accelerator="ddp")
+    Trainer(gpus=8, num_nodes=10, accelerator="horovod")
 
 In DDP2, your effective batch size will be 7 * num_nodes.
 The reason is that the full batch is visible to all GPUs on the node when using DDP2.
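
For comparison with the examples in the patch above, a minimal sketch of the DDP2 case described in
the unchanged context lines, assuming the same dataloader batch size of 7 and that ``"ddp2"`` is an
accepted accelerator string in this version of Lightning:

.. code-block:: python

    from pytorch_lightning import Trainer

    # In DDP2 the full batch is visible to all GPUs on a node, so only
    # num_nodes multiplies the dataloader batch size (gpus does not).
    # effective batch size = 7 * 10
    Trainer(gpus=8, num_nodes=10, accelerator="ddp2")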