From 9317fbfc259e820af49e3eeb323ba88b362adbfe Mon Sep 17 00:00:00 2001
From: Tobias
Date: Tue, 5 Oct 2021 09:12:26 +0200
Subject: [PATCH] Make DDP and Horovod batch_size scaling examples explicit (#9813)

Co-authored-by: Rohit Gupta
---
 docs/source/advanced/multi_gpu.rst | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/docs/source/advanced/multi_gpu.rst b/docs/source/advanced/multi_gpu.rst
index 7497344bc6..ee689e1611 100644
--- a/docs/source/advanced/multi_gpu.rst
+++ b/docs/source/advanced/multi_gpu.rst
@@ -611,16 +611,17 @@ Let's say you have a batch size of 7 in your dataloader.
     def train_dataloader(self):
         return Dataset(..., batch_size=7)
 
-In (DDP, Horovod) your effective batch size will be 7 * gpus * num_nodes.
+In DDP or Horovod your effective batch size will be 7 * gpus * num_nodes.
 
 .. code-block:: python
 
     # effective batch size = 7 * 8
-    Trainer(gpus=8, accelerator="ddp|horovod")
+    Trainer(gpus=8, accelerator="ddp")
+    Trainer(gpus=8, accelerator="horovod")
 
     # effective batch size = 7 * 8 * 10
-    Trainer(gpus=8, num_nodes=10, accelerator="ddp|horovod")
-
+    Trainer(gpus=8, num_nodes=10, accelerator="ddp")
+    Trainer(gpus=8, num_nodes=10, accelerator="horovod")
 
 In DDP2, your effective batch size will be 7 * num_nodes.
 The reason is that the full batch is visible to all GPUs on the node when using DDP2.
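
For comparison with the examples in the patch above, a minimal sketch of the DDP2 case described in
the unchanged context lines, assuming the same dataloader batch size of 7 and that ``"ddp2"`` is an
accepted accelerator string in this version of Lightning:

.. code-block:: python

    from pytorch_lightning import Trainer

    # In DDP2 the full batch is visible to all GPUs on a node, so only
    # num_nodes multiplies the dataloader batch size (gpus does not).
    # effective batch size = 7 * 10
    Trainer(gpus=8, num_nodes=10, accelerator="ddp2")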