Make DDP and Horovod batch_size scaling examples explicit (#9813)

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Tobias 2021-10-05 09:12:26 +02:00 committed by GitHub
parent 3392215ef6
commit 9317fbfc25
1 changed file with 5 additions and 4 deletions


@@ -611,16 +611,17 @@ Let's say you have a batch size of 7 in your dataloader.
     def train_dataloader(self):
         return Dataset(..., batch_size=7)
 
-In (DDP, Horovod) your effective batch size will be 7 * gpus * num_nodes.
+In DDP or Horovod your effective batch size will be 7 * gpus * num_nodes.
 
 .. code-block:: python
 
     # effective batch size = 7 * 8
-    Trainer(gpus=8, accelerator="ddp|horovod")
+    Trainer(gpus=8, accelerator="ddp")
+    Trainer(gpus=8, accelerator="horovod")
 
     # effective batch size = 7 * 8 * 10
-    Trainer(gpus=8, num_nodes=10, accelerator="ddp|horovod")
+    Trainer(gpus=8, num_nodes=10, accelerator="ddp")
+    Trainer(gpus=8, num_nodes=10, accelerator="horovod")
 
 In DDP2, your effective batch size will be 7 * num_nodes.
 The reason is that the full batch is visible to all GPUs on the node when using DDP2.
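For reference, a minimal sketch of the scaling arithmetic this doc change describes, as plain Python (variable names are illustrative, not part of the Lightning API; the numbers match the 7 * 8 * 10 example in the diff):

    # Effective batch size arithmetic from the documentation examples above.
    per_device_batch_size = 7  # batch_size passed to the dataloader
    gpus = 8                   # GPUs per node
    num_nodes = 10

    # DDP / Horovod: one process per GPU, and each process loads its own
    # batch of 7, so batches multiply across all GPUs and all nodes.
    ddp_effective = per_device_batch_size * gpus * num_nodes
    print(ddp_effective)  # 560

    # DDP2: the full batch is visible to all GPUs on a node, so only the
    # number of nodes scales the effective batch size.
    ddp2_effective = per_device_batch_size * num_nodes
    print(ddp2_effective)  # 70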