:orphan:

.. _gpu_faq:

GPU training (FAQ)
==================

******************************************************************
How should I adjust the learning rate when using multiple devices?
******************************************************************

When using distributed training, make sure to adjust your learning rate according to your effective
batch size.

Let's say you have a batch size of 7 in your dataloader.

.. testcode::

    class LitModel(LightningModule):
        def train_dataloader(self):
            # each batch from this loader contains 7 samples
            return DataLoader(..., batch_size=7)

In DDP, DDP_SPAWN, DeepSpeed, DDP_SHARDED, or Horovod, your effective batch size will be 7 * devices * num_nodes.

.. code-block:: python

    # effective batch size = 7 * 8
    Trainer(accelerator="gpu", devices=8, strategy="ddp")
    Trainer(accelerator="gpu", devices=8, strategy="ddp_spawn")
    Trainer(accelerator="gpu", devices=8, strategy="ddp_sharded")
    Trainer(accelerator="gpu", devices=8, strategy="horovod")

    # effective batch size = 7 * 8 * 10
    Trainer(accelerator="gpu", devices=8, num_nodes=10, strategy="ddp")
    Trainer(accelerator="gpu", devices=8, num_nodes=10, strategy="ddp_spawn")
    Trainer(accelerator="gpu", devices=8, num_nodes=10, strategy="ddp_sharded")
    Trainer(accelerator="gpu", devices=8, num_nodes=10, strategy="horovod")

In DDP2 or DP, your effective batch size will be 7 * num_nodes.
The reason is that with these strategies the full batch is visible to all GPUs on a node, so adding
GPUs within a node does not change it.

.. code-block:: python

    # effective batch size = 7
    Trainer(accelerator="gpu", devices=8, strategy="ddp2")
    Trainer(accelerator="gpu", devices=8, strategy="dp")

    # effective batch size = 7 * 10
    Trainer(accelerator="gpu", devices=8, num_nodes=10, strategy="ddp2")
    Trainer(accelerator="gpu", devices=8, num_nodes=10, strategy="dp")

.. note:: Very large batch sizes can actually hurt convergence. Check out:
    `Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour <https://arxiv.org/abs/1706.02677>`_
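
A common heuristic is the linear scaling rule from the paper cited above: grow the learning rate in
proportion to the effective batch size. Below is a minimal sketch, assuming your learning rate was
originally tuned on a single device (``base_lr`` and ``base_batch_size`` are hypothetical names for
illustration, not Lightning parameters):

.. code-block:: python

    base_lr = 0.1  # learning rate tuned for the original batch size
    base_batch_size = 7  # batch size the learning rate was tuned with

    devices, num_nodes = 8, 10
    # e.g. with strategy="ddp", every device processes its own batch of 7
    effective_batch_size = base_batch_size * devices * num_nodes

    # linear scaling rule: scale the learning rate by the same factor as the batch size
    scaled_lr = base_lr * effective_batch_size / base_batch_size
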
----

*********************************************************
How do I use multiple GPUs on Jupyter or Colab notebooks?
*********************************************************

To use multiple GPUs in notebooks, use the *DP* mode.

.. code-block:: python

    Trainer(accelerator="gpu", devices=4, strategy="dp")

If you want to use other strategies, please launch your training via the command line.

.. note:: Learn how to :ref:`access a cloud machine with multiple GPUs <grid_cloud_session_basic>` in this guide.

----

*****************************************************
I'm getting errors related to Pickling. What do I do?
*****************************************************

Pickle is Python's mechanism for serializing and deserializing data. Most distributed modes require
your code to be fully pickle-compliant. If you run into a pickling issue, try the following to figure
out what can't be pickled:

.. code-block:: python

    import pickle

    model = YourModel()
    pickle.dumps(model)
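
If the call above raises an error, a small helper like the hypothetical ``find_unpicklable_attributes``
below (an illustration, not part of Lightning) can narrow down which attribute is responsible:

.. code-block:: python

    import pickle


    def find_unpicklable_attributes(obj):
        # try to pickle each attribute individually and report the ones that fail
        for name, value in vars(obj).items():
            try:
                pickle.dumps(value)
            except Exception as err:
                print(f"{name!r} cannot be pickled: {err}")


    find_unpicklable_attributes(model)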

If you use ``ddp``, your code doesn't need to be pickled.

.. code-block:: python

    Trainer(accelerator="gpu", devices=4, strategy="ddp")

If you use ``ddp_spawn``, the pickling requirement remains. This is a limitation of Python.

.. code-block:: python

    Trainer(accelerator="gpu", devices=4, strategy="ddp_spawn")