Document limitations of multi-GPU in Jupyter notebooks (#18132)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Adrian Wälchli 2023-07-24 21:22:16 +02:00 committed by GitHub
parent 0e7e6b31c5
commit 6552d29a12
5 changed files with 198 additions and 0 deletions


@@ -26,3 +26,62 @@ If you want to use multiprocessing, for example, multi-GPU, you can put your code
As you can see, this function accepts one argument, the ``Fabric`` object, and it gets launched on as many devices as specified.
----

*********************
Multi-GPU Limitations
*********************

The multi-GPU capabilities in Jupyter are enabled by launching processes with the ``fork`` start method.
It is the only supported way of multiprocessing in notebooks, but it also brings some limitations that you should be aware of.

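To make the mechanism concrete, here is a minimal sketch of what ``fork``-based launching looks like with raw ``torch.multiprocessing``. This is illustrative only; ``fabric.launch`` handles the process management for you, and the ``work`` function is just a stand-in:

.. code-block:: python

    import torch.multiprocessing as mp


    def work(rank):
        # Each worker is forked from the notebook kernel and inherits its
        # interpreter state, so no main script needs to be re-imported
        print(f"process {rank} started via fork")


    # fn is called as fn(i) for i in range(nprocs)
    mp.start_processes(work, nprocs=2, start_method="fork")
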
Avoid initializing CUDA before launch
=====================================

Don't run torch CUDA functions in any notebook cell before calling ``fabric.launch(train)``; otherwise, your code may hang or crash.

.. code-block:: python

    # BAD: Don't run CUDA-related code before `.launch()`
    # x = torch.tensor(1).cuda()
    # torch.cuda.empty_cache()
    # torch.cuda.is_available()


    def train(fabric):
        # GOOD: Move CUDA calls into the training function
        x = torch.tensor(1).cuda()
        torch.cuda.empty_cache()
        torch.cuda.is_available()
        ...


    fabric = Fabric(accelerator="cuda", devices=2)
    fabric.launch(train)

Move data loading code inside the function
==========================================

If you define/load your data in the main process before calling ``fabric.launch(train)``, you may see a slowdown or crashes (segmentation fault, SIGSEGV, etc.).
The best practice is to move your data loading code inside the training function to avoid these issues:

.. code-block:: python

    # BAD: Don't load data in the main process
    # dataset = MyDataset("data/")
    # dataloader = torch.utils.data.DataLoader(dataset)


    def train(fabric):
        # GOOD: Move data loading code into the training function
        dataset = MyDataset("data/")
        dataloader = torch.utils.data.DataLoader(dataset)
        ...


    fabric = Fabric(accelerator="cuda", devices=2)
    fabric.launch(train)


@@ -48,6 +48,7 @@ To use multiple GPUs on notebooks, use the *DDP_NOTEBOOK* mode.
Trainer(accelerator="gpu", devices=4, strategy="ddp_notebook")
If you want to use other strategies, please launch your training via the command-shell.
See also: :doc:`../../common/notebooks`
----


@@ -138,6 +138,13 @@ How-to Guides
   :col_css: col-md-4
   :height: 180

.. displayitem::
   :header: Train in a notebook
   :description: Train models in interactive notebooks (Jupyter, Colab, Kaggle, etc.)
   :col_css: col-md-4
   :button_link: ../common/notebooks.html
   :height: 180

.. displayitem::
   :header: Train on single or multiple GPUs
   :description: Train models faster with GPU accelerators


@@ -0,0 +1,124 @@
:orphan:

.. _jupyter_notebooks:

##############################################
Interactive Notebooks (Jupyter, Colab, Kaggle)
##############################################

**Audience:** Users looking to train models in interactive notebooks (Jupyter, Colab, Kaggle, etc.).

----

**********************
Lightning in notebooks
**********************

You can use the Lightning Trainer in interactive notebooks just like in a regular Python script, including multi-GPU training!

.. code-block:: python

    import lightning as L

    # Works in Jupyter, Colab and Kaggle!
    trainer = L.Trainer(accelerator="auto", devices="auto")

You can find many notebook examples on our :doc:`tutorials page <../tutorials>` too!

----

.. _jupyter_notebook_example:

************
Full example
************

Paste the following code block into a notebook cell:

.. code-block:: python

    import lightning as L
    from torch import nn, optim, utils
    import torchvision

    encoder = nn.Sequential(nn.Linear(28 * 28, 64), nn.ReLU(), nn.Linear(64, 3))
    decoder = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 28 * 28))


    class LitAutoEncoder(L.LightningModule):
        def __init__(self, encoder, decoder):
            super().__init__()
            self.encoder = encoder
            self.decoder = decoder

        def training_step(self, batch, batch_idx):
            x, y = batch
            x = x.view(x.size(0), -1)
            z = self.encoder(x)
            x_hat = self.decoder(z)
            loss = nn.functional.mse_loss(x_hat, x)
            self.log("train_loss", loss)
            return loss

        def configure_optimizers(self):
            return optim.Adam(self.parameters(), lr=1e-3)

        def prepare_data(self):
            torchvision.datasets.MNIST(".", download=True)

        def train_dataloader(self):
            dataset = torchvision.datasets.MNIST(".", transform=torchvision.transforms.ToTensor())
            return utils.data.DataLoader(dataset, batch_size=64)


    autoencoder = LitAutoEncoder(encoder, decoder)
    trainer = L.Trainer(max_epochs=2, devices="auto")
    trainer.fit(model=autoencoder)

----

*********************
Multi-GPU Limitations
*********************

The multi-GPU capabilities in Jupyter are enabled by launching processes with the ``fork`` start method.
It is the only supported way of multiprocessing in notebooks, but it also brings some limitations that you should be aware of.

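If you are unsure which start method your interpreter uses, a quick check (a plain-Python snippet, not part of the Lightning API) is:

.. code-block:: python

    import multiprocessing as mp

    # On Linux the default is "fork"; "spawn" cannot be used from a notebook
    # because it would re-import a main script that doesn't exist in an
    # interactive session
    print(mp.get_start_method())
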
Avoid initializing CUDA before .fit()
=====================================

Don't run torch CUDA functions in any notebook cell before calling ``trainer.fit()``; otherwise, your code may hang or crash.

.. code-block:: python

    # BAD: Don't run CUDA-related code before `.fit()`
    x = torch.tensor(1).cuda()
    torch.cuda.empty_cache()
    torch.cuda.is_available()

    trainer = L.Trainer(accelerator="cuda", devices=2)
    trainer.fit(model)

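If you do need such calls, one possible fix (sketched here with a hypothetical ``LitModel`` subclass) is to defer them to a hook such as ``setup()``, which runs inside the launched processes:

.. code-block:: python

    # GOOD: hooks like setup() run after the worker processes are launched
    class LitModel(L.LightningModule):
        def setup(self, stage):
            torch.cuda.empty_cache()
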
Move data loading code inside the hooks
=======================================

If you define/load your data in the main process before calling ``trainer.fit()``, you may see a slowdown or crashes (segmentation fault, SIGSEGV, etc.).

.. code-block:: python

    # BAD: Don't load data in the main process
    dataset = MyDataset("data/")
    train_dataloader = torch.utils.data.DataLoader(dataset)

    trainer = L.Trainer(accelerator="cuda", devices=2)
    trainer.fit(model, train_dataloader)

The best practice is to move your data loading code inside the ``*_dataloader()`` hooks in the :class:`~lightning.pytorch.core.module.LightningModule` or :class:`~lightning.pytorch.core.datamodule.LightningDataModule` as shown in the :ref:`example above <jupyter_notebook_example>`.
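For instance, a minimal sketch (reusing the placeholder ``MyDataset`` and a hypothetical ``LitModel``) looks like this:

.. code-block:: python

    # GOOD: the dataset is created inside the hook, in each launched process
    class LitModel(L.LightningModule):
        def train_dataloader(self):
            dataset = MyDataset("data/")
            return torch.utils.data.DataLoader(dataset)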


@@ -112,6 +112,13 @@ Customize and extend Lightning for things like custom hardware or distributed strategies
   :button_link: advanced/model_parallel.html
   :height: 100

.. displayitem::
   :header: Train in a notebook
   :description: Train models in interactive notebooks (Jupyter, Colab, Kaggle, etc.)
   :col_css: col-md-12
   :button_link: common/notebooks.html
   :height: 100

.. displayitem::
   :header: Train on single or multiple GPUs
   :description: Train models faster with GPUs.