From a913db8e882f30818a1ebf41c8bdd671c6d11f81 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Adrian=20W=C3=A4lchli?=
Date: Fri, 6 Jan 2023 10:08:55 +0100
Subject: [PATCH] Update Lightning Lite docs (3/n) (#16245)

---
 docs/source-pytorch/fabric/fabric.rst | 47 ++++++++++++++-------------
 1 file changed, 24 insertions(+), 23 deletions(-)

diff --git a/docs/source-pytorch/fabric/fabric.rst b/docs/source-pytorch/fabric/fabric.rst
index 096ac9e1d5..5291882160 100644
--- a/docs/source-pytorch/fabric/fabric.rst
+++ b/docs/source-pytorch/fabric/fabric.rst
@@ -169,33 +169,34 @@ Furthermore, you can access the current device from ``fabric.device`` or rely on
 
 ----------
 
-
-Distributed Training Pitfalls
-=============================
-
-The :class:`~lightning_fabric.fabric.Fabric` provides you with the tools to scale your training, but there are several major challenges ahead of you now:
+*******************
+Fabric in Notebooks
+*******************
 
-.. list-table::
-   :widths: 50 50
-   :header-rows: 0
-
-   * - Processes divergence
-     - This happens when processes execute a different section of the code due to different if/else conditions, race conditions on existing files and so on, resulting in hanging.
-   * - Cross processes reduction
-     - Miscalculated metrics or gradients due to errors in their reduction.
-   * - Large sharded models
-     - Instantiation, materialization and state management of large models.
-   * - Rank 0 only actions
-     - Logging, profiling, and so on.
-   * - Checkpointing / Early stopping / Callbacks / Logging
-     - Ability to customize your training behavior easily and make it stateful.
-   * - Fault-tolerant training
-     - Ability to resume from a failure as if it never happened.
+Fabric works exactly the same way in notebooks (Jupyter, Google Colab, Kaggle, etc.) if you only run in a single process or on a single GPU.
+If you want to use multiprocessing, for example, multi-GPU, you can put your code in a function and pass that function to the
+:meth:`~lightning_fabric.fabric.Fabric.launch` method:
 
-If you are facing one of those challenges, then you are already meeting the limit of :class:`~lightning_fabric.fabric.Fabric`.
-We recommend you to convert to :doc:`Lightning <../starter/introduction>`, so you never have to worry about those.
+.. code-block:: python
+
+
+    # Notebook Cell
+    def train(fabric):
+
+        model = ...
+        optimizer = ...
+        model, optimizer = fabric.setup(model, optimizer)
+        ...
+
+
+    # Notebook Cell
+    fabric = Fabric(accelerator="cuda", devices=2)
+    fabric.launch(train)  # Launches the `train` function on two GPUs
+
+
+As you can see, this function accepts one argument, the ``Fabric`` object, and it gets launched on as many devices as specified.
 
 
 ----------