Update Lightning Lite docs (3/n) (#16245)

Adrian Wälchli 2023-01-06 10:08:55 +01:00 committed by GitHub
parent 0a928e8ead
commit a913db8e88
1 changed file with 24 additions and 23 deletions


@@ -169,33 +169,34 @@ Furthermore, you can access the current device from ``fabric.device`` or rely on
 ----------
 
-Distributed Training Pitfalls
-=============================
-
-The :class:`~lightning_fabric.fabric.Fabric` provides you with the tools to scale your training, but there are several major challenges ahead of you now:
-
-.. list-table::
-   :widths: 50 50
-   :header-rows: 0
-
-   * - Processes divergence
-     - This happens when processes execute a different section of the code due to different if/else conditions, race conditions on existing files and so on, resulting in hanging.
-   * - Cross processes reduction
-     - Miscalculated metrics or gradients due to errors in their reduction.
-   * - Large sharded models
-     - Instantiation, materialization and state management of large models.
-   * - Rank 0 only actions
-     - Logging, profiling, and so on.
-   * - Checkpointing / Early stopping / Callbacks / Logging
-     - Ability to customize your training behavior easily and make it stateful.
-   * - Fault-tolerant training
-     - Ability to resume from a failure as if it never happened.
-
-If you are facing one of those challenges, then you are already meeting the limit of :class:`~lightning_fabric.fabric.Fabric`.
-We recommend you to convert to :doc:`Lightning <../starter/introduction>`, so you never have to worry about those.
+*******************
+Fabric in Notebooks
+*******************
+
+Fabric works exactly the same way in notebooks (Jupyter, Google Colab, Kaggle, etc.) if you only run in a single process or a single GPU.
+If you want to use multiprocessing, for example multi-GPU, you can put your code in a function and pass that function to the
+:meth:`~lightning_fabric.fabric.Fabric.launch` method:
+
+.. code-block:: python
+
+    # Notebook Cell
+    def train(fabric):
+        model = ...
+        optimizer = ...
+        model, optimizer = fabric.setup(model, optimizer)
+        ...
+
+
+    # Notebook Cell
+    fabric = Fabric(accelerator="cuda", devices=2)
+    fabric.launch(train)  # Launches the `train` function on two GPUs
+
+As you can see, this function accepts one argument, the ``Fabric`` object, and it gets launched on as many devices as specified.
 
 ----------
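
To make the pattern in the added section concrete, here is a minimal runnable sketch (not part of this commit). It assumes that :meth:`~lightning_fabric.fabric.Fabric.launch` passes the ``Fabric`` object as the function's first argument and forwards any extra arguments after it; the ``num_epochs`` parameter and the model/optimizer choices are hypothetical, used purely for illustration:

.. code-block:: python

    # Sketch only, not part of the diff above. Assumes Fabric.launch forwards
    # extra *args/**kwargs to the launched function after the Fabric object;
    # `num_epochs` is a hypothetical parameter for illustration.
    import torch
    from lightning_fabric import Fabric


    def train(fabric, num_epochs):
        model = torch.nn.Linear(32, 2)
        optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
        # fabric.setup moves the model to the right device and wraps both objects
        model, optimizer = fabric.setup(model, optimizer)
        for _ in range(num_epochs):
            optimizer.zero_grad()
            inputs = torch.randn(4, 32, device=fabric.device)
            loss = model(inputs).sum()
            fabric.backward(loss)  # use fabric.backward instead of loss.backward()
            optimizer.step()


    # Two CPU processes keep the sketch runnable without GPUs
    fabric = Fabric(accelerator="cpu", devices=2)
    fabric.launch(train, num_epochs=3)  # each process runs train(fabric, num_epochs=3)

The CPU accelerator is chosen only so the sketch runs on any machine; in a notebook with two GPUs, the ``accelerator="cuda", devices=2`` configuration from the diff applies unchanged.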