lightning/docs/source-fabric/guide/multi_node/cloud.rst

116 lines
2.9 KiB
ReStructuredText

:orphan:
##########################
Run in the Lightning Cloud
##########################
**Audience**: Users who don't want to waste time on cluster configuration and maintenance.
The Lightning AI cloud is a platform where you can build, train, finetune and deploy models without worrying about infrastructure, cost management, scaling, and other technical headaches.
In this guide, and within just 10 minutes, you will learn how to run a Fabric training script across multiple nodes in the cloud.
----
*************
Initial Setup
*************
First, create a free `Lightning AI account <https://lightning.ai/>`_.
Then, log in from the CLI:
.. code-block:: bash
lightning login
A page opens in your browser where you can follow the instructions to complete the setup.
----
***************************************
Launch multi-node training in the cloud
***************************************
**Step 1:** Put your code inside a :class:`~lightning_app.core.work.LightningWork`:
.. code-block:: python
:emphasize-lines: 5
:caption: app.py
import lightning as L
from lightning.app.components import FabricMultiNode
# 1. Put your code inside a LightningWork
class MyTrainingComponent(L.LightningWork):
def run(self):
# Set up Fabric
# The `devices` and `num_nodes` gets set by Lightning automatically
fabric = L.Fabric(strategy="ddp", precision="16-mixed")
# Your training code
model = ...
optimizer = ...
model, optimizer = fabric.setup(model, optimizer)
...
**Step 2:** Init a :class:`~lightning_app.core.app.LightningApp` with the ``FabricMultiNode`` component.
Configure the number of nodes, the number of GPUs per node, and the type of GPU:
.. code-block:: python
:emphasize-lines: 5,7
:caption: app.py
# 2. Create the app with the FabricMultiNode component inside
app = L.LightningApp(
FabricMultiNode(
MyTrainingComponent,
# Run with 2 nodes
num_nodes=2,
# Each with 4 x V100 GPUs, total 8 GPUs
cloud_compute=L.CloudCompute("gpu-fast-multi"),
)
)
**Step 3:** Run your code from the CLI:
.. code-block:: bash
lightning run app app.py --cloud
This command will upload your Python file and then opens the app admin view, where you can see the logs of what's happening.
.. figure:: https://pl-public-data.s3.amazonaws.com/assets_lightning/fabric/fabric-multi-node-admin.png
:alt: The Lightning AI admin page of an app running a multi-node fabric training script
:width: 100%
----
**********
Next steps
**********
.. raw:: html
<div class="display-card-container">
<div class="row">
.. displayitem::
:header: Lightning App Docs
:description: Learn more about apps and the Lightning cloud.
:col_css: col-md-4
:button_link: https://lightning.ai
:height: 150
.. raw:: html
</div>
</div>