################
Lightning Fabric
################

Fabric is the fast and lightweight way to scale PyTorch models without boilerplate code.

- Easily switch from running on CPU to GPU (Apple Silicon, CUDA, ...), TPU, multi-GPU or even multi-node training
- State-of-the-art distributed training strategies (DDP, FSDP, DeepSpeed) and mixed precision out of the box
- Handles all the boilerplate device logic for you
- Brings useful tools to help you build a trainer (callbacks, logging, checkpoints, ...)
- Designed with multi-billion parameter models in mind

|

.. code-block:: diff

      import torch
      import torch.nn as nn
      from torch.utils.data import DataLoader, Dataset

    + from lightning.fabric import Fabric

      class PyTorchModel(nn.Module):
          ...

      class PyTorchDataset(Dataset):
          ...

    + fabric = Fabric(accelerator="cuda", devices=8, strategy="ddp")
    + fabric.launch()

    - device = "cuda" if torch.cuda.is_available() else "cpu"
      model = PyTorchModel(...)
      optimizer = torch.optim.SGD(model.parameters())
    + model, optimizer = fabric.setup(model, optimizer)
      dataloader = DataLoader(PyTorchDataset(...), ...)
    + dataloader = fabric.setup_dataloaders(dataloader)
      model.train()

      for epoch in range(num_epochs):
          for batch in dataloader:
              input, target = batch
    -         input, target = input.to(device), target.to(device)
              optimizer.zero_grad()
              output = model(input)
              loss = loss_fn(output, target)
    -         loss.backward()
    +         fabric.backward(loss)
              optimizer.step()
              lr_scheduler.step()

----

***********
Why Fabric?
***********

Fabric differentiates itself from a fully-fledged trainer like Lightning's `Trainer <https://pytorch-lightning.readthedocs.io/en/stable/common/trainer.html>`_ in these key aspects:

**Fast to implement**
    There is no need to restructure your code: Just change a few lines in your PyTorch script and you'll be able to leverage Fabric features.

**Maximum Flexibility**
    Write your own training and/or inference logic down to the individual optimizer calls.
    You aren't forced to conform to a standardized epoch-based training loop like the one in the Lightning `Trainer <https://pytorch-lightning.readthedocs.io/en/stable/common/trainer.html>`_.
    You can do flexible iteration-based training, meta-learning, cross-validation and other types of optimization algorithms without digging into framework internals.
    This also makes it easy to adopt Fabric in existing PyTorch projects to speed up and scale your models without committing to large refactors.
    Just remember: With great power comes great responsibility.

**Maximum Control**
    The Lightning `Trainer <https://pytorch-lightning.readthedocs.io/en/stable/common/trainer.html>`_ has many built-in features to make research simpler with less boilerplate, but debugging it requires some familiarity with the framework internals.
    In Fabric, everything is opt-in. Think of it as a toolbox: You take out the tools (Fabric functions) you need and leave the other ones behind.
    This makes it easier to develop and debug your PyTorch code as you gradually add more features to it.
    Fabric provides important tools to remove undesired boilerplate code (distributed, hardware, checkpoints, logging, ...), but leaves the design and orchestration fully up to you.
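
As a rough illustration of the iteration-based training mentioned under **Maximum Flexibility**, the sketch below runs a fixed number of optimizer steps instead of a fixed number of epochs.
It is only an example under stated assumptions, not an official recipe: the helper ``train_for_steps``, the toy linear model, the random data, and the choice of ``accelerator="cpu"`` with a single device are all made up for this illustration.
The Fabric calls are the same ones used in the conversion snippet at the top of this page.

.. code-block:: python

    import torch
    import torch.nn as nn
    from torch.utils.data import DataLoader, TensorDataset

    from lightning.fabric import Fabric


    def train_for_steps(max_steps: int = 20) -> list:
        """Hypothetical helper: train for a fixed number of steps, not epochs."""
        torch.manual_seed(0)

        # Assumption for this sketch: a single CPU device, so it runs anywhere.
        fabric = Fabric(accelerator="cpu", devices=1)
        fabric.launch()

        # Toy model and random data, purely for illustration.
        model = nn.Linear(4, 1)
        optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
        model, optimizer = fabric.setup(model, optimizer)

        dataset = TensorDataset(torch.randn(64, 4), torch.randn(64, 1))
        dataloader = fabric.setup_dataloaders(DataLoader(dataset, batch_size=8))

        loss_fn = nn.MSELoss()
        losses = []
        data_iter = iter(dataloader)
        model.train()

        for step in range(max_steps):
            # Restart the iterator whenever the dataloader is exhausted,
            # so the loop is driven by the step count alone.
            try:
                inputs, targets = next(data_iter)
            except StopIteration:
                data_iter = iter(dataloader)
                inputs, targets = next(data_iter)

            optimizer.zero_grad()
            loss = loss_fn(model(inputs), targets)
            fabric.backward(loss)
            optimizer.step()
            losses.append(loss.item())

        return losses


    if __name__ == "__main__":
        history = train_for_steps(max_steps=20)
        print(f"ran {len(history)} steps, final loss {history[-1]:.4f}")

Because the loop is ordinary Python, the stopping criterion, the data cycling, and the order of optimizer calls stay entirely in your hands; Fabric only takes care of device placement and the backward call.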

----

************
Installation
************

Fabric ships directly with Lightning. Install it with

.. code-block:: bash

    pip install lightning

For alternative ways to install, read the :doc:`installation guide <fundamentals/installation>`.

----

************
Fundamentals
************

.. raw:: html

    <div class="display-card-container">
        <div class="row">

.. displayitem::
    :header: Getting Started
    :description: Learn how to add Fabric to your PyTorch code
    :button_link: fundamentals/convert.html
    :col_css: col-md-4
    :height: 150
    :tag: basic

.. displayitem::
    :header: Accelerators
    :description: Take advantage of your hardware with a switch of a flag
    :button_link: fundamentals/accelerators.html
    :col_css: col-md-4
    :height: 150
    :tag: intermediate

.. displayitem::
    :header: Code Structure
    :description: Best practices for setting up your training script with Fabric
    :button_link: fundamentals/code_structure.html
    :col_css: col-md-4
    :height: 150
    :tag: basic

.. displayitem::
    :header: Launch Distributed Training
    :description: Launch a Python script on multiple devices and machines
    :button_link: fundamentals/launch.html
    :col_css: col-md-4
    :height: 150
    :tag: intermediate

.. displayitem::
    :header: Fabric in Notebooks
    :description: Launch on multiple devices from within a Jupyter notebook
    :button_link: fundamentals/notebooks.html
    :col_css: col-md-4
    :height: 150
    :tag: basic

.. displayitem::
    :header: Mixed Precision Training
    :description: Save memory and speed up training using mixed precision
    :button_link: fundamentals/precision.html
    :col_css: col-md-4
    :height: 150
    :tag: intermediate

.. raw:: html

        </div>
    </div>

----

**********************
Build Your Own Trainer
**********************

.. raw:: html

    <div class="display-card-container">
        <div class="row">

.. displayitem::
    :header: The LightningModule
    :description: Organize your code in a LightningModule and use it with Fabric
    :button_link: guide/lightning_module.html
    :col_css: col-md-4
    :height: 150
    :tag: basic

.. displayitem::
    :header: Callbacks
    :description: Make use of the Callback system in Fabric
    :button_link: guide/callbacks.html
    :col_css: col-md-4
    :height: 150
    :tag: basic

.. displayitem::
    :header: Logging
    :description: Learn how Fabric helps you remove boilerplate code for tracking metrics with a logger
    :button_link: guide/logging.html
    :col_css: col-md-4
    :height: 150
    :tag: basic

.. displayitem::
    :header: Checkpoints
    :description: Efficient saving and loading of model weights, training state, hyperparameters and more.
    :button_link: guide/checkpoint.html
    :col_css: col-md-4
    :height: 150
    :tag: basic

.. displayitem::
    :header: Trainer Template
    :description: Take our Fabric Trainer template and customize it for your needs
    :button_link: https://github.com/Lightning-AI/lightning/tree/master/examples/fabric/build_your_own_trainer
    :col_css: col-md-4
    :height: 150
    :tag: intermediate

.. raw:: html

        </div>
    </div>

----

***************
Advanced Topics
***************

.. raw:: html

    <div class="display-card-container">
        <div class="row">

.. displayitem::
    :header: Efficient Gradient Accumulation
    :description: Learn how to perform efficient gradient accumulation in distributed settings
    :button_link: advanced/gradient_accumulation.html
    :col_css: col-md-4
    :height: 160
    :tag: advanced

.. displayitem::
    :header: Distributed Communication
    :description: Learn all about communication primitives for distributed operation. Gather, reduce, broadcast, etc.
    :button_link: advanced/distributed_communication.html
    :col_css: col-md-4
    :height: 160
    :tag: advanced

.. displayitem::
    :header: Multiple Models and Optimizers
    :description: See how flexible Fabric is to work with multiple models and optimizers!
    :button_link: advanced/multiple_setup.html
    :col_css: col-md-4
    :height: 160
    :tag: advanced

.. raw:: html

        </div>
    </div>

----

.. raw:: html

    <div style="display:none">

.. toctree::
    :maxdepth: 1
    :name: start
    :caption: Get Started

    Fabric in 5 minutes <fundamentals/convert>
    Installation <fundamentals/installation>

.. toctree::
    :maxdepth: 1
    :name: fundamentals
    :caption: Fundamentals

    Accelerators <fundamentals/accelerators>
    Code Structure <fundamentals/code_structure>
    Launch Distributed Training <fundamentals/launch>
    Fabric in Notebooks <fundamentals/notebooks>
    Mixed Precision Training <fundamentals/precision>

.. toctree::
    :maxdepth: 1
    :name: byot
    :caption: Build Your Own Trainer

    The LightningModule <guide/lightning_module>
    Callbacks <guide/callbacks>
    Logging <guide/logging>
    Checkpoints <guide/checkpoint>
    Trainer Template <https://github.com/Lightning-AI/lightning/tree/master/examples/fabric/build_your_own_trainer>

.. toctree::
    :maxdepth: 1
    :name: advanced
    :caption: Advanced Topics

    Efficient Gradient Accumulation <advanced/gradient_accumulation>
    Distributed Communication <advanced/distributed_communication>
    Multiple Models and Optimizers <advanced/multiple_setup>

.. toctree::
    :maxdepth: 1
    :name: examples
    :caption: Examples

    Examples <examples/index>

.. toctree::
    :maxdepth: 1
    :name: api
    :caption: API Reference

    Fabric Arguments <api/fabric_args>
    Fabric Methods <api/fabric_methods>
    Utilities <api/utilities>
    Full API Reference <api_reference>

.. raw:: html

    </div>