:orphan:

#######################
Lightning in 15 minutes
#######################

**Required background:** None

**Goal:** In this guide, we'll walk you through the 7 key steps of a typical Lightning workflow.

PyTorch Lightning is the deep learning framework with "batteries included" for professional AI researchers and machine learning engineers who need maximal flexibility while super-charging performance at scale.

.. join_slack::
   :align: left
   :margin: 20

Lightning organizes PyTorch code to remove boilerplate and unlock scalability.

.. raw:: html

    <video width="100%" max-width="800px" controls autoplay muted playsinline
    src="https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/pl_docs/pl_docs_animation_final.m4v"></video>

|

By organizing PyTorch code, Lightning enables:

.. raw:: html

    <div class="display-card-container">
        <div class="row">

.. Add callout items below this line

.. displayitem::
   :header: Full flexibility
   :description: Try any ideas using raw PyTorch without the boilerplate.
   :col_css: col-md-3
   :image_center: https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/card_full_control.png
   :height: 290

.. displayitem::
   :description: Decoupled research and engineering code enable reproducibility and better readability.
   :header: Reproducible + Readable
   :col_css: col-md-3
   :image_center: https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/card_no_boilerplate.png
   :height: 290

.. displayitem::
   :description: Use multiple GPUs/TPUs/HPUs etc... without code changes.
   :header: Simple multi-GPU training
   :col_css: col-md-3
   :image_center: https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/card_hardware.png
   :height: 290

.. displayitem::
   :description: We've done all the testing so you don't have to.
   :header: Built-in testing
   :col_css: col-md-3
   :image_center: https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/card_testing.png
   :height: 290

.. raw:: html

        </div>
    </div>

.. End of callout item section

----

****************************
1: Install PyTorch Lightning
****************************

.. raw:: html

   <div class="row" style='font-size: 16px'>
      <div class='col-md-6'>

For `pip <https://pypi.org/project/pytorch-lightning/>`_ users

.. code-block:: bash

    pip install pytorch-lightning

.. raw:: html

      </div>
      <div class='col-md-6'>

For `conda <https://anaconda.org/conda-forge/pytorch-lightning>`_ users

.. code-block:: bash

    conda install pytorch-lightning -c conda-forge

.. raw:: html

      </div>
   </div>

Or read the `advanced install guide <installation.html>`_

----

.. _new_project:

***************************
2: Define a LightningModule
***************************

A LightningModule lets your PyTorch ``nn.Module`` objects work together in complex ways inside ``training_step`` (``validation_step`` and ``test_step`` are optional).

.. testcode::

    import os
    from torch import optim, nn, utils, Tensor
    from tests.helpers.datasets import MNIST
    import pytorch_lightning as pl

    # define any number of nn.Modules (or use your current ones)
    encoder = nn.Sequential(nn.Linear(28 * 28, 64), nn.ReLU(), nn.Linear(64, 3))
    decoder = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 28 * 28))


    # define the LightningModule
    class LitAutoEncoder(pl.LightningModule):
        def __init__(self, encoder, decoder):
            super().__init__()
            self.encoder = encoder
            self.decoder = decoder

        def training_step(self, batch, batch_idx):
            # training_step defines the train loop.
            # it is independent of forward
            x, y = batch
            x = x.view(x.size(0), -1)
            z = self.encoder(x)
            x_hat = self.decoder(z)
            loss = nn.functional.mse_loss(x_hat, x)
            # Logging to TensorBoard by default
            self.log("train_loss", loss)
            return loss

        def configure_optimizers(self):
            optimizer = optim.Adam(self.parameters(), lr=1e-3)
            return optimizer


    # init the autoencoder
    autoencoder = LitAutoEncoder(encoder, decoder)
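
The optional ``validation_step`` follows the same pattern as ``training_step``. The sketch below is one possible way to add it, not the canonical one: the class name ``LitAutoEncoderWithVal``, the helper ``_reconstruction_loss`` and the metric name ``"val_loss"`` are illustrative assumptions, and the fake batch only checks that the loss computes.

.. code-block:: python

    import torch
    from torch import nn, optim
    import pytorch_lightning as pl


    class LitAutoEncoderWithVal(pl.LightningModule):
        """Hypothetical variant of the autoencoder with a validation step."""

        def __init__(self, encoder, decoder):
            super().__init__()
            self.encoder = encoder
            self.decoder = decoder

        def _reconstruction_loss(self, batch):
            # helper (assumed name) shared by the training and validation steps
            x, _ = batch
            x = x.view(x.size(0), -1)
            x_hat = self.decoder(self.encoder(x))
            return nn.functional.mse_loss(x_hat, x)

        def training_step(self, batch, batch_idx):
            loss = self._reconstruction_loss(batch)
            self.log("train_loss", loss)
            return loss

        def validation_step(self, batch, batch_idx):
            # run by the Trainer on validation batches when a validation loader is given
            self.log("val_loss", self._reconstruction_loss(batch))

        def configure_optimizers(self):
            return optim.Adam(self.parameters(), lr=1e-3)


    encoder = nn.Sequential(nn.Linear(28 * 28, 64), nn.ReLU(), nn.Linear(64, 3))
    decoder = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 28 * 28))
    model = LitAutoEncoderWithVal(encoder, decoder)

    # quick sanity check on a random (image, label) batch, outside any Trainer
    fake_batch = (torch.rand(4, 1, 28, 28), torch.zeros(4, dtype=torch.long))
    loss = model._reconstruction_loss(fake_batch)

A validation loader would then be passed alongside the training loader, for example through the ``val_dataloaders`` argument of ``trainer.fit``.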

----

*******************
3: Define a dataset
*******************

Lightning supports ANY iterable (:class:`~torch.utils.data.DataLoader`, NumPy, etc.) for the train/val/test/predict splits.

.. code-block:: python

    # setup data
    dataset = MNIST(os.getcwd(), download=True)
    train_loader = utils.data.DataLoader(dataset)
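
If MNIST is not at hand, any tensors can play the same role. The following sketch is an assumption-laden stand-in, not part of the original example: it wraps random image-shaped tensors in a ``TensorDataset`` and batches them with a ``DataLoader``, with the sizes (256 samples, batches of 32) chosen arbitrarily.

.. code-block:: python

    import torch
    from torch.utils.data import DataLoader, TensorDataset

    # random stand-ins for 256 grayscale 28x28 images and their labels
    images = torch.rand(256, 1, 28, 28)
    labels = torch.randint(0, 10, (256,))

    dataset = TensorDataset(images, labels)
    train_loader = DataLoader(dataset, batch_size=32, shuffle=True)

    # each batch is an (images, labels) pair, matching `x, y = batch` in training_step
    x, y = next(iter(train_loader))
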

----

******************
4: Train the model
******************

The Lightning :doc:`Trainer <../common/trainer>` "mixes" any :doc:`LightningModule <../common/lightning_module>` with any dataset and abstracts away all the engineering complexity needed for scale.

.. code-block:: python

    # train the model (hint: here are some helpful Trainer arguments for rapid idea iteration)
    trainer = pl.Trainer(limit_train_batches=100, max_epochs=1)
    trainer.fit(model=autoencoder, train_dataloaders=train_loader)

The Lightning :doc:`Trainer <../common/trainer>` automates `40+ tricks <../common/trainer.html#trainer-flags>`_ including:

* Epoch and batch iteration
* ``optimizer.step()``, ``loss.backward()``, ``optimizer.zero_grad()`` calls
* Calling of ``model.eval()``, enabling/disabling grads during evaluation
* :doc:`Checkpoint Saving and Loading <../common/checkpointing>`
* TensorBoard (see :doc:`loggers <../visualize/loggers>` options)
* :doc:`Multi-GPU <../accelerators/gpu>` support
* :doc:`TPU <../accelerators/tpu>`
* :ref:`16-bit precision AMP <speed-amp>` support
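
For intuition, the first two items in the list above correspond roughly to the hand-written PyTorch loop below. This is only a sketch of the boilerplate being replaced, not how the Trainer is implemented: ``fake_loader`` is an assumed stand-in made of random tensors, and checkpointing, logging and device placement are left out entirely.

.. code-block:: python

    import torch
    from torch import nn, optim

    torch.manual_seed(0)

    encoder = nn.Sequential(nn.Linear(28 * 28, 64), nn.ReLU(), nn.Linear(64, 3))
    decoder = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 28 * 28))
    params = list(encoder.parameters()) + list(decoder.parameters())
    optimizer = optim.Adam(params, lr=1e-3)

    # five random (image, label) batches standing in for a real DataLoader
    fake_loader = [(torch.rand(8, 1, 28, 28), torch.zeros(8, dtype=torch.long)) for _ in range(5)]

    losses = []
    for epoch in range(1):  # epoch iteration
        for batch_idx, (x, y) in enumerate(fake_loader):  # batch iteration
            # same computation as training_step
            x = x.view(x.size(0), -1)
            x_hat = decoder(encoder(x))
            loss = nn.functional.mse_loss(x_hat, x)

            # the optimizer calls that would otherwise be written by hand
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            losses.append(loss.item())
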

----

****************
5: Use the model
****************

Once you've trained the model, you can export it to ONNX or TorchScript and put it into production, or simply load the weights and run predictions.

.. code:: python

    # load checkpoint
    checkpoint = "./lightning_logs/version_0/checkpoints/epoch=0-step=100.ckpt"
    autoencoder = LitAutoEncoder.load_from_checkpoint(checkpoint, encoder=encoder, decoder=decoder)

    # choose your trained nn.Module
    encoder = autoencoder.encoder
    encoder.eval()

    # embed 4 fake images!
    fake_image_batch = Tensor(4, 28 * 28)
    embeddings = encoder(fake_image_batch)
    print("⚡" * 20, "\nPredictions (4 image embeddings):\n", embeddings, "\n", "⚡" * 20)
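
The TorchScript route can be sketched with plain PyTorch as well. The example below assumes an untrained stand-in encoder with the same architecture as above, traces it with ``torch.jit.trace``, and round-trips it through an in-memory buffer instead of a file; in a real project the trained ``autoencoder.encoder`` and a file path would take their place.

.. code-block:: python

    import io

    import torch
    from torch import nn

    # untrained stand-in for the trained encoder
    encoder = nn.Sequential(nn.Linear(28 * 28, 64), nn.ReLU(), nn.Linear(64, 3))
    encoder.eval()

    # tracing records the operations run on an example input
    example = torch.rand(4, 28 * 28)
    traced = torch.jit.trace(encoder, example)

    # save to and reload from an in-memory buffer (a file path works the same way)
    buffer = io.BytesIO()
    torch.jit.save(traced, buffer)
    buffer.seek(0)
    restored = torch.jit.load(buffer)

    # the restored module should reproduce the eager outputs
    with torch.no_grad():
        outputs_match = torch.allclose(restored(example), encoder(example))
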

----

*********************
6: Visualize training
*********************

Lightning comes with a *lot* of batteries included. A helpful one is TensorBoard for visualizing experiments.

Run this on your command line and open your browser to **http://localhost:6006/**

.. code:: bash

    tensorboard --logdir .

----

***********************
7: Supercharge training
***********************

Enable advanced training features using Trainer arguments. These are state-of-the-art techniques that are automatically integrated into your training loop without changes to your code.

.. code:: python

    from pytorch_lightning import Trainer
    from pytorch_lightning.callbacks import StochasticWeightAveraging

    # train on 4 GPUs
    trainer = Trainer(
        devices=4,
        accelerator="gpu",
    )

    # train 1TB+ parameter models with DeepSpeed/FSDP
    trainer = Trainer(
        devices=4,
        accelerator="gpu",
        strategy="deepspeed_stage_2",
        precision=16,
    )

    # 20+ helpful flags for rapid idea iteration
    trainer = Trainer(
        max_epochs=10,
        min_epochs=5,
        overfit_batches=1,
    )

    # access the latest state of the art techniques
    trainer = Trainer(callbacks=[StochasticWeightAveraging(...)])

----

********************
Maximize flexibility
********************

Lightning's core guiding principle is to always provide maximal flexibility **without ever hiding any of the PyTorch**.

Lightning offers 5 *added* degrees of flexibility depending on your project's complexity.

----

Customize training loop
=======================

.. image:: https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/custom_loop.png
    :width: 600
    :alt: Injecting custom code in a training loop

Inject custom code anywhere in the training loop using any of the 20+ methods (:ref:`lightning_hooks`) available in the LightningModule.

.. testcode::

    class LitAutoEncoder(pl.LightningModule):
        def backward(self, loss, optimizer, optimizer_idx):
            loss.backward()

----

Extend the Trainer
==================

.. raw:: html

    <video width="100%" max-width="800px" controls autoplay muted playsinline
    src="https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/cb.m4v"></video>

If you have multiple lines of code with similar functionalities, you can use callbacks to easily group them together and toggle all of those lines on or off at the same time.

.. code::

    trainer = Trainer(callbacks=[AWSCheckpoints()])
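
``AWSCheckpoints`` is not defined in this guide, so here is a minimal sketch of what a custom callback can look like. The class ``EpochCounter`` and its attribute ``epochs_seen`` are hypothetical names chosen for illustration; the hook is invoked by hand at the end only to show the behavior without running a Trainer.

.. code-block:: python

    import pytorch_lightning as pl


    class EpochCounter(pl.Callback):
        """Hypothetical callback that counts finished training epochs."""

        def __init__(self):
            super().__init__()
            self.epochs_seen = 0

        def on_train_epoch_end(self, trainer, pl_module):
            # the Trainer calls this hook once per training epoch
            self.epochs_seen += 1
            print(f"finished training epoch {self.epochs_seen}")


    counter = EpochCounter()

    # calling the hook manually, purely for demonstration
    counter.on_train_epoch_end(trainer=None, pl_module=None)

In normal use the callback would be passed as ``Trainer(callbacks=[counter])``, and removing it from that list switches the behavior off without touching the LightningModule.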

----

Use a raw PyTorch loop
======================
For certain types of work at the bleeding-edge of research, Lightning offers experts full control of their training loops in various ways.

.. raw:: html

    <div class="display-card-container">
        <div class="row">

.. Add callout items below this line

.. displayitem::
   :header: Manual optimization
   :description: Automated training loop, but you own the optimization steps.
   :col_css: col-md-4
   :image_center: https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/manual_opt.png
   :button_link: ../model/build_model_advanced.html#manual-optimization
   :image_height: 220px
   :height: 320

.. displayitem::
   :header: Lightning Lite
   :description: Full control over loop for migrating complex PyTorch projects.
   :col_css: col-md-4
   :image_center: https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/lite.png
   :button_link: ../model/build_model_expert.html
   :image_height: 220px
   :height: 320

.. displayitem::
   :header: Loops
   :description: Enable meta-learning, reinforcement learning, GANs with full control.
   :col_css: col-md-4
   :image_center: https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/loops.png
   :button_link: ../model/custom_loop_expert.html
   :image_height: 220px
   :height: 320

.. raw:: html

        </div>
    </div>

.. End of callout item section

----

**********
Next steps
**********

Depending on your use case, you might want to check one of these out next.

.. raw:: html

    <div class="display-card-container">
        <div class="row">

.. Add callout items below this line

.. displayitem::
   :header: Level 2: Add a validation and test set
   :description: Add validation and test sets to avoid over/underfitting.
   :button_link: ../levels/basic_level_2.html
   :col_css: col-md-3
   :height: 180
   :tag: basic

.. displayitem::
   :header: See more examples
   :description: See examples across computer vision, NLP, RL, etc...
   :col_css: col-md-3
   :button_link: ../tutorials.html
   :height: 180
   :tag: basic

.. displayitem::
   :header: I need my raw PyTorch Loop
   :description: Expert-level control for researchers working on the bleeding-edge
   :col_css: col-md-3
   :button_link: ../model/build_model_expert.html
   :height: 180
   :tag: expert

.. displayitem::
   :header: Deploy your model
   :description: Learn how to predict or put your model into production
   :col_css: col-md-3
   :button_link: ../deploy/production.html
   :height: 180
   :tag: basic

.. raw:: html

        </div>
    </div>