PyTorch Lightning

The lightweight PyTorch wrapper for high-performance AI research. Scale your models, not the boilerplate.

Masterclass • Key Features • How To Use • Docs • Resources • Community • FAQ • Licence

*Codecov coverage is above 90%, but build delays may show less.

PyTorch Lightning is just organized PyTorch

Lightning disentangles PyTorch code to decouple the science from the engineering by organizing it into 4 categories:

Research code (the LightningModule).
Engineering code (you delete it; the Trainer handles it).
Non-essential research code (logging, etc.; this goes in Callbacks).
Data (use PyTorch Dataloaders or organize them into a LightningDataModule).
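
The split between research code and engineering code can be illustrated without any framework. The sketch below is plain Python, not Lightning's implementation: `ToyModule` and `toy_fit` are hypothetical names that stand in for a LightningModule and the Trainer, and the data is a made-up sample of y = 2x.

```python
# Illustrative only: ToyModule and toy_fit are hypothetical stand-ins,
# not part of PyTorch Lightning.

class ToyModule:
    """Research code: the model, its loss, and its update rule."""

    def __init__(self, lr=0.05):
        self.w = 0.0  # single weight; the model is y_hat = w * x
        self.lr = lr

    def training_step(self, batch):
        x, y = batch
        y_hat = self.w * x
        loss = (y_hat - y) ** 2
        grad = 2 * (y_hat - y) * x  # d(loss)/dw
        return loss, grad

    def apply_gradient(self, grad):
        self.w -= self.lr * grad


def toy_fit(module, data, max_epochs=50):
    """Engineering code: the loop that a trainer would own."""
    history = []
    for _ in range(max_epochs):
        epoch_loss = 0.0
        for batch in data:
            loss, grad = module.training_step(batch)
            module.apply_gradient(grad)
            epoch_loss += loss
        history.append(epoch_loss / len(data))  # mean loss per epoch
    return history


data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # made-up samples of y = 2x
module = ToyModule()
history = toy_fit(module, data)
print(f"learned w = {module.w:.4f}")
```

The module never mentions epochs or iteration order, and the loop never mentions the loss. That is the same boundary Lightning draws, with the loop additionally taking care of devices and precision.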

Once you do this, you can train on multiple GPUs, TPUs, CPUs, and even in 16-bit precision without changing your code!

Get started with our 3-step guide

Continuous Integration

System / PyTorch ver.	1.3 (min. req.)*	1.4	1.5
Conda py3.7 [linux]
Linux py3.7 [GPUs**]	-	-	-
Linux py3.7 [TPUs***]	-	-	-
Linux py3.6 / py3.7 / py3.8		-	-
OSX py3.6 / py3.7	-		-
Windows py3.6 / py3.7 / py3.8		-	-

* torch>=1.4 is the minimum PyTorch version for Python 3.8
** tests run on two NVIDIA K80 GPUs
*** tests run on Google GKE TPUv2/3

PyTorch Lightning Masterclass

New lessons weekly!

From PyTorch to PyTorch Lightning

Converting a VAE to PyTorch Lightning

Key Features

Scale your models to run on any hardware (CPU, GPUs, TPUs) without changing your model
Make code more readable by decoupling the research code from the engineering
Make results easier to reproduce
Make code less error-prone by automating most of the training loop and tricky engineering
Keeps all the flexibility (LightningModules are still PyTorch modules), but removes a ton of boilerplate
Lightning has out-of-the-box integration with the popular logging/visualizing frameworks (Tensorboard, MLFlow, Neptune.ai, Comet.ml, Wandb).
Tested rigorously with every new PR. We test every combination of supported PyTorch and Python versions, every OS, multiple GPUs, and even TPUs.
Minimal running speed overhead (about 300 ms per epoch compared with pure PyTorch).

Lightning automates 40+ parts of DL/ML research

GPU training
Distributed GPU (cluster) training
TPU training
EarlyStopping
Logging/Visualizing
Checkpointing
Experiment management
Full list here

How To Use

Install

Simple installation from PyPI

pip install pytorch-lightning

From Conda

conda install pytorch-lightning -c conda-forge

Install bleeding-edge (no guarantees)

pip install git+https://github.com/PytorchLightning/pytorch-lightning.git@master --upgrade

Here's a minimal example without a test loop.

import os
import torch
import torch.nn.functional as F
from torchvision.datasets import MNIST
from torch.utils.data import DataLoader, random_split
from torchvision import transforms
import pytorch_lightning as pl

# this is just a plain nn.Module with some structure
class LitClassifier(pl.LightningModule):

    def __init__(self):
        super().__init__()
        self.l1 = torch.nn.Linear(28 * 28, 10)

    def forward(self, x):
        return torch.relu(self.l1(x.view(x.size(0), -1)))

    def training_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self(x)
        loss = F.cross_entropy(y_hat, y)
        result = pl.TrainResult(loss)
        result.log('train_loss', loss, on_epoch=True)
        return result
        
    def validation_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self(x)
        loss = F.cross_entropy(y_hat, y)
        result = pl.EvalResult(checkpoint_on=loss)
        result.log('val_loss', loss)
        return result

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=0.02)

# train!
dataset = MNIST(os.getcwd(), download=True, transform=transforms.ToTensor())
train, val = random_split(dataset, [55000, 5000])

model = LitClassifier()
trainer = pl.Trainer()
trainer.fit(model, DataLoader(train), DataLoader(val))

And without changing a single line of code, you could run on GPUs

# 8 GPUs
trainer = pl.Trainer(max_epochs=1, gpus=8)

# 256 GPUs
trainer = pl.Trainer(max_epochs=1, gpus=8, num_nodes=32)
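
The second configuration reaches 256 GPUs because `gpus` counts devices per node and `num_nodes` counts machines, so the two multiply. The helper below is a hypothetical illustration of that arithmetic, not a Lightning function.

```python
def total_gpus(gpus_per_node: int, num_nodes: int = 1) -> int:
    """Total devices used by a run (hypothetical helper for illustration)."""
    if gpus_per_node < 0:
        raise ValueError("gpus_per_node must be non-negative")
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    return gpus_per_node * num_nodes


single_node = total_gpus(8)            # matches the "8 GPUs" example
cluster = total_gpus(8, num_nodes=32)  # matches the "256 GPUs" example
print(single_node, cluster)
```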

Or TPUs

# Distributed TPU core training
trainer = pl.Trainer(tpu_cores=8)

# Single TPU core training
trainer = pl.Trainer(tpu_cores=[1])

Docs

Resources

Examples

Tutorials

Check out our introduction guide to get started. Or jump straight into our tutorials.

Community

The Lightning community is maintained by

16 core contributors, a mix of professional engineers, research scientists, and Ph.D. students from top AI labs.
200+ community contributors.

Lightning is also part of the PyTorch ecosystem, which requires projects to have solid testing, documentation, and support.

Asking for help

If you have any questions, please:

Read the docs.
Look it up in our forum (or add a new question).
Search through the issues.
Join our Slack.
Ask on Stack Overflow with the tag pytorch-lightning.

Funding

Building open-source software with only a few part-time people is hard! We've secured funding so that we can hire full-time staff, attend conferences, and move faster on the features you request.

Our goal is to build an incredible research platform and a big supportive community. Many open-source projects have gone on to fund operations through things like support and special help for big corporations!

If you are one of these corporations, please feel free to reach out to will@pytorchlightning.ai!

FAQ

Starting a new project?

Use our seed-project aimed at reproducibility!

Why lightning?

Although your research/production project might start simple, once you add things like GPU AND TPU training, 16-bit precision, etc., you end up spending more time engineering than researching. Lightning automates AND rigorously tests those parts for you.

Lightning has 3 goals in mind:

Maximal flexibility while abstracting out the common boilerplate across research projects.
Reproducibility. If all projects use the LightningModule template, it will be much much easier to understand what's going on and where to look! It will also mean every implementation follows a standard format.
Democratizing PyTorch power-user features. Distributed training? 16-bit? You know you need them but don't want to take the time to implement them? All good: these come built into Lightning.

Who is Lightning for?

Professional researchers
Ph.D. students
Corporate production teams

If you're just getting into deep learning, we recommend you learn PyTorch first! Once you've implemented a few models, come back and use all the advanced features of Lightning :)

What does lightning control for me?

Everything in blue! This is how Lightning separates the science (red) from the engineering (blue).

How much effort is it to convert?

If your code is not a huge mess, you should be able to organize it into a LightningModule in less than 1 hour. If your code IS a mess, then you needed to clean it up anyhow ;)

Check out this step-by-step guide. Or watch this video.

How flexible is it?

As you see, you're just organizing your PyTorch code - there's no abstraction.

And for the stuff that the Trainer abstracts out, you can override any part you want to do things like implement your own distributed training, 16-bit precision, or even a custom backward pass.

For example, here you could override how gradients are zeroed without worrying about GPUs, TPUs, or 16-bit, since we already handle those.

from pytorch_lightning import LightningModule

class LitModel(LightningModule):

    def optimizer_zero_grad(self, current_epoch, batch_idx, optimizer, opt_idx):
        # reset the gradients held by this optimizer
        optimizer.zero_grad()

For anything else you might need, we have an extensive callback system you can use to add arbitrary functionality not implemented by our team in the Trainer.
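
The pattern behind such callbacks can be sketched in plain Python. The example below is not Lightning's callback API or its EarlyStopping implementation: `PatienceStopper`, `on_validation_end`, and `run_epochs` are hypothetical names, and the validation losses are made-up inputs. It shows a loop that owns the iteration while a pluggable object decides when to stop.

```python
class PatienceStopper:
    """Requests a stop after `patience` epochs without improvement (hypothetical)."""

    def __init__(self, patience=2, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.wait = 0
        self.should_stop = False

    def on_validation_end(self, val_loss):
        if val_loss < self.best - self.min_delta:
            # improvement: remember it and reset the counter
            self.best = val_loss
            self.wait = 0
        else:
            self.wait += 1
            if self.wait >= self.patience:
                self.should_stop = True


def run_epochs(val_losses, callbacks):
    """Toy loop: feeds one validation loss per epoch to every callback."""
    epochs_run = 0
    for val_loss in val_losses:
        epochs_run += 1
        for cb in callbacks:
            cb.on_validation_end(val_loss)
        if any(cb.should_stop for cb in callbacks):
            break
    return epochs_run


stopper = PatienceStopper(patience=2)
epochs_run = run_epochs([0.9, 0.7, 0.71, 0.72, 0.5], [stopper])
print(epochs_run, stopper.best)
```

The loop knows nothing about patience or thresholds; it only calls the hook and checks a flag. This is why new behavior can be added without touching the loop itself.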

What types of research works?

Anything! Remember that this is just organized PyTorch code. The training step defines the core complexity found in the training loop.

It could be as complex as a seq2seq model

# define what happens for training here
def training_step(self, batch, batch_idx):
    x, y = batch

    # define your own forward and loss calculation
    hidden_states = self.encoder(x)

    # even as complex as a seq-2-seq + attn model
    # (this is just a toy, non-working example to illustrate)
    start_token = '<SOS>'
    last_hidden = torch.zeros(...)
    loss = 0
    for step in range(max_seq_len):
        attn_context = self.attention_nn(hidden_states, start_token)
        pred = self.decoder(start_token, attn_context, last_hidden)
        last_hidden = pred
        pred = self.predict_nn(pred)
        loss += self.loss(pred, y[step])

    # toy example as well
    loss = loss / max_seq_len
    return {'loss': loss}

Or as basic as CNN image classification

# define what happens for validation here
def validation_step(self, batch, batch_idx):
    x, y = batch

    # or as basic as a CNN classification
    out = self(x)
    loss = my_loss(out, y)
    return {'loss': loss}

Does Lightning Slow my PyTorch?

No! Lightning is meant for research/production cases that require high performance.

We have tests to ensure we get the EXACT same results as pure PyTorch, with less than 600 ms difference per epoch. In practice, Lightning adds about 300 ms of overhead per epoch. Check out the parity tests here.
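
To put that figure in context, a fixed cost of about 300 ms matters less the longer an epoch runs. The helper below is a hypothetical illustration, and the epoch durations are made-up inputs rather than benchmark results.

```python
def relative_overhead(epoch_seconds: float, overhead_seconds: float = 0.3) -> float:
    """Fraction of a pure-PyTorch epoch that a fixed overhead adds."""
    if epoch_seconds <= 0:
        raise ValueError("epoch_seconds must be positive")
    return overhead_seconds / epoch_seconds


# Hypothetical epoch durations, in seconds
for epoch_seconds in (10.0, 60.0, 600.0):
    pct = 100 * relative_overhead(epoch_seconds)
    print(f"{epoch_seconds:>5.0f} s epoch: {pct:.2f}% overhead")
```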

Overall, Lightning guarantees rigorously tested, correct, modern best practices for the automated parts.

How does Lightning compare with Ignite and fast.ai?

Here's a thorough comparison.

Is this another library I have to learn?

Nope! We use pure PyTorch everywhere and don't add unnecessary abstractions!

Are there plans to support Python 2?

Nope.

Are there plans to support virtualenv?

Nope. Please use anaconda or miniconda.

conda activate my_env
pip install pytorch-lightning

Licence

Please observe the Apache 2.0 license that is listed in this repository. In addition, the Lightning framework is Patent Pending.

BibTeX

If you want to cite the framework feel free to use this (but only if you loved it 😊):

@article{falcon2019pytorch,
  title={PyTorch Lightning},
  author={Falcon, WA},
  journal={GitHub. Note: https://github.com/PyTorchLightning/pytorch-lightning},
  volume={3},
  year={2019}
}
