PyTorch Lightning
The lightweight PyTorch wrapper for high-performance AI research. Scale your models, not the boilerplate.
Masterclass • Key Features • How To Use • Docs • Resources • Community • FAQ • Licence
*Codecov coverage is above 90%, but build delays may show a lower figure
PyTorch Lightning is just organized PyTorch
Lightning disentangles PyTorch code to decouple the science from the engineering by organizing it into 4 categories:
- Research code (the LightningModule).
- Engineering code (you delete, and is handled by the Trainer).
- Non-essential research code (logging, etc... this goes in Callbacks).
- Data (use PyTorch Dataloaders or organize them into a LightningDataModule).
Once you do this, you can train on multiple-GPUs, TPUs, CPUs and even in 16-bit precision without changing your code!
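The split between research code and engineering code can be illustrated with a plain-Python toy. This is only a sketch of the idea, not Lightning's implementation: `ToyModule`, `ToyTrainer` and the hand-written gradient step are hypothetical stand-ins for a LightningModule and the Trainer.

```python
# Toy illustration only: these classes are hypothetical stand-ins,
# not part of pytorch_lightning.

class ToyModule:
    """Research code: the model, the loss, and the learning rate."""

    def __init__(self):
        self.w = 0.0   # single learnable parameter
        self.lr = 0.1  # learning rate

    def training_step(self, batch):
        # Squared error of a one-parameter linear model y = w * x.
        x, y = batch
        error = self.w * x - y
        loss = error ** 2
        grad = 2 * error * x  # d(loss)/dw
        return loss, grad


class ToyTrainer:
    """Engineering code: the loop itself, which the module never sees."""

    def __init__(self, max_epochs=1):
        self.max_epochs = max_epochs

    def fit(self, module, batches):
        loss = None
        for _ in range(self.max_epochs):
            for batch in batches:
                loss, grad = module.training_step(batch)
                module.w -= module.lr * grad  # plain gradient descent
        return loss


batches = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # samples of y = 2 * x
model = ToyModule()
final_loss = ToyTrainer(max_epochs=50).fit(model, batches)
```

The module only says what a training step computes. How often it runs, and on what hardware, is left to the trainer, which is the part Lightning takes over.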
Get started with our 3-step guide
Trending contributors
Continuous Integration
- * `torch>=1.4` is the minimal PyTorch version for Python 3.8
- ** tests run on two NVIDIA K80
- *** tests run on Google GKE TPUv2/3
PyTorch Lightning Masterclass
New lessons weekly!
Key Features
- Scale your models to run on any hardware (CPU, GPUs, TPUs) without changing your model
- More readable code, because the research code is decoupled from the engineering
- Easier to reproduce
- Less error-prone, because most of the training loop and tricky engineering is automated
- Keeps all the flexibility (LightningModules are still PyTorch modules), but removes a ton of boilerplate
- Lightning has out-of-the-box integration with the popular logging/visualizing frameworks (Tensorboard, MLFlow, Neptune.ai, Comet.ml, Wandb).
- Tested rigorously with every new PR. We test every combination of supported PyTorch and Python versions, every OS, multiple GPUs and even TPUs.
- Minimal running speed overhead (about 300 ms per epoch compared with pure PyTorch).
Lightning automates 40+ parts of DL/ML research
- GPU training
- Distributed GPU (cluster) training
- TPU training
- EarlyStopping
- Logging/Visualizing
- Checkpointing
- Experiment management
- Full list here
How To Use
Install
Simple installation from PyPI
pip install pytorch-lightning
From Conda
conda install pytorch-lightning -c conda-forge
Install bleeding-edge (no guarantees)
pip install git+https://github.com/PytorchLightning/pytorch-lightning.git@master --upgrade
Here's a minimal example without a test loop.
import os
import torch
import torch.nn.functional as F
from torchvision.datasets import MNIST
from torch.utils.data import DataLoader, random_split
from torchvision import transforms
import pytorch_lightning as pl
from pytorch_lightning import Trainer

# this is just a plain nn.Module with some structure
class LitClassifier(pl.LightningModule):

    def __init__(self):
        super().__init__()
        self.l1 = torch.nn.Linear(28 * 28, 10)

    def forward(self, x):
        return torch.relu(self.l1(x.view(x.size(0), -1)))

    def training_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self(x)
        loss = F.cross_entropy(y_hat, y)
        result = pl.TrainResult(loss)
        result.log('train_loss', loss, on_epoch=True)
        return result

    def validation_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self(x)
        loss = F.cross_entropy(y_hat, y)
        result = pl.EvalResult(checkpoint_on=loss)
        result.log('val_loss', loss)
        return result

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=0.02)

# train!
dataset = MNIST(os.getcwd(), download=True, transform=transforms.ToTensor())
train, val = random_split(dataset, [55000, 5000])

model = LitClassifier()
trainer = Trainer()
trainer.fit(model, DataLoader(train), DataLoader(val))
And without changing a single line of code, you could run on GPUs
# 8 GPUs
trainer = Trainer(max_epochs=1, gpus=8)
# 256 GPUs
trainer = Trainer(max_epochs=1, gpus=8, num_nodes=32)
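The 256 in the comment above is the number of GPUs per node multiplied by the number of nodes. A small arithmetic sketch (the helper `total_gpus` is hypothetical and not a Lightning function):

```python
def total_gpus(gpus_per_node: int, num_nodes: int = 1) -> int:
    """Total number of GPUs used across all nodes."""
    return gpus_per_node * num_nodes


single_node = total_gpus(8)   # corresponds to Trainer(gpus=8)
cluster = total_gpus(8, 32)   # corresponds to Trainer(gpus=8, num_nodes=32)
```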
Or TPUs
# Distributes TPU core training
trainer = Trainer(tpu_cores=8)
# Single TPU core training
trainer = Trainer(tpu_cores=[1])
Docs
Resources
Examples
Hello world
MNIST hello world
MNIST on TPUs
Contrastive Learning
NLP
Reinforcement Learning
Vision
Classic ML
Logistic Regression
Linear Regression
Tutorials
Check out our introduction guide to get started. Or jump straight into our tutorials.
Community
The Lightning community is maintained by
- 16 core contributors, a mix of professional engineers, research scientists and Ph.D. students from top AI labs.
- 200+ community contributors.
Lightning is also part of the PyTorch ecosystem which requires projects to have solid testing, documentation and support.
Asking for help
If you have any questions please:
- Read the docs.
- Look it up in our forum (or add a new question)
- Search through the issues.
- Join our slack.
- Ask on stackoverflow with the tag pytorch-lightning.
Funding
Building open-source software with only a few part-time people is hard! We've secured funding to make sure we can hire a full-time staff, attend conferences, and move faster through implementing features you request.
Our goal is to build an incredible research platform and a big supportive community. Many open-source projects have gone on to fund operations through things like support and special help for big corporations!
If you are one of these corporations, please feel free to reach out to will@pytorchlightning.ai!
FAQ
Starting a new project?
Use our seed-project aimed at reproducibility!
Why lightning?
Although your research/production project might start simple, once you add things like GPU AND TPU training, 16-bit precision, etc, you end up spending more time engineering than researching. Lightning automates AND rigorously tests those parts for you.
Lightning has 3 goals in mind:
- Maximal flexibility while abstracting out the common boilerplate across research projects.
- Reproducibility. If all projects use the LightningModule template, it will be much much easier to understand what's going on and where to look! It will also mean every implementation follows a standard format.
- Democratizing PyTorch power-user features. Distributed training? 16-bit precision? If you know you need them but don't want to take the time to implement them, all good... these come built into Lightning.
Who is Lightning for?
- Professional researchers
- Ph.D. students
- Corporate production teams
If you're just getting into deep learning, we recommend you learn PyTorch first! Once you've implemented a few models, come back and use all the advanced features of Lightning :)
What does lightning control for me?
Everything in Blue! This is how lightning separates the science (red) from engineering (blue).
How much effort is it to convert?
If your code is not a huge mess you should be able to organize it into a LightningModule in less than 1 hour. If your code IS a mess, then you needed to clean up anyhow ;)
Check out this step-by-step guide. Or watch this video.
How flexible is it?
As you see, you're just organizing your PyTorch code - there's no abstraction.
And for the stuff that the Trainer abstracts out, you can override any part you want to do things like implement your own distributed training, 16-bit precision, or even a custom backward pass.
For example, here you could customize how gradients are zeroed without worrying about GPUs, TPUs or 16-bit since we already handle it.
from pytorch_lightning import LightningModule

class LitModel(LightningModule):
    def optimizer_zero_grad(self, current_epoch, batch_idx, optimizer, opt_idx):
        optimizer.zero_grad()
For anything else you might need, we have an extensive callback system you can use to add arbitrary functionality not implemented by our team in the Trainer.
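The callback idea can be sketched in plain Python. This is a toy of the pattern only: `ToyCallback`, `LossHistory` and `run_epochs` are hypothetical names, not Lightning's API, and the hook name `on_epoch_end` is simply modeled on the style of Lightning's hook names.

```python
class ToyCallback:
    """Base class: hooks default to doing nothing."""

    def on_epoch_end(self, epoch, metrics):
        pass


class LossHistory(ToyCallback):
    """Non-essential code kept out of the model: records a metric per epoch."""

    def __init__(self):
        self.history = []

    def on_epoch_end(self, epoch, metrics):
        self.history.append((epoch, metrics["loss"]))


def run_epochs(num_epochs, callbacks):
    """Stand-in for a training loop that notifies callbacks after each epoch."""
    for epoch in range(num_epochs):
        metrics = {"loss": 1.0 / (epoch + 1)}  # fake, decreasing loss
        for callback in callbacks:
            callback.on_epoch_end(epoch, metrics)


recorder = LossHistory()
run_epochs(3, [recorder])
```

Because the loop only calls hooks, extra behavior such as logging or early stopping can be added or removed without touching the model code.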
What types of research works?
Anything! Remember, that this is just organized PyTorch code. The Training step defines the core complexity found in the training loop.
Could be as complex as a seq2seq
# define what happens for training here
def training_step(self, batch, batch_idx):
    x, y = batch

    # define your own forward and loss calculation
    hidden_states = self.encoder(x)

    # even as complex as a seq-2-seq + attn model
    # (this is just a toy, non-working example to illustrate)
    start_token = '<SOS>'
    last_hidden = torch.zeros(...)
    loss = 0
    for step in range(max_seq_len):
        attn_context = self.attention_nn(hidden_states, start_token)
        pred = self.decoder(start_token, attn_context, last_hidden)
        last_hidden = pred
        pred = self.predict_nn(pred)
        # compare the prediction (not the hidden state) with the target
        loss += self.loss(pred, y[step])

    # toy example as well
    loss = loss / max_seq_len
    return {'loss': loss}
Or as basic as CNN image classification
# define what happens for validation here
def validation_step(self, batch, batch_idx):
    x, y = batch

    # or as basic as a CNN classification
    out = self(x)
    loss = my_loss(out, y)
    return {'loss': loss}
Does Lightning Slow my PyTorch?
No! Lightning is meant for research/production cases that require high-performance.
We have tests to ensure we get the EXACT same results in under 600 ms difference per epoch. In reality, lightning adds about a 300 ms overhead per epoch. Check out the parity tests here.
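To put the quoted 300 ms per epoch in context, the relative overhead depends on how long an epoch takes. A back-of-the-envelope sketch, where the 10-minute epoch is an assumed figure chosen only for illustration and `relative_overhead` is a hypothetical helper:

```python
OVERHEAD_S = 0.3  # about 300 ms per epoch, the figure quoted in this README


def relative_overhead(epoch_seconds: float, overhead_s: float = OVERHEAD_S) -> float:
    """Overhead as a fraction of the pure-PyTorch epoch time."""
    return overhead_s / epoch_seconds


# Assumed example: an epoch that takes 10 minutes in pure PyTorch.
fraction = relative_overhead(10 * 60)
print(f"{fraction:.2%}")  # prints 0.05%
```

For very short epochs the same fixed cost is a larger share, so the fraction is worth recomputing for your own epoch time.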
Overall, Lightning guarantees rigorously tested, correct, modern best practices for the automated parts.
How does Lightning compare with Ignite and fast.ai?
Is this another library I have to learn?
Nope! We use pure PyTorch everywhere and don't add unnecessary abstractions!
Are there plans to support Python 2?
Nope.
Are there plans to support virtualenv?
Nope. Please use anaconda or miniconda.
conda activate my_env
pip install pytorch-lightning
Licence
Please observe the Apache 2.0 license that is listed in this repository. In addition, the Lightning framework is Patent Pending.
BibTeX
If you want to cite the framework feel free to use this (but only if you loved it 😊):
@article{falcon2019pytorch,
title={PyTorch Lightning},
author={Falcon, WA},
journal={GitHub. Note: https://github.com/PyTorchLightning/pytorch-lightning},
volume={3},
year={2019}
}