

PyTorch Lightning

The lightweight PyTorch wrapper for high-performance AI research. Scale your models, not the boilerplate.


Code coverage (Codecov) is above 90%, though build delays may temporarily show a lower figure.

PyTorch Lightning is just organized PyTorch

Lightning disentangles PyTorch code to decouple the science from the engineering.


Lightning Philosophy

Lightning is designed with these principles in mind:

Principle 1: Enable maximal flexibility.
Principle 2: Abstract away unnecessary boilerplate, but make it accessible when needed.
Principle 3: Systems should be self-contained (e.g., optimizers, computation code, etc.).
Principle 4: Deep learning code should be organized into 4 distinct categories.

  • Research code (the LightningModule).
  • Engineering code (which you delete; it is handled by the Trainer).
  • Non-essential research code (logging, etc.; this goes in Callbacks).
  • Data (use PyTorch DataLoaders or organize them into a LightningDataModule).

Once you do this, you can train on multiple GPUs, TPUs, or CPUs, and even in 16-bit precision, without changing your code!
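To make the four categories concrete, here is a conceptual sketch in plain Python. `ToyModule`, `LossHistory`, and `ToyTrainer` are hypothetical names invented for illustration, not Lightning classes; they only mirror the idea that research code, engineering code, callbacks, and data live in separate places. The method names `training_step` and `configure_optimizers` echo the LightningModule hooks used in the guide, but the signatures here are simplified assumptions (the toy step returns a hand-computed gradient instead of relying on autograd).

```python
# Conceptual sketch only: hypothetical ToyModule / LossHistory / ToyTrainer,
# NOT Lightning's implementation. Fits y = w * x with plain gradient descent.
from typing import List, Tuple

Batch = Tuple[float, float]


class ToyModule:
    """Research code: the model, its loss, and its optimizer settings."""

    def __init__(self) -> None:
        self.w = 0.0  # single trainable weight

    def forward(self, x: float) -> float:
        return self.w * x

    def training_step(self, batch: Batch) -> Tuple[float, float]:
        x, y = batch
        error = self.forward(x) - y
        loss = error ** 2
        grad = 2 * error * x  # d(loss)/dw, computed by hand
        return loss, grad

    def configure_optimizers(self) -> float:
        return 0.05  # learning rate for plain gradient descent


class LossHistory:
    """Non-essential code: a callback that records the mean loss per epoch."""

    def __init__(self) -> None:
        self.losses: List[float] = []

    def on_epoch_end(self, epoch: int, mean_loss: float) -> None:
        self.losses.append(mean_loss)


class ToyTrainer:
    """Engineering code: the loop, the weight updates, and callback dispatch."""

    def __init__(self, max_epochs: int, callbacks=()) -> None:
        self.max_epochs = max_epochs
        self.callbacks = list(callbacks)

    def fit(self, module: ToyModule, data: List[Batch]) -> None:
        lr = module.configure_optimizers()
        for epoch in range(self.max_epochs):
            total = 0.0
            for batch in data:
                loss, grad = module.training_step(batch)
                module.w -= lr * grad
                total += loss
            for cb in self.callbacks:
                cb.on_epoch_end(epoch, total / len(data))


# Data: kept separate from both the model and the loop.
data = [(x, 2.0 * x) for x in (0.5, 1.0, 1.5, 2.0)]
history = LossHistory()
model = ToyModule()
ToyTrainer(max_epochs=50, callbacks=[history]).fit(model, data)
print(round(model.w, 3))  # 2.0
```

Swapping `ToyTrainer` for a different loop (for example one that shuffles or batches the data) would not require touching `ToyModule`, which is the kind of decoupling the principles describe.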

Get started with our 3-step guide.



Continuous Integration

| System / PyTorch ver. | 1.3 (min. req.)* | 1.4 | 1.5 | 1.6 (latest) | 1.7 (nightly) |
|---|---|---|---|---|---|
| Conda py3.7 [linux] | PyTorch & Conda | PyTorch & Conda | PyTorch & Conda | PyTorch & Conda | PyTorch & Conda |
| Linux py3.7 [GPUs**] | - | - | Build Status | - | - |
| Linux py3.7 [TPUs***] | - | - | - | TPU tests | - |
| Linux py3.6 / py3.7 / py3.8 | CI complete testing | - | - | CI complete testing | - |
| OSX py3.6 / py3.7 | - | CI complete testing | - | CI complete testing | - |
| Windows py3.6 / py3.7 / py3.8 | CI complete testing | - | - | CI complete testing | - |
  • * torch>=1.4 is the minimum PyTorch version for Python 3.8.
  • ** tests run on two NVIDIA K80 GPUs.
  • *** tests run on Google GKE TPUv2/3.
  • TPU tests with py3.6/py3.7 mean the Colab and Kaggle environments are supported.

How To Use

Step 0: Install

Simple installation from PyPI

pip install pytorch-lightning

From Conda

conda install pytorch-lightning -c conda-forge

Install the bleeding-edge version (no guarantees)

pip install git+https://github.com/PytorchLightning/pytorch-lightning.git@master --upgrade
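After installing, a quick way to confirm which version ended up in your environment is sketched below. It relies only on the standard library's `importlib.metadata` (Python 3.8+); `installed_version` is a hypothetical helper name. On Python 3.6/3.7 you could instead import the package and print `pytorch_lightning.__version__`.

```python
# Minimal sketch: report the installed pytorch-lightning version, if any.
# Standard library only (importlib.metadata requires Python 3.8+).
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "pytorch-lightning") -> Optional[str]:
    """Return the installed version string of a distribution, or None."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(found if found else "pytorch-lightning is not installed")
```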

Step 0: Add these imports

import os
import torch
from torch import nn
import torch.nn.functional as F
from torchvision.datasets import MNIST
from torch.utils.data import DataLoader, random_split
from torchvision import transforms
import pytorch_lightning as pl

Step 1: Define a LightningModule (nn.Module subclass)

A LightningModule defines a full system (e.g., a GAN, an autoencoder, BERT, or a simple image classifier).

class LitAutoEncoder(pl.LightningModule):

    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(28 * 28, 128), nn.ReLU(), nn.Linear(128, 3))
        self.decoder = nn.Sequential(nn.Linear(3, 128), nn.ReLU(), nn.Linear(128, 28 * 28))
    
    def forward(self, x):
        # in lightning, forward defines the prediction/inference actions
        embedding = self.encoder(x)
        return embedding

    def training_step(self, batch, batch_idx):
        # training_step defines the train loop. It is independent of forward
        x, y = batch
        x = x.view(x.size(0), -1)
        z = self.encoder(x)
        x_hat = self.decoder(z)
        loss = F.mse_loss(x_hat, x)
        return loss

    def configure_optimizers(self):
        optimizer = torch.optim.Adam(self.parameters(), lr=1e-3)
        return optimizer
Note: training_step defines the training loop, while forward defines how the LightningModule behaves during inference/prediction.

Step 2: Train!

dataset = MNIST(os.getcwd(), download=True, transform=transforms.ToTensor())
train, val = random_split(dataset, [55000, 5000])

autoencoder = LitAutoEncoder()
trainer = pl.Trainer()
trainer.fit(autoencoder, DataLoader(train), DataLoader(val))
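Step 2 splits MNIST's 60,000 training images into 55,000 for training and 5,000 for validation. The plain-Python sketch below illustrates the idea behind `random_split` (shuffle the indices, then cut them into consecutive chunks of the requested lengths). It is an illustration with a hypothetical `split_indices` helper, not PyTorch's implementation, which draws its permutation from a torch generator.

```python
# Conceptual sketch of the idea behind random_split; NOT torch's implementation.
import random
from typing import List, Sequence


def split_indices(n: int, lengths: Sequence[int], seed: int = 0) -> List[List[int]]:
    """Shuffle range(n) and cut it into consecutive chunks of `lengths`."""
    if sum(lengths) != n:
        raise ValueError(f"lengths sum to {sum(lengths)}, but dataset has {n} items")
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    splits, start = [], 0
    for length in lengths:
        splits.append(indices[start:start + length])
        start += length
    return splits


# 55,000 + 5,000 covers the 60,000-image MNIST training set exactly.
train_idx, val_idx = split_indices(60_000, [55_000, 5_000])
print(len(train_idx), len(val_idx))  # 55000 5000
```

Because every index lands in exactly one chunk, the two subsets never share an image.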

And without changing a single line of your model code, you can run on GPUs:

# 8 GPUs
trainer = pl.Trainer(max_epochs=1, gpus=8)

# 256 GPUs (8 GPUs per node x 32 nodes)
trainer = pl.Trainer(max_epochs=1, gpus=8, num_nodes=32)
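The GPU counts in the comments follow from simple multiplication. The sketch below assumes a DDP-style setup with one process per GPU, where each process loads its own batch; `world_size` and `effective_batch_size` are hypothetical helpers, and the per-GPU batch size of 32 is an arbitrary example. Under DataParallel-style training, where a single batch is split across GPUs, the effective-batch formula would not apply.

```python
# Worked arithmetic under an assumed one-process-per-GPU (DDP-style) setup.
def world_size(gpus: int, num_nodes: int = 1) -> int:
    """Total number of training processes across all nodes."""
    return gpus * num_nodes


def effective_batch_size(per_gpu_batch: int, gpus: int, num_nodes: int = 1) -> int:
    """Samples seen per optimizer step when every process loads its own batch."""
    return per_gpu_batch * world_size(gpus, num_nodes)


print(world_size(gpus=8))                               # 8
print(world_size(gpus=8, num_nodes=32))                 # 256
print(effective_batch_size(32, gpus=8, num_nodes=32))   # 8192
```

Scaling from one node to 32 multiplies the samples per step by 32, which is why learning-rate schedules are often revisited when the node count changes.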

Or on TPUs:

# Distributed training on 8 TPU cores
trainer = pl.Trainer(tpu_cores=8)

# Single TPU core training (core index 1)
trainer = pl.Trainer(tpu_cores=[1])
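The two `tpu_cores` forms above mean different things: an integer gives the number of cores to distribute over, while a one-element list picks one specific core by index. The hypothetical `describe_tpu_cores` helper below (not part of Lightning) only spells out that reading; check the Trainer documentation for your installed version before relying on it.

```python
# Hypothetical helper, NOT a Lightning API: spells out the two tpu_cores forms.
from typing import List, Union


def describe_tpu_cores(tpu_cores: Union[int, List[int]]) -> str:
    if isinstance(tpu_cores, list):
        # A list is read as "train on this specific core index".
        if len(tpu_cores) != 1:
            raise ValueError("expected exactly one core index in the list")
        return f"single core, index {tpu_cores[0]}"
    # An int is read as "distribute over this many cores".
    return f"{tpu_cores} core(s), distributed"


print(describe_tpu_cores(8))    # 8 core(s), distributed
print(describe_tpu_cores([1]))  # single core, index 1
```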

Key Features

  • Scale your models to run on any hardware (CPU, GPUs, TPUs) without changing your model.
  • Makes code more readable by decoupling the research code from the engineering.
  • Makes experiments easier to reproduce.
  • Is less error-prone, by automating most of the training loop and tricky engineering.
  • Keeps all the flexibility (LightningModules are still PyTorch modules) but removes a ton of boilerplate.
  • Provides out-of-the-box integration with popular logging/visualization frameworks (TensorBoard, MLflow, Neptune.ai, Comet.ml, Wandb).
  • Tested rigorously with every new PR: every combination of supported PyTorch and Python versions, every OS, multiple GPUs, and even TPUs.
  • Minimal running-speed overhead (about 300 ms per epoch compared with pure PyTorch).
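Because the quoted overhead is a roughly fixed cost per epoch, its relative impact shrinks as epochs get longer. The short sketch below turns the 300 ms figure into a percentage; the epoch durations are illustrative assumptions, not measurements.

```python
# Worked arithmetic: ~0.3 s fixed overhead per epoch as a fraction of epoch time.
OVERHEAD_S = 0.3  # figure quoted in the list above


def relative_overhead(epoch_seconds: float, overhead_s: float = OVERHEAD_S) -> float:
    """Fraction of an epoch's wall-clock time spent on the fixed overhead."""
    return overhead_s / epoch_seconds


for epoch_s in (10.0, 60.0, 600.0):  # illustrative epoch lengths
    print(f"{epoch_s:>6.0f} s epoch -> {relative_overhead(epoch_s):.2%} overhead")
```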

Lightning automates 40+ parts of DL/ML research

  • GPU training
  • Distributed GPU (cluster) training
  • TPU training
  • EarlyStopping
  • Logging/Visualizing
  • Checkpointing
  • Experiment management
  • And more (see the docs for the full list)
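As a rough illustration of the EarlyStopping item above, here is a plain-Python sketch of patience-based early stopping on a validation loss where lower is better. It is not Lightning's `EarlyStopping` callback; that callback exposes similar knobs such as `patience` and `min_delta`, but its exact semantics and defaults may differ from this hypothetical `early_stop_epoch` function.

```python
# Conceptual sketch of patience-based early stopping; NOT Lightning's callback.
from typing import Iterable, Optional


def early_stop_epoch(val_losses: Iterable[float], patience: int = 3,
                     min_delta: float = 0.0) -> Optional[int]:
    """Return the epoch index at which training would stop, or None."""
    best = float("inf")
    wait = 0  # epochs since the last meaningful improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


losses = [1.0, 0.8, 0.7, 0.72, 0.71, 0.75, 0.6]
print(early_stop_epoch(losses, patience=3))  # 5
print(early_stop_epoch(losses, patience=4))  # None
```

With `patience=3`, three epochs without beating 0.7 trigger a stop at epoch 5; with `patience=4`, training survives long enough to see the later improvement to 0.6.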

Examples

Hello world

MNIST hello world
MNIST on TPUs

Contrastive Learning

BYOL
CPC v2
MoCo v2
SimCLR

NLP

BERT
GPT-2

Reinforcement Learning

DQN
Dueling-DQN
Reinforce

Vision

GAN

Classic ML

Logistic Regression
Linear Regression


Community

The Lightning community is maintained by

  • 16 core contributors, a mix of professional engineers, research scientists, and Ph.D. students from top AI labs.
  • 280+ community contributors.

Lightning is also part of the PyTorch ecosystem which requires projects to have solid testing, documentation and support.

Asking for help

If you have any questions, please:

  1. Read the docs.
  2. Look it up in our forum (or add a new question).
  3. Search through the issues.
  4. Join our Slack.
  5. Ask on Stack Overflow with the tag pytorch-lightning.

Funding

Building open-source software with only a few part-time people is hard! We've secured funding to make sure we can hire full-time staff, attend conferences, and move faster on implementing the features you request.

Our goal is to build an incredible research platform and a big supportive community. Many open-source projects have gone on to fund operations through things like support and special help for big corporations!

If you are one of these corporations, please feel free to reach out to will@pytorchlightning.ai!


Licence

Please observe the Apache 2.0 license listed in this repository. In addition, the Lightning framework is Patent Pending.

BibTeX

If you want to cite the framework, feel free to use this (but only if you loved it 😊):

@article{falcon2019pytorch,
  title={PyTorch Lightning},
  author={Falcon, WA},
  journal={GitHub. Note: https://github.com/PyTorchLightning/pytorch-lightning},
  volume={3},
  year={2019}
}