Build and train PyTorch models and connect them to the ML lifecycle using Lightning App templates, without handling DIY infrastructure, cost management, scaling, and other headaches.

The lightweight PyTorch wrapper for high-performance AI research. Scale your models, not the boilerplate.


Website • Key Features • How To Use • Docs • Examples • Community • Grid AI • Licence


Code coverage reported by Codecov is above 90%, though build delays may show a lower figure.

NEWS

Dec 2020 - Read about how Facebook uses Lightning to standardize deep learning across research and production teams


PyTorch Lightning is just organized PyTorch

Lightning disentangles PyTorch code to decouple the science from the engineering.


Lightning Philosophy

Lightning is designed with these principles in mind:

Principle 1: Enable maximal flexibility.
Principle 2: Abstract away unnecessary boilerplate, but make it accessible when needed.
Principle 3: Systems should be self-contained (i.e., optimizers, computation code, etc.).
Principle 4: Deep learning code should be organized into 4 distinct categories.

  • Research code (the LightningModule).
  • Engineering code (you delete it; the Trainer handles it).
  • Non-essential research code (logging, etc.; this goes in Callbacks).
  • Data (use PyTorch DataLoaders or organize them into a LightningDataModule).

Once you do this, you can train on multiple GPUs, TPUs, CPUs, and even in 16-bit precision without changing your code!

Get started with our 2 step guide


Inference

Lightning is also designed for the fast inference AI researchers and production teams need to scale up things like BERT and self-supervised learning. Lightning can automatically export to ONNX or TorchScript for those cases.


Continuous Integration

| System / PyTorch ver. | 1.4 (min. req.)* | 1.5 | 1.6 | 1.7 (latest) | 1.8 (nightly) |
| --- | --- | --- | --- | --- | --- |
| Conda py3.7 [linux] | PyTorch & Conda | PyTorch & Conda | PyTorch & Conda | PyTorch & Conda | PyTorch & Conda |
| Linux py3.7 [GPUs**] | - | - | GPUs Status | - | - |
| Linux py3.{6,7} [TPUs***] | - | - | TPU tests | TPU tests | |
| Linux py3.{6,7} | CI complete testing | - | - | CI complete testing | - |
| OSX py3.{6,7,8} | - | CI complete testing | - | CI complete testing | - |
| Windows py3.{6,7,8} | CI complete testing | - | - | CI complete testing | - |
  • ** tests run on two NVIDIA K80s
  • *** tests run on Google GKE TPUv2/3
  • TPU tests with py3.6/py3.7 mean we support the Colab and Kaggle environments.

How To Use

Step 0: Install

Simple installation from PyPI

pip install pytorch-lightning

For the full package experience, you can also install all optional dependencies with pytorch-lightning['extra'], or with pytorch-lightning['cpu-extra'] for CPU-only users.

From Conda

conda install pytorch-lightning -c conda-forge

Install bleeding-edge - future 1.2

The current status of 1.2 [nightly] is tracked by the following checks:

CI base testing, CI complete testing, PyTorch & Conda, TPU tests, Docs check

Install future release from the source (no guarantees)

pip install git+https://github.com/PytorchLightning/pytorch-lightning.git@release/1.2-dev --upgrade

or nightly from testing PyPI

pip install --upgrade --index-url https://test.pypi.org/simple/ pytorch-lightning
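
After any of the install routes above, it can be useful to confirm which version actually ended up in the environment. The following is a minimal sketch using only the Python standard library (Python 3.8+); the helper name `installed_version` is made up for this illustration and is not part of Lightning.

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("pytorch-lightning")
    if version is None:
        print("pytorch-lightning is not installed in this environment")
    else:
        print(f"pytorch-lightning {version} is installed")
```

Returning None rather than raising keeps the check easy to use in setup scripts that need to branch on whether the package is present.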

Step 1: Add these imports

import os
import torch
from torch import nn
import torch.nn.functional as F
from torchvision.datasets import MNIST
from torch.utils.data import DataLoader, random_split
from torchvision import transforms
import pytorch_lightning as pl

Step 2: Define a LightningModule (nn.Module subclass)

A LightningModule defines a full system (e.g., a GAN, an autoencoder, BERT, or a simple image classifier).

class LitAutoEncoder(pl.LightningModule):

    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(28 * 28, 128), nn.ReLU(), nn.Linear(128, 3))
        self.decoder = nn.Sequential(nn.Linear(3, 128), nn.ReLU(), nn.Linear(128, 28 * 28))

    def forward(self, x):
        # in lightning, forward defines the prediction/inference actions
        embedding = self.encoder(x)
        return embedding

    def training_step(self, batch, batch_idx):
        # training_step defines the train loop. It is independent of forward
        x, y = batch
        x = x.view(x.size(0), -1)
        z = self.encoder(x)
        x_hat = self.decoder(z)
        loss = F.mse_loss(x_hat, x)
        self.log('train_loss', loss)
        return loss

    def configure_optimizers(self):
        optimizer = torch.optim.Adam(self.parameters(), lr=1e-3)
        return optimizer

Note: training_step defines the training loop, while forward defines how the LightningModule behaves during inference/prediction.
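
To make that split concrete, here is a rough sketch in plain PyTorch of the kind of loop that surrounds training_step and configure_optimizers. It illustrates the idea only and is not Lightning's actual implementation; the function name `run_training_loop`, the batch sizes, and the random stand-in data are assumptions made for this example.

```python
import torch
from torch import nn
import torch.nn.functional as F


def run_training_loop(num_batches: int = 5, batch_size: int = 8) -> list:
    """Hand-written loop around the same computation as the training_step above."""
    torch.manual_seed(0)
    encoder = nn.Sequential(nn.Linear(28 * 28, 128), nn.ReLU(), nn.Linear(128, 3))
    decoder = nn.Sequential(nn.Linear(3, 128), nn.ReLU(), nn.Linear(128, 28 * 28))
    params = list(encoder.parameters()) + list(decoder.parameters())
    optimizer = torch.optim.Adam(params, lr=1e-3)  # the role of configure_optimizers

    losses = []
    for _ in range(num_batches):
        x = torch.rand(batch_size, 1, 28, 28)  # random stand-in for an MNIST batch
        # --- research code: the body of training_step ---
        x = x.view(x.size(0), -1)
        x_hat = decoder(encoder(x))
        loss = F.mse_loss(x_hat, x)
        # --- engineering code: what is otherwise automated ---
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        losses.append(loss.item())
    return losses


if __name__ == "__main__":
    print(run_training_loop())
```

Everything below the "engineering code" comment is the part you no longer write by hand once the computation lives in a LightningModule.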

Step 3: Train!

dataset = MNIST(os.getcwd(), download=True, transform=transforms.ToTensor())
train, val = random_split(dataset, [55000, 5000])

autoencoder = LitAutoEncoder()
trainer = pl.Trainer()
trainer.fit(autoencoder, DataLoader(train), DataLoader(val))

And without changing a single line of code, you can run on GPUs/TPUs

# 8 GPUs
trainer = pl.Trainer(max_epochs=1, gpus=8)

# 256 GPUs
trainer = pl.Trainer(max_epochs=1, gpus=8, num_nodes=32)

# TPUs
trainer = pl.Trainer(tpu_cores=8)
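
The "256 GPUs" comment follows from multiplying the two arguments: gpus counts the GPUs on each node, so 8 GPUs on each of 32 nodes gives 256 in total. A small sketch of that arithmetic; the helper `total_gpus` is hypothetical and not a Lightning function.

```python
def total_gpus(gpus_per_node: int, num_nodes: int = 1) -> int:
    """Total number of GPUs used when every node contributes the same count."""
    if gpus_per_node < 0:
        raise ValueError("gpus_per_node must be non-negative")
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    return gpus_per_node * num_nodes


if __name__ == "__main__":
    print(total_gpus(8))       # single node
    print(total_gpus(8, 32))   # 32 nodes with 8 GPUs each
```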

And even export for production via ONNX or TorchScript

# torchscript
autoencoder = LitAutoEncoder()
torch.jit.save(autoencoder.to_torchscript(), "model.pt")

# onnx
import tempfile

with tempfile.NamedTemporaryFile(suffix='.onnx', delete=False) as tmpfile:
    autoencoder = LitAutoEncoder()
    # the sample must match the encoder's input size (28 * 28 flattened pixels)
    input_sample = torch.randn((1, 28 * 28))
    autoencoder.to_onnx(tmpfile.name, input_sample, export_params=True)
    assert os.path.isfile(tmpfile.name)

For advanced users, you can still own complex training loops

class LitAutoEncoder(pl.LightningModule):
    def training_step(self, batch, batch_idx, opt_idx):
        (opt_a, opt_b) = self.optimizers()

        loss_a = ...
        self.manual_backward(loss_a, opt_a)
        opt_a.step()
        opt_a.zero_grad()

        loss_b = ...
        self.manual_backward(loss_b, opt_b, retain_graph=True)
        self.manual_backward(loss_b, opt_b)
        opt_b.step()
        opt_b.zero_grad()
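
For readers who want to see the same pattern outside Lightning, below is a sketch of the step in plain PyTorch. It is an approximation under stated assumptions: plain loss.backward() stands in for self.manual_backward (which additionally takes care of details such as 16-bit scaling), and the MSE losses, the linear models, and the function name `manual_two_optimizer_step` are placeholders, since the snippet above leaves loss_a and loss_b unspecified.

```python
import torch
from torch import nn
import torch.nn.functional as F


def manual_two_optimizer_step(model_a, model_b, opt_a, opt_b, x, y):
    """One hand-rolled step that updates two models with two optimizers."""
    # First optimizer: backward, step, then clear gradients.
    loss_a = F.mse_loss(model_a(x), y)  # placeholder loss
    loss_a.backward()
    opt_a.step()
    opt_a.zero_grad()

    # Second optimizer: two backward passes accumulate gradients before one step.
    loss_b = F.mse_loss(model_b(x), y)  # placeholder loss
    loss_b.backward(retain_graph=True)  # keep the graph for the second pass
    loss_b.backward()
    opt_b.step()
    opt_b.zero_grad()
    return loss_a.item(), loss_b.item()


if __name__ == "__main__":
    torch.manual_seed(0)
    model_a, model_b = nn.Linear(4, 1), nn.Linear(4, 1)
    opt_a = torch.optim.SGD(model_a.parameters(), lr=0.1)
    opt_b = torch.optim.SGD(model_b.parameters(), lr=0.1)
    x, y = torch.randn(16, 4), torch.randn(16, 1)
    print(manual_two_optimizer_step(model_a, model_b, opt_a, opt_b, x, y))
```

The retain_graph=True on the first of the two backward calls is what allows the second call to reuse the same computation graph.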

Key Features

  • Scale your models to run on any hardware (CPU, GPUs, TPUs) without changing your model
  • Makes code more readable by decoupling the research code from the engineering
  • Easier to reproduce
  • Less error prone by automating most of the training loop and tricky engineering
  • Keeps all the flexibility (LightningModules are still PyTorch modules), but removes a ton of boilerplate
  • Lightning has out-of-the-box integration with the popular logging/visualizing frameworks (Tensorboard, MLFlow, Neptune.ai, Comet.ml, Wandb).
  • Tested rigorously with every new PR. We test every supported combination of PyTorch and Python versions, every OS, multiple GPUs, and even TPUs.
  • Minimal running speed overhead (about 300 ms per epoch compared with pure PyTorch).

Lightning automates 40+ parts of DL/ML research

  • GPU training
  • Distributed GPU (cluster) training
  • TPU training
  • EarlyStopping
  • Logging/Visualizing
  • Checkpointing
  • Experiment management
  • Full list here

Examples

Hello world
Contrastive Learning
NLP
Reinforcement Learning
Vision
Classic ML

Community

The Lightning community is maintained by

  • 16 core contributors, a mix of professional engineers, research scientists, and Ph.D. students from top AI labs.
  • 280+ community contributors.

Lightning is also part of the PyTorch ecosystem, which requires projects to have solid testing, documentation, and support.

Asking for help

If you have any questions please:

  1. Read the docs.
  2. Look it up in our forum (or add a new question)
  3. Search through the issues.
  4. Join our Slack.
  5. Ask on Stack Overflow with the tag pytorch-lightning.

Funding

Building open-source software with only a few part-time people is hard!

We're venture funded and backed by some of the top VC funds in the world: Index Ventures, Bain Capital Ventures, and First Minute Capital.

Their funding ensures we can continue to build awesome tooling like Grid, give you around-the-clock support, hire a full-time staff, attend conferences, and move faster in implementing the features you request.

To supercharge your research and production work, visit our Grid.ai platform


Grid AI

Grid AI is our native platform for training models at scale on the cloud!

Sign up for early access here

To use grid, take your regular command:

    python my_model.py --learning_rate 1e-6 --layers 2 --gpus 4

And change it to use the grid train command:

    grid train --grid_gpus 4 my_model.py --learning_rate 'uniform(1e-6, 1e-1, 20)' --layers '[2, 4, 8, 16]'

The above command will launch (20 * 4) experiments each running on 4 GPUs (320 GPUs!) - by making ZERO changes to your code.
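
The experiment count comes from crossing the 20 sampled learning rates with the 4 layer choices. The sketch below reproduces that arithmetic in plain Python. It is only an illustration: how Grid actually samples is not described here, so reading 'uniform(1e-6, 1e-1, 20)' as 20 random draws from that range is an assumption, and `expand_sweep` is a hypothetical helper, not part of the grid CLI.

```python
import itertools
import random


def expand_sweep(num_lr_samples, lr_low, lr_high, layer_choices, seed=0):
    """Cross sampled learning rates with every layer choice (illustrative only)."""
    rng = random.Random(seed)
    # Assumption: uniform(low, high, n) means n independent draws from [low, high].
    learning_rates = [rng.uniform(lr_low, lr_high) for _ in range(num_lr_samples)]
    return [
        {"learning_rate": lr, "layers": layers}
        for lr, layers in itertools.product(learning_rates, layer_choices)
    ]


if __name__ == "__main__":
    experiments = expand_sweep(20, 1e-6, 1e-1, [2, 4, 8, 16])
    gpus_per_experiment = 4
    print(len(experiments), "experiments")
    print(len(experiments) * gpus_per_experiment, "GPUs in total")
```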


Licence

Please observe the Apache 2.0 license that is listed in this repository. In addition, the Lightning framework is Patent Pending.

BibTeX

If you want to cite the framework feel free to use this (but only if you loved it 😊):

@article{falcon2019pytorch,
  title={PyTorch Lightning},
  author={Falcon, WA},
  journal={GitHub. Note: https://github.com/PyTorchLightning/pytorch-lightning},
  volume={3},
  year={2019}
}