## What is it?

Lightning is a way to organize your PyTorch code to decouple the science code from the engineering. It's more of a style guide than a framework.

By refactoring your code, we can automate most of the non-research code. Lightning guarantees rigorously tested, correct, modern best practices for the automated parts.

To use Lightning, first refactor your research code into a [LightningModule](https://pytorch-lightning.readthedocs.io/en/latest/lightning-module.html).

Here's an example of how to organize PyTorch code into the LightningModule.

![PT to PL](docs/source/_images/lightning_module/pt_to_pl.png)

And Lightning automates the rest using the [Trainer](https://pytorch-lightning.readthedocs.io/en/latest/trainer.html)!

![PT to PL](docs/source/_images/lightning_module/pt_trainer.png)

- If you are a researcher, Lightning is infinitely flexible: you can modify everything, down to how `.backward` is called or how distributed training is set up.
- If you are a scientist or production team, Lightning is very simple to use with best-practice defaults.

## How flexible is it?

As you can see, you're just organizing your PyTorch code - there's no abstraction.

And for the stuff that the Trainer abstracts out, you can [override any part](https://pytorch-lightning.readthedocs.io/en/latest/introduction_guide.html#extensibility) you want, to do things like implement your own distributed training, 16-bit precision, or even a custom backward pass.
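
For example, here is a minimal sketch of customizing the backward pass by overriding a hook on your LightningModule. The hook name and signature shown here are assumptions; check the docs for the exact API in your installed version:

```python
import pytorch_lightning as pl


class CoolSystem(pl.LightningModule):
    ...

    # hypothetical override: Lightning calls this hook instead of
    # calling loss.backward() itself (verify the signature for your version)
    def backward(self, use_amp, loss, optimizer):
        # e.g., scale the loss before backpropagating
        (loss * 0.5).backward()
```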

For anything else you might need, we have an extensive [callback system](https://pytorch-lightning.readthedocs.io/en/latest/introduction_guide.html#callbacks) you can use to add arbitrary functionality not implemented by our team in the Trainer.
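
As a sketch (assuming the `Callback` base class and the Trainer's `callbacks` argument in your installed version), a callback might look like this:

```python
from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import Callback


class PrintingCallback(Callback):
    # hooks receive the trainer and the LightningModule being trained
    def on_train_start(self, trainer, pl_module):
        print('Training is starting')

    def on_train_end(self, trainer, pl_module):
        print('Training is done')


trainer = Trainer(callbacks=[PrintingCallback()])
```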
## Who is Lightning for?

- Professional researchers
- PhD students
- Corporate production teams

If you're just getting into deep learning, we recommend you learn PyTorch first! Once you've implemented a few models, come back and use all the advanced features of Lightning :)

## What does Lightning control for me?

This is how Lightning separates the science (red) from the engineering (blue).

![Overview](docs/source/_static/images/pl_overview.gif)

## How much effort is it to convert?

You're probably tired of switching frameworks at this point. It's a quick process to refactor into the Lightning format (i.e., hours). If your code is not a huge mess, you should be able to organize it into a LightningModule in less than an hour.

If your code IS a mess, then you needed to clean it up anyhow ;)

[Check out this step-by-step guide](https://towardsdatascience.com/from-pytorch-to-pytorch-lightning-a-gentle-introduction-b371b7caaf09).

## What are the differences with PyTorch?

If you're wondering what you gain out of refactoring your PyTorch code, [read this comparison!](https://towardsdatascience.com/from-pytorch-to-pytorch-lightning-a-gentle-introduction-b371b7caaf09)

## Starting a new project?

[Use our seed project, aimed at reproducibility!](https://github.com/PytorchLightning/pytorch-lightning-conference-seed)

## Why do I want to use Lightning?

Every research project starts the same: a model, a training loop, a validation loop, and so on. As your research advances, you're likely to need distributed training, 16-bit precision, checkpointing, gradient accumulation, and more.

Although your research/production project might start simple, once you add things like GPU and TPU training, 16-bit precision, etc., you end up spending more time engineering than researching. Lightning automates AND rigorously tests those parts for you.

Lightning sets up all the boilerplate state-of-the-art training for you so you can focus on the research.

## Support

- [7 core contributors](https://pytorch-lightning.readthedocs.io/en/latest/governance.html) who are a mix of professional engineers, research scientists, and PhD students from top AI labs.
- 100+ community contributors.

Lightning is also part of the [PyTorch ecosystem](https://pytorch.org/ecosystem/), which requires projects to have solid testing, documentation, and support.

---

## How do I use it?

Think about Lightning as refactoring your research code instead of using a new framework. The research code goes into a [LightningModule](https://pytorch-lightning.rtfd.io/en/latest/lightning-module.html), which you fit using a Trainer.

The LightningModule defines a *system*, such as seq2seq, GAN, etc. It can also define a simple classifier, such as the example below.

![PT to PL](docs/source/_images/mnist_imgs/pt_to_pl.jpg)

To use Lightning, do two things:

1. Define a [LightningModule](https://pytorch-lightning.rtfd.io/en/latest/lightning-module.html)
```python
import os

import torch
from torch.nn import functional as F
from torch.utils.data import DataLoader
from torchvision.datasets import MNIST
from torchvision import transforms

import pytorch_lightning as pl


class CoolSystem(pl.LightningModule):

    def __init__(self):
        super(CoolSystem, self).__init__()
        # not the best model: a single linear layer over the flattened 28x28 image
        self.l1 = torch.nn.Linear(28 * 28, 10)

    def forward(self, x):
        return torch.relu(self.l1(x.view(x.size(0), -1)))

    def training_step(self, batch, batch_idx):
        # REQUIRED
        x, y = batch
        y_hat = self.forward(x)
        loss = F.cross_entropy(y_hat, y)
        tensorboard_logs = {'train_loss': loss}
        return {'loss': loss, 'log': tensorboard_logs}

    def validation_step(self, batch, batch_idx):
        # OPTIONAL
        x, y = batch
        y_hat = self.forward(x)
        return {'val_loss': F.cross_entropy(y_hat, y)}

    def validation_end(self, outputs):
        # OPTIONAL
        avg_loss = torch.stack([x['val_loss'] for x in outputs]).mean()
        tensorboard_logs = {'val_loss': avg_loss}
        return {'avg_val_loss': avg_loss, 'log': tensorboard_logs}

    def test_step(self, batch, batch_idx):
        # OPTIONAL
        x, y = batch
        y_hat = self.forward(x)
        return {'test_loss': F.cross_entropy(y_hat, y)}

    def test_end(self, outputs):
        # OPTIONAL
        avg_loss = torch.stack([x['test_loss'] for x in outputs]).mean()
        tensorboard_logs = {'test_loss': avg_loss}
        return {'avg_test_loss': avg_loss, 'log': tensorboard_logs}

    def configure_optimizers(self):
        # REQUIRED
        # can return multiple optimizers and learning_rate schedulers
        # (LBFGS is automatically supported, no need for a closure function)
        return torch.optim.Adam(self.parameters(), lr=0.02)

    @pl.data_loader
    def train_dataloader(self):
        # REQUIRED
        return DataLoader(MNIST(os.getcwd(), train=True, download=True, transform=transforms.ToTensor()), batch_size=32)

    @pl.data_loader
    def val_dataloader(self):
        # OPTIONAL
        return DataLoader(MNIST(os.getcwd(), train=True, download=True, transform=transforms.ToTensor()), batch_size=32)

    @pl.data_loader
    def test_dataloader(self):
        # OPTIONAL
        return DataLoader(MNIST(os.getcwd(), train=False, download=True, transform=transforms.ToTensor()), batch_size=32)
```
2. Fit it with a [Trainer](https://pytorch-lightning.rtfd.io/en/latest/pytorch_lightning.trainer.html)
```python
from pytorch_lightning import Trainer

model = CoolSystem()

# most basic trainer, uses good defaults
trainer = Trainer()
trainer.fit(model)
```

The Trainer sets up a TensorBoard logger, early stopping, and checkpointing by default (you can modify all of them or use something other than TensorBoard).
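
For instance, here is a sketch of swapping in your own early stopping and checkpointing (argument names such as `early_stop_callback` and `checkpoint_callback` are assumptions; check the Trainer docs for your installed version):

```python
from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import EarlyStopping, ModelCheckpoint

# assumed Trainer arguments: replace the default early stopping and checkpointing
trainer = Trainer(
    early_stop_callback=EarlyStopping(monitor='val_loss', patience=3),
    checkpoint_callback=ModelCheckpoint(filepath='my/checkpoints/', monitor='val_loss'),
)
```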

Here are more advanced examples:
```python
import logging
import os

# train on CPU using only 10% of the data (for demo purposes)
trainer = Trainer(max_epochs=1, train_percent_check=0.1)

# train on 4 GPUs (Lightning chooses the GPUs for you)
# trainer = Trainer(max_epochs=1, gpus=4, distributed_backend='ddp')

# train on 4 GPUs (you choose the GPUs)
# trainer = Trainer(max_epochs=1, gpus=[0, 1, 3, 7], distributed_backend='ddp')

# train on 32 GPUs across 4 nodes (make sure to submit the appropriate SLURM job)
# trainer = Trainer(max_epochs=1, gpus=8, num_nodes=4, distributed_backend='ddp')

# train (1 epoch only here for demo)
trainer.fit(model)

# view tensorboard logs
logging.info(f'View tensorboard logs by running\ntensorboard --logdir {os.getcwd()}')
logging.info('and going to http://localhost:6006 on your browser')
```

When you're all done, you can even run the test set separately.
```python
trainer.test()
```

[Check out the COLAB demo here.](https://colab.research.google.com/drive/1F_RNcHzTfFuQf-LeKvSlud6x7jXYkG31#scrollTo=HOk9c4_35FKg)

## What types of research works?

Anything! Remember, this is just organized PyTorch code. The training step defines the core complexity found in the training loop.

#### Could be as complex as a seq2seq

```python
# define what happens for training here
def training_step(self, batch, batch_idx):
    # your own forward pass and loss computation
    ...
    return {'loss': loss}
```

#### Or as basic as CNN image classification

```python
# define what happens for validation here
def validation_step(self, batch, batch_idx):
    # your own forward pass and loss computation
    ...
    return {'loss': loss}
```

**And you also decide how to collate the output of all validation steps**

```python
def validation_epoch_end(self, outputs):
    """
    Called at the end of validation to aggregate outputs.
    :param outputs: list of individual outputs of each validation step
    """
    val_loss_mean = 0
    val_acc_mean = 0
    for output in outputs:
        val_loss_mean += output['val_loss']
        val_acc_mean += output['val_acc']

    val_loss_mean /= len(outputs)
    val_acc_mean /= len(outputs)
    logs = {'val_loss': val_loss_mean.item(), 'val_acc': val_acc_mean.item()}
    result = {'log': logs}
    return result
```

And without changing a single line of code, you could run on CPUs:
```python
trainer = Trainer(max_epochs=1)
```

Or GPUs:
```python
# 8 GPUs
trainer = Trainer(max_epochs=1, gpus=8)

# 256 GPUs
trainer = Trainer(max_epochs=1, gpus=8, num_nodes=32)
```

Or TPUs:
```python
trainer = Trainer(num_tpu_cores=8)
```

When you're done training, run the test accuracy:
```python
trainer.test()
```

## Visualization

Lightning has out-of-the-box integration with popular logging/visualization frameworks:

- Tensorboard
- MLFlow
- Neptune.ai
- Comet.ml
- ...
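
As a sketch (assuming the `pytorch_lightning.loggers` module and the Trainer's `logger` argument in your installed version), you pick a logger and hand it to the Trainer:

```python
from pytorch_lightning import Trainer
from pytorch_lightning.loggers import TensorBoardLogger

# assumed arguments: a directory to save to and an experiment name
logger = TensorBoardLogger(save_dir='tb_logs', name='cool_system')
trainer = Trainer(logger=logger)
```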

![tensorboard-support](docs/source/_static/images/tf_loss.png)

Lightning also adds a text column with all the hyperparameters for this experiment.

![tensorboard-support](docs/source/_static/images/tf_tags.png)

## Lightning automates 40+ parts of DL/ML research

- GPU training
- Distributed GPU (cluster) training
- TPU training
- EarlyStopping
- Logging/Visualizing
- Checkpointing
- Experiment management
- [Full list here](https://pytorch-lightning.readthedocs.io/en/latest/#common-use-cases)
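
Most of these are enabled by a Trainer flag rather than new code. As a sketch combining flags that appear earlier in this README (`max_epochs`, `gpus`, `num_nodes`, `train_percent_check`):

```python
from pytorch_lightning import Trainer

# multi-node GPU training plus a reduced data fraction, all from flags
trainer = Trainer(max_epochs=10, gpus=8, num_nodes=4, train_percent_check=0.5)
trainer.fit(model)
```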

## Examples

- [GAN](https://github.com/PytorchLightning/pytorch-lightning/tree/master/pl_examples/domain_templates/gan.py)
- [MNIST](https://github.com/PytorchLightning/pytorch-lightning/tree/master/pl_examples/basic_examples)
- [Multi-node](https://github.com/PytorchLightning/pytorch-lightning/tree/master/pl_examples/multi_node_examples)
- [Other projects using Lightning](https://github.com/PytorchLightning/pytorch-lightning/network/dependents?package_id=UGFja2FnZS0zNzE3NDU4OTM%3D)

Check out this awesome list of research papers and implementations done with Lightning:

- [Contextual Emotion Detection (DoubleDistilBert)](https://github.com/PyTorchLightning/emotion_transformer)
- [Generative Adversarial Network](https://colab.research.google.com/drive/1F_RNcHzTfFuQf-LeKvSlud6x7jXYkG31#scrollTo=TyYOdg8g77P0)
- [Hyperparameter optimization with Optuna](https://github.com/optuna/optuna/blob/master/examples/pytorch_lightning_simple.py)
- [Image Inpainting using Partial Convolutions](https://github.com/ryanwongsa/Image-Inpainting)
- [MNIST on TPU](https://colab.research.google.com/drive/1-_LKx4HwAxl5M6xPJmqAAu444LTDQoa3#scrollTo=BHBz1_AnamN_)
- [NER (transformers, TPU, huggingface)](https://colab.research.google.com/drive/1dBN-wwYUngLYVt985wGs_OKPlK_ANB9D)
- [NeuralTexture (CVPR)](https://github.com/PyTorchLightning/neuraltexture)
- [Recurrent Attentive Neural Process](https://github.com/PyTorchLightning/attentive-neural-processes)
- [Siamese Nets for One-shot Image Recognition](https://github.com/PyTorchLightning/Siamese-Neural-Networks)
- [Speech Transformers](https://github.com/PyTorchLightning/speech-transformer-pytorch_lightning)
- [Transformers transfer learning (Huggingface)](https://colab.research.google.com/drive/1F_RNcHzTfFuQf-LeKvSlud6x7jXYkG31#scrollTo=yr7eaxkF-djf)
- [Transformers text classification](https://github.com/ricardorei/lightning-text-classification)
- [VAE Library of over 18+ VAE flavors](https://github.com/AntixK/PyTorch-VAE)

## Tutorials

Check out our [introduction guide](https://pytorch-lightning.readthedocs.io/en/latest/introduction_guide.html) to get started, or jump straight into [our tutorials](https://pytorch-lightning.readthedocs.io/en/latest/#tutorials).

- [Basic Lightning use](https://towardsdatascience.com/supercharge-your-ai-research-with-pytorch-lightning-337948a99eec)
- [9 key speed features in Pytorch-Lightning](https://towardsdatascience.com/9-tips-for-training-lightning-fast-neural-networks-in-pytorch-8e63a502f565)
- [SLURM, multi-node training with Lightning](https://towardsdatascience.com/trivial-multi-node-training-with-pytorch-lightning-ff75dfb809bd)

---

Welcome to the Lightning community!

If you have any questions, feel free to:

1. [Read the docs](https://pytorch-lightning.rtfd.io/en/latest/).
2. [Search through the issues](https://github.com/PytorchLightning/pytorch-lightning/issues?utf8=%E2%9C%93&q=my++question).
3. [Ask on Stack Overflow](https://stackoverflow.com/questions/ask?guided=false) with the tag pytorch-lightning.
4. [Join our Slack](https://join.slack.com/t/pytorch-lightning/shared_invite/enQtODU5ODIyNTUzODQwLTFkMDg5Mzc1MDBmNjEzMDgxOTVmYTdhYjA1MDdmODUyOTg2OGQ1ZWZkYTQzODhhNzdhZDA3YmNhMDhlMDY4YzQ).

---

## FAQ

**How do I use Lightning for rapid research?**

[Here's a walk-through.](https://pytorch-lightning.readthedocs.io/en/latest/introduction_guide.html)

**Why was Lightning created?**

Lightning has 3 goals in mind:

1. Maximal flexibility while abstracting out the common boilerplate across research projects.
2. Reproducibility. If all projects use the LightningModule template, it will be much easier to understand what's going on and where to look! It will also mean every implementation follows a standard format.
3. Democratizing PyTorch power-user features. Distributed training? 16-bit precision? Know you need them but don't want to take the time to implement them? All good... these come built into Lightning.

**How does Lightning compare with Ignite and fast.ai?**

[Here's a thorough comparison.](https://medium.com/@_willfalcon/pytorch-lightning-vs-pytorch-ignite-vs-fast-ai-61dc7480ad8a)

**Is this another library I have to learn?**

Nope! We use pure PyTorch everywhere and don't add unnecessary abstractions!