diff --git a/README.md b/README.md
index 2fd8cff0da..94926ffb6b 100644
--- a/README.md
+++ b/README.md
@@ -54,14 +54,28 @@ pip install pytorch-lightning
 ## What is it?
 Lightning is a way to organize your PyTorch code to decouple the science code from the engineering. It's more of a style-guide than a framework.
 
-By refactoring your code, we can automate most of the non-research code. Lightning guarantees tested, correct, modern best practices for the automated parts.
+To use Lightning, first refactor your research code into a [LightningModule](https://pytorch-lightning.readthedocs.io/en/latest/lightning-module.html).
 
-Here's an example of how to organize PyTorch code into the LightningModule.
+![PT to PL](docs/source/_images/lightning_module/pt_to_pl.png)
 
-![PT to PL](docs/source/_images/mnist_imgs/pt_to_pl.jpg)
+And Lightning automates the rest using the [Trainer](https://pytorch-lightning.readthedocs.io/en/latest/trainer.html)!
+![PT to PL](docs/source/_images/lightning_module/pt_trainer.png)
 
-- If you are a researcher, Lightning is infinitely flexible, you can modify everything down to the way .backward is called or distributed is set up.
-- If you are a scientist or production team, lightning is very simple to use with best practice defaults.
+Lightning guarantees rigorously tested, correct, modern best practices for the automated parts.
+
+## How flexible is it?
+As you see, you're just organizing your PyTorch code - there's no abstraction.
+
+And for the stuff that the Trainer abstracts out, you can [override any part](https://pytorch-lightning.readthedocs.io/en/latest/introduction_guide.html#extensibility) you want, to do things like implement your own distributed training, 16-bit precision, or even a custom backward pass.
+
+For anything else you might need, we have an extensive [callback system](https://pytorch-lightning.readthedocs.io/en/latest/introduction_guide.html#callbacks) you can use to add arbitrary functionality not implemented by our team in the Trainer.
+
+## Who is Lightning for?
+- Professional researchers
+- PhD students
+- Corporate production teams
+
+If you're just getting into deep learning, we recommend you learn PyTorch first! Once you've implemented a few models, come back and use all the advanced features of Lightning :)
 
 ## What does lightning control for me?
 
@@ -71,18 +85,23 @@ This is how lightning separates the science (red) from the engineering (blue).
 ![Overview](docs/source/_static/images/pl_overview.gif)
 
 ## How much effort is it to convert?
-You're probably tired of switching frameworks at this point. But it is a very quick process to refactor into the Lightning format (ie: hours). [Check out this tutorial](https://towardsdatascience.com/from-pytorch-to-pytorch-lightning-a-gentle-introduction-b371b7caaf09).
+If your code is not a huge mess you should be able to organize it into a LightningModule in less than 1 hour.
+If your code IS a mess, then you needed to clean up anyhow ;)
+
+[Check out this step-by-step guide](https://towardsdatascience.com/from-pytorch-to-pytorch-lightning-a-gentle-introduction-b371b7caaf09).
 
-## What are the differences with PyTorch?
-If you're wondering what you gain out of refactoring your PyTorch code, [read this comparison!](https://towardsdatascience.com/from-pytorch-to-pytorch-lightning-a-gentle-introduction-b371b7caaf09)
 
 ## Starting a new project?
 [Use our seed-project aimed at reproducibility!](https://github.com/PytorchLightning/pytorch-lightning-conference-seed)
 
 ## Why do I want to use lightning?
-Every research project starts the same, a model, a training loop, validation loop, etc. As your research advances, you're likely to need distributed training, 16-bit precision, checkpointing, gradient accumulation, etc.
+Although your research/production project might start simple, once you add things like GPU AND TPU training, 16-bit precision, etc., you end up spending more time engineering than researching. Lightning automates AND rigorously tests those parts for you.
 
-Lightning sets up all the boilerplate state-of-the-art training for you so you can focus on the research.
+## Support
+- [7 core contributors](https://pytorch-lightning.readthedocs.io/en/latest/governance.html) who are a mix of professional engineers, research scientists, and PhD students from top AI labs.
+- 100+ community contributors.
+
+Lightning is also part of the [PyTorch ecosystem](https://pytorch.org/ecosystem/) which requires projects to have solid testing, documentation and support.
 
 ---
 
@@ -102,25 +121,19 @@ Lightning sets up all the boilerplate state-of-the-art training for you so you c
 
 ---
 
-## How do I do use it?
-Think about Lightning as refactoring your research code instead of using a new framework. The research code goes into a [LightningModule](https://pytorch-lightning.rtfd.io/en/latest/lightning-module.html) which you fit using a Trainer.
+## Realistic example
+Here's how you would organize a realistic PyTorch project into Lightning.
 
-The LightningModule defines a *system* such as seq-2-seq, GAN, etc... It can ALSO define a simple classifier such as the example below.
+![PT to PL](docs/source/_images/mnist_imgs/pt_to_pl.jpg)
 
-To use lightning do 2 things:
-1. [Define a LightningModule](https://pytorch-lightning.rtfd.io/en/latest/lightning-module.html)
-    ```python
-    import os
-
-    import torch
-    from torch.nn import functional as F
-    from torch.utils.data import DataLoader
-    from torchvision.datasets import MNIST
-    from torchvision import transforms
-
-    import pytorch_lightning as pl
-
-    class CoolSystem(pl.LightningModule):
+The LightningModule defines a *system* such as seq-2-seq, GAN, etc...
+It can ALSO define a simple classifier.
+
+In summary, you:
+
+1. Define a [LightningModule](https://pytorch-lightning.rtfd.io/en/latest/lightning-module.html)
+```python
+    class LitSystem(pl.LightningModule):
 
         def __init__(self):
-            super(CoolSystem, self).__init__()
+            super(LitSystem, self).__init__()
@@ -129,102 +142,29 @@ To use lightning do 2 things:
 
         def forward(self, x):
             return torch.relu(self.l1(x.view(x.size(0), -1)))
-
-        def training_step(self, batch, batch_idx):
-            # REQUIRED
-            x, y = batch
-            y_hat = self.forward(x)
-            loss = F.cross_entropy(y_hat, y)
-            tensorboard_logs = {'train_loss': loss}
-            return {'loss': loss, 'log': tensorboard_logs}
-
-        def validation_step(self, batch, batch_idx):
-            # OPTIONAL
-            x, y = batch
-            y_hat = self.forward(x)
-            return {'val_loss': F.cross_entropy(y_hat, y)}
-
-        def validation_end(self, outputs):
-            # OPTIONAL
-            avg_loss = torch.stack([x['val_loss'] for x in outputs]).mean()
-            tensorboard_logs = {'val_loss': avg_loss}
-            return {'avg_val_loss': avg_loss, 'log': tensorboard_logs}
 
-        def test_step(self, batch, batch_idx):
-            # OPTIONAL
-            x, y = batch
-            y_hat = self.forward(x)
-            return {'test_loss': F.cross_entropy(y_hat, y)}
-
-        def test_end(self, outputs):
-            # OPTIONAL
-            avg_loss = torch.stack([x['test_loss'] for x in outputs]).mean()
-            tensorboard_logs = {'test_loss': avg_loss}
-            return {'avg_test_loss': avg_loss, 'log': tensorboard_logs}
-
-        def configure_optimizers(self):
-            # REQUIRED
-            # can return multiple optimizers and learning_rate schedulers
-            # (LBFGS it is automatically supported, no need for closure function)
-            return torch.optim.Adam(self.parameters(), lr=0.02)
-
-        @pl.data_loader
-        def train_dataloader(self):
-            # REQUIRED
-            return DataLoader(MNIST(os.getcwd(), train=True, download=True, transform=transforms.ToTensor()), batch_size=32)
-
-        @pl.data_loader
-        def val_dataloader(self):
-            # OPTIONAL
-            return DataLoader(MNIST(os.getcwd(), train=True, download=True, transform=transforms.ToTensor()), batch_size=32)
-
-        @pl.data_loader
-        def test_dataloader(self):
-            # OPTIONAL
-            return DataLoader(MNIST(os.getcwd(), train=False, download=True, transform=transforms.ToTensor()), batch_size=32)
-    ```
-2. Fit with a [trainer](https://pytorch-lightning.rtfd.io/en/latest/pytorch_lightning.trainer.html)
-    ```python
-    from pytorch_lightning import Trainer
-
-    model = CoolSystem()
-
-    # most basic trainer, uses good defaults
-    trainer = Trainer()
-    trainer.fit(model)
-    ```
-
-Trainer sets up a tensorboard logger, early stopping and checkpointing by default (you can modify all of them or
-use something other than tensorboard).
-
-Here are more advanced examples
-```python
-# train on cpu using only 10% of the data (for demo purposes)
-trainer = Trainer(max_epochs=1, train_percent_check=0.1)
-
-# train on 4 gpus (lightning chooses GPUs for you)
-# trainer = Trainer(max_epochs=1, gpus=4, distributed_backend='ddp')
-
-# train on 4 gpus (you choose GPUs)
-# trainer = Trainer(max_epochs=1, gpus=[0, 1, 3, 7], distributed_backend='ddp')
-
-# train on 32 gpus across 4 nodes (make sure to submit appropriate SLURM job)
-# trainer = Trainer(max_epochs=1, gpus=8, num_gpu_nodes=4, distributed_backend='ddp')
-
-# train (1 epoch only here for demo)
-trainer.fit(model)
-
-# view tensorboard logs
-logging.info(f'View tensorboard logs by running\ntensorboard --logdir {os.getcwd()}')
-logging.info('and going to http://localhost:6006 on your browser')
+        def training_step(self, batch, batch_idx):
+            ...
 ```
 
-When you're all done you can even run the test set separately.
-```python
-trainer.test()
-```
+2. Fit it with a [Trainer](https://pytorch-lightning.rtfd.io/en/latest/pytorch_lightning.trainer.html)
+    ```python
+    from pytorch_lightning import Trainer
 
-**Could be as complex as seq-2-seq + attention**
+    model = LitSystem()
+
+    # most basic trainer, uses good defaults
+    trainer = Trainer()
+    trainer.fit(model)
+    ```
+
+[Check out the COLAB demo here](https://colab.research.google.com/drive/1F_RNcHzTfFuQf-LeKvSlud6x7jXYkG31#scrollTo=HOk9c4_35FKg)
+
+## What types of research work?
+Anything! Remember that this is just organized PyTorch code.
+The Training step defines the core complexity found in the training loop.
+
+#### Could be as complex as a seq2seq
 
 ```python
 # define what happens for training here
@@ -251,7 +191,7 @@ def training_step(self, batch, batch_idx):
     return {'loss': loss}
 ```
 
-**Or as basic as CNN image classification**
+#### Or as basic as CNN image classification
 
 ```python
 # define what happens for validation here
@@ -264,62 +204,74 @@ def validation_step(self, batch, batch_idx):
     return {'loss': loss}
 ```
 
-**And you also decide how to collate the output of all validation steps**
-
-```python
-def validation_epoch_end(self, outputs):
-    """
-    Called at the end of validation to aggregate outputs
-    :param outputs: list of individual outputs of each validation step
-    :return:
-    """
-    val_loss_mean = 0
-    val_acc_mean = 0
-    for output in outputs:
-        val_loss_mean += output['val_loss']
-        val_acc_mean += output['val_acc']
-
-    val_loss_mean /= len(outputs)
-    val_acc_mean /= len(outputs)
-    logs = {'val_loss': val_loss_mean.item(), 'val_acc': val_acc_mean.item()}
-    result = {'log': logs}
-    return result
+And without changing a single line of code, you could run on CPUs
+```python
+trainer = Trainer(max_epochs=1)
 ```
-
-## Tensorboard
-Lightning is fully integrated with tensorboard, MLFlow and supports any logging module.
+
+
+Or GPUs
+```python
+# 8 GPUs
+trainer = Trainer(max_epochs=1, gpus=8)
+
+# 256 GPUs
+trainer = Trainer(max_epochs=1, gpus=8, num_nodes=32)
+```
+
+Or TPUs
+```python
+trainer = Trainer(num_tpu_cores=8)
+```
+
+When you're done training, evaluate on the test set
+```python
+trainer.test()
+```
+
+## Visualization
+Lightning has out-of-the-box integration with the popular logging/visualizing frameworks
+
+- Tensorboard
+- MLFlow
+- Neptune.ai
+- Comet.ml
+- ...
 
 ![tensorboard-support](docs/source/_static/images/tf_loss.png)
 
-Lightning also adds a text column with all the hyperparameters for this experiment.
-![tensorboard-support](docs/source/_static/images/tf_tags.png)
-
-## Lightning automates all of the following ([each is also configurable](https://pytorch-lightning.rtfd.io/en/latest/pytorch_lightning.trainer.html)):
-
-
-- [Running grid search on a cluster](https://pytorch-lightning.rtfd.io/en/latest/pytorch_lightning.trainer.distrib_data_parallel.html)
-- [Fast dev run](https://pytorch-lightning.rtfd.io/en/latest/pytorch_lightning.utilities.debugging.html)
-- [Logging](https://pytorch-lightning.rtfd.io/en/latest/pytorch_lightning.loggers.html)
-- [Implement Your Own Distributed (DDP) training](https://pytorch-lightning.rtfd.io/en/latest/pytorch_lightning.core.lightning.html#pytorch_lightning.core.lightning.LightningModule.configure_ddp)
-- [Multi-GPU & Multi-node](https://pytorch-lightning.rtfd.io/en/latest/pytorch_lightning.trainer.distrib_parts.html)
-- [Training loop](https://pytorch-lightning.rtfd.io/en/latest/pytorch_lightning.trainer.training_loop.html)
-- [Hooks](https://pytorch-lightning.rtfd.io/en/latest/pytorch_lightning.core.hooks.html)
-- [Configure optimizers](https://pytorch-lightning.rtfd.io/en/latest/pytorch_lightning.core.lightning.html#pytorch_lightning.core.lightning.LightningModule.configure_optimizers)
-- [Validations](https://pytorch-lightning.rtfd.io/en/latest/pytorch_lightning.trainer.evaluation_loop.html)
-- [Model saving & Restoring training session](https://pytorch-lightning.rtfd.io/en/latest/pytorch_lightning.trainer.training_io.html)
+## Lightning automates 40+ parts of DL/ML research
+- GPU training
+- Distributed GPU (cluster) training
+- TPU training
+- EarlyStopping
+- Logging/Visualizing
+- Checkpointing
+- Experiment management
+- [Full list here](https://pytorch-lightning.readthedocs.io/en/latest/#common-use-cases)
 
 
 ## Examples
-- [GAN](https://github.com/PytorchLightning/pytorch-lightning/tree/master/pl_examples/domain_templates/gan.py)
-- [MNIST](https://github.com/PytorchLightning/pytorch-lightning/tree/master/pl_examples/basic_examples)
-- [Other projects using Lightning](https://github.com/PytorchLightning/pytorch-lightning/network/dependents?package_id=UGFja2FnZS0zNzE3NDU4OTM%3D)
-- [Multi-node](https://github.com/PytorchLightning/pytorch-lightning/tree/master/pl_examples/multi_node_examples)
+Check out this awesome list of research papers and implementations done with Lightning.
+
+- [Contextual Emotion Detection (DoubleDistilBert)](https://github.com/PyTorchLightning/emotion_transformer)
+- [Generative Adversarial Network](https://colab.research.google.com/drive/1F_RNcHzTfFuQf-LeKvSlud6x7jXYkG31#scrollTo=TyYOdg8g77P0)
+- [Hyperparameter optimization with Optuna](https://github.com/optuna/optuna/blob/master/examples/pytorch_lightning_simple.py)
+- [Image Inpainting using Partial Convolutions](https://github.com/ryanwongsa/Image-Inpainting)
+- [MNIST on TPU](https://colab.research.google.com/drive/1-_LKx4HwAxl5M6xPJmqAAu444LTDQoa3#scrollTo=BHBz1_AnamN_)
+- [NER (transformers, TPU, huggingface)](https://colab.research.google.com/drive/1dBN-wwYUngLYVt985wGs_OKPlK_ANB9D)
+- [NeuralTexture (CVPR)](https://github.com/PyTorchLightning/neuraltexture)
+- [Recurrent Attentive Neural Process](https://github.com/PyTorchLightning/attentive-neural-processes)
+- [Siamese Nets for One-shot Image Recognition](https://github.com/PyTorchLightning/Siamese-Neural-Networks)
+- [Speech Transformers](https://github.com/PyTorchLightning/speech-transformer-pytorch_lightning)
+- [Transformers transfer learning (Huggingface)](https://colab.research.google.com/drive/1F_RNcHzTfFuQf-LeKvSlud6x7jXYkG31#scrollTo=yr7eaxkF-djf)
+- [Transformers text classification](https://github.com/ricardorei/lightning-text-classification)
+- [VAE library of 18+ VAE flavors](https://github.com/AntixK/PyTorch-VAE)
 
 
 ## Tutorials
-- [Basic Lightning use](https://towardsdatascience.com/supercharge-your-ai-research-with-pytorch-lightning-337948a99eec)
-- [9 key speed features in Pytorch-Lightning](https://towardsdatascience.com/9-tips-for-training-lightning-fast-neural-networks-in-pytorch-8e63a502f565)
-- [SLURM, multi-node training with Lightning](https://towardsdatascience.com/trivial-multi-node-training-with-pytorch-lightning-ff75dfb809bd)
+Check out our [introduction guide](https://pytorch-lightning.readthedocs.io/en/latest/introduction_guide.html) to get started.
+Or jump straight into [our tutorials](https://pytorch-lightning.readthedocs.io/en/latest/#tutorials).
 
 ---
 
@@ -328,26 +280,24 @@ Welcome to the Lightning community!
 
 If you have any questions, feel free to:
 1. [read the docs](https://pytorch-lightning.rtfd.io/en/latest/).
-2. [Search through the issues](https://github.com/PytorchLightning/pytorch-lightning/issues?utf8=%E2%9C%93&q=my++question).
+2. [Search through the issues](https://github.com/PytorchLightning/pytorch-lightning/issues?utf8=%E2%9C%93&q=my++question).
 3. [Ask on stackoverflow](https://stackoverflow.com/questions/ask?guided=false) with the tag pytorch-lightning.
-
-If no one replies to you quickly enough, feel free to post the stackoverflow link to our Gitter chat!
-
-To chat with the rest of us visit our [gitter channel](https://gitter.im/PyTorch-Lightning/community)!
+4. [Join our slack](https://join.slack.com/t/pytorch-lightning/shared_invite/enQtODU5ODIyNTUzODQwLTFkMDg5Mzc1MDBmNjEzMDgxOTVmYTdhYjA1MDdmODUyOTg2OGQ1ZWZkYTQzODhhNzdhZDA3YmNhMDhlMDY4YzQ).
 
 ---
 ## FAQ
 **How do I use Lightning for rapid research?**
-[Here's a walk-through](https://pytorch-lightning.rtfd.io/en/latest/)
+[Here's a walk-through](https://pytorch-lightning.readthedocs.io/en/latest/introduction_guide.html)
 
 **Why was Lightning created?**
 Lightning has 3 goals in mind:
+
 1. Maximal flexibility while abstracting out the common boilerplate across research projects.
 2. Reproducibility. If all projects use the LightningModule template, it will be much much easier to understand what's going on and where to look! It will also mean every implementation follows a standard format.
 3. Democratizing PyTorch power user features. Distributed training? 16-bit? know you need them but don't want to take the time to implement? All good... these come built into Lightning.
 
 **How does Lightning compare with Ignite and fast.ai?**
-[Here's a thorough comparison](https://medium.com/@_willfalcon/pytorch-lightning-vs-pytorch-ignite-vs-fast-ai-61dc7480ad8a).
+[Here's a thorough comparison](https://medium.com/@_willfalcon/pytorch-lightning-vs-pytorch-ignite-vs-fast-ai-61dc7480ad8a).
 
 **Is this another library I have to learn?**
-Nope! We use pure Pytorch everywhere and don't add unecessary abstractions!
+Nope! We use pure PyTorch everywhere and don't add unnecessary abstractions!