Create 3 steps to Lightning guide to replace quick-start (#3055)
* Update new-project.rst
* Update new-project.rst
* Create 3_steps.rst
* revert
* remove the callbacks vid
* fix blank line
* change ref
* spelling
* spelling
* Update docs/source/new-project.rst

Co-authored-by: Nathan Raw <nxr9266@g.rit.edu>

* spelling
* spelling
* spelling
* spelling
* spelling
* spelling
* spelling

Co-authored-by: William Falcon <waf2107@columbia.edu>
Co-authored-by: Nathan Raw <nxr9266@g.rit.edu>
parent 3453bba898
commit 6abd742145

@@ -0,0 +1,531 @@
.. testsetup:: *

    from pytorch_lightning.core.lightning import LightningModule
    from pytorch_lightning.core.datamodule import LightningDataModule
    from pytorch_lightning.trainer.trainer import Trainer
    import os
    import torch
    from torch.nn import functional as F
    from torch.utils.data import DataLoader
    import pytorch_lightning as pl
    from torch.utils.data import random_split

.. _3-steps:

####################
Lightning in 3 steps
####################

**In this guide we'll show you how to organize your PyTorch code into Lightning in 3 simple steps.**

Organizing your code with PyTorch Lightning makes your code:

* Flexible (it's still all pure PyTorch), with a ton of boilerplate removed
* More readable by decoupling the research code from the engineering
* Easier to reproduce
* Less error prone by automating most of the training loop and tricky engineering
* Scalable to any hardware without changing your model

----------

Here's a 2 minute conversion guide for PyTorch projects:

.. raw:: html

    <video width="100%" controls autoplay src="https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/pl_docs/pl_quick_start_full.m4v"></video>

----------

*********************************
Step 0: Install PyTorch Lightning
*********************************

You can install using `pip <https://pypi.org/project/pytorch-lightning/>`_:

.. code-block:: bash

    pip install pytorch-lightning

Or with `conda <https://anaconda.org/conda-forge/pytorch-lightning>`_ (see how to install conda `here <https://docs.conda.io/projects/conda/en/latest/user-guide/install/>`_):

.. code-block:: bash

    conda install pytorch-lightning -c conda-forge

You can also install inside a conda environment:

.. code-block:: bash

    conda activate my_env
    pip install pytorch-lightning
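
To verify the install, you can print the package version (an optional sanity check):

.. code-block:: bash

    python -c "import pytorch_lightning as pl; print(pl.__version__)"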

----------

******************************
Step 1: Define LightningModule
******************************

.. code-block::

    import os
    import torch
    import torch.nn.functional as F
    from torchvision.datasets import MNIST
    from torchvision import transforms
    from torch.utils.data import DataLoader
    import pytorch_lightning as pl
    from torch.utils.data import random_split


    class LitModel(pl.LightningModule):

        def __init__(self):
            super().__init__()
            self.layer_1 = torch.nn.Linear(28 * 28, 128)
            self.layer_2 = torch.nn.Linear(128, 10)

        def forward(self, x):
            x = x.view(x.size(0), -1)
            x = self.layer_1(x)
            x = F.relu(x)
            x = self.layer_2(x)
            return x

        def configure_optimizers(self):
            optimizer = torch.optim.Adam(self.parameters(), lr=1e-3)
            return optimizer

        def training_step(self, batch, batch_idx):
            x, y = batch
            y_hat = self(x)
            loss = F.cross_entropy(y_hat, y)
            result = pl.TrainResult(loss)
            return result

        def validation_step(self, batch, batch_idx):
            x, y = batch
            y_hat = self(x)
            loss = F.cross_entropy(y_hat, y)
            result = pl.EvalResult(checkpoint_on=loss)
            result.log('val_loss', loss)
            return result

        def test_step(self, batch, batch_idx):
            x, y = batch
            y_hat = self(x)
            loss = F.cross_entropy(y_hat, y)
            result = pl.EvalResult()
            result.log('test_loss', loss)
            return result

The :class:`~pytorch_lightning.core.LightningModule` holds your research code:

- The Train loop
- The Validation loop
- The Test loop
- The Model + system architecture
- The Optimizer

A :class:`~pytorch_lightning.core.LightningModule` is a :class:`torch.nn.Module` but with added functionality.
It organizes your research code into :ref:`hooks`.

In the snippet above we override the basic hooks; the full list of hooks you can customize is under :ref:`hooks`.

You can use your :class:`~pytorch_lightning.core.LightningModule` just like a PyTorch model.

.. code-block:: python

    model = LitModel()
    model.eval()

    y_hat = model(x)

    model.anything_you_can_do_with_pytorch()

More details in the :ref:`lightning-module` docs.

Convert your PyTorch Module to Lightning
========================================

1. Move your computational code
-------------------------------
Move the model architecture and forward pass to your :class:`~pytorch_lightning.core.LightningModule`.

.. code-block::

    class LitModel(pl.LightningModule):

        def __init__(self):
            super().__init__()
            self.layer_1 = torch.nn.Linear(28 * 28, 128)
            self.layer_2 = torch.nn.Linear(128, 10)

        def forward(self, x):
            x = x.view(x.size(0), -1)
            x = self.layer_1(x)
            x = F.relu(x)
            x = self.layer_2(x)
            return x

2. Move the optimizer(s) and schedulers
---------------------------------------
Move your optimizers to the :func:`pytorch_lightning.core.LightningModule.configure_optimizers` hook. Make sure to use the hook parameters (``self`` in this case).

.. code-block::

    class LitModel(pl.LightningModule):

        def configure_optimizers(self):
            optimizer = torch.optim.Adam(self.parameters(), lr=1e-3)
            return optimizer
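
If you also use a learning-rate scheduler, ``configure_optimizers`` can return the optimizers and schedulers as two lists. A minimal sketch (the ``StepLR`` choice here is illustrative):

.. code-block:: python

    def configure_optimizers(self):
        optimizer = torch.optim.Adam(self.parameters(), lr=1e-3)
        # any torch scheduler works; StepLR is just an example
        scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10)
        return [optimizer], [scheduler]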

3. Find the train loop "meat"
-----------------------------
Lightning automates most of the training for you: the epoch and batch iterations. All you need to keep is the training step logic, which goes into the :func:`pytorch_lightning.core.LightningModule.training_step` hook (make sure to use the hook parameters, ``self`` in this case):

.. code-block::

    class LitModel(pl.LightningModule):

        def training_step(self, batch, batch_idx):
            x, y = batch
            y_hat = self(x)
            loss = F.cross_entropy(y_hat, y)
            return loss

4. Find the val loop "meat"
---------------------------
Lightning automates the validation loop for you (gradients are enabled in the train loop and disabled in eval). To add an (optional) validation loop, add logic to the :func:`pytorch_lightning.core.LightningModule.validation_step` hook (make sure to use the hook parameters, ``self`` in this case):

.. testcode::

    class LitModel(LightningModule):

        def validation_step(self, batch, batch_idx):
            x, y = batch
            y_hat = self(x)
            val_loss = F.cross_entropy(y_hat, y)
            return val_loss
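
The Trainer runs this loop for you once you also pass a val dataloader to ``fit`` (shown in Step 2 below):

.. code-block:: python

    # pass in the val dataloader to the trainer as well
    trainer.fit(model, train_loader, val_loader)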

5. Find the test loop "meat"
----------------------------
You might also want an optional test loop. Add the :func:`pytorch_lightning.core.LightningModule.test_step` hook to your :class:`~pytorch_lightning.core.LightningModule`:

.. code-block::

    class LitModel(pl.LightningModule):

        def test_step(self, batch, batch_idx):
            x, y = batch
            y_hat = self(x)
            loss = F.cross_entropy(y_hat, y)
            result = pl.EvalResult()
            result.log('test_loss', loss)
            return result

.. note:: The test loop is not automated in Lightning. You need to specifically call ``trainer.test()`` (this is done so you don't use the test set by mistake).
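
A minimal sketch of calling the test loop (``test_loader`` and ``PATH`` are placeholders for your test DataLoader and checkpoint path):

.. code-block:: python

    # OPTION 1: test right after fitting
    trainer.fit(model, train_loader, val_loader)
    trainer.test(test_dataloaders=test_loader)

    # OPTION 2: test after loading trained weights
    model = LitModel.load_from_checkpoint(PATH)
    trainer = pl.Trainer()
    trainer.test(model, test_dataloaders=test_loader)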

6. Remove any .cuda() or .to(device) calls
------------------------------------------
Your :class:`~pytorch_lightning.core.LightningModule` can automatically run on any hardware!
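
When you create new tensors inside a step, avoid hard-coding a device; let them follow an existing tensor instead. A minimal sketch (the random tensor ``z`` is illustrative):

.. code-block:: python

    def training_step(self, batch, batch_idx):
        x, y = batch

        # before: device is hard-coded
        # z = torch.rand(x.size(0), 1).cuda()

        # after: new tensors adopt the device and dtype of an existing input
        z = torch.rand(x.size(0), 1).type_as(x)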

7. Wrap loss in a TrainResult/EvalResult
----------------------------------------
Instead of returning the loss you can also use :class:`~pytorch_lightning.core.step_result.TrainResult` and :class:`~pytorch_lightning.core.step_result.EvalResult`, plain Dict objects that give you options for logging on every step and/or at the end of the epoch.
They also allow logging to the progress bar (by setting ``prog_bar=True``). Read more in :ref:`result`.

.. code-block::

    class LitModel(pl.LightningModule):

        def training_step(self, batch, batch_idx):
            x, y = batch
            y_hat = self(x)
            loss = F.cross_entropy(y_hat, y)
            result = pl.TrainResult(loss)
            # Add logging to the progress bar (note that refreshing the progress bar
            # too frequently in Jupyter notebooks or Colab may freeze your UI)
            result.log('train_loss', loss, prog_bar=True)
            return result

        def validation_step(self, batch, batch_idx):
            x, y = batch
            y_hat = self(x)
            loss = F.cross_entropy(y_hat, y)
            # Checkpoint model based on validation loss
            result = pl.EvalResult(checkpoint_on=loss)
            result.log('val_loss', loss)
            return result

8. Override default behavior
----------------------------
A :class:`~pytorch_lightning.core.LightningModule` handles advanced cases by allowing you to override any critical part of training
via :ref:`hooks` that are called on your :class:`~pytorch_lightning.core.LightningModule`.

.. code-block::

    class LitModel(pl.LightningModule):

        def backward(self, trainer, loss, optimizer, optimizer_idx):
            loss.backward()

        def optimizer_step(self, epoch, batch_idx,
                           optimizer, optimizer_idx,
                           second_order_closure,
                           on_tpu, using_native_amp, using_lbfgs):
            optimizer.step()

For certain train/val/test loops, you may wish to do more than just logging. In this case,
you can also implement the ``*_epoch_end`` hooks (e.g. ``validation_epoch_end``), which give you the output of every step.

Here's the motivating PyTorch example:

.. code-block:: python

    validation_step_outputs = []
    for batch_idx, batch in enumerate(val_dataloader()):
        out = validation_step(batch, batch_idx)
        validation_step_outputs.append(out)

    validation_epoch_end(validation_step_outputs)

And the Lightning equivalent:

.. code-block::

    class LitModel(pl.LightningModule):

        def validation_step(self, batch, batch_idx):
            loss = ...
            predictions = ...
            result = pl.EvalResult(checkpoint_on=loss)
            result.log('val_loss', loss)
            result.predictions = predictions
            return result

        def validation_epoch_end(self, validation_step_outputs):
            all_val_losses = validation_step_outputs.val_loss
            all_predictions = validation_step_outputs.predictions

----------

**********************************
Step 2: Fit with Lightning Trainer
**********************************

.. code-block::

    # dataloaders
    dataset = MNIST(os.getcwd(), download=True, transform=transforms.ToTensor())
    train, val = random_split(dataset, [55000, 5000])
    train_loader = DataLoader(train)
    val_loader = DataLoader(val)

    # init model
    model = LitModel()

    # most basic trainer, uses good defaults (auto-tensorboard, checkpoints, logs, and more)
    trainer = pl.Trainer()
    trainer.fit(model, train_loader, val_loader)

Init your :class:`~pytorch_lightning.core.LightningModule` and your PyTorch dataloaders, then pass them to the PyTorch Lightning :class:`~pytorch_lightning.trainer.Trainer`.
The :class:`~pytorch_lightning.trainer.Trainer` will automate:

* The epoch iteration
* The batch iteration
* Calling ``optimizer.step()``
* :ref:`weights-loading`
* Logging to Tensorboard (see :ref:`loggers` options)
* :ref:`multi-gpu-training` support
* :ref:`tpu`
* :ref:`16-bit` support

All automated code is rigorously tested and benchmarked.
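
Under the hood, the automated portion of ``fit`` behaves roughly like this pseudocode (illustrative only, not the actual implementation):

.. code-block:: python

    # enable training
    torch.set_grad_enabled(True)
    model.train()

    # get data + optimizer
    train_dataloader = model.train_dataloader()
    optimizer = model.configure_optimizers()

    for epoch in epochs:
        for batch in train_dataloader:
            # forward (training_step)
            loss = model.training_step(batch)

            # backward
            loss.backward()

            # apply and clear grads
            optimizer.step()
            optimizer.zero_grad()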

Check out more flags in the :ref:`trainer` docs.

Using CPUs/GPUs/TPUs
====================
It's trivial to use CPUs, GPUs or TPUs in Lightning. There's NO NEED to change your code; simply change the :class:`~pytorch_lightning.trainer.Trainer` options.

.. code-block:: python

    # train on 1024 CPUs across 128 machines
    trainer = pl.Trainer(
        num_processes=8,
        num_nodes=128
    )

.. code-block:: python

    # train on 1 GPU
    trainer = pl.Trainer(gpus=1)

.. code-block:: python

    # train on 256 GPUs
    trainer = pl.Trainer(
        gpus=8,
        num_nodes=32
    )

.. code-block:: python

    # Multi GPU with mixed precision
    trainer = pl.Trainer(gpus=2, precision=16)

.. code-block:: python

    # Train on TPUs
    trainer = pl.Trainer(tpu_cores=8)

Without changing a SINGLE line of your code, you can now do the following with the code above:

.. code-block:: python

    # train on TPUs using 16 bit precision with early stopping,
    # using only half the training data and checking validation every quarter of a training epoch
    trainer = pl.Trainer(
        tpu_cores=8,
        precision=16,
        early_stop_callback=True,
        limit_train_batches=0.5,
        val_check_interval=0.25
    )

----------

************************
Step 3: Define Your Data
************************
Lightning works with pure PyTorch DataLoaders:

.. code-block:: python

    train_dataloader = DataLoader(...)
    val_dataloader = DataLoader(...)
    trainer.fit(model, train_dataloader, val_dataloader)

Optional: DataModule
====================
DataLoader and data processing code tends to end up scattered around.
Make your data code more reusable by organizing
it into a :class:`~pytorch_lightning.core.datamodule.LightningDataModule`:

.. code-block:: python

    class MNISTDataModule(pl.LightningDataModule):

        def __init__(self, batch_size=32):
            super().__init__()
            self.batch_size = batch_size

        # When doing distributed training, DataModules have two optional methods for
        # granular control over download/prepare/splitting data:

        # OPTIONAL, called only on 1 GPU/machine
        def prepare_data(self):
            MNIST(os.getcwd(), train=True, download=True)
            MNIST(os.getcwd(), train=False, download=True)

        # OPTIONAL, called for every GPU/machine (assigning state is OK)
        def setup(self, stage):
            # transforms
            transform = transforms.Compose([
                transforms.ToTensor(),
                transforms.Normalize((0.1307,), (0.3081,))
            ])
            # split dataset
            if stage == 'fit':
                mnist_train = MNIST(os.getcwd(), train=True, transform=transform)
                self.mnist_train, self.mnist_val = random_split(mnist_train, [55000, 5000])
            if stage == 'test':
                self.mnist_test = MNIST(os.getcwd(), train=False, transform=transform)

        # return the dataloader for each split
        def train_dataloader(self):
            mnist_train = DataLoader(self.mnist_train, batch_size=self.batch_size)
            return mnist_train

        def val_dataloader(self):
            mnist_val = DataLoader(self.mnist_val, batch_size=self.batch_size)
            return mnist_val

        def test_dataloader(self):
            mnist_test = DataLoader(self.mnist_test, batch_size=self.batch_size)
            return mnist_test

:class:`~pytorch_lightning.core.datamodule.LightningDataModule` is designed to enable sharing and reusing data splits
and transforms across different projects. It encapsulates all the steps needed to process data: downloading,
tokenizing, processing, etc.

Now you can simply pass your :class:`~pytorch_lightning.core.datamodule.LightningDataModule` to
the :class:`~pytorch_lightning.trainer.Trainer`:

.. code-block::

    # init model
    model = LitModel()

    # init data
    dm = MNISTDataModule()

    # train
    trainer = pl.Trainer()
    trainer.fit(model, dm)

    # test
    trainer.test(datamodule=dm)

DataModules are especially useful when building models that depend on properties of the data. Read more in :ref:`data-modules`.
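
For example, a model that needs the data's dimensions or vocabulary size can read them off the DataModule after ``setup()``. A minimal sketch (``MyDataModule`` and its ``train_dims``/``vocab_size`` attributes are illustrative):

.. code-block:: python

    # build the datamodule and materialize the data
    dm = MyDataModule()
    dm.prepare_data()
    dm.setup()

    # pass in the properties you want
    model = LitModel(image_width=dm.train_dims[0], vocab_length=dm.vocab_size)
    trainer.fit(model, dm)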

**********
Learn more
**********

That's it! Once you've built your module and data and called ``trainer.fit()``, the Lightning Trainer calls each loop at the correct time as needed.

You can then boot up your logger or TensorBoard instance to view training logs:

.. code-block:: bash

    tensorboard --logdir ./lightning_logs

---------------

Advanced Lightning Features
===========================

Once you define and train your first Lightning model, you might want to try other cool features like:

- :ref:`loggers`
- `Automatic checkpointing <https://pytorch-lightning.readthedocs.io/en/stable/weights_loading.html>`_
- `Automatic early stopping <https://pytorch-lightning.readthedocs.io/en/stable/early_stopping.html>`_
- `Add custom callbacks <https://pytorch-lightning.readthedocs.io/en/stable/callbacks.html>`_ (self-contained programs that can be reused across projects)
- `Dry run mode <https://pytorch-lightning.readthedocs.io/en/stable/debugging.html#fast-dev-run>`_ (hit every line of your code once to see if you have bugs, instead of waiting hours to crash on validation ;)
- `Automatically overfit your model for a sanity test <https://pytorch-lightning.readthedocs.io/en/stable/debugging.html?highlight=overfit#make-model-overfit-on-subset-of-data>`_
- `Automatic truncated-back-propagation-through-time <https://pytorch-lightning.readthedocs.io/en/stable/api/pytorch_lightning.trainer.training_loop.html?highlight=truncated#truncated-backpropagation-through-time>`_
- `Automatically scale your batch size <https://pytorch-lightning.readthedocs.io/en/stable/training_tricks.html?highlight=batch%20size#auto-scaling-of-batch-size>`_
- `Automatically find a good learning rate <https://pytorch-lightning.readthedocs.io/en/stable/lr_finder.html>`_
- `Load checkpoints directly from S3 <https://pytorch-lightning.readthedocs.io/en/stable/weights_loading.html#checkpoint-loading>`_
- `Profile your code for speed/memory bottlenecks <https://pytorch-lightning.readthedocs.io/en/stable/profiler.html>`_
- `Scale to massive compute clusters <https://pytorch-lightning.readthedocs.io/en/stable/slurm.html>`_
- `Use multiple dataloaders per train/val/test loop <https://pytorch-lightning.readthedocs.io/en/stable/multiple_loaders.html>`_
- `Use multiple optimizers to do Reinforcement learning or even GANs <https://pytorch-lightning.readthedocs.io/en/stable/optimizers.html?highlight=multiple%20optimizers#use-multiple-optimizers-like-gans>`_

Or read our :ref:`introduction-guide` to learn more!

-------------

Masterclass
===========

Go pro by tuning in to our Masterclass! New episodes every week.

.. image:: _images/general/PTL101_youtube_thumbnail.jpg
    :width: 500
    :align: center
    :alt: Masterclass
    :target: https://www.youtube.com/playlist?list=PLaMu-SDt_RB5NUm67hU2pdE75j6KaIOv2

@@ -11,7 +11,7 @@ PyTorch Lightning Documentation

   :name: start
   :caption: Start Here

   new-project
   3_steps
   introduction_guide
   performance

@@ -1,686 +0,0 @@

.. testsetup:: *

    from pytorch_lightning.core.lightning import LightningModule
    from pytorch_lightning.core.datamodule import LightningDataModule
    from pytorch_lightning.trainer.trainer import Trainer
    import os
    import torch
    from torch.nn import functional as F
    from torch.utils.data import DataLoader
    import pytorch_lightning as pl
    from torch.utils.data import random_split

.. _quick-start:

Quick Start
===========

PyTorch Lightning is nothing more than organized PyTorch code.

Once you've organized it into a LightningModule, it automates most of the training for you.

Here's a 2 minute conversion guide for PyTorch projects:

.. raw:: html

    <video width="100%" controls autoplay src="https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/pl_docs/pl_quick_start_full.m4v"></video>

----------

Step 1: Build LightningModule
-----------------------------
A LightningModule defines:

- Train loop
- Val loop
- Test loop
- Model + system architecture
- Optimizer

.. code-block::

    import os
    import torch
    import torch.nn.functional as F
    from torchvision.datasets import MNIST
    from torchvision import transforms
    from torch.utils.data import DataLoader
    import pytorch_lightning as pl
    from torch.utils.data import random_split


    class LitModel(pl.LightningModule):

        def __init__(self):
            super().__init__()
            self.l1 = torch.nn.Linear(28 * 28, 10)

        def forward(self, x):
            return torch.relu(self.l1(x.view(x.size(0), -1)))

        def training_step(self, batch, batch_idx):
            x, y = batch
            y_hat = self(x)
            loss = F.cross_entropy(y_hat, y)
            return loss

        def configure_optimizers(self):
            return torch.optim.Adam(self.parameters(), lr=0.0005)

----------

Step 2: Fit with a Trainer
--------------------------
The trainer calls each loop at the correct time as needed. It also ensures everything works
well across any accelerator.

.. raw:: html

    <video width="100%" controls autoplay src="https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/pl_docs/pt_trainer_mov.m4v"></video>

Here's an example of using the Trainer:

.. code-block::

    # dataloader
    dataset = MNIST(os.getcwd(), download=True, transform=transforms.ToTensor())
    train_loader = DataLoader(dataset)

    # init model
    model = LitModel()

    # most basic trainer, uses good defaults (auto-tensorboard, checkpoints, logs, and more)
    trainer = pl.Trainer()
    trainer.fit(model, train_loader)

Using GPUs/TPUs
^^^^^^^^^^^^^^^
It's trivial to use GPUs or TPUs in Lightning. There's NO NEED to change your code; simply change the Trainer options.

.. code-block:: python

    # train on 1, 2, 4, n GPUs
    Trainer(gpus=1)
    Trainer(gpus=2)
    Trainer(gpus=8, num_nodes=n)

    # train on TPUs
    Trainer(tpu_cores=8)
    Trainer(tpu_cores=128)

    # even half precision
    Trainer(gpus=2, precision=16)

The code above gives you the following for free:

- Automatic checkpoints
- Automatic Tensorboard (or the logger of your choice)
- Automatic CPU/GPU/TPU training
- Automatic 16-bit precision

All of it 100% rigorously tested and benchmarked.

--------------

Lightning under the hood
^^^^^^^^^^^^^^^^^^^^^^^^
Lightning is designed for state-of-the-art research by researchers and research engineers from top labs.

A LightningModule handles advanced cases by allowing you to override any critical part of training
via hooks that are called on your LightningModule.

.. raw:: html

    <video width="100%" controls autoplay src="https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/pl_docs/pt_callbacks_mov.m4v"></video>

----------------

Training loop under the hood
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This is the training-loop pseudocode Lightning runs under the hood:

.. code-block:: python

    # init model
    model = LitModel()

    # enable training
    torch.set_grad_enabled(True)
    model.train()

    # get data + optimizer
    train_dataloader = model.train_dataloader()
    optimizer = model.configure_optimizers()

    for epoch in epochs:
        for batch in train_dataloader:
            # forward (TRAINING_STEP)
            loss = model.training_step(batch)

            # backward
            loss.backward()

            # apply and clear grads
            optimizer.step()
            optimizer.zero_grad()

Main take-aways:

- Lightning sets .train() and enables gradients when entering the training loop.
- Lightning iterates over the epochs automatically.
- Lightning iterates the dataloaders automatically.
- ``training_step`` gives you full control of the main loop.
- .backward(), .step(), .zero_grad() are called for you. BUT, you can override this if you need manual control.

----------

Adding a Validation loop
------------------------
To add an (optional) validation loop, add the following function:

.. testcode::

    class LitModel(LightningModule):

        def validation_step(self, batch, batch_idx):
            x, y = batch
            y_hat = self(x)
            loss = F.cross_entropy(y_hat, y)

            result = pl.EvalResult(checkpoint_on=loss)
            result.log('val_loss', loss)
            return result

.. note:: EvalResult is a plain Dict, with convenience functions for logging

And now the trainer will call the validation loop automatically.

.. code-block:: python

    # pass in the val dataloader to the trainer as well
    trainer.fit(
        model,
        train_dataloader,
        val_dataloader
    )

Validation loop under the hood
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Under the hood, in pseudocode, Lightning does the following:

.. code-block:: python

    # ...
    for batch in train_dataloader:
        loss = model.training_step()
        loss.backward()
        # ...

        if validate_at_some_point:
            # disable grads + batchnorm + dropout
            torch.set_grad_enabled(False)
            model.eval()

            val_outs = []
            for val_batch in model.val_dataloader:
                val_out = model.validation_step(val_batch)
                val_outs.append(val_out)
            model.validation_epoch_end(val_outs)

            # enable grads + batchnorm + dropout
            torch.set_grad_enabled(True)
            model.train()

Lightning automatically:

- Enables gradients and sets the model to train() in the train loop
- Disables gradients and sets the model to eval() in the val loop
- After the val loop ends, re-enables gradients and sets the model to train()

-------------

Adding a Test loop
------------------
You might also need an optional test loop:

.. testcode::

    class LitModel(LightningModule):

        def test_step(self, batch, batch_idx):
            x, y = batch
            y_hat = self(x)
            loss = F.cross_entropy(y_hat, y)

            result = pl.EvalResult()
            result.log('test_loss', loss)
            return result

However, this time you need to specifically call test (this is done so you don't use the test set by mistake).

.. code-block:: python

    # OPTION 1:
    # test after fit
    trainer.fit(model)
    trainer.test(test_dataloaders=test_dataloader)

    # OPTION 2:
    # test after loading weights
    model = LitModel.load_from_checkpoint(PATH)
    trainer = Trainer()
    trainer.test(model, test_dataloaders=test_dataloader)

Test loop under the hood
^^^^^^^^^^^^^^^^^^^^^^^^
Under the hood, Lightning does the following (in pseudocode):

.. code-block:: python

    # disable grads + batchnorm + dropout
    torch.set_grad_enabled(False)
    model.eval()

    test_outs = []
    for test_batch in model.test_dataloader:
        test_out = model.test_step(test_batch)
        test_outs.append(test_out)

    model.test_epoch_end(test_outs)

    # enable grads + batchnorm + dropout
    torch.set_grad_enabled(True)
    model.train()

---------------

Data
----
Lightning operates on standard PyTorch DataLoaders (of any flavor). You can use dataloaders in 3 ways.

Data in fit
^^^^^^^^^^^
Pass the dataloaders into `trainer.fit()`.

.. code-block:: python

    trainer.fit(model, train_dataloader, val_dataloader)

Data in LightningModule
^^^^^^^^^^^^^^^^^^^^^^^
For fast research prototyping, it might be easier to link the model with the dataloaders.

.. code-block:: python

    class LitModel(pl.LightningModule):

        def train_dataloader(self):
            # your train transforms
            return DataLoader(YOUR_DATASET)

        def val_dataloader(self):
            # your val transforms
            return DataLoader(YOUR_DATASET)

        def test_dataloader(self):
            # your test transforms
            return DataLoader(YOUR_DATASET)

And fit like so:

.. code-block:: python

    model = LitModel()
    trainer.fit(model)

DataModule
^^^^^^^^^^
A more reusable approach is to define a DataModule, which is simply a collection of all 3 data splits that
also captures:

- download instructions.
- processing.
- splitting.
- etc...

Here's an illustration that explains how to refactor your code into reusable DataModules.

.. raw:: html

    <video width="100%" controls autoplay src="https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/pl_docs/pt_dm_vid.m4v"></video>

And the matching code:

.. testcode:: python

    class MNISTDataModule(LightningDataModule):

        def __init__(self, batch_size=32):
            super().__init__()
            self.batch_size = batch_size

        def prepare_data(self):
            # optional to support downloading only once when using multi-GPU or multi-TPU
            MNIST(os.getcwd(), train=True, download=True)
            MNIST(os.getcwd(), train=False, download=True)

        def setup(self, stage):
            transform = transforms.Compose([
                transforms.ToTensor(),
                transforms.Normalize((0.1307,), (0.3081,))
            ])

            if stage == 'fit':
                mnist_train = MNIST(os.getcwd(), train=True, transform=transform)
                self.mnist_train, self.mnist_val = random_split(mnist_train, [55000, 5000])
            if stage == 'test':
                self.mnist_test = MNIST(os.getcwd(), train=False, transform=transform)

        def train_dataloader(self):
            mnist_train = DataLoader(self.mnist_train, batch_size=self.batch_size)
            return mnist_train

        def val_dataloader(self):
            mnist_val = DataLoader(self.mnist_val, batch_size=self.batch_size)
            return mnist_val

        def test_dataloader(self):
            mnist_test = DataLoader(self.mnist_test, batch_size=self.batch_size)
            return mnist_test

And train like so:

.. code-block:: python

    dm = MNISTDataModule()
    trainer.fit(model, dm)

When doing distributed training, DataModules have two optional methods for granular control
over download/prepare/splitting data:

.. code-block:: python

    class MyDataModule(LightningDataModule):

        def prepare_data(self):
            # called only on 1 GPU
            download()
            tokenize()
            etc()

        def setup(self, stage=None):
            # called on every GPU (assigning state is OK)
            self.train = ...
            self.val = ...

        def train_dataloader(self):
            # do more...
            return self.train

Building models based on Data
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
DataModules are the recommended approach when building models that depend on the data.

First, define the information that you might need.

.. code-block:: python

    class MyDataModule(LightningDataModule):

        def __init__(self):
            super().__init__()
            self.train_dims = None
            self.vocab_size = 0

        def prepare_data(self):
            download_dataset()
            tokenize()
            build_vocab()

        def setup(self, stage=None):
            vocab = load_vocab()
            self.vocab_size = len(vocab)

            self.train, self.val, self.test = load_datasets()
            self.train_dims = self.train.next_batch.size()

        def train_dataloader(self):
            transforms = ...
            return DataLoader(self.train, transforms)

        def val_dataloader(self):
            transforms = ...
            return DataLoader(self.val, transforms)

        def test_dataloader(self):
            transforms = ...
            return DataLoader(self.test, transforms)

Next, materialize the data and build your model:

.. code-block:: python

    # build module
    dm = MyDataModule()
    dm.prepare_data()
    dm.setup()

    # pass in the properties you want
    model = LitModel(image_width=dm.train_dims[0], vocab_length=dm.vocab_size)

    # train
    trainer.fit(model, dm)

-----------------

Logging/progress bar
--------------------

.. image:: /_images/mnist_imgs/mnist_tb.png
    :width: 300
    :align: center
    :alt: Example TB logs

Lightning has built-in logging to any of the supported loggers or the progress bar.

Log in train loop
^^^^^^^^^^^^^^^^^
To log from the training loop, use the `log` method in the `TrainResult`.

.. code-block:: python

    def training_step(self, batch, batch_idx):
        loss = ...
        result = pl.TrainResult(minimize=loss)
        result.log('train_loss', loss)
        return result

The `TrainResult` gives you options for logging on every step and/or at the end of the epoch.
It also allows logging to the progress bar.

.. code-block:: python

    # equivalent
    result.log('train_loss', loss)
    result.log('train_loss', loss, prog_bar=False, logger=True, on_step=True, on_epoch=False)

Then boot up your logger or TensorBoard instance to view training logs:

.. code-block:: bash

    tensorboard --logdir ./lightning_logs

.. warning:: Refreshing the progress bar too frequently in Jupyter notebooks or Colab may freeze your UI.
    We recommend you set `Trainer(progress_bar_refresh_rate=10)`.

Log in Val/Test loop
^^^^^^^^^^^^^^^^^^^^
To log from the validation or test loop, use the `EvalResult`.

.. code-block:: python

    def validation_step(self, batch, batch_idx):
        loss = ...
        result = pl.EvalResult()
        result.log_dict({'val_loss': loss, 'val_acc': acc})
        return result

Log to the progress bar
^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: shell

    Epoch 1:   4%|▎         | 40/1095 [00:03<01:37, 10.84it/s, loss=4.501, v_num=10]

In addition to visual logging, you can log to the progress bar by setting `prog_bar` to True.

.. code-block:: python

    def training_step(self, batch, batch_idx):
        loss = ...
        result = pl.TrainResult(loss)
        result.log('train_loss', loss, prog_bar=True)
        return result

-----------------

Advanced loop aggregation
-------------------------
For certain train/val/test loops, you may wish to do more than just logging. In this case,
you can also implement the ``*_epoch_end`` hooks (e.g. ``validation_epoch_end``), which give you the output of every step.

Here's the motivating PyTorch example:

.. code-block:: python

    validation_step_outputs = []
    for batch_idx, batch in enumerate(val_dataloader()):
        out = validation_step(batch, batch_idx)
        validation_step_outputs.append(out)

    validation_epoch_end(validation_step_outputs)

And the Lightning equivalent:

.. code-block:: python

    def validation_step(self, batch, batch_idx):
        loss = ...
        predictions = ...
        result = pl.EvalResult(checkpoint_on=loss)
        result.log('val_loss', loss)
        result.predictions = predictions
        return result

    def validation_epoch_end(self, validation_step_outputs):
        all_val_losses = validation_step_outputs.val_loss
        all_predictions = validation_step_outputs.predictions

Why do you need Lightning?
--------------------------
The MAIN takeaway points are:

- Lightning is for professional AI researchers/production teams.
- Lightning is organized PyTorch. It is not an abstraction.
- You STILL keep pure PyTorch.
- You DON'T lose any flexibility.
- You can get rid of all of your boilerplate.
- You make your code generalizable to any hardware.
- Your code is now readable and easier to reproduce (i.e. you help with the reproducibility crisis).
- Your LightningModule is still just a pure PyTorch module.

Lightning is for you if
^^^^^^^^^^^^^^^^^^^^^^^

- You're a professional researcher/ML engineer working on non-trivial deep learning.
- You already know PyTorch and are not a beginner.
- You want to iterate through research much faster.
- You want to put models into production much faster.
- You need full control of all the details but don't need the boilerplate.
- You want to leverage code written by hundreds of AI researchers, research engineers and PhDs from the world's top AI labs.
- You need GPUs, multi-node training, half-precision and TPUs.
- You want research code that is rigorously tested (500+ tests) across CPUs/multi-GPUs/multi-TPUs on every pull-request.

Some more cool features
^^^^^^^^^^^^^^^^^^^^^^^
Here are some of the other things you can do with Lightning:

- Automatic checkpointing.
- Automatic early stopping.
- Automatically overfit your model for a sanity test.
- Automatic truncated-back-propagation-through-time.
- Automatically scale your batch size.
- Automatically attempt to find a good learning rate.
- Add arbitrary callbacks.
- Hit every line of your code once to see if you have bugs (instead of waiting hours to crash on validation ;)
- Load checkpoints directly from S3.
- Move from CPUs to GPUs or TPUs without code changes.
- Profile your code for speed/memory bottlenecks.
- Scale to massive compute clusters.
- Use multiple dataloaders per train/val/test loop.
- Use multiple optimizers to do Reinforcement learning or even GANs.

Example:
^^^^^^^^
Without changing a SINGLE line of your code, you can now do the following with the code above:

.. code-block:: python

    # train on TPUs using 16 bit precision with early stopping,
    # using only half the training data and checking validation every quarter of a training epoch
    trainer = Trainer(
        tpu_cores=8,
        precision=16,
        early_stop_callback=True,
        limit_train_batches=0.5,
        val_check_interval=0.25
    )

    # train on 256 GPUs
    trainer = Trainer(
        gpus=8,
        num_nodes=32
    )

    # train on 1024 CPUs across 128 machines
    trainer = Trainer(
        num_processes=8,
        num_nodes=128
    )

And the best part is that your code is STILL just PyTorch... meaning you can do anything you
would normally do.

.. code-block:: python

    model = LitModel()
    model.eval()

    y_hat = model(x)

    model.anything_you_can_do_with_pytorch()

---------------

Masterclass
-----------
You can learn Lightning in-depth by watching our Masterclass.

.. image:: _images/general/PTL101_youtube_thumbnail.jpg
    :width: 500
    :align: center
    :alt: Masterclass
    :target: https://www.youtube.com/playlist?list=PLaMu-SDt_RB5NUm67hU2pdE75j6KaIOv2