From 6f3f688c272ca00279a7a58a12d361ddf90e014c Mon Sep 17 00:00:00 2001
From: William Falcon
Date: Sun, 30 Aug 2020 10:01:09 -0400
Subject: [PATCH] updated docs (#3268)

* updated docs
* updated docs

---
 docs/source/introduction_guide.rst | 345 +++++++++++++++--------------
 1 file changed, 173 insertions(+), 172 deletions(-)

diff --git a/docs/source/introduction_guide.rst b/docs/source/introduction_guide.rst
index d671e030b6..7e4416e0cd 100644
--- a/docs/source/introduction_guide.rst
+++ b/docs/source/introduction_guide.rst
**************************
From MNIST to AutoEncoders
**************************

The research
============

The Model
---------

The :class:`~pytorch_lightning.core.LightningModule` holds all the core research ingredients:

- The model
- The optimizers

Let's first start with the model. In this case, we'll design a 3-layer neural network.

.. code-block:: python

    def forward(self, x):
        batch_size, channels, width, height = x.size()

        # (b, 1, 28, 28) -> (b, 1*28*28)
        x = x.view(batch_size, -1)
        x = self.layer_1(x)
        x = torch.relu(x)
        x = self.layer_2(x)
        x = torch.relu(x)
        x = self.layer_3(x)

        # probability distribution over labels
        x = torch.log_softmax(x, dim=1)
        return x

Notice this is a :class:`~pytorch_lightning.core.LightningModule` instead of a `torch.nn.Module`.
A LightningModule is equivalent to a pure PyTorch module except it has added functionality.
However, you can use it exactly the same way you would use a PyTorch module.

Now we add the ``training_step``, which contains all of our training loop logic:

.. code-block:: python

    class LitMNIST(LightningModule):

        def training_step(self, batch, batch_idx):
            x, y = batch
            logits = self(x)
            loss = F.nll_loss(logits, y)
            return loss

Data
----
Lightning operates on pure dataloaders.

You can use DataLoaders in 3 ways:

1. Pass DataLoaders to .fit()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Pass the dataloaders into the ``.fit()`` function.

.. code-block:: python

    model = LitMNIST()
    trainer = Trainer()
    trainer.fit(model, mnist_train)

2. LightningModule DataLoaders
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

For fast research prototyping, it might be easier to link the model with the dataloaders.

.. code-block:: python

    class LitMNIST(pl.LightningModule):

        def train_dataloader(self):
            # prepare transforms standard to MNIST
            transform = transforms.Compose([transforms.ToTensor(),
                                            transforms.Normalize((0.1307,), (0.3081,))])
            # data
            mnist_train = MNIST(os.getcwd(), train=True, download=True, transform=transform)
            return DataLoader(mnist_train, batch_size=64)

        def val_dataloader(self):
            transforms = ...
            return DataLoader(self.val, transforms)

        def test_dataloader(self):
            transforms = ...
            return DataLoader(self.test, transforms)

The DataLoaders are already defined in the model, so there is no need to pass them to ``.fit()``.
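As a rough sketch (not from the original guide), training the model above could then look like this:

.. code-block:: python

    model = LitMNIST()
    trainer = Trainer()
    # no dataloaders are passed here: the Trainer calls the model's
    # train_dataloader() / val_dataloader() / test_dataloader() hooks to get them
    trainer.fit(model)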
----------------

If you have multiple optimizers, return one optimizer per model with the matching parameters from ``configure_optimizers``:

.. code-block:: python

    def configure_optimizers(self):
        return Adam(self.generator.parameters(), lr=1e-3), Adam(self.discriminator.parameters(), lr=1e-3)

----------------

.. include:: transfer_learning.rst

----------

*********************
Why PyTorch Lightning
*********************

a. Less boilerplate
===================

Research and production code starts with simple code, but quickly grows in complexity
once you add GPU training, 16-bit precision, checkpointing, logging, etc...

PyTorch Lightning implements these features for you and tests them rigorously, so you can
focus on the research idea instead.

Writing less engineering/boilerplate code means:

- fewer bugs
- faster iteration
- faster prototyping

b. More functionality
=====================

In PyTorch Lightning you leverage code written by hundreds of AI researchers,
research engineers and PhDs from the world's top AI labs,
implementing all the latest best practices and SOTA features such as:

- GPU, multi-GPU and TPU training
- Multi-node training
- Auto logging
- ...
- Gradient accumulation

c. Less error-prone
===================

Why re-invent the wheel?

Use PyTorch Lightning to enjoy a deep learning structure that is rigorously tested (500+ tests)
across CPUs/multi-GPUs/multi-TPUs on every pull request.

We promise our collective team of 20+ from the top labs has thought about training more than you :)

d. Not a new library
====================

PyTorch Lightning is organized PyTorch - no need to learn a new framework.

Switching your model to Lightning is straightforward - here's a 2-minute video on how to do it.

Your projects WILL grow in complexity and you WILL end up engineering more than trying out new ideas...
Defer the hardest parts to Lightning!

----------------

********************
Lightning Philosophy
********************
Lightning structures your deep learning code in 4 parts:

- Research code
- Engineering code
- Non-essential code
- Data code

Research code
=============
In the MNIST generation example, the research code
would be the particular system and how it's trained (i.e. a GAN, VAE or GPT).

.. code-block:: python

    l1 = nn.Linear(...)
    l2 = nn.Linear(...)
    decoder = Decoder()

    x1 = l1(x)
    x2 = l2(x1)
    out = decoder(features, x)

    loss = perceptual_loss(x1, x2, x) + CE(out, x)

In Lightning, this code is organized into a :ref:`lightning-module`.

Engineering code
================

The engineering code is all the code related to training this system: things such as early stopping, distribution
over GPUs, 16-bit precision, etc. This is normally code that is THE SAME across most projects.

.. code-block:: python

    model.cuda(0)
    x = x.cuda(0)

    distributed = DistributedDataParallel(model)

    with gpu_zero:
        download_data()

    dist.barrier()

In Lightning, this code is abstracted out by the :ref:`trainer`.

Non-essential code
==================

This is code that helps the research but isn't relevant to the research code. Some examples might be:

1. Inspecting gradients
2. Logging to TensorBoard

|

.. code-block:: python

    # log samples
    z = Q.rsample()
    generated = decoder(z)
    self.experiment.log('images', generated)

In Lightning, this code is organized into :ref:`callbacks`.
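For illustration, here is a minimal sketch of such a callback. It assumes the LightningModule exposes a ``decoder`` and a ``latent_dim`` attribute and uses the default TensorBoard logger; none of these names come from this guide.

.. code-block:: python

    import torch
    from pytorch_lightning.callbacks import Callback
    from torchvision.utils import make_grid


    class LogSamplesCallback(Callback):
        # non-essential research code lives in a callback, not in the LightningModule

        def on_epoch_end(self, trainer, pl_module):
            # `latent_dim` and `decoder` are hypothetical attributes of the model
            z = torch.randn(8, pl_module.latent_dim, device=pl_module.device)
            generated = pl_module.decoder(z)
            # reshape flat outputs to image form (assumes 28x28 MNIST samples)
            generated = generated.view(-1, 1, 28, 28)
            # with the default TensorBoardLogger, `experiment` is a SummaryWriter
            trainer.logger.experiment.add_image('generated_images', make_grid(generated))

The callback is then passed to the Trainer, e.g. ``Trainer(callbacks=[LogSamplesCallback()])``, keeping the LightningModule free of logging code.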
Data code
=========
Lightning uses standard PyTorch DataLoaders, or anything that gives a batch of data.
This code tends to end up getting messy, with transforms, normalization constants and data splitting
spread all over files.

.. code-block:: python

    # data
    train = MNIST(...)
    train, val = split(train, val)
    test = MNIST(...)

    # transforms
    train_transforms = ...
    val_transforms = ...
    test_transforms = ...

    # dataloader ...
    # download with dist.barrier() for multi-gpu, etc...

This code gets especially complicated once you start doing multi-GPU training or needing info about
the data to build your models.

In Lightning, this code is organized inside a :ref:`data-modules`.

.. note:: DataModules are optional but encouraged; otherwise, you can use standard DataLoaders.
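For illustration, a minimal sketch of such a DataModule might look like the following; the normalization constants and the 55,000/5,000 train/val split are assumptions for this example, not values from the guide.

.. code-block:: python

    import os

    import pytorch_lightning as pl
    from torch.utils.data import DataLoader, random_split
    from torchvision import transforms
    from torchvision.datasets import MNIST


    class MNISTDataModule(pl.LightningDataModule):

        def prepare_data(self):
            # download only (called once, on a single process)
            MNIST(os.getcwd(), train=True, download=True)
            MNIST(os.getcwd(), train=False, download=True)

        def setup(self, stage=None):
            transform = transforms.Compose([transforms.ToTensor(),
                                            transforms.Normalize((0.1307,), (0.3081,))])
            mnist_full = MNIST(os.getcwd(), train=True, transform=transform)
            self.mnist_train, self.mnist_val = random_split(mnist_full, [55000, 5000])
            self.mnist_test = MNIST(os.getcwd(), train=False, transform=transform)

        def train_dataloader(self):
            return DataLoader(self.mnist_train, batch_size=64)

        def val_dataloader(self):
            return DataLoader(self.mnist_val, batch_size=64)

        def test_dataloader(self):
            return DataLoader(self.mnist_test, batch_size=64)

An instance of this class can then be passed to ``trainer.fit`` along with the model, keeping all dataset handling in one reusable place.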