parent
14b28190dd
commit
6f3f688c27
|
@ -21,162 +21,6 @@ We'll accomplish the following:
|
|||
|
||||
--------------
|
||||
|
||||
*********************
|
||||
Why PyTorch Lightning
|
||||
*********************
|
||||
|
||||
a. Less boilerplate
|
||||
===================
|
||||
|
||||
Research and production code starts with simple code, but quickly grows in complexity
|
||||
once you add GPU training, 16-bit precision, checkpointing, logging, etc.
|
||||
|
||||
PyTorch Lightning implements these features for you and tests them rigorously to make sure you can
|
||||
instead focus on the research idea.
|
||||
|
||||
Writing less engineering/boilerplate code means:
|
||||
|
||||
- fewer bugs
|
||||
- faster iteration
|
||||
- faster prototyping
|
||||
|
||||
b. More functionality
|
||||
=====================
|
||||
|
||||
In PyTorch Lightning you leverage code written by hundreds of AI researchers,
|
||||
research engs and PhDs from the world's top AI labs,
|
||||
implementing all the latest best practices and SOTA features such as
|
||||
|
||||
- GPU, Multi GPU, TPU training
|
||||
- Multi node training
|
||||
- Auto logging
|
||||
- ...
|
||||
- Gradient accumulation
|
||||
|
||||
c. Less error prone
|
||||
===================
|
||||
|
||||
Why re-invent the wheel?
|
||||
|
||||
Use PyTorch Lightning to enjoy a deep learning structure that is rigorously tested (500+ tests)
|
||||
across CPUs/multi-GPUs/multi-TPUs on every pull-request.
|
||||
|
||||
We promise our collective team of 20+ from the top labs has thought about training more than you :)
|
||||
|
||||
d. Not a new library
|
||||
====================
|
||||
|
||||
PyTorch Lightning is organized PyTorch - no need to learn a new framework.
|
||||
|
||||
Switching your model to Lightning is straightforward - here's a 2-minute video on how to do it.
|
||||
|
||||
.. raw:: html
|
||||
|
||||
<video width="100%" controls autoplay src="https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/pl_docs/pl_quick_start_full.m4v"></video>
|
||||
|
||||
Your projects WILL grow in complexity and you WILL end up engineering more than trying out new ideas...
|
||||
Defer the hardest parts to Lightning!
|
||||
|
||||
----------------
|
||||
|
||||
********************
|
||||
Lightning Philosophy
|
||||
********************
|
||||
Lightning structures your deep learning code in 4 parts:
|
||||
|
||||
- Research code
|
||||
- Engineering code
|
||||
- Non-essential code
|
||||
- Data code
|
||||
|
||||
Research code
|
||||
=============
|
||||
In the MNIST generation example, the research code
|
||||
would be the particular system and how it's trained (e.g., a GAN, VAE, or GPT).
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
l1 = nn.Linear(...)
|
||||
l2 = nn.Linear(...)
|
||||
decoder = Decoder()
|
||||
|
||||
x1 = l1(x)
|
||||
x2 = l2(x1)
|
||||
out = decoder(features, x)
|
||||
|
||||
loss = perceptual_loss(x1, x2, x) + CE(out, x)
|
||||
|
||||
In Lightning, this code is organized into a :ref:`lightning-module`.
|
||||
|
||||
Engineering code
|
||||
================
|
||||
|
||||
The Engineering code is all the code related to training this system. Things such as early stopping, distribution
|
||||
over GPUs, 16-bit precision, etc. This is normally code that is THE SAME across most projects.
|
||||
|
||||
.. code-block:: python

    model.cuda(0)
    x = x.cuda(0)

    distributed = DistributedParallel(model)

    with gpu_zero:
        download_data()

    dist.barrier()
|
||||
|
||||
In Lightning, this code is abstracted out by the :ref:`trainer`.
|
||||
|
||||
Non-essential code
|
||||
==================
|
||||
|
||||
This is code that helps the research but isn't relevant to the research code. Some examples might be:
|
||||
|
||||
1. Inspect gradients
|
||||
2. Log to TensorBoard
|
||||
|
||||
|
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
# log samples
|
||||
z = Q.rsample()
|
||||
generated = decoder(z)
|
||||
self.experiment.log('images', generated)
|
||||
|
||||
In Lightning this code is organized into :ref:`callbacks`.
|
||||
|
||||
Data code
|
||||
=========
|
||||
Lightning uses standard PyTorch DataLoaders or anything that gives a batch of data.
|
||||
This code tends to end up getting messy with transforms, normalization constants and data splitting
|
||||
spread all over files.
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
# data
|
||||
train = MNIST(...)
|
||||
train, val = split(train, val)
|
||||
test = MNIST(...)
|
||||
|
||||
# transforms
|
||||
train_transforms = ...
|
||||
val_transforms = ...
|
||||
test_transforms = ...
|
||||
|
||||
# dataloader ...
|
||||
# download with dist.barrier() for multi-gpu, etc...
|
||||
|
||||
This code gets especially complicated once you start doing multi-GPU training or need info about
|
||||
the data to build your models.
|
||||
|
||||
In Lightning this code is organized inside a :ref:`data-modules`.
|
||||
|
||||
.. note:: DataModules are optional but encouraged; otherwise you can use standard DataLoaders.
|
||||
|
||||
----------------
|
||||
|
||||
**************************
|
||||
From MNIST to AutoEncoders
|
||||
**************************
|
||||
|
@ -245,21 +89,13 @@ Let's first start with the model. In this case we'll design a 3-layer neural net
|
|||
|
||||
# (b, 1, 28, 28) -> (b, 1*28*28)
|
||||
x = x.view(batch_size, -1)
|
||||
|
||||
# layer 1
|
||||
x = self.layer_1(x)
|
||||
x = torch.relu(x)
|
||||
|
||||
# layer 2
|
||||
x = self.layer_2(x)
|
||||
x = torch.relu(x)
|
||||
|
||||
# layer 3
|
||||
x = self.layer_3(x)
|
||||
|
||||
# probability distribution over labels
|
||||
x = torch.log_softmax(x, dim=1)
|
||||
|
||||
return x
|
||||
|
||||
Notice this is a :class:`~pytorch_lightning.core.LightningModule` instead of a `torch.nn.Module`. A LightningModule is
|
||||
|
@ -280,6 +116,18 @@ equivalent to a pure PyTorch Module except it has added functionality. However,
|
|||
torch.Size([1, 10])
|
||||
|
||||
|
||||
Now we add the ``training_step``, which holds all of our training loop logic:
|
||||
|
||||
.. code-block:: python

    class LitMNIST(LightningModule):

        def training_step(self, batch, batch_idx):
            x, y = batch
            logits = self(x)
            loss = F.nll_loss(logits, y)
            return loss
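
The ``training_step`` above feeds the ``log_softmax`` output of ``forward`` into ``F.nll_loss``. As a rough, dependency-free illustration of what that pairing computes for a single example, the sketch below reimplements both pieces in plain Python. The helpers ``log_softmax`` and ``nll_loss`` are stand-ins written for this example (an assumption, not the PyTorch functions themselves), and the logits are made-up values.

```python
import math


def log_softmax(logits):
    """Return log-probabilities for one example (numerically stable)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_sum_exp for v in logits]


def nll_loss(log_probs, target):
    """Negative log-likelihood of the target class for one example."""
    return -log_probs[target]


# Made-up logits for a 3-class problem; class 0 is the true label.
logits = [2.0, 1.0, 0.1]
log_probs = log_softmax(logits)
loss = nll_loss(log_probs, target=0)

# Exponentiating the log-probabilities gives a distribution over labels.
total_prob = sum(math.exp(lp) for lp in log_probs)
```

Note that ``F.nll_loss`` averages over the batch by default, whereas this sketch handles only one example.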
|
||||
|
||||
Data
|
||||
----
|
||||
|
||||
|
@ -1166,5 +1014,158 @@ And pass the callbacks into the trainer
|
|||
|
||||
.. include:: transfer_learning.rst
|
||||
|
||||
----------
|
||||
|
||||
*********************
|
||||
Why PyTorch Lightning
|
||||
*********************
|
||||
|
||||
a. Less boilerplate
|
||||
===================
|
||||
|
||||
Research and production code starts with simple code, but quickly grows in complexity
|
||||
once you add GPU training, 16-bit precision, checkpointing, logging, etc.
|
||||
|
||||
PyTorch Lightning implements these features for you and tests them rigorously to make sure you can
|
||||
instead focus on the research idea.
|
||||
|
||||
Writing less engineering/boilerplate code means:
|
||||
|
||||
- fewer bugs
|
||||
- faster iteration
|
||||
- faster prototyping
|
||||
|
||||
b. More functionality
|
||||
=====================
|
||||
|
||||
In PyTorch Lightning you leverage code written by hundreds of AI researchers,
|
||||
research engs and PhDs from the world's top AI labs,
|
||||
implementing all the latest best practices and SOTA features such as
|
||||
|
||||
- GPU, Multi GPU, TPU training
|
||||
- Multi node training
|
||||
- Auto logging
|
||||
- ...
|
||||
- Gradient accumulation
|
||||
|
||||
c. Less error prone
|
||||
===================
|
||||
|
||||
Why re-invent the wheel?
|
||||
|
||||
Use PyTorch Lightning to enjoy a deep learning structure that is rigorously tested (500+ tests)
|
||||
across CPUs/multi-GPUs/multi-TPUs on every pull-request.
|
||||
|
||||
We promise our collective team of 20+ from the top labs has thought about training more than you :)
|
||||
|
||||
d. Not a new library
|
||||
====================
|
||||
|
||||
PyTorch Lightning is organized PyTorch - no need to learn a new framework.
|
||||
|
||||
Switching your model to Lightning is straightforward - here's a 2-minute video on how to do it.
|
||||
|
||||
.. raw:: html
|
||||
|
||||
<video width="100%" controls autoplay src="https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/pl_docs/pl_quick_start_full.m4v"></video>
|
||||
|
||||
Your projects WILL grow in complexity and you WILL end up engineering more than trying out new ideas...
|
||||
Defer the hardest parts to Lightning!
|
||||
|
||||
----------------
|
||||
|
||||
********************
|
||||
Lightning Philosophy
|
||||
********************
|
||||
Lightning structures your deep learning code in 4 parts:
|
||||
|
||||
- Research code
|
||||
- Engineering code
|
||||
- Non-essential code
|
||||
- Data code
|
||||
|
||||
Research code
|
||||
=============
|
||||
In the MNIST generation example, the research code
|
||||
would be the particular system and how it's trained (e.g., a GAN, VAE, or GPT).
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
l1 = nn.Linear(...)
|
||||
l2 = nn.Linear(...)
|
||||
decoder = Decoder()
|
||||
|
||||
x1 = l1(x)
|
||||
x2 = l2(x1)
|
||||
out = decoder(features, x)
|
||||
|
||||
loss = perceptual_loss(x1, x2, x) + CE(out, x)
|
||||
|
||||
In Lightning, this code is organized into a :ref:`lightning-module`.
|
||||
|
||||
Engineering code
|
||||
================
|
||||
|
||||
The Engineering code is all the code related to training this system. Things such as early stopping, distribution
|
||||
over GPUs, 16-bit precision, etc. This is normally code that is THE SAME across most projects.
|
||||
|
||||
.. code-block:: python

    model.cuda(0)
    x = x.cuda(0)

    distributed = DistributedParallel(model)

    with gpu_zero:
        download_data()

    dist.barrier()
|
||||
|
||||
In Lightning, this code is abstracted out by the :ref:`trainer`.
|
||||
|
||||
Non-essential code
|
||||
==================
|
||||
|
||||
This is code that helps the research but isn't relevant to the research code. Some examples might be:
|
||||
|
||||
1. Inspect gradients
|
||||
2. Log to TensorBoard
|
||||
|
||||
|
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
# log samples
|
||||
z = Q.rsample()
|
||||
generated = decoder(z)
|
||||
self.experiment.log('images', generated)
|
||||
|
||||
In Lightning this code is organized into :ref:`callbacks`.
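
To make the separation concrete, here is a minimal, dependency-free sketch of the callback pattern: the loop calls a hook, and the non-essential logging lives in a separate object. The names ``Callback``, ``LossHistoryCallback``, ``on_epoch_end`` and ``fit`` are hypothetical and chosen for this illustration; they are not Lightning's actual API, and the loss value is a placeholder.

```python
class Callback:
    """Hypothetical base class: hooks default to doing nothing."""

    def on_epoch_end(self, epoch, metrics):
        pass


class LossHistoryCallback(Callback):
    """Non-essential code: records the loss without touching training logic."""

    def __init__(self):
        self.history = []

    def on_epoch_end(self, epoch, metrics):
        self.history.append((epoch, metrics["loss"]))


def fit(num_epochs, callbacks):
    """Toy loop standing in for the engineering code that invokes the hooks."""
    for epoch in range(num_epochs):
        loss = 1.0 / (epoch + 1)  # placeholder for a real training epoch
        for cb in callbacks:
            cb.on_epoch_end(epoch, {"loss": loss})


logger = LossHistoryCallback()
fit(num_epochs=3, callbacks=[logger])
```

Because the loop only knows about the hook, logging can be added or removed without editing the research code.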
|
||||
|
||||
Data code
|
||||
=========
|
||||
Lightning uses standard PyTorch DataLoaders or anything that gives a batch of data.
|
||||
This code tends to end up getting messy with transforms, normalization constants and data splitting
|
||||
spread all over files.
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
# data
|
||||
train = MNIST(...)
|
||||
train, val = split(train, val)
|
||||
test = MNIST(...)
|
||||
|
||||
# transforms
|
||||
train_transforms = ...
|
||||
val_transforms = ...
|
||||
test_transforms = ...
|
||||
|
||||
# dataloader ...
|
||||
# download with dist.barrier() for multi-gpu, etc...
|
||||
|
||||
This code gets especially complicated once you start doing multi-GPU training or need info about
|
||||
the data to build your models.
|
||||
|
||||
In Lightning this code is organized inside a :ref:`data-modules`.
|
||||
|
||||
.. note:: DataModules are optional but encouraged; otherwise you can use standard DataLoaders.
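
As a rough illustration of why bundling data code helps, the sketch below keeps the split, the seed, and the normalization constant in one place. ``ToyDataModule`` and its methods are hypothetical names for this example only, the integer samples stand in for pixel values, and the class is not Lightning's DataModule API.

```python
import random


class ToyDataModule:
    """Hypothetical container that keeps splitting and transforms together."""

    def __init__(self, samples, val_fraction=0.2, seed=0):
        self.samples = list(samples)
        self.val_fraction = val_fraction
        self.seed = seed
        self.train, self.val = [], []

    def transform(self, x):
        # Placeholder normalization constant, kept next to the data it applies to.
        return x / 255.0

    def setup(self):
        # Shuffle indices with a fixed seed so the split is reproducible.
        indices = list(range(len(self.samples)))
        random.Random(self.seed).shuffle(indices)
        n_val = int(len(indices) * self.val_fraction)
        self.val = [self.transform(self.samples[i]) for i in indices[:n_val]]
        self.train = [self.transform(self.samples[i]) for i in indices[n_val:]]


# 52 fake "pixel" values: 0, 5, ..., 255.
dm = ToyDataModule(samples=range(0, 256, 5))
dm.setup()
```

Anything that needs information about the data, such as the number of samples, can then ask the one object instead of searching several files.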
|
||||
|
|