###########
Style guide
###########

A main goal of Lightning is to improve readability and reproducibility. Imagine looking into any GitHub repo,
finding a LightningModule and knowing exactly where to look to find the things you care about.

The goal of this style guide is to encourage Lightning code to be structured similarly.

--------------

***************
LightningModule
***************

These are best practices for structuring your LightningModule.

Systems vs models
=================

.. figure:: https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/pl_docs/model_system.png
   :width: 400

The main principle behind a LightningModule is that a full system should be self-contained.
In Lightning we differentiate between a system and a model.

A model is something like a resnet18, an RNN, etc.

A system defines how a collection of models interact with each other. Examples of this are:

* GANs
* Seq2Seq
* BERT
* etc.

A LightningModule can define both a system and a model.

Here's a LightningModule that defines a model:

.. code-block:: python

    class LitModel(pl.LightningModule):
        def __init__(self, num_layers: int = 3):
            super().__init__()
            self.layer_1 = nn.Linear(...)
            self.layer_2 = nn.Linear(...)
            self.layer_3 = nn.Linear(...)

Here's a LightningModule that defines a system:

.. code-block:: python

    class LitModel(pl.LightningModule):
        def __init__(self, encoder: nn.Module = None, decoder: nn.Module = None):
            super().__init__()
            self.encoder = encoder
            self.decoder = decoder

For fast prototyping it's often useful to define all the computations in a LightningModule. For reusability
and scalability it might be better to pass in the relevant backbones.
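
For example, the system above can be assembled from interchangeable parts. A quick sketch of what passing in
backbones looks like (the resnet18 encoder and the decoder size are illustrative choices, not requirements):

.. code-block:: python

    import torch.nn as nn
    from torchvision.models import resnet18

    # swap backbones without touching the LightningModule internals
    model = LitModel(encoder=resnet18(), decoder=nn.Linear(1000, 10))
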
Self-contained
==============

A LightningModule should be self-contained. To see how self-contained your model is, ask
yourself this question:

"Can someone drop this file into a Trainer without knowing anything about the internals?"

For example, we couple the optimizer with a model because the majority of models require a specific optimizer with
a specific learning rate scheduler to work well.
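
A minimal sketch of what that coupling looks like inside the LightningModule (the Adam/StepLR choice and the
values shown are illustrative, not prescribed):

.. code-block:: python

    def configure_optimizers(self):
        # optimizer and scheduler travel with the model, keeping the file self-contained
        optimizer = torch.optim.Adam(self.parameters(), lr=1e-3)
        scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)
        return [optimizer], [scheduler]
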
Init
====

The first place where LightningModules tend to stop being self-contained is in the init. Try to define all the relevant
sensible defaults in the init so that the user doesn't have to guess.

Here's an example where a user will have to go hunt through files to figure out how to init this LightningModule.

.. code-block:: python

    class LitModel(pl.LightningModule):
        def __init__(self, params):
            self.lr = params.lr
            self.coef_x = params.coef_x

Models defined this way leave you with many questions: what is coef_x? Is it a string? A float? What is the range? etc...

Instead, be explicit in your init:

.. code-block:: python

    class LitModel(pl.LightningModule):
        def __init__(self, encoder: nn.Module, coeff_x: float = 0.2, lr: float = 1e-3):
            ...

Now the user doesn't have to guess. Instead they know the value type, and the model has sensible defaults that the
user can see immediately.
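
As a quick sketch of what this buys the caller (MyEncoder is a hypothetical backbone, shown only for
illustration):

.. code-block:: python

    model = LitModel(encoder=MyEncoder())           # uses coeff_x=0.2, lr=1e-3
    model = LitModel(encoder=MyEncoder(), lr=3e-4)  # override only what you need
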
Method order
============

The only required methods in the LightningModule are:

* init
* training_step
* configure_optimizers

However, if you decide to implement the rest of the optional methods, the recommended order is:

* model/system definition (init)
* if doing inference, define forward
* training hooks
* validation hooks
* test hooks
* configure_optimizers
* any other hooks

In practice, this code looks like:

.. code-block:: python

    class LitModel(pl.LightningModule):

        def __init__(...):

        def forward(...):

        def training_step(...):

        def training_step_end(...):

        def training_epoch_end(...):

        def validation_step(...):

        def validation_step_end(...):

        def validation_epoch_end(...):

        def test_step(...):

        def test_step_end(...):

        def test_epoch_end(...):

        def configure_optimizers(...):

        def any_extra_hook(...):

Forward vs training_step
========================

We recommend using forward for inference/predictions and keeping training_step independent.

.. code-block:: python

    def forward(...):
        embeddings = self.encoder(x)
        return embeddings


    def training_step(...):
        x, y = ...
        z = self.encoder(x)
        pred = self.decoder(z)
        ...

However, when using DataParallel, you will need to call forward manually:

.. code-block:: python

    def training_step(...):
        x, y = ...
        z = self(x)  # <--- instead of self.encoder(x)
        pred = self.decoder(z)
        ...

--------------

****
Data
****

These are best practices for handling data.

Dataloaders
===========

Lightning uses dataloaders to handle all the data flow through the system. Whenever you structure dataloaders,
make sure to tune the number of workers for maximum efficiency.

.. warning:: Make sure not to use ddp_spawn with num_workers > 0 or you will bottleneck your code.
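
For example, a typical dataloader setup (the random TensorDataset is purely for illustration; the right
num_workers value is hardware-dependent, so benchmark on your machine):

.. code-block:: python

    import os

    import torch
    from torch.utils.data import DataLoader, TensorDataset

    dataset = TensorDataset(torch.randn(1000, 32), torch.randint(0, 2, (1000,)))

    # one worker per CPU core is a common starting point
    num_workers = os.cpu_count() or 0
    loader = DataLoader(dataset, batch_size=64, shuffle=True, num_workers=num_workers)
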
DataModules
===========

Lightning introduced datamodules. The problem with dataloaders is that sharing full datasets is often still challenging
because all these questions need to be answered:

* What splits were used?
* How many samples does this dataset have?
* What transforms were used?
* etc...

It's for this reason that we recommend you use datamodules. This is especially important when collaborating because
it will save your team a lot of time as well.

All they need to do is drop a datamodule into a Lightning Trainer and not worry about what was done to the data.

This is true for both academic and corporate settings where data cleaning and ad-hoc instructions slow down the progress
of iterating through ideas.
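
For example, here's a minimal datamodule sketch built around MNIST (the splits, batch size, and transforms are
illustrative):

.. code-block:: python

    import pytorch_lightning as pl
    from torch.utils.data import DataLoader, random_split
    from torchvision import transforms
    from torchvision.datasets import MNIST


    class MNISTDataModule(pl.LightningDataModule):
        def __init__(self, data_dir: str = "./data", batch_size: int = 64):
            super().__init__()
            self.data_dir = data_dir
            self.batch_size = batch_size
            self.transform = transforms.ToTensor()  # transforms are documented in one place

        def prepare_data(self):
            # download once, on a single process
            MNIST(self.data_dir, train=True, download=True)
            MNIST(self.data_dir, train=False, download=True)

        def setup(self, stage=None):
            # explicit splits, so collaborators don't have to guess
            mnist_full = MNIST(self.data_dir, train=True, transform=self.transform)
            self.mnist_train, self.mnist_val = random_split(mnist_full, [55000, 5000])
            self.mnist_test = MNIST(self.data_dir, train=False, transform=self.transform)

        def train_dataloader(self):
            return DataLoader(self.mnist_train, batch_size=self.batch_size, num_workers=4)

        def val_dataloader(self):
            return DataLoader(self.mnist_val, batch_size=self.batch_size, num_workers=4)

        def test_dataloader(self):
            return DataLoader(self.mnist_test, batch_size=self.batch_size, num_workers=4)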