:orphan:
.. _debugging_basic:
########################
Debug your model (basic)
########################
**Audience:** Users who want to learn the basics of debugging models.
.. video:: https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/pl_docs/yt/Trainer+flags+7-+debugging_1.mp4
    :poster: https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/pl_docs/trainer_flags/yt_thumbs/thumb_debugging.png
    :width: 400
    :muted:
----
*********************************
How does Lightning help me debug?
*********************************
The Lightning Trainer has *a lot* of arguments devoted to maximizing your debugging productivity.
----
****************
Set a breakpoint
****************
A breakpoint pauses your code execution so you can inspect variables and then step through your code one line at a time.
.. code:: python

    def function_to_debug():
        x = 2

        # set breakpoint
        import pdb

        pdb.set_trace()

        y = x**2
In this example, the code will stop before executing the ``y = x**2`` line.
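
On Python 3.7+ you can achieve the same thing with the built-in ``breakpoint()`` function, which drops into ``pdb`` by default and needs no import:

.. code:: python

    def function_to_debug():
        x = 2

        # drops into pdb (the default debugger) before the next line runs
        breakpoint()

        y = x**2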
----
*************************************
Run all your model code once quickly
*************************************
If you've ever trained a model for days only to crash during validation or testing, then this trainer argument is about to become your best friend.
The :paramref:`~lightning.pytorch.trainer.trainer.Trainer.fast_dev_run` argument in the trainer runs 5 batches of training, validation, test and prediction data through your trainer to see if there are any bugs:
.. code:: python

    trainer = Trainer(fast_dev_run=True)
To change how many batches to use, change the argument to an integer. Here we run 7 batches of each:
.. code:: python

    trainer = Trainer(fast_dev_run=7)
.. note::

    This argument will disable tuner, checkpoint callbacks, early stopping callbacks,
    loggers and logger callbacks like :class:`~lightning.pytorch.callbacks.lr_monitor.LearningRateMonitor` and
    :class:`~lightning.pytorch.callbacks.device_stats_monitor.DeviceStatsMonitor`.
----
************************
Shorten the epoch length
************************
Sometimes it's helpful to only use a fraction of your training, val, test, or predict data (or a set number of batches).
For example, you can use 10% of the training set and 1% of the validation set.
On larger datasets like ImageNet, this can help you debug or test a few things faster than waiting for a full epoch.
.. testcode::

    # use only 10% of training data and 1% of val data
    trainer = Trainer(limit_train_batches=0.1, limit_val_batches=0.01)

    # use 10 batches of train and 5 batches of val
    trainer = Trainer(limit_train_batches=10, limit_val_batches=5)
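
The same pattern applies to the test and predict loops; here is a minimal sketch using the analogous ``limit_test_batches`` and ``limit_predict_batches`` arguments:

.. testcode::

    # use 1% of test data and 10 batches of predict data
    trainer = Trainer(limit_test_batches=0.01, limit_predict_batches=10)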
----
******************
Run a Sanity Check
******************
Lightning runs **2** steps of validation at the beginning of training.
This avoids crashing in the validation loop sometime deep into a lengthy training loop.
(See: :paramref:`~lightning.pytorch.trainer.trainer.Trainer.num_sanity_val_steps`
argument of :class:`~lightning.pytorch.trainer.trainer.Trainer`)
.. testcode::

    trainer = Trainer(num_sanity_val_steps=2)
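
You can tune this in either direction: ``0`` skips the sanity check entirely, while ``-1`` runs the whole validation set before training starts:

.. code:: python

    # skip the sanity check
    trainer = Trainer(num_sanity_val_steps=0)

    # sanity check with all validation batches
    trainer = Trainer(num_sanity_val_steps=-1)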
----
*************************************
Print LightningModule weights summary
*************************************
Whenever the ``.fit()`` function gets called, the Trainer will print the weights summary for the LightningModule.
.. code:: python

    trainer.fit(...)
This generates a table like:
.. code-block:: text

      | Name  | Type        | Params | Mode
    -------------------------------------------
    0 | net   | Sequential  | 132 K  | train
    1 | net.0 | Linear      | 131 K  | train
    2 | net.1 | BatchNorm1d | 1.0 K  | train
To also include the child modules in the summary, add a :class:`~lightning.pytorch.callbacks.model_summary.ModelSummary` callback:
.. testcode::

    from lightning.pytorch.callbacks import ModelSummary

    trainer = Trainer(callbacks=[ModelSummary(max_depth=-1)])
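If the ``rich`` package is installed, you can instead use the :class:`~lightning.pytorch.callbacks.RichModelSummary` callback for a nicer-looking table:

.. code-block:: python

    from lightning.pytorch.callbacks import RichModelSummary

    trainer = Trainer(callbacks=[RichModelSummary(max_depth=-1)])
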
To print the model summary if ``.fit()`` is not called:
.. code-block:: python

    from lightning.pytorch.utilities.model_summary import ModelSummary

    model = LitModel()
    summary = ModelSummary(model, max_depth=-1)
    print(summary)
To turn off the automatic summary, use:
.. code:: python

    trainer = Trainer(enable_model_summary=False)
----
***********************************
Print input output layer dimensions
***********************************
Another debugging tool is to display the intermediate input and output sizes of all your layers by setting the
``example_input_array`` attribute in your LightningModule.
.. code-block:: python

    class LitModel(LightningModule):
        def __init__(self, *args, **kwargs):
            super().__init__()
            self.example_input_array = torch.Tensor(32, 1, 28, 28)
With the input array, the summary table will include the input and output layer dimensions:
.. code-block:: text

      | Name  | Type        | Params | Mode  | In sizes  | Out sizes
    ----------------------------------------------------------------------
    0 | net   | Sequential  | 132 K  | train | [10, 256] | [10, 512]
    1 | net.0 | Linear      | 131 K  | train | [10, 256] | [10, 512]
    2 | net.1 | BatchNorm1d | 1.0 K  | train | [10, 512] | [10, 512]
when you call ``.fit()`` on the Trainer. This can help you find bugs in the composition of your layers.
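
For reference, a minimal module consistent with the table above could look like the following sketch (the layer sizes are illustrative):

.. code-block:: python

    import torch
    from torch import nn
    from lightning.pytorch import LightningModule


    class LitModel(LightningModule):
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(256, 512), nn.BatchNorm1d(512))
            # a dummy batch of 10 samples with 256 features each; Lightning
            # passes it through ``forward`` to record the in/out sizes
            self.example_input_array = torch.Tensor(10, 256)

        def forward(self, x):
            return self.net(x)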