:orphan:
.. _debugging_intermediate:
###############################
Debug your model (intermediate)
###############################
**Audience**: Users who want to debug their ML code
----
***************************
Why should I debug ML code?
***************************
Machine learning code requires debugging mathematical correctness, which is not something non-ML code has to deal with. Lightning implements a few best-practice techniques to give all users expert-level ML debugging abilities.
----
**************************************
Overfit your model on a Subset of Data
**************************************
A good debugging technique is to take a tiny portion of your data (say 2 samples per class),
and try to get your model to overfit. If it can't, it's a sign it won't work with large datasets.
(See: :paramref:`~lightning.pytorch.trainer.trainer.Trainer.overfit_batches`
argument of :class:`~lightning.pytorch.trainer.trainer.Trainer`)

.. testcode::

    # use only 1% of training data (and turn off validation)
    trainer = Trainer(overfit_batches=0.01)

    # similar, but with a fixed 10 batches
    trainer = Trainer(overfit_batches=10)

When using this argument, the validation loop will be disabled. Lightning will also replace the sampler
in the training set to turn off shuffling for you.
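
If you want finer-grained control over which samples are kept (for example, exactly 2 samples per class, as suggested above),
you can also build the tiny dataset yourself and pass it to ``Trainer.fit`` directly. The following is only a sketch;
``train_dataset`` and ``LitModel`` are placeholders for your own dataset and LightningModule:

.. code-block:: python

    from collections import defaultdict

    from torch.utils.data import DataLoader, Subset

    # collect the indices of the first 2 samples of each class
    per_class = defaultdict(list)
    for idx, (_, label) in enumerate(train_dataset):
        label = int(label)  # assumes integer (or 0-d tensor) class labels
        if len(per_class[label]) < 2:
            per_class[label].append(idx)
    tiny_indices = [i for indices in per_class.values() for i in indices]

    # train only on this tiny subset; a working model should drive the training loss to ~0
    tiny_loader = DataLoader(Subset(train_dataset, tiny_indices), batch_size=4)
    trainer = Trainer(max_epochs=100, limit_val_batches=0)
    trainer.fit(LitModel(), tiny_loader)
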
----
********************************
Look out for exploding gradients
********************************
One major problem that plagues models is exploding gradients.
Gradient clipping is one technique that can help keep gradients from exploding.
You can keep an eye on the gradient norm by logging it in your LightningModule:

.. code-block:: python

    from lightning.pytorch.utilities import grad_norm


    def on_before_optimizer_step(self, optimizer):
        # Compute the 2-norm for each layer
        # If using mixed precision, the gradients are already unscaled here
        norms = grad_norm(self.layer, norm_type=2)
        self.log_dict(norms)

This will plot the 2-norm of each layer to your experiment manager.
If you notice the norm is going up, there's a good chance your gradients will explode.
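
If you are curious what ``grad_norm`` returns, you can call it on any module after a backward pass, outside of the Trainer.
A minimal, self-contained sketch:

.. code-block:: python

    import torch

    from lightning.pytorch.utilities import grad_norm

    layer = torch.nn.Linear(4, 2)
    loss = layer(torch.randn(8, 4)).sum()
    loss.backward()

    # a dict of per-parameter gradient 2-norms (plus their total),
    # ready to be passed to self.log_dict inside a LightningModule
    print(grad_norm(layer, norm_type=2))
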
One technique to stop exploding gradients is to clip the gradient when the norm is above a certain threshold:

.. testcode::

    # DEFAULT (i.e. don't clip)
    trainer = Trainer(gradient_clip_val=0)

    # clip gradients' global norm to <=0.5 using gradient_clip_algorithm='norm' by default
    trainer = Trainer(gradient_clip_val=0.5)

    # clip gradients' maximum magnitude to <=0.5
    trainer = Trainer(gradient_clip_val=0.5, gradient_clip_algorithm="value")

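Under the hood, these options map roughly onto PyTorch's standard clipping utilities, which you could also call manually after
``backward()``. The snippet below is only an illustrative sketch of that correspondence, not what you would write inside a
LightningModule:

.. code-block:: python

    import torch

    model = torch.nn.Linear(4, 2)
    loss = model(torch.randn(8, 4)).sum()
    loss.backward()

    # gradient_clip_algorithm="norm": rescale gradients so their global 2-norm is at most 0.5
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=0.5)

    # gradient_clip_algorithm="value": clamp every gradient element into [-0.5, 0.5]
    torch.nn.utils.clip_grad_value_(model.parameters(), clip_value=0.5)
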
----
*************************
Detect autograd anomalies
*************************
Lightning helps you detect anomalies in the PyTorch autograd engine via PyTorch's built-in
`Anomaly Detection Context-manager <https://pytorch.org/docs/stable/autograd.html#anomaly-detection>`_.
Enable it via the **detect_anomaly** trainer argument:

.. testcode::

    trainer = Trainer(detect_anomaly=True)
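
To see what the anomaly detector reports, you can trigger it with plain PyTorch. A minimal sketch in which a deliberate
``0 / 0`` produces a NaN and ``backward()`` raises a descriptive error instead of failing silently:

.. code-block:: python

    import torch

    with torch.autograd.detect_anomaly():
        x = torch.zeros(1, requires_grad=True)
        y = x / x  # 0 / 0 produces NaN in the forward pass
        try:
            y.backward()
        except RuntimeError as err:
            # the error message points at the forward operation (the division)
            # that produced the NaN encountered during backward
            print(err)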