
Debugging
=========
The following Trainer flags make debugging much easier.

Fast dev run
------------
This flag runs a "unit test" by running 1 training batch and 1 validation batch.
The point is to detect any bugs in the training/validation loop without having to wait for
a full epoch to crash.

(See: :paramref:`~pytorch_lightning.trainer.trainer.Trainer.fast_dev_run`
argument of :class:`~pytorch_lightning.trainer.trainer.Trainer`)

.. code-block:: python

    trainer = pl.Trainer(fast_dev_run=True)
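
Conceptually, the flag caps each loop at a single batch. The sketch below illustrates that idea
in plain Python. ``run_loop`` and its arguments are hypothetical names used for illustration only;
this is not how Lightning implements the flag internally.

.. code-block:: python

    from itertools import islice


    def run_loop(batches, step_fn, fast_dev_run=False):
        """Run step_fn over batches; stop after 1 batch when fast_dev_run is set."""
        limit = 1 if fast_dev_run else None  # None means "no limit"
        outputs = []
        for batch in islice(batches, limit):
            outputs.append(step_fn(batch))
        return outputs


    train_batches = [[1, 2], [3, 4], [5, 6]]
    val_batches = [[7, 8], [9, 10]]

    # `sum` is a hypothetical stand-in for a training/validation step
    train_out = run_loop(train_batches, sum, fast_dev_run=True)
    val_out = run_loop(val_batches, sum, fast_dev_run=True)

Both loops are exercised once, so an error in either step function surfaces within seconds.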

Inspect gradient norms
----------------------
Logs (to a logger) the norm of the gradients of each weight matrix.

(See: :paramref:`~pytorch_lightning.trainer.trainer.Trainer.track_grad_norm`
argument of :class:`~pytorch_lightning.trainer.trainer.Trainer`)

.. code-block:: python

    # the 2-norm
    trainer = pl.Trainer(track_grad_norm=2)
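
For reference, the p-norm of a gradient vector ``g`` is ``(sum(|g_i| ** p)) ** (1 / p)``.
The sketch below computes it in plain Python for a couple of made-up gradients. The helper
``grad_norm`` and the parameter names are assumptions for illustration, not part of Lightning.

.. code-block:: python

    def grad_norm(grad, p=2):
        """p-norm of a flat list of gradient values."""
        return sum(abs(g) ** p for g in grad) ** (1.0 / p)


    # Hypothetical per-parameter gradients, flattened to plain lists
    grads = {
        "layer1.weight": [3.0, 4.0],
        "layer1.bias": [0.0, 0.0],
    }

    norms = {name: grad_norm(g, p=2) for name, g in grads.items()}

A norm that stays at zero, or one that grows very large, is usually worth investigating,
since it can point to vanishing or exploding gradients.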

Log GPU usage
-------------
Logs (to a logger) the GPU usage for each GPU on the master machine.

(See: :paramref:`~pytorch_lightning.trainer.trainer.Trainer.log_gpu_memory`
argument of :class:`~pytorch_lightning.trainer.trainer.Trainer`)

.. code-block:: python

    trainer = pl.Trainer(log_gpu_memory=True)

Make model overfit on subset of data
------------------------------------

A good debugging technique is to take a tiny portion of your data (say 2 samples per class),
and try to get your model to overfit. If it can't, it's a sign it won't work with large datasets.

(See: :paramref:`~pytorch_lightning.trainer.trainer.Trainer.overfit_pct`
argument of :class:`~pytorch_lightning.trainer.trainer.Trainer`)

.. code-block:: python

    trainer = pl.Trainer(overfit_pct=0.01)
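
If you prefer to build the tiny subset by hand (for example, exactly 2 samples per class),
something like the following plain-Python sketch could be used. ``take_per_class`` and the toy
dataset are hypothetical and only illustrate the idea; they are not Lightning utilities.

.. code-block:: python

    from collections import defaultdict


    def take_per_class(samples, per_class=2):
        """Keep the first `per_class` samples seen for each label."""
        counts = defaultdict(int)
        subset = []
        for features, label in samples:
            if counts[label] < per_class:
                subset.append((features, label))
                counts[label] += 1
        return subset


    # Hypothetical toy dataset of (features, label) pairs with 3 classes
    dataset = [([i], i % 3) for i in range(30)]
    tiny = take_per_class(dataset, per_class=2)

Training on ``tiny`` should drive the training loss close to zero; if it does not,
the model or the training loop likely has a bug.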

Print the parameter count by layer
----------------------------------
Whenever ``.fit()`` is called, the Trainer prints the weights summary for the LightningModule.
To disable this behavior, set the flag to ``None``:

(See: :paramref:`~pytorch_lightning.trainer.trainer.Trainer.weights_summary`
argument of :class:`~pytorch_lightning.trainer.trainer.Trainer`)

.. code-block:: python

    trainer = pl.Trainer(weights_summary=None)
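
To make clear what a per-layer parameter count is, the sketch below computes one from a set of
made-up parameter shapes. The layer names and shapes are assumptions for illustration, and the
printed layout is not the format Lightning uses.

.. code-block:: python

    from math import prod

    # Hypothetical parameter shapes for a small two-layer network
    param_shapes = {
        "layer1.weight": (128, 784),
        "layer1.bias": (128,),
        "layer2.weight": (10, 128),
        "layer2.bias": (10,),
    }

    # Number of parameters in a tensor = product of its dimensions
    counts = {name: prod(shape) for name, shape in param_shapes.items()}
    total = sum(counts.values())

    for name, n in counts.items():
        print(f"{name:<16}{n:>10,}")
    print(f"{'total':<16}{total:>10,}")

A count that is far larger or smaller than expected often reveals a wrongly sized layer.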


Set the number of validation sanity steps
-----------------------------------------
Lightning runs a few steps of validation at the beginning of training.
This catches bugs in the validation loop right away, instead of crashing deep into a lengthy training run.

(See: :paramref:`~pytorch_lightning.trainer.trainer.Trainer.num_sanity_val_steps`
argument of :class:`~pytorch_lightning.trainer.trainer.Trainer`)

.. code-block:: python

    # DEFAULT
    trainer = pl.Trainer(num_sanity_val_steps=5)
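
The benefit can be seen in a plain-Python sketch of the idea. The ``fit`` function below is a
hypothetical stand-in, not Lightning's implementation: it runs the first few validation batches
before any training, so a broken validation step fails before training time is spent.

.. code-block:: python

    from itertools import islice


    def fit(train_batches, val_batches, train_step, val_step, num_sanity_val_steps=5):
        # Sanity check: exercise the validation code before any training happens
        for batch in islice(val_batches, num_sanity_val_steps):
            val_step(batch)
        # Regular training pass, followed by full validation
        for batch in train_batches:
            train_step(batch)
        for batch in val_batches:
            val_step(batch)


    def broken_val_step(batch):
        raise ValueError("bug in validation")


    trained = []  # records every batch that reached the training step
    try:
        fit(range(1000), range(10), trained.append, broken_val_step)
    except ValueError:
        pass

Here ``trained`` stays empty: the error is raised during the sanity check, before the first
training batch is processed.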