Experiment Reporting
=====================
Lightning supports many different experiment loggers. These loggers let you monitor losses, images, text, etc.
as training progresses. They usually provide a GUI for visualization and can sometimes even snapshot the
hyperparameters used in each experiment.

Control logging frequency
^^^^^^^^^^^^^^^^^^^^^^^^^

Logging every single batch may slow training down. The Trainer has an option to log every k batches instead.

.. code-block:: python

    from pytorch_lightning import Trainer
    # log a row every k = 10 batches
    trainer = Trainer(row_log_interval=10)
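
As a rough illustration of what the interval k means, the sketch below lists the batch indices that
would receive a log row. It is not Lightning's implementation: the helper name ``logged_batches`` and the
choice to log the batches whose index is a multiple of k are assumptions made for this example.

.. code-block:: python

    def logged_batches(num_batches, k):
        """Return the batch indices that get a log row when logging every k batches."""
        return [batch_idx for batch_idx in range(num_batches) if batch_idx % k == 0]


    # With 35 batches and k = 10, only 4 of the 35 batches are logged.
    print(logged_batches(num_batches=35, k=10))  # [0, 10, 20, 30]

A larger k means fewer logger calls per epoch, at the cost of coarser curves.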

Control log writing frequency
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Writing to a logger can be expensive. In Lightning you can set the interval at which
logs are written to disk using this Trainer flag.

.. seealso::
    :class:`~pytorch_lightning.trainer.trainer.Trainer`

.. code-block:: python

    from pytorch_lightning import Trainer
    # write the accumulated logs every k = 100 batches
    k = 100
    trainer = Trainer(log_save_interval=k)
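
The idea behind a write interval can be sketched with a small buffer that collects rows and only writes
them out every ``save_interval`` rows. This is a toy, in-memory stand-in, not how Lightning or any
particular logger is implemented; the class ``BufferedLog`` and all of its attributes are hypothetical.

.. code-block:: python

    class BufferedLog:
        """Hypothetical logger that buffers rows and writes them in batches."""

        def __init__(self, save_interval):
            self.save_interval = save_interval
            self.buffer = []     # rows waiting to be written
            self.written = []    # stand-in for rows already on disk
            self.num_writes = 0  # number of (expensive) write operations

        def log(self, row):
            self.buffer.append(row)
            if len(self.buffer) >= self.save_interval:
                self.flush()

        def flush(self):
            # Write all buffered rows in one go, if there are any.
            if self.buffer:
                self.written.extend(self.buffer)
                self.buffer.clear()
                self.num_writes += 1


    log = BufferedLog(save_interval=100)
    for step in range(250):
        log.log({'step': step, 'loss': 1.0 / (step + 1)})
    log.flush()  # write whatever is left at the end of training

    print(log.num_writes, len(log.written))  # 3 250

Here 250 rows cost only three writes, which is the trade-off the flag controls: fewer writes, but
recent rows are lost if the process dies before the next flush.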

Log metrics
^^^^^^^^^^^

To plot metrics in whichever logger you passed to the Trainer (TensorBoard, Comet, Neptune, TRAINS, etc.),
use either of the following:

1. ``training_epoch_end``, ``validation_epoch_end`` and ``test_epoch_end`` each log everything under the
   ``"log"`` key of the dict they return.

   .. code-block:: python

       # some_loss() is a placeholder for your own loss computation

       def training_epoch_end(self, outputs):
           loss = some_loss()
           ...
           # everything under 'log' is sent to the logger
           logs = {'train_loss': loss}
           results = {'log': logs}
           return results

       def validation_epoch_end(self, outputs):
           loss = some_loss()
           ...
           logs = {'val_loss': loss}
           results = {'log': logs}
           return results

       def test_epoch_end(self, outputs):
           loss = some_loss()
           ...
           logs = {'test_loss': loss}
           results = {'log': logs}
           return results

2. You can also call any logger-specific functionality from within your LightningModule.
   For instance, here we log images using TensorBoard.

   .. code-block:: python

       import torchvision

       def training_step(self, batch, batch_idx):
           self.generated_imgs = self.decoder.generate()

           # arrange the first six generated images in a single grid
           sample_imgs = self.generated_imgs[:6]
           grid = torchvision.utils.make_grid(sample_imgs)
           # the last argument is the global step the image is recorded under
           self.logger.experiment.add_image('generated_images', grid, 0)
           ...
           return results
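
The ``some_loss()`` placeholder in the epoch-end hooks above is typically an aggregate over the per-batch
outputs. A minimal sketch, assuming each validation batch returned a dict with a ``'val_loss'`` entry:
plain floats stand in for tensors, the function is written without ``self`` so it runs on its own, and
the helper ``mean_metric`` is a hypothetical name.

.. code-block:: python

    def mean_metric(outputs, key):
        """Average one metric over the per-batch output dicts of an epoch."""
        values = [float(out[key]) for out in outputs]
        return sum(values) / len(values)


    def validation_epoch_end(outputs):
        # `outputs` is assumed to hold one dict per validation batch.
        val_loss = mean_metric(outputs, 'val_loss')
        # Everything under 'log' is what gets sent to the logger.
        return {'log': {'val_loss': val_loss}}


    outputs = [{'val_loss': 0.5}, {'val_loss': 0.25}, {'val_loss': 0.75}]
    print(validation_epoch_end(outputs))  # {'log': {'val_loss': 0.5}}

With tensors, one would usually average with something like ``torch.stack(...).mean()`` rather than
converting to floats.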

Modify progress bar
^^^^^^^^^^^^^^^^^^^

The dict returned from ``training_step``, ``training_epoch_end``, ``validation_epoch_end`` and
``test_epoch_end`` can also contain a key called ``"progress_bar"``.

Here we show the validation loss in the progress bar:

.. code-block:: python

    def validation_epoch_end(self, outputs):
        loss = some_loss()
        ...
        # everything under 'progress_bar' is displayed in the progress bar
        logs = {'val_loss': loss}
        results = {'progress_bar': logs}
        return results
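
Since ``"progress_bar"`` sits alongside the ``"log"`` key in the same return dict, one hook can feed both.
The sketch below builds such a dict; the helper ``build_epoch_result`` and the ``val_acc`` metric are
hypothetical and only serve to show that the two keys may carry different sets of values.

.. code-block:: python

    def build_epoch_result(val_loss, val_acc):
        """Build a return dict that feeds both the logger and the progress bar."""
        # Send every metric to the logger ...
        logs = {'val_loss': val_loss, 'val_acc': val_acc}
        # ... but keep the progress bar short by showing only the loss.
        progress = {'val_loss': val_loss}
        return {'log': logs, 'progress_bar': progress}


    result = build_epoch_result(val_loss=0.3, val_acc=0.9)
    print(result['progress_bar'])  # {'val_loss': 0.3}

Keeping the progress bar to one or two values avoids a cluttered display while the logger still
receives everything.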

Snapshot hyperparameters
^^^^^^^^^^^^^^^^^^^^^^^^

When training a model, it is useful to know which hyperparameters went into it.
When Lightning creates a checkpoint, it stores the hyperparameters under the key ``"hparams"``.

.. code-block:: python

    import torch
    # map_location keeps all tensors on the CPU while loading
    lightning_checkpoint = torch.load(filepath, map_location=lambda storage, loc: storage)
    hyperparams = lightning_checkpoint['hparams']
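
Once the checkpoint is loaded it behaves like an ordinary dict, so the hyperparameters can be inspected
with plain Python. The sketch below uses a hand-built stand-in for a checkpoint; the helper
``hparams_as_dict`` is hypothetical, and handling both an ``argparse.Namespace`` and a plain dict is a
defensive assumption, since the exact type stored under ``"hparams"`` is not stated here.

.. code-block:: python

    from argparse import Namespace


    def hparams_as_dict(checkpoint):
        """Return the checkpoint's hyperparameters as a plain dict."""
        hparams = checkpoint.get('hparams')
        if hparams is None:
            # Checkpoint was saved without hyperparameters.
            return {}
        if isinstance(hparams, Namespace):
            return dict(vars(hparams))
        return dict(hparams)


    # Stand-in for the dict returned by torch.load(...).
    fake_checkpoint = {'epoch': 3, 'hparams': Namespace(learning_rate=0.02, batch_size=32)}
    print(hparams_as_dict(fake_checkpoint))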

Some loggers also allow logging the hyperparameters used in the experiment. For instance,
when using the ``TestTubeLogger`` or the ``TensorBoardLogger``, all hyperparameters are shown
in the `hparams tab <https://pytorch.org/docs/stable/tensorboard.html#torch.utils.tensorboard.writer.SummaryWriter.add_hparams>`_.

Snapshot code
^^^^^^^^^^^^^

Some loggers also allow you to snapshot a copy of the code used in an experiment.
For example, ``TestTubeLogger`` does this with a flag:

.. code-block:: python

    from pytorch_lightning.loggers import TestTubeLogger

    # save_dir is required; '.' is a placeholder for your log directory
    logger = TestTubeLogger(save_dir='.', create_git_tag=True)