.. testsetup:: *

    from pytorch_lightning.trainer.trainer import Trainer

Experiment Reporting
=====================

Lightning supports many different experiment loggers. These loggers allow you to monitor losses, images, text, etc.
as training progresses. They usually provide a GUI for visualizing results and can sometimes even snapshot the
hyperparameters used in each experiment.

Control logging frequency
^^^^^^^^^^^^^^^^^^^^^^^^^

Logging every single batch may slow training down. The Trainer has an option to log every k batches instead.

.. testcode::

    k = 10
    trainer = Trainer(row_log_interval=k)

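As a rough illustration of what logging every k batches means, the sketch below gates a logging call on the batch index.
The helper ``should_log`` is hypothetical and not part of Lightning, and the exact offset Lightning uses internally may differ.

.. code-block:: python

    def should_log(batch_idx, k):
        """Return True on every k-th batch (hypothetical helper, not a Lightning API)."""
        return (batch_idx + 1) % k == 0


    k = 10
    # 0-based batch indices at which a metrics row would be recorded over 35 batches.
    logged_batches = [i for i in range(35) if should_log(i, k)]
    print(logged_batches)

With ``k = 10`` only three of the 35 batches trigger a log call, which is where the savings come from.
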
Control log writing frequency
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Writing to a logger can be expensive. In Lightning you can set the interval at which
logs are written out using this trainer flag.

.. seealso::
    :class:`~pytorch_lightning.trainer.trainer.Trainer`

.. testcode::

    k = 100
    trainer = Trainer(log_save_interval=k)

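The difference between recording a row and writing it out can be pictured with a small buffer that only flushes
every k rows. ``BufferedLog`` and its attributes are made up for this sketch; real loggers manage their own buffering.

.. code-block:: python

    class BufferedLog:
        """Toy logger that keeps rows in memory and writes them out every `save_interval` rows.

        Hypothetical class for illustration only; not part of Lightning.
        """

        def __init__(self, save_interval):
            self.save_interval = save_interval
            self.buffer = []     # rows recorded but not yet written
            self.written = []    # stands in for the file on disk
            self.num_writes = 0  # how many (expensive) writes happened

        def log(self, row):
            """Record a row and flush once the buffer reaches the interval."""
            self.buffer.append(row)
            if len(self.buffer) >= self.save_interval:
                self.save()

        def save(self):
            """Move all buffered rows to the 'disk' in a single write."""
            if not self.buffer:
                return
            self.written.extend(self.buffer)
            self.buffer.clear()
            self.num_writes += 1


    log = BufferedLog(save_interval=100)
    for step in range(250):
        log.log({'step': step, 'loss': 1.0 / (step + 1)})

    print(log.num_writes, len(log.written), len(log.buffer))

A larger interval means fewer writes, at the cost of more rows sitting in memory if the run is interrupted.
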
Log metrics
^^^^^^^^^^^

To plot metrics into whatever logger you passed in (tensorboard, comet, neptune, TRAINS, etc.), you have two options:

1. training_epoch_end, validation_epoch_end and test_epoch_end will all log anything in the "log" key of the return dict.

.. testcode::

    def training_epoch_end(self, outputs):
        loss = some_loss()
        ...

        # everything under the 'log' key is sent to the logger
        logs = {'train_loss': loss}
        results = {'log': logs}
        return results

    def validation_epoch_end(self, outputs):
        loss = some_loss()
        ...

        logs = {'val_loss': loss}
        results = {'log': logs}
        return results

    def test_epoch_end(self, outputs):
        loss = some_loss()
        ...

        logs = {'test_loss': loss}
        results = {'log': logs}
        return results

2. You can also call any logger-specific functionality from within your LightningModule.
   For instance, here we log images using tensorboard.

.. testcode::
    :skipif: not TORCHVISION_AVAILABLE

    import torchvision

    def training_step(self, batch, batch_idx):
        self.generated_imgs = self.decoder.generate()

        # arrange the first six generated images into a single grid image
        sample_imgs = self.generated_imgs[:6]
        grid = torchvision.utils.make_grid(sample_imgs)
        # self.logger.experiment is the underlying tensorboard writer
        self.logger.experiment.add_image('generated_images', grid, 0)

        ...
        return results

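The ``outputs`` argument of the epoch-end hooks collects what each step returned. Below is a minimal sketch of reducing it
to a single logged value, written as a plain function with float losses so that it runs without Lightning. It assumes every
step returned a dict with a ``val_loss`` key, which is a choice made for this example.

.. code-block:: python

    def validation_epoch_end(outputs):
        """Average per-step losses and expose the result under the "log" key.

        `outputs` is assumed to be a list of dicts, one per validation step,
        each holding a float under 'val_loss' (an assumption of this sketch).
        """
        avg_loss = sum(out['val_loss'] for out in outputs) / len(outputs)
        logs = {'val_loss': avg_loss}
        return {'log': logs}


    step_outputs = [{'val_loss': 0.5}, {'val_loss': 0.25}, {'val_loss': 0.75}]
    results = validation_epoch_end(step_outputs)
    print(results)

Inside a LightningModule the hook also takes ``self``, and tensor losses would typically be combined with
``torch.stack(...).mean()`` rather than ``sum``.
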
Modify progress bar
^^^^^^^^^^^^^^^^^^^

The dicts returned from training_step, training_epoch_end, validation_epoch_end and test_epoch_end
can also contain a key called "progress_bar".

Here we show the validation loss in the progress bar:

.. testcode::

    def validation_epoch_end(self, outputs):
        loss = some_loss()
        ...

        logs = {'val_loss': loss}
        results = {'progress_bar': logs}
        return results

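The "log" and "progress_bar" keys are independent entries of the same return dict, so a hook can fill both.
A minimal sketch as a plain function; the helper name ``build_results`` and the ``val_acc`` metric are invented for the example.

.. code-block:: python

    def build_results(val_loss, val_acc):
        """Build a return dict for an epoch-end hook (hypothetical helper)."""
        # Full set of metrics for the logger.
        logs = {'val_loss': val_loss, 'val_acc': val_acc}
        # Only the loss is shown in the progress bar to keep it compact.
        progress_bar = {'val_loss': val_loss}
        return {'log': logs, 'progress_bar': progress_bar}


    results = build_results(val_loss=0.42, val_acc=0.9)
    print(sorted(results))
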
Snapshot hyperparameters
^^^^^^^^^^^^^^^^^^^^^^^^

When training a model, it's useful to know what hyperparams went into that model.
When Lightning creates a checkpoint, it stores a key "hparams" with the hyperparams.

.. code-block:: python

    lightning_checkpoint = torch.load(filepath, map_location=lambda storage, loc: storage)
    hyperparams = lightning_checkpoint['hparams']

Some loggers also allow logging the hyperparams used in the experiment. For instance,
when using the TestTubeLogger or the TensorBoardLogger, all hyperparams will show
in the `hparams tab <https://pytorch.org/docs/stable/tensorboard.html#torch.utils.tensorboard.writer.SummaryWriter.add_hparams>`_.

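Once loaded, the stored hyperparameters can be handled like any other mapping. The sketch below uses a hand-written dict
in place of a real checkpoint and assumes the "hparams" entry is a plain dict; the keys ``learning_rate`` and ``batch_size``
are hypothetical.

.. code-block:: python

    from argparse import Namespace

    # Stand-in for the result of torch.load(filepath, ...); not a real checkpoint.
    lightning_checkpoint = {
        'epoch': 3,
        'hparams': {'learning_rate': 0.001, 'batch_size': 32},
    }

    hyperparams = lightning_checkpoint['hparams']
    # Wrap the dict in a Namespace for attribute-style access.
    hparams = Namespace(**hyperparams)
    print(hparams.learning_rate, hparams.batch_size)
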
Snapshot code
^^^^^^^^^^^^^

Some loggers also allow you to snapshot a copy of the code used in an experiment.
For example, TestTubeLogger does this with a flag:

.. testcode::

    from pytorch_lightning.loggers import TestTubeLogger
    logger = TestTubeLogger('.', create_git_tag=True)