**Audience:** Users who want to do advanced speed optimizations by customizing the logging behavior.
----
****************************
Change progress bar defaults
****************************
To change the default values (e.g., the version number) shown in the progress bar, override the :meth:`~pytorch_lightning.callbacks.progress.base.ProgressBarBase.get_metrics` method in your progress bar callback.

.. code-block:: python

    from pytorch_lightning.callbacks import TQDMProgressBar


    class CustomProgressBar(TQDMProgressBar):
        def get_metrics(self, *args, **kwargs):
            # don't show the version number
            items = super().get_metrics(*args, **kwargs)
            items.pop("v_num", None)
            return items

----
************************************
Customize tracking to speed up model
************************************
Modify logging frequency
========================
Logging a metric on every single batch can slow down training. By default, Lightning logs every 50 training steps.
To change this behaviour, set the *log_every_n_steps* :class:`~pytorch_lightning.trainer.trainer.Trainer` flag.

.. testcode::

    k = 10
    trainer = Trainer(log_every_n_steps=k)

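
To see why a larger interval reduces logging overhead, the gating can be modeled in plain Python. This is a simplified sketch, not Lightning's implementation: the helper name ``should_log`` and the ``(step + 1) % n == 0`` rule are assumptions made for illustration.

.. code-block:: python

    def should_log(step: int, log_every_n_steps: int) -> bool:
        """Return True when a zero-based training step falls on the logging interval."""
        return (step + 1) % log_every_n_steps == 0


    total_steps = 1000
    for n in (1, 10, 50):
        # count how many of the steps would actually reach the logger
        writes = sum(should_log(step, n) for step in range(total_steps))
        print(f"log_every_n_steps={n}: {writes} logger writes in {total_steps} steps")

In this model, raising the interval from 1 to 50 cuts the number of logger writes by a factor of 50, at the cost of a coarser metric curve.
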
----
Modify flushing frequency
=========================
Metrics are kept in memory for N steps to improve training efficiency. Every N steps, metrics flush to disk. To change the frequency of this flushing, use the *flush_logs_every_n_steps* Trainer argument.

.. code-block:: python

    # faster training, high memory
    Trainer(flush_logs_every_n_steps=500)

    # slower training, low memory
    Trainer(flush_logs_every_n_steps=1)

The higher *flush_logs_every_n_steps* is, the faster the model will train but the memory will build up until the next flush.
The smaller *flush_logs_every_n_steps* is, the slower the model will train but memory will be kept to a minimum.
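
The trade-off between flush frequency and memory can be sketched with a small in-memory buffer. The class ``BufferedMetricWriter`` below is hypothetical and only models the idea; it is not the writer Lightning's loggers use.

.. code-block:: python

    class BufferedMetricWriter:
        """Hypothetical writer that holds rows in memory and flushes every n rows."""

        def __init__(self, flush_every_n_steps: int):
            self.flush_every_n_steps = flush_every_n_steps
            self.buffer = []  # rows currently held in memory
            self.flushed = []  # stands in for the file on disk
            self.flush_count = 0
            self.peak_buffer = 0

        def log(self, row: dict) -> None:
            self.buffer.append(row)
            self.peak_buffer = max(self.peak_buffer, len(self.buffer))
            if len(self.buffer) >= self.flush_every_n_steps:
                self.flush()

        def flush(self) -> None:
            # move everything buffered so far to "disk" and free the memory
            self.flushed.extend(self.buffer)
            self.buffer.clear()
            self.flush_count += 1


    for n in (1, 500):
        writer = BufferedMetricWriter(flush_every_n_steps=n)
        for step in range(1000):
            writer.log({"step": step, "loss": 1.0 / (step + 1)})
        print(f"n={n}: {writer.flush_count} flushes, peak buffer of {writer.peak_buffer} rows")

A larger interval means fewer flush operations but more rows held in memory between flushes.
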
----
******************
Customize self.log
******************
The LightningModule *self.log* method offers many configurations to customize its behavior.
----
add_dataloader_idx
==================
**Default:** True
If True, appends the index of the current dataloader to the name (when using multiple dataloaders). If False, you need to give each dataloader's metrics a unique name so that their values do not get mixed.

.. code-block:: python

    self.log(add_dataloader_idx=True)

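
The effect on metric names can be illustrated with a small helper. The function ``metric_key`` is hypothetical, and the ``/dataloader_idx_{i}`` suffix format is an assumption about how the index is appended; check the keys your logger actually receives.

.. code-block:: python

    from typing import Optional


    def metric_key(name: str, dataloader_idx: Optional[int], add_dataloader_idx: bool = True) -> str:
        """Build the key a metric would be stored under (hypothetical helper)."""
        if add_dataloader_idx and dataloader_idx is not None:
            return f"{name}/dataloader_idx_{dataloader_idx}"
        return name


    # With the suffix, the same name stays distinct across two dataloaders.
    keys_with_suffix = {metric_key("val_acc", i) for i in range(2)}

    # Without it, both dataloaders map to one key unless the names already differ.
    keys_without_suffix = {metric_key("val_acc", i, add_dataloader_idx=False) for i in range(2)}

    print(sorted(keys_with_suffix))
    print(sorted(keys_without_suffix))
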
----
batch_size
==========
**Default:** None
Current batch size used for accumulating logs logged with ``on_epoch=True``. This will be directly inferred from the loaded batch, but for some data structures you might need to explicitly provide it.

.. code-block:: python

    self.log(batch_size=32)

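
The batch size matters because an epoch-level mean has to weight each step by the number of samples it covered. The following sketch shows the arithmetic with made-up numbers; ``epoch_mean`` is a hypothetical helper, not a Lightning function.

.. code-block:: python

    def epoch_mean(step_values, batch_sizes):
        """Batch-size-weighted mean of per-step values."""
        total = sum(value * size for value, size in zip(step_values, batch_sizes))
        return total / sum(batch_sizes)


    step_losses = [0.5, 0.5, 2.0]
    batch_sizes = [32, 32, 8]  # the last batch of the epoch is smaller

    weighted = epoch_mean(step_losses, batch_sizes)
    unweighted = sum(step_losses) / len(step_losses)

    print(f"weighted by batch size: {weighted:.4f}")
    print(f"plain mean over steps:  {unweighted:.4f}")

If the batch size is inferred incorrectly, the small final batch is over- or under-weighted and the epoch value drifts away from the true per-sample mean.
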
----
enable_graph
============
**Default:** False

If True, the computation graph is not automatically detached from the logged value.

.. code-block:: python

    self.log(enable_graph=True)

----
logger
======
**Default:** True
Send logs to the logger, such as TensorBoard, or any other custom logger passed to the :class:`~pytorch_lightning.trainer.trainer.Trainer`.

.. code-block:: python

    self.log(logger=True)

----
on_epoch
========
**Default:** It varies
If True, that specific *self.log* call accumulates the logged values and reduces them at the end of the epoch.

.. code-block:: python

    self.log(on_epoch=True)

The default value depends on the function in which it is called:

.. code-block:: python

    def training_step(self, batch, batch_idx):
        # Default: False
        self.log(on_epoch=False)


    def validation_step(self, batch, batch_idx):
        # Default: True
        self.log(on_epoch=True)


    def test_step(self, batch, batch_idx):
        # Default: True
        self.log(on_epoch=True)

----
on_step
=======
**Default:** It varies
If True, that specific *self.log* call will NOT accumulate metrics. Instead, it generates a time series across steps.

.. code-block:: python

    self.log(on_step=True)

The default value depends on the function in which it is called:

.. code-block:: python

    def training_step(self, batch, batch_idx):
        # Default: True
        self.log(on_step=True)


    def validation_step(self, batch, batch_idx):
        # Default: False
        self.log(on_step=False)


    def test_step(self, batch, batch_idx):
        # Default: False
        self.log(on_step=False)

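
The difference between step-level and epoch-level logging can be shown on one list of values. This is a plain-Python sketch with hypothetical numbers; it assumes a mean reduction and equal batch sizes.

.. code-block:: python

    from statistics import mean

    step_losses = [1.0, 0.75, 0.5, 0.25]  # one hypothetical value per training step

    # on_step=True: every value is kept, giving one point per step.
    step_series = list(enumerate(step_losses))

    # on_epoch=True: values are accumulated and reduced once at the end of the epoch.
    epoch_value = mean(step_losses)

    print("step series:", step_series)
    print("epoch value:", epoch_value)

The step series has as many points as there are steps, while the epoch-level view collapses them into a single number.
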
----
prog_bar
========
**Default:** False
If set to True, logs will be sent to the progress bar.

.. code-block:: python

    self.log(prog_bar=True)

----
rank_zero_only
==============
**Default:** False

Whether the value will be logged only on rank 0. This prevents synchronization, which would produce a deadlock because not all processes would perform this log call.

----

reduce_fx
=========
**Default:** :meth:`torch.mean`

Reduction function over step values for end of epoch. Uses :meth:`torch.mean` by default and is not applied when a :class:`torchmetrics.Metric` is logged.

----

sync_dist
=========
**Default:** False

If True, reduces the metric across devices. Use with care as this may lead to a significant communication overhead.

.. code-block:: python

    self.log(sync_dist=False)

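
What reducing a metric across devices changes can be simulated without any distributed setup. The per-process values below are hypothetical, and the sketch assumes a mean reduction; it models the outcome only, not the communication itself.

.. code-block:: python

    from statistics import mean

    # Hypothetical values of one metric on each process of a 4-device run.
    local_values = {0: 0.75, 1: 0.5, 2: 1.0, 3: 0.25}

    # Without syncing, each process keeps its own value, so rank 0 reports only what it saw.
    rank_zero_value = local_values[0]

    # With syncing, the values are reduced across processes, which costs communication per call.
    synced_value = mean(local_values.values())

    print("rank 0 only:", rank_zero_value)
    print("synced mean:", synced_value)

The two numbers differ whenever the processes see different data, which is why syncing trades extra communication for a value that reflects all devices.
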
----
sync_dist_group
===============
**Default:** None
The DDP group to sync across.

.. code-block:: python

    import torch.distributed as dist

    # Assumes the default process group is already initialized.
    # The ranks are placeholders; list the ranks that should sync with each other.
    group = dist.new_group(ranks=[0, 1])
    self.log(sync_dist_group=group)

----
***************************************
Enable metrics for distributed training
***************************************
For certain types of metrics that need complex aggregation, we recommend building your metric with ``torchmetrics``, which ensures that all the complexities of metric aggregation in distributed environments are handled.
First, implement your metric:

.. code-block:: python

    import torch
    from torchmetrics import Metric


    class MyAccuracy(Metric):
        def __init__(self, dist_sync_on_step=False):
            # call `self.add_state` for every internal state that is needed for the metrics computations
            # dist_reduce_fx indicates the function that should be used to reduce