Clarify `self.log(..., rank_zero_only=True|False)` (#19056)

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Adrian Wälchli, 2023-11-23 19:02:21 +01:00 (committed by GitHub)
parent a6da1e3351
commit 90043798e4
4 changed files with 29 additions and 10 deletions

@@ -121,14 +121,16 @@ It is possible to perform some computation manually and log the reduced result o
         mean = torch.mean(self.all_gather(self.outputs))
         self.outputs.clear()  # free memory
 
-        # When logging only on rank 0, don't forget to add
+        # When you call `self.log` only on rank 0, don't forget to add
         # `rank_zero_only=True` to avoid deadlocks on synchronization.
-        # caveat: monitoring this is unimplemented. see https://github.com/Lightning-AI/lightning/issues/15852
+        # Caveat: monitoring this is unimplemented, see https://github.com/Lightning-AI/lightning/issues/15852
         if self.trainer.is_global_zero:
             self.log("my_reduced_metric", mean, rank_zero_only=True)
 
 ----
 
 **********************
 Make models pickleable
 **********************
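
For reference, the snippet changed in the hunk above is part of a larger pattern: collect values in `validation_step`, reduce them manually across processes, then log from rank 0 only. A minimal self-contained sketch, assuming a hypothetical module (the layer sizes and metric name are illustrative, not from the diff):

import torch
import lightning.pytorch as pl


class MyModel(pl.LightningModule):
    """Sketch: reduce a metric manually, then log it from rank 0 only."""

    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 1)
        self.outputs = []

    def validation_step(self, batch, batch_idx):
        x, y = batch
        self.outputs.append(torch.nn.functional.mse_loss(self.layer(x), y))

    def on_validation_epoch_end(self):
        # all_gather stacks the per-process values onto every rank
        mean = torch.mean(self.all_gather(torch.stack(self.outputs)))
        self.outputs.clear()  # free memory

        # `self.log` runs on rank 0 only here, so `rank_zero_only=True`
        # is required to avoid a deadlock on synchronization
        if self.trainer.is_global_zero:
            self.log("my_reduced_metric", mean, rank_zero_only=True)

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)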

@@ -141,7 +141,7 @@ The :meth:`~lightning.pytorch.core.LightningModule.log` method has a few options
 * ``sync_dist_group``: The DDP group to sync across.
 * ``add_dataloader_idx``: If True, appends the index of the current dataloader to the name (when using multiple dataloaders). If False, user needs to give unique names for each dataloader to not mix the values.
 * ``batch_size``: Current batch size used for accumulating logs logged with ``on_epoch=True``. This will be directly inferred from the loaded batch, but for some data structures you might need to explicitly provide it.
-* ``rank_zero_only``: Whether the value will be logged only on rank 0. This will prevent synchronization which would produce a deadlock as not all processes would perform this log call.
+* ``rank_zero_only``: Set this to ``True`` only if you call ``self.log`` explicitly only from rank 0. If ``True`` you won't be able to access or specify this metric in callbacks (e.g. early stopping).
 
 .. list-table:: Default behavior of logging in Callback or LightningModule
    :widths: 50 25 25
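
To illustrate the options listed in the hunk above, a hedged sketch of a `validation_step` that passes several of them explicitly, reusing the hypothetical module from the earlier sketch (the metric name and batch structure are assumptions):

    def validation_step(self, batch, batch_idx):
        x, y = batch
        loss = torch.nn.functional.mse_loss(self.layer(x), y)
        # `sync_dist=True` reduces the value across processes; `batch_size`
        # only needs to be passed when it cannot be inferred from the batch
        self.log(
            "val_loss",
            loss,
            on_step=False,
            on_epoch=True,
            sync_dist=True,
            batch_size=x.size(0),
        )
        return loss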

@@ -196,13 +196,26 @@ If set to True, logs will be sent to the progress bar.
 rank_zero_only
 ==============
 
-**Default:** True
+**Default:** False
 
-Whether the value will be logged only on rank 0. This will prevent synchronization which would produce a deadlock as not all processes would perform this log call.
+Tells Lightning if you are calling ``self.log`` from every process (default) or only from rank 0.
+This is for advanced users who want to reduce their metric manually across processes, but still want to benefit from automatic logging via ``self.log``.
+
+- Set ``False`` (default) if you are calling ``self.log`` from every process.
+- Set ``True`` if you are calling ``self.log`` from rank 0 only. Caveat: you won't be able to use this metric as a monitor in callbacks (e.g., early stopping).
 
 .. code-block:: python
 
-    self.log(rank_zero_only=True)
+    # Default
+    self.log(..., rank_zero_only=False)
+
+    # If you call `self.log` on rank 0 only, you need to set `rank_zero_only=True`
+    if self.trainer.global_rank == 0:
+        self.log(..., rank_zero_only=True)
+
+    # DON'T do this, it will cause deadlocks!
+    self.log(..., rank_zero_only=True)
 
 ----
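
To make the callback caveat above concrete, a small sketch: a metric logged with `rank_zero_only=True` exists on rank 0 only, so callbacks on the other ranks cannot see it, and you should monitor a metric that every process logs instead (the metric names are illustrative):

import lightning.pytorch as pl
from lightning.pytorch.callbacks import EarlyStopping

# OK: `val_loss` is logged from every process (e.g. with `sync_dist=True`),
# so the EarlyStopping callback on every rank can read it
trainer = pl.Trainer(callbacks=[EarlyStopping(monitor="val_loss", mode="min")])

# Not supported: monitoring `my_reduced_metric` from the rank-0-only example
# would fail on non-zero ranks, where the metric never exists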

@@ -400,8 +400,10 @@ class LightningModule(
                 but for some data structures you might need to explicitly provide it.
             metric_attribute: To restore the metric state, Lightning requires the reference of the
                 :class:`torchmetrics.Metric` in your model. This is found automatically if it is a model attribute.
-            rank_zero_only: Whether the value will be logged only on rank 0. This will prevent synchronization which
-                would produce a deadlock as not all processes would perform this log call.
+            rank_zero_only: Tells Lightning if you are calling ``self.log`` from every process (default) or only from
+                rank 0. If ``True``, you won't be able to use this metric as a monitor in callbacks
+                (e.g., early stopping). Warning: Improper use can lead to deadlocks! See
+                :ref:`Advanced Logging <visualize/logging_advanced:rank_zero_only>` for more details.
 
         """
         if self._fabric is not None:
@@ -563,8 +565,10 @@ class LightningModule(
                 each dataloader to not mix values.
             batch_size: Current batch size. This will be directly inferred from the loaded batch,
                 but some data structures might need to explicitly provide it.
-            rank_zero_only: Whether the value will be logged only on rank 0. This will prevent synchronization which
-                would produce a deadlock as not all processes would perform this log call.
+            rank_zero_only: Tells Lightning if you are calling ``self.log`` from every process (default) or only from
+                rank 0. If ``True``, you won't be able to use this metric as a monitor in callbacks
+                (e.g., early stopping). Warning: Improper use can lead to deadlocks! See
+                :ref:`Advanced Logging <visualize/logging_advanced:rank_zero_only>` for more details.
 
         """
         if self._fabric is not None:
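
The second hunk above makes the same change to a sibling docstring; `log_dict` takes the same `rank_zero_only` flag, and the same guard applies there. A minimal sketch of the rank-0-only pattern with `log_dict`, added to the hypothetical module from the first sketch (the metric name is a placeholder):

    def on_validation_epoch_end(self):
        mean = torch.mean(self.all_gather(torch.stack(self.outputs)))
        self.outputs.clear()

        # Same rule as `self.log`: set `rank_zero_only=True` only when the
        # call itself is guarded to run on rank 0
        if self.trainer.is_global_zero:
            self.log_dict({"my_mean": mean}, rank_zero_only=True)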