diff --git a/docs/source/advanced/multi_gpu.rst b/docs/source/advanced/multi_gpu.rst
index 3e159f5ba7..c73ca126d5 100644
--- a/docs/source/advanced/multi_gpu.rst
+++ b/docs/source/advanced/multi_gpu.rst
@@ -288,9 +288,13 @@ after which the root node will aggregate the results.
 
 .. warning:: DP use is discouraged by PyTorch and Lightning. State is not maintained on the replicas created by
     the :class:`~torch.nn.DataParallel` wrapper and you may see errors or misbehavior if you assign state to the module
-    in the ``forward()`` or ``*_step()`` methods. For the same reason we do cannot fully support
+    in the ``forward()`` or ``*_step()`` methods. For the same reason we cannot fully support
     :ref:`manual_optimization` with DP. Use DDP which is more stable and at least 3x faster.
 
+.. warning:: DP only supports scattering and gathering primitive collections of tensors like lists, dicts, etc.
+    Therefore the :meth:`~pytorch_lightning.core.hooks.ModelHooks.transfer_batch_to_device` hook does not apply in
+    this mode and if you have overridden it, it will not be called.
+
 .. testcode::
     :skipif: torch.cuda.device_count() < 2
 
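
For context, here is a minimal sketch of the behavior the new warning documents. It is not part of this patch: the Trainer arguments (``gpus=2, accelerator="dp"``) and the three-argument hook signature follow the Lightning 1.x API of this era and are assumptions; the hook signature has varied across versions.

    import torch
    from torch import nn
    import pytorch_lightning as pl


    class LitModel(pl.LightningModule):
        def __init__(self):
            super().__init__()
            self.layer = nn.Linear(32, 2)

        def transfer_batch_to_device(self, batch, device, dataloader_idx=0):
            # Honored under DDP, but never called under DP:
            # torch.nn.DataParallel scatters the (tensor/list/dict) batch
            # across replicas itself, so this override is silently skipped.
            x, y = batch
            return x.to(device), y.to(device)

        def training_step(self, batch, batch_idx):
            x, y = batch
            return nn.functional.cross_entropy(self.layer(x), y)

        def configure_optimizers(self):
            return torch.optim.SGD(self.parameters(), lr=0.1)


    # Assumed 1.x-era usage: with accelerator="dp" the hook above is not
    # invoked; switch to DDP if your data pipeline depends on it.
    trainer = pl.Trainer(gpus=2, accelerator="dp")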