diff --git a/docs/source/advanced/multi_gpu.rst b/docs/source/advanced/multi_gpu.rst
index 3e159f5ba7..c73ca126d5 100644
--- a/docs/source/advanced/multi_gpu.rst
+++ b/docs/source/advanced/multi_gpu.rst
@@ -288,9 +288,13 @@ after which the root node will aggregate the results.
 
 .. warning:: DP use is discouraged by PyTorch and Lightning. State is not maintained on the replicas created by
     the :class:`~torch.nn.DataParallel` wrapper and you may see errors or misbehavior if you assign state to the module
-    in the ``forward()`` or ``*_step()`` methods. For the same reason we do cannot fully support
+    in the ``forward()`` or ``*_step()`` methods. For the same reason we cannot fully support
     :ref:`manual_optimization` with DP. Use DDP which is more stable and at least 3x faster.
 
+.. warning:: DP only supports scattering and gathering primitive collections of tensors like lists, dicts, etc.
+    Therefore the :meth:`~pytorch_lightning.core.hooks.ModelHooks.transfer_batch_to_device` hook does not apply in
+    this mode and if you have overridden it, it will not be called.
+
 .. testcode::
     :skipif: torch.cuda.device_count() < 2
 
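
For context, here is a minimal sketch of the behavior the new warning documents. It is not part of this patch: the Trainer arguments (``gpus=2, accelerator="dp"``) and the three-argument hook signature follow the Lightning 1.x API of this era and are assumptions; the hook signature has varied across versions.

    import torch
    from torch import nn
    import pytorch_lightning as pl


    class LitModel(pl.LightningModule):
        def __init__(self):
            super().__init__()
            self.layer = nn.Linear(32, 2)

        def transfer_batch_to_device(self, batch, device, dataloader_idx=0):
            # Honored under DDP, but never called under DP:
            # torch.nn.DataParallel scatters the (tensor/list/dict) batch
            # across replicas itself, so this override is silently skipped.
            x, y = batch
            return x.to(device), y.to(device)

        def training_step(self, batch, batch_idx):
            x, y = batch
            return nn.functional.cross_entropy(self.layer(x), y)

        def configure_optimizers(self):
            return torch.optim.SGD(self.parameters(), lr=0.1)


    # Assumed 1.x-era usage: with accelerator="dp" the hook above is not
    # invoked; switch to DDP if your data pipeline depends on it.
    trainer = pl.Trainer(gpus=2, accelerator="dp")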