diff --git a/docs/source/multi_gpu.rst b/docs/source/multi_gpu.rst
index e528bab3bb..1d696a7ec8 100644
--- a/docs/source/multi_gpu.rst
+++ b/docs/source/multi_gpu.rst
@@ -260,7 +260,7 @@ Distributed Data Parallel
 
 3. Each process inits the model.
 
-.. note:: Make sure to set the random seed so that each model initializes with the same weights.
+.. note:: Set the random seed before instantiating ``Trainer()`` so that the model in every process initializes with the same weights.
 
 4. Each process performs a full forward and backward pass in parallel.