diff --git a/docs/Trainer/Distributed training.md b/docs/Trainer/Distributed training.md
index dcd8a422b2..42f42d1bff 100644
--- a/docs/Trainer/Distributed training.md
+++ b/docs/Trainer/Distributed training.md
@@ -40,13 +40,32 @@ In this setting, the model will run on all 8 GPUs at once using DataParallel und
 os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
 os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2,3,4,5,6,7"
-# DEFAULT
+
 trainer = Trainer(gpus=[0,1,2,3,4,5,6,7])
 ```
 
 ---
 #### Multi-node
-COMING SOON.
+Multi-node training is enabled by passing the `nb_gpu_nodes` flag alongside `gpus`.
+```python
+# train on 12*8 = 96 GPUs
+trainer = Trainer(gpus=[0,1,2,3,4,5,6,7], nb_gpu_nodes=12)
+```
+
+In addition, make sure to set up your SLURM job via the [SlurmCluster](https://williamfalcon.github.io/test-tube/hpc/SlurmCluster/) object. In particular, request one task per GPU on each node.
+
+```python
+cluster = SlurmCluster(
+    hyperparam_optimizer=test_tube.HyperOptArgumentParser(),
+    log_path='/some/path/to/save',
+)
+
+# configure cluster
+cluster.per_experiment_nb_nodes = 12
+cluster.per_experiment_nb_gpus = 8
+
+cluster.add_slurm_cmd(cmd='ntasks-per-node', value=8, comment='1 task per gpu')
+```
 
 ---
 #### Self-balancing architecture
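
The SLURM configuration in this change (12 nodes, 8 GPUs per node, one task per GPU) implies a world of 12 × 8 = 96 processes. As a minimal sketch of the usual rank convention in such a layout, each process gets a unique global rank from its node index and local GPU index; the `global_rank` helper below is illustrative, not part of the library API.

```python
def global_rank(node_id, local_rank, gpus_per_node=8):
    """Map a (node, local GPU) pair to a unique global process rank.

    Assumes one task per GPU, matching the ntasks-per-node=8 setting above.
    This is the conventional rank layout; it is shown here for illustration only.
    """
    return node_id * gpus_per_node + local_rank

# 12 nodes x 8 GPUs per node -> 96 processes in total
world_size = 12 * 8
print(world_size)                 # 96
print(global_rank(0, 0))          # first process on the first node -> rank 0
print(global_rank(11, 7))         # last GPU on the last node -> rank 95
```

With this layout, `global_rank(node, gpu)` covers 0 through `world_size - 1` exactly once, which is what distributed backends expect when each SLURM task drives one GPU.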