

Multi-node example

To run this demo, which launches a single job that trains on 2 nodes (2 GPUs per node), do the following:

1. Log into the jumphost (login) node of your SLURM-managed cluster.
2. Create a conda environment with Lightning and a GPU-enabled build of PyTorch.
3. Submit the job script, passing the name of your conda environment:

sbatch job_submit.sh --env=YourEnv
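The contents of `job_submit.sh` are not reproduced here, so the following is only a minimal sketch of what such a script might look like, assuming a 2-node, 2-GPU-per-node allocation and a hypothetical training entry point `train.py` that builds `Trainer(accelerator="gpu", devices=2, num_nodes=2)`. The job name, time limit, and the helper `parse_env_arg` are illustrative assumptions. Arguments placed after the script name in the `sbatch` command (here `--env=YourEnv`) are passed to the script itself, which is how the environment name reaches it.

```shell
#!/bin/bash
# Hypothetical job_submit.sh sketch: 2 nodes x 2 GPUs, one task per GPU.
#SBATCH --job-name=lightning-multinode
#SBATCH --nodes=2
# One task per GPU; should match devices=2 in the (assumed) Trainer config.
#SBATCH --ntasks-per-node=2
# GPUs requested on each node.
#SBATCH --gres=gpu:2
#SBATCH --time=01:00:00

# Hypothetical helper: print the value of an argument of the form --env=<name>.
# Returns non-zero if no such argument is present.
parse_env_arg() {
  local arg
  for arg in "$@"; do
    case "$arg" in
      --env=*) echo "${arg#--env=}"; return 0 ;;
    esac
  done
  return 1
}

# Only launch training when running inside a SLURM job (sbatch sets SLURM_JOB_ID).
if [ -n "${SLURM_JOB_ID:-}" ]; then
  ENV_NAME="$(parse_env_arg "$@")" || {
    echo "usage: sbatch job_submit.sh --env=<conda-env>" >&2
    exit 1
  }
  # Make `conda activate` usable in a non-interactive batch shell.
  source "$(conda info --base)/etc/profile.d/conda.sh"
  conda activate "$ENV_NAME"
  # srun starts one process per task (2 nodes x 2 tasks = 4 processes);
  # Lightning reads the SLURM environment to assign each process its rank.
  # train.py is a placeholder for your own training script.
  srun python train.py
fi
```

Keeping `--ntasks-per-node` equal to the number of GPUs per node matters because Lightning expects SLURM to start one process per device rather than spawning the per-GPU processes itself.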