lightning/pl_examples/multi_node_examples
Jirka Borovec 3a58937d8b rename variables nb -> num (#567)
* rename nb -> num

* flake8

* batch_nb, epoch_nb, gpu_nb, split_nb

* add _num deprecations
2019-12-04 06:57:10 -05:00
README.md changes examples to pl_examples for name conflict 2019-10-19 00:41:17 +02:00
__init__.py changes examples to pl_examples for name conflict 2019-10-19 00:41:17 +02:00
ddp2_job_submit.sh changes examples to pl_examples for name conflict 2019-10-19 00:41:17 +02:00
ddp_job_submit.sh changes examples to pl_examples for name conflict 2019-10-19 00:41:17 +02:00
multi_node_ddp2_demo.py rename variables nb -> num (#567) 2019-12-04 06:57:10 -05:00
multi_node_ddp_demo.py rename variables nb -> num (#567) 2019-12-04 06:57:10 -05:00

README.md

Multi-node example

This demo launches a job that uses 2 GPUs on each of 2 nodes (4 GPUs total). To run it:

  1. Log in to the jump host of your SLURM-managed cluster.
  2. Create a conda environment with Lightning and a GPU build of PyTorch.
  3. Choose one of the scripts below to submit.

DDP

Submit this job to run with DistributedDataParallel (2 nodes, 2 GPUs each). Replace YourEnv with the name of the conda environment you created in step 2.

sbatch ddp_job_submit.sh YourEnv
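
To make the 2 nodes x 2 GPUs layout concrete, here is a minimal sketch, assuming the usual DDP arrangement of one process per GPU with ranks numbered node by node. The function `ddp_layout` and its parameters are hypothetical and not part of Lightning; the sketch only illustrates the rank arithmetic.

```python
def ddp_layout(num_nodes, gpus_per_node):
    """Return (global_rank, node_rank, local_rank) for every DDP process.

    Assumption: one process per GPU, with global ranks assigned node by node.
    """
    layout = []
    for node_rank in range(num_nodes):
        for local_rank in range(gpus_per_node):
            global_rank = node_rank * gpus_per_node + local_rank
            layout.append((global_rank, node_rank, local_rank))
    return layout


if __name__ == "__main__":
    # The demo configuration: 2 nodes with 2 GPUs each, so 4 processes.
    for global_rank, node_rank, local_rank in ddp_layout(2, 2):
        print(f"rank {global_rank}: node {node_rank}, gpu {local_rank}")
```

Under this assumption the world size is num_nodes * gpus_per_node, which is 4 for this demo.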

DDP2

Submit this job to run with DDP2, a different implementation of DistributedDataParallel. In this version, the GPUs within each node behave like DataParallel, while the nodes synchronize with each other as in DDP.

sbatch ddp2_job_submit.sh YourEnv
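
The difference from DDP can be sketched as follows, assuming DDP2 runs one process per node and that each process splits its batch across the local GPUs the way DataParallel does. The helpers `ddp2_layout` and `split_batch` are hypothetical illustrations, not Lightning functions, and the real splitting happens on tensors rather than Python lists.

```python
def ddp2_layout(num_nodes, gpus_per_node):
    """Map each node-level process to the local GPU indices it drives.

    Assumption: one process per node, so the world size equals num_nodes.
    """
    return {node_rank: list(range(gpus_per_node)) for node_rank in range(num_nodes)}


def split_batch(batch, num_gpus):
    """Split a batch into near-equal contiguous chunks, one per local GPU."""
    if not batch:
        return []
    chunk = -(-len(batch) // num_gpus)  # ceiling division
    return [batch[i:i + chunk] for i in range(0, len(batch), chunk)]


if __name__ == "__main__":
    # The demo configuration: 2 node-level processes, each driving 2 GPUs.
    print(ddp2_layout(2, 2))
    # Each node's batch is divided among its local GPUs.
    print(split_batch(list(range(8)), 2))
```

In this picture the demo has 2 processes rather than 4, and each process sees the full per-node batch before it is divided among that node's GPUs.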