# Examples   
This folder has 3 sections:   

## Basic Examples   
Use these examples to test how Lightning works.   

#### Test on CPU  
```bash
python cpu_template.py
```

---   
#### Train on a single GPU
```bash
python gpu_template.py --gpus 1
```   

---    
#### DataParallel (dp)   
Train on multiple GPUs using DataParallel.

```bash
python gpu_template.py --gpus 2 --distributed_backend dp
```   
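
To make the two flags used above concrete, here is a minimal sketch of how a template script could accept them. This is an assumption for illustration only: `build_parser` is a hypothetical helper, and the real `gpu_template.py` may define its arguments differently.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags shown in this README."""
    parser = argparse.ArgumentParser(description="Sketch of a template CLI")
    # Number of GPUs to train on; 0 is assumed here to mean CPU-only.
    parser.add_argument("--gpus", type=int, default=0)
    # Backend names are taken from this README: dp, ddp, ddp2.
    parser.add_argument(
        "--distributed_backend",
        type=str,
        default=None,
        choices=["dp", "ddp", "ddp2"],
    )
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(["--gpus", "2", "--distributed_backend", "dp"])
print(args.gpus, args.distributed_backend)
```

Passing an unsupported backend name makes `argparse` exit with a usage error, which is one simple way to catch typos before training starts.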

---
#### DistributedDataParallel (ddp)    

Train on multiple GPUs using DistributedDataParallel.
```bash
python gpu_template.py --gpus 2 --distributed_backend ddp
```

---
#### DistributedDataParallel+DP (ddp2)    

Train on multiple GPUs using DistributedDataParallel + DataParallel.
Within each node, all GPUs serve a single copy of the model (as in DataParallel).
Gradient information is then shared across nodes (as in DistributedDataParallel).
```bash
python gpu_template.py --gpus 2 --distributed_backend ddp2
```
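
The difference between the three backends can be summarized by how many processes run on each node and how many GPUs each process drives. The sketch below is an illustration under stated assumptions, not Lightning's implementation: `process_layout` is a hypothetical helper, and "world size" here simply means the total number of training processes.

```python
def process_layout(backend, num_nodes, gpus_per_node):
    """Return (processes_per_node, gpus_per_process, world_size).

    Hypothetical helper summarizing the backends described above.
    """
    if backend == "dp":
        # DataParallel: one process drives every GPU on a single node.
        if num_nodes != 1:
            raise ValueError("dp runs on a single node only")
        return 1, gpus_per_node, 1
    if backend == "ddp":
        # DistributedDataParallel: one process per GPU, synced everywhere.
        return gpus_per_node, 1, num_nodes * gpus_per_node
    if backend == "ddp2":
        # DataParallel inside each node, DistributedDataParallel across nodes.
        return 1, gpus_per_node, num_nodes
    raise ValueError(f"unknown backend: {backend!r}")


for backend in ("dp", "ddp", "ddp2"):
    print(backend, process_layout(backend, num_nodes=1, gpus_per_node=2))
```

On a single node with 2 GPUs, `dp` and `ddp2` have the same layout. The two only diverge once more than one node is involved, which `dp` does not support.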

## Multi-node example   

This demo launches a job that uses 2 GPUs on each of 2 nodes (4 GPUs total).
To run this demo, do the following:

1. Log into the jumphost node of your SLURM-managed cluster.  
2. Create a conda environment with Lightning and a GPU PyTorch version.   
3. Choose one of the scripts below to submit.    

### DDP  
Submit this job to run with DistributedDataParallel (2 nodes, 2 GPUs each).
```bash
sbatch ddp_job_submit.sh YourEnv
```

### DDP2  
Submit this job to run with a different implementation of DistributedDataParallel.
In this version, each node acts like DataParallel but syncs across nodes like DDP.
```bash
sbatch ddp2_job_submit.sh YourEnv
```
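
Inside a job launched through SLURM, each task can work out its place in the job from environment variables that SLURM sets. The sketch below reads them with safe defaults. `slurm_topology` is a hypothetical helper, and the contents of the submit scripts are not shown here, so the simulated values assume one task per GPU as in a DDP-style launch.

```python
import os


def slurm_topology(env=None):
    """Read the job layout from SLURM's standard environment variables."""
    env = os.environ if env is None else env
    return {
        "num_nodes": int(env.get("SLURM_NNODES", 1)),     # nodes in the job
        "world_size": int(env.get("SLURM_NTASKS", 1)),    # total tasks
        "global_rank": int(env.get("SLURM_PROCID", 0)),   # rank across the job
        "node_rank": int(env.get("SLURM_NODEID", 0)),     # index of this node
        "local_rank": int(env.get("SLURM_LOCALID", 0)),   # rank within the node
    }


# Simulated environment: last task of a 2-node job with 2 tasks per node.
fake_env = {
    "SLURM_NNODES": "2",
    "SLURM_NTASKS": "4",
    "SLURM_PROCID": "3",
    "SLURM_NODEID": "1",
    "SLURM_LOCALID": "1",
}
print(slurm_topology(fake_env))
```

Run outside SLURM with no argument, the function falls back to a single-node, single-task layout, which keeps the same script usable for local debugging.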

## Domain templates   
These templates show common approaches such as GANs and reinforcement learning (RL).