## Basic Examples
Use these examples to test how Lightning works.
#### MNIST
Trains an MNIST classifier whose model is defined inside the `LightningModule`.
```bash
# cpu
python mnist.py

# gpus (any number)
python mnist.py --gpus 2

# dataparallel
python mnist.py --gpus 2 --distributed_backend 'dp'
```
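The flags in the commands above can be illustrated with a minimal, hypothetical command-line parser. This is a sketch, not the code in `mnist.py`: only the flag names `--gpus` and `--distributed_backend` come from the commands, while the defaults and the `build_parser` and `describe` helpers are assumptions made for illustration.

```python
import argparse


def build_parser():
    # Hypothetical parser mirroring the flags used in the commands above.
    # The real example scripts may define their arguments differently.
    parser = argparse.ArgumentParser(description="Sketch of an example script CLI")
    parser.add_argument("--gpus", type=int, default=0,
                        help="number of GPUs to use (0 means CPU, an assumed default)")
    parser.add_argument("--distributed_backend", type=str, default=None,
                        help="distributed mode, e.g. 'dp'")
    return parser


def describe(args):
    # Summarize the run configuration implied by the parsed flags.
    if args.gpus == 0:
        return "cpu"
    if args.distributed_backend is None:
        return f"{args.gpus} gpu(s), default backend"
    return f"{args.gpus} gpu(s), backend={args.distributed_backend}"


# Equivalent to passing: --gpus 2 --distributed_backend 'dp'
args = build_parser().parse_args(["--gpus", "2", "--distributed_backend", "dp"])
print(describe(args))
```

Passing an explicit argument list to `parse_args` keeps the sketch runnable without a shell; a real script would normally call `parse_args()` with no arguments and read from the command line.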
---
#### MNIST with DALI
The MNIST example above, using [NVIDIA DALI](https://developer.nvidia.com/DALI).
Requires NVIDIA DALI to be installed for your CUDA version; see the [installation guide](https://docs.nvidia.com/deeplearning/dali/user-guide/docs/installation.html).
```bash
python mnist_dali.py
```
---
#### Image classifier
Generic image classifier with an arbitrary backbone (i.e., a simple system).
```bash
# cpu
python image_classifier.py

# gpus (any number)
python image_classifier.py --gpus 2

# dataparallel
python image_classifier.py --gpus 2 --distributed_backend 'dp'
```
---
#### Autoencoder
Shows the power of a system: it can contain arbitrarily complex training loops.
```bash
# cpu
python autoencoder.py

# gpus (any number)
python autoencoder.py --gpus 2

# dataparallel
python autoencoder.py --gpus 2 --distributed_backend 'dp'
```
---
## Multi-node example
This demo launches a job that uses 2 GPUs on each of 2 nodes (4 GPUs total).
To run this demo, do the following:
1. Log into the jumphost node of your SLURM-managed cluster.
2. Create a conda environment with Lightning and a GPU PyTorch version.
3. Choose a script to submit.
#### DDP
Submit this job to run with DistributedDataParallel (2 nodes, 2 GPUs each).
```bash
sbatch submit_ddp_job.sh YourEnv
```
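To make the 2-node, 2-GPU layout concrete, the sketch below enumerates one process per GPU and assigns each a global rank. The formula `node_rank * gpus_per_node + local_rank` is a common convention for distributed training, assumed here for illustration; the names `global_rank`, `NUM_NODES`, and `GPUS_PER_NODE` are hypothetical and do not come from `submit_ddp_job.sh`.

```python
NUM_NODES = 2       # from the job description: 2 nodes
GPUS_PER_NODE = 2   # from the job description: 2 GPUs each


def global_rank(node_rank, local_rank, gpus_per_node):
    # Assumed convention: ranks are numbered node by node, GPU by GPU.
    return node_rank * gpus_per_node + local_rank


# Total number of processes when each GPU gets its own process.
world_size = NUM_NODES * GPUS_PER_NODE

# (node_rank, local_rank, global_rank) for every process in the job.
layout = [
    (node, local, global_rank(node, local, GPUS_PER_NODE))
    for node in range(NUM_NODES)
    for local in range(GPUS_PER_NODE)
]

for node, local, rank in layout:
    print(f"node {node}, local GPU {local} -> global rank {rank} of {world_size}")
```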
#### DDP2
Submit this job to run with a different implementation of DistributedDataParallel.
In this version, each node acts like DataParallel but syncs across nodes like DDP.
```bash
sbatch submit_ddp2_job.sh YourEnv
```
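One way to picture the difference between the two modes is to count the training processes each one would start. The sketch below assumes that DDP runs one process per GPU and that DDP2, as described above, runs one process per node that drives all of that node's GPUs. The `num_processes` helper is hypothetical and is not part of Lightning or the submit scripts.

```python
def num_processes(backend, num_nodes, gpus_per_node):
    # Assumption: 'ddp' starts one process per GPU across all nodes.
    if backend == "ddp":
        return num_nodes * gpus_per_node
    # Assumption: 'ddp2' starts one process per node; that process
    # handles all GPUs on its node and syncs with the other nodes.
    if backend == "ddp2":
        return num_nodes
    raise ValueError(f"unknown backend: {backend!r}")


# The demo configuration: 2 nodes with 2 GPUs each.
for backend in ("ddp", "ddp2"):
    print(backend, num_processes(backend, num_nodes=2, gpus_per_node=2))
```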