
Basic Examples

Use these examples to test how Lightning works.

MNIST

Trains a model on MNIST, with the model defined inside the LightningModule.

# cpu
python mnist.py

# gpus (any number)
python mnist.py --gpus 2

# dataparallel
python mnist.py --gpus 2 --distributed_backend 'dp'
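The commands above select the hardware through command-line flags such as `--gpus` and `--distributed_backend`. The following is a minimal sketch of how a script could parse these two flags with plain `argparse`. It is an assumption for illustration: the example scripts may instead rely on Lightning's own argument helpers, and `parse_trainer_flags` and `describe` are hypothetical names.

```python
from argparse import ArgumentParser, Namespace
from typing import List, Optional


def parse_trainer_flags(argv: Optional[List[str]] = None) -> Namespace:
    """Hypothetical parser for the two flags used in the commands above."""
    parser = ArgumentParser(description="Hypothetical flag parser for the basic examples")
    # Number of GPUs to train on; omitting the flag (None) means train on CPU.
    parser.add_argument("--gpus", type=int, default=None)
    # Multi-GPU strategy name, e.g. 'dp' for DataParallel.
    parser.add_argument("--distributed_backend", type=str, default=None)
    return parser.parse_args(argv)


def describe(args: Namespace) -> str:
    """Summarize the parsed flags as a short, human-readable string."""
    if not args.gpus:
        return "cpu"
    return f"{args.gpus} gpu(s), backend={args.distributed_backend}"


if __name__ == "__main__":
    # Same flags as: python mnist.py --gpus 2 --distributed_backend 'dp'
    # (the shell strips the quotes, so the script receives plain dp).
    print(describe(parse_trainer_flags(["--gpus", "2", "--distributed_backend", "dp"])))
```

Running the file prints `2 gpu(s), backend=dp`. Calling it with no flags yields `cpu`, which matches the bare `python mnist.py` command.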

Image classifier

Generic image classifier with an arbitrary backbone (i.e., a simple system).

# cpu
python image_classifier.py

# gpus (any number)
python image_classifier.py --gpus 2

# dataparallel
python image_classifier.py --gpus 2 --distributed_backend 'dp'
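With `--distributed_backend 'dp'`, PyTorch's DataParallel runs a single process. That process splits each batch along its first dimension across the visible GPUs, runs a model replica on each GPU, and gathers the outputs back on the first device. The sketch below shows only the splitting step on a plain Python list. It assumes `torch.chunk`-style sizing (ceiling division, so the last chunk may be smaller). `scatter_batch` is a hypothetical helper and not part of Lightning or PyTorch.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def scatter_batch(batch: Sequence[T], num_gpus: int) -> List[List[T]]:
    """Split a batch into at most num_gpus contiguous chunks (torch.chunk-style sizing)."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be >= 1")
    if not batch:
        return []
    chunk = -(-len(batch) // num_gpus)  # ceiling division
    return [list(batch[i:i + chunk]) for i in range(0, len(batch), chunk)]


if __name__ == "__main__":
    # A batch of 32 samples on 2 GPUs: each replica sees 16 samples per step.
    print([len(part) for part in scatter_batch(list(range(32)), 2)])
```

Under DP, the batch size configured in the dataloader is therefore the total per step. Each GPU processes only its share of that batch.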

Autoencoder

Shows the power of a system: arbitrarily complex training loops.

# cpu
python autoencoder.py

# gpus (any number)
python autoencoder.py --gpus 2

# dataparallel
python autoencoder.py --gpus 2 --distributed_backend 'dp'

Multi-node example

This demo launches a job using 2 GPUs on each of 2 different nodes (4 GPUs total). To run this demo, do the following:

1. Log into the jump host node of your SLURM-managed cluster.
2. Create a conda environment with Lightning and a GPU-enabled build of PyTorch.
3. Choose a script to submit:

DDP

Submit this job to run with DistributedDataParallel (2 nodes, 2 GPUs each), where YourEnv is the name of the conda environment you created:

sbatch submit_ddp_job.sh YourEnv
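DDP starts one process per GPU, so this job runs 4 processes in total (a world size of 4). The sketch below shows the usual rank arithmetic, assuming ranks are assigned node by node: global rank = node_rank * gpus_per_node + local_rank. In the real job, Lightning and SLURM assign these ranks. `ddp_layout` and `Proc` are hypothetical names used only for illustration.

```python
from typing import List, NamedTuple


class Proc(NamedTuple):
    node_rank: int    # which node the process runs on
    local_rank: int   # which GPU on that node it drives
    global_rank: int  # unique id across the whole job


def ddp_layout(num_nodes: int, gpus_per_node: int) -> List[Proc]:
    """One process per GPU, with ranks assigned node by node (an assumed convention)."""
    return [
        Proc(node, gpu, node * gpus_per_node + gpu)
        for node in range(num_nodes)
        for gpu in range(gpus_per_node)
    ]


if __name__ == "__main__":
    # The job above: 2 nodes x 2 GPUs -> global ranks 0..3.
    for proc in ddp_layout(num_nodes=2, gpus_per_node=2):
        print(proc)
```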

DDP2

Submit this job to run with DDP2, a different implementation of DistributedDataParallel. In this mode, each node acts like DataParallel (a single process drives all of the node's GPUs), while gradients are synced across nodes as in DDP:

sbatch submit_ddp2_job.sh YourEnv
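Both scripts use the same 4 GPUs but organize the processes differently. DDP runs one process per GPU. DDP2 runs one process per node, and that process spreads work over the node's GPUs as DataParallel does. The sketch below compares the two layouts for this 2-node, 2-GPU job. `plan` and `Plan` are hypothetical helpers, and the sketch models only process counts. It does not model how Lightning actually launches the processes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    backend: str
    world_size: int        # number of processes that sync gradients
    gpus_per_process: int  # GPUs each process drives


def plan(backend: str, num_nodes: int, gpus_per_node: int) -> Plan:
    """Process layout for 'ddp' or 'ddp2' (illustrative model only)."""
    if backend == "ddp":
        # One process per GPU on every node.
        return Plan(backend, num_nodes * gpus_per_node, 1)
    if backend == "ddp2":
        # One process per node; it drives all local GPUs like DataParallel.
        return Plan(backend, num_nodes, gpus_per_node)
    raise ValueError(f"unsupported backend: {backend!r}")


if __name__ == "__main__":
    for name in ("ddp", "ddp2"):
        p = plan(name, num_nodes=2, gpus_per_node=2)
        print(p, "total GPUs:", p.world_size * p.gpus_per_process)
```

In both cases the total number of GPUs is 4. DDP2 halves the number of processes that synchronize gradients across the cluster.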