

Basic Examples

Use these examples to see how Lightning works.

MNIST

Trains an MNIST classifier whose model is defined inside the LightningModule.

# cpu
python mnist_classifier.py

# gpus (any number)
python mnist_classifier.py --gpus 2

# dataparallel
python mnist_classifier.py --gpus 2 --distributed_backend 'dp'
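In `dp` mode a single process drives every GPU on the machine, and each incoming batch is split along its first dimension across them. Below is a minimal stdlib sketch of that split. It assumes the default `torch.Tensor.chunk` rule (chunks of ceil(N / k) samples, the last one possibly smaller), which to our understanding is how `torch.nn.DataParallel` scatters inputs. `split_batch` is a hypothetical helper, not part of Lightning.

```python
import math
from typing import List


def split_batch(batch_size: int, num_gpus: int) -> List[int]:
    """Per-GPU sample counts when one batch is chunked across num_gpus devices.

    Hypothetical helper mirroring torch.Tensor.chunk: every chunk holds
    ceil(batch_size / num_gpus) samples except possibly the last, so fewer
    than num_gpus chunks can come back for very small batches.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be >= 1")
    chunk = math.ceil(batch_size / num_gpus)
    sizes = []
    remaining = batch_size
    while remaining > 0:
        sizes.append(min(chunk, remaining))
        remaining -= sizes[-1]
    return sizes


print(split_batch(32, 2))  # → [16, 16]
print(split_batch(7, 2))   # → [4, 3]
print(split_batch(5, 4))   # → [2, 2, 1]
```

With `--gpus 2`, a dataloader batch of 32 therefore gives each GPU 16 samples, and the effective batch size stays at 32.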

MNIST with DALI

The MNIST example above, with data loading handled by NVIDIA DALI. Requires an NVIDIA DALI build that matches your CUDA version; see the DALI installation instructions.

python mnist_classifier_dali.py

Image classifier

Generic image classifier with an arbitrary backbone (i.e., a simple system).

# cpu
python image_classifier.py

# gpus (any number)
python image_classifier.py --gpus 2

# dataparallel
python image_classifier.py --gpus 2 --distributed_backend 'dp'
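The `--gpus` and `--distributed_backend` flags in these commands are Trainer arguments. Lightning scripts commonly expose them with `Trainer.add_argparse_args(parser)` and build the trainer with `Trainer.from_argparse_args(args)`; we assume the examples do the same. The stdlib-only sketch below mimics that parsing with a hypothetical `parse_trainer_flags` helper, so it runs without Lightning installed. It shows that the shell strips the quotes around `'dp'`, and that omitting `--gpus` leaves it at `None`.

```python
import argparse
import shlex
from typing import Dict


def parse_trainer_flags(command: str) -> Dict[str, object]:
    """Parse the Trainer flags out of a README command line.

    Hypothetical stand-in for Trainer.add_argparse_args: only the two
    flags used in this README are declared, and --gpus is modeled as int.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument("--gpus", type=int, default=None)
    parser.add_argument("--distributed_backend", type=str, default=None)
    # shlex.split follows shell quoting, so 'dp' reaches the script as dp.
    argv = shlex.split(command)[2:]  # drop "python" and the script name
    return vars(parser.parse_args(argv))


print(parse_trainer_flags("python image_classifier.py --gpus 2 --distributed_backend 'dp'"))
# → {'gpus': 2, 'distributed_backend': 'dp'}
print(parse_trainer_flags("python image_classifier.py"))
# → {'gpus': None, 'distributed_backend': None}
```

In a real Lightning script the parsed namespace would be handed to `Trainer.from_argparse_args(args)`. With no `--gpus` flag the Trainer runs on CPU, which is why the "gpus" commands must pass `--gpus`.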

Autoencoder

Shows the power of a system: arbitrarily complex training loops.

# cpu
python autoencoder.py

# gpus (any number)
python autoencoder.py --gpus 2

# dataparallel
python autoencoder.py --gpus 2 --distributed_backend 'dp'

Multi-node example

This demo launches a job on 2 nodes with 2 GPUs each (4 GPUs total). To run it:

  1. Log into the jumphost node of your SLURM-managed cluster.
  2. Create a conda environment with Lightning and a GPU-enabled build of PyTorch.
  3. Choose a script to submit
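Before submitting, it can help to confirm from inside the activated environment that both packages import and that PyTorch is a CUDA build. The jumphost itself may have no GPU, so this sketch checks `torch.version.cuda` (which is `None` for CPU-only builds) rather than `torch.cuda.is_available()`. The helper names are hypothetical, and the package names assume Lightning is installed as `pytorch_lightning`.

```python
import importlib
import importlib.util
from typing import Dict, Iterable, Optional


def installed(packages: Iterable[str] = ("torch", "pytorch_lightning")) -> Dict[str, bool]:
    """Report which top-level packages can be imported in this environment."""
    # find_spec returns None for a top-level module that is not installed.
    return {name: importlib.util.find_spec(name) is not None for name in packages}


def torch_cuda_version() -> Optional[str]:
    """CUDA version PyTorch was built with, or None (CPU-only build or no torch)."""
    if importlib.util.find_spec("torch") is None:
        return None
    torch = importlib.import_module("torch")
    return torch.version.cuda


if __name__ == "__main__":
    print(installed())
    print("PyTorch CUDA build:", torch_cuda_version())
```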

DDP

Submit this job to run with DistributedDataParallel (2 nodes, 2 GPUs each). Replace YourEnv with the name of the conda environment from step 2:

sbatch submit_ddp_job.sh YourEnv
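DDP runs one process per GPU, so this job runs 4 processes in total. The sketch below lists them under the usual convention global_rank = node_rank * gpus_per_node + local_rank. We assume ranks are assigned node by node (SLURM's default block distribution), and `ddp_ranks` is a hypothetical helper for illustration.

```python
from typing import List, Tuple


def ddp_ranks(num_nodes: int, gpus_per_node: int) -> List[Tuple[int, int, int]]:
    """(node_rank, local_rank, global_rank) for each DDP process, one per GPU.

    Hypothetical helper assuming ranks are assigned node by node.
    """
    return [
        (node, local, node * gpus_per_node + local)
        for node in range(num_nodes)
        for local in range(gpus_per_node)
    ]


layout = ddp_ranks(num_nodes=2, gpus_per_node=2)
for node, local, rank in layout:
    print(f"node {node}, GPU {local} -> global rank {rank}")
print("world size:", len(layout))  # → world size: 4
```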

DDP2

Submit this job to run with DDP2, a different implementation of DistributedDataParallel: each node acts like DataParallel across its own GPUs, while gradients are synced across nodes as in DDP.

sbatch submit_ddp2_job.sh YourEnv
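The practical difference between the two scripts shows up in the process count and the effective batch size. According to the Lightning multi-GPU documentation, DDP gives an effective batch size of batch_size × gpus × num_nodes. DDP2 (like DP) gives batch_size × num_nodes, because the whole batch is visible to all GPUs on a node. Below is a minimal sketch of that bookkeeping for this 2-node, 2-GPU job. `launch_summary` is a hypothetical helper, and batch_size=32 is an arbitrary example value.

```python
from typing import Dict


def launch_summary(backend: str, num_nodes: int, gpus_per_node: int,
                   batch_size: int) -> Dict[str, int]:
    """Process count and effective batch size for ddp vs ddp2 (hypothetical helper)."""
    if backend == "ddp":
        # One process per GPU; every process loads its own batch.
        processes = num_nodes * gpus_per_node
        effective = batch_size * num_nodes * gpus_per_node
    elif backend == "ddp2":
        # One process per node; its batch is split across the node's GPUs (DP).
        processes = num_nodes
        effective = batch_size * num_nodes
    else:
        raise ValueError(f"unsupported backend: {backend!r}")
    return {"processes": processes, "effective_batch_size": effective}


for backend in ("ddp", "ddp2"):
    print(backend, launch_summary(backend, num_nodes=2, gpus_per_node=2, batch_size=32))
# → ddp {'processes': 4, 'effective_batch_size': 128}
# → ddp2 {'processes': 2, 'effective_batch_size': 64}
```

If you switch between the two scripts, the learning rate may need adjusting to the new effective batch size.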