# Basic Examples

Use these examples to see how Lightning works.

## MNIST

Trains MNIST where the model is defined inside the LightningModule.

```bash
# cpu
python mnist.py

# gpus (any number)
python mnist.py --gpus 2

# dataparallel
python mnist.py --gpus 2 --distributed_backend 'dp'
```

## MNIST with DALI

The MNIST example above, using NVIDIA DALI. Requires NVIDIA DALI built for your CUDA version; see the DALI installation documentation.
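DALI is distributed through NVIDIA's package index rather than PyPI; installation looks roughly like the following (the exact package name depends on your CUDA version, so treat this line as illustrative):

```shell
# example for CUDA 11.0; pick the package matching your CUDA toolkit
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist nvidia-dali-cuda110
```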

```bash
python mnist_dali.py
```

## Image classifier

Generic image classifier with an arbitrary backbone (i.e., a simple system).

```bash
# cpu
python image_classifier.py

# gpus (any number)
python image_classifier.py --gpus 2

# dataparallel
python image_classifier.py --gpus 2 --distributed_backend 'dp'
```

## Autoencoder

Demonstrates the power of a system: the training loop can be arbitrarily complex.

```bash
# cpu
python autoencoder.py

# gpus (any number)
python autoencoder.py --gpus 2

# dataparallel
python autoencoder.py --gpus 2 --distributed_backend 'dp'
```

## Multi-node example

This demo launches a job using 2 GPUs on each of 2 nodes (4 GPUs total). To run this demo, do the following:

  1. Log into the jumphost node of your SLURM-managed cluster.
  2. Create a conda environment with Lightning and a GPU-enabled build of PyTorch.
  3. Choose a script to submit.
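The submit scripts follow the usual SLURM pattern: request the resources, activate the environment, and launch the training script with `srun`. A rough sketch (the `SBATCH` values and training flags below are illustrative, not the exact contents of the provided scripts):

```shell
#!/bin/bash -l
# illustrative SLURM submit script for 2 nodes x 2 GPUs
#SBATCH --nodes=2             # 2 nodes
#SBATCH --gres=gpu:2          # 2 GPUs per node
#SBATCH --ntasks-per-node=2   # one task per GPU

source activate $1            # conda env name is passed as the first argument
srun python mnist.py --gpus 2 --num_nodes 2 --accelerator ddp
```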

### DDP

Submit this job to run with DistributedDataParallel (2 nodes, 2 GPUs each):

```bash
sbatch submit_ddp_job.sh YourEnv
```

### DDP2

Submit this job to run with a different implementation of DistributedDataParallel. In this version, each node acts like DataParallel internally, but nodes sync gradients with one another as in DDP:

```bash
sbatch submit_ddp2_job.sh YourEnv
```
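Under the hood, the two submit scripts differ mainly in which distributed mode the training script passes to the `Trainer`. Sketched as the launch lines (illustrative flags, not the exact script contents):

```shell
# DDP: one process per GPU; all 4 processes sync gradients with each other
srun python mnist.py --gpus 2 --num_nodes 2 --accelerator ddp

# DDP2: one process per node; DataParallel within a node, DDP-style sync across nodes
srun python mnist.py --gpus 2 --num_nodes 2 --accelerator ddp2
```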