## Basic Examples

Use these examples to test how Lightning works.
### MNIST

Trains an MNIST classifier where the model is defined inside the `LightningModule`.
```bash
# cpu
python simple_image_classifier.py

# gpus (any number)
python simple_image_classifier.py --gpus 2

# dataparallel
python simple_image_classifier.py --gpus 2 --distributed_backend 'dp'
```
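For orientation, here is a condensed sketch of what "model defined inside the LightningModule" means. It is a simplified illustration in the spirit of `simple_image_classifier.py`, not a copy of it; the layer sizes and hyperparameters are illustrative:

```python
# Condensed sketch: the network, the loss, and the optimizer all live inside
# one LightningModule (illustrative, not the full simple_image_classifier.py).
import torch
from torch.nn import functional as F
import pytorch_lightning as pl


class LitClassifier(pl.LightningModule):
    def __init__(self, hidden_dim=128, learning_rate=1e-3):
        super().__init__()
        self.l1 = torch.nn.Linear(28 * 28, hidden_dim)
        self.l2 = torch.nn.Linear(hidden_dim, 10)
        self.learning_rate = learning_rate

    def forward(self, x):
        x = x.view(x.size(0), -1)           # flatten 28x28 images
        x = torch.relu(self.l1(x))
        return self.l2(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return F.cross_entropy(self(x), y)  # the loss is defined here too

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=self.learning_rate)
```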
### MNIST with DALI

The MNIST example above, implemented with NVIDIA DALI for data loading. Requires NVIDIA DALI to be installed for your CUDA version; see the NVIDIA DALI installation docs.
```bash
python dali_image_classifier.py
```
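At the time of writing, DALI ships CUDA-specific wheels from NVIDIA's package index; an install typically looks like the following (the CUDA 11.0 wheel name is only an example, so check the DALI installation guide for the one matching your setup):

```bash
# example for CUDA 11.0; substitute the wheel matching your CUDA version
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist nvidia-dali-cuda110
```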
### Image classifier

Generic image classifier with an arbitrary backbone (i.e., a simple system).
```bash
# cpu
python backbone_image_classifier.py

# gpus (any number)
python backbone_image_classifier.py --gpus 2

# dataparallel
python backbone_image_classifier.py --gpus 2 --distributed_backend 'dp'
```
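The key idea, sketched below under the same caveat (a simplified illustration, not the file itself): the `LightningModule` holds the training logic, while the architecture is injected as an ordinary `nn.Module`, so backbones can be swapped freely.

```python
# Sketch of the backbone pattern: training logic in the LightningModule,
# architecture injected from outside (illustrative, simplified).
import torch
from torch.nn import functional as F
import pytorch_lightning as pl


class LitClassifier(pl.LightningModule):
    def __init__(self, backbone, learning_rate=1e-3):
        super().__init__()
        self.backbone = backbone  # any nn.Module mapping images -> logits
        self.learning_rate = learning_rate

    def training_step(self, batch, batch_idx):
        x, y = batch
        return F.cross_entropy(self.backbone(x), y)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=self.learning_rate)


# Swapping architectures does not touch the training logic:
mlp = torch.nn.Sequential(
    torch.nn.Flatten(),
    torch.nn.Linear(28 * 28, 128),
    torch.nn.ReLU(),
    torch.nn.Linear(128, 10),
)
model = LitClassifier(mlp)
```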
### Autoencoder

Shows the power of a system: the training loop it defines can be made arbitrarily complex.
```bash
# cpu
python autoencoder.py

# gpus (any number)
python autoencoder.py --gpus 2

# dataparallel
python autoencoder.py --gpus 2 --distributed_backend 'dp'
```
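A condensed illustration of the idea (simplified relative to `autoencoder.py`): `training_step` fully defines the optimization loop, here a reconstruction objective, and nothing stops it from growing more elaborate.

```python
# Sketch: training_step defines the whole loop, here reconstruction
# (illustrative; autoencoder.py is the complete version).
import torch
from torch.nn import functional as F
import pytorch_lightning as pl


class LitAutoEncoder(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.encoder = torch.nn.Sequential(
            torch.nn.Linear(28 * 28, 64), torch.nn.ReLU(), torch.nn.Linear(64, 3)
        )
        self.decoder = torch.nn.Sequential(
            torch.nn.Linear(3, 64), torch.nn.ReLU(), torch.nn.Linear(64, 28 * 28)
        )

    def training_step(self, batch, batch_idx):
        x, _ = batch                # labels are unused
        x = x.view(x.size(0), -1)
        z = self.encoder(x)         # compress
        x_hat = self.decoder(z)     # reconstruct
        return F.mse_loss(x_hat, x) # reconstruction loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)
```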
### Multi-node example

This demo launches a job using 2 GPUs on each of 2 nodes (4 GPUs total). To run it:

- Log in to the jumphost node of your SLURM-managed cluster.
- Create a conda environment with Lightning and a GPU-enabled PyTorch build.
- Choose one of the scripts below to submit.
#### DDP

Submit this job to run with DistributedDataParallel (2 nodes, 2 GPUs each):

```bash
sbatch submit_ddp_job.sh YourEnv
```
#### DDP2

Submit this job to run with a different implementation of DistributedDataParallel: within each node it behaves like DataParallel, but it syncs across nodes like DDP.

```bash
sbatch submit_ddp2_job.sh YourEnv
```
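For context, both submit scripts ultimately come down to a handful of Trainer flags. The following is a minimal sketch of those arguments, not the contents of the scripts, and `model` is a placeholder for any LightningModule from the examples above:

```python
# Sketch of the Trainer flags behind a 2-node x 2-GPU run
# (illustrative; the submit scripts wrap this in SLURM boilerplate).
import pytorch_lightning as pl

model = ...  # placeholder: any LightningModule, e.g. from the examples above

trainer = pl.Trainer(
    gpus=2,                     # GPUs per node
    num_nodes=2,                # 2 nodes -> 4 GPUs in total
    distributed_backend="ddp",  # or "ddp2" for the DDP2 variant
)
trainer.fit(model)
```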