# Examples

This folder has 3 sections:
## Basic Examples
Use these examples to test how Lightning works.
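Each template follows the same basic Lightning pattern: define a `LightningModule` and hand it to a `Trainer`. The sketch below is illustrative only; `LitClassifier`, its layers, and the random data are hypothetical and are not the actual contents of `cpu_template.py` or `gpu_template.py`.

```python
# Illustrative sketch only -- LitClassifier and its internals are hypothetical,
# not the actual code in cpu_template.py / gpu_template.py.
import torch
from torch.nn import functional as F
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl


class LitClassifier(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def forward(self, x):
        return self.layer(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = F.cross_entropy(self(x), y)
        return {'loss': loss}

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)

    def train_dataloader(self):
        # Random data, just to keep the sketch self-contained and runnable.
        x, y = torch.randn(64, 32), torch.randint(0, 2, (64,))
        return DataLoader(TensorDataset(x, y), batch_size=8)


if __name__ == '__main__':
    trainer = pl.Trainer(max_epochs=1)  # runs on CPU by default
    trainer.fit(LitClassifier())
```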
### Test on CPU
`python cpu_template.py`

### Train on a single GPU
`python gpu_template.py --gpus 1`

### DataParallel (dp)
Train on multiple GPUs using DataParallel.

`python gpu_template.py --gpus 2 --distributed_backend dp`

### DistributedDataParallel (ddp)
Train on multiple GPUs using DistributedDataParallel.

`python gpu_template.py --gpus 2 --distributed_backend ddp`

### DistributedDataParallel + DP (ddp2)
Train on multiple GPUs using DistributedDataParallel combined with DataParallel: each node uses all of its GPUs to run a single model (like DP), then gradients are synchronized across nodes (like DDP).

`python gpu_template.py --gpus 2 --distributed_backend ddp2`
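The `--gpus` and `--distributed_backend` flags used above map directly onto `Trainer` arguments, so the same configurations can also be set in code. A minimal sketch, assuming a Lightning version whose `Trainer` still accepts `distributed_backend` (the `build_trainer` helper is purely illustrative):

```python
import pytorch_lightning as pl

# These Trainer calls mirror the CLI flags above; 'distributed_backend' is the
# argument name used by the Lightning version these examples target.
def build_trainer(backend=None, gpus=0):
    return pl.Trainer(gpus=gpus, distributed_backend=backend, max_epochs=1)

trainer_dp = build_trainer(backend='dp', gpus=2)      # DataParallel
trainer_ddp = build_trainer(backend='ddp', gpus=2)    # DistributedDataParallel
trainer_ddp2 = build_trainer(backend='ddp2', gpus=2)  # DDP across nodes, DP within a node
```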
## Multi-node example
This demo launches a job using 2 GPUs on each of 2 nodes (4 GPUs total). To run it:
- Log into the jumphost node of your SLURM-managed cluster.
- Create a conda environment with Lightning and a GPU-enabled PyTorch build.
- Choose a script to submit.
### DDP
Submit this job to run with DistributedDataParallel (2 nodes, 2 GPUs each):

`sbatch ddp_job_submit.sh YourEnv`

### DDP2
Submit this job to run with a different implementation of DistributedDataParallel. In this version, each node acts like DataParallel within itself but syncs gradients across nodes like DDP:

`sbatch ddp2_job_submit.sh YourEnv`
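Inside the submitted script, the multi-node topology is expressed through `Trainer` arguments. Below is a minimal sketch of the 2-node, 2-GPU DDP configuration, under the same version assumption as above; the actual `ddp_job_submit.sh` may set things up differently, and the SLURM directives themselves are not shown here.

```python
import pytorch_lightning as pl

# Sketch of the Trainer settings for the 2 x 2 GPU job described above.
# SLURM-specific details (partition, time limit, environment activation)
# belong in the sbatch script and are omitted.
trainer = pl.Trainer(
    gpus=2,                     # GPUs per node
    num_nodes=2,                # nodes requested from SLURM
    distributed_backend='ddp',  # use 'ddp2' for the DDP2 variant
)
```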
## Domain templates
These are templates that show common approaches to domain problems, such as GANs and reinforcement learning (RL).