lightning/pl_examples
Adrian Wälchli 6ad27187f3
Finish PR #2432: Imagenet example updates + basic testing (#2889)
* fix imagenet example: lr_scheduler, loader workers, batch size when ddp

* Fix evaluation for imagenet example

* add imagenet example test

* cleanup

* gpu

* add imagenet example evaluation test

* fix test output

* test is fixed in master, remove unnecessary hack

* CHANGE

* Apply suggestions from code review

* image net example

* update imagenet example

* update example

* pep

* imports

* type hint

* docs

* obsolete arg

* [wip] fix imagenet example: lr_scheduler, loader workers, batch size when ddp (#2432)

Co-authored-by: Jirka <jirka@pytorchlightning.ai>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* update chlog

* add missing chlog

* pep

* pep

Co-authored-by: Ruotian Luo <rluo@ttic.edu>
Co-authored-by: Jirka <jirka@pytorchlightning.ai>
2020-08-09 06:02:07 -04:00
basic_examples clean imports (#2867) 2020-08-08 00:33:51 +02:00
domain_templates Finish PR #2432: Imagenet example updates + basic testing (#2889) 2020-08-09 06:02:07 -04:00
models clean imports (#2867) 2020-08-08 00:33:51 +02:00
README.md simplify examples structure (#1247) 2020-04-03 17:57:34 -04:00
__init__.py simplify examples structure (#1247) 2020-04-03 17:57:34 -04:00
test_examples.py Finish PR #2432: Imagenet example updates + basic testing (#2889) 2020-08-09 06:02:07 -04:00

README.md

Examples

This folder has 3 sections:

Basic Examples

Use these examples to test how Lightning works.

Test on CPU

python cpu_template.py
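
Each of these templates boils down to the same pattern: a LightningModule plus a Trainer. A minimal sketch of that pattern (the module, data, and hyperparameters below are illustrative stand-ins, not the template's actual code):

import torch
from torch.nn import functional as F
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl

class LitClassifier(pl.LightningModule):
    # toy model standing in for the template's real network
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def forward(self, x):
        return self.layer(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return F.cross_entropy(self(x), y)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)

if __name__ == '__main__':
    # random stand-in data; the real templates download MNIST
    data = TensorDataset(torch.randn(64, 32), torch.randint(0, 2, (64,)))
    trainer = pl.Trainer(max_epochs=1)
    trainer.fit(LitClassifier(), DataLoader(data, batch_size=16))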

Train on a single GPU

python gpu_template.py --gpus 1

DataParallel (dp)

Train on multiple GPUs using DataParallel.

python gpu_template.py --gpus 2 --distributed_backend dp

DistributedDataParallel (ddp)

Train on multiple GPUs using DistributedDataParallel.

python gpu_template.py --gpus 2 --distributed_backend ddp

DistributedDataParallel+DP (ddp2)

Train on multiple GPUs using DistributedDataParallel + DataParallel. On a single node, all GPUs work together on one model (as in DataParallel); gradient information is then shared across nodes (as in DDP).

python gpu_template.py --gpus 2 --distributed_backend ddp2
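
All of these commands drive the same script; the template simply forwards the parsed CLI flags to the Trainer. A rough sketch of the equivalent construction (assuming the flags map onto the Trainer arguments of the same name, as they do in this version of the API):

from pytorch_lightning import Trainer

# equivalent of: python gpu_template.py --gpus 2 --distributed_backend ddp
# swap 'ddp' for 'dp' or 'ddp2' to match the other commands above
trainer = Trainer(gpus=2, distributed_backend='ddp')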

Multi-node example

This demo launches a job using 2 GPUs on each of 2 nodes (4 GPUs total). To run it, do the following:

  1. Log into the jumphost node of your SLURM-managed cluster.
  2. Create a conda environment with Lightning and a GPU build of PyTorch.
  3. Choose a script to submit.

DDP

Submit this job to run with DistributedDataParallel (2 nodes, 2 GPUs each).

sbatch ddp_job_submit.sh YourEnv

DDP2

Submit this job to run with a different implementation of DistributedDataParallel. In this version, each node acts like DataParallel internally but syncs gradients across nodes like DDP.

sbatch ddp2_job_submit.sh YourEnv
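
Whichever script you submit, the job it launches boils down to a Trainer configured for multiple nodes; a sketch of that configuration (again assuming this version's Trainer arguments):

from pytorch_lightning import Trainer

# 2 nodes x 2 GPUs = 4 processes; under SLURM, Lightning reads the node
# rank and world size from the scheduler's environment variables
trainer = Trainer(gpus=2, num_nodes=2, distributed_backend='ddp')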

Domain templates

These are templates to show common approaches such as GANs and RL.