# Examples
This folder has 3 sections:
## Basic Examples
Use these examples to test how Lightning works.
### Test on CPU
```bash
python cpu_template.py
```
### Train on a single GPU
```bash
python gpu_template.py --gpus 1
```
### DataParallel (dp)
Train on multiple GPUs using DataParallel.
```bash
python gpu_template.py --gpus 2 --distributed_backend dp
```
### DistributedDataParallel (ddp)
Train on multiple GPUs using DistributedDataParallel.
```bash
python gpu_template.py --gpus 2 --distributed_backend ddp
```
### DistributedDataParallel + DP (ddp2)
Train on multiple GPUs using DistributedDataParallel combined with DataParallel: within each node, all GPUs work on a single model as in DataParallel, and gradients are then synchronized across nodes as in DDP.
```bash
python gpu_template.py --gpus 2 --distributed_backend ddp2
```
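These scripts simply forward the `--gpus` and `--distributed_backend` flags to Lightning's `Trainer`. As a rough illustration of the programmatic equivalent, here is a minimal sketch, assuming a Lightning version from this era in which `distributed_backend` is still a `Trainer` argument (later releases replaced it); `TinyModel` is a hypothetical placeholder for the template models in this folder:

```python
import torch
import pytorch_lightning as pl
from torch.utils.data import DataLoader, TensorDataset


class TinyModel(pl.LightningModule):
    """Hypothetical stand-in for the template models in this folder."""

    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def forward(self, x):
        return self.layer(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return torch.nn.functional.cross_entropy(self(x), y)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)

    def train_dataloader(self):
        data = TensorDataset(torch.randn(64, 32), torch.randint(0, 2, (64,)))
        return DataLoader(data, batch_size=16)


# Equivalent of `--gpus 2 --distributed_backend dp` on the command line;
# swap "dp" for "ddp" or "ddp2" to select the other backends shown above.
trainer = pl.Trainer(gpus=2, distributed_backend="dp", max_epochs=1)
trainer.fit(TinyModel())
```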
### Multi-node example
This demo launches a job using 2 GPUs on each of 2 different nodes (4 GPUs total). To run this demo, do the following:
1. Log into the jump host of your SLURM-managed cluster.
2. Create a conda environment with Lightning and a GPU-enabled build of PyTorch.
3. Choose a script to submit.
#### DDP
Submit this job to run with DistributedDataParallel (2 nodes, 2 GPUs each):
```bash
sbatch ddp_job_submit.sh YourEnv
```
#### DDP2
Submit this job to run with a different implementation of DistributedDataParallel: each node behaves like DataParallel internally but synchronizes gradients across nodes like DDP.
```bash
sbatch ddp2_job_submit.sh YourEnv
```
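For the multi-node case, the sbatch scripts request the resources and Lightning's SLURM integration picks up the node rank and world size from the job environment. Below is a minimal sketch of the corresponding `Trainer` configuration, again assuming this era's `gpus`, `num_nodes`, and `distributed_backend` arguments:

```python
import pytorch_lightning as pl

# 2 nodes x 2 GPUs each = 4 GPUs total, matching the sbatch scripts above.
# When launched under SLURM, Lightning reads the job environment to set up
# the distributed process group.
trainer = pl.Trainer(
    gpus=2,                     # GPUs per node
    num_nodes=2,                # number of nodes requested in the sbatch script
    distributed_backend="ddp",  # use "ddp2" for the DataParallel-per-node variant
)
# trainer.fit(model)  # `model` would be one of the template LightningModules
```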
## Domain templates
These are templates that demonstrate common approaches such as GANs and reinforcement learning (RL).