57 lines
1.7 KiB
Markdown
57 lines
1.7 KiB
Markdown
# PYTORCH-LIGHTNING DOCUMENTATION
|
|
|
|
###### Main Docs
|
|
- [LightningModule](Pytorch-Lightning/LightningModule)
|
|
- [Trainer](Trainer/)
|
|
|
|
###### New project Quick Start
|
|
1. [Define a LightningModule](https://github.com/williamFalcon/pytorch-lightning/blob/master/examples/new_project_templates/lightning_module_template.py)
|
|
2. Pick a trainer
|
|
- [Basic CPU Trainer](https://github.com/williamFalcon/pytorch-lightning/blob/master/examples/new_project_templates/trainer_cpu_template.py)
|
|
- [GPU cluster Trainer](https://github.com/williamFalcon/pytorch-lightning/blob/master/examples/new_project_templates/trainer_gpu_cluster_template.py)
|
|
|
|
###### Quick start examples
|
|
- CPU example
|
|
- Single GPU example
|
|
- Multi-gpu example
|
|
- SLURM cluster grid search example
|
|
|
|
###### Training loop
|
|
- Accumulate gradients
|
|
- Check GPU usage
|
|
- Check which gradients are nan
|
|
- Check validation every n epochs
|
|
- Display metrics in progress bar
|
|
- Force training for min or max epochs
|
|
- Inspect gradient norms
|
|
- Hooks
|
|
- Learning rate annealing
|
|
- Make model overfit on subset of data
|
|
- Multiple optimizers (like GANs)
|
|
- Set how much of the training set to check (1-100%)
|
|
- training_step function
|
|
|
|
###### Validation loop
|
|
- Display metrics in progress bar
|
|
- hooks
|
|
- Set how much of the validation set to check (1-100%)
|
|
- Set validation check frequency within 1 training epoch (1-100%)
|
|
- validation_step function
|
|
- Why does validation run first for 5 steps?
|
|
|
|
###### Distributed training
|
|
- Single-gpu
|
|
- Multi-gpu
|
|
- Multi-node
|
|
- 16-bit mixed precision
|
|
|
|
###### Checkpointing
|
|
- Model saving
|
|
- Model loading
|
|
|
|
###### Computing cluster (SLURM)
|
|
- Automatic checkpointing
|
|
- Automatic saving, loading
|
|
- Running grid search on a cluster
|
|
- Walltime auto-resubmit
|