PYTORCH-LIGHTNING DOCUMENTATION
Main Docs
###### New project quick start
- Define a LightningModule
- Pick a trainer
###### Quick start examples
- CPU example
- Single GPU example
- Multi-GPU example
- SLURM cluster grid search example
###### Checkpointing
- Model saving
- Model loading
###### Computing cluster (SLURM)
- Automatic checkpointing
- Automatic saving, loading
- Running grid search on a cluster
- Walltime auto-resubmit
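To illustrate how a grid search can be split across cluster jobs, here is a minimal, library-independent sketch. It is not PyTorch-Lightning's implementation: `build_grid`, `trials_for_task`, and the search space are hypothetical. It assumes each SLURM job-array task reads its index from the `SLURM_ARRAY_TASK_ID` environment variable (defaulting to 0 when run locally) and that the number of tasks is known up front.

```python
import itertools
import os


def build_grid(search_space):
    """Expand {name: [values, ...]} into one dict per hyperparameter combination."""
    names = sorted(search_space)  # fixed order so every task builds the same grid
    return [
        dict(zip(names, combo))
        for combo in itertools.product(*(search_space[n] for n in names))
    ]


def trials_for_task(grid, task_id, num_tasks):
    """Round-robin split: task k runs trials k, k + num_tasks, k + 2 * num_tasks, ..."""
    return grid[task_id::num_tasks]


# Hypothetical search space; on a SLURM job array each task picks its own share.
search_space = {"learning_rate": [1e-3, 1e-4], "batch_size": [32, 64, 128]}
grid = build_grid(search_space)
task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", "0"))
for trial in trials_for_task(grid, task_id, num_tasks=4):
    print(f"task {task_id} would train with {trial}")
```

Sorting the parameter names keeps the grid order deterministic, so independently launched tasks agree on which trial index maps to which configuration without any coordination.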
###### Debugging
- Fast dev run
- Inspect gradient norms
- Log GPU usage
- Make the model overfit on a subset of the data
- Print the parameter count by layer
- Print which gradients are NaN
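Two of the debugging checks above, inspecting gradient norms and finding NaN gradients, reduce to a simple per-parameter scan. The sketch below is plain Python and is not PyTorch-Lightning's code: gradients are modeled as a hypothetical dict of parameter name to flat list of floats, with `None` for parameters that received no gradient.

```python
import math


def grad_norm(grad, p=2.0):
    """p-norm of a flat gradient vector (p=2 gives the usual L2 norm)."""
    return sum(abs(g) ** p for g in grad) ** (1.0 / p)


def inspect_gradients(named_grads):
    """Return ({name: L2 norm}, [names whose gradient contains a NaN])."""
    norms = {}
    nan_params = []
    for name, grad in named_grads.items():
        if grad is None:  # parameter received no gradient this step
            continue
        if any(math.isnan(g) for g in grad):
            nan_params.append(name)
        norms[name] = grad_norm(grad)  # a NaN anywhere makes the norm NaN too
    return norms, nan_params


grads = {
    "layer1.weight": [3.0, 4.0],
    "layer1.bias": [float("nan"), 1.0],
    "layer2.weight": None,  # e.g. a frozen parameter
}
norms, nan_params = inspect_gradients(grads)
print(norms)
print("parameters with NaN gradients:", nan_params)
```

In PyTorch itself, the equivalent loop would iterate `model.named_parameters()` and test `torch.isnan(p.grad).any()` for each parameter whose `grad` is not `None`.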
###### Distributed training
###### Experiment logging
- Display metrics in progress bar
- Log arbitrary metrics
- Log metric row every k batches
- Process position
- Save a snapshot of all hyperparameters
- Snapshot code for a training run
- Write log files to CSV every k batches
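The difference between logging a metric row every k batches and writing the log file every k batches can be shown with a small buffered CSV logger. This is a sketch using only the standard `csv` module, not PyTorch-Lightning's logger; `CsvMetricLogger`, `flush_every`, and the every-3-batches interval are hypothetical choices for illustration.

```python
import csv
import io


class CsvMetricLogger:
    """Buffer metric rows in memory; write them to CSV every `flush_every` rows."""

    def __init__(self, stream, fieldnames, flush_every):
        self.stream = stream
        self.writer = csv.DictWriter(stream, fieldnames=fieldnames)
        self.writer.writeheader()
        self.flush_every = flush_every
        self.buffer = []

    def log(self, row):
        self.buffer.append(row)
        if len(self.buffer) >= self.flush_every:
            self.flush()

    def flush(self):
        self.writer.writerows(self.buffer)  # no-op when the buffer is empty
        self.buffer.clear()
        self.stream.flush()


# For a real file, open it with open(path, "w", newline="") as the csv docs advise.
stream = io.StringIO()
logger = CsvMetricLogger(stream, ["batch", "loss"], flush_every=2)
for batch_idx in range(10):
    loss = 1.0 / (batch_idx + 1)  # stand-in for a real training loss
    if batch_idx % 3 == 0:  # log a metric row every 3 batches
        logger.log({"batch": batch_idx, "loss": round(loss, 4)})
logger.flush()  # write any rows still buffered at the end of training
print(stream.getvalue())
```

Separating the two intervals keeps per-batch overhead low: rows are cheap to collect in memory, while disk writes happen only once per `flush_every` rows.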
###### Training loop
- Accumulate gradients
- Anneal learning rate
- Force training for min or max epochs
- Force-disable early stopping
- Use multiple optimizers (like GANs)
- Set how much of the training set to check (1-100%)
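Gradient accumulation, the first training-loop item above, can be sketched without any framework. The example below is a hypothetical stand-in, not PyTorch-Lightning's implementation: it fits the one-parameter model `y = w * x` with mean-squared error and takes one optimizer step per `accumulate_every` batches instead of one per batch. Dividing each batch gradient by `accumulate_every` makes the step equal to one taken on the average gradient over those batches, which, for equal-sized batches, matches a single batch that many times larger.

```python
def train_with_accumulation(batches, lr, accumulate_every, w=0.0):
    """Fit y = w * x by gradient descent, stepping once per `accumulate_every` batches.

    Trailing batches that do not fill a complete group are dropped in this sketch.
    """
    grad_sum = 0.0
    steps = 0
    for i, batch in enumerate(batches, start=1):
        # d/dw of the batch's mean squared error (w * x - y) ** 2
        grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
        # scale so the accumulated value is the mean over the group of batches
        grad_sum += grad / accumulate_every
        if i % accumulate_every == 0:
            w -= lr * grad_sum  # one optimizer step for the whole group
            grad_sum = 0.0
            steps += 1
    return w, steps


# Eight identical batches drawn from y = 3x; accumulate over 4 batches -> 2 steps.
batches = [[(1.0, 3.0), (2.0, 6.0)]] * 8
w, steps = train_with_accumulation(batches, lr=0.05, accumulate_every=4)
print(f"w={w:.4f} after {steps} optimizer steps")
```

Accumulation trades update frequency for a larger effective batch size without holding more than one batch in memory at a time.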
###### Validation loop