added docs page

William Falcon 2019-06-27 08:31:39 -04:00
parent 1de54e598e
commit e2bcf1ecff
1 changed file with 26 additions and 17 deletions


@@ -8,13 +8,36 @@
 - CPU example
 - Single GPU example
 - Multi-gpu example
-- SLURM cluster example
+- SLURM cluster grid search example
+###### Training loop
+- Accumulate gradients
+- Check GPU usage
+- Check which gradients are nan
+- Check validation every n epochs
+- Display metrics in progress bar
+- Force training for min or max epochs
+- Inspect gradient norms
+- Hooks
+- Learning rate annealing
+- Make model overfit on subset of data
+- Multiple optimizers (like GANs)
+- Set how much of the training set to check (1-100%)
+- training_step function
+###### Validation loop
+- Display metrics in progress bar
+- hooks
+- Set how much of the validation set to check (1-100%)
+- Set validation check frequency within 1 training epoch (1-100%)
+- validation_step function
+- Why does validation run first for 5 steps?
 ###### Distributed training
 - Single-gpu
 - Multi-gpu
 - Multi-node
+- 16-bit mixed precision
 ###### Checkpointing
 - Model saving
@@ -22,20 +45,6 @@
 ###### Computing cluster (SLURM)
 - Automatic checkpointing
 - Automatic saving, loading
+- Running grid search on a cluster
 - Walltime auto-resubmit
-###### Common training use cases
-- 16-bit mixed precision
-- Accumulate gradients
-- Check val many times during 1 training epoch
-- Check GPU usage
-- Check validation every n epochs
-- Check which gradients are nan
-- Inspect gradient norms
-- Learning rate annealing
-- Make model overfit on subset of data
-- Min, max epochs
-- Multiple optimizers (like GANs)
-- Run a sanity check of model val and tng step
-- Set how much of the tng, val, test sets to check (1-100%)
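The docs index in this commit only names "Accumulate gradients" as a topic. This is a rough, framework-agnostic sketch of the idea, not the library's own implementation. `accumulate_and_step` and its parameters are hypothetical, and a single scalar weight stands in for a model. The sketch averages gradients over several batches and takes one optimizer step per window. With a mean-reduced loss and equal-sized batches, that averaged gradient equals the gradient of one batch `accumulate_every` times larger, which is why accumulation is used to simulate larger batches when GPU memory is limited.

```python
from typing import List, Tuple


def accumulate_and_step(batch_grads: List[float], accumulate_every: int,
                        lr: float, weight: float) -> Tuple[float, int]:
    """Hypothetical sketch of gradient accumulation for one scalar weight.

    Each entry of `batch_grads` stands in for the gradient of one batch.
    Instead of updating after every batch, gradients are averaged over
    `accumulate_every` batches and a single SGD step is taken per window.
    """
    if accumulate_every < 1:
        raise ValueError("accumulate_every must be >= 1")
    grad_buffer = 0.0
    steps = 0
    for i, g in enumerate(batch_grads, start=1):
        grad_buffer += g / accumulate_every  # running average over the window
        if i % accumulate_every == 0:
            weight -= lr * grad_buffer  # one optimizer step per window
            grad_buffer = 0.0           # plays the role of zero_grad()
            steps += 1
    # Gradients from a trailing partial window are discarded in this sketch.
    return weight, steps


final_weight, num_steps = accumulate_and_step(
    [1.0, 3.0, 2.0, 2.0], accumulate_every=2, lr=0.5, weight=10.0)
print(final_weight, num_steps)  # → 8.0 2
```

Dropping a trailing partial window keeps the sketch short. A real training loop might instead flush those gradients at the end of the epoch.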
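Two entries in the index, "Check validation every n epochs" and "Set validation check frequency within 1 training epoch (1-100%)", both control when validation runs. The following is a minimal sketch. It assumes the within-epoch frequency is given as a fraction of the batches in one epoch. `validation_schedule` and its parameter names are hypothetical and are not the library's arguments.

```python
from typing import List, Tuple


def validation_schedule(num_epochs: int, batches_per_epoch: int,
                        val_check_fraction: float = 1.0,
                        every_n_epochs: int = 1) -> List[Tuple[int, int]]:
    """Hypothetical sketch: return (epoch, batch_idx) pairs after which a
    validation pass would run.

    val_check_fraction: fraction of an epoch between validation runs
                        (1.0 = once at the end of the epoch).
    every_n_epochs:     only validate on every n-th epoch.
    """
    if not 0.0 < val_check_fraction <= 1.0:
        raise ValueError("val_check_fraction must be in (0, 1]")
    if every_n_epochs < 1:
        raise ValueError("every_n_epochs must be >= 1")
    # Number of training batches between two validation runs.
    interval = max(1, int(batches_per_epoch * val_check_fraction))
    schedule = []
    for epoch in range(num_epochs):
        if (epoch + 1) % every_n_epochs != 0:
            continue  # this epoch is skipped entirely
        for batch_idx in range(batches_per_epoch):
            if (batch_idx + 1) % interval == 0:
                schedule.append((epoch, batch_idx))
    return schedule


# Validate twice per epoch (every 4 of 8 batches) for one epoch.
print(validation_schedule(num_epochs=1, batches_per_epoch=8, val_check_fraction=0.5))
# → [(0, 3), (0, 7)]
```

If the batch count is not a multiple of the interval, the last validation in an epoch happens before the final batch. A fuller version might also force a run at the end of each epoch.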
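For "Check which gradients are nan" and "Inspect gradient norms", here is a minimal sketch over plain lists of floats. The `grad_report` helper is hypothetical, and a real model would read the gradients from its parameters. NaN gradients are reported by parameter name rather than folded into the norm, because a single NaN makes any norm NaN. The total norm uses the identity that the p-norm of all gradients equals the p-norm of the per-parameter p-norms.

```python
import math
from typing import Dict, List


def grad_report(grads: Dict[str, List[float]],
                norm_type: float = 2.0) -> Dict[str, object]:
    """Hypothetical helper: per-parameter p-norms, their combined total norm,
    and the names of parameters whose gradient contains NaN."""
    norms: Dict[str, float] = {}
    nan_params: List[str] = []
    for name, values in grads.items():
        if any(math.isnan(v) for v in values):
            nan_params.append(name)
            continue  # a NaN would poison the norm, so report it separately
        norms[name] = sum(abs(v) ** norm_type for v in values) ** (1.0 / norm_type)
    # Total p-norm over all finite parameter gradients.
    total = sum(n ** norm_type for n in norms.values()) ** (1.0 / norm_type)
    return {"norms": norms, "total_norm": total, "nan_params": nan_params}


report = grad_report({"w": [3.0, 4.0], "b": [float("nan")]})
print(report["norms"], report["total_norm"], report["nan_params"])
```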
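The index lists "Learning rate annealing" but does not say which schedule the page covers. Step decay is used here purely as one common example. The rate is multiplied by `gamma` once every `step_size` epochs, so that lr(e) = base_lr · gamma^⌊e / step_size⌋. The function name is hypothetical.

```python
def step_decay_lr(base_lr: float, epoch: int, step_size: int, gamma: float) -> float:
    """Hypothetical step-decay schedule: multiply the base rate by `gamma`
    once for every `step_size` completed epochs."""
    if step_size <= 0:
        raise ValueError("step_size must be positive")
    return base_lr * gamma ** (epoch // step_size)


# Example: start at 0.1 and halve the rate every 10 epochs.
for epoch in (0, 9, 10, 25):
    print(epoch, step_decay_lr(0.1, epoch, step_size=10, gamma=0.5))
```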