added docs page
parent 1de54e598e
commit e2bcf1ecff
@@ -8,13 +8,36 @@
- CPU example
- Single GPU example
- Multi-gpu example
- SLURM cluster example
- SLURM cluster grid search example

###### Training loop
- Accumulate gradients
- Check GPU usage
- Check which gradients are nan
- Check validation every n epochs
- Display metrics in progress bar
- Force training for min or max epochs
- Inspect gradient norms
- Hooks
- Learning rate annealing
- Make model overfit on subset of data
- Multiple optimizers (like GANs)
- Set how much of the training set to check (1-100%)
- training_step function

###### Validation loop
- Display metrics in progress bar
- hooks
- Set how much of the validation set to check (1-100%)
- Set validation check frequency within 1 training epoch (1-100%)
- validation_step function
- Why does validation run first for 5 steps?

###### Distributed training
- Single-gpu
- Multi-gpu
- Multi-node
- 16-bit mixed precision

###### Checkpointing
- Model saving
@@ -22,20 +45,6 @@
###### Computing cluster (SLURM)
- Automatic checkpointing
- Automatic saving, loading
- Automatic saving, loading
- Running grid search on a cluster
- Walltime auto-resubmit

###### Common training use cases
- 16-bit mixed precision
- Accumulate gradients
- Check val many times during 1 training epoch
- Check GPU usage
- Check validation every n epochs
- Check which gradients are nan
- Inspect gradient norms
- Learning rate annealing
- Make model overfit on subset of data
- Min, max epochs
- Multiple optimizers (like GANs)
- Run a sanity check of model val and tng step
- Set how much of the tng, val, test sets to check (1-100%)