added docs page
This commit is contained in:
parent
1de54e598e
commit
e2bcf1ecff
|
@ -8,13 +8,36 @@
|
||||||
- CPU example
|
- CPU example
|
||||||
- Single GPU example
|
- Single GPU example
|
||||||
- Multi-gpu example
|
- Multi-gpu example
|
||||||
- SLURM cluster example
|
- SLURM cluster grid search example
|
||||||
|
|
||||||
|
###### Training loop
|
||||||
|
- Accumulate gradients
|
||||||
|
- Check GPU usage
|
||||||
|
- Check which gradients are nan
|
||||||
|
- Check validation every n epochs
|
||||||
|
- Display metrics in progress bar
|
||||||
|
- Force training for min or max epochs
|
||||||
|
- Inspect gradient norms
|
||||||
|
- Hooks
|
||||||
|
- Learning rate annealing
|
||||||
|
- Make model overfit on subset of data
|
||||||
|
- Multiple optimizers (like GANs)
|
||||||
|
- Set how much of the training set to check (1-100%)
|
||||||
|
- training_step function
|
||||||
|
|
||||||
|
###### Validation loop
|
||||||
|
- Display metrics in progress bar
|
||||||
|
- hooks
|
||||||
|
- Set how much of the validation set to check (1-100%)
|
||||||
|
- Set validation check frequency within 1 training epoch (1-100%)
|
||||||
|
- validation_step function
|
||||||
|
- Why does validation run first for 5 steps?
|
||||||
|
|
||||||
###### Distributed training
|
###### Distributed training
|
||||||
- Single-gpu
|
- Single-gpu
|
||||||
- Multi-gpu
|
- Multi-gpu
|
||||||
- Multi-node
|
- Multi-node
|
||||||
|
- 16-bit mixed precision
|
||||||
|
|
||||||
###### Checkpointing
|
###### Checkpointing
|
||||||
- Model saving
|
- Model saving
|
||||||
|
@ -22,20 +45,6 @@
|
||||||
|
|
||||||
###### Computing cluster (SLURM)
|
###### Computing cluster (SLURM)
|
||||||
- Automatic checkpointing
|
- Automatic checkpointing
|
||||||
- Automatic saving, loading
|
- Automatic saving, loading
|
||||||
|
- Running grid search on a cluster
|
||||||
- Walltime auto-resubmit
|
- Walltime auto-resubmit
|
||||||
|
|
||||||
###### Common training use cases
|
|
||||||
- 16-bit mixed precision
|
|
||||||
- Accumulate gradients
|
|
||||||
- Check val many times during 1 training epoch
|
|
||||||
- Check GPU usage
|
|
||||||
- Check validation every n epochs
|
|
||||||
- Check which gradients are nan
|
|
||||||
- Inspect gradient norms
|
|
||||||
- Learning rate annealing
|
|
||||||
- Make model overfit on subset of data
|
|
||||||
- Min, max epochs
|
|
||||||
- Multiple optimizers (like GANs)
|
|
||||||
- Run a sanity check of model val and tng step
|
|
||||||
- Set how much of the tng, val, test sets to check (1-100%)
|
|
||||||
|
|
Loading…
Reference in New Issue