added docs page

2019-06-27 08:31:39 -04:00 · e2bcf1ecff
parent 1de54e598e
commit e2bcf1ecff
1 changed file with 26 additions and 17 deletions
--- a/docs/index.md
+++ b/docs/index.md
@@ -8,13 +8,36 @@
 - CPU example   
 - Single GPU example   
 - Multi-gpu example 
- SLURM cluster example      
+- SLURM cluster grid search example      
 ###### Training loop
 - Accumulate gradients
 - Check GPU usage
 - Check which gradients are nan
 - Check validation every n epochs
 - Display metrics in progress bar
 - Force training for min or max epochs
 - Inspect gradient norms
 - Hooks
 - Learning rate annealing
 - Make model overfit on subset of data
 - Multiple optimizers (like GANs)
 - Set how much of the training set to check (1-100%)
 - training_step function
 ###### Validation loop
 - Display metrics in progress bar
 - hooks
 - Set how much of the validation set to check (1-100%)
 - Set validation check frequency within 1 training epoch (1-100%)
 - validation_step function
 - Why does validation run first for 5 steps?
 ###### Distributed training
 - Single-gpu      
 - Multi-gpu      
 - Multi-node   
 - 16-bit mixed precision
 ###### Checkpointing
 - Model saving
@@ -22,20 +45,6 @@
 ###### Computing cluster (SLURM)
 - Automatic checkpointing   
- Automatic saving, loading   
+- Automatic saving, loading  
 - Running grid search on a cluster 
 - Walltime auto-resubmit   
-###### Common training use cases 
-- 16-bit mixed precision
-- Accumulate gradients
-- Check val many times during 1 training epoch
-- Check GPU usage
-- Check validation every n epochs
-- Check which gradients are nan
-- Inspect gradient norms
-- Learning rate annealing
-- Make model overfit on subset of data
-- Min, max epochs
-- Multiple optimizers (like GANs)
-- Run a sanity check of model val and tng step
-- Set how much of the tng, val, test sets to check (1-100%)