From e2bcf1ecff8ac64210cbbd0a4132270b5c67ab6d Mon Sep 17 00:00:00 2001
From: William Falcon
Date: Thu, 27 Jun 2019 08:31:39 -0400
Subject: [PATCH] added docs page

---
 docs/index.md | 43 ++++++++++++++++++++++++++-----------------
 1 file changed, 26 insertions(+), 17 deletions(-)

diff --git a/docs/index.md b/docs/index.md
index 721bd04201..4867166782 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -8,13 +8,36 @@
 - CPU example
 - Single GPU example
 - Multi-gpu example
-- SLURM cluster example
+- SLURM cluster grid search example
+###### Training loop
+- Accumulate gradients
+- Check GPU usage
+- Check which gradients are nan
+- Check validation every n epochs
+- Display metrics in progress bar
+- Force training for min or max epochs
+- Inspect gradient norms
+- Hooks
+- Learning rate annealing
+- Make model overfit on subset of data
+- Multiple optimizers (like GANs)
+- Set how much of the training set to check (1-100%)
+- training_step function
+
+###### Validation loop
+- Display metrics in progress bar
+- hooks
+- Set how much of the validation set to check (1-100%)
+- Set validation check frequency within 1 training epoch (1-100%)
+- validation_step function
+- Why does validation run first for 5 steps?
 
 
 ###### Distributed training
 - Single-gpu
 - Multi-gpu
 - Multi-node
+- 16-bit mixed precision
 
 ###### Checkpointing
 - Model saving
@@ -22,20 +45,6 @@
 
 ###### Computing cluster (SLURM)
 - Automatic checkpointing
-- Automatic saving, loading
+- Automatic saving, loading
+- Running grid search on a cluster
 - Walltime auto-resubmit
-
-###### Common training use cases
-- 16-bit mixed precision
-- Accumulate gradients
-- Check val many times during 1 training epoch
-- Check GPU usage
-- Check validation every n epochs
-- Check which gradients are nan
-- Inspect gradient norms
-- Learning rate annealing
-- Make model overfit on subset of data
-- Min, max epochs
-- Multiple optimizers (like GANs)
-- Run a sanity check of model val and tng step
-- Set how much of the tng, val, test sets to check (1-100%)