added val loop options

William Falcon 2019-06-27 13:47:15 -04:00
parent db29488847
commit c636193c44
2 changed files with 19 additions and 66 deletions


@@ -22,38 +22,6 @@ trainer = Trainer(lr_scheduler_milestones=None)
trainer = Trainer(lr_scheduler_milestones=[100, 200, 300])
```
---
#### Check GPU usage
Lightning automatically logs GPU usage to the Test Tube logs. It only does so at the metric logging interval, so it doesn't slow down training.
---
#### Check which gradients are nan
This option prints a list of tensors with NaN gradients.
``` {.python}
# DEFAULT
trainer = Trainer(print_nan_grads=False)
```
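To turn the check on, set the flag to `True`:
``` {.python}
# print any tensors whose gradients are NaN
trainer = Trainer(print_nan_grads=True)
```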
---
#### Display metrics in progress bar
``` {.python}
# DEFAULT
trainer = Trainer(progress_bar=True)
```
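Setting the flag to `False` should hide the progress bar (and the metrics displayed in it):
``` {.python}
# turn the progress bar off
trainer = Trainer(progress_bar=False)
```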
---
#### Display the parameter count by layer
By default Lightning prints a list of parameters *and submodules* when it starts training.
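As a rough sketch of what such a summary involves (illustrative plain PyTorch only, not Lightning's actual implementation; `model` is assumed to be your LightningModule):
``` {.python}
# illustrative only -- not Lightning's internal code
for name, module in model.named_children():
    n_params = sum(p.numel() for p in module.parameters())
    print(f'{name}: {n_params:,} parameters')
```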
---
#### Fast dev run
This flag is meant for debugging a full train/val/test loop. It activates callbacks and everything else, but runs only 1 training batch and 1 validation batch.
Use this to quickly debug a full run of your program.
``` {.python}
# DEFAULT
trainer = Trainer(fast_dev_run=False)
```
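To run the quick debug pass, enable the flag:
``` {.python}
# run 1 training batch and 1 validation batch, then stop
trainer = Trainer(fast_dev_run=True)
```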
---
#### Force training for min or max epochs
It can be useful to force training for a minimum number of epochs or to cap it at a maximum number of epochs.
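The code example for this option is cut off by the diff; as a sketch, assuming the Trainer arguments of this era were named `min_nb_epochs` and `max_nb_epochs` (verify against the Trainer signature):
``` {.python}
# assumed argument names -- check the Trainer signature
# train for at least 1 epoch, and at most 1000
trainer = Trainer(min_nb_epochs=1, max_nb_epochs=1000)
```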
@@ -82,30 +50,6 @@ trainer = Trainer(track_grad_norm=2)
```
---
#### Make model overfit on subset of data
A useful debugging trick is to make your model overfit a tiny fraction of the data.
``` {.python}
# DEFAULT don't overfit (ie: normal training)
trainer = Trainer(overfit_pct=0.0)
# overfit on 1% of data
trainer = Trainer(overfit_pct=0.01)
```
---
#### Process position
When running multiple models on the same machine, you need to decide where each model's progress bar appears.
Lightning stacks the progress bars according to this value.
``` {.python}
# DEFAULT
trainer = Trainer(process_position=0)
# if this is the second model on the node, show the second progress bar below
trainer = Trainer(process_position=1)
```
---
#### Set how much of the training set to check
If you don't want to check 100% of the training set (for debugging, or because it's huge), set this flag.
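The example here is also truncated by the diff; a sketch, assuming the flag is named `train_percent_check` (verify against the Trainer signature):
``` {.python}
# assumed flag name -- check the Trainer signature
# run through only 10% of the training set each epoch
trainer = Trainer(train_percent_check=0.1)
```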


@@ -20,17 +20,9 @@ But of course the fun is in all the advanced things it can do:
- [Accumulate gradients](Training%20Loop/#accumulated-gradients)
- [Anneal Learning rate](Training%20Loop/#anneal-learning-rate)
- [Check GPU usage](Training%20Loop/#check-gpu-usage)
- [Check which gradients are nan](Training%20Loop/#check-which-gradients-are-nan)
- [Display metrics in progress bar](Training%20Loop/#display-metrics-in-progress-bar)
- [Display the parameter count by layer](Training%20Loop/#display-the-parameter-count-by-layer)
- [Fast dev run](Training%20Loop/#fast-dev-run)
- [Force training for min or max epochs](Training%20Loop/#force-training-for-min-or-max-epochs)
- [Force disable early stop](Training%20Loop/#force-disable-early-stop)
- [Inspect gradient norms](Training%20Loop/#inspect-gradient-norms)
- [Make model overfit on subset of data](Training%20Loop/#make-model-overfit-on-subset-of-data)
- [Use multiple optimizers (like GANs)](../Pytorch-lightning/LightningModule/#configure_optimizers)
- [Process position](Training%20Loop/#process-position)
- [Set how much of the training set to check (1-100%)](Training%20Loop/#set-how-much-of-the-training-set-to-check)
**Validation loop**
@@ -40,14 +32,31 @@ But of course the fun is in all the advanced things it can do:
- [Set how much of the test set to check](Validation%20Loop/#set-how-much-of-the-test-set-to-check)
- [Set validation check frequency within 1 training epoch](Validation%20Loop/#set-validation-check-frequency-within-1-training-epoch)
- [Set the number of validation sanity steps](Validation%20Loop/#set-the-number-of-validation-sanity-steps)
- [Check validation every n epochs](Validation%20Loop/#check-validation-every-n-epochs)
**Debugging**
- [Fast dev run](Debugging/#fast-dev-run)
- [Inspect gradient norms](Debugging/#inspect-gradient-norms)
- [Log GPU usage](Debugging/#log-gpu-usage)
- [Make model overfit on subset of data](Debugging/#make-model-overfit-on-subset-of-data)
- [Print the parameter count by layer](Debugging/#print-the-parameter-count-by-layer)
- [Print which gradients are NaN](Debugging/#print-which-gradients-are-nan)
**Experiment Logging**
- [Display metrics in progress bar](Logging/#display-metrics-in-progress-bar)
- Log arbitrary metrics
- [Process position](Logging/#process-position)
- Save a snapshot of all hyperparameters
- Save a snapshot of the code for a particular model run
**Distributed training**
- 16-bit mixed precision
- Single-gpu
- Multi-gpu
- Multi-node
**Checkpointing**