From c636193c447332d7e316e1d6db62633be48d84a0 Mon Sep 17 00:00:00 2001
From: William Falcon
Date: Thu, 27 Jun 2019 13:47:15 -0400
Subject: [PATCH] added val loop options

---
 docs/Trainer/Training Loop.md | 56 -----------------------------------
 docs/Trainer/index.md         | 29 +++++++++++-------
 2 files changed, 19 insertions(+), 66 deletions(-)

diff --git a/docs/Trainer/Training Loop.md b/docs/Trainer/Training Loop.md
index 284f24cea3..64cb944206 100644
--- a/docs/Trainer/Training Loop.md
+++ b/docs/Trainer/Training Loop.md
@@ -22,38 +22,6 @@ trainer = Trainer(lr_scheduler_milestones=None)
 trainer = Trainer(lr_scheduler_milestones=[100, 200, 300])
 ```
 
----
-#### Check GPU usage
-Lightning automatically logs gpu usage to the test tube logs. It'll only do it at the metric logging interval, so it doesn't slow down training.
-
----
-#### Check which gradients are nan
-This option prints a list of tensors with nan gradients.
-``` {.python}
-# DEFAULT
-trainer = Trainer(print_nan_grads=False)
-```
-
----
-#### Display metrics in progress bar
-``` {.python}
-# DEFAULT
-trainer = Trainer(progress_bar=True)
-```
-
----
-#### Display the parameter count by layer
-By default lightning prints a list of parameters *and submodules* when it starts training.
-
----
-#### Fast dev run
-This flag is meant for debugging a full train/val/test loop. It'll activate callbacks, everything but only with 1 training and 1 validation batch.
-Use this to debug a full run of your program quickly
-``` {.python}
-# DEFAULT
-trainer = Trainer(fast_dev_run=False)
-```
-
 ---
 #### Force training for min or max epochs
 It can be useful to force training for a minimum number of epochs or limit to a max number
@@ -82,30 +50,6 @@ trainer = Trainer(track_grad_norm=2)
 ```
 
----
-#### Make model overfit on subset of data
-A useful debugging trick is to make your model overfit a tiny fraction of the data.
-``` {.python}
-# DEFAULT don't overfit (ie: normal training)
-trainer = Trainer(overfit_pct=0.0)
-
-# overfit on 1% of data
-trainer = Trainer(overfit_pct=0.01)
-```
-
----
-#### Process position
-When running multiple models on the same machine we want to decide which progress bar to use.
-Lightning will stack progress bars according to this value.
-``` {.python}
-# DEFAULT
-trainer = Trainer(process_position=0)
-
-# if this is the second model on the node, show the second progress bar below
-trainer = Trainer(process_position=1)
-```
-
-
 ---
 #### Set how much of the training set to check
 If you don't want to check 100% of the training set (for debugging or if it's huge), set this flag
 
diff --git a/docs/Trainer/index.md b/docs/Trainer/index.md
index 1acd81e1e3..ff5a96c01a 100644
--- a/docs/Trainer/index.md
+++ b/docs/Trainer/index.md
@@ -20,17 +20,9 @@ But of course the fun is in all the advanced things it can do:
 
 - [Accumulate gradients](Training%20Loop/#accumulated-gradients)
 - [Anneal Learning rate](Training%20Loop/#anneal-learning-rate)
-- [Check GPU usage](Training%20Loop/#Check-gpu-usage)
-- [Check which gradients are nan](Training%20Loop/#check-which-gradients-are-nan)
-- [Display metrics in progress bar](Training%20Loop/#display-metrics-in-progress-bar)
-- [Display the parameter count by layer](Training%20Loop/#display-the-parameter-count-by-layer)
-- [Fast dev run](Training%20Loop/#fast-dev-run)
 - [Force training for min or max epochs](Training%20Loop/#force-training-for-min-or-max-epochs)
 - [Force disable early stop](Training%20Loop/#force-disable-early-stop)
-- [Inspect gradient norms](Training%20Loop/#inspect-gradient-norms)
-- [Make model overfit on subset of data](Training%20Loop/#make-model-overfit-on-subset-of-data)
 - [Use multiple optimizers (like GANs)](../Pytorch-lightning/LightningModule/#configure_optimizers)
-- [Process position](Training%20Loop/#process-position)
 - [Set how much of the training set to check (1-100%)](Training%20Loop/#set-how-much-of-the-training-set-to-check)
 
 **Validation loop**
@@ -40,14 +32,31 @@ But of course the fun is in all the advanced things it can do:
 - [Set how much of the test set to check](Validation%20Loop/#set-how-much-of-the-test-set-to-check)
 - [Set validation check frequency within 1 training epoch](Validation%20Loop/#set-validation-check-frequency-within-1-training-epoch)
 - [Set the number of validation sanity steps](Validation%20Loop/#set-the-number-of-validation-sanity-steps)
-- [Check validation every n epochs](Validation%20Loop/#check-validation-every-n-epochs)
+
+**Debugging**
+
+- [Fast dev run](Debugging/#fast-dev-run)
+- [Inspect gradient norms](Debugging/#inspect-gradient-norms)
+- [Log GPU usage](Debugging/#Log-gpu-usage)
+- [Make model overfit on subset of data](Debugging/#make-model-overfit-on-subset-of-data)
+- [Print the parameter count by layer](Debugging/#print-the-parameter-count-by-layer)
+- [Print which gradients are nan](Debugging/#print-which-gradients-are-nan)
+
+
+**Experiment Logging**
+
+- [Display metrics in progress bar](Logging/#display-metrics-in-progress-bar)
+- Log arbitrary metrics
+- [Process position](Logging/#process-position)
+- Save a snapshot of all hyperparameters
+- Save a snapshot of the code for a particular model run
 
 **Distributed training**
 
+- 16-bit mixed precision
 - Single-gpu
 - Multi-gpu
 - Multi-node
-- 16-bit mixed precision
 
 **Checkpointing**
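
For orientation, here is a minimal sketch (not part of the patch) that combines the debugging-oriented flags referenced in the sections moved above. The flag names and values (`fast_dev_run`, `overfit_pct`, `print_nan_grads`, `track_grad_norm`, `process_position`) are taken from the documentation in this diff; the import path is an assumption about this version of the library.

``` {.python}
# Sketch only: a Trainer configured for quick debugging runs, assuming
# `Trainer` is importable from the package root.
from pytorch_lightning import Trainer

trainer = Trainer(
    fast_dev_run=True,      # run 1 training and 1 validation batch to smoke-test the loops
    overfit_pct=0.01,       # or overfit on 1% of the data to sanity-check the model
    print_nan_grads=True,   # print tensors whose gradients are nan
    track_grad_norm=2,      # log the 2-norm of the gradients
    process_position=0,     # which stacked progress bar this model uses on the node
)
```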