From db29488847c9297110316afa497333dbed44cee2 Mon Sep 17 00:00:00 2001
From: William Falcon
Date: Thu, 27 Jun 2019 13:29:01 -0400
Subject: [PATCH] added val loop options

---
 docs/Trainer/Training Loop.md   | 40 +++++++++++++++++------
 docs/Trainer/Vaildation loop.md |  3 --
 docs/Trainer/Validation loop.md | 57 +++++++++++++++++++++++++++++++++
 docs/Trainer/index.md           | 14 ++++----
 4 files changed, 96 insertions(+), 18 deletions(-)
 delete mode 100644 docs/Trainer/Vaildation loop.md
 create mode 100644 docs/Trainer/Validation loop.md

diff --git a/docs/Trainer/Training Loop.md b/docs/Trainer/Training Loop.md
index d78b65bfa7..284f24cea3 100644
--- a/docs/Trainer/Training Loop.md
+++ b/docs/Trainer/Training Loop.md
@@ -34,14 +34,6 @@ This option prints a list of tensors with nan gradients.
 trainer = Trainer(print_nan_grads=False)
 ```
 
----
-#### Check validation every n epochs
-If you have a small dataset you might want to check validation every n epochs
-``` {.python}
-# DEFAULT
-trainer = Trainer(check_val_every_n_epoch=1)
-```
-
 ---
 #### Display metrics in progress bar
 ``` {.python}
@@ -53,6 +45,15 @@ trainer = Trainer(progress_bar=True)
 ```
 
 ---
 #### Display the parameter count by layer
 By default lightning prints a list of parameters *and submodules* when it starts training.
 
+---
+#### Fast dev run
+This flag is meant for debugging a full train/val/test loop. It activates callbacks and everything else as usual, but runs only 1 training batch and 1 validation batch.
+Use it to quickly debug a complete run of your program.
+``` {.python}
+# DEFAULT
+trainer = Trainer(fast_dev_run=False)
+```
+
 ---
 #### Force training for min or max epochs
 It can be useful to force training for a minimum number of epochs or limit to a max number
 ``` {.python}
 trainer = Trainer(min_nb_epochs=1, max_nb_epochs=1000)
 ```
 
+---
+#### Force disable early stop
+Use this to turn off early stopping and run training until [max_nb_epochs](#force-training-for-min-or-max-epochs) is reached.
+``` {.python}
+# DEFAULT
+trainer = Trainer(enable_early_stop=True)
+```
+
 ---
 #### Inspect gradient norms
 Looking at grad norms can help you figure out where training might be going wrong.
@@ -84,9 +93,22 @@ trainer = Trainer(overfit_pct=0.0)
 trainer = Trainer(overfit_pct=0.01)
 ```
 
+---
+#### Process position
+When running multiple models on the same machine, use this to decide which progress bar belongs to which model.
+Lightning will stack progress bars according to this value.
+``` {.python}
+# DEFAULT
+trainer = Trainer(process_position=0)
+
+# if this is the second model on the node, show the second progress bar below
+trainer = Trainer(process_position=1)
+```
+
+
 ---
 #### Set how much of the training set to check
-If you don't want to check 100% of the validation set (for debugging or if it's huge), set this flag
+If you don't want to check 100% of the training set (for debugging or if it's huge), set this flag.
 ``` {.python}
 # DEFAULT
 trainer = Trainer(train_percent_check=1.0)
diff --git a/docs/Trainer/Vaildation loop.md b/docs/Trainer/Vaildation loop.md
deleted file mode 100644
index 3b9cafcebf..0000000000
--- a/docs/Trainer/Vaildation loop.md
+++ /dev/null
@@ -1,3 +0,0 @@
-The lightning validation loop handles everything except the actual computations of your model. To decide what will happen in your validation loop, define the [validation_step function](../../Pytorch-lightning/LightningModule/#validation_step).
-
-Below are all the things lightning automates for you in the validation loop.
\ No newline at end of file
diff --git a/docs/Trainer/Validation loop.md b/docs/Trainer/Validation loop.md
new file mode 100644
index 0000000000..693df88904
--- /dev/null
+++ b/docs/Trainer/Validation loop.md
@@ -0,0 +1,57 @@
+The lightning validation loop handles everything except the actual computations of your model. To decide what will happen in your validation loop, define the [validation_step function](../../Pytorch-lightning/LightningModule/#validation_step).
+Below are all the things lightning automates for you in the validation loop.
+
+**Note**
+Lightning will run 5 steps of validation at the beginning of training as a sanity check, so you don't have to wait for a full epoch to catch possible validation issues.
+
+
+
+
+---
+#### Check validation every n epochs
+If you have a small dataset, you might want to check validation only every n epochs.
+``` {.python}
+# DEFAULT
+trainer = Trainer(check_val_every_n_epoch=1)
+```
+
+---
+#### Set how much of the validation set to check
+If you don't want to check 100% of the validation set (for debugging or if it's huge), set this flag.
+``` {.python}
+# DEFAULT
+trainer = Trainer(val_percent_check=1.0)
+
+# check 10% only
+trainer = Trainer(val_percent_check=0.1)
+```
+
+---
+#### Set how much of the test set to check
+If you don't want to check 100% of the test set (for debugging or if it's huge), set this flag.
+``` {.python}
+# DEFAULT
+trainer = Trainer(test_percent_check=1.0)
+
+# check 10% only
+trainer = Trainer(test_percent_check=0.1)
+```
+
+---
+#### Set validation check frequency within 1 training epoch
+For large datasets it's often desirable to check validation multiple times within a single training epoch.
+``` {.python}
+# DEFAULT
+trainer = Trainer(val_check_interval=0.95)
+
+# check every .25 of an epoch
+trainer = Trainer(val_check_interval=0.25)
+```
+
+---
+#### Set the number of validation sanity steps
+Lightning runs a few steps of validation at the beginning of training. This avoids crashing in the validation loop somewhere deep into a lengthy training run.
+``` {.python}
+# DEFAULT
+trainer = Trainer(nb_sanity_val_steps=5)
+```
\ No newline at end of file
diff --git a/docs/Trainer/index.md b/docs/Trainer/index.md
index a8c054bcfb..1acd81e1e3 100644
--- a/docs/Trainer/index.md
+++ b/docs/Trainer/index.md
@@ -22,22 +22,24 @@ But of course the fun is in all the advanced things it can do:
 - [Anneal Learning rate](Training%20Loop/#anneal-learning-rate)
 - [Check GPU usage](Training%20Loop/#Check-gpu-usage)
 - [Check which gradients are nan](Training%20Loop/#check-which-gradients-are-nan)
-- [Check validation every n epochs](Training%20Loop/#check-validation-every-n-epochs)
 - [Display metrics in progress bar](Training%20Loop/#display-metrics-in-progress-bar)
 - [Display the parameter count by layer](Training%20Loop/#display-the-parameter-count-by-layer)
+- [Fast dev run](Training%20Loop/#fast-dev-run)
 - [Force training for min or max epochs](Training%20Loop/#force-training-for-min-or-max-epochs)
+- [Force disable early stop](Training%20Loop/#force-disable-early-stop)
 - [Inspect gradient norms](Training%20Loop/#inspect-gradient-norms)
 - [Make model overfit on subset of data](Training%20Loop/#make-model-overfit-on-subset-of-data)
 - [Use multiple optimizers (like GANs)](../Pytorch-lightning/LightningModule/#configure_optimizers)
+- [Process position](Training%20Loop/#process-position)
 - [Set how much of the training set to check (1-100%)](Training%20Loop/#set-how-much-of-the-training-set-to-check)
 
 **Validation loop**
 
-- Display metrics in progress bar
-- Set how much of the validation set to check (1-100%)
-- Set validation check frequency within 1 training epoch (1-100%)
-- validation_step function
-- Why does validation run first for 5 steps?
+- [Check validation every n epochs](Validation%20loop/#check-validation-every-n-epochs)
+- [Set how much of the validation set to check](Validation%20loop/#set-how-much-of-the-validation-set-to-check)
+- [Set how much of the test set to check](Validation%20loop/#set-how-much-of-the-test-set-to-check)
+- [Set validation check frequency within 1 training epoch](Validation%20loop/#set-validation-check-frequency-within-1-training-epoch)
+- [Set the number of validation sanity steps](Validation%20loop/#set-the-number-of-validation-sanity-steps)
 
 **Distributed training**
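To accompany the new `Validation loop.md`, a minimal sketch of the `validation_step` it links to, which is defined as a method on your lightning module. The argument names, the `(inputs, labels)` batch structure, and the returned `val_loss` key are illustrative assumptions for a simple classification setup; the linked LightningModule docs define the exact contract.

``` {.python}
import torch.nn.functional as F

# sketch only: defined on your lightning module, not a standalone function
def validation_step(self, batch, batch_nb):
    # assumed: a classification batch of (inputs, labels)
    x, y = batch
    y_hat = self.forward(x)
    val_loss = F.cross_entropy(y_hat, y)

    # return the metrics lightning should track for this validation batch
    return {'val_loss': val_loss}
```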
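The validation flags documented in this patch can also be combined on a single Trainer. A sketch, assuming a top-level `Trainer` import and purely illustrative values:

``` {.python}
from pytorch_lightning import Trainer  # import path assumed for this sketch

# check validation every epoch, on 10% of the validation set,
# twice per training epoch, with 5 sanity-check steps at startup
trainer = Trainer(check_val_every_n_epoch=1,
                  val_percent_check=0.1,
                  val_check_interval=0.5,
                  nb_sanity_val_steps=5)
```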