diff --git a/docs/Trainer/Logging.md b/docs/Trainer/Logging.md
new file mode 100644
index 0000000000..e575b833d0
--- /dev/null
+++ b/docs/Trainer/Logging.md
@@ -0,0 +1,31 @@
+
+
+
+---
+#### Display metrics in progress bar
+``` {.python}
+# DEFAULT
+trainer = Trainer(progress_bar=True)
+```
+
+
+
+---
+#### Print which gradients are NaN
+This option prints a list of tensors with NaN gradients.
+``` {.python}
+# DEFAULT
+trainer = Trainer(print_nan_grads=False)
+```
+
+---
+#### Process position
+When running multiple models on the same machine, we want to decide which progress bar to use.
+Lightning will stack progress bars according to this value.
+``` {.python}
+# DEFAULT
+trainer = Trainer(process_position=0)
+
+# if this is the second model on the node, show the second progress bar below
+trainer = Trainer(process_position=1)
+```
diff --git a/docs/Trainer/debugging.md b/docs/Trainer/debugging.md
new file mode 100644
index 0000000000..1cd247c1d9
--- /dev/null
+++ b/docs/Trainer/debugging.md
@@ -0,0 +1,48 @@
+These flags are useful to help debug a model.
+
+---
+#### Fast dev run
+This flag is meant for debugging a full train/val/test loop. It activates everything (callbacks included) but runs only 1 training batch and 1 validation batch.
+Use this to quickly debug a full run of your program.
+``` {.python}
+# DEFAULT
+trainer = Trainer(fast_dev_run=False)
+```
+
+---
+#### Inspect gradient norms
+Looking at grad norms can help you figure out where training might be going wrong.
+``` {.python}
+# DEFAULT (-1 doesn't track norms)
+trainer = Trainer(track_grad_norm=-1)
+
+# track the Lp norm (p=2 here)
+trainer = Trainer(track_grad_norm=2)
+```
+
+---
+#### Make model overfit on subset of data
+A useful debugging trick is to make your model overfit a tiny fraction of the data.
+``` {.python}
+# DEFAULT: don't overfit (i.e. normal training)
+trainer = Trainer(overfit_pct=0.0)
+
+# overfit on 1% of the data
+trainer = Trainer(overfit_pct=0.01)
+```
+
+---
+#### Print the parameter count by layer
+By default, Lightning prints a list of parameters *and submodules* when it starts training.
+
+---
+#### Print which gradients are NaN
+This option prints a list of tensors with NaN gradients.
+``` {.python}
+# DEFAULT
+trainer = Trainer(print_nan_grads=False)
+```
+
+---
+#### Log GPU usage
+Lightning automatically logs GPU usage to the Test Tube logs. It only does so at the metric logging interval, so it doesn't slow down training.
\ No newline at end of file
diff --git a/docs/Trainer/hooks.md b/docs/Trainer/hooks.md
new file mode 100644
index 0000000000..e69de29bb2
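
For reference, the layer-by-layer parameter printout that `debugging.md` describes amounts to roughly the following. This is a plain-PyTorch sketch, not Lightning's own implementation; `model` is assumed to be any `nn.Module` (e.g. your `LightningModule`).

``` {.python}
import torch.nn as nn

def print_parameter_counts(model: nn.Module) -> None:
    # sketch only -- not Lightning's actual printout code
    # one row per top-level submodule: name, type, and parameter count
    for name, module in model.named_children():
        n_params = sum(p.numel() for p in module.parameters())
        print(f"{name:<20} {type(module).__name__:<20} {n_params:,}")
    # total over every parameter in the model, including nested submodules
    print(f"{'TOTAL':<41} {sum(p.numel() for p in model.parameters()):,}")
```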
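
Similarly, the GPU usage that `debugging.md` says is logged automatically can be sampled by hand with standard `torch.cuda` calls. A minimal sketch of one such metric (memory allocated by tensors); how Lightning actually collects its GPU stats is not shown here:

``` {.python}
import torch

def gpu_memory_mb() -> dict:
    # sketch only: memory currently allocated by tensors on each visible GPU, in MB
    return {
        f"gpu_{i}": torch.cuda.memory_allocated(i) / 1024 ** 2
        for i in range(torch.cuda.device_count())
    }
```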