lightning/docs/Trainer/Training Loop.md

The Lightning training loop handles everything except the actual computations of your model. To decide what happens in your training loop, define the [training_step function](../../Pytorch-lightning/LightningModule/#training_step); a minimal sketch is shown below.
The sections that follow list everything Lightning automates for you in the training loop.
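This hypothetical `training_step` is only an illustration; check the LightningModule docs for the exact signature and return format your version expects:
``` {.python}
# Illustrative sketch only -- see the LightningModule docs for the exact
# signature and return format.
import torch.nn.functional as F

def training_step(self, batch, batch_nb):
    x, y = batch
    y_hat = self.forward(x)
    loss = F.cross_entropy(y_hat, y)
    return {'loss': loss}
```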
---
#### Accumulated gradients
Accumulated gradients run K small batches of size N before doing a backward pass, giving an effective batch size of K x N.
``` {.python}
# DEFAULT (ie: no accumulated grads)
trainer = Trainer(accumulate_grad_batches=1)
```
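Conceptually, accumulating over K batches is roughly equivalent to the manual loop sketched here (illustrative only; `model`, `optimizer`, and `train_loader` are placeholder names, not Lightning internals):
``` {.python}
# Rough sketch of gradient accumulation done by hand (illustrative only).
# `model`, `optimizer`, and `train_loader` are placeholders.
K = 4  # accumulate_grad_batches
optimizer.zero_grad()
for i, batch in enumerate(train_loader):
    loss = model.training_step(batch, i)['loss']
    (loss / K).backward()      # scale so the update matches a KxN batch
    if (i + 1) % K == 0:
        optimizer.step()
        optimizer.zero_grad()
```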
---
#### Anneal learning rate
Cut the learning rate by a factor of 10 at each epoch in this list.
``` {.python}
# DEFAULT (don't anneal)
trainer = Trainer(lr_scheduler_milestones=None)
# cut LR by 10 at 100, 200, and 300 epochs
trainer = Trainer(lr_scheduler_milestones=[100, 200, 300])
```
---
#### Check GPU usage
Lightning automatically logs GPU usage to the Test Tube logs. It only does so at the metric logging interval, so it doesn't slow down training.
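For reference, the logged numbers are per-GPU memory readings of the kind you can pull from nvidia-smi yourself (an illustrative sketch, not Lightning's own logging code):
``` {.python}
# Illustrative only: query used memory (in MiB) for each visible GPU.
import subprocess

out = subprocess.run(
    ['nvidia-smi', '--query-gpu=memory.used', '--format=csv,nounits,noheader'],
    capture_output=True, text=True,
)
print(out.stdout)  # one line per GPU
```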
---
#### Check which gradients are NaN
This option prints a list of tensors with NaN gradients.
``` {.python}
# DEFAULT
trainer = Trainer(print_nan_grads=False)
```
---
#### Check validation every n epochs
If you have a small dataset, you might want to check validation every n epochs.
``` {.python}
# DEFAULT
trainer = Trainer(check_val_every_n_epoch=1)
```
---
#### Display metrics in progress bar
``` {.python}
# DEFAULT
trainer = Trainer(progress_bar=True)
```
---
#### Display the parameter count by layer
By default, Lightning prints a list of parameters *and submodules* when it starts training.
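The printed summary is roughly what you would get by walking the model's submodules and counting parameters, as in this illustrative sketch (`model` is a placeholder for your LightningModule instance):
``` {.python}
# Illustrative sketch: parameter count per top-level submodule.
# `model` is a placeholder for your LightningModule instance.
for name, module in model.named_children():
    n_params = sum(p.numel() for p in module.parameters())
    print(f'{name}: {n_params:,} params')
```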
---
#### Force training for min or max epochs
It can be useful to force training for a minimum number of epochs or to limit it to a maximum number.
``` {.python}
# DEFAULT
trainer = Trainer(min_nb_epochs=1, max_nb_epochs=1000)
```
---
#### Inspect gradient norms
Looking at grad norms can help you figure out where training might be going wrong.
``` {.python}
# DEFAULT (-1 doesn't track norms)
trainer = Trainer(track_grad_norm=-1)
# track the LP norm (P=2 here)
trainer = Trainer(track_grad_norm=2)
```
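For reference, the tracked value is the p-norm of the gradients, which you could compute by hand as in this illustrative sketch (`model` is a placeholder; Lightning logs these for you):
``` {.python}
# Illustrative sketch: L2 norm of each parameter's gradient plus the total.
# `model` is a placeholder for your LightningModule.
norms = {name: p.grad.norm(2).item()
         for name, p in model.named_parameters()
         if p.grad is not None}
total_norm = sum(v ** 2 for v in norms.values()) ** 0.5
```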
---
#### Make model overfit on subset of data
A useful debugging trick is to make your model overfit a tiny fraction of the data.
``` {.python}
# DEFAULT don't overfit (ie: normal training)
trainer = Trainer(overfit_pct=0.0)
# overfit on 1% of data
trainer = Trainer(overfit_pct=0.01)
```
---
#### Set how much of the training set to check
If you don't want to check 100% of the training set (for debugging or if it's huge), set this flag.
``` {.python}
# DEFAULT
trainer = Trainer(train_percent_check=1.0)
# check 10% only
trainer = Trainer(train_percent_check=0.1)
```