The Lightning training loop handles everything except the actual computations of your model. To define what happens in your training loop, implement the [training_step function](https://williamfalcon.github.io/pytorch-lightning/LightningModule/RequiredTrainerInterface/#training_step).
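For example, a minimal training_step might look like the following sketch. The forward call and the cross-entropy loss are illustrative placeholders, and the exact return convention (a dict containing the loss) has varied across Lightning versions.

``` {.python}
from torch.nn import functional as F

# In the LightningModule.
# A minimal sketch; self.forward and the loss choice are placeholders.
def training_step(self, batch, batch_nb):
    x, y = batch
    y_hat = self.forward(x)
    loss = F.cross_entropy(y_hat, y)
    return {'loss': loss}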
Gradient clipping may be enabled to avoid exploding gradients.
Specifically, this will [clip the gradient norm computed over all model parameters *together*](https://pytorch.org/docs/stable/nn.html#torch.nn.utils.clip_grad_norm_).
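A sketch of how clipping is typically enabled on the Trainer is shown below. The argument name gradient_clip_val is an assumption here (it has changed across versions); a value of 0 is usually the default and disables clipping.

``` {.python}
from pytorch_lightning import Trainer

# Assumption: the Trainer exposes a gradient_clip_val argument;
# clip the total gradient norm of all parameters to 0.5
trainer = Trainer(gradient_clip_val=0.5)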
1. Return either a padded tensor from your Dataset, or a list of variable-length tensors from the DataLoader's collate_fn (the example below shows the list implementation).
2. Pack the sequence in the forward, training, or validation step, depending on your use case.
``` {.python}
from torch.nn.utils import rnn

# For use in the DataLoader
def collate_fn(batch):
    x = [item[0] for item in batch]
    y = [item[1] for item in batch]
    return x, y

# In the LightningModule
def training_step(self, batch, batch_nb):
    # pack the variable-length sequences before the forward pass
    x = rnn.pack_sequence(batch[0], enforce_sorted=False)
    y = rnn.pack_sequence(batch[1], enforce_sorted=False)
    ...
There are times when multiple backward passes are needed for each batch. For example, it may save memory to use Truncated Backpropagation Through Time when training RNNs.

When this flag is enabled, each batch is split into sequences of length truncated_bptt_steps and passed to training_step(...) separately. A default splitting function is provided; however, you can override it for more flexibility. See [tbptt_split_batch](https://williamfalcon.github.io/pytorch-lightning/Trainer/hooks#tbptt_split_batch).
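A minimal sketch of how this is typically wired up, assuming truncated_bptt_steps is a Trainer argument and that training_step then also receives the hidden state from the previous split; self.rnn and self.loss are placeholder attributes:

``` {.python}
from pytorch_lightning import Trainer

# Assumption: with truncated_bptt_steps set, each batch is split into
# chunks of 2 time steps before reaching training_step.
trainer = Trainer(truncated_bptt_steps=2)

# In the LightningModule
def training_step(self, batch, batch_nb, hiddens):
    x, y = batch
    out, hiddens = self.rnn(x, hiddens)
    loss = self.loss(out, y)
    # return the new hidden state so it is carried over to the next split
    return {'loss': loss, 'hiddens': hiddens}
```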