# Trainer
[[GitHub Code](https://github.com/williamFalcon/pytorch-lightning/blob/master/pytorch_lightning/models/trainer.py)]
The Lightning trainer abstracts best practices for running the training, validation, and test routines. It calls into your model at well-defined points when it needs to hand over full control, and otherwise applies training assumptions that are now standard practice in AI research.
This is the basic use of the trainer:
``` {.python}
from pytorch_lightning import Trainer

# LightningTemplate is your own LightningModule subclass
model = LightningTemplate()
trainer = Trainer()
trainer.fit(model)
```
But of course the fun is in all the advanced things it can do:
**Checkpointing**
- Model saving
- Model loading
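As a minimal sketch of wiring this up: the `ModelCheckpoint` import path and the `load_from_metrics` call below are assumptions about this version of Lightning, so treat them as illustrative and check the model saving and loading docs for the exact API.
``` {.python}
from pytorch_lightning import Trainer
# assumed import path for the checkpoint callback
from pytorch_lightning.utils.pt_callbacks import ModelCheckpoint

# save the best weights (by val_loss) under a folder of your choice
checkpoint = ModelCheckpoint(filepath='/path/to/weights')
trainer = Trainer(checkpoint_callback=checkpoint)
trainer.fit(model)

# later, restore the model from the saved weights and hyperparameter tags
# (illustrative paths; the checkpoint filename depends on your run)
pretrained = LightningTemplate.load_from_metrics(
    weights_path='/path/to/weights/best.ckpt',
    tags_csv='/path/to/exp/meta_tags.csv',
)
```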
**Computing cluster (SLURM)**
- [Running grid search on a cluster](SLURM%20Managed%20Cluster/#running-grid-search-on-a-cluster)
- [Walltime auto-resubmit](SLURM%20Managed%20Cluster/#walltime-auto-resubmit)
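The cluster features lean on the test-tube package. A rough sketch of a grid-search submission, assuming test-tube's `SlurmCluster` API; see the linked page for the exact recipe.
``` {.python}
from test_tube import HyperOptArgumentParser
from test_tube.hpc import SlurmCluster

def train(hparams, *args):
    # build your LightningModule from hparams and run trainer.fit(model) here
    pass

parser = HyperOptArgumentParser(strategy='grid_search')
parser.opt_list('--learning_rate', default=0.001, type=float,
                options=[1e-3, 1e-4, 1e-5], tunable=True)
hparams = parser.parse_args()

cluster = SlurmCluster(hyperparam_optimizer=hparams, log_path='/path/to/slurm_logs')
cluster.per_experiment_nb_gpus = 1
cluster.job_time = '24:00:00'  # walltime; Lightning can resubmit the job before it expires

# submits one SLURM job per hyperparameter combination
cluster.optimize_parallel_cluster_gpu(train, nb_trials=3, job_name='grid_search')
```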
**Debugging**
- [Fast dev run](Debugging/#fast-dev-run)
- [Inspect gradient norms](Debugging/#inspect-gradient-norms)
- [Log GPU usage](Debugging/#log-gpu-usage)
- [Make model overfit on subset of data](Debugging/#make-model-overfit-on-subset-of-data)
- [Print the parameter count by layer](Debugging/#print-the-parameter-count-by-layer)
- [Print which gradients are NaN](Debugging/#print-which-gradients-are-nan)
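Most of these are single Trainer flags; pick only the ones you need. A minimal sketch (flag names such as `fast_dev_run` and `overfit_pct` are assumed from the Trainer arguments; confirm them on the Debugging page):
``` {.python}
from pytorch_lightning import Trainer

trainer = Trainer(
    fast_dev_run=True,     # run a single train/val batch to smoke-test the loops
    overfit_pct=0.01,      # or: overfit on 1% of the data to sanity-check the model
    track_grad_norm=2,     # log the 2-norm of the gradients
    print_nan_grads=True,  # print which gradients are NaN
)
trainer.fit(model)
```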
**Distributed training**
- [16-bit mixed precision](Distributed%20training/#16-bit-mixed-precision)
- [Multi-GPU](Distributed%20training/#multi-gpu)
- [Multi-node](Distributed%20training/#multi-node)
- [Single GPU](Distributed%20training/#single-gpu)
- [Self-balancing architecture](Distributed%20training/#self-balancing-architecture)
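A sketch of the relevant Trainer arguments, assuming names like `gpus`, `nb_gpu_nodes`, and `use_amp` (check the Distributed training page for the exact spelling in your version):
``` {.python}
from pytorch_lightning import Trainer

# single GPU
trainer = Trainer(gpus=[0])

# multi-GPU on one machine
trainer = Trainer(gpus=[0, 1, 2, 3])

# multi-node: pair with the SLURM setup shown above
trainer = Trainer(gpus=[0, 1, 2, 3], nb_gpu_nodes=4)

# 16-bit mixed precision (requires NVIDIA apex)
trainer = Trainer(gpus=[0, 1], use_amp=True)
```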
**Experiment Logging**
- [Display metrics in progress bar](Logging/#display-metrics-in-progress-bar)
- Log arbitrary metrics
- [Log metric row every k batches](Logging/#log-metric-row-every-k-batches)
- [Process position](Logging/#process-position)
- [Save a snapshot of all hyperparameters](Logging/#save-a-snapshot-of-all-hyperparameters)
- [Snapshot code for a training run](Logging/#snapshot-code-for-a-training-run)
- [Write logs to CSV every k batches](Logging/#write-logs-file-to-csv-every-k-batches)
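Logging runs through a test-tube `Experiment`. A hedged sketch, assuming Trainer keyword names like `add_log_row_interval` and `log_save_interval` (the Logging page has the exact arguments):
``` {.python}
from test_tube import Experiment
from pytorch_lightning import Trainer

# the experiment snapshots hyperparameters and metrics for this run
exp = Experiment(save_dir='/path/to/logs')

trainer = Trainer(
    experiment=exp,
    add_log_row_interval=10,  # log a metric row every 10 batches
    log_save_interval=100,    # write the logs to csv every 100 batches
    process_position=0,       # which slot this process's progress bar occupies
)
trainer.fit(model)
```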
**Training loop**
- [Accumulate gradients](Training%20Loop/#accumulated-gradients)
- [Anneal learning rate](Training%20Loop/#anneal-learning-rate)
- [Force training for min or max epochs](Training%20Loop/#force-training-for-min-or-max-epochs)
- [Force disable early stop](Training%20Loop/#force-disable-early-stop)
- [Use multiple optimizers (like GANs)](../Pytorch-lightning/LightningModule/#configure_optimizers)
- [Set how much of the training set to check (1-100%)](Training%20Loop/#set-how-much-of-the-training-set-to-check)
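For example, a sketch combining several of these (argument names like `max_nb_epochs` and `accumulate_grad_batches` are assumed from the Trainer signature; verify on the Training Loop page):
``` {.python}
from pytorch_lightning import Trainer

trainer = Trainer(
    min_nb_epochs=10,           # force training for at least 10 epochs
    max_nb_epochs=100,          # and stop after at most 100
    accumulate_grad_batches=4,  # step the optimizer every 4 batches
    train_percent_check=0.1,    # run on only 10% of the training set
)
trainer.fit(model)
```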
**Validation loop**
- [Check validation every n epochs](Validation%20Loop/#check-validation-every-n-epochs)
- [Set how much of the validation set to check](Validation%20Loop/#set-how-much-of-the-validation-set-to-check)
- [Set how much of the test set to check](Validation%20Loop/#set-how-much-of-the-test-set-to-check)
- [Set validation check frequency within 1 training epoch](Validation%20Loop/#set-validation-check-frequency-within-1-training-epoch)
- [Set the number of validation sanity steps](Validation%20Loop/#set-the-number-of-validation-sanity-steps)
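And a matching sketch for the validation knobs (again, names like `val_check_interval` and `nb_sanity_val_steps` are assumed from the Trainer signature; see the Validation Loop page):
``` {.python}
from pytorch_lightning import Trainer

trainer = Trainer(
    check_val_every_n_epoch=1,  # run validation every epoch...
    val_check_interval=0.5,     # ...and also halfway through each training epoch
    val_percent_check=0.25,     # use 25% of the validation set
    test_percent_check=1.0,     # use the full test set
    nb_sanity_val_steps=5,      # run 5 val batches before training to catch bugs early
)
trainer.fit(model)
```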