Trainer

[Github Code]

The Lightning Trainer abstracts best practices for running the training, validation, and test routines. It calls into your model at the points where you need full control, and everywhere else it applies training defaults that are now standard practice in AI research.

This is the basic use of the trainer:

from pytorch_lightning import Trainer

# LightningTemplate stands in for your own LightningModule subclass
model = LightningTemplate()

trainer = Trainer()
trainer.fit(model)
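
The division of labor described above (the trainer owns the loops, the model owns the per-batch logic) can be illustrated with a small, dependency-free sketch. This is a toy written for this page, not Lightning's implementation: `ToyModel`, `ToyTrainer`, and the data are hypothetical, and only the method names `training_step` and `validation_step` are chosen to echo the hooks a Lightning model defines.

```python
# Toy illustration of the hook pattern only; this is NOT pytorch_lightning code.
# ToyModel and ToyTrainer are hypothetical names invented for this sketch.


class ToyModel:
    """The model owns the per-batch logic."""

    def __init__(self):
        self.weight = 0.0

    def training_step(self, batch):
        # Fit weight so that weight * x approximates y (plain SGD on squared error).
        x, y = batch
        error = self.weight * x - y
        self.weight -= 0.1 * error * x
        return error ** 2

    def validation_step(self, batch):
        # Report the squared error without updating the weight.
        x, y = batch
        return (self.weight * x - y) ** 2


class ToyTrainer:
    """The trainer owns the loops and calls the model's hooks."""

    def __init__(self, max_epochs=20):
        self.max_epochs = max_epochs

    def fit(self, model, train_data, val_data):
        val_loss = None
        for _ in range(self.max_epochs):
            # Training loop: hand each batch to the model.
            for batch in train_data:
                model.training_step(batch)
            # Validation loop: average the model's per-batch losses.
            val_loss = sum(model.validation_step(b) for b in val_data) / len(val_data)
        return val_loss


train_data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # samples of y = 2 * x
val_data = [(4.0, 8.0)]

model = ToyModel()
final_val_loss = ToyTrainer(max_epochs=20).fit(model, train_data, val_data)
```

The calling code never writes a loop itself; it only supplies the steps, which is the same shape as `trainer.fit(model)` above.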

But of course the fun is in all the advanced things it can do:

Checkpointing

  • Model saving
  • Model loading

Computing cluster (SLURM)

  • Automatic checkpointing
  • Automatic saving, loading
  • Running grid search on a cluster
  • Walltime auto-resubmit

Debugging

Distributed training

  • 16-bit mixed precision
  • Single-GPU
  • Multi-GPU
  • Multi-node

Experiment Logging

Training loop

Validation loop