# Checkpointing


Lightning can automate saving and loading checkpoints.
---
### Model saving
To enable checkpointing, define a `ModelCheckpoint` callback and pass it to the trainer.
``` {.python}
from pytorch_lightning import Trainer
from pytorch_lightning.utils.pt_callbacks import ModelCheckpoint

# save a new checkpoint whenever `val_loss` improves
checkpoint_callback = ModelCheckpoint(
    filepath='/path/to/store/weights.ckpt',
    save_best_only=True,  # keep only the best-scoring checkpoint
    verbose=True,
    monitor='val_loss',   # quantity to monitor
    mode='min'            # lower val_loss is better
)

trainer = Trainer(checkpoint_callback=checkpoint_callback)
```
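To load the saved weights back later, outside of a full training session, you can read the checkpoint file with plain PyTorch. A minimal sketch, assuming the checkpoint dictionary stores the weights under a `state_dict` key and `model` is your module:

``` {.python}
import torch

# load the checkpoint file produced by ModelCheckpoint
checkpoint = torch.load('/path/to/store/weights.ckpt', map_location='cpu')

# restore the weights into your model (assumes a 'state_dict' key)
model.load_state_dict(checkpoint['state_dict'])
```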
---
### Restoring training session
Sometimes you want to not only load a model's weights but also resume training it. Use this method to
restore the trainer state as well, so training continues from the epoch and global step where you left off.
Note that the dataloaders will start from the first batch again (if your data is shuffled, this shouldn't matter).
Lightning restores the session when you pass an experiment with the same version and a saved checkpoint exists.
``` {.python}
from test_tube import Experiment

# use the version that has a saved checkpoint
exp = Experiment(version=a_previous_version_with_a_saved_checkpoint)

# passing the experiment (plus the checkpoint callback) restores the session
trainer = Trainer(experiment=exp, checkpoint_callback=checkpoint_callback)
# the trainer is now restored
```
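With the session restored, a call to `fit` continues training at the restored epoch and global step rather than starting over:

``` {.python}
# resume training from the restored epoch / global step
# (assumes `model` is the same module you were training before)
trainer.fit(model)
```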