Lightning can automate saving and loading checkpoints.

---

### Model saving

To enable checkpointing, define the checkpoint callback and give it to the trainer.
``` {.python}
from pytorch_lightning import Trainer
from pytorch_lightning.utils.pt_callbacks import ModelCheckpoint

# save checkpoints to the given path, keeping only the best
# model as judged by the lowest validation loss
checkpoint_callback = ModelCheckpoint(
    filepath='/path/to/store/weights.ckpt',
    save_best_only=True,
    verbose=True,
    monitor='val_loss',
    mode='min'
)

trainer = Trainer(checkpoint_callback=checkpoint_callback)
```
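If you only want the weights back outside of a training run, the saved file can be inspected like any PyTorch checkpoint. A minimal sketch, assuming the checkpoint is a dictionary with the weights stored under a `state_dict` key (the exact layout may vary by version, so print the keys first), and where `model` is a hypothetical stand-in for your LightningModule instance:

``` {.python}
import torch

# load the file written by ModelCheckpoint onto the CPU
checkpoint = torch.load('/path/to/store/weights.ckpt', map_location='cpu')

# inspect the layout before assuming any keys
print(checkpoint.keys())

# `model` stands in for your LightningModule instance (hypothetical);
# this assumes the weights live under a 'state_dict' key
model.load_state_dict(checkpoint['state_dict'])
```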
---

### Restoring training session

You might want to not only load a model but also continue training it. Use this method to restore the trainer state as well. Training will continue from the epoch and global step you last left off. However, the dataloaders will start from the first batch again (if you shuffled, this shouldn't matter).
Lightning will restore the session if you pass an experiment with the same version and there's a saved checkpoint.
``` {.python}
from test_tube import Experiment

# reuse the version that wrote the checkpoint; Lightning finds the
# saved state and restores the trainer from it
exp = Experiment(version=a_previous_version_with_a_saved_checkpoint)
trainer = Trainer(experiment=exp)

# the trainer is now restored
```
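Putting the two pieces together, a resumed run might look like the sketch below, where `MyLightningModule`, the experiment name, and the version number are hypothetical stand-ins for your own values:

``` {.python}
from test_tube import Experiment
from pytorch_lightning import Trainer
from pytorch_lightning.utils.pt_callbacks import ModelCheckpoint

# MyLightningModule is a hypothetical module defined elsewhere
model = MyLightningModule()

# same name and version as the interrupted run (values are examples)
exp = Experiment(name='my_experiment', version=3)

checkpoint_callback = ModelCheckpoint(
    filepath='/path/to/store/weights.ckpt',
    save_best_only=True,
    monitor='val_loss',
    mode='min'
)

# with a matching version and a checkpoint on disk, training resumes
# from the last saved epoch and global step
trainer = Trainer(experiment=exp, checkpoint_callback=checkpoint_callback)
trainer.fit(model)
```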