.. _optimizers:

Optimization
===============

Learning rate scheduling
------------------------

Every optimizer you use can be paired with any `LearningRateScheduler <https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate>`_.

.. testcode::

    # no LR scheduler
    def configure_optimizers(self):
        return Adam(...)

    # Adam + LR scheduler
    def configure_optimizers(self):
        optimizer = Adam(...)
        scheduler = ReduceLROnPlateau(optimizer, ...)
        return [optimizer], [scheduler]
    # two optimizers, each with its own scheduler
    def configure_optimizers(self):
        optimizer1 = Adam(...)
        optimizer2 = SGD(...)
        scheduler1 = ReduceLROnPlateau(optimizer1, ...)
        scheduler2 = LambdaLR(optimizer2, ...)
        return [optimizer1, optimizer2], [scheduler1, scheduler2]

    # same as above, with additional params passed to the first scheduler
    def configure_optimizers(self):
        optimizers = [Adam(...), SGD(...)]
        schedulers = [
            {
                'scheduler': ReduceLROnPlateau(optimizers[0], ...),
                'monitor': 'val_recall',  # default: val_loss
                'interval': 'epoch',
                'frequency': 1
            },
            LambdaLR(optimizers[1], ...)
        ]
        return optimizers, schedulers
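
Note that ``'monitor'`` refers to a metric you log yourself. As a rough sketch (``compute_recall`` is a hypothetical helper, and ``self.log`` assumes a recent Lightning version), the monitored value could come from your validation loop:

.. code-block:: python

    def validation_step(self, batch, batch_idx):
        x, y = batch
        preds = self(x)
        recall = compute_recall(preds, y)  # hypothetical metric helper
        self.log('val_recall', recall)     # makes 'val_recall' available to the scheduler's monitor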

----------

Use multiple optimizers (like GANs)
-----------------------------------

To use multiple optimizers, return more than one optimizer from
:meth:`pytorch_lightning.core.LightningModule.configure_optimizers`.

.. testcode::

    # one optimizer
    def configure_optimizers(self):
        return Adam(...)

    # two optimizers, no schedulers
    def configure_optimizers(self):
        return Adam(...), SGD(...)

    # two optimizers, one scheduler for Adam only
    def configure_optimizers(self):
        opt1 = Adam(...)
        opt2 = SGD(...)
        return [opt1, opt2], [ReduceLROnPlateau(opt1, ...)]

Lightning will call each optimizer sequentially:

.. code-block:: python

    for epoch in epochs:
        for batch in data:
            for opt in optimizers:
                train_step(opt)
                opt.step()

        for scheduler in schedulers:
            scheduler.step()
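
Inside the inner loop above, ``training_step`` receives an ``optimizer_idx`` telling it which optimizer is currently active. As a rough sketch (the ``generator``, ``discriminator`` and ``latent_dim`` attributes and the loss terms below are placeholders, not a full GAN recipe), a GAN-style module might branch on it like this:

.. code-block:: python

    import torch
    import pytorch_lightning as pl


    class GAN(pl.LightningModule):
        def training_step(self, batch, batch_idx, optimizer_idx):
            real, _ = batch
            noise = torch.randn(real.size(0), self.latent_dim)

            # optimizer_idx matches the order returned from configure_optimizers
            if optimizer_idx == 0:
                # generator update: try to fool the discriminator
                fake = self.generator(noise)
                return -self.discriminator(fake).mean()

            if optimizer_idx == 1:
                # discriminator update: push real scores up, fake scores down
                fake = self.generator(noise).detach()
                return self.discriminator(fake).mean() - self.discriminator(real).mean()

        def configure_optimizers(self):
            opt_g = torch.optim.Adam(self.generator.parameters(), lr=2e-4)
            opt_d = torch.optim.Adam(self.discriminator.parameters(), lr=2e-4)
            return opt_g, opt_d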

----------

Step optimizers at arbitrary intervals
--------------------------------------

To do more interesting things with your optimizers, such as learning rate warm-up or odd scheduling,
override the :meth:`optimizer_step` function.

For example, here we step optimizer A every 2 batches and optimizer B every 4 batches.

.. testcode::

    # simply call optimizer.step() on every batch
    def optimizer_step(self, current_epoch, batch_nb, optimizer, optimizer_idx,
                       second_order_closure=None, on_tpu=False, using_native_amp=False, using_lbfgs=False):
        optimizer.step()
    def optimizer_zero_grad(self, current_epoch, batch_idx, optimizer, opt_idx):
        optimizer.zero_grad()

    # Alternating schedule for optimizer steps (i.e. GANs)
    def optimizer_step(self, current_epoch, batch_nb, optimizer, optimizer_idx,
                       second_order_closure=None, on_tpu=False, using_native_amp=False, using_lbfgs=False):
        # update generator opt every 2 steps
        if optimizer_idx == 0:
            if batch_nb % 2 == 0:
                optimizer.step()
                optimizer.zero_grad()

        # update discriminator opt every 4 steps
        if optimizer_idx == 1:
            if batch_nb % 4 == 0:
                optimizer.step()
                optimizer.zero_grad()

        # ...
        # add as many optimizers as you want

Here we add a learning rate warm-up.

.. testcode::

    # learning rate warm-up
    def optimizer_step(self, current_epoch, batch_nb, optimizer, optimizer_idx,
                       second_order_closure=None, on_tpu=False, using_native_amp=False, using_lbfgs=False):
        # warm up the lr over the first 500 optimization steps
        if self.trainer.global_step < 500:
            lr_scale = min(1., float(self.trainer.global_step + 1) / 500.)
            for pg in optimizer.param_groups:
                pg['lr'] = lr_scale * self.hparams.learning_rate

        # update params
        optimizer.step()
        optimizer.zero_grad()
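
If you prefer to keep warm-up logic out of ``optimizer_step``, a similar effect can also be sketched with a plain ``LambdaLR`` scheduler stepped every batch; the 500-step window and learning rate below are only illustrative values:

.. code-block:: python

    def configure_optimizers(self):
        optimizer = torch.optim.Adam(self.parameters(), lr=1e-3)

        # linearly scale the lr from ~0 up to its full value over the first 500 steps
        def warm_up(step):
            return min(1.0, float(step + 1) / 500.0)

        scheduler = {
            'scheduler': torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=warm_up),
            'interval': 'step',  # step the scheduler after every batch instead of every epoch
        }
        return [optimizer], [scheduler]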

----------

Using the closure functions for optimization
--------------------------------------------

When using optimization schemes such as LBFGS, the ``second_order_closure`` needs to be enabled. By default,
this closure is defined by wrapping the ``training_step`` and the backward pass as follows:

.. testcode::

    def second_order_closure(pl_module, split_batch, batch_idx, opt_idx, optimizer, hidden):
        # Model training step on a given batch
        result = pl_module.training_step(split_batch, batch_idx, opt_idx, hidden)

        # Model backward pass
        pl_module.backward(result, optimizer, opt_idx)

        # on_after_backward callback
        pl_module.on_after_backward(result.training_step_output, batch_idx, result.loss)

        return result

    # This default `second_order_closure` function can be enabled by passing it directly to `optimizer.step`
    def optimizer_step(self, current_epoch, batch_nb, optimizer, optimizer_idx, second_order_closure,
                       on_tpu=False, using_native_amp=False, using_lbfgs=False):
        # update params
        optimizer.step(second_order_closure)
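
For reference, the situation where this closure actually matters is an optimizer like PyTorch's ``torch.optim.LBFGS``, which re-evaluates the loss several times per step. A minimal sketch (the learning rate is arbitrary) of configuring it:

.. code-block:: python

    def configure_optimizers(self):
        # LBFGS calls the closure internally to recompute the loss,
        # so the closure is passed into optimizer.step()
        return torch.optim.LBFGS(self.parameters(), lr=0.1)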