Training Tricks
================

Lightning implements various tricks to help during training.

Accumulate gradients
-------------------------------------
Gradient accumulation runs K small batches of size N, accumulating their gradients, before taking a single optimizer step.
The effect is a large effective batch size of K x N.

.. note:: See: :ref:`trainer`

.. code-block:: python

    from pytorch_lightning import Trainer

    # DEFAULT (ie: no accumulated grads)
    trainer = Trainer(accumulate_grad_batches=1)

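The equivalence between K small batches and one large batch can be checked with a minimal plain-Python sketch. It is an illustration only, not Lightning's implementation: the toy model ``w * x``, the helper names ``mean_squared_error_grad`` and ``accumulated_grad``, and the choice to average the K per-batch gradients are all assumptions made for this example.

.. code-block:: python

    def mean_squared_error_grad(w, batch):
        """Gradient of mean((w * x - y) ** 2) over one batch with respect to w."""
        return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


    def accumulated_grad(w, data, batch_size, accumulate_grad_batches):
        """Average the gradients of K consecutive batches of size N."""
        total = 0.0
        for k in range(accumulate_grad_batches):
            batch = data[k * batch_size:(k + 1) * batch_size]
            # One gradient computation per small batch; no parameter update yet.
            total += mean_squared_error_grad(w, batch)
        # Dividing by K makes the result match a single batch of size K * N
        # (assumes all small batches have the same size).
        return total / accumulate_grad_batches


    data = [(float(i), 3.0 * i + 1.0) for i in range(8)]  # toy (x, y) pairs
    w = 0.5

    small = accumulated_grad(w, data, batch_size=2, accumulate_grad_batches=4)
    large = mean_squared_error_grad(w, data)  # one batch of size 4 * 2 = 8
    print(abs(small - large) < 1e-9)

With equally sized batches the two gradients agree up to floating-point error, which is why accumulation behaves like a larger batch without the memory cost of holding K x N samples at once.
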
Gradient Clipping
-------------------------------------
Gradient clipping may be enabled to avoid exploding gradients. Specifically, this will `clip the gradient
norm <https://pytorch.org/docs/stable/nn.html#torch.nn.utils.clip_grad_norm_>`_ computed over all model parameters together.

.. note:: See: :ref:`trainer`

.. code-block:: python

    from pytorch_lightning import Trainer

    # DEFAULT (ie: don't clip)
    trainer = Trainer(gradient_clip_val=0)

    # clip gradients with norm above 0.5
    trainer = Trainer(gradient_clip_val=0.5)

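To make "computed over all model parameters together" concrete, here is a minimal plain-Python sketch of norm clipping. It is a simplified illustration, not the PyTorch or Lightning code: the helper ``clip_grad_norm`` and the nested-list gradient layout are hypothetical, and the small epsilon that real implementations typically add to the denominator is omitted.

.. code-block:: python

    import math


    def clip_grad_norm(grads, max_norm):
        """Rescale all gradients in place so their joint L2 norm is at most max_norm.

        Returns the joint norm measured before clipping.
        """
        # A single norm over every gradient entry of every parameter.
        total_norm = math.sqrt(sum(g * g for param in grads for g in param))
        if total_norm > max_norm:
            scale = max_norm / total_norm
            for param in grads:
                for i in range(len(param)):
                    param[i] *= scale
        return total_norm


    grads = [[3.0, 0.0], [0.0, 4.0]]  # gradients of two hypothetical parameters
    norm_before = clip_grad_norm(grads, max_norm=0.5)
    norm_after = math.sqrt(sum(g * g for param in grads for g in param))
    print(norm_before, norm_after)

Because every entry is multiplied by the same factor, the direction of the overall gradient is preserved and only its length is capped.
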