Training Tricks
================
Lightning implements various tricks to help during training.

Accumulate gradients
-------------------------------------
Accumulating gradients runs K small batches of size N, adding up their gradients, before taking a single optimizer step. The result is a large effective batch size of K x N.

.. note:: See: :ref:`trainer`

.. code-block:: python

    from pytorch_lightning import Trainer

    # DEFAULT (i.e. no accumulated grads)
    trainer = Trainer(accumulate_grad_batches=1)

Gradient Clipping
-------------------------------------
Gradient clipping may be enabled to avoid exploding gradients. Specifically, this will clip the gradient norm computed over all model parameters together.

.. note:: See: :ref:`trainer`

.. code-block:: python

    from pytorch_lightning import Trainer

    # DEFAULT (i.e. don't clip)
    trainer = Trainer(gradient_clip_val=0)

    # clip gradients with norm above 0.5
    trainer = Trainer(gradient_clip_val=0.5)
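
To make the two tricks above concrete, here is a minimal pure-Python sketch of the arithmetic involved. It is not Lightning's implementation. The toy model ``y = w * x``, the data, and the helper names ``mse_grad`` and ``clip_by_norm`` are hypothetical and exist only for illustration. The sketch also assumes that each small batch's gradient is scaled by ``1 / K`` while accumulating, so that the result matches the gradient of one batch of size K x N.

```python
import math


def mse_grad(w, batch):
    """Gradient w.r.t. w of the mean squared error for the toy model y = w * x."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def clip_by_norm(grads, max_norm):
    """Rescale all gradients together so their joint L2 norm is at most max_norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm <= max_norm:
        return list(grads)
    scale = max_norm / total_norm
    return [g * scale for g in grads]


# Hypothetical toy data: (x, y) pairs split into K = 2 batches of size N = 2.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
K, N = 2, 2
batches = [data[i * N:(i + 1) * N] for i in range(K)]

w = 0.0

# Accumulate: add each small batch's gradient (scaled by 1 / K, an assumption
# of this sketch) and only then use the result for a single parameter update.
accumulated = 0.0
for batch in batches:
    accumulated += mse_grad(w, batch) / K

# Gradient of one large batch of size K * N, for comparison.
full = mse_grad(w, data)

# Clipping treats all gradients as one vector: [3.0, 4.0] has norm 5.0,
# so both entries are scaled by the same factor 0.5 / 5.0.
clipped = clip_by_norm([3.0, 4.0], max_norm=0.5)
```

In this sketch ``accumulated`` and ``full`` agree, which is the sense in which K batches of size N behave like one batch of size K x N. The clipped gradients keep their direction because every entry is scaled by the same factor.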