From 61617c64d565246f0e9932071a7cb8ba90305a2b Mon Sep 17 00:00:00 2001
From: Matthew Honnibal <honnibal+gh@gmail.com>
Date: Sat, 16 Mar 2019 21:39:02 +0100
Subject: [PATCH] Revert changes to optimizer default hyper-params (WIP)
 (#3415)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

While developing v2.1, I ran a bunch of hyper-parameter search
experiments to find settings that performed well for spaCy's NER and
parser. I ended up changing the default Adam settings from beta1=0.9,
beta2=0.999, eps=1e-8 to beta1=0.8, beta2=0.8, eps=1e-5. This was giving
a small improvement in accuracy (like, 0.4%).

Months later, I ran the models with Prodigy, which uses beam-search
decoding even when the model has been trained with a greedy objective.
The new models performed terribly... So, wtf? After a couple of days of
debugging, I figured out that the new optimizer settings were causing the
model to converge to solutions where the top-scoring class often had
a score of, like, -80. The variance of the weights had gone up
enormously. I guess I needed to update the L2 regularisation as well?
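
For reference, here's roughly what those three knobs do. This is a minimal sketch of the textbook Adam update for a single scalar weight, with bias correction. It is not Thinc's actual `Adam` class, and `ema_window` and `adam_updates` are names made up for this illustration. It doesn't reproduce the -80 scores or explain them. It only shows that beta2 sets how many recent gradients the second-moment average effectively looks at (about 1/(1-beta2)), and that eps only starts to matter once gradients get about as small as eps itself.

```python
import math


def ema_window(beta):
    """Rough number of past steps an exponential moving average weights."""
    return 1.0 / (1.0 - beta)


def adam_updates(grads, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Textbook Adam on one scalar weight; returns the update at each step.

    Illustrative only: assumes the standard bias-corrected formulation,
    which may differ in detail from the optimizer spaCy actually uses.
    """
    m = 0.0  # running mean of gradients (first moment)
    v = 0.0  # running mean of squared gradients (second moment)
    updates = []
    for t, g in enumerate(grads, start=1):
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g * g
        m_hat = m / (1.0 - beta1 ** t)  # bias correction
        v_hat = v / (1.0 - beta2 ** t)
        updates.append(-lr * m_hat / (math.sqrt(v_hat) + eps))
    return updates


# The defaults this patch restores, and the ones it drops.
restored = dict(beta1=0.9, beta2=0.999, eps=1e-8)
dropped = dict(beta1=0.8, beta2=0.8, eps=1e-5)

# A constant stream of very small gradients, the same size as the larger eps.
tiny_grads = [1e-5] * 20
steps_restored = adam_updates(tiny_grads, **restored)
steps_dropped = adam_updates(tiny_grads, **dropped)
```

With beta2=0.999 the squared-gradient average spans roughly 1000 steps, and with beta2=0.8 roughly 5. On the constant 1e-5 gradients above, eps=1e-8 leaves the step at about the full learning rate, while eps=1e-5 roughly halves it.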

Anyway. Let's just revert the change: if the optimizer is finding
such extreme solutions, that seems bad, and not nearly worth the small
improvement in accuracy. This patch also lowers the default gradient
clipping threshold (grad_norm_clip) from 5.0 to 1.0.

Currently training a slate of models, to verify the accuracy change is minimal.
Once the training is complete, we can merge this.

## Checklist
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
---
 spacy/_ml.py | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/spacy/_ml.py b/spacy/_ml.py
index c08dce100..a32d2cf20 100644
--- a/spacy/_ml.py
+++ b/spacy/_ml.py
@@ -48,11 +48,11 @@ def cosine(vec1, vec2):
 
 def create_default_optimizer(ops, **cfg):
     learn_rate = util.env_opt("learn_rate", 0.001)
-    beta1 = util.env_opt("optimizer_B1", 0.8)
-    beta2 = util.env_opt("optimizer_B2", 0.8)
-    eps = util.env_opt("optimizer_eps", 0.00001)
+    beta1 = util.env_opt("optimizer_B1", 0.9)
+    beta2 = util.env_opt("optimizer_B2", 0.999)
+    eps = util.env_opt("optimizer_eps", 1e-8)
     L2 = util.env_opt("L2_penalty", 1e-6)
-    max_grad_norm = util.env_opt("grad_norm_clip", 5.0)
+    max_grad_norm = util.env_opt("grad_norm_clip", 1.0)
     optimizer = Adam(ops, learn_rate, L2=L2, beta1=beta1, beta2=beta2, eps=eps)
     optimizer.max_grad_norm = max_grad_norm
     optimizer.device = ops.device