Note CoreNLP tokenizer correction on website

2018-07-02 11:35:31 +02:00 · a85620a731
parent 06c6dc6fbc
commit a85620a731
1 changed file with 9 additions and 3 deletions
--- a/website/usage/_facts-figures/_benchmarks.jade
+++ b/website/usage/_facts-figures/_benchmarks.jade
@@ -157,7 +157,13 @@ p

 +infobox("Important note", "⚠️")
    |  This evaluation was conducted in 2015. We're working on benchmarks on
-    |  current CPU and GPU hardware.
+    |  current CPU and GPU hardware. In the meantime, we're grateful to the
+    |  Stanford folks for drawing our attention to what seems
+    |  to be #[+a("https://nlp.stanford.edu/software/tokenizer.html#Speed") a long-standing error] 
+    |  in our CoreNLP benchmarks, especially for their 
+    |  tokenizer. Until we run corrected experiments, we have updated the table
+    |  using their figures.
+

 +aside("Methodology")
    |  #[strong Set up:] 100,000 plain-text documents were streamed from an
@@ -183,14 +189,14 @@ p
    +row
        +cell #[strong spaCy]
        each data in [ "0.2ms", "1ms", "19ms"]
-            +cell("num") #[strong=data]
+            +cell("num")=data

        each data in ["1x", "1x", "1x"]
            +cell("num")=data

    +row
        +cell CoreNLP
-        each data in ["2ms", "10ms", "49ms", "10x", "10x", "2.6x"]
+        each data in ["0.18ms", "10ms", "49ms", "0.9x", "10x", "2.6x"]
            +cell("num")=data
    +row
        +cell ZPar