mirror of https://github.com/explosion/spaCy.git
Note CoreNLP tokenizer correction on website
parent 06c6dc6fbc
commit a85620a731
@@ -157,7 +157,13 @@ p
 +infobox("Important note", "⚠️")
-    | This evaluation was conducted in 2015. We're working on benchmarks on
-    | current CPU and GPU hardware.
+    | This evaluation was conducted in 2015. We're working on benchmarks on
+    | current CPU and GPU hardware. In the meantime, we're grateful to the
+    | Stanford folks for drawing our attention to what seems
+    | to be #[+a("https://nlp.stanford.edu/software/tokenizer.html#Speed") a long-standing error]
+    | in our CoreNLP benchmarks, especially for their
+    | tokenizer. Until we run corrected experiments, we have updated the table
+    | using their figures.

 +aside("Methodology")
     | #[strong Set up:] 100,000 plain-text documents were streamed from an
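As an illustration of how per-document tokenizer timings like the ones in the table below could be reproduced, here is a minimal Python sketch. It is not the original 2015 benchmark harness: the model name, corpus, and document count are placeholders, and it assumes `en_core_web_sm` has been installed via `python -m spacy download en_core_web_sm`.

```python
import time

import spacy

# Model name is an assumption; the published methodology streamed
# 100,000 plain-text documents from a database instead.
nlp = spacy.load("en_core_web_sm")

# Placeholder corpus standing in for the benchmark documents.
docs = ["This is a sample document. " * 20] * 1000

start = time.perf_counter()
for text in docs:
    nlp.tokenizer(text)  # tokenizer only, no tagger or parser
elapsed = time.perf_counter() - start

# Report milliseconds per document, matching the units in the table.
print(f"{1000 * elapsed / len(docs):.3f} ms per document")
```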
@@ -183,14 +189,14 @@ p
     +row
         +cell #[strong spaCy]
         each data in [ "0.2ms", "1ms", "19ms"]
-            +cell("num") #[strong=data]
+            +cell("num")=data

         each data in ["1x", "1x", "1x"]
             +cell("num")=data

     +row
         +cell CoreNLP
-        each data in ["2ms", "10ms", "49ms", "10x", "10x", "2.6x"]
+        each data in ["0.18ms", "10ms", "49ms", "0.9x", "10x", "2.6x"]
             +cell("num")=data
     +row
         +cell ZPar
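The relative-speed columns in the corrected row follow directly from the absolute per-document times: each system's time is divided by spaCy's time for the same task. A small sketch of that arithmetic, using only the figures from the diff above (the helper name is ours, not part of the benchmark code):

```python
# Per-document times (ms) taken from the table rows in the diff above.
spacy_ms   = {"tokenize": 0.2,  "tag": 1.0,  "parse": 19.0}
corenlp_ms = {"tokenize": 0.18, "tag": 10.0, "parse": 49.0}

def relative_speed(system_ms, baseline_ms):
    """Express each time as a multiple of the spaCy baseline."""
    return {task: round(system_ms[task] / baseline_ms[task], 1)
            for task in baseline_ms}

print(relative_speed(corenlp_ms, spacy_ms))
# {'tokenize': 0.9, 'tag': 10.0, 'parse': 2.6}
```

This reproduces the updated "0.9x", "10x", and "2.6x" cells: with the corrected 0.18ms figure, CoreNLP's tokenizer comes out slightly faster than spaCy's, which is exactly what the infobox note acknowledges.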