mirror of https://github.com/explosion/spaCy.git
29 lines
604 B
ReStructuredText
Why
===

Benchmarks
----------

Efficiency
----------

+--------+-------+--------------+--------------+
| System | Time  | Words/second | Speed Factor |
+--------+-------+--------------+--------------+
| NLTK   | 6m4s  | 89,000       | 1.00         |
+--------+-------+--------------+--------------+
| spaCy  | 9.5s  | 3,093,000    | 38.30        |
+--------+-------+--------------+--------------+

Accuracy
--------

The comparison was run on 30 million words from the English Gigaword corpus,
on a MacBook Air. For context, calling ``string.split()`` on the same data
completes in about 5s.

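As a rough illustration of how figures like these can be derived, the sketch
below times a plain ``str.split()`` baseline and computes a speed factor from
two wall-clock times. The helper names (``time_split``, ``words_per_second``,
``speed_factor``) are hypothetical and are not part of spaCy or NLTK; the only
inputs taken from the table are the two reported times.

```python
import time


def time_split(text):
    """Time a whitespace split of `text`; returns (token_count, seconds)."""
    start = time.perf_counter()
    tokens = text.split()
    elapsed = time.perf_counter() - start
    return len(tokens), elapsed


def words_per_second(n_words, seconds):
    """Throughput: words processed per second of wall-clock time."""
    return n_words / seconds


def speed_factor(baseline_seconds, system_seconds):
    """How many times faster a system is than the baseline."""
    return baseline_seconds / system_seconds


# Times reported in the table: NLTK 6m4s, spaCy 9.5s.
nltk_seconds = 6 * 60 + 4
spacy_seconds = 9.5
factor = speed_factor(nltk_seconds, spacy_seconds)  # roughly 38.3

# Baseline on a small synthetic sample (an assumption; not the Gigaword data).
sample = "the quick brown fox " * 1000
n_tokens, split_seconds = time_split(sample)
```

Dividing the two reported times, 364s by 9.5s, gives roughly 38.3, which
matches the Speed Factor column. Absolute timings depend on the hardware and
the input, so the baseline is only meaningful when run on the same data as the
systems being compared.
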
Pros and Cons
-------------
