* Update sales copy

Matthew Honnibal 2014-11-03 13:54:18 +11:00
parent ae52f9f38c
commit b8d5881333
1 changed file with 14 additions and 6 deletions

@@ -9,13 +9,21 @@ spaCy NLP Tokenizer and Lexicon
spaCy is a library for industrial strength NLP in Python and Cython. Its core
values are efficiency, accuracy and minimalism.
* Efficiency: spaCy is
* Efficiency: spaCy is TODOx faster than the Stanford tools, and TODOx faster
than NLTK. You won't find faster NLP tools. Using spaCy will save you
thousands in server costs, and will force you to make fewer compromises.
It does not attempt to be comprehensive,
or to provide lavish syntactic sugar. This isn't a library that covers 43 known
algorithms to do X. You get 1 --- the best one --- with a simple, low-level interface.
For commercial users, the code is free but the data isn't. For researchers, both
are free and always will be.
* Accuracy: All spaCy tools are within 0.5% of the current published
state-of-the-art, on both news and web text. NLP moves fast, so always check
the numbers --- and don't settle for tools that aren't backed by
rigorous recent evaluation. An algorithm that was "close enough to state-of-the-art"
5 years ago is probably crap by today's standards.
* Minimalism: This isn't a library that covers 43 known algorithms to do X. You
get 1 --- the best one --- with a simple, low-level interface. This keeps the
code-base small and concrete. Our Python APIs use lists and
dictionaries, and our C/Cython APIs use arrays and simple structs.
Comparison
----------