spaCy/docs/source/index.rst

48 lines
1.9 KiB
ReStructuredText

.. spaCy documentation master file, created by
sphinx-quickstart on Tue Aug 19 16:27:38 2014.
You can adapt this file completely to your liking, but it should at least
contain the root `toctree` directive.
spaCy NLP Tokenizer and Lexicon
================================
spaCy is a library for industrial strength NLP in Python and Cython. Its core
values are efficiency, accuracy and minimalism.
* Efficiency: spaCy is TODOx faster than the Stanford tools, and TODOx faster
than NLTK. You won't find faster NLP tools. Using spaCy will save you
thousands in server costs, and will force you to make fewer compromises.
* Accuracy: All spaCy tools are within 0.5% of the current published
state-of-the-art, on both news and web text. NLP moves fast, so always check
the numbers --- and don't settle for tools that aren't backed by
rigorous recent evaluation. An algorithm that was "close enough to state-of-the-art"
5 years ago is probably crap by today's standards.
* Minimalism: This isn't a library that covers 43 known algorithms to do X. You
get 1 --- the best one --- with a simple, low-level interface. This keeps the
code-base small and concrete. Our Python APIs use lists and
dictionaries, and our C/Cython APIs use arrays and simple structs.
Comparison
----------
+-------------+-------------+---+-----------+--------------+
| POS taggers | Speed (w/s) | % Acc. (news) | % Acc. (web) |
+-------------+-------------+---------------+--------------+
| spaCy | | | |
+-------------+-------------+---------------+--------------+
| Stanford | 16,000 | | |
+-------------+-------------+---------------+--------------+
| NLTK | | | |
+-------------+-------------+---------------+--------------+
.. toctree::
:hidden:
:maxdepth: 3
what/index.rst
why/index.rst
how/index.rst