mirror of https://github.com/explosion/spaCy.git
48 lines
1.9 KiB
ReStructuredText
48 lines
1.9 KiB
ReStructuredText
.. spaCy documentation master file, created by
|
|
sphinx-quickstart on Tue Aug 19 16:27:38 2014.
|
|
You can adapt this file completely to your liking, but it should at least
|
|
contain the root `toctree` directive.
|
|
|
|
spaCy NLP Tokenizer and Lexicon
|
|
================================
|
|
|
|
spaCy is a library for industrial strength NLP in Python. Its core
|
|
values are:
|
|
|
|
* **Efficiency**: You won't find faster NLP tools. For shallow analysis, it's 10x
|
|
faster than Stanford Core NLP, and over 200x faster than NLTK. Its parser is
|
|
over 100x faster than Stanford's.
|
|
|
|
* **Accuracy**: All spaCy tools are within 0.5% of the current published
|
|
state-of-the-art, on both news and web text. NLP moves fast, so always check
|
|
the numbers --- and don't settle for tools that aren't backed by
|
|
rigorous recent evaluation.
|
|
|
|
* **Minimalism**: This isn't a library that covers 43 known algorithms to do X. You
|
|
get 1 --- the best one --- with a simple, low-level interface. This keeps the
|
|
code-base small and concrete. Our Python APIs use lists and
|
|
dictionaries, and our C/Cython APIs use arrays and simple structs.
|
|
|
|
|
|
Comparison
|
|
----------
|
|
|
|
+----------------+-------------+--------+---------------+--------------+
|
|
| Tokenize & Tag | Speed (w/s) | Memory | % Acc. (news) | % Acc. (web) |
|
|
+----------------+-------------+--------+---------------+--------------+
|
|
| spaCy | 107,000 | 1.3gb | 96.7 | |
|
|
+----------------+-------------+--------+---------------+--------------+
|
|
| Stanford | 8,000 | 1.5gb | 96.7 | |
|
|
+----------------+-------------+--------+---------------+--------------+
|
|
| NLTK | 543 | 61mb | 94.0 | |
|
|
+----------------+-------------+--------+---------------+--------------+
|
|
|
|
|
|
.. toctree::
|
|
:hidden:
|
|
:maxdepth: 3
|
|
|
|
what/index.rst
|
|
why/index.rst
|
|
how/index.rst
|