💫 Industrial-strength Natural Language Processing (NLP) in Python



Matthew Honnibal 65934b7cd4 * Enforce import of ujson in strings.pyx, because otherwise it's too slow		2015-11-04 00:32:02 +11:00
appveyor@9f94a16f0e	Adding submodule spaCy-appveyor-toolkit	2015-10-25 20:22:49 +03:00
bin	* Update conll_train.py script for spaCy v0.97	2015-10-31 00:53:51 +11:00
contributors	Add contributor.	2015-10-07 17:55:46 -07:00
corpora/en	* Add wordnet	2015-09-21 19:06:48 +10:00
examples	* Add simple deep feed-forward neural network text classification example.	2015-10-19 23:44:49 +11:00
lang_data	* Fix non-breaking space in specials.json	2015-10-19 12:46:11 +11:00
services	* Add displacy service	2015-10-28 17:36:11 +01:00
spacy	* Enforce import of ujson in strings.pyx, because otherwise it's too slow	2015-11-04 00:32:02 +11:00
website	* Replace deprecated repvec reference in twitter-filter	2015-11-03 03:21:26 +11:00
.appveyor.yml	Added project dir to PYTHONPATH	2015-10-25 21:51:33 +03:00
.gitignore	Added Windows file to .gitignore	2015-10-13 10:58:30 +03:00
.gitmodules	Switching to henningpeters/spaCy-appveyor-toolkit	2015-10-26 00:16:35 +03:00
.travis.yml	* Update travis.yml for new tests path	2015-10-26 00:31:04 +11:00
LICENSE.txt	* Change from AGPL to MIT	2015-09-28 07:37:12 +10:00
MANIFEST.in	* Add manifest file	2015-01-30 16:49:02 +11:00
README-MSVC.txt	Small addition to MSVC readme	2015-10-25 23:05:11 +03:00
README.md	Update README.md	2015-10-27 01:32:40 +11:00
bootstrap_python_env.sh	* Add bootstrap script	2015-03-16 14:01:36 -04:00
fabfile.py	* Fix prebuild command	2015-11-03 07:30:33 +01:00
requirements.txt	* Update requirements	2015-11-03 02:40:01 +11:00
setup.py	* Rename spans.pyx to span.pyx	2015-11-04 00:14:40 +11:00
wordnet_license.txt	* Add WordNet license file	2015-02-01 16:11:53 +11:00

README.md

spaCy: Industrial-strength NLP

spaCy is a library for advanced natural language processing in Python and Cython.

Documentation and details: http://spacy.io/

spaCy is built on the very latest research, but it isn't researchware. It was designed from day 1 to be used in real products. It's commercial open-source software, released under the MIT license.

Features

Labelled dependency parsing (91.8% accuracy on OntoNotes 5)
Named entity recognition (82.6% accuracy on OntoNotes 5)
Part-of-speech tagging (97.1% accuracy on OntoNotes 5)
Easy to use word vectors
All strings mapped to integer IDs
Export to numpy data arrays
Alignment to the original string is maintained, making markup easy to calculate
A range of easy-to-use orthographic features
No pre-processing required. spaCy takes raw text as input, warts and newlines and all.
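Two of the features above, integer IDs for strings and alignment to the original string, can be illustrated with a small, self-contained sketch. This is not spaCy's API or implementation: `StringTable`, `tokenize_with_offsets` and `mark_up` are hypothetical names, and the whitespace tokenizer is only a stand-in for spaCy's real tokenizer.

```python
import re


class StringTable:
    """Hypothetical interning table: each distinct string gets a stable integer ID."""

    def __init__(self):
        self._ids = {}
        self._strings = []

    def add(self, string):
        # Return the existing ID, or assign the next free one.
        if string not in self._ids:
            self._ids[string] = len(self._strings)
            self._strings.append(string)
        return self._ids[string]

    def id_of(self, string):
        # String -> ID lookup without adding; None if the string is unknown.
        return self._ids.get(string)

    def __getitem__(self, key):
        # ID -> string lookup.
        return self._strings[key]


def tokenize_with_offsets(text, strings):
    """Split on whitespace (a stand-in tokenizer), keeping each token's
    string ID and its character offset into the original `text`."""
    return [(strings.add(m.group()), m.start()) for m in re.finditer(r"\S+", text)]


def mark_up(text, tokens, strings, target, open_tag, close_tag):
    """Wrap every token equal to `target` in tags, using the stored offsets
    to slice the untouched original string."""
    target_id = strings.id_of(target)
    pieces = []
    last = 0
    for orth_id, start in tokens:
        if orth_id == target_id:
            end = start + len(strings[orth_id])
            pieces.append(text[last:start])
            pieces.append(open_tag + text[start:end] + close_tag)
            last = end
    pieces.append(text[last:])
    return "".join(pieces)


# Raw text, irregular whitespace and trailing newline included.
text = "the cat saw  the dog\n"
strings = StringTable()
tokens = tokenize_with_offsets(text, strings)
highlighted = mark_up(text, tokens, strings, "the", "<b>", "</b>")
print(tokens)       # [(0, 0), (1, 4), (2, 8), (0, 13), (3, 17)]
print(highlighted)  # <b>the</b> cat saw  <b>the</b> dog
```

Both occurrences of "the" share ID 0, so comparisons are integer comparisons rather than string comparisons. Because each token keeps its character offset, markup is inserted by slicing the original text, and the double space and trailing newline survive unchanged.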

Top Performance

Fastest in the world: <50ms per document. No faster system has ever been announced.
Accuracy within 1% of the current state of the art on all tasks performed (parsing, named entity recognition, part-of-speech tagging). The only more accurate systems are at least an order of magnitude slower.

Supports

CPython 2.7
CPython 3.4
OSX
Linux
Cygwin

Want to support:

Visual Studio

Difficult to support:

PyPy 2.7
PyPy 3.4