💫 Industrial-strength Natural Language Processing (NLP) in Python



Matthew Honnibal 27f988b167 * Remove the vectors option to Vocab, preferring to either load vectors from disk, or set them on the Lexeme objects.		2015-09-15 14:41:48 +10:00
bin	* Copy tag_map.json in init_model	2015-09-12 05:54:02 +02:00
contributors	Merge pull request #85 from NSchrading/master	2015-09-07 09:05:19 +10:00
corpora/en	* Add clusters file	2015-07-23 09:35:56 +02:00
examples	* Begin rewriting twitter_filter examples	2015-08-22 22:12:26 +02:00
lang_data	* Bug fix to gazetteer.json	2015-09-10 14:50:44 +02:00
spacy	* Remove the vectors option to Vocab, preferring to either load vectors from disk, or set them on the Lexeme objects.	2015-09-15 14:41:48 +10:00
tests	* Add tests for new vectors functionality	2015-09-14 17:48:13 +10:00
.gitignore	* Ignore keys and other things	2015-08-22 22:12:07 +02:00
.travis.yml	* Remove OSX from build matrix	2015-09-13 00:02:03 +02:00
LICENSE.txt	Tweak line spacing	2015-04-19 13:01:38 -07:00
MANIFEST.in	* Add manifest file	2015-01-30 16:49:02 +11:00
README.md	* Upd readme	2015-07-01 15:39:38 +02:00
bootstrap_python_env.sh	* Add bootstrap script	2015-03-16 14:01:36 -04:00
dev_setup.py	Tweak line spacing	2015-04-19 13:01:38 -07:00
fabfile.py	* Uploade prebuild command in fabfile	2015-09-13 01:27:49 +02:00
requirements.txt	* Require preshed 0.41	2015-07-25 22:36:43 +02:00
setup.py	* Inc version	2015-09-13 01:26:29 +02:00
wordnet_license.txt	* Add WordNet license file	2015-02-01 16:11:53 +11:00


spaCy: Industrial-strength NLP

spaCy is a library for advanced natural language processing in Python and Cython.

Documentation and details: http://spacy.io/

spaCy is built on the very latest research, but it isn't researchware. It was designed from day 1 to be used in real products. You can buy a commercial license, or you can use it under the AGPL.

Features

Labelled dependency parsing (91.8% accuracy on OntoNotes 5)
Named entity recognition (82.6% accuracy on OntoNotes 5)
Part-of-speech tagging (97.1% accuracy on OntoNotes 5)
Easy to use word vectors
All strings mapped to integer IDs
Export to numpy data arrays
Alignment to the original string is maintained, making mark-up offsets easy to calculate
Range of easy-to-use orthographic features
No pre-processing required. spaCy takes raw text as input, warts and newlines and all.
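
To make two of the features above concrete (strings mapped to integer IDs, and alignment kept with the original string), here is a minimal plain-Python sketch. It is not spaCy's API or implementation: `StringStore`, `tokenize`, and the regex-based splitting are hypothetical names and choices made only for illustration.

```python
import re


class StringStore:
    """Hypothetical interning table: each distinct string gets one integer ID."""

    def __init__(self):
        self._ids = {}
        self._strings = []

    def add(self, text):
        # Reuse the existing ID if the string has been seen before.
        if text not in self._ids:
            self._ids[text] = len(self._strings)
            self._strings.append(text)
        return self._ids[text]

    def __getitem__(self, string_id):
        return self._strings[string_id]


def tokenize(text, store):
    """Split raw text into (string_id, start, end) triples.

    Whitespace, including newlines, is skipped but never removed from
    `text`, so text[start:end] always recovers the token.
    """
    tokens = []
    for match in re.finditer(r"\w+|[^\w\s]", text):
        tokens.append((store.add(match.group()), match.start(), match.end()))
    return tokens


store = StringStore()
raw = "The cat sat.\nThe cat slept."
tokens = tokenize(raw, store)

# Wrap every occurrence of "cat" in <b> tags. Working right to left
# keeps the offsets of earlier tokens valid while the string grows.
cat_id = store.add("cat")
marked = raw
for string_id, start, end in reversed(tokens):
    if string_id == cat_id:
        marked = marked[:start] + "<b>" + marked[start:end] + "</b>" + marked[end:]
print(marked)
```

Because every token carries its character offsets into the untouched input, adding mark-up is a matter of slicing the original string. Comparing integer IDs instead of strings keeps the per-token check cheap.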

Top Performance

Fastest in the world: <50ms per document. No faster system has ever been announced.
Accuracy within 1% of the current state of the art on all tasks performed (parsing, named entity recognition, part-of-speech tagging). The only more accurate systems are an order of magnitude slower or more.

Supports

CPython 2.7
CPython 3.4
OSX
Linux
Cygwin

Want to support:

Visual Studio

Difficult to support:

PyPy 2.7
PyPy 3.4