💫 Industrial-strength Natural Language Processing (NLP) in Python
Matthew Honnibal 5edac11225 * Wrap self.parse in nogil, and break if an invalid move is predicted. The invalid break is a work-around that papers over likely bugs, but we can't easily break in the nogil block, and otherwise we'll get an infinite loop. Need to set this as an error flag. 2015-09-06 04:15:00 +02:00
bin * Copy gazetteer file in init_model 2015-08-06 16:07:23 +02:00
contributors Add CLA for suchow 2015-04-19 13:01:38 -07:00
corpora/en * Add clusters file 2015-07-23 09:35:56 +02:00
docs * Work on reorganization of docs 2015-08-08 19:14:32 +02:00
lang_data/en * Add spaCy to gazetteer 2015-08-08 23:30:49 +02:00
spacy * Wrap self.parse in nogil, and break if an invalid move is predicted. The invalid break is a work-around that papers over likely bugs, but we can't easily break in the nogil block, and otherwise we'll get an infinite loop. Need to set this as an error flag. 2015-09-06 04:15:00 +02:00
tests * Fix test partial parse 2015-08-08 23:45:36 +02:00
.gitignore * Ignore spacy/serialize/*.cpp 2015-07-17 01:36:49 +02:00
.travis.yml * Fix travis.yml 2015-07-24 01:43:27 +02:00
LICENSE.txt Tweak line spacing 2015-04-19 13:01:38 -07:00
MANIFEST.in * Add manifest file 2015-01-30 16:49:02 +11:00
README.md * Upd readme 2015-07-01 15:39:38 +02:00
bootstrap_python_env.sh * Add bootstrap script 2015-03-16 14:01:36 -04:00
dev_setup.py Tweak line spacing 2015-04-19 13:01:38 -07:00
fabfile.py * Update prebuild command, for shell bug 2015-07-27 01:52:04 +02:00
requirements.txt * Require preshed 0.41 2015-07-25 22:36:43 +02:00
setup.py * Compile spacy.matcher 2015-08-05 23:48:11 +02:00
wordnet_license.txt * Add WordNet license file 2015-02-01 16:11:53 +11:00

README.md

spaCy: Industrial-strength NLP

spaCy is a library for advanced natural language processing in Python and Cython.

Documentation and details: http://spacy.io/

spaCy is built on the very latest research, but it isn't researchware. It was designed from day 1 to be used in real products. You can buy a commercial license, or you can use it under the AGPL.

Features

  • Labelled dependency parsing (91.8% accuracy on OntoNotes 5)
  • Named entity recognition (82.6% accuracy on OntoNotes 5)
  • Part-of-speech tagging (97.1% accuracy on OntoNotes 5)
  • Easy to use word vectors
  • All strings mapped to integer IDs
  • Export to numpy data arrays
  • Alignment to the original string is maintained, so mark-up offsets are easy to calculate
  • Range of easy-to-use orthographic features
  • No pre-processing required. spaCy takes raw text as input, warts and newlines and all.
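
To make the integer-ID and alignment features above more concrete, here is a minimal, self-contained sketch in plain Python. It does not use spaCy's API: `ToyStringStore`, `tokenize`, and `detokenize` are hypothetical names invented for this illustration, and the regex tokenizer is a simplification that assumes the text does not begin with whitespace. It only shows the general idea of mapping strings to integer IDs while keeping enough offset and whitespace information to recover the raw input exactly, newlines included.

```python
import re


class ToyStringStore:
    """Hypothetical helper (not spaCy's API): maps each distinct string to an integer ID."""

    def __init__(self):
        self._ids = {}       # string -> integer ID
        self._strings = []   # integer ID -> string

    def add(self, string):
        # Reuse the existing ID if the string has been seen before.
        if string not in self._ids:
            self._ids[string] = len(self._strings)
            self._strings.append(string)
        return self._ids[string]

    def lookup(self, idx):
        return self._strings[idx]


def tokenize(text, store):
    """Return (string_id, start_offset, trailing_whitespace) triples.

    Assumption: text does not start with whitespace (leading whitespace is skipped).
    """
    tokens = []
    # A token is a run of word characters or a single punctuation character,
    # followed by whatever whitespace (spaces, newlines) comes after it.
    for match in re.finditer(r"(\w+|[^\w\s])(\s*)", text):
        word, space = match.group(1), match.group(2)
        tokens.append((store.add(word), match.start(1), space))
    return tokens


def detokenize(tokens, store):
    """Rebuild the raw text from the IDs and the stored trailing whitespace."""
    return "".join(store.lookup(idx) + space for idx, _, space in tokens)


# Raw text goes in as-is, newline included.
text = "Hello, world!\nHello again."
store = ToyStringStore()
tokens = tokenize(text, store)
ids = [idx for idx, _, _ in tokens]
rebuilt = detokenize(tokens, store)
```

Because every token keeps its character offset into the original string, a mark-up tool can translate token-level annotations back into character spans of the raw text without re-scanning it.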

Top Performance

  • Fastest in the world: under 50 ms per document. No faster system has ever been announced.
  • Accuracy within 1% of the current state of the art on all tasks performed (parsing, named entity recognition, part-of-speech tagging). The only more accurate systems are an order of magnitude slower or more.

Supports

  • CPython 2.7
  • CPython 3.4
  • OS X
  • Linux
  • Cygwin

Want to support:

  • Visual Studio

Difficult to support:

  • PyPy 2.7
  • PyPy 3.4