💫 Industrial-strength Natural Language Processing (NLP) in Python
maxirmx 815994a212 MSVC x86-64 Pyton 2.7 dirty build 2015-10-10 17:32:44 +03:00
bin * Add script to train models off the UD treebanks. Note that the UD data is restricted to research purposes only, and should only be used to train models for academic experiments. 2015-10-08 12:01:08 +11:00
contributors Add contributor. 2015-10-07 17:55:46 -07:00
corpora/en * Add wordnet 2015-09-21 19:06:48 +10:00
examples * Whitespace 2015-10-06 10:37:07 +11:00
lang_data * Start adding auxiliaries to morphs.json 2015-09-27 16:56:34 +10:00
spacy Merge pull request #129 from chrisdubois/patch-1 2015-10-08 12:04:41 +11:00
tests * Add a test for Issue #118: Matcher behaves unpredictably with overlapping entities 2015-10-01 16:21:00 +10:00
website * Set details(open=true) on docs while we redesign 2015-09-30 11:48:15 +10:00
.gitignore * Add sass-cache to gitignore 2015-09-24 18:14:21 +10:00
.travis.yml proposal for doctests 2015-09-24 16:57:11 +02:00
LICENSE.txt * Change from AGPL to MIT 2015-09-28 07:37:12 +10:00
MANIFEST.in * Add manifest file 2015-01-30 16:49:02 +11:00
README.md * Fix typo in README 2015-09-29 23:02:08 +10:00
bootstrap_python_env.sh * Add bootstrap script 2015-03-16 14:01:36 -04:00
build-Python27.bat MSVC x86-64 Pyton 2.7 dirty build 2015-10-10 17:32:44 +03:00
fabfile.py * Update the publish command, so that it creates a git tag 2015-09-22 02:26:10 +02:00
requirements.txt MSVC x86-64 Pyton 2.7 dirty build 2015-10-10 17:32:44 +03:00
setup.py MSVC x86-64 Pyton 2.7 dirty build 2015-10-10 17:32:44 +03:00
wordnet_license.txt * Add WordNet license file 2015-02-01 16:11:53 +11:00

README.md

spaCy: Industrial-strength NLP

spaCy is a library for advanced natural language processing in Python and Cython.

Documentation and details: http://spacy.io/

spaCy is built on the very latest research, but it isn't researchware. It was designed from day 1 to be used in real products. It's commercial open-source software, released under the MIT license.

Features

  • Labelled dependency parsing (91.8% accuracy on OntoNotes 5)
  • Named entity recognition (82.6% accuracy on OntoNotes 5)
  • Part-of-speech tagging (97.1% accuracy on OntoNotes 5)
  • Easy to use word vectors
  • All strings mapped to integer IDs
  • Export to numpy data arrays
  • Alignment maintained to the original string, so mark-up offsets are easy to calculate
  • Range of easy-to-use orthographic features
  • No pre-processing required. spaCy takes raw text as input, warts and newlines and all.
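
To make two of the features above concrete (strings mapped to integer IDs, and alignment to the original string), here is a minimal sketch in plain Python. It does not use spaCy and is not spaCy's implementation: `StringStore`, `tokenize`, and `mark_up` are hypothetical names invented for this illustration, and the regular-expression tokenizer is a deliberately crude stand-in.

```python
import re


class StringStore(object):
    """Hypothetical interning table: each distinct string gets one integer ID."""

    def __init__(self):
        self._ids = {}       # string -> integer ID
        self._strings = []   # integer ID -> string

    def add(self, string):
        # Assign the next free ID the first time a string is seen.
        if string not in self._ids:
            self._ids[string] = len(self._strings)
            self._strings.append(string)
        return self._ids[string]

    def __getitem__(self, string_id):
        return self._strings[string_id]


def tokenize(text, store):
    """Split raw text into (string_id, start_offset) pairs.

    Whitespace, including newlines, is skipped but never edited out of
    `text`, so every offset still points into the original string.
    """
    pattern = re.compile(r"\w+|[^\w\s]")
    return [(store.add(m.group()), m.start()) for m in pattern.finditer(text)]


def mark_up(text, tokens, store, target, open_tag="<b>", close_tag="</b>"):
    """Wrap every token equal to `target` in tags, using the stored offsets."""
    target_id = store.add(target)
    pieces = []
    cursor = 0
    for string_id, start in tokens:
        if string_id == target_id:  # integer comparison, not string comparison
            end = start + len(store[string_id])
            pieces.extend([text[cursor:start], open_tag, text[start:end], close_tag])
            cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


store = StringStore()
raw = "Hello,  world!\nHello again."
tokens = tokenize(raw, store)
print(mark_up(raw, tokens, store, "Hello"))
```

Because each token records where it starts in the raw input, the tags land in the right place even though the text contains a double space and a newline. The integer IDs are also what would make a fixed-width numeric export straightforward.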

Top Performance

  • Fastest in the world: <50ms per document. No faster system has ever been announced.
  • Accuracy within 1% of the current state of the art on all tasks performed (parsing, named entity recognition, part-of-speech tagging). The only more accurate systems are an order of magnitude slower or more.

Supports

  • CPython 2.7
  • CPython 3.4
  • OS X
  • Linux
  • Cygwin

Want to support:

  • Visual Studio

Difficult to support:

  • PyPy 2.7
  • PyPy 3.4