From d8ef2d6b614e0cf27c5b1240574ff960836c496c Mon Sep 17 00:00:00 2001
From: Matthew Honnibal
Date: Wed, 1 Jul 2015 15:27:37 +0200
Subject: [PATCH] * Upd README.md

---
 README.md | 41 +++++++++++++++++++++++++----------------
 1 file changed, 25 insertions(+), 16 deletions(-)

diff --git a/README.md b/README.md
index 249f7a6fb..d2006cf48 100644
--- a/README.md
+++ b/README.md
@@ -1,29 +1,38 @@
 spaCy
 =====

-http://spacy.io
+spaCy is a library for industrial-strength NLP in Python and Cython.

-The fastest natural language understanding pipeline available. Focus on good API, easy installation, and documentation.
+Documentation and details: http://spacy.io/

-Commercial licenses available, otherwise under AGPL.
+spaCy is built on the very latest research, but it isn't researchware. It was
+designed from day 1 to be used in real products.
+
+I left academia to make spaCy my full-time job. You can buy a commercial
+license, or you can use it under the AGPL.

-2015-06-24 v0.86
-----------------
+Features
+--------

-* Improvements in parser accuracy
+* Labelled dependency parsing (91.8% accuracy on OntoNotes 5)
+* Named entity recognition (82.6% accuracy on OntoNotes 5)
+* Part-of-speech tagging (97.1% accuracy on OntoNotes 5)
+* Easy to use word vectors
+* All strings mapped to integer IDs
+* Export to numpy data arrays
+* Alignment maintained to original string, ensuring easy mark up calculation
+* Range of easy-to-use orthographic features.
+* No pre-processing required. spaCy takes raw text as input, warts and newlines and all.

-2015-06-08 v0.85
-----------------
-
-* Improvements in parser accuracy
-
-2015-05-12 v0.84
-----------------
-
-* Bug fixes for parsing
-* Bug fixes for named entity recognition
+Top Pefomance
+-------------

+* Fastest in the world: <50ms per document. No faster system has ever been
+  announced.
+* Accuracy within 1% of the current state of the art on all tasks performed
+  (parsing, named entity recognition, part-of-speech tagging). The only more
+  accurate systems are an order of magnitude slower or more.

 Supports
 --------