Minor copyediting

Jordan Suchow 2015-04-19 01:56:32 -07:00
parent 7bddd15e27
commit 1b79d947b9
7 changed files with 17 additions and 19 deletions

View File

@@ -8,12 +8,12 @@ python:
 - "2.7"
 - "3.4"
-# command to install dependencies
+# install dependencies
 install:
 - "pip install --upgrade setuptools"
 - "pip install -r requirements.txt"
 - "export PYTHONPATH=`pwd`"
 - "python setup.py build_ext --inplace"
-# command to run tests
+# run tests
 script:
 - py.test tests/

View File

@@ -3,20 +3,18 @@ spaCy
 http://honnibal.github.io/spaCy
-Fast, state-of-the-art natural language processing pipeline. Commercial licenses available, or use under AGPL.
+A pipeline for fast, state-of-the-art natural language processing. Commercial licenses available, otherwise under AGPL.
 Version 0.80 released
 ---------------------
 2015-04-13
-* Preliminary named entity recognition support. Accuracy is currently
-substantially behind the current state-of-the-art. I'm working on
-improvements.
+* Preliminary support for named-entity recognition. Its accuracy is substantially behind the state-of-the-art. I'm working on improvements.
 * Better sentence boundary detection, drawn from the syntactic structure.
-* Lots of bug fixes
+* Lots of bug fixes.
 Supports:

View File

@@ -28,14 +28,14 @@ can access an excellent set of pre-computed orthographic and distributional feat
 >>> are.check_flag(en.CAN_NOUN)
 False
-spaCy makes it easy to write very efficient NLP applications, because your feature
+spaCy makes it easy to write efficient NLP applications, because your feature
 functions have to do almost no work: almost every lexical property you'll want
 is pre-computed for you. See the tutorial for an example POS tagger.
 Benchmark
 ---------
-The tokenizer itself is also very efficient:
+The tokenizer itself is also efficient:
 +--------+-------+--------------+--------------+
 | System | Time | Words/second | Speed Factor |
@@ -56,7 +56,7 @@ Pros:
 - All tokens come with indices into the original string
 - Full unicode support
-- Extensible to other languages
+- Extendable to other languages
 - Batch operations computed efficiently in Cython
 - Cython API
 - numpy interoperability

View File

@@ -135,7 +135,7 @@ lexical types.
 In a sample of text, vocabulary size grows exponentially slower than word
 count. So any computations we can perform over the vocabulary and apply to the
-word count are very efficient.
+word count are efficient.
 Part-of-speech Tagger

View File

@@ -37,7 +37,7 @@ tokenizer is suitable for production use.
 I used to think that the NLP community just needed to do more to communicate
 its findings to software engineers. So I wrote two blog posts, explaining
-`how to write a part-of-speech tagger`_ and `parser`_. Both were very well received,
+`how to write a part-of-speech tagger`_ and `parser`_. Both were well received,
 and there's been a bit of interest in `my research software`_ --- even though
 it's entirely undocumented, and mostly unuseable to anyone but me.
@@ -202,7 +202,7 @@ this:
 We wanted to refine the logic so that only adverbs modifying evocative verbs
 of communication, like "pleaded", were highlighted. We've now built a vector that
-represents that type of word, so now we can highlight adverbs based on very
+represents that type of word, so now we can highlight adverbs based on
 subtle logic, honing in on adverbs that seem the most stylistically
 problematic, given our starting assumptions:

View File

@@ -35,7 +35,7 @@ And if you're ever in acquisition or IPO talks, the story is simple.
 spaCy can also be used as free open-source software, under the Aferro GPL
 license. If you use it this way, you must comply with the AGPL license terms.
 When you distribute your project, or offer it as a network service, you must
-distribute the source-code, and grant users an AGPL license to it.
+distribute the source-code and grant users an AGPL license to it.
 .. I left academia in June 2014, just when I should have been submitting my first

View File

@@ -7,8 +7,8 @@ Updates
 Five days ago I presented the alpha release of spaCy, a natural language
 processing library that brings state-of-the-art technology to small companies.
-spaCy has been very well received, and there are now a lot of eyes on the project.
-Naturally, lots of issues have surfaced. I'm very grateful to those who've reported
+spaCy has been well received, and there are now a lot of eyes on the project.
+Naturally, lots of issues have surfaced. I'm grateful to those who've reported
 them. I've worked hard to address them as quickly as I could.
 Bug Fixes
@@ -26,7 +26,7 @@ Bug Fixes
 just store an index into that list, instead of a hash.
 * Parse tree navigation API was rough, and buggy.
-The parse-tree navigation API was the last thing I added before v0.3. I've
+The parse-tree navigation API was the last thing I added before v0.3. I've
 now replaced it with something better. The previous API design was flawed,
 and the implementation was buggy --- Token.child() and Token.head were
 sometimes inconsistent.
@@ -108,9 +108,9 @@ input to be segmented into sentences, but with no sentence segmenter. This
 caused a drop in parse accuracy of 4%!
 Over the last five days, I've worked hard to correct this. I implemented the
-modifications to the parsing algorithm I had planned, from Dongdong Zhang et al
+modifications to the parsing algorithm I had planned, from Dongdong Zhang et al.
 (2013), and trained and evaluated the parser on raw text, using the version of
-the WSJ distributed by Read et al (2012), and used in Dridan and Oepen's
+the WSJ distributed by Read et al. (2012), and used in Dridan and Oepen's
 experiments.
 I'm pleased to say that on the WSJ at least, spaCy 0.4 performs almost exactly