mirror of https://github.com/explosion/spaCy.git

Minor copyediting
commit 1b79d947b9 (parent 7bddd15e27)
@@ -8,12 +8,12 @@ python:
 - "2.7"
 - "3.4"

-# command to install dependencies
+# install dependencies
 install:
 - "pip install --upgrade setuptools"
 - "pip install -r requirements.txt"
 - "export PYTHONPATH=`pwd`"
 - "python setup.py build_ext --inplace"
-# command to run tests
+# run tests
 script:
 - py.test tests/

@@ -3,20 +3,18 @@ spaCy

 http://honnibal.github.io/spaCy

-Fast, state-of-the-art natural language processing pipeline. Commercial licenses available, or use under AGPL.
+A pipeline for fast, state-of-the-art natural language processing. Commercial licenses available, otherwise under AGPL.

 Version 0.80 released
 ---------------------

 2015-04-13

-* Preliminary named entity recognition support. Accuracy is currently
-  substantially behind the current state-of-the-art. I'm working on
-  improvements.
+* Preliminary support for named-entity recognition. Its accuracy is substantially behind the state-of-the-art. I'm working on improvements.

 * Better sentence boundary detection, drawn from the syntactic structure.

-* Lots of bug fixes
+* Lots of bug fixes.

 Supports:
@@ -28,14 +28,14 @@ can access an excellent set of pre-computed orthographic and distributional feat
 >>> are.check_flag(en.CAN_NOUN)
 False

-spaCy makes it easy to write very efficient NLP applications, because your feature
+spaCy makes it easy to write efficient NLP applications, because your feature
 functions have to do almost no work: almost every lexical property you'll want
 is pre-computed for you. See the tutorial for an example POS tagger.

 Benchmark
 ---------

-The tokenizer itself is also very efficient:
+The tokenizer itself is also efficient:

 +--------+-------+--------------+--------------+
 | System | Time | Words/second | Speed Factor |
@@ -56,7 +56,7 @@ Pros:

 - All tokens come with indices into the original string
 - Full unicode support
-- Extensible to other languages
+- Extendable to other languages
 - Batch operations computed efficiently in Cython
 - Cython API
 - numpy interoperability
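
As a rough illustration of the first and last bullets in that list (character indices back into the original string, and numpy interoperability), here is a minimal sketch written against a current spaCy release rather than the 0.80 API this diff documents; the model name and attribute names below are assumptions, check your installed version:

    # Sketch: character offsets and numpy interop in a modern spaCy release,
    # not the 0.80-era API that this README described.
    import spacy
    from spacy import attrs

    nlp = spacy.load("en_core_web_sm")
    doc = nlp("spaCy tokens remember where they came from.")

    for token in doc:
        # token.idx is the character offset of the token in the original
        # string, so the source text can always be recovered by slicing.
        assert doc.text[token.idx : token.idx + len(token.text)] == token.text

    # Lexical attributes export to a numpy array in one call.
    array = doc.to_array([attrs.LOWER, attrs.IS_ALPHA])
    print(array.shape)  # (number of tokens, 2)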
@@ -135,7 +135,7 @@ lexical types.

 In a sample of text, vocabulary size grows exponentially slower than word
 count. So any computations we can perform over the vocabulary and apply to the
-word count are very efficient.
+word count are efficient.


 Part-of-speech Tagger
@@ -37,7 +37,7 @@ tokenizer is suitable for production use.

 I used to think that the NLP community just needed to do more to communicate
 its findings to software engineers. So I wrote two blog posts, explaining
-`how to write a part-of-speech tagger`_ and `parser`_. Both were very well received,
+`how to write a part-of-speech tagger`_ and `parser`_. Both were well received,
 and there's been a bit of interest in `my research software`_ --- even though
 it's entirely undocumented, and mostly unuseable to anyone but me.

@@ -202,7 +202,7 @@ this:

 We wanted to refine the logic so that only adverbs modifying evocative verbs
 of communication, like "pleaded", were highlighted. We've now built a vector that
-represents that type of word, so now we can highlight adverbs based on very
+represents that type of word, so now we can highlight adverbs based on
 subtle logic, honing in on adverbs that seem the most stylistically
 problematic, given our starting assumptions:

@@ -35,7 +35,7 @@ And if you're ever in acquisition or IPO talks, the story is simple.
 spaCy can also be used as free open-source software, under the Aferro GPL
 license. If you use it this way, you must comply with the AGPL license terms.
 When you distribute your project, or offer it as a network service, you must
-distribute the source-code, and grant users an AGPL license to it.
+distribute the source-code and grant users an AGPL license to it.


 .. I left academia in June 2014, just when I should have been submitting my first
@@ -7,8 +7,8 @@ Updates
 Five days ago I presented the alpha release of spaCy, a natural language
 processing library that brings state-of-the-art technology to small companies.

-spaCy has been very well received, and there are now a lot of eyes on the project.
-Naturally, lots of issues have surfaced. I'm very grateful to those who've reported
+spaCy has been well received, and there are now a lot of eyes on the project.
+Naturally, lots of issues have surfaced. I'm grateful to those who've reported
 them. I've worked hard to address them as quickly as I could.

 Bug Fixes
@@ -26,7 +26,7 @@ Bug Fixes
 just store an index into that list, instead of a hash.

 * Parse tree navigation API was rough, and buggy.
-The parse-tree navigation API was the last thing I added before v0.3. I've
+The parse-tree navigation API was the last thing I added before v0.3. I've
 now replaced it with something better. The previous API design was flawed,
 and the implementation was buggy --- Token.child() and Token.head were
 sometimes inconsistent.
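
For context on the API mentioned in that bullet, here is a minimal sketch of head/child navigation as it looks in later spaCy releases; the head, children, and dep_ names below come from modern spaCy, not the 0.3-era Token.child()/Token.head API the post describes:

    # Sketch of parse-tree navigation in a modern spaCy release,
    # not the 0.3-era API discussed above.
    import spacy

    nlp = spacy.load("en_core_web_sm")
    doc = nlp("She pleaded desperately with the court.")

    for token in doc:
        # Every token points at its syntactic head; the root points at itself.
        print(token.text, token.dep_, "<-", token.head.text)

    # Walking upward from any token terminates at the sentence root.
    token = doc[2]  # "desperately"
    while token.head is not token:
        token = token.head
    print("root:", token.text)

    # children is the inverse relation: the tokens whose head is this token.
    print("children of root:", [child.text for child in token.children])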
@@ -108,9 +108,9 @@ input to be segmented into sentences, but with no sentence segmenter. This
 caused a drop in parse accuracy of 4%!

 Over the last five days, I've worked hard to correct this. I implemented the
-modifications to the parsing algorithm I had planned, from Dongdong Zhang et al
+modifications to the parsing algorithm I had planned, from Dongdong Zhang et al.
 (2013), and trained and evaluated the parser on raw text, using the version of
-the WSJ distributed by Read et al (2012), and used in Dridan and Oepen's
+the WSJ distributed by Read et al. (2012), and used in Dridan and Oepen's
 experiments.

 I'm pleased to say that on the WSJ at least, spaCy 0.4 performs almost exactly