From 32f58b19d149b2d97fe3ad5fbbee233ced293798 Mon Sep 17 00:00:00 2001
From: Matthew Honnibal <honnibal@gmail.com>
Date: Sat, 24 Jan 2015 17:47:51 +1100
Subject: [PATCH] * Edits to quickstart

---
 docs/source/quickstart.rst | 29 +++++++++++------------------
 1 file changed, 11 insertions(+), 18 deletions(-)

diff --git a/docs/source/quickstart.rst b/docs/source/quickstart.rst
index 4e750b6f5..a8c2fa0f3 100644
--- a/docs/source/quickstart.rst
+++ b/docs/source/quickstart.rst
@@ -12,9 +12,8 @@ Install
     $ pip install spacy
     $ python -m spacy.en.download
 
-The download command fetches and installs the parser model and word representations,
-which are too big to host on PyPi (about 100mb each).  The data is installed within
-the spacy.en package directory.
+The download command fetches about 200MB of data and installs it within
+the spacy.en package directory.
 
 Usage
 -----
@@ -37,18 +36,14 @@ e.g. `pizza.orth_` and `pizza.orth` provide the integer ID and the string of
 the original orthographic form of the word, with no string normalizations
 applied.
 
-  .. note::
-  
-  en.English.__call__ is stateful --- it has an important **side-effect**:
-  spaCy maps strings to sequential integers, so when it processes a new
-  word, the mapping table is updated.
+  .. note::  en.English.__call__ is stateful --- it has an important **side-effect**.
 
-  Future releases will feature a way to reconcile :py:class:`strings.StringStore`
-  mappings, but for now, you should only work with one instance of the pipeline
-  at a time.
-
-  This issue only affects rare words.  spaCy's pre-compiled lexicon has 260,000
-  words; the string IDs for these words will always be consistent.
+    When it processes a previously unseen word, it increments the ID counter,
+    assigns the ID to the string, and writes the mapping in
+    :py:data:`English.vocab.strings` (instance of
+    :py:class:`strings.StringStore`).
+    Future releases will feature a way to reconcile mappings, but for now, you
+    should only work with one instance of the pipeline at a time.
 
 
 (Most of the) API at a glance
@@ -76,7 +71,7 @@ applied.
 
 **Get dict or numpy array:**
 
-    .. py:method:: tokens.Tokens.to_array(self, attr_ids: List[int]) --> numpy.ndarray[ndim=2, dtype=int32]
+    .. py:method:: tokens.Tokens.to_array(self, attr_ids: List[int]) --> ndarray[ndim=2, dtype=long]
 
     .. py:method:: tokens.Tokens.count_by(self, attr_id: int) --> Dict[int, int]
 
@@ -93,7 +88,7 @@ applied.
   .. py:attribute:: lexeme.Lexeme.repvec
 
 
-**Navigate dependency parse**
+**Navigate to tree- or string-neighbor tokens**
 
   .. py:method:: nbor(self, i=1) --> Token
 
@@ -115,8 +110,6 @@ applied.
 
     Length, in unicode code-points. Equal to len(self.orth_).
     
-    self.string[self.length:] gets whitespace.
-
   .. py:attribute:: idx: int
 
     Starting offset of word in the original string.