* Edits to quickstart

2015-01-24 17:47:51 +11:00 · 2015-01-24 17:47:51 +11:00 · 32f58b19d1
parent a97bed9359
commit 32f58b19d1
1 changed files with 11 additions and 18 deletions
--- a/docs/source/quickstart.rst
+++ b/docs/source/quickstart.rst
@@ -12,9 +12,8 @@ Install
    $ pip install spacy
    $ python -m spacy.en.download
-The download command fetches and installs the parser model and word representations,
+The download command fetches and installs about 200mb of data, which it installs
-which are too big to host on PyPi (about 100mb each).  The data is installed within
+within the spacy.en package directory.
 the spacy.en package directory.
 Usage
 -----
@@ -37,18 +36,14 @@ e.g. `pizza.orth_` and `pizza.orth` provide the integer ID and the string of
 the original orthographic form of the word, with no string normalizations
 applied.
-  .. note::
+  .. note::  en.English.__call__ is stateful --- it has an important **side-effect**.
-  en.English.__call__ is stateful --- it has an important **side-effect**:
+    When it processes a previously unseen word, it increments the ID counter,
-  spaCy maps strings to sequential integers, so when it processes a new
+    assigns the ID to the string, and writes the mapping in
-  word, the mapping table is updated.
+    :py:data:`English.vocab.strings` (instance of
-
+    :py:class:`strings.StringStore`).
-  Future releases will feature a way to reconcile :py:class:`strings.StringStore`
+    Future releases will feature a way to reconcile  mappings, but for now, you
-  mappings, but for now, you should only work with one instance of the pipeline
+    should only work with one instance of the pipeline at a time.
  at a time.
  This issue only affects rare words.  spaCy's pre-compiled lexicon has 260,000
  words; the string IDs for these words will always be consistent.
 (Most of the) API at a glance
@@ -76,7 +71,7 @@ applied.
 **Get dict or numpy array:**
-    .. py:method:: tokens.Tokens.to_array(self, attr_ids: List[int]) --> numpy.ndarray[ndim=2, dtype=int32]
+    .. py:method:: tokens.Tokens.to_array(self, attr_ids: List[int]) --> ndarray[ndim=2, dtype=long]
    .. py:method:: tokens.Tokens.count_by(self, attr_id: int) --> Dict[int, int]
@@ -93,7 +88,7 @@ applied.
  .. py:attribute:: lexeme.Lexeme.repvec
-**Navigate dependency parse**
+**Navigate to tree- or string-neighbor tokens**
  .. py:method:: nbor(self, i=1) --> Token
@@ -115,8 +110,6 @@ applied.
    Length, in unicode code-points. Equal to len(self.orth_).
    self.string[self.length:] gets whitespace.
  .. py:attribute:: idx: int
    Starting offset of word in the original string.
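
Reviewer's sketch (not part of the commit): the note reworded in this diff says that processing a previously unseen word increments an ID counter and records the string-to-ID mapping, which is why only one pipeline instance should be used at a time. The toy class below illustrates that behaviour in plain Python. ``ToyStringStore``, ``intern`` and the sample words are hypothetical names chosen for illustration; this is an assumption about the general mechanism, not spaCy's actual ``strings.StringStore`` implementation.

```python
class ToyStringStore:
    """Hypothetical stand-in for a string-to-ID table (not spaCy's code)."""

    def __init__(self, lexicon=()):
        self._id_of = {}       # string -> integer ID
        self._string_of = []   # integer ID -> string
        # Words in the pre-compiled lexicon get the same IDs in every instance.
        for word in lexicon:
            self.intern(word)

    def intern(self, string):
        # Side-effect: an unseen string is assigned the next sequential ID.
        if string not in self._id_of:
            self._id_of[string] = len(self._string_of)
            self._string_of.append(string)
        return self._id_of[string]

    def __getitem__(self, string_id):
        return self._string_of[string_id]


lexicon = ["the", "pizza", "is", "good"]
first = ToyStringStore(lexicon)
second = ToyStringStore(lexicon)

# Two instances meet the same rare words in a different order.
first_ids = [first.intern(w) for w in ["quokka", "zyzzyva"]]
second_ids = [second.intern(w) for w in ["zyzzyva", "quokka"]]

# Lexicon words agree across instances; rare words do not.
same_for_known = first.intern("pizza") == second.intern("pizza")
same_for_rare = first.intern("quokka") == second.intern("quokka")
```

Under these assumptions, IDs for lexicon words match across instances, while IDs for rare words depend on the order in which each instance first saw them. That is the inconsistency the note warns about.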