* Add features table

2015-01-16 19:04:03 +11:00 · 2015-01-16 19:04:03 +11:00 · 2e14f09d2f
parent 1590788dd4
commit 2e14f09d2f
1 changed files with 45 additions and 0 deletions
--- a/docs/source/quickstart.rst
+++ b/docs/source/quickstart.rst
@@ -106,4 +106,49 @@ Create a bag-of-words representation:
  .. py:attribute:: head: Token


+Features
+--------

++--------------------------------------------------------------------------+
+| Boolean Features                                                         |
++----------+---------------------------------------------------------------+
+| IS_ALPHA | :py:meth:`str.isalpha`                                        |
++----------+---------------------------------------------------------------+
+| IS_DIGIT | :py:meth:`str.isdigit`                                        |
++----------+---------------------------------------------------------------+
+| IS_LOWER | :py:meth:`str.islower`                                        |
++----------+---------------------------------------------------------------+
+| IS_SPACE | :py:meth:`str.isspace`                                        |
++----------+---------------------------------------------------------------+
+| IS_TITLE | :py:meth:`str.istitle`                                        |
++----------+---------------------------------------------------------------+
+| IS_UPPER | :py:meth:`str.isupper`                                        |
++----------+---------------------------------------------------------------+
+| IS_ASCII | all(ord(c) < 128 for c in string)                             |
++----------+---------------------------------------------------------------+
+| IS_PUNCT | all(unicodedata.category(c).startswith('P') for c in string)  |
++----------+---------------------------------------------------------------+
+| LIKE_URL | Using various heuristics, does the string resemble a URL?     |
++----------+---------------------------------------------------------------+
+| LIKE_NUM | "Two", "10", "1,000", "10.54", "1/2" etc all match            |
++----------+---------------------------------------------------------------+
+| ID of string features                                                    |
++----------+---------------------------------------------------------------+
+| SIC      | The original string, unmodified.                              |
++----------+---------------------------------------------------------------+
+| NORM1    | The string after level 1 normalization: case, spelling        |
++----------+---------------------------------------------------------------+
+| NORM2    | The string after level 2 normalization                        |
++----------+---------------------------------------------------------------+
+| SHAPE    | Word shape, e.g. 10 --> dd, Garden --> Xxxx, Hi!5 --> Xx!d    |
++----------+---------------------------------------------------------------+
+| PREFIX   | A short slice from the start of the string.                   |
++----------+---------------------------------------------------------------+
+| SUFFIX   | A short slice from the end of the string.                     |
++----------+---------------------------------------------------------------+
+| CLUSTER  | Brown cluster ID of the word                                  |
++----------+---------------------------------------------------------------+
+| LEMMA    | The word's lemma, i.e. morphological suffixes removed         |
++----------+---------------------------------------------------------------+
+| TAG      | The word's part-of-speech tag                                 |
++----------+---------------------------------------------------------------+
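
Note on the table above: a few of the rows can be read as plain Python, so a rough sketch may help when checking what a feature should return. The ``IS_ASCII`` and ``IS_PUNCT`` bodies are copied from the table. The ``word_shape`` helper, its name, and its ``max_run`` cut-off of 3 are assumptions, chosen only so that the three ``SHAPE`` examples in the table come out as written. The commit does not state the actual rule, and this is not the library's implementation.

```python
import unicodedata


def is_ascii(string):
    # IS_ASCII row, as written in the table.
    return all(ord(c) < 128 for c in string)


def is_punct(string):
    # IS_PUNCT row, as written in the table: every character must belong
    # to a Unicode punctuation category (Pc, Pd, Ps, Pe, Pi, Pf, Po).
    return all(unicodedata.category(c).startswith('P') for c in string)


def word_shape(string, max_run=3):
    # Hypothetical helper illustrating the SHAPE row.
    # Uppercase letters map to 'X', lowercase to 'x', digits to 'd';
    # any other character is kept as-is.
    # ASSUMPTION: a run of identical shape characters is cut after
    # max_run characters. The value 3 is a guess that reproduces
    # "Garden --> Xxxx" from the table.
    shape = []
    last = None
    run = 0
    for c in string:
        if c.isalpha():
            s = 'X' if c.isupper() else 'x'
        elif c.isdigit():
            s = 'd'
        else:
            s = c
        if s == last:
            run += 1
        else:
            run = 1
            last = s
        if run <= max_run:
            shape.append(s)
    return ''.join(shape)


if __name__ == '__main__':
    for word in ('10', 'Garden', 'Hi!5'):
        print(word, '-->', word_shape(word))
```

Because ``all()`` over an empty sequence is true, both boolean helpers return ``True`` for the empty string; whether the real features behave that way is not specified in this commit.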