mirror of https://github.com/explosion/spaCy.git
* Add features table
parent 1590788dd4
commit 2e14f09d2f
@@ -106,4 +106,49 @@ Create a bag-of-words representation:
.. py:attribute:: head: Token


Features
--------

+--------------------------------------------------------------------------+
| Boolean Features                                                         |
+----------+---------------------------------------------------------------+
| IS_ALPHA | :py:meth:`str.isalpha`                                        |
+----------+---------------------------------------------------------------+
| IS_DIGIT | :py:meth:`str.isdigit`                                        |
+----------+---------------------------------------------------------------+
| IS_LOWER | :py:meth:`str.islower`                                        |
+----------+---------------------------------------------------------------+
| IS_SPACE | :py:meth:`str.isspace`                                        |
+----------+---------------------------------------------------------------+
| IS_TITLE | :py:meth:`str.istitle`                                        |
+----------+---------------------------------------------------------------+
| IS_UPPER | :py:meth:`str.isupper`                                        |
+----------+---------------------------------------------------------------+
| IS_ASCII | all(ord(c) < 128 for c in string)                             |
+----------+---------------------------------------------------------------+
| IS_PUNCT | all(unicodedata.category(c).startswith('P') for c in string)  |
+----------+---------------------------------------------------------------+
| LIKE_URL | Using various heuristics, does the string resemble a URL?     |
+----------+---------------------------------------------------------------+
| LIKE_NUM | "Two", "10", "1,000", "10.54", "1/2" etc. all match           |
+----------+---------------------------------------------------------------+
| ID of string features                                                    |
+----------+---------------------------------------------------------------+
| SIC      | The original string, unmodified.                              |
+----------+---------------------------------------------------------------+
| NORM1    | The string after level 1 normalization: case, spelling        |
+----------+---------------------------------------------------------------+
| NORM2    | The string after level 2 normalization                        |
+----------+---------------------------------------------------------------+
| SHAPE    | Word shape, e.g. 10 --> dd, Garden --> Xxxx, Hi!5 --> Xx!d    |
+----------+---------------------------------------------------------------+
| PREFIX   | A short slice from the start of the string.                   |
+----------+---------------------------------------------------------------+
| SUFFIX   | A short slice from the end of the string.                     |
+----------+---------------------------------------------------------------+
| CLUSTER  | Brown cluster ID of the word                                  |
+----------+---------------------------------------------------------------+
| LEMMA    | The word's lemma, i.e. morphological suffixes removed         |
+----------+---------------------------------------------------------------+
| TAG      | The word's part-of-speech tag                                 |
+----------+---------------------------------------------------------------+
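
The boolean rows and the SHAPE row above can be sketched in plain Python. This is an illustrative reconstruction from the table only, not the library's implementation: the function names ``is_ascii``, ``is_punct``, ``boolean_features`` and ``word_shape`` are hypothetical, and the ``max_run=3`` cap in ``word_shape`` is an assumption chosen so that the table's own examples come out as listed.

```python
import unicodedata


def is_ascii(string):
    # IS_ASCII row: every character has a code point below 128.
    return all(ord(c) < 128 for c in string)


def is_punct(string):
    # IS_PUNCT row: every character belongs to a Unicode punctuation
    # category (categories whose name starts with 'P').
    return all(unicodedata.category(c).startswith('P') for c in string)


def boolean_features(string):
    # Hypothetical helper: collect the boolean rows that map directly
    # onto str methods or onto the two expressions given in the table.
    # LIKE_URL and LIKE_NUM are heuristic and are not sketched here.
    return {
        'IS_ALPHA': string.isalpha(),
        'IS_DIGIT': string.isdigit(),
        'IS_LOWER': string.islower(),
        'IS_SPACE': string.isspace(),
        'IS_TITLE': string.istitle(),
        'IS_UPPER': string.isupper(),
        'IS_ASCII': is_ascii(string),
        'IS_PUNCT': is_punct(string),
    }


def word_shape(string, max_run=3):
    # Map uppercase letters to 'X', lowercase letters to 'x', digits to
    # 'd', and keep every other character unchanged.
    # Assumption: runs of the same shape character are cut off after
    # max_run repeats; 3 reproduces "Garden --> Xxxx" from the table.
    shape = []
    last = ''
    run = 0
    for c in string:
        if c.isalpha():
            s = 'X' if c.isupper() else 'x'
        elif c.isdigit():
            s = 'd'
        else:
            s = c
        run = run + 1 if s == last else 1
        last = s
        if run <= max_run:
            shape.append(s)
    return ''.join(shape)
```

Under these assumptions ``word_shape`` gives ``dd`` for ``10``, ``Xxxx`` for ``Garden`` and ``Xx!d`` for ``Hi!5``, matching the SHAPE row. Note that ``is_ascii`` and ``is_punct`` both return ``True`` for the empty string, because ``all()`` of an empty iterable is ``True``.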