Improve docs on matcher attributes [ci skip] (closes #4063)

2019-08-06 12:13:42 +02:00 · 2019-08-06 12:13:42 +02:00 · 223bde5cf6
parent 2bfae0b167
commit 223bde5cf6
1 changed files with 32 additions and 2 deletions
--- a/website/docs/usage/rule-based-matching.md
+++ b/website/docs/usage/rule-based-matching.md
@@ -153,8 +153,8 @@ processes.
 #### Available token attributes {#adding-patterns-attributes}
-The available token pattern keys are uppercase versions of the
+The available token pattern keys correspond to a number of
-[`Token` attributes](/api/token#attributes). The most relevant ones for
+[`Token` attributes](/api/token#attributes). The supported attributes for
 rule-based matching are:
 | Attribute                              | Type    |  Description                                                                                           |
@@ -171,6 +171,36 @@ rule-based matching are:
 | `ENT_TYPE`                             | unicode | The token's entity label.                                                                              |
 | `_` <Tag variant="new">2.1</Tag>       | dict    | Properties in [custom extension attributes](/usage/processing-pipelines#custom-components-attributes). |
 <Accordion title="Does it matter if the attribute names are uppercase or lowercase?">
 No, it shouldn't. spaCy will normalize the names internally and
 `{"LOWER": "text"}` and `{"lower": "text"}` will both produce the same result.
 Using the uppercase version is mostly a convention to make it clear that the
 attributes are "special" and don't exactly map to the token attributes like
 `Token.lower` and `Token.lower_`.
 </Accordion>
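The case-insensitivity described in this accordion can be pictured with a small standalone sketch. It does not use spaCy and is not spaCy's internal code: `normalize_pattern` is a hypothetical helper that assumes normalization amounts to uppercasing the keys of each token pattern.

```python
# Illustrative only: a simplified stand-in for case-insensitive pattern keys.
# `normalize_pattern` is a hypothetical helper, not a spaCy function.

def normalize_pattern(pattern):
    """Return a copy of a token pattern with every attribute name uppercased."""
    return [
        {key.upper(): value for key, value in token_spec.items()}
        for token_spec in pattern
    ]


# Two spellings of the same pattern: uppercase keys and lowercase keys.
upper = normalize_pattern([{"LOWER": "text"}, {"IS_PUNCT": True}])
lower = normalize_pattern([{"lower": "text"}, {"is_punct": True}])

# After normalization, both spellings describe the same pattern.
print(upper == lower)
```

Under this assumption, the spelling of the keys stops mattering as soon as the pattern is normalized, which is why the uppercase form is only a convention.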
 <Accordion title="Why are not all token attributes supported?">
 spaCy can't provide access to all of the attributes because the `Matcher` loops
 over the Cython data, not the Python objects. Inside the matcher, we're dealing
 with a [`TokenC` struct](/api/cython-structs#tokenc) – we don't have an instance
 of [`Token`](/api/token). This means that attributes that refer to computed
 properties can't be accessed.
 The uppercase attribute names like `LOWER` or `IS_PUNCT` refer to symbols from
 the
 [`spacy.attrs`](https://github.com/explosion/spaCy/tree/master/spacy/attrs.pyx)
 enum table. They're passed into a function that is essentially a big case/switch
 statement that figures out which struct field to return. The same attribute
 identifiers are used in [`Doc.to_array`](/api/doc#to_array), and a few other
 places in the code where you need to describe fields like this.
 </Accordion>
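The enum-plus-switch lookup described in this accordion might look roughly like the following pure-Python sketch. Every name in it (`Attr`, `TokenData`, `get_token_attr`, `to_rows`) is hypothetical and only stands in for the `spacy.attrs` symbols, the `TokenC` struct and the Cython lookup function; the real code is written in Cython and has many more fields.

```python
from dataclasses import dataclass
from enum import IntEnum


class Attr(IntEnum):
    """Hypothetical stand-in for a few symbols from the attribute enum table."""
    LOWER = 1
    IS_PUNCT = 2
    ENT_TYPE = 3


@dataclass
class TokenData:
    """Hypothetical stand-in for a C struct: stored fields only, no computed properties."""
    lower: int       # hash ID of the lowercase form
    is_punct: bool
    ent_type: int    # hash ID of the entity label


def get_token_attr(token, attr_id):
    """Pick the struct field for an attribute ID, in the style of a case/switch statement."""
    if attr_id == Attr.LOWER:
        return token.lower
    elif attr_id == Attr.IS_PUNCT:
        return token.is_punct
    elif attr_id == Attr.ENT_TYPE:
        return token.ent_type
    # Anything that is not a stored field cannot be looked up this way.
    raise ValueError(f"Unsupported attribute ID: {attr_id}")


def to_rows(tokens, attr_ids):
    """Describe tokens as rows of field values, using the same attribute IDs."""
    return [[get_token_attr(token, attr_id) for attr_id in attr_ids] for token in tokens]


tokens = [TokenData(lower=101, is_punct=False, ent_type=7),
          TokenData(lower=202, is_punct=True, ent_type=0)]
print(to_rows(tokens, [Attr.LOWER, Attr.IS_PUNCT]))
```

In this sketch, an attribute that is not a plain stored field has no branch to land in, which mirrors why computed `Token` properties are unavailable inside the matcher.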
 ---
 <Infobox title="Tip: Try the interactive matcher explorer">
 [![Matcher demo](../images/matcher-demo.jpg)](https://explosion.ai/demos/matcher)