Auto-format

This commit is contained in:
Ines Montani 2019-08-06 12:13:31 +02:00
parent 15be09ceb0
commit 2bfae0b167
1 changed files with 15 additions and 14 deletions

View File

@ -157,18 +157,18 @@ The available token pattern keys are uppercase versions of the
[`Token` attributes](/api/token#attributes). The most relevant ones for
rule-based matching are:
| Attribute | Type |  Description |
| -------------------------------------- | ------- | ------------------------------------------------------------------------------------------------ |
| `ORTH` | unicode | The exact verbatim text of a token. |
| `TEXT` <Tag variant="new">2.1</Tag> | unicode | The exact verbatim text of a token. |
| `LOWER` | unicode | The lowercase form of the token text. |
|  `LENGTH` | int | The length of the token text. |
|  `IS_ALPHA`, `IS_ASCII`, `IS_DIGIT` | bool | Token text consists of alphanumeric characters, ASCII characters, digits. |
|  `IS_LOWER`, `IS_UPPER`, `IS_TITLE` | bool | Token text is in lowercase, uppercase, titlecase. |
|  `IS_PUNCT`, `IS_SPACE`, `IS_STOP` | bool | Token is punctuation, whitespace, stop word. |
|  `LIKE_NUM`, `LIKE_URL`, `LIKE_EMAIL` | bool | Token text resembles a number, URL, email. |
|  `POS`, `TAG`, `DEP`, `LEMMA`, `SHAPE` | unicode | The token's simple and extended part-of-speech tag, dependency label, lemma, shape. |
| `ENT_TYPE` | unicode | The token's entity label. |
| Attribute | Type |  Description |
| -------------------------------------- | ------- | ------------------------------------------------------------------------------------------------------ |
| `ORTH` | unicode | The exact verbatim text of a token. |
| `TEXT` <Tag variant="new">2.1</Tag> | unicode | The exact verbatim text of a token. |
| `LOWER` | unicode | The lowercase form of the token text. |
|  `LENGTH` | int | The length of the token text. |
|  `IS_ALPHA`, `IS_ASCII`, `IS_DIGIT` | bool | Token text consists of alphanumeric characters, ASCII characters, digits. |
|  `IS_LOWER`, `IS_UPPER`, `IS_TITLE` | bool | Token text is in lowercase, uppercase, titlecase. |
|  `IS_PUNCT`, `IS_SPACE`, `IS_STOP` | bool | Token is punctuation, whitespace, stop word. |
|  `LIKE_NUM`, `LIKE_URL`, `LIKE_EMAIL` | bool | Token text resembles a number, URL, email. |
|  `POS`, `TAG`, `DEP`, `LEMMA`, `SHAPE` | unicode | The token's simple and extended part-of-speech tag, dependency label, lemma, shape. |
| `ENT_TYPE` | unicode | The token's entity label. |
| `_` <Tag variant="new">2.1</Tag> | dict | Properties in [custom extension attributes](/usage/processing-pipelines#custom-components-attributes). |
<Infobox title="Tip: Try the interactive matcher explorer">
@ -1140,8 +1140,9 @@ To apply this logic automatically when we process a text, we can add it to the
above logic also expects that entities are merged into single tokens. spaCy
ships with a handy built-in `merge_entities` that takes care of that. Instead of
just printing the result, you could also write it to
[custom attributes](/usage/processing-pipelines#custom-components-attributes) on the
entity `Span` for example `._.orgs` or `._.prev_orgs` and `._.current_orgs`.
[custom attributes](/usage/processing-pipelines#custom-components-attributes) on
the entity `Span` for example `._.orgs` or `._.prev_orgs` and
`._.current_orgs`.
> #### Merging entities
>