Adding a note on retrieving the string rep of the match_id (#4904)

Stolen from here: https://stackoverflow.com/questions/47638877/using-phrasematcher-in-spacy-to-find-multiple-match-types
Martin A. Kayser 2020-02-03 03:58:59 -08:00 committed by GitHub
parent 6ff947e1f9
commit 02a44c5be2
1 changed files with 11 additions and 0 deletions


@@ -70,6 +70,17 @@ Find all token sequences matching the supplied patterns on the `Doc`.
| `doc` | `Doc` | The document to match over. |
| **RETURNS** | list | A list of `(match_id, start, end)` tuples, describing the matches. A match tuple describes a span `doc[start:end]`. The `match_id` is the ID of the added match pattern. |
<Infobox title="Note on retrieving the string representation of the match_id" variant="warning">
Because spaCy stores all strings as integers, the `match_id` you get back will be an integer, too. You can always get the string representation by looking it up in the vocabulary's `StringStore`, i.e. `nlp.vocab.strings`:
```
match_id_string = nlp.vocab.strings[match_id]
```
</Infobox>
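The lookup described in the note can be sketched end to end as follows. This is a minimal illustration, not part of the original commit: the pattern names `PERSON_NAME` and `CITY_NAME` and the sample sentence are made up, and the `matcher.add(key, docs)` call assumes the spaCy v3 signature (v2 used `matcher.add(key, on_match, *docs)`).

```python
import spacy
from spacy.matcher import PhraseMatcher

# A blank English pipeline only tokenizes, so no trained model is needed.
nlp = spacy.blank("en")
matcher = PhraseMatcher(nlp.vocab)

# Hypothetical pattern names; assumes the spaCy v3 signature add(key, docs).
matcher.add("PERSON_NAME", [nlp.make_doc("Barack Obama")])
matcher.add("CITY_NAME", [nlp.make_doc("Washington")])

doc = nlp.make_doc("Barack Obama lived in Washington for eight years.")

labelled = []
for match_id, start, end in matcher(doc):
    # match_id is an integer hash; the StringStore maps it back to the name.
    label = nlp.vocab.strings[match_id]
    labelled.append((label, doc[start:end].text))
```

Each entry in `labelled` pairs the pattern name with the matched span text, which is usually what is needed when several match types are registered on one matcher.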
## PhraseMatcher.pipe {#pipe tag="method"}
Match a stream of documents, yielding them in turn.