spaCy/website/docs/api/phrasematcher.md

---
title: PhraseMatcher
teaser: Match sequences of tokens, based on documents
tag: class
source: spacy/matcher/phrasematcher.pyx
new: 2
---

The `PhraseMatcher` lets you efficiently match large terminology lists. While
the [`Matcher`](/api/matcher) lets you match sequences based on lists of token
descriptions, the `PhraseMatcher` accepts match patterns in the form of `Doc`
objects. See the [usage guide](/usage/rule-based-matching#phrasematcher) for
examples.

## PhraseMatcher.\_\_init\_\_ {#init tag="method"}

Create the rule-based `PhraseMatcher`. Setting a different `attr` to match on
will change the token attributes that will be compared to determine a match. By
default, the incoming `Doc` is checked for sequences of tokens with the same
`ORTH` value, i.e. the verbatim token text. Matching on the attribute `LOWER`
will result in case-insensitive matching, since only the lowercase token texts
are compared. In theory, it's also possible to match on sequences of the same
part-of-speech tags or dependency labels.

If `validate=True` is set, additional validation is performed when patterns are
added. At the moment, it will check whether a `Doc` has attributes assigned that
aren't necessary to produce the matches (for example, part-of-speech tags if the
`PhraseMatcher` matches on the token text). Since this can often lead to
significantly worse performance when creating the pattern, a `UserWarning` will
be shown.

> #### Example
>
> ```python
> from spacy.matcher import PhraseMatcher
> matcher = PhraseMatcher(nlp.vocab)
> ```

| Name                                    | Description                                                                                            |
| --------------------------------------- | ------------------------------------------------------------------------------------------------------ |
| `vocab`                                 | The vocabulary object, which must be shared with the documents the matcher will operate on. ~~Vocab~~  |
| `attr` <Tag variant="new">2.1</Tag>     | The token attribute to match on. Defaults to `ORTH`, i.e. the verbatim token text. ~~Union[int, str]~~ |
| `validate` <Tag variant="new">2.1</Tag> | Validate patterns added to the matcher. ~~bool~~                                                       |
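As a minimal sketch of the `attr` setting described above, the following matches case-insensitively on `LOWER`. It assumes a blank English pipeline created with `spacy.blank("en")` and builds the pattern with `nlp.make_doc`, so only the tokenizer runs; the example text is made up for illustration.

```python
import spacy
from spacy.matcher import PhraseMatcher

# Assumption: a blank English pipeline (tokenizer only) is enough here.
nlp = spacy.blank("en")

# Compare lowercase token texts instead of the verbatim text (ORTH).
matcher = PhraseMatcher(nlp.vocab, attr="LOWER")
matcher.add("OBAMA", [nlp.make_doc("barack obama")])

doc = nlp.make_doc("BARACK OBAMA spoke today")
spans = [doc[start:end].text for _, start, end in matcher(doc)]
print(spans)
```

With the default `attr` (`ORTH`), the same pattern would not be expected to match this text, because the token texts differ in case.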

## PhraseMatcher.\_\_call\_\_ {#call tag="method"}

Find all token sequences matching the supplied patterns on the `Doc`.

> #### Example
>
> ```python
> from spacy.matcher import PhraseMatcher
>
> matcher = PhraseMatcher(nlp.vocab)
> matcher.add("OBAMA", [nlp("Barack Obama")])
> doc = nlp("Barack Obama lifts America one last time in emotional farewell")
> matches = matcher(doc)
> ```

| Name        | Description                         |
| ----------- | ----------------------------------- |
| `doc`       | The document to match over. ~~Doc~~ |
| **RETURNS** | A list of `(match_id, start, end)` tuples, describing the matches. A match tuple describes a span `doc[start:end]`. The `match_id` is the ID of the added match pattern. ~~List[Tuple[int, int, int]]~~ |

<Infobox title="Note on retrieving the string representation of the match_id" variant="warning">

Because spaCy stores all strings as integers, the `match_id` you get back will
be an integer, too – but you can always get the string representation by looking
it up in the vocabulary's `StringStore`, i.e. `nlp.vocab.strings`:

```python
match_id_string = nlp.vocab.strings[match_id]
```

</Infobox>
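Putting the pieces together, a short sketch of turning each match tuple into a readable `(rule name, matched text)` pair might look as follows. It assumes a blank English pipeline via `spacy.blank("en")`; the variable name `results` is only illustrative.

```python
import spacy
from spacy.matcher import PhraseMatcher

# Assumption: tokenizer-only pipeline, no trained model required.
nlp = spacy.blank("en")
matcher = PhraseMatcher(nlp.vocab)
matcher.add("OBAMA", [nlp.make_doc("Barack Obama")])

doc = nlp.make_doc("Barack Obama lifts America one last time in emotional farewell")

results = []
for match_id, start, end in matcher(doc):
    rule_name = nlp.vocab.strings[match_id]  # integer ID -> original string key
    span = doc[start:end]                    # the matched tokens
    results.append((rule_name, span.text))

print(results)
```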

## PhraseMatcher.pipe {#pipe tag="method"}

Match a stream of documents, yielding them in turn.

> #### Example
>
> ```python
>   from spacy.matcher import PhraseMatcher
>   matcher = PhraseMatcher(nlp.vocab)
>   for doc in matcher.pipe(docs, batch_size=50):
>       pass
> ```

| Name                                          | Description                                                                                                                                                                                                                         |
| --------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `docs`                                        | A stream of documents. ~~Iterable[Doc]~~                                                                                                                                                                                            |
| `batch_size`                                  | The number of documents to accumulate into a working set. ~~int~~                                                                                                                                                                   |
| `return_matches` <Tag variant="new">2.1</Tag> | Yield the match lists along with the docs, making results `(doc, matches)` tuples. ~~bool~~                                                                                                                                         |
| `as_tuples`                                   | Interpret the input stream as `(doc, context)` tuples, and yield `(result, context)` tuples out. If both `return_matches` and `as_tuples` are `True`, the output will be a sequence of `((doc, matches), context)` tuples. ~~bool~~ |
| **YIELDS**                                    | Documents and optional matches or context in order. ~~Union[Doc, Tuple[Doc, Any], Tuple[Tuple[Doc, Any], Any]]~~                                                                                                                    |

## PhraseMatcher.\_\_len\_\_ {#len tag="method"}

Get the number of rules added to the matcher. Note that this only returns the
number of rules (identical to the number of IDs), not the number of individual
patterns.

> #### Example
>
> ```python
>   matcher = PhraseMatcher(nlp.vocab)
>   assert len(matcher) == 0
>   matcher.add("OBAMA", [nlp("Barack Obama")])
>   assert len(matcher) == 1
> ```

| Name        | Description                  |
| ----------- | ---------------------------- |
| **RETURNS** | The number of rules. ~~int~~ |
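To illustrate the difference between rules and patterns, the sketch below adds two patterns under a single ID and one pattern under another. It assumes a blank English pipeline via `spacy.blank("en")`; the rule names are examples only.

```python
import spacy
from spacy.matcher import PhraseMatcher

nlp = spacy.blank("en")  # assumption: tokenizer-only pipeline
matcher = PhraseMatcher(nlp.vocab)

# Two patterns, but only one rule ID.
matcher.add("HEALTH", [nlp.make_doc("health care reform"), nlp.make_doc("healthcare reform")])
count_after_first_rule = len(matcher)

# A second rule ID increases the count.
matcher.add("OBAMA", [nlp.make_doc("Barack Obama")])
count_after_second_rule = len(matcher)

print(count_after_first_rule, count_after_second_rule)
```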

## PhraseMatcher.\_\_contains\_\_ {#contains tag="method"}

Check whether the matcher contains rules for a match ID.

> #### Example
>
> ```python
>   matcher = PhraseMatcher(nlp.vocab)
>   assert "OBAMA" not in matcher
>   matcher.add("OBAMA", [nlp("Barack Obama")])
>   assert "OBAMA" in matcher
> ```

| Name        | Description                                                    |
| ----------- | -------------------------------------------------------------- |
| `key`       | The match ID. ~~str~~                                          |
| **RETURNS** | Whether the matcher contains rules for this match ID. ~~bool~~ |

## PhraseMatcher.add {#add tag="method"}

Add a rule to the matcher, consisting of an ID key, one or more patterns, and an
optional callback function to act on the matches. The callback function will
receive the arguments `matcher`, `doc`, `i` and `matches`. If a pattern already
exists for the given ID, the patterns will be extended. An existing `on_match`
callback will be overwritten.

> #### Example
>
> ```python
>   def on_match(matcher, doc, i, matches):
>       print('Matched!', matches)
>
>   matcher = PhraseMatcher(nlp.vocab)
>   matcher.add("OBAMA", [nlp("Barack Obama")], on_match=on_match)
>   matcher.add("HEALTH", [nlp("health care reform"), nlp("healthcare reform")], on_match=on_match)
>   doc = nlp("Barack Obama urges Congress to find courage to defend his healthcare reform")
>   matches = matcher(doc)
> ```

<Infobox title="Changed in v3.0" variant="warning">

As of spaCy v3.0, `PhraseMatcher.add` takes a list of patterns as the second
argument (instead of a variable number of arguments). The `on_match` callback
becomes an optional keyword argument.

```diff
patterns = [nlp("health care reform"), nlp("healthcare reform")]
- matcher.add("HEALTH", on_match, *patterns)
+ matcher.add("HEALTH", patterns, on_match=on_match)
```

</Infobox>

| Name           | Description                                                                                                                                                |
| -------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `match_id`     | An ID for the thing you're matching. ~~str~~                                                                                                               |
| `docs`         | `Doc` objects of the phrases to match. ~~List[Doc]~~                                                                                                       |
| _keyword-only_ |                                                                                                                                                            |
| `on_match`     | Callback function to act on matches. Takes the arguments `matcher`, `doc`, `i` and `matches`. ~~Optional[Callable[[Matcher, Doc, int, List[tuple]], Any]]~~ |
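The callback arguments can be used roughly as in the following sketch, where `i` is the index of the current match within `matches`. It assumes a blank English pipeline via `spacy.blank("en")`; the `seen` list and the example text are hypothetical and only serve the illustration. The second `add` call extends the patterns of the existing `HEALTH` rule.

```python
import spacy
from spacy.matcher import PhraseMatcher

nlp = spacy.blank("en")  # assumption: tokenizer-only pipeline
matcher = PhraseMatcher(nlp.vocab)
seen = []  # hypothetical container for what the callback observes

def on_match(matcher, doc, i, matches):
    # matches[i] is the match this callback invocation is about.
    match_id, start, end = matches[i]
    seen.append((doc.vocab.strings[match_id], doc[start:end].text))

matcher.add("HEALTH", [nlp.make_doc("health care reform")], on_match=on_match)
# Same ID: the pattern list is extended, the callback is replaced.
matcher.add("HEALTH", [nlp.make_doc("healthcare reform")], on_match=on_match)

doc = nlp.make_doc("Congress debated healthcare reform and health care reform")
matches = matcher(doc)
print(sorted(seen))
```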

## PhraseMatcher.remove {#remove tag="method" new="2.2"}

Remove a rule from the matcher by match ID. A `KeyError` is raised if the key
does not exist.

> #### Example
>
> ```python
> matcher = PhraseMatcher(nlp.vocab)
> matcher.add("OBAMA", [nlp("Barack Obama")])
> assert "OBAMA" in matcher
> matcher.remove("OBAMA")
> assert "OBAMA" not in matcher
> ```

| Name  | Description                       |
| ----- | --------------------------------- |
| `key` | The ID of the match rule. ~~str~~ |
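Because removing an unknown key raises a `KeyError`, code that may call `remove` more than once can guard the call, as in this sketch. It assumes a blank English pipeline via `spacy.blank("en")`; the `raised` flag is only there to make the behaviour visible.

```python
import spacy
from spacy.matcher import PhraseMatcher

nlp = spacy.blank("en")  # assumption: tokenizer-only pipeline
matcher = PhraseMatcher(nlp.vocab)
matcher.add("OBAMA", [nlp.make_doc("Barack Obama")])
matcher.remove("OBAMA")

# Removing the same key again fails, since the rule no longer exists.
try:
    matcher.remove("OBAMA")
    raised = False
except KeyError:
    raised = True

print(raised)
```

Checking `"OBAMA" in matcher` before calling `remove` is an alternative to catching the exception.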
-												💫 Update website (#3285)

<!--- Provide a general summary of your changes in the title. -->

## Description

The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.

This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.


### Types of change
enhancement

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

											
										
										
											2019-02-17 18:31:19 +00:00
+								---
 								title: PhraseMatcher
 								teaser: Match sequences of tokens, based on documents
 								tag: class
 								source: spacy/matcher/phrasematcher.pyx
 								new: 2
 								---
 								The `PhraseMatcher` lets you efficiently match large terminology lists. While
 								the [`Matcher`](/api/matcher) lets you match sequences based on lists of token
 								descriptions, the `PhraseMatcher` accepts match patterns in the form of `Doc`
-												Update docs and fix consistency

											
										
										
											2020-08-09 20:31:52 +00:00
+								objects. See the [usage guide](/usage/rule-based-matching#phrasematcher) for
 								examples.
-												💫 Update website (#3285)

<!--- Provide a general summary of your changes in the title. -->

## Description

The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.

This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.


### Types of change
enhancement

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

											
										
										
											2019-02-17 18:31:19 +00:00
 								## PhraseMatcher.\_\_init\_\_ {#init tag="method"}
 								Create the rule-based `PhraseMatcher`. Setting a different `attr` to match on
 								will change the token attributes that will be compared to determine a match. By
 								default, the incoming `Doc` is checked for sequences of tokens with the same
 								`ORTH` value, i.e. the verbatim token text. Matching on the attribute `LOWER`
 								will result in case-insensitive matching, since only the lowercase token texts
 								are compared. In theory, it's also possible to match on sequences of the same
 								part-of-speech tags or dependency labels.
 								If `validate=True` is set, additional validation is performed when pattern are
 								added. At the moment, it will check whether a `Doc` has attributes assigned that
 								aren't necessary to produce the matches (for example, part-of-speech tags if the
 								`PhraseMatcher` matches on the token text). Since this can often lead to
 								significantly worse performance when creating the pattern, a `UserWarning` will
 								be shown.
 								> #### Example
 								>
 								> ```python
 								> from spacy.matcher import PhraseMatcher
 								> matcher = PhraseMatcher(nlp.vocab)
 								> ```
-												Update docs, types and API consistency

											
										
										
											2020-08-17 14:45:24 +00:00
+								| Name                                    | Description                                                                                            |
 								| --------------------------------------- | ------------------------------------------------------------------------------------------------------ |
 								| `vocab`                                 | The vocabulary object, which must be shared with the documents the matcher will operate on. ~~Vocab~~  |
 								| `attr` <Tag variant="new">2.1</Tag>     | The token attribute to match on. Defaults to `ORTH`, i.e. the verbatim token text. ~~Union[int, str]~~ |
 								| `validate` <Tag variant="new">2.1</Tag> | Validate patterns added to the matcher. ~~bool~~                                                       |
-												💫 Update website (#3285)

<!--- Provide a general summary of your changes in the title. -->

## Description

The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.

This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.


### Types of change
enhancement

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

											
										
										
											2019-02-17 18:31:19 +00:00
 								## PhraseMatcher.\_\_call\_\_ {#call tag="method"}
 								Find all token sequences matching the supplied patterns on the `Doc`.
 								> #### Example
 								>
 								> ```python
 								> from spacy.matcher import PhraseMatcher
 								>
 								> matcher = PhraseMatcher(nlp.vocab)
-												Update matcher usage examples [ci skip]

											
										
										
											2020-07-02 13:39:45 +00:00
+								> matcher.add("OBAMA", [nlp("Barack Obama")])
-												Remove u-strings and fix formatting [ci skip]

											
										
										
											2019-09-12 14:11:15 +00:00
+								> doc = nlp("Barack Obama lifts America one last time in emotional farewell")
-												💫 Update website (#3285)

<!--- Provide a general summary of your changes in the title. -->

## Description

The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.

This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.


### Types of change
enhancement

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

											
										
										
											2019-02-17 18:31:19 +00:00
+								> matches = matcher(doc)
 								> ```
-												Update docs, types and API consistency

											
										
										
											2020-08-17 14:45:24 +00:00
+								| Name        | Description                         |
 								| ----------- | ----------------------------------- |
 								| `doc`       | The document to match over. ~~Doc~~ |
 								| **RETURNS** | list                                | A list of `(match_id, start, end)` tuples, describing the matches. A match tuple describes a span `doc[start:end]`. The `match_id` is the ID of the added match pattern. ~~List[Tuple[int, int, int]]~~ |
-												💫 Update website (#3285)

<!--- Provide a general summary of your changes in the title. -->

## Description

The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.

This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.


### Types of change
enhancement

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

											
										
										
											2019-02-17 18:31:19 +00:00
-												Adding a note on retrieving the string rep of the match_id (#4904)

Stolen from here: https://stackoverflow.com/questions/47638877/using-phrasematcher-in-spacy-to-find-multiple-match-types
											
										
										
											2020-02-03 11:58:59 +00:00
+								<Infobox title="Note on retrieving the string representation of the match_id" variant="warning">
-												Adjust formatting [ci skip]

											
										
										
											2020-02-03 12:00:02 +00:00
+								Because spaCy stores all strings as integers, the `match_id` you get back will
 								be an integer, too – but you can always get the string representation by looking
 								it up in the vocabulary's `StringStore`, i.e. `nlp.vocab.strings`:
-												Adding a note on retrieving the string rep of the match_id (#4904)

Stolen from here: https://stackoverflow.com/questions/47638877/using-phrasematcher-in-spacy-to-find-multiple-match-types
											
										
										
											2020-02-03 11:58:59 +00:00
-												Adjust formatting [ci skip]

											
										
										
											2020-02-03 12:00:02 +00:00
+								```python
-												Adding a note on retrieving the string rep of the match_id (#4904)

Stolen from here: https://stackoverflow.com/questions/47638877/using-phrasematcher-in-spacy-to-find-multiple-match-types
											
										
										
											2020-02-03 11:58:59 +00:00
+								match_id_string = nlp.vocab.strings[match_id]
 								```
 								</Infobox>
-												💫 Update website (#3285)

<!--- Provide a general summary of your changes in the title. -->

## Description

The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.

This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.


### Types of change
enhancement

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

											
										
										
											2019-02-17 18:31:19 +00:00
+								## PhraseMatcher.pipe {#pipe tag="method"}
 								Match a stream of documents, yielding them in turn.
 								> #### Example
 								>
 								> ```python
 								>   from spacy.matcher import PhraseMatcher
 								>   matcher = PhraseMatcher(nlp.vocab)
-												Fix in docs: pipe(docs) instead of pipe(texts) (#5680)

Very minor fix in docs, specifically in this part:

```
 matcher = PhraseMatcher(nlp.vocab)
>   for doc in matcher.pipe(texts, batch_size=50):
>       pass
```

`texts` suggests the input is an iterable of strings. I replaced it for `docs`.
											
										
										
											2020-06-30 18:00:50 +00:00
+								>   for doc in matcher.pipe(docs, batch_size=50):
-												💫 Update website (#3285)

<!--- Provide a general summary of your changes in the title. -->

## Description

The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.

This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.


### Types of change
enhancement

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

											
										
										
											2019-02-17 18:31:19 +00:00
+								>       pass
 								> ```
-												Update docs, types and API consistency

											
										
										
											2020-08-17 14:45:24 +00:00
+								| Name                                          | Description                                                                                                                                                                                                                         |
 								| --------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
 								| `docs`                                        | A stream of documents. ~~Iterable[Doc]~~                                                                                                                                                                                            |
 								| `batch_size`                                  | The number of documents to accumulate into a working set. ~~int~~                                                                                                                                                                   |
 								| `return_matches` <Tag variant="new">2.1</Tag> | Yield the match lists along with the docs, making results `(doc, matches)` tuples. ~~bool~~                                                                                                                                         |
 								| `as_tuples`                                   | Interpret the input stream as `(doc, context)` tuples, and yield `(result, context)` tuples out. If both `return_matches` and `as_tuples` are `True`, the output will be a sequence of `((doc, matches), context)` tuples. ~~bool~~ |
 								| **YIELDS**                                    | Documents and optional matches or context in order. ~~Union[Doc, Tuple[Doc, Any], Tuple[Tuple[Doc, Any], Any]]~~                                                                                                                    |
-												💫 Update website (#3285)

<!--- Provide a general summary of your changes in the title. -->

## Description

The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.

This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.


### Types of change
enhancement

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

											
										
										
											2019-02-17 18:31:19 +00:00
## PhraseMatcher.\_\_len\_\_ {#len tag="method"}

Get the number of rules added to the matcher. Note that this only returns the
number of rules (identical with the number of IDs), not the number of individual
patterns.

> #### Example
>
> ```python
> matcher = PhraseMatcher(nlp.vocab)
> assert len(matcher) == 0
> matcher.add("OBAMA", [nlp("Barack Obama")])
> assert len(matcher) == 1
> ```

| Name        | Description                  |
| ----------- | ---------------------------- |
| **RETURNS** | The number of rules. ~~int~~ |

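
The difference between rules and patterns counted by `PhraseMatcher.__len__` can
be sketched as follows. This is a minimal illustration, assuming a blank English
pipeline created with `spacy.blank("en")` so that no trained model is needed;
the rule IDs `"HEALTH"` and `"OBAMA"` are only examples.

```python
import spacy
from spacy.matcher import PhraseMatcher

# Assumption: a blank pipeline is sufficient, since only the tokenizer
# is needed to create the pattern Docs.
nlp = spacy.blank("en")
matcher = PhraseMatcher(nlp.vocab)

# One rule ID with two patterns still counts as a single rule.
matcher.add("HEALTH", [nlp("health care reform"), nlp("healthcare reform")])
rule_count = len(matcher)

# Adding a second rule ID increases the count.
matcher.add("OBAMA", [nlp("Barack Obama")])
rule_count_after = len(matcher)
```

Here `rule_count` reflects the single `"HEALTH"` rule even though it holds two
patterns, and `rule_count_after` reflects both rule IDs.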
## PhraseMatcher.\_\_contains\_\_ {#contains tag="method"}

Check whether the matcher contains rules for a match ID.

> #### Example
>
> ```python
> matcher = PhraseMatcher(nlp.vocab)
> assert "OBAMA" not in matcher
> matcher.add("OBAMA", [nlp("Barack Obama")])
> assert "OBAMA" in matcher
> ```

| Name        | Description                                                    |
| ----------- | -------------------------------------------------------------- |
| `key`       | The match ID. ~~str~~                                          |
| **RETURNS** | Whether the matcher contains rules for this match ID. ~~bool~~ |

## PhraseMatcher.add {#add tag="method"}

Add a rule to the matcher, consisting of an ID key, one or more patterns, and an
optional callback function to act on the matches. The callback function will
receive the arguments `matcher`, `doc`, `i` and `matches`. If patterns already
exist for the given ID, the new patterns are added to them. An existing
`on_match` callback will be overwritten.

> #### Example
>
> ```python
> def on_match(matcher, doc, i, matches):
>     print('Matched!', matches)
>
> matcher = PhraseMatcher(nlp.vocab)
> matcher.add("OBAMA", [nlp("Barack Obama")], on_match=on_match)
> matcher.add("HEALTH", [nlp("health care reform"), nlp("healthcare reform")], on_match=on_match)
> doc = nlp("Barack Obama urges Congress to find courage to defend his healthcare reforms")
> matches = matcher(doc)
> ```

<Infobox title="Changed in v3.0" variant="warning">

As of spaCy v3.0, `PhraseMatcher.add` takes a list of patterns as the second
argument (instead of a variable number of arguments). The `on_match` callback
becomes an optional keyword argument.

```diff
patterns = [nlp("health care reform"), nlp("healthcare reform")]
- matcher.add("HEALTH", on_match, *patterns)
+ matcher.add("HEALTH", patterns, on_match=on_match)
```

</Infobox>

| Name           | Description                                                                                                                                                 |
| -------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `match_id`     | An ID for the thing you're matching. ~~str~~                                                                                                                |
| `docs`         | `Doc` objects of the phrases to match. ~~List[Doc]~~                                                                                                        |
| _keyword-only_ |                                                                                                                                                             |
| `on_match`     | Callback function to act on matches. Takes the arguments `matcher`, `doc`, `i` and `matches`. ~~Optional[Callable[[Matcher, Doc, int, List[tuple]], Any]]~~ |

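
How `PhraseMatcher.add` extends an existing rule and calls `on_match` can be
sketched as below. This is an illustration under stated assumptions rather than
a reference implementation: it uses a blank English pipeline
(`spacy.blank("en")`), and the `seen` list and the sample sentence are
hypothetical and only used to record what the callback receives.

```python
import spacy
from spacy.matcher import PhraseMatcher

# Assumption: a blank pipeline, so no trained model has to be installed.
nlp = spacy.blank("en")
matcher = PhraseMatcher(nlp.vocab)
seen = []  # hypothetical store for the text of each match seen by the callback


def on_match(matcher, doc, i, matches):
    # i is the index of the current match within the matches list.
    match_id, start, end = matches[i]
    seen.append(doc[start:end].text)


matcher.add("HEALTH", [nlp("health care reform")], on_match=on_match)
# Same ID again: the new pattern is added to the existing rule. The callback
# is passed again because it is overwritten on every call to add.
matcher.add("HEALTH", [nlp("healthcare reform")], on_match=on_match)

doc = nlp("They debated healthcare reform and health care reform")
matches = matcher(doc)
matched_texts = sorted(doc[start:end].text for _, start, end in matches)
```

Both phrasings are matched under the single `"HEALTH"` rule, and the callback
runs once per match.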
## PhraseMatcher.remove {#remove tag="method" new="2.2"}

Remove a rule from the matcher by match ID. A `KeyError` is raised if the key
does not exist.

> #### Example
>
> ```python
> matcher = PhraseMatcher(nlp.vocab)
> matcher.add("OBAMA", [nlp("Barack Obama")])
> assert "OBAMA" in matcher
> matcher.remove("OBAMA")
> assert "OBAMA" not in matcher
> ```

| Name  | Description                       |
| ----- | --------------------------------- |
| `key` | The ID of the match rule. ~~str~~ |

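
Because `PhraseMatcher.remove` raises a `KeyError` for unknown keys, code that
may remove the same rule more than once needs to handle that case. The sketch
below shows one way to do so. The helper `remove_if_present` is hypothetical and
not part of spaCy, and the blank English pipeline is an assumption made so the
snippet runs without a trained model.

```python
import spacy
from spacy.matcher import PhraseMatcher

# Assumption: a blank pipeline is enough to build the pattern Doc.
nlp = spacy.blank("en")
matcher = PhraseMatcher(nlp.vocab)
matcher.add("OBAMA", [nlp("Barack Obama")])


def remove_if_present(matcher, key):
    """Hypothetical helper: remove a rule and report whether it existed."""
    try:
        matcher.remove(key)
    except KeyError:
        # The key was never added or has already been removed.
        return False
    return True


removed_first = remove_if_present(matcher, "OBAMA")
removed_second = remove_if_present(matcher, "OBAMA")
```

The first call removes the rule. The second call finds nothing to remove and
returns `False` instead of propagating the `KeyError`.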