diff --git a/spacy/vocab.pyx b/spacy/vocab.pyx index db73e9d91..1008797b3 100644 --- a/spacy/vocab.pyx +++ b/spacy/vocab.pyx @@ -61,6 +61,8 @@ cdef class Vocab: lookups (Lookups): Container for large lookup tables and dictionaries. oov_prob (float): Default OOV probability. vectors_name (unicode): Optional name to identify the vectors table. + get_noun_chunks (Optional[Callable[[Union[Doc, Span]], Iterator[Span]]]): + A function that yields base noun phrases used for Doc.noun_chunks. """ lex_attr_getters = lex_attr_getters if lex_attr_getters is not None else {} if lookups in (None, True, False): diff --git a/website/docs/api/doc.md b/website/docs/api/doc.md index e4d24d2c0..45feb8774 100644 --- a/website/docs/api/doc.md +++ b/website/docs/api/doc.md @@ -616,8 +616,10 @@ phrase, or "NP chunk", is a noun phrase that does not permit other NPs to be nested within it – so no NP-level coordination, no prepositional phrases, and no relative clauses. -If the `noun_chunk` [syntax iterator](/usage/adding-languages#language-data) has -not been implemeted for the given language, a `NotImplementedError` is raised. +To customize the noun chunk iterator in a loaded pipeline, modify +[`nlp.vocab.get_noun_chunks`](/api/vocab#attributes). If the `noun_chunk` +[syntax iterator](/usage/adding-languages#language-data) has not been +implemented for the given language, a `NotImplementedError` is raised. > #### Example > diff --git a/website/docs/api/vocab.md b/website/docs/api/vocab.md index a2ca63002..8fe769cdd 100644 --- a/website/docs/api/vocab.md +++ b/website/docs/api/vocab.md @@ -21,14 +21,14 @@ Create the vocabulary. > vocab = Vocab(strings=["hello", "world"]) > ``` -| Name | Description | -| ------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------- | -| `lex_attr_getters` | A dictionary mapping attribute IDs to functions to compute them. 
Defaults to `None`. ~~Optional[Dict[str, Callable[[str], Any]]]~~ | -| `strings` | A [`StringStore`](/api/stringstore) that maps strings to hash values, and vice versa, or a list of strings. ~~Union[List[str], StringStore]~~ | -| `lookups` | A [`Lookups`](/api/lookups) that stores the `lexeme_norm` and other large lookup tables. Defaults to `None`. ~~Optional[Lookups]~~ | -| `oov_prob` | The default OOV probability. Defaults to `-20.0`. ~~float~~ | -| `vectors_name` 2.2 | A name to identify the vectors table. ~~str~~ | -| `writing_system` | A dictionary describing the language's writing system. Typically provided by [`Language.Defaults`](/api/language#defaults). ~~Dict[str, Any]~~ | +| Name | Description | +| ------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------ | +| `lex_attr_getters` | A dictionary mapping attribute IDs to functions to compute them. Defaults to `None`. ~~Optional[Dict[str, Callable[[str], Any]]]~~ | +| `strings` | A [`StringStore`](/api/stringstore) that maps strings to hash values, and vice versa, or a list of strings. ~~Union[List[str], StringStore]~~ | +| `lookups` | A [`Lookups`](/api/lookups) that stores the `lexeme_norm` and other large lookup tables. Defaults to `None`. ~~Optional[Lookups]~~ | +| `oov_prob` | The default OOV probability. Defaults to `-20.0`. ~~float~~ | +| `vectors_name` 2.2 | A name to identify the vectors table. ~~str~~ | +| `writing_system` | A dictionary describing the language's writing system. Typically provided by [`Language.Defaults`](/api/language#defaults). ~~Dict[str, Any]~~ | | `get_noun_chunks` | A function that yields base noun phrases used for [`Doc.noun_chunks`](/api/doc#noun_chunks). 
~~Optional[Callable[[Union[Doc, Span]], Iterator[Span]]]~~ | ## Vocab.\_\_len\_\_ {#len tag="method"} @@ -182,14 +182,14 @@ subword features by average over n-grams of `orth` (introduced in spaCy `v2.1`). | Name | Description | | ----------------------------------- | ---------------------------------------------------------------------------------------------------------------------- | | `orth` | The hash value of a word, or its unicode string. ~~Union[int, str]~~ | -| `minn` 2.1 | Minimum n-gram length used for FastText's n-gram computation. Defaults to the length of `orth`. ~~int~~ | -| `maxn` 2.1 | Maximum n-gram length used for FastText's n-gram computation. Defaults to the length of `orth`. ~~int~~ | +| `minn` 2.1 | Minimum n-gram length used for FastText's n-gram computation. Defaults to the length of `orth`. ~~int~~ | +| `maxn` 2.1 | Maximum n-gram length used for FastText's n-gram computation. Defaults to the length of `orth`. ~~int~~ | | **RETURNS** | A word vector. Size and shape are determined by the `Vocab.vectors` instance. ~~numpy.ndarray[ndim=1, dtype=float32]~~ | ## Vocab.set_vector {#set_vector tag="method" new="2"} -Set a vector for a word in the vocabulary. Words can be referenced by string -or hash value. +Set a vector for a word in the vocabulary. Words can be referenced by string or +hash value. > #### Example > @@ -300,13 +300,14 @@ Load state from a binary string. > assert type(PERSON) == int > ``` -| Name | Description | -| --------------------------------------------- | ------------------------------------------------------------------------------- | -| `strings` | A table managing the string-to-int mapping. ~~StringStore~~ | -| `vectors` 2 | A table associating word IDs to word vectors. ~~Vectors~~ | -| `vectors_length` | Number of dimensions for each word vector. ~~int~~ | -| `lookups` | The available lookup tables in this vocab. ~~Lookups~~ | -| `writing_system` 2.1 | A dict with information about the language's writing system. 
~~Dict[str, Any]~~ | +| Name | Description | +| ---------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------ | +| `strings` | A table managing the string-to-int mapping. ~~StringStore~~ | +| `vectors` 2 | A table associating word IDs to word vectors. ~~Vectors~~ | +| `vectors_length` | Number of dimensions for each word vector. ~~int~~ | +| `lookups` | The available lookup tables in this vocab. ~~Lookups~~ | +| `writing_system` 2.1 | A dict with information about the language's writing system. ~~Dict[str, Any]~~ | +| `get_noun_chunks` 3.0 | A function that yields base noun phrases used for [`Doc.noun_chunks`](/api/doc#noun_chunks). ~~Optional[Callable[[Union[Doc, Span]], Iterator[Span]]]~~ | ## Serialization fields {#serialization-fields}