Merge pull request #5791 from adrianeboyd/docs/morphology

2020-07-25 15:10:21 +02:00 · 2020-07-25 15:10:21 +02:00 · eb9acae34d
parent 71242327b2 41525901ef
commit eb9acae34d
5 changed files with 324 additions and 1 deletions
--- a/website/docs/api/morphanalysis.md
+++ b/website/docs/api/morphanalysis.md
@ -0,0 +1,153 @@
+---
+title: MorphAnalysis
+tag: class
+source: spacy/tokens/morphanalysis.pyx
+---
+
+Stores a single morphological analysis.
+
+
+## MorphAnalysis.\_\_init\_\_ {#init tag="method"}
+
+Initialize a MorphAnalysis object from a UD FEATS string or a dictionary of
+morphological features.
+
+> #### Example
+>
+> ```python
+> from spacy.tokens import MorphAnalysis
+> 
+> feats = "Feat1=Val1|Feat2=Val2"
+> m = MorphAnalysis(nlp.vocab, feats)
+> ```
+
+| Name        | Type               | Description                   |
+| ----------- | ------------------ | ----------------------------- |
+| `vocab`     | `Vocab`            | The vocab.                    |
+| `features`  | `Union[Dict, str]` | The morphological features.   |
+| **RETURNS** | `MorphAnalysis`    | The newly constructed object. |
+
+
+## MorphAnalysis.\_\_contains\_\_ {#contains tag="method"}
+
+Whether a feature/value pair is in the analysis.
+
+> #### Example
+>
+> ```python
+> feats = "Feat1=Val1,Val2|Feat2=Val2"
+> morph = MorphAnalysis(nlp.vocab, feats)
+> assert "Feat1=Val1" in morph
+> ```
+
+| Name        | Type  | Description                           |
+| ----------- | ----- | ------------------------------------- |
+| **RETURNS** | `str` | A feature/value pair in the analysis. |
+
+
+## MorphAnalysis.\_\_iter\_\_ {#iter tag="method"}
+
+Iterate over the feature/value pairs in the analysis.
+
+> #### Example
+>
+> ```python
+> feats = "Feat1=Val1,Val3|Feat2=Val2"
+> morph = MorphAnalysis(nlp.vocab, feats)
+> assert list(morph) == ["Feat1=Va1", "Feat1=Val3", "Feat2=Val2"]
+> ```
+
+| Name       | Type  | Description                           |
+| ---------- | ----- | ------------------------------------- |
+| **YIELDS** | `str` | A feature/value pair in the analysis. |
+
+
+## MorphAnalysis.\_\_len\_\_ {#len tag="method"}
+
+Returns the number of features in the analysis.
+
+> #### Example
+>
+> ```python
+> feats = "Feat1=Val1,Val2|Feat2=Val2"
+> morph = MorphAnalysis(nlp.vocab, feats)
+> assert len(morph) == 3
+> ```
+
+| Name        | Type  | Description                             |
+| ----------- | ----- | --------------------------------------- |
+| **RETURNS** | `int` | The number of features in the analysis. |
+
+
+## MorphAnalysis.\_\_str\_\_ {#str tag="method"}
+
+Returns the morphological analysis in the UD FEATS string format.
+
+> #### Example
+>
+> ```python
+> feats = "Feat1=Val1,Val2|Feat2=Val2"
+> morph = MorphAnalysis(nlp.vocab, feats)
+> assert str(morph) == feats
+> ```
+
+| Name        | Type  | Description                      |
+| ----------- | ----- | ---------------------------------|
+| **RETURNS** | `str` | The analysis in UD FEATS format. |
+
+
+## MorphAnalysis.get {#get tag="method"}
+
+Retrieve values for a feature by field.
+
+> #### Example
+>
+> ```python
+> feats = "Feat1=Val1,Val2"
+> morph = MorphAnalysis(nlp.vocab, feats)
+> assert morph.get("Feat1") == ["Val1", "Val2"]
+> ```
+
+| Name        | Type   | Description                         |
+| ----------- | ------ | ----------------------------------- |
+| `field`     | `str`  | The field to retrieve.              |
+| **RETURNS** | `list` | A list of the individual features.  |
+
+
+## MorphAnalysis.to_dict {#to_dict tag="method"}
+
+Produce a dict representation of the analysis, in the same format as the tag
+map.
+
+> #### Example
+>
+> ```python
+> feats = "Feat1=Val1,Val2|Feat2=Val2"
+> morph = MorphAnalysis(nlp.vocab, feats)
+> assert morph.to_dict() == {"Feat1": "Val1,Val2", "Feat2": "Val2"}
+> ```
+
+| Name        | Type   | Description                              |
+| ----------- | ------ | -----------------------------------------|
+| **RETURNS** | `dict` | The dict representation of the analysis. |
+
+
+## MorphAnalysis.from_id {#from_id tag="classmethod"}
+
+Create a morphological analysis from a given hash ID.
+
+> #### Example
+>
+> ```python
+> feats = "Feat1=Val1|Feat2=Val2"
+> hash = nlp.vocab.strings[feats]
+> morph = MorphAnalysis.from_id(nlp.vocab, hash)
+> assert str(morph) == feats
+> ```
+
+| Name    | Type    | Description                      |
+| ------- | ------- | -------------------------------- |
+| `vocab` | `Vocab` | The vocab.                       |
+| `key`   | `int`   | The hash of the features string. |
+
+
--- a/website/docs/api/morphology.md
+++ b/website/docs/api/morphology.md
@ -0,0 +1,165 @@
+---
+title: Morphology
+tag: class
+source: spacy/morphology.pyx
+---
+
+Store the possible morphological analyses for a language, and index them
+by hash. To save space on each token, tokens only know the hash of their
+morphological analysis, so queries of morphological attributes are delegated to
+this class.
+
+
+## Morphology.\_\_init\_\_ {#init tag="method"}
+
+Create a Morphology object using the tag map, lemmatizer and exceptions.
+
+> #### Example
+>
+> ```python
+> from spacy.morphology import Morphology
+>
+> morphology = Morphology(strings, tag_map, lemmatizer)
+> ```
+
+| Name        | Type                                     | Description                                                                                               |
+| ----------- | ---------------------------------------- | --------------------------------------------------------------------------------------------------------- |
+| `strings`   | `StringStore`                            | The string store.                                                             |
+| `tag_map`   | `Dict[str, Dict]`                        | The tag map.                                                                  |
+| `lemmatizer`| `Lemmatizer`                             | The lemmatizer.                                                               |
+| `exc`       | `Dict[str, Dict]`                        | A dictionary of exceptions in the format `{tag: {orth: {"POS": "X", "Feat1": "Val1, "Feat2": "Val2", ...}` |
+| **RETURNS** | `Morphology`                             | The newly constructed object.                                                                             |
+
+
+## Morphology.add {#add tag="method"}
+
+Insert a morphological analysis in the morphology table, if not already
+present. The morphological analysis may be provided in the UD FEATS format as a
+string or in the tag map dictionary format. Returns the hash of the new
+analysis.
+
+> #### Example
+>
+> ```python
+> feats = "Feat1=Val1|Feat2=Val2"
+> hash = nlp.vocab.morphology.add(feats)
+> assert hash == nlp.vocab.strings[feats]
+> ```
+
+| Name        | Type                | Description                 |
+| ----------- | ------------------- | --------------------------- |
+| `features`  | `Union[Dict, str]`  | The morphological features. |
+
+
+## Morphology.get {#get tag="method"}
+
+> #### Example
+>
+> ```python
+> feats = "Feat1=Val1|Feat2=Val2"
+> hash = nlp.vocab.morphology.add(feats)
+> assert nlp.vocab.morphology.get(hash) == feats
+> ```
+
+Get the FEATS string for the hash of the morphological analysis.
+
+| Name        | Type   | Description                             |
+| ----------- | ------ | --------------------------------------- |
+| `morph`     | int    | The hash of the morphological analysis. |
+
+
+## Morphology.load_tag_map {#load_tag_map tag="method"}
+
+Replace the current tag map with the provided tag map.
+
+| Name        | Type               | Description  |
+| ----------- | ------------------ | ------------ |
+| `tag_map`   | `Dict[str, Dict]`  | The tag map. |
+
+
+## Morphology.load_morph_exceptions {#load_morph_exceptions tag="method"}
+
+Replace the current morphological exceptions with the provided exceptions.
+
+| Name          | Type               | Description                   |
+| ------------- | ------------------ | ----------------------------- |
+| `morph_rules` | `Dict[str, Dict]`  | The morphological exceptions. |
+
+
+## Morphology.add_special_case {#add_special_case tag="method"}
+
+Add a special-case rule to the morphological analyzer. Tokens whose tag and
+orth match the rule will receive the specified properties.
+
+> #### Example
+>
+> ```python
+> attrs = {"POS": "DET", "Definite": "Def"}
+> morphology.add_special_case("DT", "the", attrs)
+> ```
+
+| Name        | Type | Description                                    |
+| ----------- | ---- | ---------------------------------------------- |
+| `tag_str`   | str  | The fine-grained tag.                          |
+| `orth_str`  | str  | The token text.                                |
+| `attrs`     | dict | The features to assign for this token and tag. |
+
+
+## Morphology.exc {#exc tag="property"}
+
+The current morphological exceptions.
+
+| Name       | Type  | Description                                         |
+| ---------- | ----- | --------------------------------------------------- |
+| **YIELDS** | dict  | The current dictionary of morphological exceptions. |
+
+
+## Morphology.lemmatize {#lemmatize tag="method"}
+
+TODO
+
+
+## Morphology.feats_to_dict {#feats_to_dict tag="staticmethod"}
+
+Convert a string FEATS representation to a dictionary of features and values in
+the same format as the tag map.
+
+> #### Example
+>
+> ```python
+> from spacy.morphology import Morphology
+> d = Morphology.feats_to_dict("Feat1=Val1|Feat2=Val2")
+> assert d == {"Feat1": "Val1", "Feat2": "Val2"}
+> ```
+
+| Name        | Type | Description                                                   |
+| ----------- | ---- | ------------------------------------------------------------- |
+| `feats`     | str  | The morphological features in Universal Dependencies FEATS format. |
+| **RETURNS** | dict | The morphological features as a dictionary. |
+
+
+## Morphology.dict_to_feats {#dict_to_feats tag="staticmethod"}
+
+Convert a dictionary of features and values to a string FEATS representation.
+
+> #### Example
+>
+> ```python
+> from spacy.morphology import Morphology
+> f = Morphology.dict_to_feats({"Feat1": "Val1", "Feat2": "Val2"})
+> assert f == "Feat1=Val1|Feat2=Val2"
+> ```
+
+| Name         | Type              | Description                                                          |
+| ------------ | ----------------- | --------------------------------------------------------------------- |
+| `feats_dict` | `Dict[str, Dict]` | The morphological features as a dictionary.                           |
+| **RETURNS**  | str               | The morphological features as in Universal Dependencies FEATS format. |
+
+
+## Attributes {#attributes}
+
+| Name          | Type  | Description                                  |
+| ------------- | ----- | -------------------------------------------- |
+| `FEATURE_SEP` | `str` | The FEATS feature separator. Default is `|`. |
+| `FIELD_SEP`   | `str` | The FEATS field separator. Default is `=`.   |
+| `VALUE_SEP`   | `str` | The FEATS value separator. Default is `,`.   |
--- a/website/docs/api/token.md
+++ b/website/docs/api/token.md
@ -450,6 +450,8 @@ The L2 norm of the token's vector representation.
 | `pos_`                                       | str          | Coarse-grained part-of-speech from the [Universal POS tag set](https://universaldependencies.org/docs/u/pos/).                                                                                                                                                 |
 | `tag`                                        | int          | Fine-grained part-of-speech.                                                                                                                                                                                                                                   |
 | `tag_`                                       | str          | Fine-grained part-of-speech.                                                                                                                                                                                                                                   |
+| `morph`                                      | `MorphAnalysis` | Morphological analysis.                                                                                                                                                     |
+| `morph_`                                     | str             | Morphological analysis in UD FEATS format.                                                                                                                                  |
 | `dep`                                        | int          | Syntactic dependency relation.                                                                                                                                                                                                                                 |
 | `dep_`                                       | str          | Syntactic dependency relation.                                                                                                                                                                                                                                 |
 | `lang`                                       | int          | Language of the parent document's vocabulary.                                                                                                                                                                                                                  |
--- a/website/docs/usage/101/_architecture.md
+++ b/website/docs/usage/101/_architecture.md
@ -24,6 +24,7 @@ an **annotated document**. It also orchestrates training and serialization.
 | [`Span`](/api/span)     | A slice from a `Doc` object.                                                                                                                            |
 | [`Token`](/api/token)   | An individual token — i.e. a word, punctuation symbol, whitespace, etc.                                                                                 |
 | [`Lexeme`](/api/lexeme) | An entry in the vocabulary. It's a word type with no context, as opposed to a word token. It therefore has no part-of-speech tag, dependency parse etc. |
+| [`MorphAnalysis`](/api/morphanalysis) | A morphological analysis.                                                                                                                 |

 ### Processing pipeline {#architecture-pipeline}

@ -32,7 +33,7 @@ an **annotated document**. It also orchestrates training and serialization.
 | [`Language`](/api/language)                 | A text-processing pipeline. Usually you'll load this once per process as `nlp` and pass the instance around your application. |
 | [`Tokenizer`](/api/tokenizer)               | Segment text, and create `Doc` objects with the discovered segment boundaries.                                                |
 | [`Lemmatizer`](/api/lemmatizer)             | Determine the base forms of words.                                                                                            |
-| `Morphology`                                | Assign linguistic features like lemmas, noun case, verb tense etc. based on the word and its part-of-speech tag.              |
+| [`Morphology`](/api/morphology)             | Assign linguistic features like lemmas, noun case, verb tense etc. based on the word and its part-of-speech tag.              |
 | [`Tagger`](/api/tagger)                     | Annotate part-of-speech tags on `Doc` objects.                                                                                |
 | [`DependencyParser`](/api/dependencyparser) | Annotate syntactic dependencies on `Doc` objects.                                                                             |
 | [`EntityRecognizer`](/api/entityrecognizer) | Annotate named entities, e.g. persons or products, on `Doc` objects.                                                          |
--- a/website/meta/sidebars.json
+++ b/website/meta/sidebars.json
@ -102,6 +102,8 @@
                    { "text": "StringStore", "url": "/api/stringstore" },
                    { "text": "Vectors", "url": "/api/vectors" },
                    { "text": "Lookups", "url": "/api/lookups" },
+                    { "text": "Morphology", "url": "/api/morphology" },
+                    { "text": "MorphAnalysis", "url": "/api/morphanalysis" },
                    { "text": "KnowledgeBase", "url": "/api/kb" },
                    { "text": "Scorer", "url": "/api/scorer" },
                    { "text": "Corpus", "url": "/api/corpus" }