From 3aa57ce6c9ab162715cad72563b25f5aecb28966 Mon Sep 17 00:00:00 2001
From: Adriane Boyd
Date: Mon, 21 Sep 2020 09:07:20 +0200
Subject: [PATCH 1/5] Update alignment mode in Doc.char_span docs

---
 website/docs/api/doc.md | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/website/docs/api/doc.md b/website/docs/api/doc.md
index 380f6a172..44316ea1e 100644
--- a/website/docs/api/doc.md
+++ b/website/docs/api/doc.md
@@ -187,8 +187,8 @@ Remove a previously registered extension.
 ## Doc.char_span {#char_span tag="method" new="2"}
 
 Create a `Span` object from the slice `doc.text[start_idx:end_idx]`. Returns
-`None` if the character indices don't map to a valid span using the default mode
-`"strict".
+`None` if the character indices don't map to a valid span using the default
+alignment mode `"strict".
 
 > #### Example
 >
@@ -198,15 +198,15 @@ Create a `Span` object from the slice `doc.text[start_idx:end_idx]`. Returns
 > assert span.text == "New York"
 > ```
 
-| Name | Description |
-| ------------------------------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `start` | The index of the first character of the span. ~~int~~ |
-| `end` | The index of the last character after the span. ~int~~ |
-| `label` | A label to attach to the span, e.g. for named entities. ~~Union[int, str]~~ |
-| `kb_id` 2.2 | An ID from a knowledge base to capture the meaning of a named entity. ~~Union[int, str]~~ |
-| `vector` | A meaning representation of the span. ~~numpy.ndarray[ndim=1, dtype=float32]~~ |
-| `mode` | How character indices snap to token boundaries. Options: `"strict"` (no snapping), `"inside"` (span of all tokens completely within the character span), `"outside"` (span of all tokens at least partially covered by the character span). Defaults to `"strict"`. ~~str~~ |
-| **RETURNS** | The newly constructed object or `None`. ~~Optional[Span]~~ |
+| Name | Description |
+| ------------------------------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `start` | The index of the first character of the span. ~~int~~ |
+| `end` | The index of the last character after the span. ~int~~ |
+| `label` | A label to attach to the span, e.g. for named entities. ~~Union[int, str]~~ |
+| `kb_id` 2.2 | An ID from a knowledge base to capture the meaning of a named entity. ~~Union[int, str]~~ |
+| `vector` | A meaning representation of the span. ~~numpy.ndarray[ndim=1, dtype=float32]~~ |
+| `alignment_mode` | How character indices snap to token boundaries. Options: `"strict"` (no snapping), `"contract"` (span of all tokens completely within the character span), `"expand"` (span of all tokens at least partially covered by the character span). Defaults to `"strict"`. ~~str~~ |
+| **RETURNS** | The newly constructed object or `None`. ~~Optional[Span]~~ |
 
 ## Doc.similarity {#similarity tag="method" model="vectors"}
 

From cc71ec901f26ae1c3bfb62b6bd776295200f418e Mon Sep 17 00:00:00 2001
From: Adriane Boyd
Date: Mon, 21 Sep 2020 09:08:55 +0200
Subject: [PATCH 2/5] Fix typo in saving and loading usage docs

---
 website/docs/usage/saving-loading.md | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/website/docs/usage/saving-loading.md b/website/docs/usage/saving-loading.md
index 3a95bf6aa..06fb18591 100644
--- a/website/docs/usage/saving-loading.md
+++ b/website/docs/usage/saving-loading.md
@@ -299,9 +299,10 @@ installed in the same environment – that's it.
 
 When you load a pipeline, spaCy will generally use its `config.cfg` to set up
 the language class and construct the pipeline. The pipeline is specified as a
-list of strings, e.g. `pipeline = ["tagger", "paser", "ner"]`. For each of those
-strings, spaCy will call `nlp.add_pipe` and look up the name in all factories
-defined by the decorators [`@Language.component`](/api/language#component) and
+list of strings, e.g. `pipeline = ["tagger", "parser", "ner"]`. For each of
+those strings, spaCy will call `nlp.add_pipe` and look up the name in all
+factories defined by the decorators
+[`@Language.component`](/api/language#component) and
 [`@Language.factory`](/api/language#factory). This means that you have to import
 your custom components _before_ loading the pipeline.
 
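
Note on patch 2/5: the paragraph it touches says that each name in the `pipeline` list is looked up among registered factories, which is why custom components must be imported before loading. The sketch below illustrates that ordering constraint with a plain dictionary registry. `FACTORIES`, `factory` and `build_pipeline` are hypothetical names invented for this illustration; they are not spaCy's API and do not reflect how spaCy implements `@Language.factory`.

```python
from typing import Callable, Dict, List

Component = Callable[[str], str]

# Hypothetical stand-in for a factory registry; NOT spaCy's real registry.
FACTORIES: Dict[str, Callable[[], Component]] = {}


def factory(name: str):
    """Decorator that registers a component factory under `name`."""
    def register(func: Callable[[], Component]) -> Callable[[], Component]:
        FACTORIES[name] = func
        return func
    return register


def build_pipeline(names: List[str]) -> List[Component]:
    """Look up each name in the registry, as a config-driven loader would."""
    pipeline = []
    for name in names:
        if name not in FACTORIES:
            # The defining module was never imported, so nothing registered it.
            raise KeyError(f"no factory registered for {name!r}")
        pipeline.append(FACTORIES[name]())
    return pipeline


# Loading before the decorator has run fails...
try:
    build_pipeline(["upper"])
    failed_before_registration = False
except KeyError:
    failed_before_registration = True


# ...while loading after the decorated definition has been executed works.
@factory("upper")
def make_upper() -> Component:
    return str.upper


pipeline = build_pipeline(["upper"])
result = pipeline[0]("spacy")
```

The decorator only runs when the module containing it is executed, so in this toy model "importing the component" and "registering the factory" are the same event.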
From fc9c78da25202322c9ec042b529a6a3f91d48e4d Mon Sep 17 00:00:00 2001
From: Adriane Boyd
Date: Tue, 22 Sep 2020 09:23:47 +0200
Subject: [PATCH 3/5] Add MorphAnalysis to API sidebar

---
 website/meta/sidebars.json | 1 +
 1 file changed, 1 insertion(+)

diff --git a/website/meta/sidebars.json b/website/meta/sidebars.json
index e27817c92..28915ebb7 100644
--- a/website/meta/sidebars.json
+++ b/website/meta/sidebars.json
@@ -119,6 +119,7 @@
  { "text": "Corpus", "url": "/api/corpus" },
  { "text": "KnowledgeBase", "url": "/api/kb" },
  { "text": "Lookups", "url": "/api/lookups" },
+ { "text": "MorphAnalysis", "url": "/api/morphanalysis" },
  { "text": "Morphology", "url": "/api/morphology" },
  { "text": "Scorer", "url": "/api/scorer" },
  { "text": "StringStore", "url": "/api/stringstore" },

From 844db6ff12441f63f51d4d9921cdaf4e6af61a04 Mon Sep 17 00:00:00 2001
From: Adriane Boyd
Date: Tue, 22 Sep 2020 09:31:47 +0200
Subject: [PATCH 4/5] Update architecture overview

---
 website/docs/usage/101/_architecture.md | 32 ++++++++++++-------------
 1 file changed, 16 insertions(+), 16 deletions(-)

diff --git a/website/docs/usage/101/_architecture.md b/website/docs/usage/101/_architecture.md
index 98011f173..6e9120022 100644
--- a/website/docs/usage/101/_architecture.md
+++ b/website/docs/usage/101/_architecture.md
@@ -65,22 +65,22 @@ Matchers help you find and extract information from [`Doc`](/api/doc) objects
 based on match patterns describing the sequences you're looking for. A matcher
 operates on a `Doc` and gives you access to the matched tokens **in context**.
 
-| Name | Description |
-| --------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| [`Matcher`](/api/matcher) | Match sequences of tokens, based on pattern rules, similar to regular expressions. |
-| [`PhraseMatcher`](/api/phrasematcher) | Match sequences of tokens based on phrases. |
-| [`DependencyMatcher`](/api/dependencymatcher) | Match sequences of tokens based on dependency trees using the [Semgrex syntax](https://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/semgraph/semgrex/SemgrexPattern.html). |
+| Name | Description |
+| --------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| [`Matcher`](/api/matcher) | Match sequences of tokens, based on pattern rules, similar to regular expressions. |
+| [`PhraseMatcher`](/api/phrasematcher) | Match sequences of tokens based on phrases. |
+| [`DependencyMatcher`](/api/dependencymatcher) | Match sequences of tokens based on dependency trees using [Semgrex operators](https://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/semgraph/semgrex/SemgrexPattern.html). |
 
 ### Other classes {#architecture-other}
 
-| Name | Description |
-| ------------------------------------------------ | ---------------------------------------------------------------------------------------------------------------- |
-| [`Vocab`](/api/vocab) | The shared vocabulary that stores strings and gives you access to [`Lexeme`](/api/lexeme) objects. |
-| [`StringStore`](/api/stringstore) | Map strings to and from hash values. |
-| [`Vectors`](/api/vectors) | Container class for vector data keyed by string. |
-| [`Lookups`](/api/lookups) | Container for convenient access to large lookup tables and dictionaries. |
-| [`Morphology`](/api/morphology) | Assign linguistic features like lemmas, noun case, verb tense etc. based on the word and its part-of-speech tag. |
-| [`MorphAnalysis`](/api/morphology#morphanalysis) | A morphological analysis. |
-| [`KnowledgeBase`](/api/kb) | Storage for entities and aliases of a knowledge base for entity linking. |
-| [`Scorer`](/api/scorer) | Compute evaluation scores. |
-| [`Corpus`](/api/corpus) | Class for managing annotated corpora for training and evaluation data. |
+| Name | Description |
+| ------------------------------------------------ | -------------------------------------------------------------------------------------------------- |
+| [`Vocab`](/api/vocab) | The shared vocabulary that stores strings and gives you access to [`Lexeme`](/api/lexeme) objects. |
+| [`StringStore`](/api/stringstore) | Map strings to and from hash values. |
+| [`Vectors`](/api/vectors) | Container class for vector data keyed by string. |
+| [`Lookups`](/api/lookups) | Container for convenient access to large lookup tables and dictionaries. |
+| [`Morphology`](/api/morphology) | Store morphological analyses and map them to and from hash values. |
+| [`MorphAnalysis`](/api/morphology#morphanalysis) | A morphological analysis. |
+| [`KnowledgeBase`](/api/kb) | Storage for entities and aliases of a knowledge base for entity linking. |
+| [`Scorer`](/api/scorer) | Compute evaluation scores. |
+| [`Corpus`](/api/corpus) | Class for managing annotated corpora for training and evaluation data. |

From e05d6d358d04166779093d2acff0e2c3bb95fe04 Mon Sep 17 00:00:00 2001
From: Adriane Boyd
Date: Tue, 22 Sep 2020 09:36:37 +0200
Subject: [PATCH 5/5] Update API sidebar MorphAnalysis link

---
 website/meta/sidebars.json | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/website/meta/sidebars.json b/website/meta/sidebars.json
index 28915ebb7..c5404b68e 100644
--- a/website/meta/sidebars.json
+++ b/website/meta/sidebars.json
@@ -119,7 +119,7 @@
  { "text": "Corpus", "url": "/api/corpus" },
  { "text": "KnowledgeBase", "url": "/api/kb" },
  { "text": "Lookups", "url": "/api/lookups" },
- { "text": "MorphAnalysis", "url": "/api/morphanalysis" },
+ { "text": "MorphAnalysis", "url": "/api/morphology#morphanalysis" },
  { "text": "Morphology", "url": "/api/morphology" },
  { "text": "Scorer", "url": "/api/scorer" },
  { "text": "StringStore", "url": "/api/stringstore" },
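
Note on patch 1/5: the renamed `alignment_mode` argument takes `"strict"`, `"contract"` or `"expand"`, and the table describes how each one maps character indices to token boundaries. The sketch below restates those three descriptions in plain Python so the differences can be checked without a model. `snap_char_span` and the hard-coded token offsets are assumptions made for this illustration only; this is not spaCy's implementation of `Doc.char_span`.

```python
from typing import List, Optional, Tuple


def snap_char_span(
    token_offsets: List[Tuple[int, int]],
    start: int,
    end: int,
    alignment_mode: str = "strict",
) -> Optional[Tuple[int, int]]:
    """Return a (first_token, end_token_exclusive) pair, or None.

    `token_offsets` holds one (start_char, end_char) pair per token.
    Hypothetical helper for illustration; not part of spaCy.
    """
    inside = [
        i for i, (s, e) in enumerate(token_offsets) if start <= s and e <= end
    ]
    if alignment_mode == "strict":
        # No snapping: both indices must already sit on token boundaries.
        starts = {s for s, _ in token_offsets}
        ends = {e for _, e in token_offsets}
        selected = inside if start in starts and end in ends else []
    elif alignment_mode == "contract":
        # Keep only tokens completely within the character span.
        selected = inside
    elif alignment_mode == "expand":
        # Keep every token at least partially covered by the character span.
        selected = [
            i for i, (s, e) in enumerate(token_offsets) if s < end and e > start
        ]
    else:
        raise ValueError(f"unknown alignment_mode: {alignment_mode!r}")
    if not selected:
        return None
    return selected[0], selected[-1] + 1


# Assumed character offsets for the tokens of "I like New York in Autumn."
offsets = [(0, 1), (2, 6), (7, 10), (11, 15), (16, 18), (19, 25), (25, 26)]

exact = snap_char_span(offsets, 7, 15)                   # "New York"
misaligned = snap_char_span(offsets, 8, 15)              # "ew York", strict
contracted = snap_char_span(offsets, 8, 15, "contract")  # only "York" fits
expanded = snap_char_span(offsets, 8, 15, "expand")      # grows to "New York"
```

With a start index that falls inside "New", strict alignment yields `None`, contracting drops the partially covered token, and expanding pulls it back in.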