From d0b3af9222809b858ca0fa4e23a85c8f3d357eae Mon Sep 17 00:00:00 2001
From: Ines Montani
Date: Sun, 24 Feb 2019 22:21:25 +0100
Subject: [PATCH] Fix remaining inaccuracies in API docs (closes #2329)

---
 website/docs/api/dependencyparser.md | 20 ++++++++--------
 website/docs/api/entityrecognizer.md | 20 ++++++++--------
 website/docs/api/goldparse.md        | 35 ++++++++++++++++------------
 website/docs/api/tagger.md           | 19 ++++++++-------
 website/docs/api/textcategorizer.md  | 23 +++++++++---------
 5 files changed, 62 insertions(+), 55 deletions(-)

diff --git a/website/docs/api/dependencyparser.md b/website/docs/api/dependencyparser.md
index b08e6139a..ca3725647 100644
--- a/website/docs/api/dependencyparser.md
+++ b/website/docs/api/dependencyparser.md
@@ -47,8 +47,8 @@ shortcut for this and instantiate the component using its string name and
 ## DependencyParser.\_\_call\_\_ {#call tag="method"}

 Apply the pipe to one document. The document is modified in place, and returned.
-This usually happens under the hood when you call the `nlp` object on a text and
-all pipeline components are applied to the `Doc` in order. Both
+This usually happens under the hood when the `nlp` object is called on a text
+and all pipeline components are applied to the `Doc` in order. Both
 [`__call__`](/api/dependencyparser#call) and
 [`pipe`](/api/dependencyparser#pipe) delegate to the
 [`predict`](/api/dependencyparser#predict) and
@@ -70,8 +70,9 @@ all pipeline components are applied to the `Doc` in order. Both

 ## DependencyParser.pipe {#pipe tag="method"}

-Apply the pipe to a stream of documents. Both
-[`__call__`](/api/dependencyparser#call) and
+Apply the pipe to a stream of documents. This usually happens under the hood
+when the `nlp` object is called on a text and all pipeline components are
+applied to the `Doc` in order.
Both [`__call__`](/api/dependencyparser#call) and
 [`pipe`](/api/dependencyparser#pipe) delegate to the
 [`predict`](/api/dependencyparser#predict) and
 [`set_annotations`](/api/dependencyparser#set_annotations) methods.
@@ -79,9 +80,8 @@ Apply the pipe to a stream of documents. Both
 > #### Example
 >
 > ```python
-> texts = [u"One doc", u"...", u"Lots of docs"]
 > parser = DependencyParser(nlp.vocab)
-> for doc in parser.pipe(texts, batch_size=50):
+> for doc in parser.pipe(docs, batch_size=50):
 >     pass
 > ```

@@ -102,10 +102,10 @@ Apply the pipeline's model to a batch of docs, without modifying them.
 > scores = parser.predict([doc1, doc2])
 > ```

-| Name | Type | Description |
-| ----------- | -------- | ------------------------- |
-| `docs` | iterable | The documents to predict. |
-| **RETURNS** | - | Scores from the model. |
+| Name | Type | Description |
+| ----------- | -------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `docs` | iterable | The documents to predict. |
+| **RETURNS** | tuple | A `(scores, tensors)` tuple where `scores` is the model's prediction for each document and `tensors` is the token representations used to predict the scores. Each tensor is an array with one row for each token in the document. |

 ## DependencyParser.set_annotations {#set_annotations tag="method"}

diff --git a/website/docs/api/entityrecognizer.md b/website/docs/api/entityrecognizer.md
index 43de2c15c..8f71005bc 100644
--- a/website/docs/api/entityrecognizer.md
+++ b/website/docs/api/entityrecognizer.md
@@ -47,8 +47,8 @@ shortcut for this and instantiate the component using its string name and
 ## EntityRecognizer.\_\_call\_\_ {#call tag="method"}

 Apply the pipe to one document. The document is modified in place, and returned.
-This usually happens under the hood when you call the `nlp` object on a text and
-all pipeline components are applied to the `Doc` in order. Both
+This usually happens under the hood when the `nlp` object is called on a text
+and all pipeline components are applied to the `Doc` in order. Both
 [`__call__`](/api/entityrecognizer#call) and
 [`pipe`](/api/entityrecognizer#pipe) delegate to the
 [`predict`](/api/entityrecognizer#predict) and
@@ -70,8 +70,9 @@ all pipeline components are applied to the `Doc` in order. Both

 ## EntityRecognizer.pipe {#pipe tag="method"}

-Apply the pipe to a stream of documents. Both
-[`__call__`](/api/entityrecognizer#call) and
+Apply the pipe to a stream of documents. This usually happens under the hood
+when the `nlp` object is called on a text and all pipeline components are
+applied to the `Doc` in order. Both [`__call__`](/api/entityrecognizer#call) and
 [`pipe`](/api/entityrecognizer#pipe) delegate to the
 [`predict`](/api/entityrecognizer#predict) and
 [`set_annotations`](/api/entityrecognizer#set_annotations) methods.
@@ -79,9 +80,8 @@ Apply the pipe to a stream of documents. Both
 > #### Example
 >
 > ```python
-> texts = [u"One doc", u"...", u"Lots of docs"]
 > ner = EntityRecognizer(nlp.vocab)
-> for doc in ner.pipe(texts, batch_size=50):
+> for doc in ner.pipe(docs, batch_size=50):
 >     pass
 > ```

@@ -102,10 +102,10 @@ Apply the pipeline's model to a batch of docs, without modifying them.
 > scores = ner.predict([doc1, doc2])
 > ```

-| Name | Type | Description |
-| ----------- | -------- | ------------------------- |
-| `docs` | iterable | The documents to predict. |
-| **RETURNS** | - | Scores from the model.
|
+| Name | Type | Description |
+| ----------- | -------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `docs` | iterable | The documents to predict. |
+| **RETURNS** | tuple | A `(scores, tensors)` tuple where `scores` is the model's prediction for each document and `tensors` is the token representations used to predict the scores. Each tensor is an array with one row for each token in the document. |

 ## EntityRecognizer.set_annotations {#set_annotations tag="method"}

diff --git a/website/docs/api/goldparse.md b/website/docs/api/goldparse.md
index f46aa4216..1f71c5d58 100644
--- a/website/docs/api/goldparse.md
+++ b/website/docs/api/goldparse.md
@@ -7,17 +7,23 @@ source: spacy/gold.pyx

 ## GoldParse.\_\_init\_\_ {#init tag="method"}

-Create a `GoldParse`.
+Create a `GoldParse`. Unlike annotations in `entities`, label annotations in
+`cats` can overlap, i.e. a single word can be covered by multiple labelled
+spans. The [`TextCategorizer`](/api/textcategorizer) component expects true
+examples of a label to have the value `1.0`, and negative examples of a label to
+have the value `0.0`. Labels not in the dictionary are treated as missing – the
+gradient for those labels will be zero.

-| Name | Type | Description |
-| ----------- | ----------- | ----------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `doc` | `Doc` | The document the annotations refer to. |
-| `words` | iterable | A sequence of unicode word strings. |
-| `tags` | iterable | A sequence of strings, representing tag annotations. |
-| `heads` | iterable | A sequence of integers, representing syntactic head offsets. |
-| `deps` | iterable | A sequence of strings, representing the syntactic relation types.
|
-| `entities` | iterable | A sequence of named entity annotations, either as BILUO tag strings, or as `(start_char, end_char, label)` tuples, representing the entity positions. |
-| **RETURNS** | `GoldParse` | The newly constructed object. |
+| Name | Type | Description |
+| ----------- | ----------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `doc` | `Doc` | The document the annotations refer to. |
+| `words` | iterable | A sequence of unicode word strings. |
+| `tags` | iterable | A sequence of strings, representing tag annotations. |
+| `heads` | iterable | A sequence of integers, representing syntactic head offsets. |
+| `deps` | iterable | A sequence of strings, representing the syntactic relation types. |
+| `entities` | iterable | A sequence of named entity annotations, either as BILUO tag strings, or as `(start_char, end_char, label)` tuples, representing the entity positions. |
+| `cats` | dict | Labels for text classification. Each key in the dictionary may be a string or an int, or a `(start_char, end_char, label)` tuple, indicating that the label is applied to only part of the document (usually a sentence). |
+| **RETURNS** | `GoldParse` | The newly constructed object. |

 ## GoldParse.\_\_len\_\_ {#len tag="method"}

@@ -52,11 +58,10 @@ Whether the provided syntactic annotations form a projective dependency tree.
 ### gold.biluo_tags_from_offsets {#biluo_tags_from_offsets tag="function"}

 Encode labelled spans into per-token tags, using the
-[BILUO scheme](/api/annotation#biluo) (Begin/In/Last/Unit/Out).
-
-Returns a list of unicode strings, describing the tags. Each tag string will be
-of the form of either `""`, `"O"` or `"{action}-{label}"`, where action is one
-of `"B"`, `"I"`, `"L"`, `"U"`.
The string `"-"` is used where the entity offsets
+[BILUO scheme](/api/annotation#biluo) (Begin, In, Last, Unit, Out). Returns a
+list of unicode strings, describing the tags. Each tag string will be of the
+form of either `""`, `"O"` or `"{action}-{label}"`, where action is one of
+`"B"`, `"I"`, `"L"`, `"U"`. The string `"-"` is used where the entity offsets
 don't align with the tokenization in the `Doc` object. The training algorithm
 will view these as missing values. `O` denotes a non-entity token. `B` denotes
 the beginning of a multi-token entity, `I` the inside of an entity of three or
diff --git a/website/docs/api/tagger.md b/website/docs/api/tagger.md
index fccb7cfd0..7b9581f9a 100644
--- a/website/docs/api/tagger.md
+++ b/website/docs/api/tagger.md
@@ -47,8 +47,8 @@ shortcut for this and instantiate the component using its string name and
 ## Tagger.\_\_call\_\_ {#call tag="method"}

 Apply the pipe to one document. The document is modified in place, and returned.
-This usually happens under the hood when you call the `nlp` object on a text and
-all pipeline components are applied to the `Doc` in order. Both
+This usually happens under the hood when the `nlp` object is called on a text
+and all pipeline components are applied to the `Doc` in order. Both
 [`__call__`](/api/tagger#call) and [`pipe`](/api/tagger#pipe) delegate to the
 [`predict`](/api/tagger#predict) and
 [`set_annotations`](/api/tagger#set_annotations) methods.
@@ -69,16 +69,17 @@ all pipeline components are applied to the `Doc` in order. Both

 ## Tagger.pipe {#pipe tag="method"}

-Apply the pipe to a stream of documents. Both [`__call__`](/api/tagger#call) and
+Apply the pipe to a stream of documents. This usually happens under the hood
+when the `nlp` object is called on a text and all pipeline components are
+applied to the `Doc` in order.
Both [`__call__`](/api/tagger#call) and
 [`pipe`](/api/tagger#pipe) delegate to the [`predict`](/api/tagger#predict) and
 [`set_annotations`](/api/tagger#set_annotations) methods.

 > #### Example
 >
 > ```python
-> texts = [u"One doc", u"...", u"Lots of docs"]
 > tagger = Tagger(nlp.vocab)
-> for doc in tagger.pipe(texts, batch_size=50):
+> for doc in tagger.pipe(docs, batch_size=50):
 >     pass
 > ```

@@ -99,10 +100,10 @@ Apply the pipeline's model to a batch of docs, without modifying them.
 > scores = tagger.predict([doc1, doc2])
 > ```

-| Name | Type | Description |
-| ----------- | -------- | ------------------------- |
-| `docs` | iterable | The documents to predict. |
-| **RETURNS** | - | Scores from the model. |
+| Name | Type | Description |
+| ----------- | -------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `docs` | iterable | The documents to predict. |
+| **RETURNS** | tuple | A `(scores, tensors)` tuple where `scores` is the model's prediction for each document and `tensors` is the token representations used to predict the scores. Each tensor is an array with one row for each token in the document. |

 ## Tagger.set_annotations {#set_annotations tag="method"}

diff --git a/website/docs/api/textcategorizer.md b/website/docs/api/textcategorizer.md
index f26a89098..faeb45bc6 100644
--- a/website/docs/api/textcategorizer.md
+++ b/website/docs/api/textcategorizer.md
@@ -64,8 +64,8 @@ argument.
 ## TextCategorizer.\_\_call\_\_ {#call tag="method"}

 Apply the pipe to one document. The document is modified in place, and returned.
-This usually happens under the hood when you call the `nlp` object on a text and
-all pipeline components are applied to the `Doc` in order.
Both
+This usually happens under the hood when the `nlp` object is called on a text
+and all pipeline components are applied to the `Doc` in order. Both
 [`__call__`](/api/textcategorizer#call) and [`pipe`](/api/textcategorizer#pipe)
 delegate to the [`predict`](/api/textcategorizer#predict) and
 [`set_annotations`](/api/textcategorizer#set_annotations) methods.
@@ -86,17 +86,18 @@ delegate to the [`predict`](/api/textcategorizer#predict) and

 ## TextCategorizer.pipe {#pipe tag="method"}

-Apply the pipe to a stream of documents. Both
-[`__call__`](/api/textcategorizer#call) and [`pipe`](/api/textcategorizer#pipe)
-delegate to the [`predict`](/api/textcategorizer#predict) and
+Apply the pipe to a stream of documents. This usually happens under the hood
+when the `nlp` object is called on a text and all pipeline components are
+applied to the `Doc` in order. Both [`__call__`](/api/textcategorizer#call) and
+[`pipe`](/api/textcategorizer#pipe) delegate to the
+[`predict`](/api/textcategorizer#predict) and
 [`set_annotations`](/api/textcategorizer#set_annotations) methods.

 > #### Example
 >
 > ```python
-> texts = [u"One doc", u"...", u"Lots of docs"]
 > textcat = TextCategorizer(nlp.vocab)
-> for doc in textcat.pipe(texts, batch_size=50):
+> for doc in textcat.pipe(docs, batch_size=50):
 >     pass
 > ```

@@ -117,10 +118,10 @@ Apply the pipeline's model to a batch of docs, without modifying them.
 > scores = textcat.predict([doc1, doc2])
 > ```

-| Name | Type | Description |
-| ----------- | -------- | ------------------------- |
-| `docs` | iterable | The documents to predict. |
-| **RETURNS** | - | Scores from the model. |
+| Name | Type | Description |
+| ----------- | -------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `docs` | iterable | The documents to predict.
|
+| **RETURNS** | tuple | A `(scores, tensors)` tuple where `scores` is the model's prediction for each document and `tensors` is the token representations used to predict the scores. Each tensor is an array with one row for each token in the document. |

 ## TextCategorizer.set_annotations {#set_annotations tag="method"}
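
Note on the `gold.biluo_tags_from_offsets` wording touched by this patch: the tag format it describes (`"O"`, `"{action}-{label}"` with action `B`/`I`/`L`/`U`, and `"-"` for misaligned offsets) can be illustrated with a small standalone sketch. This is not spaCy's implementation and does not import spaCy. The function name `biluo_tags_sketch`, the plain `(start_char, end_char)` token tuples, and the choice to mark every token overlapping a misaligned span as `"-"` are all assumptions made for illustration.

```python
def biluo_tags_sketch(token_spans, entities):
    """Encode entity spans as per-token BILUO tags (illustrative only).

    token_spans: list of (start_char, end_char) tuples, one per token.
    entities:    list of (start_char, end_char, label) tuples.
    """
    tags = ["O"] * len(token_spans)
    # Map character offsets to the token that starts / ends there.
    starts = {start: i for i, (start, _) in enumerate(token_spans)}
    ends = {end: i for i, (_, end) in enumerate(token_spans)}

    for ent_start, ent_end, label in entities:
        first = starts.get(ent_start)
        last = ends.get(ent_end)
        if first is None or last is None:
            # Assumption: offsets that don't line up with token boundaries
            # mark every overlapping token as "-" (a missing value).
            for i, (tok_start, tok_end) in enumerate(token_spans):
                if tok_start < ent_end and ent_start < tok_end:
                    tags[i] = "-"
            continue
        if first == last:
            tags[first] = "U-" + label          # single-token entity
        else:
            tags[first] = "B-" + label          # beginning
            for i in range(first + 1, last):
                tags[i] = "I-" + label          # inside
            tags[last] = "L-" + label           # last


    return tags


# Token offsets for the text "Bob visited New York City".
tokens = [(0, 3), (4, 11), (12, 15), (16, 20), (21, 25)]

print(biluo_tags_sketch(tokens, [(0, 3, "PERSON"), (12, 25, "GPE")]))
# ['U-PERSON', 'O', 'B-GPE', 'I-GPE', 'L-GPE']

print(biluo_tags_sketch(tokens, [(12, 18, "GPE")]))  # ends mid-token
# ['O', 'O', '-', '-', 'O']
```

The second call shows the misaligned case: the span ends inside "York", so no `B`/`L` pair can be assigned and the affected tokens are marked as missing rather than as `O`.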