From 46ec5cdccc7fe61ad3c319ddf9ef5b587d97f44c Mon Sep 17 00:00:00 2001
From: Ines Montani
Date: Sun, 24 Feb 2019 13:11:57 +0100
Subject: [PATCH] Update TextCategorizer docs

---
 website/docs/api/textcategorizer.md | 28 ++++++++++++++++++++++------
 1 file changed, 22 insertions(+), 6 deletions(-)

diff --git a/website/docs/api/textcategorizer.md b/website/docs/api/textcategorizer.md
index cdb826c44..f26a89098 100644
--- a/website/docs/api/textcategorizer.md
+++ b/website/docs/api/textcategorizer.md
@@ -31,6 +31,7 @@ shortcut for this and instantiate the component using its string name and
 > ```python
 > # Construction via create_pipe
 > textcat = nlp.create_pipe("textcat")
+> textcat = nlp.create_pipe("textcat", config={"exclusive_classes": True})
 >
 > # Construction from class
 > from spacy.pipeline import TextCategorizer
@@ -38,12 +39,27 @@ shortcut for this and instantiate the component using its string name and
 > textcat.from_disk("/path/to/model")
 > ```
 
-| Name | Type | Description |
-| ----------- | ------------------------------ | ----------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `vocab` | `Vocab` | The shared vocabulary. |
-| `model` | `thinc.neural.Model` or `True` | The model powering the pipeline component. If no model is supplied, the model is created when you call `begin_training`, `from_disk` or `from_bytes`. |
-| `**cfg` | - | Configuration parameters. |
-| **RETURNS** | `TextCategorizer` | The newly constructed object. |
+| Name | Type | Description |
+| ------------------- | ----------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `vocab` | `Vocab` | The shared vocabulary. |
+| `model` | `thinc.neural.Model` / `True` | The model powering the pipeline component. If no model is supplied, the model is created when you call `begin_training`, `from_disk` or `from_bytes`. |
+| `exclusive_classes` | bool | Make categories mutually exclusive. Defaults to `False`. |
+| `architecture` | unicode | Model architecture to use, see [architectures](#architectures) for details. Defaults to `"ensemble"`. |
+| **RETURNS** | `TextCategorizer` | The newly constructed object. |
+
+### Architectures {#architectures new="2.1"}
+
+Text classification models can be used to solve a wide variety of problems.
+Differences in text length, number of labels, difficulty, and runtime
+performance constraints mean that no single algorithm performs well on all types
+of problems. To handle a wider variety of problems, the `TextCategorizer` object
+allows configuration of its model architecture, using the `architecture` keyword
+argument.
+
+| Name | Description |
+| -------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `"ensemble"` | **Default:** Stacked ensemble of a unigram bag-of-words model and a neural network model. The neural network uses a CNN with mean pooling and attention. |
+| `"simple_cnn"` | A neural network model where token vectors are calculated using a CNN. The vectors are mean pooled and used as features in a feed-forward network. |
 
 ## TextCategorizer.\_\_call\_\_ {#call tag="method"}
 
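
Note on the `exclusive_classes` option documented in this patch: the sketch below illustrates what "mutually exclusive" categories usually mean for the scores a text classifier returns. It is not spaCy's implementation. It assumes the common convention that exclusive classes are scored with a softmax (scores compete and sum to 1), while non-exclusive classes are scored with an independent sigmoid per label. The function names, labels, and raw scores are all hypothetical.

```python
import math


def exclusive_scores(logits):
    """Softmax over all labels: scores compete and sum to 1."""
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {label: math.exp(value - peak) for label, value in logits.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


def independent_scores(logits):
    """Sigmoid per label: each score lies in (0, 1) on its own."""
    return {label: 1.0 / (1.0 + math.exp(-value)) for label, value in logits.items()}


# Hypothetical raw model outputs for a single text.
logits = {"POSITIVE": 2.0, "NEGATIVE": 1.5, "SPAM": -3.0}

exclusive = exclusive_scores(logits)      # exactly one label should win
independent = independent_scores(logits)  # several labels may apply at once

for label in logits:
    print(label, round(exclusive[label], 3), round(independent[label], 3))
```

Under this assumption, the exclusive scores form a probability distribution over labels, so picking the highest score gives a single category. The independent scores can each exceed 0.5 at the same time, which suits multi-label problems.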