mirror of https://github.com/explosion/spaCy.git

Update training docs [ci skip]

parent b544dcb3c5
commit a31e9e1cd5
@@ -6,6 +6,7 @@ menu:
  - ['NER', 'ner']
  - ['Tagger & Parser', 'tagger-parser']
  - ['Text Classification', 'textcat']
  - ['Entity Linking', 'entity-linker']
  - ['Tips and Advice', 'tips']
---
@@ -415,76 +416,6 @@ referred to as the "catastrophic forgetting" problem.
4. **Save** the trained model using [`nlp.to_disk`](/api/language#to_disk).
5. **Test** the model to make sure the new entity is recognized correctly.
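One quick way to run that last check is to load the saved model back in and inspect the entities it predicts. The directory name and example text below are hypothetical:

```python
import spacy

# Load the model saved in the previous step (hypothetical directory name)
nlp_ner = spacy.load("my_ner_model")

# The new entity type should now show up on text that contains it
doc = nlp_ner("Do you like horses?")
print([(ent.text, ent.label_) for ent in doc.ents])
```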
## Training the tagger and parser {#tagger-parser}

### Updating the Dependency Parser {#example-train-parser}
@@ -665,6 +596,76 @@ https://github.com/explosion/spaCy/tree/master/examples/training/train_textcat.p
7. **Save** the trained model using [`nlp.to_disk`](/api/language#to_disk).
8. **Test** the model to make sure the text classifier works as expected.
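As with the other components, you can sanity-check the result by loading the saved model and inspecting the category scores on a `Doc`. The directory name below is hypothetical, and the labels depend on your training data:

```python
import spacy

# Load the saved text classifier (hypothetical directory name)
nlp_textcat = spacy.load("my_textcat_model")

# Category scores live on the Doc object
doc = nlp_textcat("This movie sucked")
print(doc.cats)
```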
## Entity linking {#entity-linker}
To train an entity linking model, you first need to define a knowledge base (KB).
### Creating a knowledge base {#kb}
A KB consists of a list of entities with unique identifiers. Each such entity
has an entity vector that will be used to measure similarity with the context
in which an entity is mentioned. These vectors are pretrained and stored in
the KB before the entity linking model is trained.
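Concretely, the KB stores each entity ID together with its frequency and entity vector, and each alias together with its candidate entities and their prior probabilities. Here is a minimal sketch of that structure using the [`KnowledgeBase`](/api/kb) API. The IDs, frequencies, vectors and probabilities are made up for illustration:

```python
import spacy
from spacy.kb import KnowledgeBase

# Any model with pretrained word vectors will do; en_core_web_md is one option
nlp = spacy.load("en_core_web_md")

# Toy vector length: real entity vectors are pretrained from descriptions
kb = KnowledgeBase(vocab=nlp.vocab, entity_vector_length=3)

# Each entity: a unique identifier, a frequency, and its entity vector
kb.add_entity(entity="Q42", freq=32, entity_vector=[1.0, 0.0, 0.5])
kb.add_entity(entity="Q463035", freq=11, entity_vector=[0.0, 1.0, 0.2])

# Each alias: a surface form plus candidate entities with prior probabilities
kb.add_alias(alias="Douglas", entities=["Q42", "Q463035"], probabilities=[0.7, 0.2])
```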
The following example shows how to build a knowledge base from scratch, given a
list of entities and potential aliases. The script further demonstrates how to
pretrain and store the entity vectors. To run this example, the script needs
access to a `vocab` instance or an `nlp` model with pretrained word embeddings.
```python
https://github.com/explosion/spaCy/tree/master/examples/training/pretrain_kb.py
```
#### Step by step guide {#step-by-step-kb}
1. **Load the model** you want to start with, or create an **empty model** using
   [`spacy.blank`](/api/top-level#spacy.blank) with the ID of your language and
   a pre-defined [`vocab`](/api/vocab) object.
2. **Pretrain the entity embeddings** by running the descriptions of the
   entities through a simple encoder-decoder network. The current implementation
   requires the `nlp` model to have access to pretrained word embeddings, but a
   custom implementation of this encoding step can also be used.
3. **Construct the KB** by defining all entities with their pretrained vectors,
   and all aliases with their prior probabilities.
4. **Save** the KB using [`kb.dump`](/api/kb#dump).
5. **Test** the KB to make sure the entities were added correctly, as in the
   sketch after this list.
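Continuing the construction sketch from above, saving, reloading and testing the KB could look like this. The `"my_kb"` directory name is hypothetical:

```python
# Save the KB to disk (hypothetical directory name)
kb.dump("my_kb")

# Reload it into a fresh KB with the same vocab and vector length
kb2 = KnowledgeBase(vocab=nlp.vocab, entity_vector_length=3)
kb2.load_bulk("my_kb")

# Check that an alias resolves to the expected candidates with their priors
for cand in kb2.get_candidates("Douglas"):
    print(cand.entity_, cand.prior_prob)
```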
### Training an entity linking model {#entity-linker-model}
This example shows how to create an entity linker pipe using a previously
created knowledge base. The entity linker pipe is then trained with your own
examples. To do so, you'll need to provide **example texts**, and the
**character offsets** and **knowledge base identifiers** of each entity
contained in the texts.
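One way to structure such examples, following the `links` format used in the example script, is as `(text, annotations)` tuples with each mention keyed by its character offsets. The text, offsets and identifier below are invented for illustration:

```python
# One hypothetical training example: "Douglas Adams" spans characters 0-13,
# and Q42 is annotated as the correct KB identifier for that mention
TRAIN_DATA = [
    (
        "Douglas Adams wrote The Hitchhiker's Guide to the Galaxy.",
        {"links": {(0, 13): {"Q42": 1.0}}},
    ),
]
```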
```python
https://github.com/explosion/spaCy/tree/master/examples/training/train_entity_linker.py
```
#### Step by step guide {#step-by-step-entity-linker}
1. **Load the KB** you want to start with, and specify the path to the `Vocab`
   object that was used to create this KB. Then, create an **empty model** using
   [`spacy.blank`](/api/top-level#spacy.blank) with the ID of your language.
   Don't forget to add the KB to the entity linker, and to add the entity linker
   to the pipeline. In practical applications, you will want a more advanced
   pipeline that also includes a component for
   [named entity recognition](/usage/training#ner). If you're using a model with
   additional components, make sure to disable all other pipeline components
   during training using [`nlp.disable_pipes`](/api/language#disable_pipes).
   This way, you'll only be training the entity linker.
2. **Shuffle and loop over** the examples. For each example, **update the
   model** by calling [`nlp.update`](/api/language#update), which steps through
   the annotated examples of the input. For each combination of a mention in
   text and a potential KB identifier, the model makes a **prediction** of
   whether or not this is the correct match. It then consults the annotations to
   see whether it was right. If it was wrong, it adjusts its weights so that the
   correct combination will score higher next time. A condensed sketch of this
   loop follows the list.
3. **Save** the trained model using [`nlp.to_disk`](/api/language#to_disk).
4. **Test** the model to make sure the entities in the training data are
   recognized correctly.
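Putting the pieces together, and reusing the `nlp`, `kb` and `TRAIN_DATA` objects from the sketches above (so the pipeline and the KB share the same vocab), a condensed version of this training loop could look like the following. The iteration count, batch sizes and output path are illustrative rather than prescriptive:

```python
import random
from spacy.util import minibatch, compounding

# Create the entity linker, attach the KB, and add the pipe to the pipeline
entity_linker = nlp.create_pipe("entity_linker")
entity_linker.set_kb(kb)
nlp.add_pipe(entity_linker)

# Train only the entity linker: disable all other pipeline components
other_pipes = [pipe for pipe in nlp.pipe_names if pipe != "entity_linker"]
with nlp.disable_pipes(*other_pipes):
    optimizer = nlp.begin_training()
    for itn in range(50):
        random.shuffle(TRAIN_DATA)
        losses = {}
        batches = minibatch(TRAIN_DATA, size=compounding(4.0, 32.0, 1.001))
        for batch in batches:
            texts, annotations = zip(*batch)
            nlp.update(texts, annotations, sgd=optimizer, losses=losses)
        print(itn, losses)

# Save the trained model (hypothetical directory name)
nlp.to_disk("my_linker_model")
```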
## Optimization tips and advice {#tips}
There are lots of conflicting "recipes" for training deep neural networks at the