Update training docs [ci skip]
parent b544dcb3c5, commit a31e9e1cd5
@@ -6,6 +6,7 @@ menu:
  - ['NER', 'ner']
  - ['Tagger & Parser', 'tagger-parser']
  - ['Text Classification', 'textcat']
  - ['Entity Linking', 'entity-linker']
  - ['Tips and Advice', 'tips']
---
@@ -415,76 +416,6 @@ referred to as the "catastrophic forgetting" problem.
4. **Save** the trained model using [`nlp.to_disk`](/api/language#to_disk).
5. **Test** the model to make sure the new entity is recognized correctly.
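A minimal sketch of these last two steps. The blank pipeline standing in for
your trained model and the `ner_model` directory are illustrative, not part of
the example script:

```python
import spacy

# stand-in for the pipeline updated in the previous steps
nlp = spacy.blank("en")

nlp.to_disk("ner_model")        # step 4: save to a hypothetical directory

nlp2 = spacy.load("ner_model")  # step 5: reload and test
doc = nlp2("Ada Lovelace was born in London.")
print([(ent.text, ent.label_) for ent in doc.ents])
```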

## Training the tagger and parser {#tagger-parser}

### Updating the Dependency Parser {#example-train-parser}
@@ -665,6 +596,76 @@ https://github.com/explosion/spaCy/tree/master/examples/training/train_textcat.py
7. **Save** the trained model using [`nlp.to_disk`](/api/language#to_disk).
8. **Test** the model to make sure the text classifier works as expected.
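A similarly hedged check for step 8, assuming the model was saved to a
hypothetical `textcat_model` directory; the predicted scores appear on
`doc.cats`:

```python
import spacy

nlp = spacy.load("textcat_model")  # hypothetical directory from step 7
doc = nlp("This movie sucked")
# one score per label you trained, e.g. {"POSITIVE": 0.02, "NEGATIVE": 0.98}
print(doc.cats)
```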

## Entity linking {#entity-linker}

To train an entity linking model, you first need to define a knowledge base
(KB).

### Creating a knowledge base {#kb}

A KB consists of a list of entities with unique identifiers. Each entity has
an entity vector that is used to measure similarity with the context in which
the entity is mentioned. These vectors are pretrained and stored in the KB
before the entity linking model is trained.
The following example shows how to build a knowledge base from scratch, given
a list of entities and potential aliases. The script also demonstrates how to
pretrain and store the entity vectors. To run this example, the script needs
access to a `vocab` instance or an `nlp` model with pretrained word
embeddings.

```python
https://github.com/explosion/spaCy/tree/master/examples/training/pretrain_kb.py
```

#### Step by step guide {#step-by-step-kb}

1. **Load the model** you want to start with, or create an **empty model**
   using [`spacy.blank`](/api/top-level#spacy.blank) with the ID of your
   language and a pre-defined [`vocab`](/api/vocab) object.
2. **Pretrain the entity embeddings** by running the descriptions of the
   entities through a simple encoder-decoder network. The current
   implementation requires the `nlp` model to have access to pretrained word
   embeddings, but a custom implementation of this encoding step can also be
   used.
3. **Construct the KB** by defining all entities with their pretrained
   vectors, and all aliases with their prior probabilities (see the sketch
   after this list).
4. **Save** the KB using [`kb.dump`](/api/kb#dump).
5. **Test** the KB to make sure the entities were added correctly.
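A minimal sketch of steps 3–5, assuming the `KnowledgeBase` API from
`spacy.kb`. The entity ID `Q42`, the placeholder vector, the frequency and the
output paths are illustrative values, not taken from the example script:

```python
import spacy
from spacy.kb import KnowledgeBase

nlp = spacy.load("en_core_web_md")  # any model with pretrained word vectors

kb = KnowledgeBase(vocab=nlp.vocab, entity_vector_length=64)

# pretrain_kb.py derives entity vectors from the entity descriptions;
# a zero vector of the right length stands in for one here
kb.add_entity(entity="Q42", freq=42, entity_vector=[0.0] * 64)

# an alias maps a mention string to candidate entities with prior probabilities
kb.add_alias(alias="Douglas Adams", entities=["Q42"], probabilities=[0.9])

# step 4: save the KB, plus the vocab it was created with
kb.dump("my_kb")
nlp.vocab.to_disk("vocab")

# step 5: a quick test of candidate generation for an alias
print([(c.entity_, c.prior_prob) for c in kb.get_candidates("Douglas Adams")])
```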

### Training an entity linking model {#entity-linker-model}

This example shows how to create an entity linker pipe using a previously
created knowledge base. The entity linker pipe is then trained with your own
examples. To do so, you'll need to provide **example texts** and, for each
entity they contain, the **character offsets** and **knowledge base
identifiers**.

```python
https://github.com/explosion/spaCy/tree/master/examples/training/train_entity_linker.py
```
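Concretely, each training example pairs a text with a `links` dict keyed by
character offsets. The text here is made up, with a real Wikidata ID used
purely for illustration:

```python
TRAIN_DATA = [
    # characters 0-12 cover "Ada Lovelace"; "Q7259" is her Wikidata ID
    ("Ada Lovelace was born in London",
     {"links": {(0, 12): {"Q7259": 1.0}}}),
]
```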

#### Step by step guide {#step-by-step-entity-linker}

1. **Load the KB** you want to start with, and specify the path to the
   `Vocab` object that was used to create this KB. Then, create an **empty
   model** using [`spacy.blank`](/api/top-level#spacy.blank) with the ID of
   your language. Don't forget to add the KB to the entity linker, and to add
   the entity linker to the pipeline. In practical applications, you will want
   a more advanced pipeline that also includes a component for
   [named entity recognition](/usage/training#ner). If you're using a model
   with additional components, make sure to disable all other pipeline
   components during training using
   [`nlp.disable_pipes`](/api/language#disable_pipes). This way, you'll only
   be training the entity linker.
2. **Shuffle and loop over** the examples. For each example, **update the
   model** by calling [`nlp.update`](/api/language#update), which steps
   through the annotated examples of the input. For each combination of a
   mention in text and a potential KB identifier, the model makes a
   **prediction** whether or not this is the correct match. It then consults
   the annotations to see whether it was right. If it was wrong, it adjusts
   its weights so that the correct combination will score higher next time
   (see the sketch after this list).
3. **Save** the trained model using [`nlp.to_disk`](/api/language#to_disk).
4. **Test** the model to make sure the entities in the training data are
   recognized correctly.
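A compressed sketch of the whole loop, assuming spaCy v2.x's entity linking
API (`spacy.kb.KnowledgeBase`, `kb.load_bulk`, `entity_linker.set_kb`) and the
hypothetical `vocab` and `my_kb` paths from the KB sketch above. A real
pipeline would also need the NER component described in step 1:

```python
import random
import spacy
from spacy.kb import KnowledgeBase
from spacy.vocab import Vocab

# step 1: load the vocab and KB saved earlier, then build the pipeline
vocab = Vocab().from_disk("vocab")
nlp = spacy.blank("en", vocab=vocab)

entity_linker = nlp.create_pipe("entity_linker")
kb = KnowledgeBase(vocab=nlp.vocab, entity_vector_length=64)
kb.load_bulk("my_kb")
entity_linker.set_kb(kb)
nlp.add_pipe(entity_linker, last=True)

# texts with character offsets and KB identifiers for each entity mention
TRAIN_DATA = [
    ("Douglas Adams wrote a book.", {"links": {(0, 13): {"Q42": 1.0}}}),
]

# step 2: shuffle and loop over the examples, updating the model
optimizer = nlp.begin_training()
for itn in range(20):
    random.shuffle(TRAIN_DATA)
    losses = {}
    for text, annotations in TRAIN_DATA:
        nlp.update([text], [annotations], sgd=optimizer, losses=losses)
    print(itn, losses)

nlp.to_disk("el_model")  # step 3: save (path is illustrative)
```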

## Optimization tips and advice {#tips}

There are lots of conflicting "recipes" for training deep neural networks at the