diff --git a/website/usage/_training/_ner.jade b/website/usage/_training/_ner.jade
index ff3101c8f..ed58c4c6f 100644
--- a/website/usage/_training/_ner.jade
+++ b/website/usage/_training/_ner.jade
@@ -24,28 +24,60 @@ p
     | #[strong experiment on your own data] to find a solution that works best
     | for you.
 
-+h(3, "example-new-entity-type") Example: Training an additional entity type
++h(3, "example-new-entity-type") Training an additional entity type
 
 p
-    | This script shows how to add a new entity type to an existing pre-trained
-    | NER model. To keep the example short and simple, only a few sentences are
+    | This script shows how to add a new entity type #[code ANIMAL] to an
+    | existing pre-trained NER model, or an empty #[code Language] class. To
+    | keep the example short and simple, only a few sentences are
     | provided as examples. In practice, you'll need many more — a few hundred
     | would be a good start. You will also likely need to mix in examples of
     | other entity types, which might be obtained by running the entity
     | recognizer over unlabelled sentences, and adding their annotations to the
     | training set.
 
-p
-    | The actual training is performed by looping over the examples, and
-    | calling #[+api("language#update") #[code nlp.update()]]. The
-    | #[code update] method steps through the words of the input. At each word,
-    | it makes a prediction. It then consults the annotations provided on the
-    | #[+api("goldparse") #[code GoldParse]] instance, to see whether it was
-    | right. If it was wrong, it adjusts its weights so that the correct
-    | action will score higher next time.
-
 +github("spacy", "examples/training/train_new_entity_type.py")
 
+p Training a new entity type requires the following steps:
+
++list("numbers")
+    +item
+        | Create #[+api("doc") #[code Doc]] and
+        | #[+api("goldparse") #[code GoldParse]] objects for
+        | #[strong each example in your training data].
+
+    +item
+        | #[strong Load the model] you want to start with, or create an
+        | #[strong empty model] using
+        | #[+api("spacy#blank") #[code spacy.blank()]] with the ID of your
+        | language. If you're using an existing model, make sure to disable
+        | all other pipeline components during training using
+        | #[+api("language#disable_pipes") #[code nlp.disable_pipes]]. This way,
+        | you'll only be training the entity recognizer.
+
+    +item
+        | #[strong Add the new entity label] to the entity recognizer using the
+        | #[+api("entityrecognizer#add_label") #[code add_label]] method. You
+        | can access the entity recognizer in the pipeline via
+        | #[code nlp.get_pipe('ner')].
+
+    +item
+        | #[strong Loop over] the examples and call
+        | #[+api("language#update") #[code nlp.update]], which steps through
+        | the words of the input. At each word, it makes a
+        | #[strong prediction]. It then consults the annotations provided on the
+        | #[+api("goldparse") #[code GoldParse]] instance, to see whether it was
+        | right. If it was wrong, it adjusts its weights so that the correct
+        | action will score higher next time.
+
+    +item
+        | #[strong Save] the trained model using
+        | #[+api("language#to_disk") #[code nlp.to_disk()]].
+
+    +item
+        | #[strong Test] the model to make sure the new entity is recognized
+        | correctly.
+
 +h(3, "example-ner-from-scratch") Example: Training an NER system from scratch
 
 p