diff --git a/website/docs/api/architectures.md b/website/docs/api/architectures.md
index a22ee5be8..a9849cc81 100644
--- a/website/docs/api/architectures.md
+++ b/website/docs/api/architectures.md
@@ -158,9 +158,21 @@ architectures into your training config.
 
 ## Entity linking architectures {#entitylinker source="spacy/ml/models/entity_linker.py"}
 
+An Entity Linker component disambiguates textual mentions (tagged as named
+entities) to unique identifiers, grounding the named entities into the "real
+world". This requires three main components:
+
+- A [`KnowledgeBase`](/api/kb) (KB) holding the unique identifiers, potential
+  synonyms and prior probabilities.
+- A candidate generation step to produce a set of likely identifiers, given a
+  certain textual mention.
+- A machine learning [`Model`](https://thinc.ai/docs/api-model) that picks the
+  most plausible ID from the set of candidates.
+
 ### spacy.EntityLinker.v1 {#EntityLinker}
 
-
+The `EntityLinker` model architecture is a `Thinc` `Model` with a Linear output
+layer.
 
 > #### Example Config
 >
@@ -170,10 +182,46 @@ architectures into your training config.
 > nO = null
 >
 > [model.tok2vec]
-> # ...
+> @architectures = "spacy.HashEmbedCNN.v1"
+> pretrained_vectors = null
+> width = 96
+> depth = 2
+> embed_size = 300
+> window_size = 1
+> maxout_pieces = 3
+> subword_features = true
+> dropout = null
+>
+> [kb_loader]
+> @assets = "spacy.EmptyKB.v1"
+> entity_vector_length = 64
+>
+> [get_candidates]
+> @assets = "spacy.CandidateGenerator.v1"
 > ```
 
-| Name      | Type                                       | Description |
-| --------- | ------------------------------------------ | ----------- |
-| `tok2vec` | [`Model`](https://thinc.ai/docs/api-model) |             |
-| `nO`      | int                                        |             |
+| Name      | Type                                       | Description                                                                              |
+| --------- | ------------------------------------------ | ---------------------------------------------------------------------------------------- |
+| `tok2vec` | [`Model`](https://thinc.ai/docs/api-model) | The [`tok2vec`](#tok2vec) layer of the model.                                            |
+| `nO`      | int                                        | Output dimension, determined by the length of the vectors encoding each entity in the KB |
+
+If the `nO` dimension is not set, the Entity Linking component will set it when
+`begin_training` is called.
+
+### spacy.EmptyKB.v1 {#EmptyKB}
+
+A function that creates a default, empty knowledge base from a [`Vocab`](/api/vocab) instance.
+
+| Name                   | Type | Description                                                                |
+| ---------------------- | ---- | -------------------------------------------------------------------------- |
+| `entity_vector_length` | int  | The length of the vectors encoding each entity in the KB (64 by default). |
+
+### spacy.CandidateGenerator.v1 {#CandidateGenerator}
+
+A function that takes as input a [`KnowledgeBase`](/api/kb) and a [`Span`](/api/span) object denoting a
+named entity, and returns a list of plausible
+[`Candidate` objects](/api/kb/#candidate_init).
+
+The default `CandidateGenerator` simply uses the text of a mention to find its
+potential aliases in the knowledge base. Note that this function is
+case-sensitive.
diff --git a/website/docs/api/entitylinker.md b/website/docs/api/entitylinker.md
index 652574d15..50ffe5c09 100644
--- a/website/docs/api/entitylinker.md
+++ b/website/docs/api/entitylinker.md
@@ -9,6 +9,12 @@ api_string_name: entity_linker
 api_trainable: true
 ---
 
+An Entity Linker component disambiguates textual mentions (tagged as named
+entities) to unique identifiers, grounding the named entities into the "real
+world". It requires a knowledge base, a function to generate plausible
+candidates from that knowledge base given a certain textual mention, and an ML
+model to pick the right candidate, given the local context of the mention.
+
 ## Config and implementation {#config}
 
 The default config is defined by the pipeline component factory and describes