spaCy/website/docs/api/kb.md

title: KnowledgeBase
teaser: A storage class for entities and aliases of a specific knowledge base (ontology)
tag: class
source: spacy/kb.pyx
new: 2.2

The KnowledgeBase object provides a method to generate Candidate objects, which are plausible external identifiers for a given textual mention. Each Candidate holds information about the relevant KB entity, such as its frequency in text and possible aliases. Each entity in the knowledge base also has a pretrained entity vector of a fixed size.

KnowledgeBase.__init__

Create the knowledge base.

Example

import spacy
from spacy.kb import KnowledgeBase
nlp = spacy.blank("en")  # any pipeline works here; only its vocab is used
vocab = nlp.vocab
kb = KnowledgeBase(vocab=vocab, entity_vector_length=64)
Name Description
vocab The shared vocabulary. Vocab
entity_vector_length Length of the fixed-size entity vectors. int

KnowledgeBase.entity_vector_length

The length of the fixed-size entity vectors in the knowledge base.

Name Description
RETURNS Length of the fixed-size entity vectors. int

KnowledgeBase.add_entity

Add an entity to the knowledge base, specifying its corpus frequency and entity vector, which should be of length entity_vector_length.

Example

# vector1 and vector2 are assumed to be entity vectors of length entity_vector_length
kb.add_entity(entity="Q42", freq=32, entity_vector=vector1)
kb.add_entity(entity="Q463035", freq=111, entity_vector=vector2)
Name Description
entity The unique entity identifier. str
freq The frequency of the entity in a typical corpus. float
entity_vector The pretrained vector of the entity. numpy.ndarray

KnowledgeBase.set_entities

Define the full list of entities in the knowledge base, specifying the corpus frequency and entity vector for each entity.

Example

kb.set_entities(entity_list=["Q42", "Q463035"], freq_list=[32, 111], vector_list=[vector1, vector2])
Name Description
entity_list List of unique entity identifiers. Iterable[Union[str, int]]
freq_list List of entity frequencies. Iterable[float]
vector_list List of entity vectors. Iterable[numpy.ndarray]
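
The three arguments are parallel lists, so position i in each one describes the same entity. The following is a minimal sketch of building them from a single list of records. The `records` name and the zero-filled placeholder vectors are assumptions for illustration, and the vectors are plain Python lists rather than the numpy.ndarray named in the table above.

```python
# Hypothetical per-entity records: (entity ID, corpus frequency, entity vector).
# The zero-filled vectors are placeholders, not real pretrained vectors.
ENTITY_VECTOR_LENGTH = 64
records = [
    ("Q42", 32, [0.0] * ENTITY_VECTOR_LENGTH),
    ("Q463035", 111, [0.0] * ENTITY_VECTOR_LENGTH),
]

# Unzip the records into three parallel lists, one per argument.
entity_list, freq_list, vector_list = (list(column) for column in zip(*records))

# All three lists must line up index by index.
assert len(entity_list) == len(freq_list) == len(vector_list)
```

Keeping the data as one record per entity until the last moment makes it harder for the three lists to drift out of alignment.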

KnowledgeBase.add_alias

Add an alias or mention to the knowledge base, specifying its potential KB identifiers and their prior probabilities. The entity identifiers should refer to entities previously added with add_entity or set_entities. The sum of the prior probabilities should not exceed 1.

Example

kb.add_alias(alias="Douglas", entities=["Q42", "Q463035"], probabilities=[0.6, 0.3])
Name Description
alias The textual mention or alias. str
entities The potential entities that the alias may refer to. Iterable[Union[str, int]]
probabilities The prior probabilities of each entity. Iterable[float]
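
The constraints described above can be checked before calling add_alias. This is a sketch only: `check_alias_args` is a hypothetical helper, not part of spaCy, and the requirement that both lists have the same length is an assumption inferred from "one prior probability per entity".

```python
from typing import Sequence, Set


def check_alias_args(
    entities: Sequence[str],
    probabilities: Sequence[float],
    known_entities: Set[str],
) -> None:
    """Hypothetical pre-check for the arguments of add_alias."""
    # Assumption: one prior probability per candidate entity.
    if len(entities) != len(probabilities):
        raise ValueError("entities and probabilities must have the same length")
    # Every entity must already be in the knowledge base.
    unknown = [e for e in entities if e not in known_entities]
    if unknown:
        raise ValueError(f"entities not in the knowledge base: {unknown}")
    # The priors may sum to less than 1, but not more (small float tolerance).
    total = sum(probabilities)
    if total > 1.0 + 1e-9:
        raise ValueError(f"prior probabilities sum to {total}, which exceeds 1")


# Passes silently: both entities are known and 0.6 + 0.3 does not exceed 1.
check_alias_args(["Q42", "Q463035"], [0.6, 0.3], {"Q42", "Q463035"})
```

In practice `known_entities` could be built from the result of get_entity_strings.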

KnowledgeBase.__len__

Get the total number of entities in the knowledge base.

Example

total_entities = len(kb)
Name Description
RETURNS The number of entities in the knowledge base. int

KnowledgeBase.get_entity_strings

Get a list of all entity IDs in the knowledge base.

Example

all_entities = kb.get_entity_strings()
Name Description
RETURNS The list of entities in the knowledge base. List[str]

KnowledgeBase.get_size_aliases

Get the total number of aliases in the knowledge base.

Example

total_aliases = kb.get_size_aliases()
Name Description
RETURNS The number of aliases in the knowledge base. int

KnowledgeBase.get_alias_strings

Get a list of all aliases in the knowledge base.

Example

all_aliases = kb.get_alias_strings()
Name Description
RETURNS The list of aliases in the knowledge base. List[str]

KnowledgeBase.get_candidates

Given a certain textual mention as input, retrieve a list of candidate entities of type Candidate.

Example

candidates = kb.get_candidates("Douglas")
Name Description
alias The textual mention or alias. str
RETURNS The list of relevant Candidate objects. Iterable[Candidate]
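
Conceptually, candidate retrieval is a lookup from an alias to the entities registered for it. The toy sketch below illustrates that idea with a plain dictionary. It is not spaCy's internal storage; `alias_table` and `toy_get_candidates` are hypothetical names, and returning an empty list for an unknown alias is an assumption of the sketch.

```python
from typing import Dict, List, Tuple

# Toy alias table: alias -> [(entity ID, prior probability)].
alias_table: Dict[str, List[Tuple[str, float]]] = {
    "Douglas": [("Q42", 0.6), ("Q463035", 0.3)],
}


def toy_get_candidates(alias: str) -> List[Tuple[str, float]]:
    """Return the (entity ID, prior) pairs recorded for an alias, or an empty list."""
    return list(alias_table.get(alias, []))


print(toy_get_candidates("Douglas"))  # [('Q42', 0.6), ('Q463035', 0.3)]
print(toy_get_candidates("Unknown"))  # []
```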

KnowledgeBase.get_vector

Given a certain entity ID, retrieve its pretrained entity vector.

Example

vector = kb.get_vector("Q42")
Name Description
entity The entity ID. str
RETURNS The entity vector. numpy.ndarray

KnowledgeBase.get_prior_prob

Given an entity ID and a textual mention, retrieve the prior probability that the mention links to that entity.

Example

probability = kb.get_prior_prob("Q42", "Douglas")
Name Description
entity The entity ID. str
alias The textual mention or alias. str
RETURNS The prior probability of the alias referring to the entity. float

KnowledgeBase.dump

Save the current state of the knowledge base to a directory.

Example

kb.dump(loc)
Name Description
loc A path to a directory, which will be created if it doesn't exist. Paths may be either strings or Path-like objects. Union[str, Path]

KnowledgeBase.load_bulk

Restore the state of the knowledge base from a given directory. Note that the Vocab should also be the same as the one used to create the KB.

Example

from spacy.kb import KnowledgeBase
from spacy.vocab import Vocab
vocab = Vocab().from_disk("/path/to/vocab")
kb = KnowledgeBase(vocab=vocab, entity_vector_length=64)
kb.load_bulk("/path/to/kb")
Name Description
loc A path to a directory. Paths may be either strings or Path-like objects. Union[str, Path]
RETURNS The modified KnowledgeBase object. KnowledgeBase

Candidate

A Candidate object refers to a textual mention (alias) that may or may not be resolved to a specific entity from a KnowledgeBase. Candidates are the input to the entity linking algorithm, which disambiguates among them to select the correct entity. Each candidate (alias, entity) pair is assigned a prior probability.
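
To make the role of the prior probability concrete, here is a minimal baseline that picks the candidate with the highest prior. It is a sketch under stated assumptions, not the entity linking algorithm itself, which would also take the surrounding context into account. `ToyCandidate` and `pick_by_prior` are hypothetical names; the fields mirror the entity_ and prior_prob attributes listed further down.

```python
from typing import List, NamedTuple, Optional


class ToyCandidate(NamedTuple):
    """Hypothetical stand-in carrying two of the Candidate attributes."""

    entity_: str
    prior_prob: float


def pick_by_prior(candidates: List[ToyCandidate]) -> Optional[ToyCandidate]:
    """Return the candidate with the highest prior probability, or None if empty."""
    if not candidates:
        return None
    return max(candidates, key=lambda c: c.prior_prob)


candidates = [ToyCandidate("Q42", 0.6), ToyCandidate("Q463035", 0.3)]
best = pick_by_prior(candidates)
print(best.entity_)  # Q42
```

A prior-only choice ignores context entirely, so it is useful mainly as a reference point when evaluating a trained linker.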

Candidate.__init__

Construct a Candidate object. Usually this constructor is not called directly, but instead these objects are returned by the get_candidates method of a KnowledgeBase.

Example

from spacy.kb import Candidate
candidate = Candidate(kb, entity_hash, entity_freq, entity_vector, alias_hash, prior_prob)
Name Description
kb The knowledge base that defined this candidate. KnowledgeBase
entity_hash The hash of the entity's KB ID. int
entity_freq The entity frequency as recorded in the KB. float
entity_vector The pretrained vector of the entity. numpy.ndarray
alias_hash The hash of the textual mention or alias. int
prior_prob The prior probability of the alias referring to the entity. float

Candidate attributes

Name Description
entity The hash of the entity's unique KB identifier. int
entity_ The entity's unique KB identifier. str
alias The hash of the alias or textual mention. int
alias_ The alias or textual mention. str
prior_prob The prior probability of the alias referring to the entity. float
entity_freq The frequency of the entity in a typical corpus. float
entity_vector The pretrained vector of the entity. numpy.ndarray