spaCy/website/docs/api/lookups.md

| title | teaser | tag | source | new |
| --- | --- | --- | --- | --- |
| Lookups | A container for large lookup tables and dictionaries | class | spacy/lookups.py | 2.2 |

This class allows convenient access to large lookup tables and dictionaries, e.g. lemmatization data or tokenizer exception lists. Each table includes a Bloom filter to speed up lookups of keys that are not in the table. Lookups are available via the Vocab as vocab.lookups, so they can be accessed before the pipeline components are applied (e.g. in the tokenizer and lemmatizer), as well as within the pipeline components via doc.vocab.lookups.
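
The two access paths described above can be sketched as follows. This is a minimal illustration, assuming a blank English pipeline created with spacy.blank; the table name "demo_table" and its contents are hypothetical and only serve the example.

```python
import spacy

# A blank English pipeline gives us a Vocab with a Lookups container attached.
nlp = spacy.blank("en")

# "demo_table" is a hypothetical table name used only for this illustration.
nlp.vocab.lookups.add_table("demo_table", {"foo": "bar"})

# Code that receives a Doc (e.g. a pipeline component) can reach the same
# container through doc.vocab.lookups.
doc = nlp("This is a sentence.")
table = doc.vocab.lookups.get_table("demo_table")
assert table["foo"] == "bar"
```

Because the Doc shares its Vocab with the pipeline, a table added once on nlp.vocab.lookups should be visible to every component that works on the Doc.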

Lookups.__init__

Create a Lookups object.

Example

from spacy.lookups import Lookups
lookups = Lookups()

| Name | Type | Description |
| --- | --- | --- |
| RETURNS | Lookups | The newly constructed object. |

Lookups.__len__

Get the current number of tables in the lookups.

Example

lookups = Lookups()
assert len(lookups) == 0

| Name | Type | Description |
| --- | --- | --- |
| RETURNS | int | The number of tables in the lookups. |

Lookups.__contains__

Check if the lookups contain a table of a given name. Delegates to Lookups.has_table.

Example

lookups = Lookups()
lookups.add_table("some_table")
assert "some_table" in lookups

| Name | Type | Description |
| --- | --- | --- |
| name | unicode | Name of the table. |
| RETURNS | bool | Whether a table of that name is in the lookups. |

Lookups.tables

Get the names of all tables in the lookups.

Example

lookups = Lookups()
lookups.add_table("some_table")
assert lookups.tables == ["some_table"]

| Name | Type | Description |
| --- | --- | --- |
| RETURNS | list | Names of the tables in the lookups. |

Lookups.add_table

Add a new table with optional data to the lookups. Raises an error if a table of that name already exists.

Example

lookups = Lookups()
lookups.add_table("some_table", {"foo": "bar"})

| Name | Type | Description |
| --- | --- | --- |
| name | unicode | Unique name of the table. |
| data | dict | Optional data to add to the table. |
| RETURNS | Table | The newly added table. |

Lookups.get_table

Get a table from the lookups. Raises an error if the table doesn't exist.

Example

lookups = Lookups()
lookups.add_table("some_table", {"foo": "bar"})
table = lookups.get_table("some_table")
assert table["foo"] == "bar"

| Name | Type | Description |
| --- | --- | --- |
| name | unicode | Name of the table. |
| RETURNS | Table | The table. |

Lookups.remove_table

Remove a table from the lookups. Raises an error if the table doesn't exist.

Example

lookups = Lookups()
lookups.add_table("some_table")
removed_table = lookups.remove_table("some_table")
assert "some_table" not in lookups

| Name | Type | Description |
| --- | --- | --- |
| name | unicode | Name of the table to remove. |
| RETURNS | Table | The removed table. |
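
Lookups.add_table, Lookups.get_table and Lookups.remove_table are each documented as raising an error, but the exception classes are not stated here. The sketch below assumes they are ValueError or KeyError and catches both; the helper raises and the name "missing_table" are hypothetical and exist only for this illustration.

```python
from spacy.lookups import Lookups

lookups = Lookups()
lookups.add_table("some_table", {"foo": "bar"})


def raises(func, *args):
    """Return True if func(*args) raises ValueError or KeyError (assumed classes)."""
    try:
        func(*args)
    except (ValueError, KeyError):
        return True
    return False


# Adding a table under a name that is already taken is rejected.
duplicate_rejected = raises(lookups.add_table, "some_table")
# Getting or removing a table that does not exist is rejected as well.
missing_get_rejected = raises(lookups.get_table, "missing_table")
missing_remove_rejected = raises(lookups.remove_table, "missing_table")

# Checking "name in lookups" first avoids the error path entirely.
if "other_table" not in lookups:
    lookups.add_table("other_table")
```

Guarding with the membership check is usually simpler than catching the exception when a table may or may not have been created already.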

Lookups.has_table

Check if the lookups contain a table of a given name. Equivalent to Lookups.__contains__.

Example

lookups = Lookups()
lookups.add_table("some_table")
assert lookups.has_table("some_table")

| Name | Type | Description |
| --- | --- | --- |
| name | unicode | Name of the table. |
| RETURNS | bool | Whether a table of that name is in the lookups. |

Lookups.to_bytes

Serialize the lookups to a bytestring.

Example

lookup_bytes = lookups.to_bytes()

| Name | Type | Description |
| --- | --- | --- |
| RETURNS | bytes | The serialized lookups. |

Lookups.from_bytes

Load the lookups from a bytestring.

Example

lookup_bytes = lookups.to_bytes()
lookups = Lookups()
lookups.from_bytes(lookup_bytes)

| Name | Type | Description |
| --- | --- | --- |
| bytes_data | bytes | The data to load from. |
| RETURNS | Lookups | The loaded lookups. |

Lookups.to_disk

Save the lookups to a directory as lookups.bin. Expects a path to a directory, which will be created if it doesn't exist.

Example

lookups.to_disk("/path/to/lookups")

| Name | Type | Description |
| --- | --- | --- |
| path | unicode / Path | A path to a directory, which will be created if it doesn't exist. Paths may be either strings or Path-like objects. |

Lookups.from_disk

Load lookups from a directory containing a lookups.bin. Will skip loading if the file doesn't exist.

Example

from spacy.lookups import Lookups
lookups = Lookups()
lookups.from_disk("/path/to/lookups")

| Name | Type | Description |
| --- | --- | --- |
| path | unicode / Path | A path to a directory. Paths may be either strings or Path-like objects. |
| RETURNS | Lookups | The loaded lookups. |
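
A round trip through Lookups.to_disk and Lookups.from_disk can be sketched with a temporary directory. The directory names "lookups" and "missing" are hypothetical; the sketch relies only on the behavior described above (the directory is created, the file is named lookups.bin, and loading is skipped when the file is absent).

```python
import tempfile
from pathlib import Path

from spacy.lookups import Lookups

lookups = Lookups()
lookups.add_table("some_table", {"foo": "bar"})

with tempfile.TemporaryDirectory() as tmp_dir:
    # The target directory does not exist yet; to_disk is expected to create it.
    target = Path(tmp_dir) / "lookups"
    lookups.to_disk(target)
    written = sorted(p.name for p in target.iterdir())

    # Load the saved data into a fresh, empty container.
    restored = Lookups()
    restored.from_disk(target)
    restored_value = restored.get_table("some_table")["foo"]

    # Loading from a directory without a lookups.bin leaves the container empty.
    empty = Lookups()
    empty.from_disk(Path(tmp_dir) / "missing")
```

Since from_disk skips silently when lookups.bin is missing, checking len(lookups) or the table names afterwards is a simple way to confirm that data was actually loaded.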

Table

A table in the lookups. Subclass of OrderedDict that implements a slightly more consistent and unified API and includes a Bloom filter to speed up missed lookups, i.e. checks for keys that are not in the table. It supports all other methods and attributes of OrderedDict / dict, plus the customized methods listed here. Methods that get or set keys accept both integers and strings; strings are hashed before being added to the table.
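
To illustrate why a Bloom filter helps with missed lookups, here is a toy sketch that uses only the standard library. It is not how spaCy or preshed implement the filter: the class ToyBloomTable, its parameters n_bits and n_hashes, and the SHA-256 based hashing are all assumptions made for this illustration.

```python
import hashlib


class ToyBloomTable:
    """Toy illustration only: a dict guarded by a small Bloom filter."""

    def __init__(self, n_bits=1024, n_hashes=3):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = 0  # a Python int used as a bit array
        self.data = {}

    def _positions(self, key):
        # Derive n_hashes bit positions from salted SHA-256 digests of the key.
        for salt in range(self.n_hashes):
            digest = hashlib.sha256(f"{salt}:{key}".encode("utf8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def __setitem__(self, key, value):
        # Mark the key's bit positions, then store the value as usual.
        for pos in self._positions(key):
            self.bits |= 1 << pos
        self.data[key] = value

    def __getitem__(self, key):
        return self.data[key]

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(key))

    def __contains__(self, key):
        # Only consult the dict if the filter cannot rule the key out.
        return self.might_contain(key) and key in self.data


table = ToyBloomTable()
table["foo"] = "bar"
assert "foo" in table
assert "missing" not in table
```

A Bloom filter can report false positives but never false negatives, so a negative answer from might_contain lets the table skip the dictionary lookup entirely, while a positive answer still has to be confirmed against the stored data.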

Table.__init__

Initialize a new table.

Example

from spacy.lookups import Table
data = {"foo": "bar", "baz": 100}
table = Table(name="some_table", data=data)
assert "foo" in table
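assert table["foo"] == "bar"

| Name | Type | Description |
| --- | --- | --- |
| name | unicode | Optional table name for reference. |
| data | dict | Optional data to initialize the table with. |
| RETURNS | Table | The newly constructed object. |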

Table.from_dict

Initialize a new table from a dict.

Example

from spacy.lookups import Table
data = {"foo": "bar", "baz": 100}
table = Table.from_dict(data, name="some_table")

| Name | Type | Description |
| --- | --- | --- |
| data | dict | The dictionary. |
| name | unicode | Optional table name for reference. |
| RETURNS | Table | The newly constructed object. |

Table.set

Set a new key / value pair. String keys will be hashed. Same as table[key] = value.

Example

from spacy.lookups import Table
table = Table()
table.set("foo", "bar")
assert table["foo"] == "bar"

| Name | Type | Description |
| --- | --- | --- |
| key | unicode / int | The key. |
| value | - | The value. |
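
The hashing of string keys can be made visible by looking a value up under both the string and its hash. This sketch assumes that get_string_id from spacy.strings is the helper used to hash string keys; that detail is not stated on this page, so treat it as an assumption.

```python
from spacy.lookups import Table
from spacy.strings import get_string_id  # assumed to be the hashing helper

table = Table(name="some_table")
table.set("foo", "bar")

# The integer the string key is assumed to be stored under.
key_hash = get_string_id("foo")

# Both the original string and its hash resolve to the same entry.
assert table["foo"] == "bar"
assert table[key_hash] == "bar"
assert "foo" in table and key_hash in table
```

This is why methods that get or set keys accept both integers and strings: a string is converted to its hash first, and the hash is what the table actually stores.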

Table.to_bytes

Serialize the table to a bytestring.

Example

table_bytes = table.to_bytes()

| Name | Type | Description |
| --- | --- | --- |
| RETURNS | bytes | The serialized table. |

Table.from_bytes

Load a table from a bytestring.

Example

table_bytes = table.to_bytes()
table = Table()
table.from_bytes(table_bytes)

| Name | Type | Description |
| --- | --- | --- |
| bytes_data | bytes | The data to load. |
| RETURNS | Table | The loaded table. |

Attributes

| Name | Type | Description |
| --- | --- | --- |
| name | unicode | Table name. |
| default_size | int | Default size of the Bloom filter if no data is provided. |
| bloom | preshed.bloom.BloomFilter | The Bloom filter. |