| title | tag | source |
| --- | --- | --- |
| StringStore | class | spacy/strings.pyx |

Look up strings by 64-bit hashes. As of v2.0, spaCy uses hash values instead of integer IDs. This ensures that strings always map to the same ID, even from different StringStores.
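
To illustrate why content-derived hashes stay consistent across stores, here is a minimal pure-Python sketch. It is not spaCy's implementation: `toy_hash`, `CounterStore` and `HashStore` are hypothetical names, and `blake2b` is only a stand-in for spaCy's own 64-bit hash function.

```python
import hashlib


def toy_hash(string: str) -> int:
    """Stand-in 64-bit hash (an assumption for illustration, NOT spaCy's hash)."""
    digest = hashlib.blake2b(string.encode("utf8"), digest_size=8).digest()
    return int.from_bytes(digest, "little")


class CounterStore:
    """Hypothetical store that assigns integer IDs in insertion order."""

    def __init__(self):
        self._ids = {}

    def add(self, string):
        # The ID depends on how many strings this store has already seen.
        return self._ids.setdefault(string, len(self._ids))


class HashStore:
    """Hypothetical store that keys each string by a hash of its content."""

    def __init__(self):
        self._strings = {}

    def add(self, string):
        key = toy_hash(string)
        self._strings[key] = string
        return key

    def __getitem__(self, key):
        return self._strings[key]


# Insertion-order IDs diverge as soon as two stores see different strings.
counter_a, counter_b = CounterStore(), CounterStore()
counter_a.add("apple")
counter_b.add("orange")
ids_differ = counter_a.add("apple") != counter_b.add("apple")

# Content hashes agree no matter what else each store contains.
hash_a, hash_b = HashStore(), HashStore()
hash_b.add("orange")
hashes_agree = hash_a.add("apple") == hash_b.add("apple")

assert ids_differ
assert hashes_agree
```

The trade-off is that a hash can only be turned back into text by a store that has already seen the string.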

StringStore.__init__

Create the StringStore.

Example

from spacy.strings import StringStore
stringstore = StringStore(["apple", "orange"])

| Name | Type | Description |
| --- | --- | --- |
| strings | iterable | A sequence of unicode strings to add to the store. |
| RETURNS | StringStore | The newly constructed object. |

StringStore.__len__

Get the number of strings in the store.

Example

stringstore = StringStore(["apple", "orange"])
assert len(stringstore) == 2

| Name | Type | Description |
| --- | --- | --- |
| RETURNS | int | The number of strings in the store. |

StringStore.__getitem__

Retrieve a string from a given hash, or vice versa.

Example

stringstore = StringStore(["apple", "orange"])
apple_hash = stringstore["apple"]
assert apple_hash == 8566208034543834098
assert stringstore[apple_hash] == "apple"

| Name | Type | Description |
| --- | --- | --- |
| string_or_id | bytes, unicode or uint64 | The value to encode. |
| RETURNS | unicode or int | The value to be retrieved. |
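
The hash-to-string direction only works if the store has seen the string. The following sketch assumes that an unknown hash raises a `KeyError`, which is the behavior in the spaCy v2 and v3 releases I am aware of; the variable names are illustrative.

```python
from spacy.strings import StringStore

source_store = StringStore(["apple"])
apple_hash = source_store["apple"]

# A fresh store has never seen "apple", so it cannot map the hash back.
empty_store = StringStore()
try:
    empty_store[apple_hash]
    resolved = True
except KeyError:  # assumed exception type, see the note above
    resolved = False

# Once the string is added, the same hash resolves in this store too.
empty_store.add("apple")
assert empty_store[apple_hash] == "apple"
```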

StringStore.__contains__

Check whether a string is in the store.

Example

stringstore = StringStore(["apple", "orange"])
assert "apple" in stringstore
assert "cherry" not in stringstore

| Name | Type | Description |
| --- | --- | --- |
| string | unicode | The string to check. |
| RETURNS | bool | Whether the store contains the string. |

StringStore.__iter__

Iterate over the strings in the store, in order. Note that a newly initialized store will always include an empty string '' at position 0.

Example

stringstore = StringStore(["apple", "orange"])
all_strings = [s for s in stringstore]
assert all_strings == ["apple", "orange"]

| Name | Type | Description |
| --- | --- | --- |
| YIELDS | unicode | A string in the store. |

StringStore.add

Add a string to the StringStore.

Example

stringstore = StringStore(["apple", "orange"])
banana_hash = stringstore.add("banana")
assert len(stringstore) == 3
assert banana_hash == 2525716904149915114
assert stringstore[banana_hash] == "banana"
assert stringstore["banana"] == banana_hash

| Name | Type | Description |
| --- | --- | --- |
| string | unicode | The string to add. |
| RETURNS | uint64 | The string's hash value. |
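
Because the returned value is derived from the string itself, adding the same string twice should be harmless. A short sketch of that, using only the calls documented on this page:

```python
from spacy.strings import StringStore

stringstore = StringStore(["apple", "orange"])
first = stringstore.add("banana")
second = stringstore.add("banana")  # re-adding an existing string

# Both calls return the same hash, and no duplicate entry is created.
assert first == second
assert len(stringstore) == 3
assert "banana" in stringstore
```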

StringStore.to_disk

Save the current state to a directory.

Example

stringstore.to_disk("/path/to/strings")

| Name | Type | Description |
| --- | --- | --- |
| path | unicode / Path | A path to a directory, which will be created if it doesn't exist. Paths may be either strings or Path-like objects. |

StringStore.from_disk

Load state from a directory. Modifies the object in place and returns it.

Example

from spacy.strings import StringStore
stringstore = StringStore().from_disk("/path/to/strings")

| Name | Type | Description |
| --- | --- | --- |
| path | unicode / Path | A path to a directory. Paths may be either strings or Path-like objects. |
| RETURNS | StringStore | The modified StringStore object. |
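
A round trip through `to_disk` and `from_disk` can be sketched as follows. The temporary directory and the `strings` path name are arbitrary choices for illustration, and the example makes no assumption about what is actually written at that path.

```python
import tempfile
from pathlib import Path

from spacy.strings import StringStore

stringstore = StringStore(["apple", "orange"])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "strings"  # hypothetical location
    stringstore.to_disk(path)
    # from_disk modifies the new store in place and returns it.
    restored = StringStore().from_disk(path)

# The restored store knows the same strings under the same hashes.
assert "apple" in restored
assert "orange" in restored
assert restored["apple"] == stringstore["apple"]
```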

StringStore.to_bytes

Serialize the current state to a binary string.

Example

store_bytes = stringstore.to_bytes()

| Name | Type | Description |
| --- | --- | --- |
| RETURNS | bytes | The serialized form of the StringStore object. |

StringStore.from_bytes

Load state from a binary string.

Example

from spacy.strings import StringStore
store_bytes = stringstore.to_bytes()
new_store = StringStore().from_bytes(store_bytes)

| Name | Type | Description |
| --- | --- | --- |
| bytes_data | bytes | The data to load from. |
| RETURNS | StringStore | The StringStore object. |

Utilities

strings.hash_string

Get a 64-bit hash for a given string.

Example

from spacy.strings import hash_string
assert hash_string("apple") == 8566208034543834098

| Name | Type | Description |
| --- | --- | --- |
| string | unicode | The string to hash. |
| RETURNS | uint64 | The hash. |
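
Since `hash_string` needs no store, it can compute an ID ahead of time that a `StringStore` will later agree with. A brief sketch, using "apple" and "cherry" as arbitrary example strings:

```python
from spacy.strings import StringStore, hash_string

stringstore = StringStore(["apple"])

# The standalone hash matches what the store reports for the same string.
assert hash_string("apple") == stringstore["apple"]

# Hashing alone does not register the string in any store.
cherry_hash = hash_string("cherry")
assert "cherry" not in stringstore

# After an explicit add, the precomputed hash resolves to the string.
assert stringstore.add("cherry") == cherry_hash
assert stringstore[cherry_hash] == "cherry"
```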