86 lines
5.0 KiB
Markdown
86 lines
5.0 KiB
Markdown
# Hashing
|
|
|
|
## Hash Method Generation
|
|
|
|
:::{warning}
|
|
The overarching theme is to never set the `@attrs.define(unsafe_hash=X)` parameter yourself.
|
|
Leave it at `None` which means that *attrs* will do the right thing for you, depending on the other parameters:
|
|
|
|
- If you want to make objects hashable by value: use `@define(frozen=True)`.
|
|
- If you want hashing and equality by object identity: use `@define(eq=False)`
|
|
|
|
Setting `unsafe_hash` yourself can have unexpected consequences so we recommend to tinker with it only if you know exactly what you're doing.
|
|
:::
|
|
|
|
Under certain circumstances, it's necessary for objects to be *hashable*.
|
|
For example if you want to put them into a {class}`set` or if you want to use them as keys in a {class}`dict`.
|
|
|
|
The *hash* of an object is an integer that represents the contents of an object.
|
|
It can be obtained by calling {func}`hash` on an object and is implemented by writing a `__hash__` method for your class.
|
|
|
|
*attrs* will happily write a `__hash__` method for you [^fn1], however it will *not* do so by default.
|
|
Because according to the [definition](https://docs.python.org/3/glossary.html#term-hashable) from the official Python docs, the returned hash has to fulfill certain constraints:
|
|
|
|
[^fn1]: The hash is computed by hashing a tuple that consists of a unique id for the class plus all attribute values.
|
|
|
|
1. Two objects that are equal, **must** have the same hash.
|
|
This means that if `x == y`, it *must* follow that `hash(x) == hash(y)`.
|
|
|
|
By default, Python classes are compared *and* hashed by their `id`.
|
|
That means that every instance of a class has a different hash, no matter what attributes it carries.
|
|
|
|
It follows that the moment you (or *attrs*) change the way equality is handled by implementing `__eq__` which is based on attribute values, this constraint is broken.
|
|
For that reason Python 3 will make a class that has customized equality unhashable.
|
|
Python 2 on the other hand will happily let you shoot your foot off.
|
|
Unfortunately, *attrs* still mimics (otherwise unsupported) Python 2's behavior for backward-compatibility reasons if you set `unsafe_hash=False`.
|
|
|
|
The *correct way* to achieve hashing by id is to set `@define(eq=False)`.
|
|
Setting `@define(unsafe_hash=False)` (which implies `eq=True`) is almost certainly a *bug*.
|
|
|
|
:::{warning}
|
|
Be careful when subclassing!
|
|
Setting `eq=False` on a class whose base class has a non-default `__hash__` method will *not* make *attrs* remove that `__hash__` for you.
|
|
|
|
It is part of *attrs*'s philosophy to only *add* to classes so you have the freedom to customize your classes as you wish.
|
|
So if you want to *get rid* of methods, you'll have to do it by hand.
|
|
|
|
The easiest way to reset `__hash__` on a class is adding `__hash__ = object.__hash__` in the class body.
|
|
:::
|
|
|
|
2. If two objects are not equal, their hash **should** be different.
|
|
|
|
While this isn't a requirement from a standpoint of correctness, sets and dicts become less effective if there are a lot of identical hashes.
|
|
The worst case is when all objects have the same hash which turns a set into a list.
|
|
|
|
3. The hash of an object **must not** change.
|
|
|
|
If you create a class with `@define(frozen=True)` this is fulfilled by definition, therefore *attrs* will write a `__hash__` function for you automatically.
|
|
You can also force it to write one with `unsafe_hash=True` but then it's *your* responsibility to make sure that the object is not mutated.
|
|
|
|
This point is the reason why mutable structures like lists, dictionaries, or sets aren't hashable while immutable ones like tuples or `frozenset`s are:
|
|
point 1 and 2 require that the hash changes with the contents but point 3 forbids it.
|
|
|
|
For a more thorough explanation of this topic, please refer to this blog post: [*Python Hashes and Equality*](https://hynek.me/articles/hashes-and-equality/).
|
|
|
|
:::{note}
|
|
Please note that the `unsafe_hash` argument's original name was `hash` but was changed to conform with {pep}`681` in 22.2.0.
|
|
The old argument name is still around and will **not** be removed -- but setting `unsafe_hash` takes precedence over `hash`.
|
|
The field-level argument is still called `hash` and will remain so.
|
|
:::
|
|
|
|
|
|
## Hashing and Mutability
|
|
|
|
Changing any field involved in hash code computation after the first call to `__hash__` (typically this would be after its insertion into a hash-based collection) can result in silent bugs.
|
|
Therefore, it is strongly recommended that hashable classes be `frozen`.
|
|
Beware, however, that this is not a complete guarantee of safety:
|
|
if a field points to an object and that object is mutated, the hash code may change, but `frozen` will not protect you.
|
|
|
|
|
|
## Hash Code Caching
|
|
|
|
Some objects have hash codes which are expensive to compute.
|
|
If such objects are to be stored in hash-based collections, it can be useful to compute the hash codes only once and then store the result on the object to make future hash code requests fast.
|
|
To enable caching of hash codes, pass `@define(cache_hash=True)`.
|
|
This may only be done if *attrs* is already generating a hash function for the object.
|