Hashing
=======

Hash Method Generation
----------------------

.. warning::

   The overarching theme is to never set the ``@attr.s(hash=X)`` parameter yourself.
   Leave it at ``None``, which means that ``attrs`` will do the right thing for you, depending on the other parameters:

   - If you want to make objects hashable by value: use ``@attr.s(frozen=True)``.
   - If you want hashing and comparison by object identity: use ``@attr.s(cmp=False)``.

   Setting ``hash`` yourself can have unexpected consequences, so we recommend tinkering with it only if you know exactly what you're doing.

Under certain circumstances, it's necessary for objects to be *hashable*.
For example, if you want to put them into a :class:`set` or use them as keys in a :class:`dict`.

The *hash* of an object is an integer that represents the contents of an object.
It can be obtained by calling :func:`hash` on an object and is implemented by writing a ``__hash__`` method for your class.

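As a minimal plain-Python sketch of the paragraph above, :func:`hash` returns the integer produced by ``__hash__``, and sets and dicts rely on that integer to find the object again.
The ``Color`` class is hypothetical and written by hand here; it is not generated by ``attrs``.

```python
class Color:
    """Hypothetical hand-written class; not generated by attrs."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        # Equality by value, so equal colors must also hash equally.
        if not isinstance(other, Color):
            return NotImplemented
        return self.name == other.name

    def __hash__(self):
        # hash() calls this method and expects an integer back.
        return hash(self.name)


red = Color("red")
hash_is_int = isinstance(hash(red), int)
same_hash = hash(red) == hash(Color("red"))

# Hashable objects can be set members and dict keys.
palette = {Color("red"), Color("red"), Color("blue")}
codes = {Color("red"): "#ff0000"}
lookup = codes[Color("red")]
```

Because the two ``Color("red")`` instances are equal and hash equally, the set keeps only one of them and the dict lookup with a fresh instance succeeds.
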
``attrs`` will happily write a ``__hash__`` method for you [#fn1]_, but it will *not* do so by default.
That's because, according to the definition_ from the official Python docs, the returned hash has to fulfill certain constraints:

#. Two objects that are equal **must** have the same hash.
   This means that if ``x == y``, it *must* follow that ``hash(x) == hash(y)``.

   By default, Python classes are compared *and* hashed by their :func:`id`.
   That means that every instance of a class has a different hash, no matter what attributes it carries.

   It follows that the moment you (or ``attrs``) change the way equality is handled by implementing an ``__eq__`` that is based on attribute values, this constraint is broken.
   For that reason, Python 3 will make a class that has customized equality unhashable.
   Python 2, on the other hand, will happily let you shoot your foot off.
   Unfortunately, ``attrs`` currently mimics Python 2's behavior for backward compatibility reasons if you set ``hash=False``.

   The *correct way* to achieve hashing by id is to set ``@attr.s(cmp=False)``.
   Setting ``@attr.s(hash=False)`` (which implies ``cmp=True``) is almost certainly a *bug*.

   .. warning::

      Be careful when subclassing!
      Setting ``cmp=False`` on a class whose base class has a non-default ``__hash__`` method will *not* make ``attrs`` remove that ``__hash__`` for you.

      It is part of ``attrs``'s philosophy to only *add* to classes so you have the freedom to customize your classes as you wish.
      So if you want to *get rid* of methods, you'll have to do it by hand.

      The easiest way to reset ``__hash__`` on a class is adding ``__hash__ = object.__hash__`` in the class body.

#. If two objects are not equal, their hash **should** be different.

   While this isn't a requirement from a standpoint of correctness, sets and dicts become less effective if there are a lot of identical hashes.
   The worst case is when all objects have the same hash, which turns a set into a list.

#. The hash of an object **must not** change.

   If you create a class with ``@attr.s(frozen=True)`` this is fulfilled by definition, therefore ``attrs`` will write a ``__hash__`` function for you automatically.
   You can also force it to write one with ``hash=True`` but then it's *your* responsibility to make sure that the object is not mutated.

   This point is the reason why mutable structures like lists, dictionaries, or sets aren't hashable while immutable ones like tuples or frozensets are:
   points 1 and 2 require that the hash changes with the contents but point 3 forbids it.

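The first constraint and the subclassing caveat can be observed with plain Python 3 classes, without ``attrs``.
The sketch below uses hypothetical hand-written classes; resetting ``__eq__`` next to ``__hash__`` in ``IdentityChild`` is an addition made here so that equality and hashing stay consistent with each other.

```python
class Plain:
    """No __eq__ or __hash__: compared and hashed by identity."""

    def __init__(self, x):
        self.x = x


class ByValue:
    """Defines __eq__ only, so Python 3 sets __hash__ to None."""

    def __init__(self, x):
        self.x = x

    def __eq__(self, other):
        if not isinstance(other, ByValue):
            return NotImplemented
        return self.x == other.x


class Base:
    """Value-based equality with a matching value-based hash."""

    def __init__(self, x):
        self.x = x

    def __eq__(self, other):
        if not isinstance(other, Base):
            return NotImplemented
        return self.x == other.x

    def __hash__(self):
        return hash(self.x)


class IdentityChild(Base):
    # Nothing removes the inherited methods automatically,
    # so identity semantics are restored by hand.
    __eq__ = object.__eq__
    __hash__ = object.__hash__


a, b = Plain(1), Plain(1)
plain_equal = a == b
plain_hashes_differ = hash(a) != hash(b)

try:
    hash(ByValue(1))
    by_value_hashable = True
except TypeError:
    by_value_hashable = False

base_consistent = Base(1) == Base(1) and hash(Base(1)) == hash(Base(1))

c, d = IdentityChild(1), IdentityChild(1)
child_equal = c == d
child_uses_identity_hash = hash(c) == object.__hash__(c)
```

``Plain`` instances with the same attribute are neither equal nor hashed alike, ``ByValue`` cannot be hashed at all, and ``IdentityChild`` behaves like ``Plain`` again even though its base class hashes by value.
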
For a more thorough explanation of this topic, please refer to this blog post: `Python Hashes and Equality`_.

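To make the second constraint more tangible, the following sketch counts how often ``__eq__`` has to run while a set is being built.
``Key``, ``CollidingKey``, and ``comparisons_needed`` are hypothetical names introduced for illustration, and the exact counts depend on the Python implementation.

```python
class Key:
    """Hypothetical key class that counts how often __eq__ runs."""

    eq_calls = 0

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        type(self).eq_calls += 1
        return isinstance(other, Key) and self.value == other.value

    def __hash__(self):
        # Well-distributed: different values give different hashes.
        return hash(self.value)


class CollidingKey(Key):
    """Same equality, but every instance shares a single hash."""

    eq_calls = 0

    def __hash__(self):
        return 1


def comparisons_needed(cls, n=200):
    """Build a set of n distinct keys and report the __eq__ calls."""
    cls.eq_calls = 0
    members = {cls(i) for i in range(n)}
    assert len(members) == n
    return cls.eq_calls


good = comparisons_needed(Key)
bad = comparisons_needed(CollidingKey)
```

Both sets end up with the same members, so correctness is not affected.
With identical hashes, however, every insertion has to be checked against the elements that are already stored, which is the list-like behavior described in the second constraint.
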

Hashing and Mutability
----------------------
Changing any field involved in hash code computation after the first call to ``__hash__`` (typically this would be after its insertion into a hash-based collection) can result in silent bugs.
Therefore, it is strongly recommended that hashable classes be ``frozen``.
Beware, however, that this is not a complete guarantee of safety:
if a field points to an object and that object is mutated, the hash code may change, but ``frozen`` will not protect you.

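Both failure modes can be sketched in plain Python.
``Account`` and ``Playlist`` are hypothetical hand-written classes; ``Playlist`` only imitates a frozen class by rejecting attribute assignment and is not what ``attrs`` generates.

```python
class Account:
    """Hypothetical mutable class that is hashed by value."""

    def __init__(self, number):
        self.number = number

    def __eq__(self, other):
        if not isinstance(other, Account):
            return NotImplemented
        return self.number == other.number

    def __hash__(self):
        return hash(self.number)


account = Account(1)
accounts = {account}
found_before = account in accounts

# Mutating a hashed field changes the hash after insertion.
account.number = 2
found_after = account in accounts
still_stored = any(member is account for member in accounts)


class Playlist:
    """Rejects attribute assignment, yet holds a mutable list."""

    def __init__(self, tracks):
        object.__setattr__(self, "tracks", tracks)

    def __setattr__(self, name, value):
        raise AttributeError("Playlist is frozen")

    def __eq__(self, other):
        if not isinstance(other, Playlist):
            return NotImplemented
        return self.tracks == other.tracks

    def __hash__(self):
        return hash(tuple(self.tracks))


playlist = Playlist([1, 2])
hash_before = hash(playlist)

# No attribute is reassigned, so the frozen check never triggers.
playlist.tracks.append(3)
hash_changed = hash(playlist) != hash_before
```

After the mutation, the set still contains the ``Account`` object but a membership test no longer finds it, and no error is raised.
The ``Playlist`` hash changes even though the object itself never accepted an assignment, which is why storing immutable values such as tuples in hashed fields is the safer choice.
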

Hash Code Caching
-----------------

Some objects have hash codes which are expensive to compute.
If such objects are to be stored in hash-based collections, it can be useful to compute the hash codes only once and then store the result on the object to make future hash code requests fast.
To enable caching of hash codes, pass ``cache_hash=True`` to ``@attr.s``.
This may only be done if ``attrs`` is already generating a hash function for the object.

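The idea behind caching can be sketched by hand in plain Python.
This is only an illustration of the technique with a hypothetical ``Document`` class and a hypothetical ``_cached_hash`` attribute; it makes no claim about how ``attrs`` implements ``cache_hash=True`` internally.

```python
class Document:
    """Hypothetical immutable value whose hash is costly to compute."""

    hash_computations = 0  # counts how often the real work runs

    def __init__(self, words):
        self._words = tuple(words)
        self._cached_hash = None

    def __eq__(self, other):
        if not isinstance(other, Document):
            return NotImplemented
        return self._words == other._words

    def __hash__(self):
        # Compute the hash on first use, then reuse the stored value.
        if self._cached_hash is None:
            type(self).hash_computations += 1
            self._cached_hash = hash(self._words)
        return self._cached_hash


doc = Document(["to", "be", "or", "not"])
first = hash(doc)
second = hash(doc)

# Dict insertion and lookup both ask for the hash again.
index = {doc: "hamlet"}
title = index[doc]
```

The hash is computed once, no matter how many times it is requested afterwards.
A cached hash is only safe if the hashed fields never change, which is one more reason to combine caching with immutable objects.
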
.. [#fn1] The hash is computed by hashing a tuple that consists of a unique id for the class plus all attribute values.

.. _definition: https://docs.python.org/3/glossary.html#term-hashable
.. _`Python Hashes and Equality`: https://hynek.me/articles/hashes-and-equality/