Learning from bidict
====================
Below is an outline of
some of the more fascinating Python corners
I got to explore further
thanks to working on bidict.
If you are interested in learning more about any of the following,
:ref:`reviewing the (small) codebase <reviewers-wanted>`
could be a great way to get started.
Python's data model
-------------------
- Using :meth:`object.__new__` to bypass default object initialization,
e.g. for better :meth:`~bidict.bidict.copy` performance.
See ``_base.py``.
- Overriding :meth:`object.__getattribute__` for custom attribute lookup.
See :ref:`sorted-bidict-recipes`.
- Using
:meth:`object.__getstate__`,
:meth:`object.__setstate__`, and
:meth:`object.__reduce__` to make an object pickleable
that otherwise wouldn't be,
due to e.g. using weakrefs,
as bidicts do (covered further below).
- Using :ref:`slots` to speed up attribute access and reduce memory usage.
Must be careful with pickling and weakrefs.
See ``BidictBase.__getstate__()``.
- What happens when you implement a custom :meth:`~object.__eq__`?
e.g. ``a == b`` vs. ``b == a`` when only ``a`` is an instance of your class?
Great write-up in https://eev.ee/blog/2012/03/24/python-faq-equality/
- Making an immutable type hashable
(so it can be inserted into :class:`dict`\s and :class:`set`\s):
Must implement :meth:`~object.__hash__` such that
``a == b ⇒ hash(a) == hash(b)``.
See the :meth:`object.__hash__` and :meth:`object.__eq__` docs.
See :class:`bidict.frozenbidict`.
- Consider :class:`~bidict.FrozenOrderedBidict`:
its :meth:`~bidict.FrozenOrderedBidict.__eq__`
is :ref:`order-insensitive <eq-order-insensitive>`.
So all contained items must participate in the hash order-insensitively.
- Can use `collections.abc.Set._hash <https://github.com/python/cpython/blob/a0374d/Lib/_collections_abc.py#L521>`_
which provides a pure Python implementation of the same hash algorithm
used to hash :class:`frozenset`\s.
(Since :class:`~collections.abc.ItemsView` extends
:class:`~collections.abc.Set`,
:meth:`bidict.frozenbidict.__hash__`
just calls ``ItemsView(self)._hash()``.)
- Does this argue for making :meth:`collections.abc.Set._hash` non-private?
- Why isn't the C implementation of this algorithm directly exposed in
CPython? Only way to use it is to call ``hash(frozenset(self.items()))``,
which wastes memory allocating the ephemeral frozenset,
and time copying all the items into it before they're hashed.
- Unlike other attributes, ``__hash__()`` is not always inherited:
if a class overrides ``__eq__()`` without also defining ``__hash__()``,
Python implicitly adds ``__hash__ = None`` to the class body,
so instances of that class are unhashable
even if a base class implements ``__hash__()``.
So if you do want such a subclass to inherit a base class's ``__hash__()``
implementation, you have to set that manually,
e.g. by adding ``__hash__ = BaseClass.__hash__`` in the class body.
See :class:`~bidict.FrozenOrderedBidict`.
This is consistent with the fact that
:class:`object` implements ``__hash__()``,
but subclasses of :class:`object` that override ``__eq__()``
are not hashable by default.
- Surprising :class:`~collections.abc.Mapping` corner cases:
- :ref:`nan-as-key`
- :ref:`equiv-but-distinct`
- `pywat#38 <https://github.com/cosmologicon/pywat/issues/38>`_
- "Intransitive equality
(of :class:`~collections.OrderedDict`)
was a mistake." –Raymond Hettinger
- Hence :ref:`eq-order-insensitive` for ordered bidicts.
- If an instance of your custom mapping type
contains the same items as a mapping of another type,
should they compare equal?
What if one of the mappings is ordered and the other isn't?
What about returning the :obj:`NotImplemented` object?
- bidict's ``__eq__()`` design
errs on the side of allowing more type polymorphism
on the grounds that this is what the majority of use cases expect,
and that it's more Pythonic.
- Any user who does need exact-type-matching equality can just override
:meth:`bidict’s __eq__() <bidict.BidictBase.__eq__>` method in a subclass.
- If this subclass were also hashable, would it be worth overriding
:meth:`bidict.frozenbidict.__hash__` too to include the type?
- Only point would be to reduce collisions when multiple instances of different
types contained the same items
and were going to be inserted into the same :class:`dict` or :class:`set`
(since they'd now be unequal but would hash to the same value otherwise).
Probably not worth it.
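To make the :meth:`object.__new__` point above concrete, here is a minimal sketch of a ``copy()`` that skips ``__init__``. The ``Pair`` class and its attribute names are hypothetical and only illustrate the idea; this is not bidict's actual code.

```python
class Pair:
    """Hypothetical class whose __init__ does per-item validation."""

    init_calls = 0  # counts how often __init__ runs

    def __init__(self, items):
        type(self).init_calls += 1
        self._fwd = {}
        self._inv = {}
        for key, val in items:
            if key in self._fwd or val in self._inv:
                raise ValueError("duplicate key or value")
            self._fwd[key] = val
            self._inv[val] = key

    def copy(self):
        # Allocate an instance without running __init__: the data is
        # already known to be valid, so revalidating it would be wasted work.
        new = object.__new__(type(self))
        new._fwd = self._fwd.copy()
        new._inv = self._inv.copy()
        return new


original = Pair([("H", "hydrogen"), ("He", "helium")])
clone = original.copy()
```

The trade-off is that ``copy()`` must now set every attribute that ``__init__`` would have set, so the two have to be kept in sync by hand.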
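The ``a == b`` vs. ``b == a`` question and the role of :obj:`NotImplemented` can be tried out with a small sketch. ``Celsius`` is a hypothetical class made up for this example.

```python
class Celsius:
    """Hypothetical value type, used only for illustration."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __eq__(self, other):
        if isinstance(other, Celsius):
            return self.degrees == other.degrees
        if isinstance(other, (int, float)):
            return self.degrees == other
        # Returning NotImplemented (rather than False) lets Python try
        # the other operand's __eq__ before falling back to identity.
        return NotImplemented


temp = Celsius(20)
left = temp == 20      # Celsius.__eq__(temp, 20)
right = 20 == temp     # int.__eq__ returns NotImplemented, so Python
                       # retries with the reflected Celsius.__eq__(temp, 20)
other = temp == "20"   # both sides return NotImplemented -> identity check
```

Because the reflected call is tried automatically, both ``left`` and ``right`` come out equal even though only one operand is a ``Celsius``.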
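The order-insensitive hashing approach described above can be sketched as follows. ``FrozenMap`` is a hypothetical stand-in for an immutable mapping, and ``Set._hash()`` is a private helper of :mod:`collections.abc`, so relying on it is a judgment call.

```python
from collections.abc import ItemsView, Mapping


class FrozenMap(Mapping):
    """Hypothetical immutable mapping, used only for illustration."""

    def __init__(self, *args, **kwargs):
        self._data = dict(*args, **kwargs)

    def __getitem__(self, key):
        return self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)

    def __hash__(self):
        # ItemsView extends collections.abc.Set, so it inherits the
        # private Set._hash() helper, which combines the item hashes
        # order-insensitively without building a temporary frozenset.
        return ItemsView(self)._hash()


a = FrozenMap([("one", 1), ("two", 2)])
b = FrozenMap([("two", 2), ("one", 1)])  # same items, different order
```

Here ``a == b`` holds via the inherited ``Mapping.__eq__()``, and the hashes agree regardless of insertion order, so the ``a == b ⇒ hash(a) == hash(b)`` requirement is met.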
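The ``__hash__`` inheritance rule is easy to check interactively. To be precise, the implicit ``__hash__ = None`` is added to a class that defines ``__eq__()`` without also defining ``__hash__()``. All class names in this sketch are hypothetical.

```python
class Base:
    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        if not isinstance(other, Base):
            return NotImplemented
        return self.value == other.value

    def __hash__(self):
        return hash(self.value)


class Inherits(Base):
    pass  # no __eq__ override, so __hash__ is inherited as usual


class Unhashable(Base):
    def __eq__(self, other):  # overriding __eq__ alone sets __hash__ = None
        return super().__eq__(other)


class Restored(Base):
    def __eq__(self, other):
        return super().__eq__(other)

    __hash__ = Base.__hash__  # explicitly opt back in
```

Calling ``hash(Unhashable(1))`` raises :class:`TypeError`, while ``Restored`` instances hash like their base class.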
Using :mod:`weakref`
--------------------
- See :ref:`inv-avoids-reference-cycles`
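As a rough illustration of how a weak reference can keep two mutually linked objects from forming a strong reference cycle, consider the sketch below. The ``Linked`` class and its attribute names are hypothetical; bidict's real implementation differs in its details.

```python
import weakref


class Linked:
    """Hypothetical two-way link: strong one way, weak the other."""

    def __init__(self, _inverse=None):
        if _inverse is None:
            # The original holds a strong reference to its inverse...
            self._inv = Linked(_inverse=self)
            self._invweak = None
        else:
            # ...while the inverse only holds a weak reference back,
            # so the pair never forms a strong reference cycle.
            self._inv = None
            self._invweak = weakref.ref(_inverse)

    @property
    def inverse(self):
        if self._inv is not None:
            return self._inv
        return self._invweak()  # None if the referent is gone


pair = Linked()
round_trip = pair.inverse.inverse  # resolves back to the original object
```

With this layout, dropping the last outside reference to the original lets a reference-counting implementation such as CPython free both objects right away, without waiting for the cycle collector.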
Other interesting stuff in the standard library
-----------------------------------------------
- :mod:`reprlib` and :func:`reprlib.recursive_repr`
(but not needed for bidict because there's no way to insert a bidict into itself)
- :func:`operator.methodcaller`
- :func:`platform.python_implementation`
- See :ref:`missing-bidicts-in-stdlib`
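For readers who have not used them, here is a short sketch of :func:`reprlib.recursive_repr` and :func:`operator.methodcaller`. The ``Box`` class is hypothetical; it exists only because, unlike a bidict, it can be inserted into itself.

```python
from operator import methodcaller
from reprlib import recursive_repr


class Box:
    """Hypothetical container that can end up containing itself."""

    def __init__(self):
        self.items = []

    @recursive_repr()
    def __repr__(self):
        return "Box(%r)" % (self.items,)


box = Box()
box.items.append(box)  # self-reference: a naive __repr__ would recurse forever
text = repr(box)       # the recursive call is replaced by "..."

# methodcaller("upper") builds a callable equivalent to lambda s: s.upper()
upper = methodcaller("upper")
shouted = upper("bidict")
```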
:func:`~collections.namedtuple`-style dynamic class generation
--------------------------------------------------------------
- See ``_named.py``
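The general technique, creating a class at runtime from a name and some field names, can be sketched with the three-argument form of :class:`type`. The ``make_record`` factory below is hypothetical and much simpler than either :func:`~collections.namedtuple` or the code in ``_named.py``.

```python
def make_record(typename, fieldnames):
    """Hypothetical factory: build a small class at runtime with type()."""
    fieldnames = tuple(fieldnames)
    for name in (typename,) + fieldnames:
        if not name.isidentifier():
            raise ValueError("%r is not a valid identifier" % name)

    def __init__(self, *values):
        if len(values) != len(fieldnames):
            raise TypeError("expected %d values" % len(fieldnames))
        for name, value in zip(fieldnames, values):
            setattr(self, name, value)

    def __repr__(self):
        args = ", ".join("%s=%r" % (n, getattr(self, n)) for n in fieldnames)
        return "%s(%s)" % (typename, args)

    namespace = {
        "__init__": __init__,
        "__repr__": __repr__,
        "_fields": fieldnames,
    }
    # type(name, bases, namespace) creates the new class object.
    return type(typename, (object,), namespace)


Point = make_record("Point", ["x", "y"])
p = Point(1, 2)
```

Validating the names up front matters because they end up as a class name and attribute names.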
How to efficiently implement an ordered mapping
-----------------------------------------------
- Use a backing dict and doubly-linked list.
- See ``_orderedbase.py``.
:class:`~collections.OrderedDict` provided a good
`reference <https://github.com/python/cpython/blob/a0374d/Lib/collections/__init__.py#L71>`_.
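A stripped-down sketch of the backing dict plus doubly-linked list idea follows. ``OrderedMap`` and ``_Node`` are hypothetical names; the sketch leaves out most of what ``_orderedbase.py`` and :class:`~collections.OrderedDict` handle, and it uses plain strong references between nodes.

```python
class _Node:
    __slots__ = ("key", "prv", "nxt")

    def __init__(self, key=None):
        self.key = key
        self.prv = self.nxt = self  # a lone node links to itself


class OrderedMap:
    """Hypothetical minimal ordered mapping: dicts for lookup, list for order."""

    def __init__(self):
        self._sentinel = _Node()  # marks both ends of the circular list
        self._nodes = {}          # key -> node
        self._values = {}         # key -> value

    def __setitem__(self, key, value):
        if key not in self._nodes:
            # New key: link a node in just before the sentinel (the end).
            node = _Node(key)
            last = self._sentinel.prv
            node.prv, node.nxt = last, self._sentinel
            last.nxt = self._sentinel.prv = node
            self._nodes[key] = node
        self._values[key] = value  # existing keys keep their position

    def __getitem__(self, key):
        return self._values[key]

    def __delitem__(self, key):
        node = self._nodes.pop(key)  # raises KeyError if missing
        node.prv.nxt = node.nxt      # unlink in O(1)
        node.nxt.prv = node.prv
        del self._values[key]

    def __iter__(self):
        node = self._sentinel.nxt
        while node is not self._sentinel:
            yield node.key
            node = node.nxt

    def __reversed__(self):
        node = self._sentinel.prv
        while node is not self._sentinel:
            yield node.key
            node = node.prv

    def __len__(self):
        return len(self._nodes)


m = OrderedMap()
for k, v in [("a", 1), ("b", 2), ("c", 3)]:
    m[k] = v
del m["b"]
m["a"] = 10  # updating a value does not move the key
```

The dict gives constant-time lookup of a key's node, and the list gives constant-time unlinking plus iteration in either direction.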
API Design
----------
- Integrating with :mod:`collections` via :mod:`collections.abc` and :mod:`abc`
- Implementing ABCs like :class:`collections.abc.Hashable`
- Thanks to :class:`~collections.abc.Hashable`
implementing :meth:`abc.ABCMeta.__subclasshook__`,
any class that implements all the required methods of the
:class:`~collections.abc.Hashable` interface
(namely, :meth:`~collections.abc.Hashable.__hash__`)
is already a virtual subclass, with no need to explicitly extend.
I.e. as long as ``Foo`` implements a ``__hash__()`` method,
``issubclass(Foo, Hashable)`` will always be True,
with no need to explicitly subclass via ``class Foo(Hashable): ...``
- :class:`collections.abc.Mapping` and
:class:`collections.abc.MutableMapping`
don't implement :meth:`~abc.ABCMeta.__subclasshook__`,
so must either explicitly subclass
(if you want to inherit any of their implementations)
or use :meth:`abc.ABCMeta.register`
(to register as a virtual subclass without inheriting any implementation)
- Providing a new open ABC like :class:`~bidict.BidirectionalMapping`
- Just override :meth:`~abc.ABCMeta.__subclasshook__`.
See ``_abc.py``.
- Interesting consequence of the ``__subclasshook__()`` design:
the "subclass" relation is now intransitive,
e.g. :class:`object` is a subclass of :class:`~collections.abc.Hashable`,
:class:`list` is a subclass of :class:`object`,
but :class:`list` is not a subclass of :class:`~collections.abc.Hashable`
- Notice we have :class:`collections.abc.Reversible`
but no ``collections.abc.Ordered`` or ``collections.abc.OrderedMapping``.
Proposed in `bpo-28912 <https://bugs.python.org/issue28912>`_ but rejected.
Would have been useful for bidict's ``__repr__()`` implementation (see ``_base.py``),
and potentially for interop with other ordered mapping implementations
such as `SortedDict <http://www.grantjenks.com/docs/sortedcontainers/sorteddict.html>`_
- Beyond :class:`collections.abc.Mapping`, bidicts implement additional APIs
that :class:`dict` and :class:`~collections.OrderedDict` implement.
- When creating a new API, making it familiar, memorable, and intuitive
is hugely important to a good user experience.
- Making APIs Pythonic
- `Zen of Python <https://www.python.org/dev/peps/pep-0020/>`_
- "Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess."
→ bidict's default duplication policies
- "Explicit is better than implicit.
There should be one—and preferably only one—obvious way to do it."
→ dropped the alternate ``.inv`` APIs that used
the ``~`` operator and the old slice syntax
- Python 2 vs. Python 3
- mostly :class:`dict` API changes,
but also functions like :func:`zip`, :func:`map`, :func:`filter`, etc.
- If you define a custom :meth:`~object.__eq__` on a class,
it will *not* be used for ``!=`` comparisons on Python 2 automatically;
you must explicitly add an :meth:`~object.__ne__` implementation
that calls your :meth:`~object.__eq__` implementation.
If you don't, :meth:`object.__ne__` will be used instead,
which behaves like ``is not``.
GOTCHA alert!
Python 3 thankfully fixes this.
- borrowing methods from other classes:
In Python 2, must grab the ``.im_func`` / ``__func__``
attribute off the borrowed method to avoid getting
``TypeError: unbound method ...() must be called with ... instance as first argument``.
See ``_frozenordered.py``.
- CPython vs. PyPy
- gc / weakref
- http://doc.pypy.org/en/latest/cpython_differences.html#differences-related-to-garbage-collection-strategies
- hence ``test_no_reference_cycles`` (in ``test_hypothesis.py``)
is skipped on PyPy
- primitives' identities, nan, etc.
- http://doc.pypy.org/en/latest/cpython_differences.html#object-identity-of-primitive-values-is-and-id
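The ``__subclasshook__()`` points from the API design notes above (virtual subclasses of :class:`~collections.abc.Hashable`, the need to register with :class:`~collections.abc.Mapping`, and the intransitive "subclass" relation) can be checked with a short sketch. ``Quacks``, ``Duck``, ``Foo`` and ``FakeMapping`` are hypothetical names; see ``_abc.py`` for how bidict actually does this.

```python
from abc import ABCMeta, abstractmethod
from collections.abc import Hashable, Mapping


class Quacks(metaclass=ABCMeta):
    """Hypothetical open ABC: any class with a quack() method qualifies."""

    @abstractmethod
    def quack(self):
        raise NotImplementedError

    @classmethod
    def __subclasshook__(cls, C):
        if cls is Quacks:
            # Look through the MRO so inherited methods count too.
            if any("quack" in vars(B) for B in C.__mro__):
                return True
        return NotImplemented


class Duck:  # does not inherit from Quacks
    def quack(self):
        return "quack"


class Foo:  # does not inherit from Hashable
    def __hash__(self):
        return 0


class FakeMapping:  # has the Mapping methods but is not registered yet
    def __getitem__(self, key):
        raise KeyError(key)

    def __iter__(self):
        return iter(())

    def __len__(self):
        return 0


before = issubclass(FakeMapping, Mapping)  # no __subclasshook__ on Mapping
Mapping.register(FakeMapping)              # opt in as a virtual subclass
after = issubclass(FakeMapping, Mapping)
```

Registration only affects ``issubclass()`` and ``isinstance()`` checks; ``FakeMapping`` still does not inherit mixin methods such as ``get()`` from :class:`~collections.abc.Mapping`.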
Python Syntax hacks
-------------------
- See `#19 <https://github.com/jab/bidict/issues/19>`_
Correctness, performance, code quality, etc.
--------------------------------------------
Working on bidict gave me a reason to learn these fantastic tools,
many of which have been indispensable
(especially hypothesis – see ``test_hypothesis.py``):
- `Pytest <https://docs.pytest.org/en/latest/>`_
- `Coverage <http://coverage.readthedocs.io/en/latest/>`_
- `hypothesis <http://hypothesis.readthedocs.io/en/latest/>`_
- `pytest-benchmark <https://github.com/ionelmc/pytest-benchmark>`_
- `Sphinx <http://www.sphinx-doc.org/en/stable/>`_
- `Travis <https://travis-ci.org/>`_
- `Readthedocs <http://bidict.readthedocs.io/en/latest/>`_
- `Codecov <https://codecov.io>`_
- `lgtm <http://lgtm.com/>`_
- `Pylint <https://www.pylint.org/>`_
- `setuptools_scm <https://github.com/pypa/setuptools_scm>`_