Addendum
========

Performance
-----------

:mod:`bidict` strives to be as performant as possible
while being faithful to its purpose.
The need for speed
is balanced with the responsibility
to protect users from shooting themselves in the foot.

In general,
accomplishing some task using :mod:`bidict`
should have about the same performance
as keeping two inverse dicts in sync manually.
The test suite includes benchmarks for common workloads
to catch any performance regressions.

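
For a rough picture of what "keeping two inverse dicts in sync manually" involves,
here is a minimal sketch.
The ``TwoDicts`` class and its methods are hypothetical,
written only for illustration, and this is not how :mod:`bidict` is implemented:
every write has to touch both dicts,
and a value that already belongs to another key has to be rejected by hand.

.. code-block:: python

   class TwoDicts:
       """Hypothetical hand-rolled bidirectional map: two dicts kept in sync by hand."""

       def __init__(self):
           self.fwd = {}  # key -> value
           self.inv = {}  # value -> key

       def put(self, key, value):
           # A value owned by a different key would break the one-to-one invariant.
           if value in self.inv and self.inv[value] != key:
               raise ValueError(f'duplicate value: {value!r}')
           # If the key is being remapped, forget its old value in the inverse.
           if key in self.fwd:
               del self.inv[self.fwd[key]]
           self.fwd[key] = value
           self.inv[value] = key

       def pop(self, key):
           # Remove the pair from both directions and return the value.
           value = self.fwd.pop(key)
           del self.inv[value]
           return value


   m = TwoDicts()
   m.put('one', 1)
   m.put('one', 2)  # remap: the inverse drops 1 and gains 2
   print(m.inv)     # {2: 'one'}

Each operation does only a handful of dict lookups and assignments,
which is roughly the baseline that the paragraph above refers to.
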
If you spot a case where :mod:`bidict`'s performance could be improved,
please don't hesitate to
:doc:`file an issue or submit a pull request <contributors-guide>`.

``bidict`` Avoids Reference Cycles
----------------------------------

A careful reader might notice the following...

.. testsetup::

   from bidict import bidict

.. doctest::

   >>> fwd = bidict(one=1)
   >>> inv = fwd.inverse
   >>> inv.inverse is fwd
   True

...and worry that a :class:`~bidict.bidict` and its inverse
create a reference cycle.
If this were true,
in CPython this would mean that the memory for a :class:`~bidict.bidict`
could not be reclaimed as soon as you dropped your last reference to it,
but would instead have to wait for the cyclic garbage collector to run.

However, :class:`~bidict.bidict`\s use a :class:`weakref.ref`
to store the inverse reference in one direction,
avoiding the strong reference cycle.
As a result, when you no longer retain
any references to a :class:`~bidict.bidict` you create,
you can be sure that its refcount in CPython drops to zero,
and that its memory will therefore be reclaimed immediately.

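
The idea can be sketched with a hypothetical, stripped-down ``TinyBidict`` class
that shows only the inverse bookkeeping
(it is an illustration, not :mod:`bidict`'s actual code,
and it has no mutation methods or validation):
whichever object creates the inverse holds it strongly,
while the inverse points back through a :class:`weakref.ref`.

.. code-block:: python

   import weakref


   class TinyBidict:
       """Hypothetical bidirectional map showing only the inverse bookkeeping."""

       def __init__(self, pairs=()):
           self._fwdm = dict(pairs)
           self._invm = {v: k for k, v in self._fwdm.items()}
           self._inv = None      # strong ref to an inverse that we created
           self._invweak = None  # weak ref back to the object that created us

       @classmethod
       def _from_dicts(cls, fwdm, invm):
           # Make an instance that shares (rather than copies) both backing dicts.
           obj = cls.__new__(cls)
           obj._fwdm, obj._invm = fwdm, invm
           obj._inv = obj._invweak = None
           return obj

       @property
       def inverse(self):
           if self._inv is not None:
               return self._inv
           if self._invweak is not None:
               creator = self._invweak()
               if creator is not None:
                   return creator
           # No live inverse yet: create one, hold it strongly, and give it
           # only a weak reference back to us, so no strong cycle forms.
           inv = self._from_dicts(self._invm, self._fwdm)
           inv._invweak = weakref.ref(self)
           self._inv, self._invweak = inv, None
           return inv

       def __getitem__(self, key):
           return self._fwdm[key]


   fwd = TinyBidict({'one': 1})
   inv = fwd.inverse
   print(inv.inverse is fwd)  # True
   print(inv[1])              # one

In CPython, once the last outside references to ``fwd`` and ``inv`` are dropped,
both objects are freed right away by reference counting,
even with the cyclic collector switched off via :func:`gc.disable`.
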
.. note::

   In PyPy this is not an issue, as PyPy doesn't use reference counts.
   The memory for unreferenced objects in PyPy is only reclaimed
   when GC kicks in, which is unpredictable.

Terminology
-----------

- It's intentional that the term "inverse" is used rather than "reverse".

  Consider a collection of *(k, v)* pairs.
  Taking the reverse of the collection can only be done if it is ordered,
  and (as you'd expect) reverses the order of the pairs in the collection.
  But each original *(k, v)* pair remains in the resulting collection.

  By contrast, taking the inverse of such a collection
  neither requires the collection to be ordered
  nor guarantees any ordering in the result,
  but rather just replaces every *(k, v)* pair
  with the inverse pair *(v, k)*.

- "keys" and "values" could perhaps more properly be called
  "primary keys" and "secondary keys" (as in a database),
  or even "forward keys" and "inverse keys", respectively.
  :mod:`bidict` sticks with the terms "keys" and "values"
  for the sake of familiarity and to avoid potential confusion,
  but technically values are also keys themselves.

  Concretely, this allows :class:`~bidict.bidict`\s
  to return a set-like (*dict_keys*) object
  for :meth:`~bidict.bidict.values`,
  rather than a non-set-like *dict_values* object.

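
A small illustration of both points with plain tuples and dicts
(nothing here is specific to :mod:`bidict`):
reversing depends on order and keeps each pair intact,
inverting swaps each pair and works on an unordered set just as well,
and the keys of an inverse dict form a set-like view
while the values of the forward dict do not.

.. code-block:: python

   import collections.abc

   pairs = [('a', 1), ('b', 2), ('c', 3)]

   # Reversing needs an order and keeps every (k, v) pair unchanged.
   reversed_pairs = list(reversed(pairs))  # [('c', 3), ('b', 2), ('a', 1)]

   # Inverting needs no order: each (k, v) simply becomes (v, k).
   inverse_pairs = {(v, k) for k, v in set(pairs)}  # equals {(1, 'a'), (2, 'b'), (3, 'c')}

   # The values of a one-to-one mapping are the keys of its inverse,
   # so they can be exposed through a set-like dict_keys view.
   fwd = dict(pairs)
   inv = {v: k for k, v in fwd.items()}
   print(inv.keys() & {1, 2})                            # {1, 2}
   print(isinstance(inv.keys(), collections.abc.Set))    # True
   print(isinstance(fwd.values(), collections.abc.Set))  # False
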
Missing ``bidict``\s in the Standard Library
--------------------------------------------

The Python standard library actually contains some examples
where :class:`~bidict.bidict`\s could be used for fun and profit
(depending on your ideas of fun and profit):

- The :mod:`logging` module
  contains a private ``_levelToName`` dict
  which maps integer levels like *10* to their string names like *DEBUG*.
  If I had a nickel for every time I wanted that exposed in a bidirectional map
  (and as a public attribute, no less),
  I bet I could afford some better turns of phrase.

- The :mod:`dis` module
  maintains a mapping from opnames to opcodes,
  ``dis.opmap``,
  and a separate list of opnames indexed by opcode,
  ``dis.opname``.
  These could be combined into a single bidict.

- Python 3's
  :mod:`html.entities` module
  maintains separate
  ``html.entities.name2codepoint`` and
  ``html.entities.codepoint2name`` dicts.
  These could be combined into a single bidict.

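
The duplication is easy to see by cross-checking each pair of structures.
A small sketch; ``LOAD_CONST`` and ``amp`` are arbitrary sample entries,
and opcode numbers differ between Python versions, so none are hard-coded:

.. code-block:: python

   import dis
   from html.entities import codepoint2name, name2codepoint

   # dis: opmap maps opname -> opcode; opname is a list indexed by opcode.
   op = dis.opmap['LOAD_CONST']
   assert dis.opname[op] == 'LOAD_CONST'

   # html.entities: one dict per direction, each maintained separately.
   assert name2codepoint['amp'] == ord('&')
   assert codepoint2name[ord('&')] == 'amp'
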
Caveats
-------

Non-Atomic Mutation
^^^^^^^^^^^^^^^^^^^

As with built-in dicts,
mutating operations on a :class:`~bidict.bidict` are not atomic.
If you need to mutate the same :class:`~bidict.bidict` from different threads,
use a
`synchronization primitive <https://docs.python.org/3/library/threading.html#lock-objects>`__
to coordinate access. [#]_

.. [#] *See also:*
   [`2 <https://twitter.com/teozaurus/status/518071391959388160>`__],
   [`3 <https://twitter.com/ph1/status/943240854419922945>`__]

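
A minimal sketch of coordinating access with a single :class:`threading.Lock`.
The ``put_if_absent`` helper is hypothetical,
and a plain :class:`dict` stands in for the shared :class:`~bidict.bidict`
so that the example needs only the standard library.
The point is that a check-then-write sequence stays consistent
only if every thread runs it while holding the same lock:

.. code-block:: python

   import threading


   def put_if_absent(mapping, lock, key, value):
       """Hypothetical helper: store key -> value unless key is already present."""
       # The membership test and the write must not interleave with other
       # threads, so both happen while holding the shared lock.
       with lock:
           if key in mapping:
               return False
           mapping[key] = value
           return True


   shared = {}  # stands in for a shared bidict
   shared_lock = threading.Lock()


   def worker(results, index):
       # Every worker tries to claim the same 100 keys; each value is unique
       # per key, as a bidict would require.
       results[index] = sum(
           put_if_absent(shared, shared_lock, key, (index, key)) for key in range(100)
       )


   results = [0] * 8
   threads = [threading.Thread(target=worker, args=(results, i)) for i in range(8)]
   for t in threads:
       t.start()
   for t in threads:
       t.join()

   print(sum(results), len(shared))  # 100 100: each key was claimed exactly once
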
Equivalent but distinct :class:`~collections.abc.Hashable`\s
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Consider the following:

.. doctest::

   >>> d = {1: int, 1.0: float}

How many items do you expect *d* to contain?
The actual result might surprise you:

.. doctest::

   >>> len(d)
   1

And similarly,

.. doctest::

   >>> dict([(1, int), (1.0, float), (1+0j, complex), (True, bool)])
   {1: <... 'bool'>}
   >>> 1.0 in {True}
   True

(Note that ``1 == 1.0 == 1+0j == True``.)

This illustrates that a mapping cannot contain two items
with equivalent but distinct keys
(and likewise a set cannot contain two equivalent but distinct elements).
If an object that is being looked up in a set or mapping
is equal to a contained object,
the contained object will be found,
even if it is distinct.

With a :class:`~bidict.bidict`,
since values function as keys in the inverse mapping,
this behavior occurs in the inverse direction too.
As a result, a value in a :class:`~bidict.bidict` can end up
equivalent to, but distinct from, the corresponding key
in its own inverse:

.. doctest::

   >>> b = bidict({'false': 0})
   >>> b.forceput('FALSE', False)
   >>> b
   bidict({'FALSE': False})
   >>> b.inverse
   bidict({0: 'FALSE'})

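
Why this happens can be sketched with two plain dicts
standing in for the forward and inverse mappings.
``force_put`` is a hypothetical helper, an assumption about the general shape
of an overwriting put rather than :mod:`bidict`'s own code:
it assigns into both dicts in place and only then deletes the pairs the new item displaced.
Because assigning to an existing key that compares equal keeps the original key object,
the inverse keeps the key ``0`` even though the new value is ``False``,
reproducing the result of the doctest above.

.. code-block:: python

   _MISSING = object()  # sentinel meaning "no entry", distinct from a stored None


   def force_put(fwd, inv, key, value):
       """Hypothetical overwriting put on two plain dicts kept as inverses."""
       old_key = inv.get(value, _MISSING)  # key currently paired with an equal value
       old_value = fwd.get(key, _MISSING)  # value currently paired with an equal key
       fwd[key] = value
       # Assigning to a key that compares equal to an existing one updates the
       # value but keeps the original key object.
       inv[value] = key
       # Drop whatever the new pair displaced in each direction.
       if old_key is not _MISSING and old_key != key:
           del fwd[old_key]
       if old_value is not _MISSING and old_value != value:
           del inv[old_value]


   fwd = {'false': 0}
   inv = {0: 'false'}
   force_put(fwd, inv, 'FALSE', False)  # False == 0, so the existing 0 entry is found
   print(fwd)  # {'FALSE': False}
   print(inv)  # {0: 'FALSE'}: the int key 0 survives
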
*nan* as a Key
^^^^^^^^^^^^^^

In CPython, *nan* is especially tricky when used as a dictionary key:

.. doctest::

   >>> d = {float('nan'): 'nan'}
   >>> d
   {nan: 'nan'}
   >>> d[float('nan')]  # doctest: +SKIP
   Traceback (most recent call last):
   ...
   KeyError: nan
   >>> d[float('nan')] = 'not overwritten'
   >>> d  # doctest: +SKIP
   {nan: 'nan', nan: 'not overwritten'}

In other Python implementations such as PyPy,
*nan* behaves just like any other dictionary key.
But in CPython, beware of this unexpected behavior,
which applies to :class:`~bidict.bidict`\s too.
:mod:`bidict` contains no special-case logic
for dealing with *nan* as a key,
so bidict's behavior will match :class:`dict`'s
on whatever runtime you're using.

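
The root cause in CPython is that dict lookup first checks whether a stored key
*is* the very object being looked up, and only then falls back to ``==``.
Since ``nan != nan``, only the exact *nan* object used as the key can retrieve its entry.
A minimal sketch with a plain :class:`dict`:

.. code-block:: python

   shared_nan = float('nan')
   d = {shared_nan: 'found'}

   # The same object is found: the identity check succeeds before == is tried.
   print(d[shared_nan])      # found

   # A fresh nan is a different object and compares unequal to the stored one,
   # so CPython does not find it.
   print(float('nan') in d)  # False on CPython

Reusing a single *nan* object (for example :data:`math.nan`)
therefore lets lookups succeed in CPython.
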
See e.g. `these docs
<https://doc.pypy.org/en/latest/cpython_differences.html>`__
for more info (search the page for "nan").

----

For more in this vein,
check out :doc:`learning-from-bidict`.