add some remap docstring links, remove a bunch of notes from the bottom of iterutils

Mahmoud Hashemi 2015-09-23 01:16:19 -07:00
parent 27a4bc1b6d
commit 90ac477732
1 changed file with 10 additions and 61 deletions


@@ -659,6 +659,9 @@ def remap(root, visit=default_visit, enter=default_enter, exit=default_exit,
Notice how both Nones have been removed despite the nesting in the
dictionary. Not bad for a one-liner, and that's just the beginning.
+See `this remap cookbook`_ for more delicious recipes.
+.. _this remap cookbook: http://sedimental.org/remap_nested_data_multitool_for_python.html
remap takes four main arguments: the object to traverse and three
optional callables which determine how the remapped object will be
@@ -719,12 +722,14 @@ def remap(root, visit=default_visit, enter=default_enter, exit=default_exit,
passing more than one function.
When passing *enter* and *exit*, it's common and easiest to build
-on the default behavior. Simply ``from boltons.iterutils import
+on the default behavior. Simply add ``from boltons.iterutils import
default_enter`` (or ``default_exit``), and have your enter/exit
function call the default behavior before or after your custom
-logic.
+logic. See `this example`_.
+.. _this example: http://sedimental.org/remap_nested_data_multitool_for_python.html#sort_all_lists
"""
# TODO: improve argument formatting in sphinx doc
# TODO: enter() return (False, items) to continue traverse but cancel copy?
if not callable(visit):
raise TypeError('visit expected callable, not: %r' % visit)
@@ -792,62 +797,6 @@ def remap(root, visit=default_visit, enter=default_enter, exit=default_exit,
raise TypeError('expected remappable root, not: %r' % root)
return value
-"""The marker approach to solving self-reference problems in remap
-won't work because we can't rely on exit returning a
-traversable, mutable object. We may know that the marker is in the
-items going into exit but there's no guarantee it's not being
-filtered out or being made otherwise inaccessible for other reasons.
-On the other hand, having enter return the new parent instance
-before it's populated is a pretty workable solution. The division of
-labor stays clear and exit still has some override powers. Also
-note that only mutable structures can have self references (unless
-getting really nasty with the Python C API). The downside is that
-enter must do a bit more work and in the case of immutable
-collections, the new collection is discarded, as a new one has to be
-created from scratch by exit. The code is still pretty clear
-overall.
-Not that remap is supposed to be a speed demon, but here are some
-thoughts on performance. Memorywise, the registry grows linearly with
-the number of collections. The stack of course grows in proportion to
-the depth of the data. Many intermediate lists are created, but for
-most data list comprehensions are much faster than generators (and
-generator expressions). The ABC isinstance checks are going to be dog
-slow. As soon as a couple large enough use case cross my desk, I'll be
-sure to profile and optimize. It's not a question of if isinstance+ABC
-is slow, it's which pragmatic alternative passes tests while being
-faster.
-TODO Examples:
-* sort all lists
-* normalize all keys
-* convert all dicts to OrderedDicts
-* drop all Nones
-## Remap design principles
-Nested structures are common. Virtually all compact Python iterative
-interaction is flat (list comprehensions, map/filter, generator
-expressions, itertools, even other iterutils). remap is a succinct
-solution to both quick and dirty data wrangling, as well as expressive
-functional interaction with nested structures.
-* visit() should be able to handle 80% of my pragmatic use cases, and
-the argument/return signature should be similarly pragmatic.
-* enter()/exit() are for more advanced use cases and the signature can
-be more complex.
-* 95%+ of applications should be covered by passing in only one
-callback.
-* Roundtripping should be the default. Don't repeat the faux pas of
-HTMLParser where, despite the nice SAX-like interface, it is
-impossible (or very difficult) to regenerate the input. Roundtripped
-results compare as equal, realistically somewhere between copy.copy
-and copy.deepcopy.
-* Leave streaming for another day. Generators can be handy, but the
-vast majority of data is of easily manageable size. Besides, there's
-no such thing as a streamable dictionary.
-"""
# TODO: get_path/set_path
# TODO: recollect()
# TODO: reiter()
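
Reviewer note: two of the examples listed in the removed notes ("drop all Nones", "normalize all keys") can be illustrated with a small stdlib-only sketch of the visit-callback idea. This is a simplified stand-in written for illustration, not boltons' actual remap: the name `simple_remap` is hypothetical, it handles only dicts and lists, and it has no enter/exit hooks or self-reference handling. It assumes a `visit(path, key, value)` callback that returns True to keep an item, False to drop it, or a `(key, value)` tuple to replace it.

```python
def simple_remap(root, visit=lambda path, key, value: True):
    """Toy stand-in for a remap-style traversal (hypothetical helper).

    Rebuilds nested dicts and lists bottom-up, consulting visit() for
    every item.
    """
    def _walk(path, node):
        if isinstance(node, dict):
            items = node.items()
        elif isinstance(node, list):
            items = enumerate(node)
        else:
            return node  # leaf values are returned unchanged

        kept = []
        for key, value in items:
            # Rebuild children first, then ask visit() about the result.
            new_value = _walk(path + (key,), value)
            result = visit(path, key, new_value)
            if result is False:
                continue  # drop this item
            if result is True:
                result = (key, new_value)  # keep the item as-is
            kept.append(result)

        if isinstance(node, dict):
            return dict(kept)
        return [value for _, value in kept]

    return _walk((), root)


# "drop all Nones": keep only items whose value is not None
data = {'id': 1, 'tags': None, 'meta': {'owner': None, 'sizes': [1, None, 2]}}
cleaned = simple_remap(data, visit=lambda path, key, value: value is not None)
print(cleaned)  # {'id': 1, 'meta': {'sizes': [1, 2]}}

# "normalize all keys": lowercase string keys, leave list indexes alone
lowered = simple_remap(
    {'A': [{'B': 1}]},
    visit=lambda path, key, value:
        (key.lower(), value) if isinstance(key, str) else True,
)
print(lowered)  # {'a': [{'b': 1}]}
```

The real function additionally takes *enter* and *exit* callables, as the docstring in this diff describes; the linked cookbook covers those cases.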