Improve pickle's documentation.

There is still much to be done, but I am committing my changes
incrementally to avoid losing them again (for a third time now).
Alexandre Vassalotti 2008-10-18 19:25:07 +00:00
parent 87eee631fb
commit 758bca6e36
1 changed files with 149 additions and 101 deletions


@ -92,11 +92,9 @@ advantage that there are no restrictions imposed by external standards such as
XDR (which can't represent pointer sharing); however it means that non-Python
programs may not be able to reconstruct pickled Python objects.
By default, the :mod:`pickle` data format uses a printable ASCII representation.
This is slightly more voluminous than a binary representation. The big
advantage of using printable ASCII (and of some other characteristics of
:mod:`pickle`'s representation) is that for debugging or recovery purposes it is
possible for a human to read the pickled file with a standard text editor.
By default, the :mod:`pickle` data format uses a compact binary representation.
The module :mod:`pickletools` contains tools for analyzing data streams
generated by :mod:`pickle`.
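As a minimal sketch of how a binary stream can still be inspected, the snippet below disassembles a pickle with ``pickletools.dis``. The sample dictionary and the variable names are illustrative assumptions, not part of the original text.

```python
import io
import pickle
import pickletools

# Illustrative sample object (an assumption for this sketch).
data = pickle.dumps({"spam": [1, 2, 3]})

# Disassemble the binary stream into a human-readable opcode listing.
listing = io.StringIO()
pickletools.dis(data, out=listing)
print(listing.getvalue())
```

The exact opcodes printed depend on the protocol in use, so the listing will differ between Python versions.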
There are currently 4 different protocols which can be used for pickling.
@ -110,17 +108,15 @@ There are currently 4 different protocols which can be used for pickling.
efficient pickling of :term:`new-style class`\es.
* Protocol version 3 was added in Python 3.0. It has explicit support for
bytes and cannot be unpickled by Python 2.x pickle modules.
bytes and cannot be unpickled by Python 2.x pickle modules. This is
the current recommended protocol; use it whenever possible.
Refer to :pep:`307` for more information.
If a *protocol* is not specified, protocol 3 is used. If *protocol* is
If a *protocol* is not specified, protocol 3 is used. If *protocol* is
specified as a negative value or :const:`HIGHEST_PROTOCOL`, the highest
protocol version available will be used.
A binary format, which is slightly more efficient, can be chosen by specifying a
*protocol* version >= 1.
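The protocol selection rules can be sketched as follows. The ``record`` object is a hypothetical example, and the loop runs up to whatever ``HIGHEST_PROTOCOL`` the running interpreter reports, which may be higher than the protocols listed here.

```python
import pickle

# Hypothetical sample data for this sketch.
record = {"name": "spam", "values": [1, 2, 3]}

# Every supported protocol should round-trip the same object.
for protocol in range(pickle.HIGHEST_PROTOCOL + 1):
    payload = pickle.dumps(record, protocol)
    assert pickle.loads(payload) == record

# A negative value selects the highest protocol available,
# which is the same as passing HIGHEST_PROTOCOL explicitly.
newest = pickle.dumps(record, -1)
same = pickle.dumps(record, pickle.HIGHEST_PROTOCOL)
```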
Usage
-----
@ -146,152 +142,210 @@ an unpickler, then you call the unpickler's :meth:`load` method. The
as line terminators and therefore will look "funny" when viewed in Notepad or
other editors which do not support this format.
.. data:: DEFAULT_PROTOCOL
The default protocol used for pickling. It may be less than
:const:`HIGHEST_PROTOCOL`. Currently the default protocol is 3, a
backward-incompatible protocol designed for Python 3.0.
The :mod:`pickle` module provides the following functions to make the pickling
process more convenient:
.. function:: dump(obj, file[, protocol])
Write a pickled representation of *obj* to the open file object *file*. This is
equivalent to ``Pickler(file, protocol).dump(obj)``.
Write a pickled representation of *obj* to the open file object *file*. This
is equivalent to ``Pickler(file, protocol).dump(obj)``.
If the *protocol* parameter is omitted, protocol 3 is used. If *protocol* is
specified as a negative value or :const:`HIGHEST_PROTOCOL`, the highest
protocol version will be used.
The optional *protocol* argument tells the pickler to use the given protocol;
supported protocols are 0, 1, 2, and 3. The default protocol is 3, a
backward-incompatible protocol designed for Python 3.0.
*file* must have a :meth:`write` method that accepts a single string argument.
It can thus be a file object opened for writing, a :mod:`StringIO` object, or
any other custom object that meets this interface.
.. function:: load(file)
Read a string from the open file object *file* and interpret it as a pickle data
stream, reconstructing and returning the original object hierarchy. This is
equivalent to ``Unpickler(file).load()``.
*file* must have two methods, a :meth:`read` method that takes an integer
argument, and a :meth:`readline` method that requires no arguments. Both
methods should return a string. Thus *file* can be a file object opened for
reading, a :mod:`StringIO` object, or any other custom object that meets this
interface.
This function automatically determines whether the data stream was written in
binary mode or not.
Specifying a negative protocol version selects the highest protocol version
supported. The higher the protocol used, the more recent the version of
Python needed to read the pickle produced.
The *file* argument must have a :meth:`write` method that accepts a single
bytes argument. It can thus be a file object opened for binary writing, an
:class:`io.BytesIO` instance, or any other custom object that meets this interface.
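A short sketch of the *file* interface described above: ``ChunkCollector`` is a hypothetical class written for this example, showing that any object with a suitable ``write()`` method can receive the pickle stream.

```python
import io
import pickle

class ChunkCollector:
    """Hypothetical writer: the only requirement is a write(bytes) method."""

    def __init__(self):
        self.chunks = []

    def write(self, data):
        # Copy the chunk so later buffer reuse cannot affect it.
        self.chunks.append(bytes(data))
        return len(data)

obj = {"answer": 42, "items": ("a", "b")}

# Pickle into the custom writer, then rebuild the full stream.
collector = ChunkCollector()
pickle.dump(obj, collector)
stream = b"".join(collector.chunks)

# The same stream can be read back from an in-memory binary file.
restored = pickle.load(io.BytesIO(stream))
```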
.. function:: dumps(obj[, protocol])
Return the pickled representation of the object as a :class:`bytes`
object, instead of writing it to a file.
If the *protocol* parameter is omitted, protocol 3 is used. If *protocol*
is specified as a negative value or :const:`HIGHEST_PROTOCOL`, the highest
protocol version will be used.
The optional *protocol* argument tells the pickler to use the given protocol;
supported protocols are 0, 1, 2, and 3. The default protocol is 3, a
backward-incompatible protocol designed for Python 3.0.
Specifying a negative protocol version selects the highest protocol version
supported. The higher the protocol used, the more recent the version of
Python needed to read the pickle produced.
.. function:: load(file, [\*, encoding="ASCII", errors="strict"])
Read a pickled object representation from the open file object *file* and
return the reconstituted object hierarchy specified therein. This is
equivalent to ``Unpickler(file).load()``.
The protocol version of the pickle is detected automatically, so no protocol
argument is needed. Bytes past the pickled object's representation are
ignored.
The argument *file* must have two methods, a read() method that takes an
integer argument, and a readline() method that requires no arguments. Both
methods should return bytes. Thus *file* can be a binary file object opened
for reading, a BytesIO object, or any other custom object that meets this
interface.
Optional keyword arguments are encoding and errors, which are used to decode
8-bit string instances pickled by Python 2.x. These default to 'ASCII' and
'strict', respectively.
.. function:: loads(bytes_object, [\*, encoding="ASCII", errors="strict"])
Read a pickled object hierarchy from a :class:`bytes` object and return the
reconstituted object hierarchy specified therein.
The protocol version of the pickle is detected automatically, so no protocol
argument is needed. Bytes past the pickled object's representation are
ignored.
Optional keyword arguments are encoding and errors, which are used to decode
8-bit string instances pickled by Python 2.x. These default to 'ASCII' and
'strict', respectively.
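The two behaviours described above can be sketched as follows. The ``legacy`` byte string is a hand-written stand-in for what Python 2.x would produce for the 8-bit string ``'caf\xe9'`` at protocol 0; it is an assumption made for illustration, not output captured from a real Python 2 interpreter.

```python
import pickle

# Bytes after the end of the pickled object are ignored.
payload = pickle.dumps(42) + b"trailing bytes"
value = pickle.loads(payload)

# Hand-written protocol 0 stream for a Python 2 8-bit string
# (an assumption for this sketch).
legacy = b"S'caf\\xe9'\n."

# The default ASCII decoding would fail on byte 0xE9, so an
# explicit encoding is passed instead.
text = pickle.loads(legacy, encoding="latin-1")
```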
.. function:: loads(bytes_object)
Read a pickled object hierarchy from a :class:`bytes` object.
Bytes past the pickled object's representation are ignored.
The :mod:`pickle` module also defines three exceptions:
The :mod:`pickle` module defines three exceptions:
.. exception:: PickleError
A common base class for the other exceptions defined below. This inherits from
Common base class for the other pickling exceptions. It inherits from
:exc:`Exception`.
.. exception:: PicklingError
This exception is raised when an unpicklable object is passed to the
:meth:`dump` method.
Error raised when an unpicklable object is encountered by :class:`Pickler`.
It inherits from :exc:`PickleError`.
.. exception:: UnpicklingError
This exception is raised when there is a problem unpickling an object. Note that
other exceptions may also be raised during unpickling, including (but not
necessarily limited to) :exc:`AttributeError`, :exc:`EOFError`,
:exc:`ImportError`, and :exc:`IndexError`.
Error raised when there is a problem unpickling an object, such as data
corruption or a security violation. It inherits from :exc:`PickleError`.
The :mod:`pickle` module also exports two callables, :class:`Pickler` and
Note that other exceptions may also be raised during unpickling, including
(but not necessarily limited to) AttributeError, EOFError, ImportError, and
IndexError.
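Because unpickling can fail with several exception types, a caller may want to catch them together. The helper ``try_loads`` below is a hypothetical name introduced for this sketch, and the list of caught exceptions simply mirrors the ones named above; it is not guaranteed to be exhaustive.

```python
import pickle

def try_loads(data):
    """Return the unpickled object, or None if *data* cannot be unpickled."""
    try:
        return pickle.loads(data)
    except (pickle.UnpicklingError, AttributeError, EOFError,
            ImportError, IndexError):
        return None

valid = pickle.dumps([1, 2, 3])

good = try_loads(valid)
truncated = try_loads(valid[:-1])  # final STOP opcode removed
empty = try_loads(b"")             # no data at all
```

Returning ``None`` on failure is ambiguous if ``None`` itself may be pickled; a real application would more likely re-raise or return a sentinel.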
The :mod:`pickle` module exports two classes, :class:`Pickler` and
:class:`Unpickler`:
.. class:: Pickler(file[, protocol])
This takes a file-like object to which it will write a pickle data stream.
This takes a binary file for writing a pickle data stream.
If the *protocol* parameter is omitted, protocol 3 is used. If *protocol* is
specified as a negative value or :const:`HIGHEST_PROTOCOL`, the highest
protocol version will be used.
The optional *protocol* argument tells the pickler to use the given protocol;
supported protocols are 0, 1, 2, and 3. The default protocol is 3, a
backward-incompatible protocol designed for Python 3.0.
*file* must have a :meth:`write` method that accepts a single string argument.
It can thus be an open file object, a :mod:`StringIO` object, or any other
custom object that meets this interface.
:class:`Pickler` objects define one (or two) public methods:
Specifying a negative protocol version selects the highest protocol version
supported. The higher the protocol used, the more recent the version of
Python needed to read the pickle produced.
The *file* argument must have a :meth:`write` method that accepts a single
bytes argument. It can thus be a file object opened for binary writing, an
:class:`io.BytesIO` instance, or any other custom object that meets this interface.
.. method:: dump(obj)
Write a pickled representation of *obj* to the open file object given in the
constructor. Either the binary or ASCII format will be used, depending on the
value of the *protocol* argument passed to the constructor.
Write a pickled representation of *obj* to the open file object given in
the constructor.
.. method:: persistent_id(obj)
Do nothing by default. This exists so a subclass can override it.
If :meth:`persistent_id` returns ``None``, *obj* is pickled as usual. Any
other value causes :class:`Pickler` to emit the returned value as a
persistent ID for *obj*. The meaning of this persistent ID should be
defined by :meth:`Unpickler.persistent_load`. Note that the value
returned by :meth:`persistent_id` cannot itself have a persistent ID.
See :ref:`pickle-persistent` for details and examples of uses.
.. method:: clear_memo()
Clears the pickler's "memo". The memo is the data structure that remembers
which objects the pickler has already seen, so that shared or recursive objects
are pickled by reference and not by value. This method is useful when re-using
picklers.
Deprecated. Use the :meth:`clear` method of :attr:`memo` instead. Clears the
pickler's memo, which is useful when reusing picklers.
.. attribute:: fast
Enable fast mode if set to a true value. Fast mode disables the use of the
memo, which speeds up pickling by not generating superfluous PUT opcodes.
It should not be used with self-referential objects; doing so will cause
:class:`Pickler` to recurse
infinitely.
Use :func:`pickletools.optimize` if you need more compact pickles.
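A minimal sketch of the ``pickletools.optimize`` route, which removes unused PUT opcodes while keeping the memo entries that shared references need. The sample object and the choice of protocol 2 are assumptions for illustration.

```python
import pickle
import pickletools

# "shared" is referenced twice, so one memo entry is genuinely needed.
shared = [1, 2, 3]
obj = {"a": shared, "b": shared, "c": ["x", "y"]}

raw = pickle.dumps(obj, 2)
compact = pickletools.optimize(raw)

# The optimized stream still reconstructs the same structure,
# including the shared reference.
restored = pickle.loads(compact)
print(len(raw), len(compact))
```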
.. attribute:: memo
Dictionary holding previously pickled objects to allow shared or
recursive objects to be pickled by reference as opposed to by value.
It is possible to make multiple calls to the :meth:`dump` method of the same
:class:`Pickler` instance. These must then be matched to the same number of
calls to the :meth:`load` method of the corresponding :class:`Unpickler`
instance. If the same object is pickled by multiple :meth:`dump` calls, the
:meth:`load` calls will all yield references to the same object. [#]_
:meth:`load` calls will all yield references to the same object.
:class:`Unpickler` objects are defined as:
Please note, this is intended for pickling multiple objects without intervening
modifications to the objects or their parts. If you modify an object and then
pickle it again using the same :class:`Pickler` instance, the object is not
pickled again --- a reference to it is pickled and the :class:`Unpickler` will
return the old value, not the modified one.
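The caveat above can be demonstrated with a short sketch. The variable names are illustrative; the behaviour shown relies on the same :class:`Pickler` and :class:`Unpickler` instances being reused for both calls.

```python
import io
import pickle

buffer = io.BytesIO()
pickler = pickle.Pickler(buffer)

data = [1, 2]
pickler.dump(data)   # full contents are written

data.append(3)
pickler.dump(data)   # only a reference to the earlier pickle is written

buffer.seek(0)
unpickler = pickle.Unpickler(buffer)
first = unpickler.load()
second = unpickler.load()

# Both loads yield the same object, holding the old value.
print(first, second, first is second)
```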
.. class:: Unpickler(file)
.. class:: Unpickler(file, [\*, encoding="ASCII", errors="strict"])
This takes a file-like object from which it will read a pickle data stream.
This class automatically determines whether the data stream was written in
binary mode or not, so it does not need a flag as in the :class:`Pickler`
factory.
This takes a binary file for reading a pickle data stream.
*file* must have two methods, a :meth:`read` method that takes an integer
argument, and a :meth:`readline` method that requires no arguments. Both
methods should return a string. Thus *file* can be a file object opened for
reading, a :mod:`StringIO` object, or any other custom object that meets this
The protocol version of the pickle is detected automatically, so no
protocol argument is needed.
The argument *file* must have two methods, a read() method that takes an
integer argument, and a readline() method that requires no arguments. Both
methods should return bytes. Thus *file* can be a binary file object opened
for reading, a BytesIO object, or any other custom object that meets this
interface.
:class:`Unpickler` objects have one (or two) public methods:
Optional keyword arguments are encoding and errors, which are used to decode
8-bit string instances pickled by Python 2.x. These default to 'ASCII' and
'strict', respectively.
.. method:: load()
Read a pickled object representation from the open file object given in
the constructor, and return the reconstituted object hierarchy specified
therein.
therein. Bytes past the pickled object's representation are ignored.
This method automatically determines whether the data stream was written
in binary mode or not.
.. method:: persistent_load(pid)
Raise an :exc:`UnpicklingError` by default.
.. method:: noload()
If defined, :meth:`persistent_load` should return the object specified by
the persistent ID *pid*. On error, such as when an invalid persistent ID is
encountered, an :exc:`UnpicklingError` should be raised.
This is just like :meth:`load` except that it doesn't actually create any
objects. This is useful primarily for finding what's called "persistent
ids" that may be referenced in a pickle data stream. See section
:ref:`pickle-protocol` below for more details.
See :ref:`pickle-persistent` for details and examples of uses.
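A minimal sketch of how ``persistent_id`` and ``persistent_load`` can work together. The ``DATABASE`` dictionary, the ``Record`` class, and the ``("Record", key)`` ID layout are all hypothetical choices made for this example.

```python
import io
import pickle

# Hypothetical external store; records live here, not in the pickle.
DATABASE = {"user:1": {"name": "alice"}, "user:2": {"name": "bob"}}

class Record:
    """Hypothetical handle to an entry in DATABASE."""
    def __init__(self, key):
        self.key = key

class RecordPickler(pickle.Pickler):
    def persistent_id(self, obj):
        # Emit a persistent ID for records; pickle everything else as usual.
        if isinstance(obj, Record):
            return ("Record", obj.key)
        return None

class RecordUnpickler(pickle.Unpickler):
    def persistent_load(self, pid):
        tag, key = pid
        if tag == "Record" and key in DATABASE:
            return Record(key)
        raise pickle.UnpicklingError("unsupported persistent ID: %r" % (pid,))

buffer = io.BytesIO()
RecordPickler(buffer).dump([Record("user:1"), 7, Record("user:2")])
buffer.seek(0)
restored = RecordUnpickler(buffer).load()
```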
.. method:: find_class(module, name)
Import *module* if necessary and return the object called *name* from it.
Subclasses may override this to gain control over what types of objects can
be loaded, potentially reducing security risks.
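One way to use this hook is an allow-list, sketched below. ``RestrictedUnpickler``, ``restricted_loads``, and the contents of ``SAFE_BUILTINS`` are assumptions for illustration; a real application would choose its own list and should not treat this as a complete security measure.

```python
import builtins
import io
import os
import pickle

# Hypothetical allow-list of builtins considered safe to load.
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Only allow-listed builtins may be resolved; reject everything else.
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(
            "global '%s.%s' is forbidden" % (module, name))

def restricted_loads(data):
    """Like pickle.loads(), but with the restricted find_class() above."""
    return RestrictedUnpickler(io.BytesIO(data)).load()

allowed = restricted_loads(pickle.dumps({1, 2, 3}))

try:
    restricted_loads(pickle.dumps(os.system))
    blocked = False
except pickle.UnpicklingError:
    blocked = True
```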
What can be pickled and unpickled?
@ -506,6 +560,8 @@ The registered constructor is deemed a "safe constructor" for purposes of
unpickling as described above.
.. _pickle-persistent:
Pickling and unpickling external objects
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@ -747,14 +803,6 @@ the same process or a new process. ::
.. [#] Don't confuse this with the :mod:`marshal` module
.. [#] *Warning*: this is intended for pickling multiple objects without intervening
modifications to the objects or their parts. If you modify an object and then
pickle it again using the same :class:`Pickler` instance, the object is not
pickled again --- a reference to it is pickled and the :class:`Unpickler` will
return the old value, not the modified one. There are two problems here: (1)
detecting changes, and (2) marshalling a minimal set of changes. Garbage
Collection may also become a problem here.
.. [#] The exception raised will likely be an :exc:`ImportError` or an
:exc:`AttributeError` but it could be something else.