cpython/Doc/lib/libpickle.tex

\section{Standard Module \sectcode{pickle}}
\label{module-pickle}
\stmodindex{pickle}
\index{persistency}
\indexii{persistent}{objects}
\indexii{serializing}{objects}
\indexii{marshalling}{objects}
\indexii{flattening}{objects}
\indexii{pickling}{objects}

\renewcommand{\indexsubitem}{(in module pickle)}

The \code{pickle} module implements a basic but powerful algorithm for
``pickling'' (a.k.a.\ serializing, marshalling or flattening) nearly
arbitrary Python objects.  This is the act of converting objects to a
stream of bytes (and back: ``unpickling'').
This is a more primitive notion than
persistency --- although \code{pickle} reads and writes file objects,
it does not handle the issue of naming persistent objects, nor the
(even more complicated) area of concurrent access to persistent
objects.  The \code{pickle} module can transform a complex object into
a byte stream and it can transform the byte stream into an object with
the same internal structure.  The most obvious thing to do with these
byte streams is to write them onto a file, but it is also conceivable
to send them across a network or store them in a database.  The module
\code{shelve} provides a simple interface to pickle and unpickle
objects on ``dbm''-style database files.
\stmodindex{shelve}

Unlike the built-in module \code{marshal}, \code{pickle} handles the
following correctly:
\stmodindex{marshal}

\begin{itemize}

\item recursive objects (objects containing references to themselves)

\item object sharing (references to the same object in different places)

\item user-defined classes and their instances

\end{itemize}

The data format used by \code{pickle} is Python-specific.  This has
the advantage that there are no restrictions imposed by external
standards such as CORBA (which probably can't represent pointer
sharing or recursive objects); however it means that non-Python
programs may not be able to reconstruct pickled Python objects.

The \code{pickle} data format uses a printable \ASCII{} representation.
This is slightly more voluminous than a binary representation.
However, small integers actually take {\em less} space when
represented as minimal-size decimal strings than when represented as
32-bit binary numbers, and strings are only much longer if they
contain many control characters or 8-bit characters.  The big
advantage of using printable \ASCII{} (and of some other characteristics
of \code{pickle}'s representation) is that for debugging or recovery
purposes it is possible for a human to read the pickled file with a
standard text editor.  (I could have gone a step further and used a
notation like S-expressions, but the parser
(currently written in Python) would have been
considerably more complicated and slower, and the files would probably
have become much larger.)

The \code{pickle} module doesn't handle code objects, which the
\code{marshal} module does.  I suppose \code{pickle} could, and maybe
it should, but there's probably no great need for it right now (as
long as \code{marshal} continues to be used for reading and writing
code objects), and at least this avoids the possibility of smuggling
Trojan horses into a program.
\stmodindex{marshal}

For the benefit of persistency modules written using \code{pickle}, it
supports the notion of a reference to an object outside the pickled
data stream.  Such objects are referenced by a name, which is an
arbitrary string of printable \ASCII{} characters.  The resolution of
such names is not defined by the \code{pickle} module --- the
persistent object module will have to implement a method
\code{persistent_load}.  To write references to persistent objects,
the persistent module must define a method \code{persistent_id} which
returns either \code{None} or the persistent ID of the object.

There are some restrictions on the pickling of class instances.

First of all, the class must be defined at the top level in a module.

\renewcommand{\indexsubitem}{(pickle protocol)}

Next, it must normally be possible to create class instances by
calling the class without arguments.  Usually, this is best
accomplished by providing default values for all arguments to its
\code{__init__} method (if it has one).  If this is undesirable, the
class can define a method \code{__getinitargs__()}, which should
return a {\em tuple} containing the arguments to be passed to the
class constructor (\code{__init__()}).
\ttindex{__getinitargs__}
\ttindex{__init__}

Classes can further influence how their instances are pickled --- if the class
defines the method \code{__getstate__()}, it is called and the return
state is pickled as the contents for the instance, and if the class
defines the method \code{__setstate__()}, it is called with the
unpickled state.  (Note that these methods can also be used to
implement copying class instances.)  If there is no
\code{__getstate__()} method, the instance's \code{__dict__} is
pickled.  If there is no \code{__setstate__()} method, the pickled
object must be a dictionary and its items are assigned to the new
instance's dictionary.  (If a class defines both \code{__getstate__()}
and \code{__setstate__()}, the state object needn't be a dictionary
--- these methods can do what they want.)  This protocol is also used
by the shallow and deep copying operations defined in the \code{copy}
module.
\ttindex{__getstate__}
\ttindex{__setstate__}
\ttindex{__dict__}

Note that when class instances are pickled, their class's code and
data are not pickled along with them.  Only the instance data are
pickled.  This is done on purpose, so you can fix bugs in a class or
add methods and still load objects that were created with an earlier
version of the class.  If you plan to have long-lived objects that
will see many versions of a class, it may be worthwhile to put a version
number in the objects so that suitable conversions can be made by the
class's \code{__setstate__()} method.

When a class itself is pickled, only its name is pickled --- the class
definition is not pickled, but re-imported by the unpickling process.
Therefore, the restriction that the class must be defined at the top
level in a module applies to pickled classes as well.

\renewcommand{\indexsubitem}{(in module pickle)}

The interface can be summarized as follows.

To pickle an object \code{x} onto a file \code{f}, open for writing:

\bcode\begin{verbatim}
p = pickle.Pickler(f)
p.dump(x)
\end{verbatim}\ecode
%
A shorthand for this is:

\bcode\begin{verbatim}
pickle.dump(x, f)
\end{verbatim}\ecode
%
To unpickle an object \code{x} from a file \code{f}, open for reading:

\bcode\begin{verbatim}
u = pickle.Unpickler(f)
x = u.load()
\end{verbatim}\ecode
%
A shorthand is:

\bcode\begin{verbatim}
x = pickle.load(f)
\end{verbatim}\ecode
%
The \code{Pickler} class only calls the method \code{f.write} with a
string argument.  The \code{Unpickler} calls the methods \code{f.read}
(with an integer argument) and \code{f.readline} (without argument),
both returning a string.  It is explicitly allowed to pass non-file
objects here, as long as they have the right methods.
\ttindex{Unpickler}
\ttindex{Pickler}

The following types can be pickled:
\begin{itemize}

\item \code{None}

\item integers, long integers, floating point numbers

\item strings

\item tuples, lists and dictionaries containing only picklable objects

\item classes that are defined at the top level in a module

\item instances of such classes whose \code{__dict__} or
\code{__setstate__()} is picklable

\end{itemize}

Attempts to pickle unpicklable objects will raise the
\code{PicklingError} exception; when this happens, an unspecified
number of bytes may have been written to the file.

It is possible to make multiple calls to the \code{dump()} method of
the same \code{Pickler} instance.  These must then be matched to the
same number of calls to the \code{load()} instance of the
corresponding \code{Unpickler} instance.  If the same object is
pickled by multiple \code{dump()} calls, the \code{load()} will all
yield references to the same object.  {\em Warning}: this is intended
for pickling multiple objects without intervening modifications to the
objects or their parts.  If you modify an object and then pickle it
again using the same \code{Pickler} instance, the object is not
pickled again --- a reference to it is pickled and the
\code{Unpickler} will return the old value, not the modified one.
(There are two problems here: (a) detecting changes, and (b)
marshalling a minimal set of changes.  I have no answers.  Garbage
Collection may also become a problem here.)

Apart from the \code{Pickler} and \code{Unpickler} classes, the
module defines the following functions, and an exception:

\begin{funcdesc}{dump}{object\, file}
Write a pickled representation of \var{obect} to the open file object
\var{file}.  This is equivalent to \code{Pickler(file).dump(object)}.
\end{funcdesc}

\begin{funcdesc}{load}{file}
Read a pickled object from the open file object \var{file}.  This is
equivalent to \code{Unpickler(file).load()}.
\end{funcdesc}

\begin{funcdesc}{dumps}{object}
Return the pickled representation of the object as a string, instead
of writing it to a file.
\end{funcdesc}

\begin{funcdesc}{loads}{string}
Read a pickled object from a string instead of a file.  Characters in
the string past the pickled object's representation are ignored.
\end{funcdesc}

\begin{excdesc}{PicklingError}
This exception is raised when an unpicklable object is passed to
\code{Pickler.dump()}.
\end{excdesc}
mass changes; fix titles; add examples; correct typos; clarifications; unified style; etc. 1995-03-17 16:07:09 +00:00			`\section{Standard Module \sectcode{pickle}}`
AMK's megapatch: * \bcode, \ecode added everywhere * \label{module-foo} added everywhere * A few \seealso sections added. * Indentation fixed inside verbatim in lib*tex files 1997-07-17 16:34:52 +00:00			`\label{module-pickle}`
added docs for pickle, shelve and copy 1995-02-15 15:53:08 +00:00			`\stmodindex{pickle}`
			`\index{persistency}`
			`\indexii{persistent}{objects}`
			`\indexii{serializing}{objects}`
			`\indexii{marshalling}{objects}`
			`\indexii{flattening}{objects}`
			`\indexii{pickling}{objects}`

mass changes; fix titles; add examples; correct typos; clarifications; unified style; etc. 1995-03-17 16:07:09 +00:00			`\renewcommand{\indexsubitem}{(in module pickle)}`

added docs for pickle, shelve and copy 1995-02-15 15:53:08 +00:00			`The \code{pickle} module implements a basic but powerful algorithm for`
small changes by Soren Larsen 1995-03-13 10:03:32 +00:00			``pickling'' (a.k.a.\ serializing, marshalling or flattening) nearly
restructured library manual accordiung to functional group 1995-03-28 13:35:14 +00:00			`arbitrary Python objects. This is the act of converting objects to a`
			stream of bytes (and back: ``unpickling'').
			`This is a more primitive notion than`
added docs for pickle, shelve and copy 1995-02-15 15:53:08 +00:00			`persistency --- although \code{pickle} reads and writes file objects,`
			`it does not handle the issue of naming persistent objects, nor the`
			`(even more complicated) area of concurrent access to persistent`
			`objects. The \code{pickle} module can transform a complex object into`
			`a byte stream and it can transform the byte stream into an object with`
			`the same internal structure. The most obvious thing to do with these`
			`byte streams is to write them onto a file, but it is also conceivable`
			`to send them across a network or store them in a database. The module`
			`\code{shelve} provides a simple interface to pickle and unpickle`
			objects on ``dbm''-style database files.
			`\stmodindex{shelve}`

			`Unlike the built-in module \code{marshal}, \code{pickle} handles the`
			`following correctly:`
			`\stmodindex{marshal}`

			`\begin{itemize}`

mass changes; fix titles; add examples; correct typos; clarifications; unified style; etc. 1995-03-17 16:07:09 +00:00			`\item recursive objects (objects containing references to themselves)`
added docs for pickle, shelve and copy 1995-02-15 15:53:08 +00:00
mass changes; fix titles; add examples; correct typos; clarifications; unified style; etc. 1995-03-17 16:07:09 +00:00			`\item object sharing (references to the same object in different places)`
added docs for pickle, shelve and copy 1995-02-15 15:53:08 +00:00
mass changes; fix titles; add examples; correct typos; clarifications; unified style; etc. 1995-03-17 16:07:09 +00:00			`\item user-defined classes and their instances`
added docs for pickle, shelve and copy 1995-02-15 15:53:08 +00:00
			`\end{itemize}`

			`The data format used by \code{pickle} is Python-specific. This has`
			`the advantage that there are no restrictions imposed by external`
			`standards such as CORBA (which probably can't represent pointer`
			`sharing or recursive objects); however it means that non-Python`
			`programs may not be able to reconstruct pickled Python objects.`

mass changes; fix titles; add examples; correct typos; clarifications; unified style; etc. 1995-03-17 16:07:09 +00:00			`The \code{pickle} data format uses a printable \ASCII{} representation.`
added docs for pickle, shelve and copy 1995-02-15 15:53:08 +00:00			`This is slightly more voluminous than a binary representation.`
			`However, small integers actually take {\em less} space when`
			`represented as minimal-size decimal strings than when represented as`
			`32-bit binary numbers, and strings are only much longer if they`
			`contain many control characters or 8-bit characters. The big`
mass changes; fix titles; add examples; correct typos; clarifications; unified style; etc. 1995-03-17 16:07:09 +00:00			`advantage of using printable \ASCII{} (and of some other characteristics`
added docs for pickle, shelve and copy 1995-02-15 15:53:08 +00:00			`of \code{pickle}'s representation) is that for debugging or recovery`
			`purposes it is possible for a human to read the pickled file with a`
			`standard text editor. (I could have gone a step further and used a`
restructured library manual accordiung to functional group 1995-03-28 13:35:14 +00:00			`notation like S-expressions, but the parser`
			`(currently written in Python) would have been`
added docs for pickle, shelve and copy 1995-02-15 15:53:08 +00:00			`considerably more complicated and slower, and the files would probably`
			`have become much larger.)`

			`The \code{pickle} module doesn't handle code objects, which the`
			`\code{marshal} module does. I suppose \code{pickle} could, and maybe`
			`it should, but there's probably no great need for it right now (as`
			`long as \code{marshal} continues to be used for reading and writing`
			`code objects), and at least this avoids the possibility of smuggling`
			`Trojan horses into a program.`
			`\stmodindex{marshal}`

			`For the benefit of persistency modules written using \code{pickle}, it`
			`supports the notion of a reference to an object outside the pickled`
			`data stream. Such objects are referenced by a name, which is an`
mass changes; fix titles; add examples; correct typos; clarifications; unified style; etc. 1995-03-17 16:07:09 +00:00			`arbitrary string of printable \ASCII{} characters. The resolution of`
added docs for pickle, shelve and copy 1995-02-15 15:53:08 +00:00			`such names is not defined by the \code{pickle} module --- the`
			`persistent object module will have to implement a method`
			`\code{persistent_load}. To write references to persistent objects,`
			`the persistent module must define a method \code{persistent_id} which`
			`returns either \code{None} or the persistent ID of the object.`

			`There are some restrictions on the pickling of class instances.`

			`First of all, the class must be defined at the top level in a module.`

mass changes; fix titles; add examples; correct typos; clarifications; unified style; etc. 1995-03-17 16:07:09 +00:00			`\renewcommand{\indexsubitem}{(pickle protocol)}`

added docs for pickle, shelve and copy 1995-02-15 15:53:08 +00:00			`Next, it must normally be possible to create class instances by`
Suggest using default values for __init__ arguments to make classes unpicklable. 1996-08-09 21:23:47 +00:00			`calling the class without arguments. Usually, this is best`
			`accomplished by providing default values for all arguments to its`
			`\code{__init__} method (if it has one). If this is undesirable, the`
added docs for pickle, shelve and copy 1995-02-15 15:53:08 +00:00			`class can define a method \code{__getinitargs__()}, which should`
			`return a {\em tuple} containing the arguments to be passed to the`
			`class constructor (\code{__init__()}).`
			`\ttindex{__getinitargs__}`
			`\ttindex{__init__}`

mass changes; fix titles; add examples; correct typos; clarifications; unified style; etc. 1995-03-17 16:07:09 +00:00			`Classes can further influence how their instances are pickled --- if the class`
added docs for pickle, shelve and copy 1995-02-15 15:53:08 +00:00			`defines the method \code{__getstate__()}, it is called and the return`
			`state is pickled as the contents for the instance, and if the class`
			`defines the method \code{__setstate__()}, it is called with the`
			`unpickled state. (Note that these methods can also be used to`
			`implement copying class instances.) If there is no`
			`\code{__getstate__()} method, the instance's \code{__dict__} is`
			`pickled. If there is no \code{__setstate__()} method, the pickled`
			`object must be a dictionary and its items are assigned to the new`
			`instance's dictionary. (If a class defines both \code{__getstate__()}`
			`and \code{__setstate__()}, the state object needn't be a dictionary`
			`--- these methods can do what they want.) This protocol is also used`
			`by the shallow and deep copying operations defined in the \code{copy}`
			`module.`
			`\ttindex{__getstate__}`
			`\ttindex{__setstate__}`
			`\ttindex{__dict__}`

			`Note that when class instances are pickled, their class's code and`
small changes by Soren Larsen 1995-03-13 10:03:32 +00:00			`data are not pickled along with them. Only the instance data are`
added docs for pickle, shelve and copy 1995-02-15 15:53:08 +00:00			`pickled. This is done on purpose, so you can fix bugs in a class or`
			`add methods and still load objects that were created with an earlier`
			`version of the class. If you plan to have long-lived objects that`
small changes by Soren Larsen 1995-03-13 10:03:32 +00:00			`will see many versions of a class, it may be worthwhile to put a version`
added docs for pickle, shelve and copy 1995-02-15 15:53:08 +00:00			`number in the objects so that suitable conversions can be made by the`
			`class's \code{__setstate__()} method.`

mass changes; fix titles; add examples; correct typos; clarifications; unified style; etc. 1995-03-17 16:07:09 +00:00			`When a class itself is pickled, only its name is pickled --- the class`
			`definition is not pickled, but re-imported by the unpickling process.`
			`Therefore, the restriction that the class must be defined at the top`
			`level in a module applies to pickled classes as well.`

			`\renewcommand{\indexsubitem}{(in module pickle)}`

added docs for pickle, shelve and copy 1995-02-15 15:53:08 +00:00			`The interface can be summarized as follows.`

			`To pickle an object \code{x} onto a file \code{f}, open for writing:`

AMK's megapatch: * \bcode, \ecode added everywhere * \label{module-foo} added everywhere * A few \seealso sections added. * Indentation fixed inside verbatim in lib*tex files 1997-07-17 16:34:52 +00:00			`\bcode\begin{verbatim}`
added docs for pickle, shelve and copy 1995-02-15 15:53:08 +00:00			`p = pickle.Pickler(f)`
			`p.dump(x)`
AMK's megapatch: * \bcode, \ecode added everywhere * \label{module-foo} added everywhere * A few \seealso sections added. * Indentation fixed inside verbatim in lib*tex files 1997-07-17 16:34:52 +00:00			`\end{verbatim}\ecode`
			`%`
mass changes; fix titles; add examples; correct typos; clarifications; unified style; etc. 1995-03-17 16:07:09 +00:00			`A shorthand for this is:`

AMK's megapatch: * \bcode, \ecode added everywhere * \label{module-foo} added everywhere * A few \seealso sections added. * Indentation fixed inside verbatim in lib*tex files 1997-07-17 16:34:52 +00:00			`\bcode\begin{verbatim}`
mass changes; fix titles; add examples; correct typos; clarifications; unified style; etc. 1995-03-17 16:07:09 +00:00			`pickle.dump(x, f)`
AMK's megapatch: * \bcode, \ecode added everywhere * \label{module-foo} added everywhere * A few \seealso sections added. * Indentation fixed inside verbatim in lib*tex files 1997-07-17 16:34:52 +00:00			`\end{verbatim}\ecode`
			`%`
added docs for pickle, shelve and copy 1995-02-15 15:53:08 +00:00			`To unpickle an object \code{x} from a file \code{f}, open for reading:`

AMK's megapatch: * \bcode, \ecode added everywhere * \label{module-foo} added everywhere * A few \seealso sections added. * Indentation fixed inside verbatim in lib*tex files 1997-07-17 16:34:52 +00:00			`\bcode\begin{verbatim}`
added docs for pickle, shelve and copy 1995-02-15 15:53:08 +00:00			`u = pickle.Unpickler(f)`
typos, layout and other small things 1995-04-10 11:34:00 +00:00			`x = u.load()`
AMK's megapatch: * \bcode, \ecode added everywhere * \label{module-foo} added everywhere * A few \seealso sections added. * Indentation fixed inside verbatim in lib*tex files 1997-07-17 16:34:52 +00:00			`\end{verbatim}\ecode`
			`%`
mass changes; fix titles; add examples; correct typos; clarifications; unified style; etc. 1995-03-17 16:07:09 +00:00			`A shorthand is:`

AMK's megapatch: * \bcode, \ecode added everywhere * \label{module-foo} added everywhere * A few \seealso sections added. * Indentation fixed inside verbatim in lib*tex files 1997-07-17 16:34:52 +00:00			`\bcode\begin{verbatim}`
mass changes; fix titles; add examples; correct typos; clarifications; unified style; etc. 1995-03-17 16:07:09 +00:00			`x = pickle.load(f)`
AMK's megapatch: * \bcode, \ecode added everywhere * \label{module-foo} added everywhere * A few \seealso sections added. * Indentation fixed inside verbatim in lib*tex files 1997-07-17 16:34:52 +00:00			`\end{verbatim}\ecode`
			`%`
added docs for pickle, shelve and copy 1995-02-15 15:53:08 +00:00			`The \code{Pickler} class only calls the method \code{f.write} with a`
			`string argument. The \code{Unpickler} calls the methods \code{f.read}`
			`(with an integer argument) and \code{f.readline} (without argument),`
			`both returning a string. It is explicitly allowed to pass non-file`
			`objects here, as long as they have the right methods.`
mass changes; fix titles; add examples; correct typos; clarifications; unified style; etc. 1995-03-17 16:07:09 +00:00			`\ttindex{Unpickler}`
			`\ttindex{Pickler}`
added docs for pickle, shelve and copy 1995-02-15 15:53:08 +00:00
			`The following types can be pickled:`
			`\begin{itemize}`

			`\item \code{None}`

			`\item integers, long integers, floating point numbers`

			`\item strings`

			`\item tuples, lists and dictionaries containing only picklable objects`

mass changes; fix titles; add examples; correct typos; clarifications; unified style; etc. 1995-03-17 16:07:09 +00:00			`\item classes that are defined at the top level in a module`

			`\item instances of such classes whose \code{__dict__} or`
			`\code{__setstate__()} is picklable`
added docs for pickle, shelve and copy 1995-02-15 15:53:08 +00:00
			`\end{itemize}`

mass changes; fix titles; add examples; correct typos; clarifications; unified style; etc. 1995-03-17 16:07:09 +00:00			`Attempts to pickle unpicklable objects will raise the`
			`\code{PicklingError} exception; when this happens, an unspecified`
			`number of bytes may have been written to the file.`

			`It is possible to make multiple calls to the \code{dump()} method of`
			`the same \code{Pickler} instance. These must then be matched to the`
			`same number of calls to the \code{load()} instance of the`
			`corresponding \code{Unpickler} instance. If the same object is`
			`pickled by multiple \code{dump()} calls, the \code{load()} will all`
			`yield references to the same object. {\em Warning}: this is intended`
			`for pickling multiple objects without intervening modifications to the`
			`objects or their parts. If you modify an object and then pickle it`
			`again using the same \code{Pickler} instance, the object is not`
			`pickled again --- a reference to it is pickled and the`
			`\code{Unpickler} will return the old value, not the modified one.`
			`(There are two problems here: (a) detecting changes, and (b)`
			`marshalling a minimal set of changes. I have no answers. Garbage`
			`Collection may also become a problem here.)`

			`Apart from the \code{Pickler} and \code{Unpickler} classes, the`
			`module defines the following functions, and an exception:`

			`\begin{funcdesc}{dump}{object\, file}`
			`Write a pickled representation of \var{obect} to the open file object`
			`\var{file}. This is equivalent to \code{Pickler(file).dump(object)}.`
			`\end{funcdesc}`

			`\begin{funcdesc}{load}{file}`
			`Read a pickled object from the open file object \var{file}. This is`
			`equivalent to \code{Unpickler(file).load()}.`
			`\end{funcdesc}`

			`\begin{funcdesc}{dumps}{object}`
			`Return the pickled representation of the object as a string, instead`
			`of writing it to a file.`
			`\end{funcdesc}`

			`\begin{funcdesc}{loads}{string}`
			`Read a pickled object from a string instead of a file. Characters in`
			`the string past the pickled object's representation are ignored.`
			`\end{funcdesc}`

			`\begin{excdesc}{PicklingError}`
			`This exception is raised when an unpicklable object is passed to`
			`\code{Pickler.dump()}.`
			`\end{excdesc}`