added docs for pickle, shelve and copy

1995-02-15 15:53:08 +00:00 · 1995-02-15 15:53:08 +00:00 · d1883588ae
parent e1ff7adbf6
commit d1883588ae
8 changed files with 580 additions and 0 deletions
--- a/Doc/lib.tex
+++ b/Doc/lib.tex
@@ -67,6 +67,9 @@ language.
 \input{libstring}
 \input{libwhrandom}
 \input{libaifc}
+\input{libpickle}
+\input{libshelve}
+\input{libcopy}

 \input{libunix}			% UNIX ONLY
 \input{libdbm}
--- a/Doc/lib/lib.tex
+++ b/Doc/lib/lib.tex
@@ -67,6 +67,9 @@ language.
 \input{libstring}
 \input{libwhrandom}
 \input{libaifc}
+\input{libpickle}
+\input{libshelve}
+\input{libcopy}

 \input{libunix}			% UNIX ONLY
 \input{libdbm}
--- a/Doc/lib/libcopy.tex
+++ b/Doc/lib/libcopy.tex
@@ -0,0 +1,79 @@
+\section{Built-in module \sectcode{copy}}
+\stmodindex{copy}
+\ttindex{copy}
+\ttindex{deepcopy}
+
+This module provides generic (shallow and deep) copying operations.
+
+Interface summary:
+
+\begin{verbatim}
+import copy
+
+x = copy.copy(y)	# make a shallow copy of y
+x = copy.deepcopy(y)	# make a deep copy of y
+\end{verbatim}
+
+For module-specific errors, \code{copy.Error} is raised.
+
+The difference between shallow and deep copying is only relevant for
+compound objects (objects that contain other objects, like lists or
+class instances):
+
+\begin{itemize}
+
+\item
+A {\em shallow copy} constructs a new compound object and then (to the
+extent possible) inserts into it {\em references} to the objects found
+in the original.
+
+\item
+A {\em deep copy} constructs a new compound object and then,
+recursively, inserts into it {\em copies} of the objects found in the
+original.
+
+\end{itemize}
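To make the distinction concrete, here is a small sketch written for a current Python 3 interpreter (an assumption; the variable names are illustrative only). The shallow copy shares the inner list with the original, and the deep copy duplicates it:

```python
import copy

# A compound object: a list that contains another (mutable) list.
original = [1, [2, 3]]

shallow = copy.copy(original)     # new outer list, same inner list
deep = copy.deepcopy(original)    # new outer list, new inner list

# Mutating the inner list through the original...
original[1].append(4)

# ...is visible through the shallow copy but not through the deep copy.
print(shallow)
print(deep)
```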
+
+Two problems often exist with deep copy operations that don't exist
+with shallow copy operations:
+
+\begin{itemize}
+
+\item
+Recursive objects (compound objects that, directly or indirectly,
+contain a reference to themselves) may cause a recursive loop.
+
+\item
+Because deep copy copies {\em everything}, it may copy too much, such
+as administrative data structures that should be shared even between
+copies.
+
+\end{itemize}
+
+Python's \code{deepcopy()} operation avoids these problems by:
+
+\begin{itemize}
+
+\item
+keeping a table of objects already copied during the current
+copying pass; and
+
+\item
+letting user-defined classes override the copying operation or the
+set of components copied.
+
+\end{itemize}
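The following sketch shows both mechanisms. It assumes current Python 3, where the per-pass table is the \code{memo} dictionary and a class overrides deep copying by defining \code{__deepcopy__(memo)}. The class \code{Node} and its \code{registry} attribute are hypothetical:

```python
import copy

# A recursive object: a list that contains itself.
loop = [1, 2]
loop.append(loop)

# The table of already-copied objects lets deepcopy() reproduce the
# self-reference instead of recursing forever.
clone = copy.deepcopy(loop)
assert clone is not loop and clone[2] is clone

# The same table keeps shared sub-objects shared in the copy.
shared = [0]
pair_copy = copy.deepcopy([shared, shared])
assert pair_copy[0] is pair_copy[1] and pair_copy[0] is not shared


class Node:
    # Hypothetical class: 'registry' is administrative data that all
    # copies should share instead of duplicating.
    def __init__(self, value, registry):
        self.value = value
        self.registry = registry

    def __deepcopy__(self, memo):
        # Deep-copy the payload (passing the table along), but reuse
        # the same registry object.
        return Node(copy.deepcopy(self.value, memo), self.registry)


n = Node([1, 2], registry={})
m = copy.deepcopy(n)
```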
+
+This version does not copy types such as module, class, function,
+method, stack trace, stack frame, file, socket, window, array, or any
+similar types.
+
+Classes can use the same interfaces to control copying that they use
+to control pickling: they can define methods called
+\code{__getinitargs__()}, \code{__getstate__()} and
+\code{__setstate__()}.  See the description of module \code{pickle}
+for information on these methods.
+\stmodindex{pickle}
+\ttindex{__getinitargs__}
+\ttindex{__getstate__}
+\ttindex{__setstate__}
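Here is a minimal sketch, assuming current Python 3, of \code{__getstate__()} and \code{__setstate__()} controlling what \code{copy.copy()} reproduces. The class \code{Counter} and its \code{cache} attribute are hypothetical:

```python
import copy

class Counter:
    # Hypothetical class: 'cache' should not survive a copy.
    def __init__(self):
        self.count = 0
        self.cache = {}

    def __getstate__(self):
        # Only the count is treated as part of the copied state.
        return {"count": self.count}

    def __setstate__(self, state):
        # Rebuild the instance from the saved state, with a fresh cache.
        self.count = state["count"]
        self.cache = {}

c = Counter()
c.count = 5
c.cache["x"] = 1

c2 = copy.copy(c)
print(c2.count, c2.cache)
```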
--- a/Doc/lib/libpickle.tex
+++ b/Doc/lib/libpickle.tex
@@ -0,0 +1,170 @@
+\section{Built-in module \sectcode{pickle}}
+\stmodindex{pickle}
+\index{persistency}
+\indexii{persistent}{objects}
+\indexii{serializing}{objects}
+\indexii{marshalling}{objects}
+\indexii{flattening}{objects}
+\indexii{pickling}{objects}
+
+The \code{pickle} module implements a basic but powerful algorithm for
+``pickling'' (a.k.a. serializing, marshalling or flattening) nearly
+arbitrary Python objects.  This is a more primitive notion than
+persistency --- although \code{pickle} reads and writes file objects,
+it does not handle the issue of naming persistent objects, nor the
+(even more complicated) area of concurrent access to persistent
+objects.  The \code{pickle} module can transform a complex object into
+a byte stream and it can transform the byte stream into an object with
+the same internal structure.  The most obvious thing to do with these
+byte streams is to write them to a file, but it is also conceivable
+to send them across a network or store them in a database.  The module
+\code{shelve} provides a simple interface to pickle and unpickle
+objects on ``dbm''-style database files.
+\stmodindex{shelve}
+
+Unlike the built-in module \code{marshal}, \code{pickle} handles the
+following correctly:
+\stmodindex{marshal}
+
+\begin{itemize}
+
+\item recursive objects
+
+\item pointer sharing
+
+\item instances of user-defined classes
+
+\end{itemize}
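The following round-trip sketch shows that shared references, a self-reference and a user-defined instance all survive. It assumes a current Python 3 interpreter and uses the convenience functions \code{pickle.dumps()} and \code{pickle.loads()}, which this section does not describe. \code{Point} is a hypothetical class:

```python
import pickle

shared = ["shared"]
data = [shared, shared]      # two references to one list
data.append(data)            # plus a reference to itself

restored = pickle.loads(pickle.dumps(data))

# Pointer sharing survives: both slots refer to one (new) list.
assert restored[0] is restored[1]
# So does the recursive reference.
assert restored[2] is restored


class Point:
    # Hypothetical class, defined at module top level so that pickle
    # can find it again by name when loading.
    def __init__(self, x, y):
        self.x, self.y = x, y

p = pickle.loads(pickle.dumps(Point(1, 2)))
print(p.x, p.y)
```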
+
+The data format used by \code{pickle} is Python-specific.  This has
+the advantage that there are no restrictions imposed by external
+standards such as CORBA (which probably can't represent pointer
+sharing or recursive objects); however, it means that non-Python
+programs may not be able to reconstruct pickled Python objects.
+
+The \code{pickle} data format uses a printable ASCII representation.
+This is slightly more voluminous than a binary representation.
+However, small integers actually take {\em less} space when
+represented as minimal-size decimal strings than when represented as
+32-bit binary numbers, and strings are only much longer if they
+contain many control characters or 8-bit characters.  The big
+advantage of using printable ASCII (and of some other characteristics
+of \code{pickle}'s representation) is that for debugging or recovery
+purposes it is possible for a human to read the pickled file with a
+standard text editor.  (I could have gone a step further and used a
+notation like S-expressions, but the parser would have been
+considerably more complicated and slower, and the files would probably
+have become much larger.)
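In current Python 3, this printable format survives as pickle protocol 0; later protocols are binary. The quick check below, assuming Python 3, confirms that such a pickle is plain ASCII and can be decoded as text:

```python
import pickle

# Protocol 0 is the original, human-readable pickle format.
text = pickle.dumps({"spam": [1, 2, 3]}, protocol=0)

# Every byte is 7-bit ASCII, so the pickle can be inspected as text.
print(text.decode("ascii"))
```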
+
+The \code{pickle} module doesn't handle code objects, which the
+\code{marshal} module does.  I suppose \code{pickle} could, and maybe
+it should, but there's probably no great need for it right now (as
+long as \code{marshal} continues to be used for reading and writing
+code objects), and at least this avoids the possibility of smuggling
+Trojan horses into a program.
+\stmodindex{marshal}
+
+For the benefit of persistency modules written using \code{pickle}, it
+supports the notion of a reference to an object outside the pickled
+data stream.  Such objects are referenced by a name, which is an
+arbitrary string of printable ASCII characters.  The resolution of
+such names is not defined by the \code{pickle} module --- the
+persistent object module will have to implement a method
+\code{persistent_load}.  To write references to persistent objects,
+the persistent module must define a method \code{persistent_id} which
+returns either \code{None} or the persistent ID of the object.
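Here is a sketch of the two hooks. It assumes current Python 3, where \code{persistent_id()} and \code{persistent_load()} are methods of subclasses of \code{pickle.Pickler} and \code{pickle.Unpickler}. The \code{EXTERNAL} table and both subclass names are hypothetical stand-ins for a real persistency module:

```python
import io
import pickle

# Hypothetical external store: objects that live outside the pickle,
# looked up by a printable ASCII name.
EXTERNAL = {"config": {"debug": True}}

class RefPickler(pickle.Pickler):
    def persistent_id(self, obj):
        # Return a name for objects kept outside the stream, else None
        # (None means "pickle this object normally").
        for name, value in EXTERNAL.items():
            if value is obj:
                return name
        return None

class RefUnpickler(pickle.Unpickler):
    def persistent_load(self, pid):
        # Resolve a name written by RefPickler.persistent_id().
        return EXTERNAL[pid]

buf = io.BytesIO()
RefPickler(buf).dump({"settings": EXTERNAL["config"], "n": 1})
buf.seek(0)
result = RefUnpickler(buf).load()

# The external object was referenced by name, not copied.
print(result["settings"] is EXTERNAL["config"])
```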
+
+There are some restrictions on the pickling of class instances.
+
+First of all, the class must be defined at the top level in a module.
+
+Next, it must normally be possible to create class instances by
+calling the class without arguments.  If this is undesirable, the
+class can define a method \code{__getinitargs__()}, which should
+return a {\em tuple} containing the arguments to be passed to the
+class constructor (\code{__init__()}).
+\ttindex{__getinitargs__}
+\ttindex{__init__}
+
+Classes can further influence how they are pickled --- if the class
+defines the method \code{__getstate__()}, it is called and the returned
+state is pickled as the contents of the instance, and if the class
+defines the method \code{__setstate__()}, it is called with the
+unpickled state.  (Note that these methods can also be used to
+implement copying class instances.)  If there is no
+\code{__getstate__()} method, the instance's \code{__dict__} is
+pickled.  If there is no \code{__setstate__()} method, the pickled
+object must be a dictionary and its items are assigned to the new
+instance's dictionary.  (If a class defines both \code{__getstate__()}
+and \code{__setstate__()}, the state object needn't be a dictionary
+--- these methods can do what they want.)  This protocol is also used
+by the shallow and deep copying operations defined in the \code{copy}
+module.
+\ttindex{__getstate__}
+\ttindex{__setstate__}
+\ttindex{__dict__}
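The following minimal sketch, assuming current Python 3, shows a class that keeps an unpicklable resource out of its pickled state. \code{Connection} and its attributes are hypothetical. Because the class defines both methods, its state can be a plain string instead of a dictionary:

```python
import pickle

class Connection:
    # Hypothetical class holding a resource that should not be pickled.
    def __init__(self, host):
        self.host = host
        self.sock = object()   # stand-in for an open socket

    def __getstate__(self):
        # Pickle only the host name; drop the live resource.
        return self.host

    def __setstate__(self, state):
        # The state need not be a dict because both methods are defined.
        self.host = state
        self.sock = None       # a real class might reconnect here

c = pickle.loads(pickle.dumps(Connection("example.org")))
print(c.host, c.sock)
```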
+
+Note that when class instances are pickled, their class's code and
+data are not pickled along with them.  Only the instance data is
+pickled.  This is done on purpose, so you can fix bugs in a class or
+add methods and still load objects that were created with an earlier
+version of the class.  If you plan to have long-lived objects that
+will see many versions of a class, it may be worthwhile to put a version
+number in the objects so that suitable conversions can be made by the
+class's \code{__setstate__()} method.
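One way to apply the version-number advice is sketched below for current Python 3. The class \code{Account}, its \code{VERSION} attribute, the \code{_version} key and the old \code{amount} field are all hypothetical:

```python
import pickle

class Account:
    # Hypothetical class.  VERSION is bumped whenever the layout of the
    # instance data changes; old pickles keep their old version number.
    VERSION = 2

    def __init__(self, owner, balance=0):
        self.owner = owner
        self.balance = balance

    def __getstate__(self):
        state = self.__dict__.copy()
        state["_version"] = self.VERSION
        return state

    def __setstate__(self, state):
        state = dict(state)
        version = state.pop("_version", 1)   # untagged state is v1
        if version < 2:
            # Hypothetical v1 layout: the balance was stored as "amount".
            state["balance"] = state.pop("amount")
        self.__dict__.update(state)


# A current object round-trips unchanged.
a = pickle.loads(pickle.dumps(Account("bob", 5)))

# State in the old (v1) layout is converted when it is restored.
old = Account.__new__(Account)
old.__setstate__({"owner": "ann", "amount": 10})
```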
+
+The interface can be summarized as follows.
+
+To pickle an object \code{x} onto a file \code{f}, open for writing:
+
+\begin{verbatim}
+p = pickle.Pickler(f)
+p.dump(x)
+\end{verbatim}
+
+To unpickle an object \code{x} from a file \code{f}, open for reading:
+
+\begin{verbatim}
+u = pickle.Unpickler(f)
+x = u.load()
+\end{verbatim}
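Here are the same two steps as a complete sketch. It assumes current Python 3, where pickles are bytes and the file must therefore be opened in binary mode. The temporary file exists only for the demonstration:

```python
import os
import pickle
import tempfile

x = {"name": "spam", "sizes": [1, 2, 3]}

# Python 3 pickles are bytes, so the file is opened in binary mode.
fd, path = tempfile.mkstemp()
os.close(fd)
try:
    with open(path, "wb") as f:
        p = pickle.Pickler(f)
        p.dump(x)

    with open(path, "rb") as f:
        u = pickle.Unpickler(f)
        y = u.load()          # load() takes no argument
finally:
    os.remove(path)

print(y == x)
```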
+
+The \code{Pickler} class only calls the method \code{f.write} with a
+string argument.  The \code{Unpickler} calls the methods \code{f.read}
+(with an integer argument) and \code{f.readline} (without argument),
+both returning a string.  It is explicitly allowed to pass non-file
+objects here, as long as they have the right methods.
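The following sketch, assuming current Python 3, uses a non-file object for output. The hypothetical \code{ChunkSink} class provides only \code{write()}, which is enough for a \code{Pickler}. The collected bytes are then read back through \code{io.BytesIO}:

```python
import io
import pickle

class ChunkSink:
    # Hypothetical file-like object: only write() is provided,
    # which is all a Pickler needs.
    def __init__(self):
        self.chunks = []

    def write(self, data):
        self.chunks.append(bytes(data))
        return len(data)

sink = ChunkSink()
pickle.Pickler(sink).dump([1, "two", 3.0])

# Reassemble the chunks and read them back through a file-like buffer.
data = b"".join(sink.chunks)
value = pickle.Unpickler(io.BytesIO(data)).load()
print(value)
```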
+
+The following types can be pickled:
+\begin{itemize}
+
+\item \code{None}
+
+\item integers, long integers, floating point numbers
+
+\item strings
+
+\item tuples, lists and dictionaries containing only picklable objects
+
+\item class instances whose \code{__dict__}, or whose \code{__getstate__()}
+return value, is picklable
+
+\end{itemize}
+
+Attempts to pickle unpicklable objects will raise an exception; when
+this happens, an unspecified number of bytes may have been written to
+the file argument.
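This quick sketch assumes current Python 3, where integers and long integers are a single type. It checks that values of the listed kinds round-trip and that an unpicklable object (here a generator) raises an exception:

```python
import pickle

# Values of the kinds listed above round-trip unchanged.
samples = [None, 42, 2**100, 3.5, "text", (1, 2), [3, 4], {"k": "v"}]
for value in samples:
    assert pickle.loads(pickle.dumps(value)) == value

# A generator is not picklable, so dumping it raises an exception.
# Both classes are caught in case the exact type differs by version.
gen = (i for i in range(3))
failed = False
try:
    pickle.dumps(gen)
except (pickle.PicklingError, TypeError):
    failed = True
print("pickling the generator failed:", failed)
```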
+
+It is possible to make multiple calls to \code{Pickler.dump()} or to
+\code{Unpickler.load()}, as long as there is a one-to-one
+correspondence between \code{Pickler} and \code{Unpickler} objects and
+between \code{dump} and \code{load} calls for any pair of
+corresponding \code{Pickler} and \code{Unpickler} objects.  {\em Warning}:
+this is intended for pickling multiple objects without intervening
+modifications to the objects or their parts.  If you modify an object
+and then pickle it again using the same \code{Pickler} instance, the
+object is not pickled again --- a reference to it is pickled and the
+\code{Unpickler} will return the old value, not the modified one.  (There
+are two problems here: (a) detecting changes, and (b) marshalling a
+minimal set of changes.  I have no answers.  Garbage collection may
+also become a problem here.)
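The pitfall described in the warning can be reproduced with an in-memory buffer, assuming the reader is running current Python 3:

```python
import io
import pickle

buf = io.BytesIO()
p = pickle.Pickler(buf)

x = [1, 2]
p.dump(x)
x.append(3)          # modify the object between dumps...
p.dump(x)            # ...but only a reference to it is written again

buf.seek(0)
u = pickle.Unpickler(buf)
first = u.load()
second = u.load()

# The change made between the two dumps is lost on the loading side.
print(first, second)
```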
--- a/Doc/lib/libshelve.tex
+++ b/Doc/lib/libshelve.tex
@@ -0,0 +1,38 @@
+\section{Built-in module \sectcode{shelve}}
+\stmodindex{shelve}
+\stmodindex{pickle}
+\bimodindex{dbm}
+
+A ``shelf'' is a persistent, dictionary-like object.  The difference
+from ``dbm'' databases is that the values (not the keys!) in a shelf
+can be essentially arbitrary Python objects --- anything that the
+\code{pickle} module can handle.  This includes most class instances,
+recursive data types, and objects containing lots of shared
+sub-objects.  The keys are ordinary strings.
+
+To summarize the interface (\code{key} is a string, \code{data} is an
+arbitrary object):
+
+\begin{verbatim}
+import shelve
+
+d = shelve.open(filename) # open, with (g)dbm filename -- no suffix
+
+d[key] = data   # store data at key (overwrites old data if
+                # using an existing key)
+data = d[key]   # retrieve data at key (raises KeyError if no
+                # such key)
+del d[key]      # delete data stored at key (raises KeyError
+                # if no such key)
+flag = d.has_key(key)   # true if the key exists
+klist = d.keys() # a list of all existing keys (slow!)
+
+d.close()       # close it
+\end{verbatim}
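Here is a runnable version of the summary. It assumes current Python 3, where \code{has_key()} no longer exists and \code{key in d} is used instead. The file name and stored values are illustrative only:

```python
import os
import shelve
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    filename = os.path.join(tmp, "demo")   # no suffix; dbm may add one

    d = shelve.open(filename)
    d["spam"] = {"eggs": [1, 2, 3]}        # values: any picklable object
    d.close()                              # make sure changes are flushed

    d = shelve.open(filename)
    data = d["spam"]                       # unpickled on retrieval
    found = "spam" in d                    # Python 3 spelling of has_key()
    del d["spam"]
    remaining = list(d.keys())
    d.close()

print(data, found, remaining)
```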
+
+Depending on the implementation, closing a persistent dictionary may
+or may not be necessary to flush changes to disk.
+
+Note: \code{shelve} does not support {\em concurrent} access to
+shelved objects.  Two programs should not try to simultaneously access
+the same shelf.
--- a/Doc/libcopy.tex
+++ b/Doc/libcopy.tex
@@ -0,0 +1,79 @@
+\section{Built-in module \sectcode{copy}}
+\stmodindex{copy}
+\ttindex{copy}
+\ttindex{deepcopy}
+
+This module provides generic (shallow and deep) copying operations.
+
+Interface summary:
+
+\begin{verbatim}
+import copy
+
+x = copy.copy(y)	# make a shallow copy of y
+x = copy.deepcopy(y)	# make a deep copy of y
+\end{verbatim}
+
+For module-specific errors, \code{copy.Error} is raised.
+
+The difference between shallow and deep copying is only relevant for
+compound objects (objects that contain other objects, like lists or
+class instances):
+
+\begin{itemize}
+
+\item
+A {\em shallow copy} constructs a new compound object and then (to the
+extent possible) inserts into it {\em references} to the objects found
+in the original.
+
+\item
+A {\em deep copy} constructs a new compound object and then,
+recursively, inserts into it {\em copies} of the objects found in the
+original.
+
+\end{itemize}
+
+Two problems often exist with deep copy operations that don't exist
+with shallow copy operations:
+
+\begin{itemize}
+
+\item
+Recursive objects (compound objects that, directly or indirectly,
+contain a reference to themselves) may cause a recursive loop.
+
+\item
+Because deep copy copies {\em everything}, it may copy too much, such
+as administrative data structures that should be shared even between
+copies.
+
+\end{itemize}
+
+Python's \code{deepcopy()} operation avoids these problems by:
+
+\begin{itemize}
+
+\item
+keeping a table of objects already copied during the current
+copying pass; and
+
+\item
+letting user-defined classes override the copying operation or the
+set of components copied.
+
+\end{itemize}
+
+This version does not copy types such as module, class, function,
+method, stack trace, stack frame, file, socket, window, array, or any
+similar types.
+
+Classes can use the same interfaces to control copying that they use
+to control pickling: they can define methods called
+\code{__getinitargs__()}, \code{__getstate__()} and
+\code{__setstate__()}.  See the description of module \code{pickle}
+for information on these methods.
+\stmodindex{pickle}
+\ttindex{__getinitargs__}
+\ttindex{__getstate__}
+\ttindex{__setstate__}
--- a/Doc/libpickle.tex
+++ b/Doc/libpickle.tex
@@ -0,0 +1,170 @@
+\section{Built-in module \sectcode{pickle}}
+\stmodindex{pickle}
+\index{persistency}
+\indexii{persistent}{objects}
+\indexii{serializing}{objects}
+\indexii{marshalling}{objects}
+\indexii{flattening}{objects}
+\indexii{pickling}{objects}
+
+The \code{pickle} module implements a basic but powerful algorithm for
+``pickling'' (a.k.a. serializing, marshalling or flattening) nearly
+arbitrary Python objects.  This is a more primitive notion than
+persistency --- although \code{pickle} reads and writes file objects,
+it does not handle the issue of naming persistent objects, nor the
+(even more complicated) area of concurrent access to persistent
+objects.  The \code{pickle} module can transform a complex object into
+a byte stream and it can transform the byte stream into an object with
+the same internal structure.  The most obvious thing to do with these
+byte streams is to write them to a file, but it is also conceivable
+to send them across a network or store them in a database.  The module
+\code{shelve} provides a simple interface to pickle and unpickle
+objects on ``dbm''-style database files.
+\stmodindex{shelve}
+
+Unlike the built-in module \code{marshal}, \code{pickle} handles the
+following correctly:
+\stmodindex{marshal}
+
+\begin{itemize}
+
+\item recursive objects
+
+\item pointer sharing
+
+\item instances of user-defined classes
+
+\end{itemize}
+
+The data format used by \code{pickle} is Python-specific.  This has
+the advantage that there are no restrictions imposed by external
+standards such as CORBA (which probably can't represent pointer
+sharing or recursive objects); however, it means that non-Python
+programs may not be able to reconstruct pickled Python objects.
+
+The \code{pickle} data format uses a printable ASCII representation.
+This is slightly more voluminous than a binary representation.
+However, small integers actually take {\em less} space when
+represented as minimal-size decimal strings than when represented as
+32-bit binary numbers, and strings are only much longer if they
+contain many control characters or 8-bit characters.  The big
+advantage of using printable ASCII (and of some other characteristics
+of \code{pickle}'s representation) is that for debugging or recovery
+purposes it is possible for a human to read the pickled file with a
+standard text editor.  (I could have gone a step further and used a
+notation like S-expressions, but the parser would have been
+considerably more complicated and slower, and the files would probably
+have become much larger.)
+
+The \code{pickle} module doesn't handle code objects, which the
+\code{marshal} module does.  I suppose \code{pickle} could, and maybe
+it should, but there's probably no great need for it right now (as
+long as \code{marshal} continues to be used for reading and writing
+code objects), and at least this avoids the possibility of smuggling
+Trojan horses into a program.
+\stmodindex{marshal}
+
+For the benefit of persistency modules written using \code{pickle}, it
+supports the notion of a reference to an object outside the pickled
+data stream.  Such objects are referenced by a name, which is an
+arbitrary string of printable ASCII characters.  The resolution of
+such names is not defined by the \code{pickle} module --- the
+persistent object module will have to implement a method
+\code{persistent_load}.  To write references to persistent objects,
+the persistent module must define a method \code{persistent_id} which
+returns either \code{None} or the persistent ID of the object.
+
+There are some restrictions on the pickling of class instances.
+
+First of all, the class must be defined at the top level in a module.
+
+Next, it must normally be possible to create class instances by
+calling the class without arguments.  If this is undesirable, the
+class can define a method \code{__getinitargs__()}, which should
+return a {\em tuple} containing the arguments to be passed to the
+class constructor (\code{__init__()}).
+\ttindex{__getinitargs__}
+\ttindex{__init__}
+
+Classes can further influence how they are pickled --- if the class
+defines the method \code{__getstate__()}, it is called and the returned
+state is pickled as the contents of the instance, and if the class
+defines the method \code{__setstate__()}, it is called with the
+unpickled state.  (Note that these methods can also be used to
+implement copying class instances.)  If there is no
+\code{__getstate__()} method, the instance's \code{__dict__} is
+pickled.  If there is no \code{__setstate__()} method, the pickled
+object must be a dictionary and its items are assigned to the new
+instance's dictionary.  (If a class defines both \code{__getstate__()}
+and \code{__setstate__()}, the state object needn't be a dictionary
+--- these methods can do what they want.)  This protocol is also used
+by the shallow and deep copying operations defined in the \code{copy}
+module.
+\ttindex{__getstate__}
+\ttindex{__setstate__}
+\ttindex{__dict__}
+
+Note that when class instances are pickled, their class's code and
+data are not pickled along with them.  Only the instance data is
+pickled.  This is done on purpose, so you can fix bugs in a class or
+add methods and still load objects that were created with an earlier
+version of the class.  If you plan to have long-lived objects that
+will see many versions of a class, it may be worthwhile to put a version
+number in the objects so that suitable conversions can be made by the
+class's \code{__setstate__()} method.
+
+The interface can be summarized as follows.
+
+To pickle an object \code{x} onto a file \code{f}, open for writing:
+
+\begin{verbatim}
+p = pickle.Pickler(f)
+p.dump(x)
+\end{verbatim}
+
+To unpickle an object \code{x} from a file \code{f}, open for reading:
+
+\begin{verbatim}
+u = pickle.Unpickler(f)
+x = u.load()
+\end{verbatim}
+
+The \code{Pickler} class only calls the method \code{f.write} with a
+string argument.  The \code{Unpickler} calls the methods \code{f.read}
+(with an integer argument) and \code{f.readline} (without argument),
+both returning a string.  It is explicitly allowed to pass non-file
+objects here, as long as they have the right methods.
+
+The following types can be pickled:
+\begin{itemize}
+
+\item \code{None}
+
+\item integers, long integers, floating point numbers
+
+\item strings
+
+\item tuples, lists and dictionaries containing only picklable objects
+
+\item class instances whose \code{__dict__}, or whose \code{__getstate__()}
+return value, is picklable
+
+\end{itemize}
+
+Attempts to pickle unpicklable objects will raise an exception; when
+this happens, an unspecified number of bytes may have been written to
+the file argument.
+
+It is possible to make multiple calls to \code{Pickler.dump()} or to
+\code{Unpickler.load()}, as long as there is a one-to-one
+correspondence between \code{Pickler} and \code{Unpickler} objects and
+between \code{dump} and \code{load} calls for any pair of
+corresponding \code{Pickler} and \code{Unpickler} objects.  {\em Warning}:
+this is intended for pickling multiple objects without intervening
+modifications to the objects or their parts.  If you modify an object
+and then pickle it again using the same \code{Pickler} instance, the
+object is not pickled again --- a reference to it is pickled and the
+\code{Unpickler} will return the old value, not the modified one.  (There
+are two problems here: (a) detecting changes, and (b) marshalling a
+minimal set of changes.  I have no answers.  Garbage collection may
+also become a problem here.)
--- a/Doc/libshelve.tex
+++ b/Doc/libshelve.tex
@@ -0,0 +1,38 @@
+\section{Built-in module \sectcode{shelve}}
+\stmodindex{shelve}
+\stmodindex{pickle}
+\bimodindex{dbm}
+
+A ``shelf'' is a persistent, dictionary-like object.  The difference
+from ``dbm'' databases is that the values (not the keys!) in a shelf
+can be essentially arbitrary Python objects --- anything that the
+\code{pickle} module can handle.  This includes most class instances,
+recursive data types, and objects containing lots of shared
+sub-objects.  The keys are ordinary strings.
+
+To summarize the interface (\code{key} is a string, \code{data} is an
+arbitrary object):
+
+\begin{verbatim}
+import shelve
+
+d = shelve.open(filename) # open, with (g)dbm filename -- no suffix
+
+d[key] = data   # store data at key (overwrites old data if
+                # using an existing key)
+data = d[key]   # retrieve data at key (raises KeyError if no
+                # such key)
+del d[key]      # delete data stored at key (raises KeyError
+                # if no such key)
+flag = d.has_key(key)   # true if the key exists
+klist = d.keys() # a list of all existing keys (slow!)
+
+d.close()       # close it
+\end{verbatim}
+
+Depending on the implementation, closing a persistent dictionary may
+or may not be necessary to flush changes to disk.
+
+Note: \code{shelve} does not support {\em concurrent} access to
+shelved objects.  Two programs should not try to simultaneously access
+the same shelf.