\documentclass{howto}

% $Id$

\title{What's New in Python 2.2}
\release{1.00}
\author{A.M. Kuchling}
\authoraddress{\email{akuchlin@mems-exchange.org}}
\begin{document}
\maketitle\tableofcontents
\section{Introduction}

This article explains the new features in Python 2.2, released on
December 21, 2001.

Python 2.2 can be thought of as the ``cleanup release''. There are some
features such as generators and iterators that are completely new, but
most of the changes, significant and far-reaching though they may be,
are aimed at cleaning up irregularities and dark corners of the
language design.

This article doesn't attempt to provide a complete specification of
the new features, but instead provides a convenient overview. For
full details, you should refer to the documentation for Python 2.2,
such as the
\citetitle[http://www.python.org/doc/2.2/lib/lib.html]{Python
Library Reference} and the
\citetitle[http://www.python.org/doc/2.2/ref/ref.html]{Python
Reference Manual}. If you want to understand the complete
implementation and design rationale for a change, refer to the PEP for
that particular feature.

\begin{seealso}

\seeurl{http://www.unixreview.com/documents/s=1356/urm0109h/0109h.htm}
{``What's So Special About Python 2.2?'' also covers the new 2.2
features; it was written by Cameron Laird and Kathryn Soraiz.}

\end{seealso}
%======================================================================
\section{PEPs 252 and 253: Type and Class Changes}

The largest and most far-reaching changes in Python 2.2 are to
Python's model of objects and classes. The changes should be backward
compatible, so it's likely that your code will continue to run
unchanged, but the changes provide some amazing new capabilities.
Before beginning this, the longest and most complicated section of
this article, I'll provide an overview of the changes and offer some
comments.

A long time ago I wrote a Web page
(\url{http://www.amk.ca/python/writing/warts.html}) listing flaws in
Python's design. One of the most significant flaws was that it was
impossible to subclass Python types implemented in C. In particular,
it wasn't possible to subclass built-in types, so you couldn't just
subclass, say, lists in order to add a single useful method to them.
The \module{UserList} module provides a class that supports all of the
methods of lists and that can be subclassed further, but there's a lot
of C code that expects a regular Python list and won't accept a
\class{UserList} instance.

Python 2.2 fixes this, and in the process adds some exciting new
capabilities. A brief summary:

\begin{itemize}

\item You can subclass built-in types such as lists and even integers,
and your subclasses should work in every place that requires the
original type.

\item It's now possible to define static and class methods, in addition
to the instance methods available in previous versions of Python.

\item It's also possible to automatically call methods on accessing or
setting an instance attribute by using a new mechanism called
\dfn{properties}. Many uses of \method{__getattr__} can be rewritten
to use properties instead, making the resulting code simpler and
faster. As a small side benefit, attributes can now have docstrings,
too.

\item The list of legal attributes for an instance can be limited to a
particular set using \dfn{slots}, making it possible to safeguard
against typos and perhaps make more optimizations possible in future
versions of Python.

\end{itemize}

Some users have voiced concern about all these changes. Sure, they
say, the new features are neat and lend themselves to all sorts of
tricks that weren't possible in previous versions of Python, but
they also make the language more complicated. Some people have said
that they've always recommended Python for its simplicity, and feel
that its simplicity is being lost.

Personally, I think there's no need to worry. Many of the new
features are quite esoteric, and you can write a lot of Python code
without ever needing to be aware of them. Writing a simple class is no
more difficult than it ever was, so you don't need to bother learning
or teaching them unless they're actually needed. Some very
complicated tasks that were previously only possible from C will now
be possible in pure Python, and to my mind that's all for the better.

I'm not going to attempt to cover every corner case and small
change that was required to make the new features work. Instead this
section will paint only the broad strokes. See section~\ref{sect-rellinks},
``Related Links'', for further sources of information about Python 2.2's new
object model.
\subsection{Old and New Classes}

First, you should know that Python 2.2 really has two kinds of
classes: classic or old-style classes, and new-style classes. The
old-style class model is exactly the same as the class model in
earlier versions of Python. All the new features described in this
section apply only to new-style classes. This divergence isn't
intended to last forever; eventually old-style classes will be
dropped, possibly in Python 3.0.

So how do you define a new-style class? You do it by subclassing an
existing new-style class. Most of Python's built-in types, such as
integers, lists, dictionaries, and even files, are new-style classes
now. A new-style class named \class{object}, the base class for all
built-in types, has also been added, so if no built-in type is
suitable, you can just subclass \class{object}:

\begin{verbatim}
class C(object):
    def __init__(self):
        ...
    ...
\end{verbatim}

This means that \keyword{class} statements that don't have any base
classes are always classic classes in Python 2.2. (Actually you can
also change this by setting a module-level variable named
\member{__metaclass__} --- see \pep{253} for the details --- but it's
easier to just subclass \class{object}.)

The type objects for the built-in types are available as built-ins,
named using a clever trick. Python has always had built-in functions
named \function{int()}, \function{float()}, and \function{str()}. In
2.2, they aren't functions any more, but type objects that behave as
factories when called.

\begin{verbatim}
>>> int
<type 'int'>
>>> int('123')
123
\end{verbatim}
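Because the built-in types are now classes, they can be subclassed
directly. The following minimal sketch adds a single method to lists;
the \class{Stack} class and its \method{peek()} method are hypothetical
names chosen for illustration, not part of Python. Instances keep every
list method and are still accepted wherever a real list is expected.

\begin{verbatim}
class Stack(list):
    """Hypothetical example: a list with one extra method."""

    def peek(self):
        # Return the last item without removing it
        return self[-1]

s = Stack([1, 2])
s.append(3)                    # inherited unchanged from list
top = s.peek()                 # the new method
is_list = isinstance(s, list)  # code that checks for lists accepts it
\end{verbatim}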
To make the set of types complete, new type objects such as
\function{dict} and \function{file} have been added. Here's a
more interesting example, adding a \method{lock()} method to file
objects:

\begin{verbatim}
class LockableFile(file):
    def lock(self, operation, length=0, start=0, whence=0):
        import fcntl
        return fcntl.lockf(self.fileno(), operation,
                           length, start, whence)
\end{verbatim}

The now-obsolete \module{posixfile} module contained a class that
emulated all of a file object's methods and also added a
\method{lock()} method, but this class couldn't be passed to internal
functions that expected a built-in file, something that is possible
with our new \class{LockableFile}.
\subsection{Descriptors}

In previous versions of Python, there was no consistent way to
discover what attributes and methods were supported by an object.
There were some informal conventions, such as defining
\member{__members__} and \member{__methods__} attributes that were
lists of names, but often the author of an extension type or a class
wouldn't bother to define them. You could fall back on inspecting the
\member{__dict__} of an object, but when class inheritance or an
arbitrary \method{__getattr__} hook was in use this could still be
inaccurate.

The one big idea underlying the new class model is that an API for
describing the attributes of an object using \dfn{descriptors} has
been formalized. Descriptors specify the value of an attribute,
stating whether it's a method or a field. With the descriptor API,
static methods and class methods become possible, as well as more
exotic constructs.

Attribute descriptors are objects that live inside class objects, and
have a few attributes of their own:

\begin{itemize}

\item \member{__name__} is the attribute's name.

\item \member{__doc__} is the attribute's docstring.

\item \method{__get__(\var{object})} is a method that retrieves the
attribute value from \var{object}.

\item \method{__set__(\var{object}, \var{value})} sets the attribute
on \var{object} to \var{value}.

\item \method{__delete__(\var{object})} deletes the attribute
from \var{object}.
\end{itemize}

For example, when you write \code{obj.x}, the steps that Python
performs are, in simplified form:

\begin{verbatim}
descriptor = obj.__class__.x
descriptor.__get__(obj)
\end{verbatim}
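To see this protocol in action, here is a minimal sketch of a
hand-written descriptor. The \class{Logged} and \class{Point} classes are
hypothetical; \class{Logged} records each \method{__get__} and
\method{__set__} call, so you can observe that a plain \code{p.x} access
or assignment goes through the descriptor stored in the class. Note that
the interpreter actually passes \method{__get__} a second argument, the
owner class, which the simplified steps above leave out.

\begin{verbatim}
class Logged(object):
    """Hypothetical data descriptor that records every access."""

    def __init__(self, name):
        self.name = name
        self.log = []

    def __get__(self, obj, objtype=None):
        if obj is None:
            # Looked up on the class itself: return the descriptor
            return self
        self.log.append(('get', self.name))
        return obj.__dict__.get(self.name)

    def __set__(self, obj, value):
        self.log.append(('set', self.name, value))
        # Keep the actual value in the instance dictionary
        obj.__dict__[self.name] = value

class Point(object):
    x = Logged('x')

p = Point()
p.x = 10       # calls Logged.__set__(p, 10)
value = p.x    # calls Logged.__get__(p, Point)
\end{verbatim}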
For methods, \method{descriptor.__get__} returns a temporary object that's
callable, and wraps up the instance and the method to be called on it.
This is also why static methods and class methods are now possible;
they have descriptors that wrap up just the method, or the method and
the class. As a brief explanation of these new kinds of methods,
static methods aren't passed the instance, and therefore resemble
regular functions. Class methods are passed the class of the object,
but not the object itself. Static and class methods are defined like
this:

\begin{verbatim}
class C(object):
    def f(arg1, arg2):
        ...
    f = staticmethod(f)

    def g(cls, arg1, arg2):
        ...
    g = classmethod(g)
\end{verbatim}
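A runnable variant of the same pattern is sketched below; the method
bodies and the subclass \class{D} are invented for illustration. It shows
that a static method receives only the arguments you pass, while a class
method receives the class first, even when it is called through an
instance or a subclass.

\begin{verbatim}
class C(object):
    def f(arg1, arg2):
        # Static method: no instance or class is passed in
        return arg1 + arg2
    f = staticmethod(f)

    def g(cls, arg1, arg2):
        # Class method: the class (or subclass) arrives as 'cls'
        return (cls.__name__, arg1 + arg2)
    g = classmethod(g)

class D(C):
    pass

via_class = C.f(1, 2)         # no instance needed
via_instance = C().f(1, 2)    # same result through an instance
from_subclass = D().g(1, 2)   # cls is D here, not C
\end{verbatim}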
The \function{staticmethod()} function takes the function
\function{f} and returns it wrapped up in a descriptor so it can be
stored in the class object. You might expect there to be special
syntax for creating such methods (\code{def static f()},
\code{defstatic f()}, or something like that) but no such syntax has
been defined yet; that's been left for future versions of Python.

More new features, such as slots and properties, are also implemented
as new kinds of descriptors, and it's not difficult to write a
descriptor class that does something novel. For example, it would be
possible to write a descriptor class that lets you write
Eiffel-style preconditions and postconditions for a method. A class
that used this feature might be defined like this:

\begin{verbatim}
from eiffel import eiffelmethod

class C(object):
    def f(self, arg1, arg2):
        # The actual function
        ...
    def pre_f(self):
        # Check preconditions
        ...
    def post_f(self):
        # Check postconditions
        ...

    f = eiffelmethod(f, pre_f, post_f)
\end{verbatim}
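The \module{eiffel} module in that example is hypothetical. To suggest
how little machinery such a descriptor needs, here is one possible
sketch of an \class{eiffelmethod} class, together with a hypothetical
\class{Account} class that uses it. It is not an existing library; the
choice to call the pre- and postcondition methods with just the
instance simply mirrors the signatures in the example above.

\begin{verbatim}
class eiffelmethod(object):
    """Sketch of a descriptor adding pre- and postcondition checks."""

    def __init__(self, method, pre=None, post=None):
        self.method = method
        self.pre = pre
        self.post = post

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        def wrapper(*args, **kwargs):
            # Precondition, then the method itself, then the
            # postcondition, all against the same instance
            if self.pre is not None:
                self.pre(obj)
            result = self.method(obj, *args, **kwargs)
            if self.post is not None:
                self.post(obj)
            return result
        return wrapper

class Account(object):
    """Hypothetical class using the descriptor."""

    def __init__(self, balance):
        self.balance = balance

    def withdraw(self, amount):
        self.balance = self.balance - amount
        return self.balance
    def pre_withdraw(self):
        # Precondition: the account must not already be overdrawn
        if self.balance < 0:
            raise ValueError("precondition failed")
    def post_withdraw(self):
        # Postcondition: a withdrawal may not overdraw the account
        if self.balance < 0:
            raise ValueError("postcondition failed")

    withdraw = eiffelmethod(withdraw, pre_withdraw, post_withdraw)
\end{verbatim}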
Note that a person using the new \function{eiffelmethod()} doesn't
have to understand anything about descriptors. This is why I think
the new features don't increase the basic complexity of the language.
There will be a few wizards who need to know about descriptors in order to
write \function{eiffelmethod()} or the ZODB or whatever, but most
users will just write code on top of the resulting libraries and
ignore the implementation details.


\subsection{Multiple Inheritance: The Diamond Rule}

Multiple inheritance has also been made more useful through changing
the rules under which names are resolved. Consider this set of classes
(diagram taken from \pep{253} by Guido van Rossum):

\begin{verbatim}
      class A:
        ^ ^  def save(self): ...
       /   \
      /     \
     /       \
    /         \
class B     class C:
    ^         ^  def save(self): ...
     \       /
      \     /
       \   /
        \ /
      class D
\end{verbatim}

The lookup rule for classic classes is simple but not very smart; the
base classes are searched depth-first, going from left to right. A
reference to \method{D.save} will search the classes \class{D},
\class{B}, and then \class{A}, where \method{save()} would be found
and returned. \method{C.save()} would never be found at all. This is
bad, because if \class{C}'s \method{save()} method is saving some
internal state specific to \class{C}, not calling it will result in
that state never getting saved.

New-style classes follow a different algorithm that's a bit more
complicated to explain, but does the right thing in this situation.

\begin{enumerate}

\item List all the base classes, following the classic lookup rule and
include a class multiple times if it's visited repeatedly. In the
above example, the list of visited classes is [\class{D}, \class{B},
\class{A}, \class{C}, \class{A}].

\item Scan the list for duplicated classes. If any are found, remove
all but one occurrence, leaving the \emph{last} one in the list. In
the above example, the list becomes [\class{D}, \class{B}, \class{C},
\class{A}] after dropping duplicates.

\end{enumerate}
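The two steps above can be sketched in a few lines of Python. This is
an illustration of the rule as described here, not the interpreter's
own implementation (which is written in C), and the helper names
\function{classic_order()} and \function{new_style_order()} are
hypothetical. Later versions of Python replaced this rule with a
different algorithm (C3), although both give the same answer for this
diamond.

\begin{verbatim}
def classic_order(cls):
    # Step 1: depth-first, left-to-right walk, keeping duplicates
    order = [cls]
    for base in cls.__bases__:
        if base is not object:    # leave out the implicit root class
            order.extend(classic_order(base))
    return order

def new_style_order(cls):
    # Step 2: for any repeated class, keep only its *last* occurrence
    visited = classic_order(cls)
    result = []
    for i in range(len(visited)):
        if visited[i] not in visited[i+1:]:
            result.append(visited[i])
    return result

# The diamond from the diagram (method bodies omitted)
class A: pass
class B(A): pass
class C(A): pass
class D(B, C): pass
\end{verbatim}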
Following this rule, referring to \method{D.save()} will return
\method{C.save()}, which is the behaviour we're after. This lookup
rule is the same as the one followed by Common Lisp. A new built-in
function, \function{super()}, provides a way to get at a class's
superclasses without having to reimplement Python's algorithm.
The most commonly used form will be
\function{super(\var{class}, \var{obj})}, which returns
a bound superclass object (not the actual class object). This form
will be used in methods to call a method in the superclass. Since
\function{super()} only works with new-style classes, assume \class{A}
is defined as a subclass of \class{object}; then \class{D}'s
\method{save()} method would look like this:

\begin{verbatim}
class D(B, C):
    def save(self):
        # Call superclass .save()
        super(D, self).save()
        # Save D's private information here
        ...
\end{verbatim}
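A complete, runnable version of the diamond is sketched below. The
method bodies are hypothetical: each \method{save()} appends a marker to
a list passed in as an argument, so the order in which the methods run
can be checked afterwards. Because \class{C}'s method also uses
\function{super()}, the call continues on to \class{A}; after the final
line runs, \code{log} holds \code{['A', 'C', 'D']}, so \class{C}'s
\method{save()} is not skipped.

\begin{verbatim}
class A(object):
    def save(self, log):
        log.append('A')

class B(A):
    pass                  # B doesn't define save()

class C(A):
    def save(self, log):
        super(C, self).save(log)
        log.append('C')

class D(B, C):
    def save(self, log):
        # The next class after D that defines save() is C, not A
        super(D, self).save(log)
        log.append('D')

log = []
D().save(log)
\end{verbatim}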
\function{super()} can also return unbound superclass objects
when called as \function{super(\var{class})} or
\function{super(\var{class1}, \var{class2})}, but this probably won't
often be useful.


\subsection{Attribute Access}

A fair number of sophisticated Python classes define hooks for
attribute access using \method{__getattr__}; most commonly this is
done for convenience, to make code more readable by automatically
mapping an attribute access such as \code{obj.parent} into a method
call such as \code{obj.get_parent()}. Python 2.2 adds some new ways
of controlling attribute access.

First, \method{__getattr__(\var{attr_name})} is still supported by
new-style classes, and nothing about it has changed. As before, it
will be called when an attempt is made to access \code{obj.foo} and the
normal lookup (in the instance's dictionary and then its classes) finds
no attribute named \samp{foo}.

New-style classes also support a new method,
\method{__getattribute__(\var{attr_name})}. The difference between
the two methods is that \method{__getattribute__} is \emph{always}
called whenever any attribute is accessed, while the old
\method{__getattr__} is only called if the normal lookup fails to find
the attribute.
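The difference is easy to observe with a small sketch. The
\class{Traced} class below is hypothetical: both hooks record the names
they are asked for, and \method{__getattribute__} hands every lookup on
to \code{object.__getattribute__} so that normal lookup still happens.
Reading the record through \code{object.__getattribute__} avoids
triggering the hook recursively.

\begin{verbatim}
class Traced(object):
    """Hypothetical class that records which hook handled each lookup."""

    def __init__(self):
        self.seen = []      # assignments don't trigger either hook
        self.real = 1

    def __getattribute__(self, name):
        # Called for *every* attribute lookup on an instance
        seen = object.__getattribute__(self, 'seen')
        seen.append(('getattribute', name))
        return object.__getattribute__(self, name)

    def __getattr__(self, name):
        # Called only after normal lookup has failed
        seen = object.__getattribute__(self, 'seen')
        seen.append(('getattr', name))
        return 'computed ' + name

t = Traced()
a = t.real       # found normally: only __getattribute__ runs
b = t.missing    # not found: __getattribute__ fails, then __getattr__
\end{verbatim}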
However, Python 2.2's support for \dfn{properties} will often be a
simpler way to trap attribute references. Writing a
\method{__getattr__} method is complicated because to avoid recursion
you can't use regular attribute accesses inside it, and instead have
to mess around with the contents of \member{__dict__}.
\method{__getattr__} methods also end up being called by Python when
it checks for other methods such as \method{__repr__} or
\method{__coerce__}, and so have to be written with this in mind.
Finally, calling a function on every attribute access results in a
sizable performance loss.

\class{property} is a new built-in type that packages up three
functions that get, set, or delete an attribute, and a docstring. For
example, if you want to define a \member{size} attribute that's
computed, but also settable, you could write:

\begin{verbatim}
class C(object):
    def get_size(self):
        result = ... computation ...
        return result
    def set_size(self, size):
        ... compute something based on the size
        and set internal state appropriately ...

    # Define a property.  The 'delete this attribute'
    # method is defined as None, so the attribute
    # can't be deleted.
    size = property(get_size, set_size,
                    None,
                    "Storage size of this instance")
\end{verbatim}
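The placeholders above stand for whatever the real class computes. As
a concrete instance of the same pattern, the hypothetical \class{Buffer}
class below stores a list of strings and exposes their total length as a
computed, settable \member{size}; assigning to \member{size} truncates
the stored text or pads it with spaces.

\begin{verbatim}
class Buffer(object):
    """Hypothetical example of a computed, settable property."""

    def __init__(self, chunks):
        self._chunks = list(chunks)    # internal state

    def get_size(self):
        # Recomputed from the internal state on every access
        return len(''.join(self._chunks))

    def set_size(self, size):
        # Truncate, or pad with spaces, to exactly 'size' characters
        data = ''.join(self._chunks)[:size]
        self._chunks = [data + ' ' * (size - len(data))]

    # No deleter is given, so 'del buf.size' raises AttributeError
    size = property(get_size, set_size, None,
                    "Storage size of this instance")

buf = Buffer(['ab', 'cde'])
before = buf.size     # 5
buf.size = 3          # keeps only 'abc'
\end{verbatim}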
That is certainly clearer and easier to write than a pair of
\method{__getattr__}/\method{__setattr__} methods that check for the
\member{size} attribute and handle it specially while retrieving all
other attributes from the instance's \member{__dict__}. Accesses to
\member{size} are also the only ones that have to perform the work of
calling a function, so references to other attributes run at
their usual speed.

Finally, it's possible to constrain the list of attributes that can be
referenced on an object using the new \member{__slots__} class attribute.
Python objects are usually very dynamic; at any time it's possible to
define a new attribute on an instance by just doing
\code{obj.new_attr=1}. This is flexible and convenient, but this
flexibility can also lead to bugs, as when you meant to write
\code{obj.template = 'a'} but made a typo and wrote
\code{obj.templtae} by accident.

A new-style class can define a class attribute named \member{__slots__}
to constrain the list of legal attribute names. An example will make
this clear:

\begin{verbatim}
>>> class C(object):
...     __slots__ = ('template', 'name')
...
>>> obj = C()
>>> print obj.template
None
>>> obj.template = 'Test'
>>> print obj.template
Test
>>> obj.templtae = None
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
AttributeError: 'C' object has no attribute 'templtae'
\end{verbatim}

Note how you get an \exception{AttributeError} on the attempt to
assign to an attribute not listed in \member{__slots__}.


\subsection{Related Links}
\label{sect-rellinks}

This section has just been a quick overview of the new features,
giving enough of an explanation to start you programming, but many
details have been simplified or ignored. Where should you go to get a
more complete picture?

\url{http://www.python.org/2.2/descrintro.html} is a lengthy tutorial
introduction to the descriptor features, written by Guido van Rossum.
If my description has whetted your appetite, go read this tutorial
next, because it goes into much more detail about the new features
while still remaining quite easy to read.

Next, there are two relevant PEPs, \pep{252} and \pep{253}. \pep{252}
is titled ``Making Types Look More Like Classes'', and covers the
descriptor API. \pep{253} is titled ``Subtyping Built-in Types'', and
describes the changes to type objects that make it possible to subtype
built-in objects. \pep{253} is the more complicated PEP of the two,
and at a few points the necessary explanations of types and meta-types
may cause your head to explode. Both PEPs were written and
implemented by Guido van Rossum, with substantial assistance from the
rest of the Zope Corp.\ team.

Finally, there's the ultimate authority: the source code. Most of the
machinery for the type handling is in \file{Objects/typeobject.c}, but
you should only resort to it after all other avenues have been
exhausted, including posting a question to python-list or python-dev.
%======================================================================
\section{PEP 234: Iterators}

Another significant addition to 2.2 is an iteration interface at both
the C and Python levels. Objects can define how callers loop over
them.

In Python versions up to 2.1, the usual way to make \code{for item in
obj} work is to define a \method{__getitem__()} method that looks
something like this:

\begin{verbatim}
    def __getitem__(self, index):
        return <next item>
\end{verbatim}
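For illustration, here is a minimal sketch of that older approach; the
\class{LineSource} class is hypothetical. Its \method{__getitem__()}
ignores \var{index} and just hands out the next item, signalling the end
by raising \exception{IndexError}, which is what the \keyword{for} loop
waits for. The last line shows the drawback: indexing the object
directly returns the wrong element.

\begin{verbatim}
class LineSource:
    """Hypothetical file-like object using the old sequence protocol."""

    def __init__(self, lines):
        self.lines = list(lines)
        self.position = 0

    def __getitem__(self, index):
        # 'index' is ignored: this assumes calls arrive as 0, 1, 2, ...
        if self.position >= len(self.lines):
            raise IndexError(index)    # tells the for loop to stop
        item = self.lines[self.position]
        self.position = self.position + 1
        return item

collected = []
for line in LineSource(['first', 'second']):
    collected.append(line)

wrong = LineSource(['a', 'b', 'c'])[2]    # returns 'a', not 'c'
\end{verbatim}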
\method{__getitem__()} is more properly used to define an indexing
operation on an object so that you can write \code{obj[5]} to retrieve
the sixth element. It's a bit misleading when you're using this only
to support \keyword{for} loops. Consider some file-like object that
wants to be looped over; the \var{index} parameter is essentially
meaningless, as the class probably assumes that a series of
\method{__getitem__()} calls will be made with \var{index}
incrementing by one each time. In other words, the presence of the
\method{__getitem__()} method doesn't mean that using \code{file[5]}
to randomly access the sixth element will work, though it really should.

In Python 2.2, iteration can be implemented separately, and
\method{__getitem__()} methods can be limited to classes that really
do support random access. The basic idea of iterators is
simple. A new built-in function, \function{iter(\var{obj})} or
\code{iter(\var{C}, \var{sentinel})}, is used to get an iterator.
\function{iter(\var{obj})} returns an iterator for the object \var{obj},
while \code{iter(\var{C}, \var{sentinel})} returns an iterator that
will invoke the callable object \var{C} until it returns
\var{sentinel} to signal that the iterator is done.

Python classes can define an \method{__iter__()} method, which should
create and return a new iterator for the object; if the object is its
own iterator, this method can just return \code{self}. In particular,
iterators will usually be their own iterators. Extension types
implemented in C can implement a \code{tp_iter} function in order to
return an iterator, and extension types that want to behave as
iterators can define a \code{tp_iternext} function.

So, after all this, what do iterators actually do? They have one
required method, \method{next()}, which takes no arguments and returns
the next value. When there are no more values to be returned, calling
\method{next()} should raise the \exception{StopIteration} exception.

\begin{verbatim}
>>> L = [1,2,3]
>>> i = iter(L)
>>> print i
<iterator object at 0x8116870>
>>> i.next()
1
>>> i.next()
2
>>> i.next()
3
>>> i.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
StopIteration
>>>
\end{verbatim}
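To tie these pieces together, here is a minimal sketch of a class-based
iterator; the class names are hypothetical. \class{Backwards} is a
container whose \method{__iter__()} creates a fresh iterator on every
call, so it can be looped over repeatedly, while
\class{BackwardsIterator} is its own iterator. Python~3 later renamed
\method{next()} to \method{__next__()}, so the sketch defines both names
for the same method.

\begin{verbatim}
class Backwards(object):
    """Hypothetical container that is looped over back to front."""

    def __init__(self, data):
        self.data = data

    def __iter__(self):
        # Each call returns a new, independent iterator
        return BackwardsIterator(self.data)

class BackwardsIterator(object):
    def __init__(self, data):
        self.data = data
        self.index = len(data)

    def __iter__(self):
        # Iterators are their own iterators
        return self

    def next(self):
        if self.index == 0:
            raise StopIteration
        self.index = self.index - 1
        return self.data[self.index]

    __next__ = next    # the same method under its Python 3 name

r = Backwards([1, 2, 3])
\end{verbatim}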
In 2.2, Python's \keyword{for} statement no longer expects a sequence;
it expects something for which \function{iter()} will return an iterator.
For backward compatibility and convenience, an iterator is
automatically constructed for sequences that don't implement
\method{__iter__()} or a \code{tp_iter} slot, so \code{for i in
[1,2,3]} will still work. Wherever the Python interpreter loops over
a sequence, it's been changed to use the iterator protocol. This
means you can do things like this:

\begin{verbatim}
>>> L = [1,2,3]
>>> i = iter(L)
>>> a,b,c = i
>>> a,b,c
(1, 2, 3)
\end{verbatim}

Iterator support has been added to some of Python's basic types.
Calling \function{iter()} on a dictionary will return an iterator
that loops over its keys, in no particular order:

\begin{verbatim}
>>> m = {'Jan': 1, 'Feb': 2, 'Mar': 3, 'Apr': 4, 'May': 5, 'Jun': 6,
...      'Jul': 7, 'Aug': 8, 'Sep': 9, 'Oct': 10, 'Nov': 11, 'Dec': 12}
>>> for key in m: print key, m[key]
...
Mar 3
Feb 2
Aug 8
Sep 9
May 5
Jun 6
Jul 7
Jan 1
Apr 4
Nov 11
Dec 12
Oct 10
\end{verbatim}

That's just the default behaviour. If you want to iterate over keys,
values, or key/value pairs, you can explicitly call the
\method{iterkeys()}, \method{itervalues()}, or \method{iteritems()}
methods to get an appropriate iterator. In a minor related change,
the \keyword{in} operator now works on dictionaries, so
\code{\var{key} in dict} is now equivalent to
\code{dict.has_key(\var{key})}.

Files also provide an iterator, which calls the \method{readline()}
method until there are no more lines in the file. This means you can
now read each line of an open file object \var{f} using code like this:

\begin{verbatim}
for line in f:
    # do something for each line
    ...
\end{verbatim}
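The two-argument form \code{iter(\var{C}, \var{sentinel})} fits the same
job: it calls a \method{readline()}-style function until that function
returns the empty string. In this sketch the hypothetical
\function{next_line()} function stands in for a file's
\method{readline()}, so the example doesn't depend on any real file;
\function{read_until_blank()} is likewise an invented name.

\begin{verbatim}
def read_until_blank(lines):
    """Collect lines until the sentinel '' is returned (hypothetical)."""
    pending = list(lines)

    def next_line():
        # Mimics readline(): returns '' once the input is exhausted
        if pending:
            return pending.pop(0)
        return ''

    result = []
    # iter(callable, sentinel) calls next_line() until it returns ''
    for line in iter(next_line, ''):
        result.append(line)
    return result
\end{verbatim}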
Note that you can only go forward in an iterator; there's no way to
get the previous element, reset the iterator, or make a copy of it.
An iterator object could provide such additional capabilities, but the
iterator protocol only requires a \method{next()} method.

\begin{seealso}

\seepep{234}{Iterators}{Written by Ka-Ping Yee and GvR; implemented
by the Python Labs crew, mostly by GvR and Tim Peters.}

\end{seealso}


%======================================================================
\section{PEP 255: Simple Generators}

Generators are another new feature, one that interacts with the
introduction of iterators.

You're doubtless familiar with how function calls work in Python or
C. When you call a function, it gets a private namespace where its local
variables are created. When the function reaches a \keyword{return}
statement, the local variables are destroyed and the resulting value
is returned to the caller. A later call to the same function will get
a fresh new set of local variables. But, what if the local variables
weren't thrown away on exiting a function? What if you could later
resume the function where it left off? This is what generators
provide; they can be thought of as resumable functions.

Here's the simplest example of a generator function:

\begin{verbatim}
def generate_ints(N):
    for i in range(N):
        yield i
\end{verbatim}

A new keyword, \keyword{yield}, was introduced for generators. Any
function containing a \keyword{yield} statement is a generator
function; this is detected by Python's bytecode compiler, which
compiles the function specially as a result. Because a new keyword was
introduced, generators must be explicitly enabled in a module by
including a \code{from __future__ import generators} statement near
the top of the module's source code. In Python 2.3 this statement
will become unnecessary.

When you call a generator function, it doesn't return a single value;
instead it returns a generator object that supports the iterator
protocol. On executing the \keyword{yield} statement, the generator
outputs the value of \code{i}, similar to a \keyword{return}
statement. The big difference between \keyword{yield} and a
\keyword{return} statement is that on reaching a \keyword{yield} the
generator's state of execution is suspended and local variables are
preserved. On the next call to the generator's \code{.next()} method,
the function will resume executing immediately after the
\keyword{yield} statement. (For complicated reasons, the
\keyword{yield} statement isn't allowed inside the \keyword{try} block
of a \code{try...finally} statement; read \pep{255} for a full
explanation of the interaction between \keyword{yield} and
exceptions.)

Here's a sample usage of the \function{generate_ints} generator:

\begin{verbatim}
>>> gen = generate_ints(3)
>>> gen
<generator object at 0x8117f90>
>>> gen.next()
0
>>> gen.next()
1
>>> gen.next()
2
>>> gen.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "<stdin>", line 2, in generate_ints
StopIteration
\end{verbatim}

You could equally write \code{for i in generate_ints(5)}, or
\code{a,b,c = generate_ints(3)}.

Inside a generator function, the \keyword{return} statement can only
be used without a value, and signals the end of the procession of
values; afterwards the generator cannot return any further values.
\keyword{return} with a value, such as \code{return 5}, is a syntax
error inside a generator function. The end of the generator's results
can also be indicated by raising \exception{StopIteration} manually,
or by just letting the flow of execution fall off the bottom of the
function.

You could achieve the effect of generators manually by writing your
own class and storing all the local variables of the generator as
instance variables. For example, producing a sequence of integers could
be done by setting \code{self.count} to 0, and having the
\method{next()} method increment \code{self.count} and return it.
However, for a moderately complicated generator, writing a
corresponding class would be much messier.
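For comparison, a hand-written class equivalent to
\function{generate_ints()} might look like the following sketch; the
class name \class{IntGenerator} is hypothetical. The loop variable
becomes the instance attribute \code{self.count}, and the end of the
sequence has to be signalled explicitly with \exception{StopIteration}.
The method is also made available as \method{__next__()}, the name
Python~3 uses for it.

\begin{verbatim}
class IntGenerator(object):
    """Hypothetical hand-written equivalent of generate_ints(N)."""

    def __init__(self, N):
        self.N = N
        self.count = 0          # the generator's local state

    def __iter__(self):
        return self

    def next(self):
        if self.count >= self.N:
            raise StopIteration
        value = self.count
        self.count = self.count + 1
        return value

    __next__ = next             # Python 3 name for the same method
\end{verbatim}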
\file{Lib/test/test_generators.py} contains a number of more
interesting examples. The simplest one implements an in-order
traversal of a tree using generators recursively.

\begin{verbatim}
# A recursive generator that yields a tree's labels in-order.
def inorder(t):
    if t:
        for x in inorder(t.left):
            yield x
        yield t.label
        for x in inorder(t.right):
            yield x
\end{verbatim}
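The example assumes a tree class with \member{left}, \member{right},
and \member{label} attributes. Here is a self-contained sketch with a
hypothetical minimal \class{Tree} class, where an empty subtree is
represented by \code{None} (so the \code{if t:} test stops the
recursion). On Python 2.2 itself the module would also need the
\code{from __future__ import generators} statement described earlier.

\begin{verbatim}
class Tree(object):
    """Hypothetical binary tree node; a missing subtree is None."""

    def __init__(self, label, left=None, right=None):
        self.label = label
        self.left = left
        self.right = right

def inorder(t):
    # Left subtree, then the node itself, then the right subtree
    if t:
        for x in inorder(t.left):
            yield x
        yield t.label
        for x in inorder(t.right):
            yield x

tree = Tree('b', Tree('a'), Tree('d', Tree('c'), Tree('e')))
\end{verbatim}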
Two other examples in \file{Lib/test/test_generators.py} produce
solutions for the N-Queens problem (placing $N$ queens on an
$N \times N$ chessboard so that no queen threatens another) and the
Knight's Tour (a route that takes a knight to every square of an
$N \times N$ chessboard without visiting any square twice).

The idea of generators comes from other programming languages,
especially Icon (\url{http://www.cs.arizona.edu/icon/}), where the
idea of generators is central. In Icon, every
expression and function call behaves like a generator. One example
from ``An Overview of the Icon Programming Language'' at
\url{http://www.cs.arizona.edu/icon/docs/ipd266.htm} gives an idea of
what this looks like:

\begin{verbatim}
sentence := "Store it in the neighboring harbor"
if (i := find("or", sentence)) > 5 then write(i)
\end{verbatim}

In Icon the \function{find()} function returns the indexes at which the
substring ``or'' is found: 3, 23, 33. In the \keyword{if} statement,
\code{i} is first assigned a value of 3, but 3 is less than 5, so the
comparison fails, and Icon retries it with the second value of 23. 23
is greater than 5, so the comparison now succeeds, and the code prints
the value 23 to the screen.
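A rough Python counterpart can be sketched with a generator, although
Python has no automatic retrying, so the loop that draws successive
values must be written out. The functions \function{find_all()} and
\function{first_greater()} are hypothetical; \function{find_all()}
reports 1-based positions to match Icon's numbering.

\begin{verbatim}
def find_all(sub, s):
    """Yield every 1-based position of sub in s, like Icon's find()."""
    pos = s.find(sub)
    while pos != -1:
        yield pos + 1              # Icon strings are indexed from 1
        pos = s.find(sub, pos + 1)

def first_greater(limit, values):
    # Emulate Icon's retry: keep drawing values from the generator
    # until one satisfies the comparison
    for i in values:
        if i > limit:
            return i
    return None

sentence = "Store it in the neighboring harbor"
\end{verbatim}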
|
|
|
|
Python doesn't go nearly as far as Icon in adopting generators as a
|
|
central concept. Generators are considered a new part of the core
|
|
Python language, but learning or using them isn't compulsory; if they
|
|
don't solve any problems that you have, feel free to ignore them.
|
|
One novel feature of Python's interface as compared to
|
|
Icon's is that a generator's state is represented as a concrete object
|
|
(the iterator) that can be passed around to other functions or stored
|
|
in a data structure.
\begin{seealso}

\seepep{255}{Simple Generators}{Written by Neil Schemenauer, Tim
Peters, Magnus Lie Hetland. Implemented mostly by Neil Schemenauer
and Tim Peters, with other fixes from the Python Labs crew.}

\end{seealso}


%======================================================================
\section{PEP 237: Unifying Long Integers and Integers}
In recent versions, the distinction between regular integers, which
are 32-bit values on most machines, and long integers, which can be of
arbitrary size, was becoming an annoyance. For example, on platforms
that support files larger than \code{2**32} bytes, the
\method{tell()} method of file objects has to return a long integer.
However, there were various bits of Python that expected plain
integers and would raise an error if a long integer was provided
instead. For example, in Python 1.5, only regular integers
could be used as a slice index, and \code{'abc'[1L:]} would raise a
\exception{TypeError} exception with the message ``slice index must be
int''.

Python 2.2 will shift values from short to long integers as required.
The \samp{L} suffix is no longer needed to indicate a long integer literal,
as now the compiler will choose the appropriate type. (Using the \samp{L}
suffix will be discouraged in future 2.x versions of Python,
triggering a warning in Python 2.4, and probably dropped in Python
3.0.) Many operations that used to raise an \exception{OverflowError}
will now return a long integer as their result. For example:

\begin{verbatim}
>>> 1234567890123
1234567890123L
>>> 2 ** 64
18446744073709551616L
\end{verbatim}

In most cases, integers and long integers will now be treated
identically. You can still distinguish them with the
\function{type()} built-in function, but that's rarely needed.
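As a minimal sketch (the \function{double_until()} name is made up for illustration), the loop below keeps doubling an ordinary integer well past \code{2**31}. On a 32-bit build of Python 2.1 the multiplication would eventually raise \exception{OverflowError}; under 2.2 the value quietly becomes a long integer, and the result can still be used wherever an integer is expected, such as a slice index.

```python
def double_until(limit):
    """Double an ordinary integer until it reaches limit; return (value, steps)."""
    value, steps = 1, 0
    while value < limit:
        value = value * 2      # past 2**31 - 1 this silently becomes a long in 2.2
        steps = steps + 1
    return value, steps

value, steps = double_until(10 ** 20)   # ends at 2**67 after 67 doublings

# A value produced by long-integer arithmetic is accepted as a slice index.
index = (2 ** 64) // (2 ** 63)          # 2
tail = "abcdef"[index:]                 # 'cdef'
```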
\begin{seealso}

\seepep{237}{Unifying Long Integers and Integers}{Written by
Moshe Zadka and Guido van Rossum. Implemented mostly by Guido van
Rossum.}

\end{seealso}


%======================================================================
\section{PEP 238: Changing the Division Operator}
The most controversial change in Python 2.2 heralds the start of an effort
to fix an old design flaw that's been in Python from the beginning.
Currently Python's division operator, \code{/}, behaves much like C's
division operator when presented with two integer arguments: it
returns an integer result, discarding any fractional part. (Python
rounds such results down, toward negative infinity.) For example,
\code{3/2} is 1, not 1.5, and \code{(-1)/2} is -1, not -0.5. This
means that the result of a division can vary unexpectedly depending on
the types of the two operands, and because Python is dynamically
typed, it can be difficult to determine the possible types of the
operands.

(The controversy is over whether this is \emph{really} a design flaw,
and whether it's worth breaking existing code to fix this. It's
caused endless discussions on python-dev, and in July 2001 erupted into a
storm of acidly sarcastic postings on \newsgroup{comp.lang.python}. I
won't argue for either side here and will stick to describing what's
implemented in 2.2. Read \pep{238} for a summary of arguments and
counter-arguments.)

Because this change might break code, it's being introduced very
gradually. Python 2.2 begins the transition, but the switch won't be
complete until Python 3.0.

First, I'll borrow some terminology from \pep{238}. ``True division'' is the
division that most non-programmers are familiar with: 3/2 is 1.5, 1/4
is 0.25, and so forth. ``Floor division'' is what Python's \code{/}
operator currently does when given integer operands; the result is the
floor of the value returned by true division. ``Classic division'' is
the current mixed behaviour of \code{/}; it returns the result of
floor division when the operands are integers, and returns the result
of true division when one of the operands is a floating-point number.

Here are the changes 2.2 introduces:
\begin{itemize}

\item A new operator, \code{//}, is the floor division operator.
(Yes, we know it looks like \Cpp's comment symbol.) \code{//}
\emph{always} performs floor division no matter what the types of
its operands are, so \code{1 // 2} is 0 and \code{1.0 // 2.0} is also
0.0.

\code{//} is always available in Python 2.2; you don't need to enable
it using a \code{__future__} statement.

\item Including a \code{from __future__ import division} statement in a
module changes the \code{/} operator to return the result of
true division, so \code{1/2} is 0.5. Without the \code{__future__}
statement, \code{/} still means classic division. The default meaning
of \code{/} will not change until Python 3.0.

\item Classes can define methods called \method{__truediv__} and
\method{__floordiv__} to overload the two division operators. At the
C level, there are also slots in the \code{PyNumberMethods} structure
so extension types can define the two operators.

\item Python 2.2 supports some command-line arguments for testing
whether code will work with the changed division semantics. Running
python with \programopt{-Q warn} will cause a warning to be issued
whenever division is applied to two integers. You can use this to
find code that's affected by the change and fix it. By default,
Python 2.2 will simply perform classic division without a warning; the
warning will be turned on by default in Python 2.3.

\end{itemize}
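The following sketch puts these pieces together in one module. It relies only on \code{//}, \code{from __future__ import division} (which must appear at the top of the module), and the two special methods described above; the \function{classic_div()} helper, which spells out the old mixed behaviour, and the \class{Duration} class are invented for illustration. A class that must also work in modules without the \code{__future__} statement would additionally define \method{__div__}, which classic division calls.

```python
from __future__ import division   # '/' means true division in this module

try:
    INTEGER_TYPES = (int, long)   # Python 2.x has two integer types
except NameError:
    INTEGER_TYPES = (int,)        # later versions have only int

def classic_div(a, b):
    """Emulate classic '/': floor division for two integers, else true division."""
    if isinstance(a, INTEGER_TYPES) and isinstance(b, INTEGER_TYPES):
        return a // b
    return a / b

class Duration(object):
    """Hypothetical class overloading both division operators."""
    def __init__(self, seconds):
        self.seconds = seconds

    def __truediv__(self, n):     # called for '/' once true division is enabled
        return Duration(self.seconds / n)

    def __floordiv__(self, n):    # always called for '//'
        return Duration(self.seconds // n)
```

With this module, \code{3 / 2} is 1.5, \code{3 // 2} and \code{classic_div(3, 2)} are both 1, and \code{Duration(7) // 2} holds 3 seconds while \code{Duration(7) / 2} holds 3.5.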
\begin{seealso}

\seepep{238}{Changing the Division Operator}{Written by Moshe Zadka and
Guido van Rossum. Implemented by Guido van Rossum.}

\end{seealso}


%======================================================================
\section{Unicode Changes}
Python's Unicode support has been enhanced a bit in 2.2. By default,
Unicode strings are stored as UCS-2, that is, as 16-bit unsigned
integers. Python 2.2 can also be compiled to use UCS-4, 32-bit unsigned
integers, as its internal encoding by supplying
\longprogramopt{enable-unicode=ucs4} to the configure script.
(It's also possible to specify
\longprogramopt{disable-unicode} to completely disable Unicode
support.)

When built to use UCS-4 (a ``wide Python''), the interpreter can
natively handle Unicode characters from U+000000 to U+10FFFF, so the
range of legal values for the \function{unichr()} function is expanded
accordingly. In an interpreter compiled to use UCS-2 (a ``narrow
Python''), values greater than 65535 will still cause
\function{unichr()} to raise a \exception{ValueError} exception.
This is all described in \pep{261}, ``Support for `wide' Unicode
characters''; consult it for further details.

Another change is simpler to explain. Since their introduction,
Unicode strings have supported an \method{encode()} method to convert
the string to a selected encoding such as UTF-8 or Latin-1. A
symmetric \method{decode(\optional{\var{encoding}})} method has been
added to 8-bit strings (though not to Unicode strings) in 2.2.
\method{decode()} assumes that the string is in the specified encoding
and decodes it, returning whatever is returned by the codec.

Using this new feature, codecs have been added for tasks not directly
related to Unicode. For example, codecs have been added for
uu-encoding, MIME's base64 encoding, and compression with the
\module{zlib} module:
\begin{verbatim}
>>> s = """Here is a lengthy piece of redundant, overly verbose,
... and repetitive text.
... """
>>> data = s.encode('zlib')
>>> data
'x\x9c\r\xc9\xc1\r\x80 \x10\x04\xc0?Ul...'
>>> data.decode('zlib')
'Here is a lengthy piece of redundant, overly verbose,\nand repetitive text.\n'
>>> print s.encode('uu')
begin 666 <data>
M2&5R92!I<R!A(&QE;F=T:'D@<&EE8V4@;V8@<F5D=6YD86YT+"!O=F5R;'D@
>=F5R8F]S92P*86YD(')E<&5T:71I=F4@=&5X="X*

end
>>> "sheesh".encode('rot-13')
'furrfu'
\end{verbatim}

A class can also define a \method{__unicode__} method, analogous to
\method{__str__}, to control how its instances are converted to
Unicode.

\method{encode()}, \method{decode()}, and \method{__unicode__} were
implemented by Marc-Andr\'e Lemburg. The changes to support using
UCS-4 internally were implemented by Fredrik Lundh and Martin von
L\"owis.
\begin{seealso}

\seepep{261}{Support for `wide' Unicode characters}{Written by
Paul Prescod.}

\end{seealso}


%======================================================================
\section{PEP 227: Nested Scopes}
In Python 2.1, statically nested scopes were added as an optional
feature, to be enabled by a \code{from __future__ import
nested_scopes} directive. In 2.2 nested scopes no longer need to be
specially enabled, and are now always present. The rest of this section
is a copy of the description of nested scopes from my ``What's New in
Python 2.1'' document; if you read it when 2.1 came out, you can skip
the rest of this section.

The largest change introduced in Python 2.1, and made complete in 2.2,
is to Python's scoping rules. In Python 2.0, at any given time there
are at most three namespaces used to look up variable names: local,
module-level, and the built-in namespace. This often surprised people
because it didn't match their intuitive expectations. For example, a
nested recursive function definition doesn't work:
\begin{verbatim}
def f():
    ...
    def g(value):
        ...
        return g(value-1) + 1
    ...
\end{verbatim}

Under the old rules, the function \function{g()} will always raise a
\exception{NameError} exception, because the binding of the name
\samp{g} isn't in either its local namespace or in the module-level
namespace. This isn't much of a problem in practice (how often do you
recursively define interior functions like this?), but the same rule
made the \keyword{lambda} expression clumsier to use, and that was a
problem in practice. In code which uses \keyword{lambda} you can often
find local variables being copied by passing them as the default
values of arguments.
\begin{verbatim}
def find(self, name):
    "Return list of any entries equal to 'name'"
    L = filter(lambda x, name=name: x == name,
               self.list_attribute)
    return L
\end{verbatim}

The readability of Python code written in a strongly functional style
suffers greatly as a result.

The fix, introduced in 2.1 and now always enabled, is that static
scoping has been added to the language. As a first effect,
the \code{name=name} default argument is now unnecessary in the above
example. Put simply, when a given variable name is not assigned a
value within a function (by an assignment, or the \keyword{def},
\keyword{class}, or \keyword{import} statements), references to the
variable will be looked up in the local namespace of the enclosing
scope. A more detailed explanation of the rules, and a dissection of
the implementation, can be found in the PEP.

This change may cause some compatibility problems for code where the
same variable name is used both at the module level and as a local
variable within a function that contains further function definitions.
This seems rather unlikely though, since such code would have been
pretty confusing to read in the first place.

One side effect of the change is that the \code{from \var{module}
import *} and \keyword{exec} statements have been made illegal inside
a function scope under certain conditions. The Python reference
manual has said all along that \code{from \var{module} import *} is
only legal at the top level of a module, but the CPython interpreter
has never enforced this before. As part of the implementation of
nested scopes, the compiler which turns Python source into bytecodes
has to generate different code to access variables in a containing
scope. \code{from \var{module} import *} and \keyword{exec} make it
impossible for the compiler to figure this out, because they add names
to the local namespace that are unknowable at compile time.
Therefore, if a function that uses either statement also contains
function definitions or \keyword{lambda} expressions with free
variables, the compiler will flag this by raising a
\exception{SyntaxError} exception.

To make the preceding explanation a bit clearer, here's an example:
\begin{verbatim}
x = 1
def f():
    # The next line is a syntax error
    exec 'x=2'
    def g():
        return x
\end{verbatim}

Line 4, containing the \keyword{exec} statement, is a syntax error,
since \keyword{exec} would define a new local variable named \samp{x}
whose value should be accessed by \function{g()}.

This shouldn't be much of a limitation, since \keyword{exec} is rarely
used in most Python code (and when it is used, it's often a sign of a
poor design anyway).
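Under the new rules, both of the earlier examples work once the workarounds are dropped. The sketch below is illustrative only: the \function{factorial()} wrapper and the \class{EntryList} class are invented to give the nested function and the \method{find()} method from this section something concrete to run in.

```python
def factorial(n):
    """Return n! using a nested recursive helper."""
    def g(value):
        if value <= 1:
            return 1
        # 'g' is not local to g(); it is found in factorial()'s scope.
        return value * g(value - 1)
    return g(n)

class EntryList:
    """Hypothetical container for the find() method shown earlier."""
    def __init__(self, entries):
        self.list_attribute = list(entries)

    def find(self, name):
        "Return list of any entries equal to 'name'"
        # No name=name default needed: the lambda sees find()'s 'name' directly.
        return list(filter(lambda x: x == name, self.list_attribute))
```

For example, \code{factorial(5)} returns 120, and \code{EntryList(['a', 'b', 'a']).find('a')} returns \code{['a', 'a']}.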
\begin{seealso}

\seepep{227}{Statically Nested Scopes}{Written and implemented by
Jeremy Hylton.}

\end{seealso}


%======================================================================
\section{New and Improved Modules}
\begin{itemize}

\item The \module{xmlrpclib} module was contributed to the standard
library by Fredrik Lundh, providing support for writing XML-RPC
clients. XML-RPC is a simple remote procedure call protocol built on
top of HTTP and XML. For example, the following snippet retrieves a
list of RSS channels from the O'Reilly Network, and then
lists the recent headlines for one channel:
\begin{verbatim}
import xmlrpclib
s = xmlrpclib.Server(
      'http://www.oreillynet.com/meerkat/xml-rpc/server.php')
channels = s.meerkat.getChannels()
# channels is a list of dictionaries, like this:
# [{'id': 4, 'title': 'Freshmeat Daily News'},
#  {'id': 190, 'title': '32Bits Online'},
#  {'id': 4549, 'title': '3DGamers'}, ... ]

# Get the items for one channel
items = s.meerkat.getItems( {'channel': 4} )

# 'items' is another list of dictionaries, like this:
# [{'link': 'http://freshmeat.net/releases/52719/',
#   'description': 'A utility which converts HTML to XSL FO.',
#   'title': 'html2fo 0.3 (Default)'}, ... ]
\end{verbatim}

The \module{SimpleXMLRPCServer} module makes it easy to create
straightforward XML-RPC servers. See \url{http://www.xmlrpc.com/} for
more information about XML-RPC.
\item The new \module{hmac} module implements the HMAC
algorithm described by \rfc{2104}.
(Contributed by Gerhard H\"aring.)

\item Several functions that originally returned lengthy tuples now
return pseudo-sequences that still behave like tuples but also have
mnemonic attributes such as \member{st_mtime} or \member{tm_year}.
The enhanced functions include \function{stat()},
\function{fstat()}, \function{statvfs()}, and \function{fstatvfs()}
in the \module{os} module, and \function{localtime()},
\function{gmtime()}, and \function{strptime()} in the \module{time}
module.

For example, to obtain a file's size using the old tuples, you'd end
up writing something like \code{file_size =
os.stat(filename)[stat.ST_SIZE]}, but now this can be written more
clearly as \code{file_size = os.stat(filename).st_size}.

The original patch for this feature was contributed by Nick Mathewson.
\item The Python profiler has been extensively reworked and various
errors in its output have been corrected. (Contributed by
Fred~L. Drake, Jr. and Tim Peters.)
\item The \module{socket} module can be compiled to support IPv6;
specify the \longprogramopt{enable-ipv6} option to Python's configure
script. (Contributed by Jun-ichiro ``itojun'' Hagino.)

\item Two new format characters were added to the \module{struct}
module for 64-bit integers on platforms that support the C
\ctype{long long} type. \samp{q} is for a signed 64-bit integer,
and \samp{Q} is for an unsigned one. The value is returned in
Python's long integer type. (Contributed by Tim Peters.)

\item In the interpreter's interactive mode, there's a new built-in
function \function{help()} that uses the \module{pydoc} module
introduced in Python 2.1 to provide interactive help.
\code{help(\var{object})} displays any available help text about
\var{object}. \code{help()} with no argument puts you in an online
help utility, where you can enter the names of functions, classes,
or modules to read their help text.
(Contributed by Guido van Rossum, using Ka-Ping Yee's \module{pydoc} module.)
\item Various bugfixes and performance improvements have been made
to the SRE engine underlying the \module{re} module. For example,
the \function{re.sub()} and \function{re.split()} functions have
been rewritten in C. Another contributed patch speeds up certain
Unicode character ranges by a factor of two, and a new \method{finditer()}
method returns an iterator over all the non-overlapping matches in
a given string.
(SRE is maintained by
Fredrik Lundh. The BIGCHARSET patch was contributed by Martin von
L\"owis.)
\item The \module{smtplib} module now supports \rfc{2487}, ``Secure
SMTP over TLS'', so it's now possible to encrypt the SMTP traffic
between a Python program and the mail transport agent being handed a
message. \module{smtplib} also supports SMTP authentication.
(Contributed by Gerhard H\"aring.)

\item The \module{imaplib} module, maintained by Piers Lauder, has
support for several new extensions: the NAMESPACE extension defined
in \rfc{2342}, SORT, GETACL and SETACL. (Contributed by Anthony
Baxter and Michel Pelletier.)

% XXX should the 'email' module get a section of its own?
\item The \module{rfc822} module's parsing of email addresses is now
compliant with \rfc{2822}, an update to \rfc{822}. (The module's
name is \emph{not} going to be changed to \samp{rfc2822}.) A new
package, \module{email}, has also been added for parsing and
generating e-mail messages. (Contributed by Barry Warsaw, and
arising out of his work on Mailman.)

\item The \module{difflib} module now contains a new \class{Differ}
class for producing human-readable lists of changes (a ``delta'')
between two sequences of lines of text. There are also two
generator functions, \function{ndiff()} and \function{restore()},
which respectively return a delta from two sequences, or one of the
original sequences from a delta. (Grunt work contributed by David
Goodger, from ndiff.py code by Tim Peters who then did the
generatorization.)

\item New constants \constant{ascii_letters},
\constant{ascii_lowercase}, and \constant{ascii_uppercase} were
added to the \module{string} module. There were several modules in
the standard library that used \constant{string.letters} to mean the
ranges A-Za-z, but that assumption is incorrect when locales are in
use, because \constant{string.letters} varies depending on the set
of legal characters defined by the current locale. The buggy
modules have all been fixed to use \constant{ascii_letters} instead.
(Reported by an unknown person; fixed by Fred~L. Drake, Jr.)

\item The \module{mimetypes} module now makes it easier to use
alternative MIME-type databases by the addition of a
\class{MimeTypes} class, which takes a list of filenames to be
parsed. (Contributed by Fred~L. Drake, Jr.)

\item A \class{Timer} class was added to the \module{threading}
module that allows scheduling an activity to happen at some future
time. (Contributed by Itamar Shtull-Trauring.)

\end{itemize}
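The sketch below exercises several of the modules listed above in one place. It is written for a current Python 3 interpreter, so it uses bytes literals and the \module{hashlib} module (which postdates 2.2) to choose SHA-256 for \function{hmac.new()}; the file contents, key, message, and sample lines are placeholders. The \function{re.finditer()} offsets are 0-based, so they come out one less than Icon's positions in the generators section.

```python
import difflib
import hashlib
import hmac
import os
import re
import string
import struct
import tempfile

# os.stat() results have named fields; st_size replaces the ST_SIZE tuple index.
fd, path = tempfile.mkstemp()
os.write(fd, b"hello")
os.close(fd)
file_size = os.stat(path).st_size
os.remove(path)

# struct: 'q' is a signed 64-bit integer, 'Q' an unsigned one.
packed = struct.pack("<qQ", -1, 2 ** 64 - 1)
signed, unsigned = struct.unpack("<qQ", packed)

# re.finditer(): an iterator over non-overlapping matches.
sentence = "Store it in the neighboring harbor"
offsets = [m.start() for m in re.finditer("or", sentence)]

# string.ascii_letters is always A-Z and a-z, whatever the locale.
is_ascii_word = all(c in string.ascii_letters for c in "Python")

# hmac: an RFC 2104 keyed hash; SHA-256 is an arbitrary choice here.
tag = hmac.new(b"secret key", b"message", hashlib.sha256).hexdigest()

# difflib: ndiff() builds a delta, restore() recovers either original sequence.
old = ["one\n", "two\n", "three\n"]
new = ["one\n", "2\n", "three\n"]
delta = list(difflib.ndiff(old, new))
recovered = list(difflib.restore(delta, 2))
```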
%======================================================================
\section{Interpreter Changes and Fixes}

Some of the changes only affect people who deal with the Python
interpreter at the C level because they're writing Python extension modules,
embedding the interpreter, or just hacking on the interpreter itself.
If you only write Python code, none of the changes described here will
affect you very much.

\begin{itemize}

\item Profiling and tracing functions can now be implemented in C,
which can operate at much higher speeds than Python-based functions
and should reduce the overhead of profiling and tracing. This
will be of interest to authors of development environments for
Python. Two new C functions were added to Python's API,
\cfunction{PyEval_SetProfile()} and \cfunction{PyEval_SetTrace()}.
The existing \function{sys.setprofile()} and
\function{sys.settrace()} functions still exist, and have simply
been changed to use the new C-level interface. (Contributed by
Fred~L. Drake, Jr.)
\item Another low-level API, primarily of interest to implementors
of Python debuggers and development tools, was added.
\cfunction{PyInterpreterState_Head()} and
\cfunction{PyInterpreterState_Next()} let a caller walk through all
the existing interpreter objects;
\cfunction{PyInterpreterState_ThreadHead()} and
\cfunction{PyThreadState_Next()} allow looping over all the thread
states for a given interpreter. (Contributed by David Beazley.)

\item A new \samp{et} format sequence was added to
\cfunction{PyArg_ParseTuple()}; \samp{et} takes both a parameter and
an encoding name, and converts the parameter to the given encoding
if the parameter turns out to be a Unicode string, or leaves it
alone if it's an 8-bit string, assuming it to already be in the
desired encoding. This differs from the \samp{es} format character,
which assumes that 8-bit strings are in Python's default ASCII
encoding and converts them to the specified new encoding.
(Contributed by M.-A. Lemburg, and used for the MBCS support on
Windows described in the following section.)

\item A different argument parsing function,
\cfunction{PyArg_UnpackTuple()}, has been added that's simpler and
presumably faster. Instead of specifying a format string, the
caller simply gives the minimum and maximum number of arguments
expected, and a set of pointers to \code{PyObject*} variables that
will be filled in with argument values.

\item Two new flags \constant{METH_NOARGS} and \constant{METH_O} are
available in method definition tables to simplify implementation of
methods with no arguments or a single untyped argument. Calling
such methods is more efficient than calling a corresponding method
that uses \constant{METH_VARARGS}.
Also, the old \constant{METH_OLDARGS} style of writing C methods is
now officially deprecated.

\item Two new wrapper functions, \cfunction{PyOS_snprintf()} and
\cfunction{PyOS_vsnprintf()}, were added to provide
cross-platform implementations for the relatively new
\cfunction{snprintf()} and \cfunction{vsnprintf()} C lib APIs. In
contrast to the standard \cfunction{sprintf()} and
\cfunction{vsprintf()} functions, the Python versions check the
bounds of the buffer to protect against buffer overruns.
(Contributed by M.-A. Lemburg.)

\item The \cfunction{_PyTuple_Resize()} function has lost an unused
parameter, so now it takes 2 parameters instead of 3. The third
argument was never used, and can simply be discarded when porting
code from earlier versions to Python 2.2.

\end{itemize}
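At the Python level nothing changes: a profiling hook is still installed with \function{sys.setprofile()}, which now goes through the new C-level interface underneath. The following sketch counts how many times each Python function is entered; the \function{count_calls()} helper and the \function{fib()} workload are invented for illustration.

```python
import sys

def count_calls(func, *args):
    """Run func(*args) under a profile hook; count Python-level calls by name."""
    counts = {}

    def profiler(frame, event, arg):
        # 'call' fires on entry to a Python function; C functions report 'c_call'.
        if event == "call":
            name = frame.f_code.co_name
            counts[name] = counts.get(name, 0) + 1

    sys.setprofile(profiler)
    try:
        result = func(*args)
    finally:
        sys.setprofile(None)        # always remove the hook again
    return result, counts

def fib(n):
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

value, counts = count_calls(fib, 10)   # fib(10) is 55, reached via 177 calls
```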
%======================================================================
\section{Other Changes and Fixes}

As usual there were a bunch of other improvements and bugfixes
scattered throughout the source tree. A search through the CVS change
logs finds there were 527 patches applied, and 683 bugs fixed; both
figures are likely to be underestimates. Some of the more notable
changes are:

\begin{itemize}

\item The code for the MacOS port for Python, maintained by Jack
Jansen, is now kept in the main Python CVS tree, and many changes
have been made to support MacOS~X.

The most significant change is the ability to build Python as a
framework, enabled by supplying the \longprogramopt{enable-framework}
option to the configure script when compiling Python. According to
Jack Jansen, ``This installs a self-contained Python installation plus
the OS~X framework `glue' into
\file{/Library/Frameworks/Python.framework} (or another location of
choice). For now there is little immediate added benefit to this
(actually, there is the disadvantage that you have to change your PATH
to be able to find Python), but it is the basis for creating a
full-blown Python application, porting the MacPython IDE, possibly
using Python as a standard OSA scripting language and much more.''

Most of the MacPython toolbox modules, which interface to MacOS APIs
such as windowing, QuickTime, scripting, etc., have been ported to OS~X,
but they've been left commented out in \file{setup.py}. People who want
to experiment with these modules can uncomment them manually.
% Jack's original comments:
%The main change is the possibility to build Python as a
%framework. This installs a self-contained Python installation plus the
%OSX framework "glue" into /Library/Frameworks/Python.framework (or
%another location of choice). For now there is little immedeate added
%benefit to this (actually, there is the disadvantage that you have to
%change your PATH to be able to find Python), but it is the basis for
%creating a fullblown Python application, porting the MacPython IDE,
%possibly using Python as a standard OSA scripting language and much
%more. You enable this with "configure --enable-framework".

%The other change is that most MacPython toolbox modules, which
%interface to all the MacOS APIs such as windowing, quicktime,
%scripting, etc. have been ported. Again, most of these are not of
%immedeate use, as they need a full application to be really useful, so
%they have been commented out in setup.py. People wanting to experiment
%can uncomment them. Gestalt and Internet Config modules are enabled by
%default.
\item Keyword arguments passed to built-in functions that don't take them
now cause a \exception{TypeError} exception to be raised, with the
message ``\var{function} takes no keyword arguments''.
\item Weak references, added in Python 2.1 as an extension module,
are now part of the core because they're used in the implementation
of new-style classes. The \exception{ReferenceError} exception has
therefore moved from the \module{weakref} module to become a
built-in exception.

\item A new script, \file{Tools/scripts/cleanfuture.py} by Tim
Peters, automatically removes obsolete \code{__future__} statements
from Python source code.

\item An additional \var{flags} argument has been added to the
built-in function \function{compile()}, so the behaviour of
\code{__future__} statements can now be correctly observed in
simulated shells, such as those presented by IDLE and other
development environments. This is described in \pep{264}.
(Contributed by Michael Hudson.)

\item The new license introduced with Python 1.6 wasn't
GPL-compatible. This is fixed by some minor textual changes to the
2.2 license, so it's now legal to embed Python inside a GPLed
program again. Note that Python itself is not GPLed, but instead is
under a license that's essentially equivalent to the BSD license,
same as it always was. The license changes were also applied to the
Python 2.0.1 and 2.1.1 releases.

\item When presented with a Unicode filename on Windows, Python will
now convert it to an MBCS encoded string, as used by the Microsoft
file APIs. As MBCS is explicitly used by the file APIs, Python's
choice of ASCII as the default encoding turns out to be an
annoyance. On Unix, the locale's character set is used if
\function{locale.nl_langinfo(CODESET)} is available. (Windows
support was contributed by Mark Hammond with assistance from
Marc-Andr\'e Lemburg. Unix support was added by Martin von L\"owis.)

\item Large file support is now enabled on Windows. (Contributed by
Tim Peters.)

\item The \file{Tools/scripts/ftpmirror.py} script
now parses a \file{.netrc} file, if you have one.
(Contributed by Mike Romberg.)

\item Some features of the object returned by the
\function{xrange()} function are now deprecated, and trigger
warnings when they're accessed; they'll disappear in Python 2.3.
\class{xrange} objects tried to pretend they were full sequence
types by supporting slicing, sequence multiplication, and the
\keyword{in} operator, but these features were rarely used and
therefore buggy. The \method{tolist()} method and the
\member{start}, \member{stop}, and \member{step} attributes are also
being deprecated. At the C level, the fourth argument to the
\cfunction{PyRange_New()} function, \samp{repeat}, has also been
deprecated.

\item There were a bunch of patches to the dictionary
implementation, mostly to fix potential core dumps if a dictionary
contains objects that sneakily changed their hash value, or mutated
the dictionary they were contained in. For a while python-dev fell
into a gentle rhythm of Michael Hudson finding a case that dumped
core, Tim Peters fixing the bug, Michael finding another case, and round
and round it went.

\item On Windows, Python can now be compiled with Borland C thanks
to a number of patches contributed by Stephen Hansen, though the
result isn't fully functional yet. (But this \emph{is} progress...)

\item Another Windows enhancement: Wise Solutions generously offered
PythonLabs use of their InstallerMaster 8.1 system. Earlier
PythonLabs Windows installers used Wise 5.0a, which was beginning to
show its age. (Packaged up by Tim Peters.)

\item Files ending in \samp{.pyw} can now be imported on Windows.
\samp{.pyw} is a Windows-only thing, used to indicate that a script
needs to be run using PYTHONW.EXE instead of PYTHON.EXE in order to
prevent a DOS console from popping up to display the output. This
patch makes it possible to import such scripts, in case they're also
usable as modules. (Implemented by David Bolen.)

\item On platforms where Python uses the C \cfunction{dlopen()} function
to load extension modules, it's now possible to set the flags used
by \cfunction{dlopen()} using the \function{sys.getdlopenflags()} and
\function{sys.setdlopenflags()} functions. (Contributed by Bram Stolk.)

\item The \function{pow()} built-in function no longer supports 3
arguments when floating-point numbers are supplied.
\code{pow(\var{x}, \var{y}, \var{z})} returns \code{(x**y) \% z}, but
this is never useful for floating point numbers, and the final
result varies unpredictably depending on the platform. A call such
as \code{pow(2.0, 8.0, 7.0)} will now raise a \exception{TypeError}
exception.

\end{itemize}
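The three-argument form of \function{pow()} is meant for modular exponentiation on integers, where it avoids building the full power before taking the remainder. A short sketch of both cases follows; the \function{three_arg_float_pow_allowed()} helper is invented for illustration.

```python
# Three-argument pow() with integers: modular exponentiation.
result = pow(2, 8, 7)                  # same value as (2 ** 8) % 7, i.e. 4
large = pow(3, 10 ** 6, 1000007)       # reduced as it goes; no huge intermediate power

def three_arg_float_pow_allowed():
    """Return True if pow() accepts three floating-point arguments."""
    try:
        pow(2.0, 8.0, 7.0)
    except TypeError:
        return False
    return True
```

On Python 2.2 and later, \code{three_arg_float_pow_allowed()} returns false, matching the change described above.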
%======================================================================
\section{Acknowledgements}

The author would like to thank the following people for offering
suggestions, corrections and assistance with various drafts of this
article: Fred Bremmer, Keith Briggs, Andrew Dalke, Fred~L. Drake, Jr.,
Carel Fellinger, David Goodger, Mark Hammond, Stephen Hansen, Michael
Hudson, Jack Jansen, Marc-Andr\'e Lemburg, Martin von L\"owis, Fredrik
Lundh, Michael McLay, Nick Mathewson, Paul Moore, Gustavo Niemeyer,
Don O'Donnell, Tim Peters, Jens Quade, Tom Reinhardt, Neil
Schemenauer, Guido van Rossum, Greg Ward.

\end{document}