****************************
  What's New in Python 2.2
****************************

:Author: A.M. Kuchling

.. |release| replace:: 1.02

.. $Id: whatsnew22.tex 37315 2004-09-10 19:33:00Z akuchling $

Introduction
============

This article explains the new features in Python 2.2.2, released on October 14,
2002.  Python 2.2.2 is a bugfix release of Python 2.2, originally released on
December 21, 2001.

Python 2.2 can be thought of as the "cleanup release".  There are some features
such as generators and iterators that are completely new, but most of the
changes, significant and far-reaching though they may be, are aimed at cleaning
up irregularities and dark corners of the language design.

This article doesn't attempt to provide a complete specification of the new
features, but instead provides a convenient overview.  For full details, you
should refer to the documentation for Python 2.2, such as the `Python Library
Reference <http://www.python.org/doc/2.2/lib/lib.html>`_ and the `Python
Reference Manual <http://www.python.org/doc/2.2/ref/ref.html>`_.  If you want to
understand the complete implementation and design rationale for a change, refer
to the PEP for a particular new feature.

.. see also, now defunct

   http://www.unixreview.com/documents/s=1356/urm0109h/0109h.htm
   "What's So Special About Python 2.2?" is also about the new 2.2 features, and
   was written by Cameron Laird and Kathryn Soraiz.

.. ======================================================================

PEPs 252 and 253: Type and Class Changes
========================================

The largest and most far-reaching changes in Python 2.2 are to Python's model of
objects and classes.  The changes should be backward compatible, so it's likely
that your code will continue to run unchanged, but the changes provide some
amazing new capabilities. Before beginning this, the longest and most
complicated section of this article, I'll provide an overview of the changes and
offer some comments.

A long time ago I wrote a Web page listing flaws in Python's design.  One of the
most significant flaws was that it's impossible to subclass Python types
implemented in C.  In particular, it's not possible to subclass built-in types,
so you can't just subclass, say, lists in order to add a single useful method to
them. The :mod:`UserList` module provides a class that supports all of the
methods of lists and that can be subclassed further, but there's lots of C code
that expects a regular Python list and won't accept a :class:`UserList`
instance.

Python 2.2 fixes this, and in the process adds some exciting new capabilities.
A brief summary:

* You can subclass built-in types such as lists and even integers, and your
  subclasses should work in every place that requires the original type.

* It's now possible to define static and class methods, in addition to the
  instance methods available in previous versions of Python.

* It's also possible to automatically call methods on accessing or setting an
  instance attribute by using a new mechanism called :dfn:`properties`.  Many uses
  of :meth:`__getattr__` can be rewritten to use properties instead, making the
  resulting code simpler and faster.  As a small side benefit, attributes can now
  have docstrings, too.

* The list of legal attributes for an instance can be limited to a particular
  set using :dfn:`slots`, making it possible to safeguard against typos and
  perhaps make more optimizations possible in future versions of Python.

Some users have voiced concern about all these changes.  Sure, they say, the new
features are neat and lend themselves to all sorts of tricks that weren't
possible in previous versions of Python, but they also make the language more
complicated.  Some people have said that they've always recommended Python for
its simplicity, and feel that its simplicity is being lost.

Personally, I think there's no need to worry.  Many of the new features are
quite esoteric, and you can write a lot of Python code without ever needing to be
aware of them.  Writing a simple class is no more difficult than it ever was, so
you don't need to bother learning or teaching them unless they're actually
needed.  Some very complicated tasks that were previously only possible from C
will now be possible in pure Python, and to my mind that's all for the better.

I'm not going to attempt to cover every single corner case and small change that
were required to make the new features work.  Instead this section will paint
only the broad strokes.  See section :ref:`sect-rellinks`, "Related Links", for
further sources of information about Python 2.2's new object model.

Old and New Classes
-------------------

First, you should know that Python 2.2 really has two kinds of classes: classic
or old-style classes, and new-style classes.  The old-style class model is
exactly the same as the class model in earlier versions of Python.  All the new
features described in this section apply only to new-style classes. This
divergence isn't intended to last forever; eventually old-style classes will be
dropped, possibly in Python 3.0.

So how do you define a new-style class?  You do it by subclassing an existing
new-style class.  Most of Python's built-in types, such as integers, lists,
dictionaries, and even files, are new-style classes now.  A new-style class
named :class:`object`, the base class for all built-in types, has also been
added so if no built-in type is suitable, you can just subclass
:class:`object`::

   class C(object):
       def __init__ (self):
           ...
       ...

This means that :keyword:`class` statements that don't have any base classes are
always classic classes in Python 2.2.  (Actually you can also change this by
setting a module-level variable named :attr:`__metaclass__` --- see :pep:`253`
for the details --- but it's easier to just subclass :class:`object`.)

The type objects for the built-in types are available as built-ins, named using
a clever trick.  Python has always had built-in functions named :func:`int`,
:func:`float`, and :func:`str`.  In 2.2, they aren't functions any more, but
type objects that behave as factories when called. ::

   >>> int
   <type 'int'>
   >>> int('123')
   123

To make the set of types complete, new type objects such as :func:`dict` and
:func:`file` have been added.  Here's a more interesting example, adding a
:meth:`lock` method to file objects::

   class LockableFile(file):
       def lock (self, operation, length=0, start=0, whence=0):
           import fcntl
           return fcntl.lockf(self.fileno(), operation,
                              length, start, whence)

The now-obsolete :mod:`posixfile` module contained a class that emulated all of
a file object's methods and also added a :meth:`lock` method, but this class
couldn't be passed to internal functions that expected a built-in file,
something which is possible with our new :class:`LockableFile`.
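The same idea works for other built-in types. As a minimal sketch, the following
subclasses ``list`` to add a single method. The class name ``StackList`` and its
``peek`` method are invented for this illustration, and the code is written so
that it should also run on current Python 3 interpreters, where the ``file``
type no longer exists.

```python
class StackList(list):
    """A list with one extra convenience method (hypothetical example)."""

    def peek(self):
        # Return the last item without removing it.
        return self[-1]


stack = StackList([3, 1, 2])
stack.append(5)

# The subclass is still a real list, so code that requires a list accepts it.
assert isinstance(stack, list)
assert len(stack) == 4
assert sorted(stack) == [1, 2, 3, 5]
assert stack.peek() == 5
```

Because ``StackList`` inherits everything else from ``list``, only the new
method has to be written.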

Descriptors
-----------

In previous versions of Python, there was no consistent way to discover what
attributes and methods were supported by an object. There were some informal
conventions, such as defining :attr:`__members__` and :attr:`__methods__`
attributes that were lists of names, but often the author of an extension type
or a class wouldn't bother to define them.  You could fall back on inspecting
the :attr:`__dict__` of an object, but when class inheritance or an arbitrary
:meth:`__getattr__` hook were in use this could still be inaccurate.

The one big idea underlying the new class model is that an API for describing
the attributes of an object using :dfn:`descriptors` has been formalized.
Descriptors specify the value of an attribute, stating whether it's a method or
a field.  With the descriptor API, static methods and class methods become
possible, as well as more exotic constructs.

Attribute descriptors are objects that live inside class objects, and have a few
attributes of their own:

* :attr:`__name__` is the attribute's name.

* :attr:`__doc__` is the attribute's docstring.

* :meth:`__get__(object)` is a method that retrieves the attribute value from
  *object*.

* :meth:`__set__(object, value)` sets the attribute on *object* to *value*.

* :meth:`__delete__(object)` deletes the attribute from *object*.

For example, when you write ``obj.x``, the steps that Python actually performs
are::

   descriptor = obj.__class__.x
   descriptor.__get__(obj)
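The two steps can be tried out by hand. This sketch uses a ``property``, which
is one kind of descriptor; the class ``Point`` and its attribute ``x`` are made
up for the example, and the owner class is passed to ``__get__`` explicitly as
a second argument.

```python
class Point(object):
    def __init__(self, x):
        self._x = x

    def get_x(self):
        return self._x

    # property() builds a descriptor object that is stored in the class.
    x = property(get_x, None, None, "The x coordinate")


obj = Point(10)

# Step 1: find the descriptor on the class.  For a property, class-level
# access returns the descriptor object itself.
descriptor = obj.__class__.x

# Step 2: ask the descriptor for the value on this particular instance.
value = descriptor.__get__(obj, obj.__class__)

assert value == obj.x == 10
assert descriptor.__doc__ == "The x coordinate"
```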

For methods, :meth:`descriptor.__get__` returns a temporary object that's
callable, and wraps up the instance and the method to be called on it. This is
also why static methods and class methods are now possible; they have
descriptors that wrap up just the method, or the method and the class.  As a
brief explanation of these new kinds of methods, static methods aren't passed
the instance, and therefore resemble regular functions.  Class methods are
passed the class of the object, but not the object itself.  Static and class
methods are defined like this::

   class C(object):
       def f(arg1, arg2):
           ...
       f = staticmethod(f)

       def g(cls, arg1, arg2):
           ...
       g = classmethod(g)

The :func:`staticmethod` function takes the function :func:`f`, and returns it
wrapped up in a descriptor so it can be stored in the class object.  You might
expect there to be special syntax for creating such methods (``def static f``,
``defstatic f()``, or something like that) but no such syntax has been defined
yet; that's been left for future versions of Python.
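To see how the two kinds of methods behave when called, here is a small sketch
using the same ``name = staticmethod(name)`` idiom. The ``Temperature`` and
``Reading`` classes and their method names are hypothetical, chosen only to
give the methods something to do.

```python
class Temperature(object):
    def __init__(self, celsius):
        self.celsius = celsius

    def to_fahrenheit(celsius):
        # No instance or class is passed in; this is a plain function
        # that merely lives in the class namespace.
        return celsius * 9.0 / 5.0 + 32
    to_fahrenheit = staticmethod(to_fahrenheit)

    def from_fahrenheit(cls, fahrenheit):
        # cls is the class the method was called on, so subclasses
        # get instances of their own type.
        return cls((fahrenheit - 32) * 5.0 / 9.0)
    from_fahrenheit = classmethod(from_fahrenheit)


class Reading(Temperature):
    pass


# A static method can be called on the class or on an instance.
assert Temperature.to_fahrenheit(100) == 212.0
assert Temperature(0).to_fahrenheit(100) == 212.0

# A class method receives the class it was invoked through.
r = Reading.from_fahrenheit(212)
assert isinstance(r, Reading)
assert r.celsius == 100.0
```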

More new features, such as slots and properties, are also implemented as new
kinds of descriptors, and it's not difficult to write a descriptor class that
does something novel.  For example, it would be possible to write a descriptor
class that made it possible to write Eiffel-style preconditions and
postconditions for a method.  A class that used this feature might be defined
like this::

   from eiffel import eiffelmethod

   class C(object):
       def f(self, arg1, arg2):
           # The actual function
           ...
       def pre_f(self):
           # Check preconditions
           ...
       def post_f(self):
           # Check postconditions
           ...

       f = eiffelmethod(f, pre_f, post_f)

Note that a person using the new :func:`eiffelmethod` doesn't have to understand
anything about descriptors.  This is why I think the new features don't increase
the basic complexity of the language. There will be a few wizards who need to
know about it in order to write :func:`eiffelmethod` or the ZODB or whatever,
but most users will just write code on top of the resulting libraries and ignore
the implementation details.
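For the curious, here is one way such a descriptor could be written. This is
only a sketch under my own assumptions: no ``eiffel`` module ships with Python,
so the ``eiffelmethod`` class below is defined in place, and the ``Account``
class that uses it is equally hypothetical.

```python
class eiffelmethod(object):
    """Hypothetical descriptor that runs checks before and after a method."""

    def __init__(self, func, pre, post):
        self.func = func
        self.pre = pre
        self.post = post

    def __get__(self, obj, objtype=None):
        if obj is None:
            # Accessed on the class: hand back the descriptor itself.
            return self

        def bound(*args, **kwargs):
            self.pre(obj)                             # check preconditions
            result = self.func(obj, *args, **kwargs)  # the actual function
            self.post(obj)                            # check postconditions
            return result
        return bound


class Account(object):
    def __init__(self):
        self.balance = 0

    def deposit(self, amount):
        self.balance += amount

    def pre_deposit(self):
        if self.balance < 0:
            raise ValueError("balance negative before deposit")

    def post_deposit(self):
        if self.balance < 0:
            raise ValueError("balance went negative")

    deposit = eiffelmethod(deposit, pre_deposit, post_deposit)


acct = Account()
acct.deposit(5)
assert acct.balance == 5
```

The caller writes ``acct.deposit(5)`` exactly as for an ordinary method; all of
the descriptor machinery stays inside ``eiffelmethod``.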

Multiple Inheritance: The Diamond Rule
--------------------------------------

Multiple inheritance has also been made more useful through changing the rules
under which names are resolved.  Consider this set of classes (diagram taken
from :pep:`253` by Guido van Rossum)::

             class A:
               ^ ^  def save(self): ...
              /   \
             /     \
            /       \
           /         \
       class B     class C:
           ^         ^  def save(self): ...
            \       /
             \     /
              \   /
               \ /
             class D

The lookup rule for classic classes is simple but not very smart; the base
classes are searched depth-first, going from left to right.  A reference to
:meth:`D.save` will search the classes :class:`D`, :class:`B`, and then
:class:`A`, where :meth:`save` would be found and returned.  :meth:`C.save`
would never be found at all.  This is bad, because if :class:`C`'s :meth:`save`
method is saving some internal state specific to :class:`C`, not calling it will
result in that state never getting saved.

New-style classes follow a different algorithm that's a bit more complicated to
explain, but does the right thing in this situation. (Note that Python 2.3
changes this algorithm to one that produces the same results in most cases, but
produces more useful results for really complicated inheritance graphs.)

#. List all the base classes, following the classic lookup rule and include a
   class multiple times if it's visited repeatedly.  In the above example, the list
   of visited classes is [:class:`D`, :class:`B`, :class:`A`, :class:`C`,
   :class:`A`].

#. Scan the list for duplicated classes.  If any are found, remove all but one
   occurrence, leaving the *last* one in the list.  In the above example, the list
   becomes [:class:`D`, :class:`B`, :class:`C`, :class:`A`] after dropping
   duplicates.
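The two steps above can be sketched in a few lines of Python. The helper names
``classic_order`` and ``drop_duplicates_keep_last`` are my own, and this is an
illustration of the rule as described here, not the interpreter's actual code.
Because the example classes derive from ``object``, it shows up at the end of
the resulting list as well.

```python
def classic_order(cls):
    """Step 1: depth-first, left-to-right, keeping repeated visits."""
    order = [cls]
    for base in cls.__bases__:
        order.extend(classic_order(base))
    return order


def drop_duplicates_keep_last(classes):
    """Step 2: keep only the last occurrence of each class."""
    result = []
    for i in range(len(classes)):
        if classes[i] not in classes[i + 1:]:
            result.append(classes[i])
    return result


class A(object):
    def save(self):
        return "A.save"

class B(A):
    pass

class C(A):
    def save(self):
        return "C.save"

class D(B, C):
    pass


visited = classic_order(D)
assert [c.__name__ for c in visited] == [
    'D', 'B', 'A', 'object', 'C', 'A', 'object']

order = drop_duplicates_keep_last(visited)
assert [c.__name__ for c in order] == ['D', 'B', 'C', 'A', 'object']

# For this simple diamond the result matches what the interpreter computes.
assert tuple(order) == D.__mro__
assert D().save() == "C.save"
```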

Following this rule, referring to :meth:`D.save` will return :meth:`C.save`,
which is the behaviour we're after.  This lookup rule is the same as the one
followed by Common Lisp.  A new built-in function, :func:`super`, provides a way
to get at a class's superclasses without having to reimplement Python's
algorithm. The most commonly used form will be  :func:`super(class, obj)`, which
returns  a bound superclass object (not the actual class object).  This form
will be used in methods to call a method in the superclass; for example,
:class:`D`'s :meth:`save` method would look like this::

   class D (B,C):
       def save (self):
           # Call superclass .save()
           super(D, self).save()
           # Save D's private information here
           ...

:func:`super` can also return unbound superclass objects when called as
:func:`super(class)` or :func:`super(class1, class2)`, but this probably won't
often be useful.
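Here is a small sketch of the bound form in action on the diamond from the
diagram. Each ``save`` returns a list of class names instead of saving
anything, purely so that the order of the calls is visible; that return value
is an invention of this example.

```python
class A(object):
    def save(self):
        return ["A"]

class B(A):
    pass

class C(A):
    def save(self):
        # super(C, self) continues the search after C in the lookup order.
        return ["C"] + super(C, self).save()

class D(B, C):
    def save(self):
        # B defines no save(), so the next one found is C's, not A's.
        return ["D"] + super(D, self).save()


assert D().save() == ["D", "C", "A"]
assert C().save() == ["C", "A"]
```

Note that every ``save`` method runs exactly once, which is the behaviour the
classic lookup rule could not provide.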

Attribute Access
----------------

A fair number of sophisticated Python classes define hooks for attribute access
using :meth:`__getattr__`; most commonly this is done for convenience, to make
code more readable by automatically mapping an attribute access such as
``obj.parent`` into a method call such as ``obj.get_parent()``.  Python 2.2 adds
some new ways of controlling attribute access.

First, :meth:`__getattr__(attr_name)` is still supported by new-style classes,
and nothing about it has changed.  As before, it will be called when an attempt
is made to access ``obj.foo`` and no attribute named ``foo`` is found in the
instance's dictionary.

New-style classes also support a new method,
:meth:`__getattribute__(attr_name)`.  The difference between the two methods is
that :meth:`__getattribute__` is *always* called whenever any attribute is
accessed, while the old :meth:`__getattr__` is only called if ``foo`` isn't
found in the instance's dictionary.
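The difference is easiest to see side by side. In this sketch the class names
``Fallback`` and ``Logged`` and the module-level ``calls`` list are made up for
the illustration; delegating to ``object.__getattribute__`` is the usual way to
avoid infinite recursion inside the hook.

```python
class Fallback(object):
    def __init__(self):
        self.present = 1

    def __getattr__(self, name):
        # Only reached when the normal lookup has failed.
        return "default for " + name


calls = []

class Logged(object):
    def __init__(self):
        self.present = 1

    def __getattribute__(self, name):
        # Reached for every attribute access, whether or not it exists.
        calls.append(name)
        return object.__getattribute__(self, name)


f = Fallback()
assert f.present == 1                       # found normally, hook not used
assert f.missing == "default for missing"   # not found, hook supplies a value

l = Logged()
assert l.present == 1
assert calls == ["present"]                 # even a successful lookup is seen
```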

However, Python 2.2's support for :dfn:`properties` will often be a simpler way
to trap attribute references.  Writing a :meth:`__getattr__` method is
complicated because to avoid recursion you can't use regular attribute accesses
inside it, and instead have to mess around with the contents of
:attr:`__dict__`. :meth:`__getattr__` methods also end up being called by Python
when it checks for other methods such as :meth:`__repr__` or :meth:`__coerce__`,
and so have to be written with this in mind. Finally, calling a function on
every attribute access results in a sizable performance loss.

:class:`property` is a new built-in type that packages up three functions that
get, set, or delete an attribute, and a docstring.  For example, if you want to
define a :attr:`size` attribute that's computed, but also settable, you could
write::

   class C(object):
       def get_size (self):
           result = ...   # computation goes here
           return result
       def set_size (self, size):
           # ... compute something based on the size
           # and set internal state appropriately ...
           pass

       # Define a property.  The 'delete this attribute'
       # method is defined as None, so the attribute
       # can't be deleted.
       size = property(get_size, set_size,
                       None,
                       "Storage size of this instance")

That is certainly clearer and easier to write than a pair of
:meth:`__getattr__`/:meth:`__setattr__` methods that check for the :attr:`size`
attribute and handle it specially while retrieving all other attributes from the
instance's :attr:`__dict__`.  Accesses to :attr:`size` are also the only ones
which have to perform the work of calling a function, so references to other
attributes run at their usual speed.
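To make the outline above concrete, here is a filled-in version that can
actually be run. The ``Buffer`` class and the way it pads or truncates its
contents are assumptions made for this example only.

```python
class Buffer(object):
    def __init__(self):
        self._items = []

    def get_size(self):
        return len(self._items)

    def set_size(self, size):
        # Truncate, or pad with None, so that the length matches size.
        current = len(self._items)
        if size < current:
            del self._items[size:]
        else:
            self._items.extend([None] * (size - current))

    # The deleter is None, so 'del buf.size' is refused.
    size = property(get_size, set_size, None,
                    "Number of items held by this buffer")


buf = Buffer()
assert buf.size == 0        # calls get_size()
buf.size = 3                # calls set_size(3)
assert buf._items == [None, None, None]
buf.size = 1
assert buf.size == 1
assert Buffer.size.__doc__ == "Number of items held by this buffer"
```

Only accesses to ``size`` pay for a function call; ``buf._items`` is an
ordinary attribute lookup.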

Finally, it's possible to constrain the list of attributes that can be
referenced on an object using the new :attr:`__slots__` class attribute. Python
objects are usually very dynamic; at any time it's possible to define a new
attribute on an instance by just doing ``obj.new_attr=1``.   A new-style class
can define a class attribute named :attr:`__slots__` to limit the legal
attributes  to a particular set of names.  An example will make this clear::

   >>> class C(object):
   ...     __slots__ = ('template', 'name')
   ...
   >>> obj = C()
   >>> print obj.template
   None
   >>> obj.template = 'Test'
   >>> print obj.template
   Test
   >>> obj.newattr = None
   Traceback (most recent call last):
     File "<stdin>", line 1, in ?
   AttributeError: 'C' object has no attribute 'newattr'

Note how you get an :exc:`AttributeError` on the attempt to assign to an
attribute not listed in :attr:`__slots__`.

.. _sect-rellinks:

Related Links
-------------

This section has just been a quick overview of the new features, giving enough
of an explanation to start you programming, but many details have been
simplified or ignored.  Where should you go to get a more complete picture?

http://www.python.org/2.2/descrintro.html is a lengthy tutorial introduction to
the descriptor features, written by Guido van Rossum. If my description has
whetted your appetite, go read this tutorial next, because it goes into much
more detail about the new features while still remaining quite easy to read.

Next, there are two relevant PEPs, :pep:`252` and :pep:`253`.  :pep:`252` is
titled "Making Types Look More Like Classes", and covers the descriptor API.
:pep:`253` is titled "Subtyping Built-in Types", and describes the changes to
type objects that make it possible to subtype built-in objects.  :pep:`253` is
the more complicated PEP of the two, and at a few points the necessary
explanations of types and meta-types may cause your head to explode.  Both PEPs
were written and implemented by Guido van Rossum, with substantial assistance
from the rest of the Zope Corp. team.

Finally, there's the ultimate authority: the source code.  Most of the machinery
for the type handling is in :file:`Objects/typeobject.c`, but you should only
resort to it after all other avenues have been exhausted, including posting a
question to python-list or python-dev.

.. ======================================================================

PEP 234: Iterators
==================

Another significant addition to 2.2 is an iteration interface at both the C and
Python levels.  Objects can define how they can be looped over by callers.

In Python versions up to 2.1, the usual way to make ``for item in obj`` work is
to define a :meth:`__getitem__` method that looks something like this::

   def __getitem__(self, index):
       return <next item>

:meth:`__getitem__` is more properly used to define an indexing operation on an
object so that you can write ``obj[5]`` to retrieve the sixth element.  It's a
bit misleading when you're using this only to support :keyword:`for` loops.
Consider some file-like object that wants to be looped over; the *index*
parameter is essentially meaningless, as the class probably assumes that a
series of :meth:`__getitem__` calls will be made with *index* incrementing by
one each time.  In other words, the presence of the :meth:`__getitem__` method
doesn't mean that using ``file[5]``  to randomly access the sixth element will
work, though it really should.

In Python 2.2, iteration can be implemented separately, and :meth:`__getitem__`
methods can be limited to classes that really do support random access.  The
basic idea of iterators is  simple.  A new built-in function, :func:`iter(obj)`
or ``iter(C, sentinel)``, is used to get an iterator. :func:`iter(obj)` returns
an iterator for the object *obj*, while ``iter(C, sentinel)`` returns an
iterator that will invoke the callable object *C* until it returns *sentinel* to
signal that the iterator is done.
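The two-argument form is the less obvious of the two, so here is a brief
sketch. The ``read_chunk`` function and the ``pending`` list are hypothetical
stand-ins for something like repeated reads from a file or socket, with the
empty string playing the role of the sentinel.

```python
pending = ["spam", "eggs", "", "never reached"]

def read_chunk():
    # Hand out the next piece of data each time it is called.
    return pending.pop(0)

# Call read_chunk() repeatedly until it returns the sentinel "".
result = list(iter(read_chunk, ""))

assert result == ["spam", "eggs"]
assert pending == ["never reached"]   # iteration stopped at the sentinel

# The one-argument form simply asks the object for an iterator.
assert list(iter([1, 2, 3])) == [1, 2, 3]
```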

Python classes can define an :meth:`__iter__` method, which should create and
return a new iterator for the object; if the object is its own iterator, this
method can just return ``self``.  In particular, iterators will usually be their
own iterators.  Extension types implemented in C can implement a :attr:`tp_iter`
function in order to return an iterator, and extension types that want to behave
as iterators can define a :attr:`tp_iternext` function.

So, after all this, what do iterators actually do?  They have one required
method, :meth:`next`, which takes no arguments and returns the next value.  When
there are no more values to be returned, calling :meth:`next` should raise the
:exc:`StopIteration` exception. ::

   >>> L = [1,2,3]
   >>> i = iter(L)
   >>> print i
   <iterator object at 0x8116870>
   >>> i.next()
   1
   >>> i.next()
   2
   >>> i.next()
   3
   >>> i.next()
   Traceback (most recent call last):
     File "<stdin>", line 1, in ?
   StopIteration
   >>>
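Writing your own iterator takes only a few lines. The ``Countdown`` class below
is a made-up example that is its own iterator. One assumption is worth flagging:
Python 3 later renamed the method to ``__next__``, so the sketch aliases the
two names to remain runnable on current interpreters as well.

```python
class Countdown(object):
    """Iterate from start down to 1 (hypothetical example)."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # The object is its own iterator.
        return self

    def next(self):
        if self.current <= 0:
            raise StopIteration
        self.current -= 1
        return self.current + 1

    # Python 3 spells the method __next__; alias it for compatibility.
    __next__ = next


assert list(Countdown(3)) == [3, 2, 1]

total = 0
for n in Countdown(4):
    total += n
assert total == 10
```

As with any iterator, a ``Countdown`` instance can only be consumed once; loop
over it a second time and it produces nothing.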

In 2.2, Python's :keyword:`for` statement no longer expects a sequence; it
expects something for which :func:`iter` will return an iterator. For backward
compatibility and convenience, an iterator is automatically constructed for
sequences that don't implement :meth:`__iter__` or a :attr:`tp_iter` slot, so
``for i in [1,2,3]`` will still work.  Wherever the Python interpreter loops
over a sequence, it's been changed to use the iterator protocol.  This means you
can do things like this::

   >>> L = [1,2,3]
   >>> i = iter(L)
   >>> a,b,c = i
   >>> a,b,c
   (1, 2, 3)

Iterator support has been added to some of Python's basic types.   Calling
:func:`iter` on a dictionary will return an iterator which loops over its keys::

   >>> m = {'Jan': 1, 'Feb': 2, 'Mar': 3, 'Apr': 4, 'May': 5, 'Jun': 6,
   ...      'Jul': 7, 'Aug': 8, 'Sep': 9, 'Oct': 10, 'Nov': 11, 'Dec': 12}
   >>> for key in m: print key, m[key]
   ...
   Mar 3
   Feb 2
   Aug 8
   Sep 9
   May 5
   Jun 6
   Jul 7
   Jan 1
   Apr 4
   Nov 11
   Dec 12
   Oct 10

That's just the default behaviour.  If you want to iterate over keys, values, or
key/value pairs, you can explicitly call the :meth:`iterkeys`,
:meth:`itervalues`, or :meth:`iteritems` methods to get an appropriate iterator.
In a minor related change, the :keyword:`in` operator now works on dictionaries,
so ``key in dict`` is now equivalent to ``dict.has_key(key)``.

Files also provide an iterator, which calls the :meth:`readline` method until
there are no more lines in the file.  This means you can now read each line of a
file using code like this::

   for line in file:
       # do something for each line
       ...

Note that you can only go forward in an iterator; there's no way to get the
previous element, reset the iterator, or make a copy of it. An iterator object
could provide such additional capabilities, but the iterator protocol only
requires a :meth:`next` method.


.. seealso::

   :pep:`234` - Iterators
      Written by Ka-Ping Yee and GvR; implemented  by the Python Labs crew, mostly by
      GvR and Tim Peters.

.. ======================================================================

PEP 255: Simple Generators
==========================

Generators are another new feature, one that interacts with the introduction of
iterators.

You're doubtless familiar with how function calls work in Python or C.  When you
call a function, it gets a private namespace where its local variables are
created.  When the function reaches a :keyword:`return` statement, the local
variables are destroyed and the resulting value is returned to the caller.  A
later call to the same function will get a fresh new set of local variables.
But, what if the local variables weren't thrown away on exiting a function?
What if you could later resume the function where it left off?  This is what
generators provide; they can be thought of as resumable functions.

Here's the simplest example of a generator function::

   def generate_ints(N):
       for i in range(N):
           yield i

A new keyword, :keyword:`yield`, was introduced for generators.  Any function
containing a :keyword:`yield` statement is a generator function; this is
detected by Python's bytecode compiler which compiles the function specially as
a result.  Because a new keyword was introduced, generators must be explicitly
enabled in a module by including a ``from __future__ import generators``
statement near the top of the module's source code.  In Python 2.3 this
statement will become unnecessary.

When you call a generator function, it doesn't return a single value; instead it
returns a generator object that supports the iterator protocol.  On executing
the :keyword:`yield` statement, the generator outputs the value of ``i``,
similar to a :keyword:`return` statement.  The big difference between
:keyword:`yield` and a :keyword:`return` statement is that on reaching a
:keyword:`yield` the generator's state of execution is suspended and local
variables are preserved.  On the next call to the generator's ``next()`` method,
the function will resume executing immediately after the :keyword:`yield`
statement.  (For complicated reasons, the :keyword:`yield` statement isn't
allowed inside the :keyword:`try` block of a
:keyword:`try`...\ :keyword:`finally` statement; read :pep:`255` for a full
explanation of the interaction between :keyword:`yield` and exceptions.)
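The suspend-and-resume behaviour can be made visible by recording what the
generator does. In this sketch the ``events`` list is an addition of mine, and
the built-in ``next(gen)`` is used in place of ``gen.next()`` on the assumption
that the code is being tried on a current interpreter, where that is the
supported spelling.

```python
events = []

def generate_ints(N):
    for i in range(N):
        events.append("about to yield %d" % i)
        yield i
        events.append("resumed after %d" % i)


gen = generate_ints(2)
assert events == []        # calling the function runs none of its body

assert next(gen) == 0      # runs up to the first yield, then suspends
assert events == ["about to yield 0"]

assert next(gen) == 1      # resumes immediately after the yield
assert events == ["about to yield 0",
                  "resumed after 0",
                  "about to yield 1"]
```

The local variable ``i`` keeps its value between the two calls, which is
exactly what an ordinary function could not do.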

Here's a sample usage of the :func:`generate_ints` generator::

   >>> gen = generate_ints(3)
   >>> gen
   <generator object at 0x8117f90>
   >>> gen.next()
   0
   >>> gen.next()
   1
   >>> gen.next()
   2
   >>> gen.next()
   Traceback (most recent call last):
     File "<stdin>", line 1, in ?
     File "<stdin>", line 2, in generate_ints
   StopIteration

You could equally write ``for i in generate_ints(5)``, or ``a,b,c =
generate_ints(3)``.

Inside a generator function, the :keyword:`return` statement can only be used
without a value, and signals the end of the procession of values; afterwards the
generator cannot return any further values. :keyword:`return` with a value, such
as ``return 5``, is a syntax error inside a generator function.  The end of the
generator's results can also be indicated by raising :exc:`StopIteration`
manually, or by just letting the flow of execution fall off the bottom of the
function.

You could achieve the effect of generators manually by writing your own class
and storing all the local variables of the generator as instance variables.  For
example, returning a list of integers could be done by setting ``self.count`` to
0, and having the :meth:`next` method increment ``self.count`` and return it.
However, for a moderately complicated generator, writing a corresponding class
would be much messier. :file:`Lib/test/test_generators.py` contains a number of
more interesting examples.  The simplest one implements an in-order traversal of
a tree using generators recursively. ::

   # A recursive generator that generates Tree leaves in in-order.
   def inorder(t):
       if t:
           for x in inorder(t.left):
               yield x
           yield t.label
           for x in inorder(t.right):
               yield x
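To try the traversal out you need some kind of tree. The ``Tree`` class below
is a deliberately simplified stand-in that I've assumed for this sketch; it is
not the class used in the test suite, and only provides the ``left``,
``label`` and ``right`` attributes that :func:`inorder` relies on.

```python
class Tree(object):
    """Minimal binary tree node (hypothetical helper for this example)."""

    def __init__(self, label, left=None, right=None):
        self.label = label
        self.left = left
        self.right = right


def inorder(t):
    # A missing child is None, which is false and ends the recursion.
    if t:
        for x in inorder(t.left):
            yield x
        yield t.label
        for x in inorder(t.right):
            yield x


#         4
#       /   \
#      2     6
#     / \   / \
#    1   3 5   7
root = Tree(4,
            Tree(2, Tree(1), Tree(3)),
            Tree(6, Tree(5), Tree(7)))

assert list(inorder(root)) == [1, 2, 3, 4, 5, 6, 7]
```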

Two other examples in :file:`Lib/test/test_generators.py` produce solutions for
the N-Queens problem (placing *N* queens on an *N* x *N* chess board so that no
queen threatens another) and the Knight's Tour (a route that takes a knight to
every square of an *N* x *N* chessboard without visiting any square twice).

The idea of generators comes from other programming languages, especially Icon
(http://www.cs.arizona.edu/icon/), where the idea of generators is central.  In
Icon, every expression and function call behaves like a generator.  One example
from "An Overview of the Icon Programming Language" at
http://www.cs.arizona.edu/icon/docs/ipd266.htm gives an idea of what this looks
like::

   sentence := "Store it in the neighboring harbor"
   if (i := find("or", sentence)) > 5 then write(i)

In Icon the :func:`find` function returns the indexes at which the substring
"or" is found: 3, 23, 33.  In the :keyword:`if` statement, ``i`` is first
assigned a value of 3, but 3 is less than 5, so the comparison fails, and Icon
retries it with the second value of 23.  23 is greater than 5, so the comparison
now succeeds, and the code prints the value 23 to the screen.

Python doesn't go nearly as far as Icon in adopting generators as a central
concept.  Generators are considered a new part of the core Python language, but
learning or using them isn't compulsory; if they don't solve any problems that
you have, feel free to ignore them. One novel feature of Python's interface as
compared to Icon's is that a generator's state is represented as a concrete
object (the iterator) that can be passed around to other functions or stored in
a data structure.


.. seealso::

   :pep:`255` - Simple Generators
      Written by Neil Schemenauer, Tim Peters, Magnus Lie Hetland.  Implemented mostly
      by Neil Schemenauer and Tim Peters, with other fixes from the Python Labs crew.

.. ======================================================================
|
|
|
|
|
|
PEP 237: Unifying Long Integers and Integers
============================================

In recent versions, the distinction between regular integers, which are 32-bit
values on most machines, and long integers, which can be of arbitrary size, was
becoming an annoyance. For example, on platforms that support files larger than
``2**32`` bytes, the :meth:`tell` method of file objects has to return a long
integer. However, there were various bits of Python that expected plain integers
and would raise an error if a long integer was provided instead. For example,
in Python 1.5, only regular integers could be used as a slice index, and
``'abc'[1L:]`` would raise a :exc:`TypeError` exception with the message 'slice
index must be int'.

Python 2.2 will shift values from short to long integers as required. The 'L'
suffix is no longer needed to indicate a long integer literal, as now the
compiler will choose the appropriate type. (Using the 'L' suffix will be
discouraged in future 2.x versions of Python, triggering a warning in Python
2.4, and probably dropped in Python 3.0.) Many operations that used to raise an
:exc:`OverflowError` will now return a long integer as their result. For
example::

   >>> 1234567890123
   1234567890123L
   >>> 2 ** 64
   18446744073709551616L

In most cases, integers and long integers will now be treated identically. You
can still distinguish them with the :func:`type` built-in function, but that's
rarely needed.
|
|
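The practical effect can be sketched as follows. This is an illustration for a
modern Python 3 interpreter, where the unification begun in 2.2 is complete and
only one integer type remains; ``sys.maxsize`` is the modern spelling of what
Python 2.2 called ``sys.maxint``, and the variable names are made up for the
example.

```python
import sys

# The largest value that fits in a machine word on this build.
word_max = sys.maxsize

# Arithmetic that crosses the machine-word boundary simply keeps going
# instead of raising OverflowError.
beyond = word_max + 1
big = 2 ** 64

# On Python 3 both values share a single integer type.
same_type = type(beyond) is type(1)
```

On Python 2.2 the arithmetic gives the same values, but the final check would
report two different types, since the results are long integers.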
|
|
|
|
.. seealso::

   :pep:`237` - Unifying Long Integers and Integers
      Written by Moshe Zadka and Guido van Rossum. Implemented mostly by Guido van
      Rossum.
|
|
|
|
.. ======================================================================
|
|
|
|
|
|
PEP 238: Changing the Division Operator
=======================================

The most controversial change in Python 2.2 heralds the start of an effort to
fix an old design flaw that's been in Python from the beginning. Currently
Python's division operator, ``/``, behaves like C's division operator when
presented with two integer arguments: it returns an integer result that's
truncated down when there would be a fractional part. For example, ``3/2`` is
1, not 1.5, and ``(-1)/2`` is -1, not -0.5. This means that the results of
division can vary unexpectedly depending on the type of the two operands, and
because Python is dynamically typed, it can be difficult to determine the
possible types of the operands.

(The controversy is over whether this is *really* a design flaw, and whether
it's worth breaking existing code to fix this. It's caused endless discussions
on python-dev, and in July 2001 erupted into a storm of acidly sarcastic
postings on :newsgroup:`comp.lang.python`. I won't argue for either side here
and will stick to describing what's implemented in 2.2. Read :pep:`238` for a
summary of arguments and counter-arguments.)

Because this change might break code, it's being introduced very gradually.
Python 2.2 begins the transition, but the switch won't be complete until Python
3.0.

First, I'll borrow some terminology from :pep:`238`. "True division" is the
division that most non-programmers are familiar with: 3/2 is 1.5, 1/4 is 0.25,
and so forth. "Floor division" is what Python's ``/`` operator currently does
when given integer operands; the result is the floor of the value returned by
true division. "Classic division" is the current mixed behaviour of ``/``; it
returns the result of floor division when the operands are integers, and returns
the result of true division when one of the operands is a floating-point number.

Here are the changes 2.2 introduces:

* A new operator, ``//``, is the floor division operator. (Yes, we know it looks
  like C++'s comment symbol.) ``//`` *always* performs floor division no matter
  what the types of its operands are, so ``1 // 2`` is 0 and ``1.0 // 2.0`` is
  0.0.

  ``//`` is always available in Python 2.2; you don't need to enable it using a
  ``__future__`` statement.

* By including a ``from __future__ import division`` in a module, the ``/``
  operator will be changed to return the result of true division, so ``1/2`` is
  0.5. Without the ``__future__`` statement, ``/`` still means classic division.
  The default meaning of ``/`` will not change until Python 3.0.

* Classes can define methods called :meth:`__truediv__` and :meth:`__floordiv__`
  to overload the two division operators. At the C level, there are also slots in
  the :ctype:`PyNumberMethods` structure so extension types can define the two
  operators.

* Python 2.2 supports some command-line arguments for testing whether code will
  work with the changed division semantics. Running python with
  :option:`-Q warn` will cause a warning to be issued whenever division is
  applied to two integers. You can use this to find code that's affected by the
  change and fix it. By default, Python 2.2 will simply perform classic division
  without a warning; the warning will be turned on by default in Python 2.3.
|
|
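The operators and special methods listed above can be sketched like this. The
code assumes true-division semantics for ``/``, which is the default on Python 3;
on Python 2.2 the module would have to begin with ``from __future__ import
division`` to get the same results. The ``Metres`` class is a hypothetical type
invented for the example.

```python
class Metres:
    """Hypothetical length type that overloads both division operators."""

    def __init__(self, value):
        self.value = value

    def __truediv__(self, other):
        # Called for ``/`` under true-division semantics.
        return Metres(self.value / other)

    def __floordiv__(self, other):
        # Called for ``//`` regardless of any __future__ statement.
        return Metres(self.value // other)


true_result = 1 / 2          # true division
floor_int = 1 // 2           # floor division on integers
floor_float = 1.0 // 2.0     # floor division on floats stays a float
floor_negative = (-1) // 2   # floors toward negative infinity

half = Metres(7) / 2         # dispatches to __truediv__
whole = Metres(7) // 2       # dispatches to __floordiv__
```

Code that really wants an integer quotient can switch to ``//`` today, which
keeps its meaning unchanged whichever way ``/`` is interpreted.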
|
|
|
|
.. seealso::

   :pep:`238` - Changing the Division Operator
      Written by Moshe Zadka and Guido van Rossum. Implemented by Guido van Rossum.
|
|
|
|
.. ======================================================================
|
|
|
|
|
|
Unicode Changes
===============

Python's Unicode support has been enhanced a bit in 2.2. Unicode strings are
usually stored as UCS-2, as 16-bit unsigned integers. Python 2.2 can also be
compiled to use UCS-4, 32-bit unsigned integers, as its internal encoding by
supplying :option:`--enable-unicode=ucs4` to the configure script. (It's also
possible to specify :option:`--disable-unicode` to completely disable Unicode
support.)

When built to use UCS-4 (a "wide Python"), the interpreter can natively handle
Unicode characters from U+000000 to U+10FFFF, so the range of legal values for
the :func:`unichr` function is expanded accordingly. Using an interpreter
compiled to use UCS-2 (a "narrow Python"), values greater than 65535 will still
cause :func:`unichr` to raise a :exc:`ValueError` exception. This is all
described in :pep:`261`, "Support for 'wide' Unicode characters"; consult it for
further details.
|
|
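The limits of a wide build can be sketched with Python 3's ``chr``, the
successor of ``unichr``. This assumes Python 3.3 or later, where every build
handles the full code point range; ``MAX_CODE_POINT`` and the other names are
introduced only for the example.

```python
# Highest code point defined by Unicode.
MAX_CODE_POINT = 0x10FFFF

# Characters beyond the 16-bit range are accepted...
wide_char = chr(0x10000)
top_char = chr(MAX_CODE_POINT)

# ...but one step past the end of the range is rejected.
try:
    chr(MAX_CODE_POINT + 1)
    out_of_range_rejected = False
except ValueError:
    out_of_range_rejected = True
```

On a narrow Python 2.2 build the equivalent ``unichr(0x10000)`` call would
already raise :exc:`ValueError`, as described above.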
|
|
Another change is simpler to explain. Since their introduction, Unicode strings
have supported an :meth:`encode` method to convert the string to a selected
encoding such as UTF-8 or Latin-1. A symmetric :meth:`decode([*encoding*])`
method has been added to 8-bit strings (though not to Unicode strings) in 2.2.
:meth:`decode` assumes that the string is in the specified encoding and decodes
it, returning whatever is returned by the codec.

Using this new feature, codecs have been added for tasks not directly related to
Unicode. For example, codecs have been added for uu-encoding, MIME's base64
encoding, and compression with the :mod:`zlib` module::

   >>> s = """Here is a lengthy piece of redundant, overly verbose,
   ... and repetitive text.
   ... """
   >>> data = s.encode('zlib')
   >>> data
   'x\x9c\r\xc9\xc1\r\x80 \x10\x04\xc0?Ul...'
   >>> data.decode('zlib')
   'Here is a lengthy piece of redundant, overly verbose,\nand repetitive text.\n'
   >>> print s.encode('uu')
   begin 666 <data>
   M2&5R92!I<R!A(&QE;F=T:'D@<&EE8V4@;V8@<F5D=6YD86YT+"!O=F5R;'D@
   >=F5R8F]S92P*86YD(')E<&5T:71I=F4@=&5X="X*

   end
   >>> "sheesh".encode('rot-13')
   'furrfu'

To convert a class instance to Unicode, a :meth:`__unicode__` method can be
defined by a class, analogous to :meth:`__str__`.

:meth:`encode`, :meth:`decode`, and :meth:`__unicode__` were implemented by
Marc-André Lemburg. The changes to support using UCS-4 internally were
implemented by Fredrik Lundh and Martin von Löwis.


.. seealso::

   :pep:`261` - Support for 'wide' Unicode characters
      Written by Paul Prescod.
|
|
|
|
.. ======================================================================
|
|
|
|
|
|
PEP 227: Nested Scopes
======================

In Python 2.1, statically nested scopes were added as an optional feature, to be
enabled by a ``from __future__ import nested_scopes`` directive. In 2.2 nested
scopes no longer need to be specially enabled, and are now always present. The
rest of this section is a copy of the description of nested scopes from my
"What's New in Python 2.1" document; if you read it when 2.1 came out, you can
skip the rest of this section.

The largest change introduced in Python 2.1, and made complete in 2.2, is to
Python's scoping rules. In Python 2.0, at any given time there are at most
three namespaces used to look up variable names: local, module-level, and the
built-in namespace. This often surprised people because it didn't match their
intuitive expectations. For example, a nested recursive function definition
doesn't work::

   def f():
       ...
       def g(value):
           ...
           return g(value-1) + 1
       ...

The function :func:`g` will always raise a :exc:`NameError` exception, because
the binding of the name ``g`` isn't in either its local namespace or in the
module-level namespace. This isn't much of a problem in practice (how often do
you recursively define interior functions like this?), but this also made using
the :keyword:`lambda` statement clumsier, and this was a problem in practice.
In code which uses :keyword:`lambda` you can often find local variables being
copied by passing them as the default values of arguments. ::

   def find(self, name):
       "Return list of any entries equal to 'name'"
       L = filter(lambda x, name=name: x == name,
                  self.list_attribute)
       return L

The readability of Python code written in a strongly functional style suffers
greatly as a result.

The most significant change to Python 2.2 is that static scoping has been added
to the language to fix this problem. As a first effect, the ``name=name``
default argument is now unnecessary in the above example. Put simply, when a
given variable name is not assigned a value within a function (by an assignment,
or the :keyword:`def`, :keyword:`class`, or :keyword:`import` statements),
references to the variable will be looked up in the local namespace of the
enclosing scope. A more detailed explanation of the rules, and a dissection of
the implementation, can be found in the PEP.
|
|
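With nested scopes in effect, both earlier examples can be written directly.
The following is a sketch in modern Python 3 syntax; the ``Registry`` class and
its sample data are hypothetical, added only so the ``find`` method has
something to search.

```python
class Registry:
    """Hypothetical container used only for this illustration."""

    def __init__(self, entries):
        self.list_attribute = entries

    def find(self, name):
        "Return list of any entries equal to 'name'"
        # 'name' is looked up in the enclosing function's namespace,
        # so the old ``name=name`` default argument is not needed.
        return list(filter(lambda x: x == name, self.list_attribute))


def f():
    # A nested function can refer to itself recursively, because
    # 'g' is found in the enclosing scope of f().
    def g(value):
        if value <= 0:
            return 0
        return g(value - 1) + 1
    return g(3)


matches = Registry(["spam", "eggs", "spam"]).find("spam")
depth = f()
```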
|
|
This change may cause some compatibility problems for code where the same
variable name is used both at the module level and as a local variable within a
function that contains further function definitions. This seems rather unlikely
though, since such code would have been pretty confusing to read in the first
place.

One side effect of the change is that the ``from module import *`` and
:keyword:`exec` statements have been made illegal inside a function scope under
certain conditions. The Python reference manual has said all along that ``from
module import *`` is only legal at the top level of a module, but the CPython
interpreter has never enforced this before. As part of the implementation of
nested scopes, the compiler which turns Python source into bytecodes has to
generate different code to access variables in a containing scope. ``from
module import *`` and :keyword:`exec` make it impossible for the compiler to
figure this out, because they add names to the local namespace that are
unknowable at compile time. Therefore, if a function contains function
definitions or :keyword:`lambda` expressions with free variables, the compiler
will flag this by raising a :exc:`SyntaxError` exception.

To make the preceding explanation a bit clearer, here's an example::

   x = 1
   def f():
       # The next line is a syntax error
       exec 'x=2'
       def g():
           return x

Line 4 containing the :keyword:`exec` statement is a syntax error, since
:keyword:`exec` would define a new local variable named ``x`` whose value should
be accessed by :func:`g`.

This shouldn't be much of a limitation, since :keyword:`exec` is rarely used in
most Python code (and when it is used, it's often a sign of a poor design
anyway).


.. seealso::

   :pep:`227` - Statically Nested Scopes
      Written and implemented by Jeremy Hylton.
|
|
|
|
.. ======================================================================
|
|
|
|
|
|
New and Improved Modules
========================

* The :mod:`xmlrpclib` module was contributed to the standard library by Fredrik
  Lundh, providing support for writing XML-RPC clients. XML-RPC is a simple
  remote procedure call protocol built on top of HTTP and XML. For example, the
  following snippet retrieves a list of RSS channels from the O'Reilly Network,
  and then lists the recent headlines for one channel::

     import xmlrpclib
     s = xmlrpclib.Server(
           'http://www.oreillynet.com/meerkat/xml-rpc/server.php')
     channels = s.meerkat.getChannels()
     # channels is a list of dictionaries, like this:
     # [{'id': 4, 'title': 'Freshmeat Daily News'},
     #  {'id': 190, 'title': '32Bits Online'},
     #  {'id': 4549, 'title': '3DGamers'}, ... ]

     # Get the items for one channel
     items = s.meerkat.getItems( {'channel': 4} )

     # 'items' is another list of dictionaries, like this:
     # [{'link': 'http://freshmeat.net/releases/52719/',
     #   'description': 'A utility which converts HTML to XSL FO.',
     #   'title': 'html2fo 0.3 (Default)'}, ... ]

  The :mod:`SimpleXMLRPCServer` module makes it easy to create straightforward
  XML-RPC servers. See http://www.xmlrpc.com/ for more information about XML-RPC.

* The new :mod:`hmac` module implements the HMAC algorithm described by
  :rfc:`2104`. (Contributed by Gerhard Häring.)

* Several functions that originally returned lengthy tuples now return
  pseudo-sequences that still behave like tuples but also have mnemonic
  attributes such as :attr:`st_mtime` or :attr:`tm_year`. The enhanced functions
  include :func:`stat`, :func:`fstat`, :func:`statvfs`, and :func:`fstatvfs` in
  the :mod:`os` module, and :func:`localtime`, :func:`gmtime`, and
  :func:`strptime` in the :mod:`time` module.

  For example, to obtain a file's size using the old tuples, you'd end up writing
  something like ``file_size = os.stat(filename)[stat.ST_SIZE]``, but now this can
  be written more clearly as ``file_size = os.stat(filename).st_size``.

  The original patch for this feature was contributed by Nick Mathewson.
|
|
|
|
* The Python profiler has been extensively reworked and various errors in its
  output have been corrected. (Contributed by Fred L. Drake, Jr. and Tim Peters.)

* The :mod:`socket` module can be compiled to support IPv6; specify the
  :option:`--enable-ipv6` option to Python's configure script. (Contributed by
  Jun-ichiro "itojun" Hagino.)

* Two new format characters were added to the :mod:`struct` module for 64-bit
  integers on platforms that support the C :ctype:`long long` type. ``q`` is for
  a signed 64-bit integer, and ``Q`` is for an unsigned one. The value is
  returned in Python's long integer type. (Contributed by Tim Peters.)

* In the interpreter's interactive mode, there's a new built-in function
  :func:`help` that uses the :mod:`pydoc` module introduced in Python 2.1 to
  provide interactive help. ``help(object)`` displays any available help text
  about *object*. :func:`help` with no argument puts you in an online help
  utility, where you can enter the names of functions, classes, or modules to read
  their help text. (Contributed by Guido van Rossum, using Ka-Ping Yee's
  :mod:`pydoc` module.)

* Various bugfixes and performance improvements have been made to the SRE engine
  underlying the :mod:`re` module. For example, the :func:`re.sub` and
  :func:`re.split` functions have been rewritten in C. Another contributed patch
  speeds up certain Unicode character ranges by a factor of two, and a new
  :meth:`finditer` method returns an iterator over all the non-overlapping
  matches in a given string. (SRE is maintained by Fredrik Lundh. The
  BIGCHARSET patch was contributed by Martin von Löwis.)

* The :mod:`smtplib` module now supports :rfc:`2487`, "Secure SMTP over TLS", so
  it's now possible to encrypt the SMTP traffic between a Python program and the
  mail transport agent being handed a message. :mod:`smtplib` also supports SMTP
  authentication. (Contributed by Gerhard Häring.)

* The :mod:`imaplib` module, maintained by Piers Lauder, has support for several
  new extensions: the NAMESPACE extension defined in :rfc:`2342`, SORT, GETACL and
  SETACL. (Contributed by Anthony Baxter and Michel Pelletier.)

* The :mod:`rfc822` module's parsing of email addresses is now compliant with
  :rfc:`2822`, an update to :rfc:`822`. (The module's name is *not* going to be
  changed to ``rfc2822``.) A new package, :mod:`email`, has also been added for
  parsing and generating e-mail messages. (Contributed by Barry Warsaw, and
  arising out of his work on Mailman.)

* The :mod:`difflib` module now contains a new :class:`Differ` class for
  producing human-readable lists of changes (a "delta") between two sequences of
  lines of text. There are also two generator functions, :func:`ndiff` and
  :func:`restore`, which respectively return a delta from two sequences, or one of
  the original sequences from a delta. (Grunt work contributed by David Goodger,
  from ndiff.py code by Tim Peters who then did the generatorization.)

* New constants :const:`ascii_letters`, :const:`ascii_lowercase`, and
  :const:`ascii_uppercase` were added to the :mod:`string` module. There were
  several modules in the standard library that used :const:`string.letters` to
  mean the ranges A-Za-z, but that assumption is incorrect when locales are in
  use, because :const:`string.letters` varies depending on the set of legal
  characters defined by the current locale. The buggy modules have all been fixed
  to use :const:`ascii_letters` instead. (Reported by an unknown person; fixed by
  Fred L. Drake, Jr.)

* The :mod:`mimetypes` module now makes it easier to use alternative MIME-type
  databases by the addition of a :class:`MimeTypes` class, which takes a list of
  filenames to be parsed. (Contributed by Fred L. Drake, Jr.)

* A :class:`Timer` class was added to the :mod:`threading` module that allows
  scheduling an activity to happen at some future time. (Contributed by Itamar
  Shtull-Trauring.)
|
|
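A few of the items in the list above can be tried out together. The following
sketch is written for a modern Python 3 interpreter, where these interfaces
still exist in essentially the same form; the sample strings and variable names
are invented for illustration.

```python
import difflib
import os
import re
import stat
import struct
import time

# Pseudo-sequence results: index access still works, attributes read better.
info = os.stat(os.curdir)
size_by_index = info[stat.ST_SIZE]
size_by_name = info.st_size
year = time.gmtime(0).tm_year    # the Unix epoch

# finditer() yields one match object per non-overlapping match.
words = [m.group(0) for m in re.finditer(r"[a-z]+", "spam, eggs and ham")]

# ndiff() produces a delta; restore() recovers either original from it.
old = ["one\n", "two\n", "three\n"]
new = ["one\n", "2\n", "three\n"]
delta = list(difflib.ndiff(old, new))
recovered_old = list(difflib.restore(delta, 1))
recovered_new = list(difflib.restore(delta, 2))

# 'q' packs a signed 64-bit integer where the platform supports it.
packed = struct.pack("q", -2 ** 40)
unpacked = struct.unpack("q", packed)[0]
```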
|
|
.. ======================================================================
|
|
|
|
|
|
Interpreter Changes and Fixes
=============================

Some of the changes only affect people who deal with the Python interpreter at
the C level because they're writing Python extension modules, embedding the
interpreter, or just hacking on the interpreter itself. If you only write Python
code, none of the changes described here will affect you very much.

* Profiling and tracing functions can now be implemented in C, which can operate
  at much higher speeds than Python-based functions and should reduce the overhead
  of profiling and tracing. This will be of interest to authors of development
  environments for Python. Two new C functions were added to Python's API,
  :cfunc:`PyEval_SetProfile` and :cfunc:`PyEval_SetTrace`. The existing
  :func:`sys.setprofile` and :func:`sys.settrace` functions still exist, and have
  simply been changed to use the new C-level interface. (Contributed by Fred L.
  Drake, Jr.)

* Another low-level API, primarily of interest to implementors of Python
  debuggers and development tools, was added. :cfunc:`PyInterpreterState_Head` and
  :cfunc:`PyInterpreterState_Next` let a caller walk through all the existing
  interpreter objects; :cfunc:`PyInterpreterState_ThreadHead` and
  :cfunc:`PyThreadState_Next` allow looping over all the thread states for a given
  interpreter. (Contributed by David Beazley.)

* The C-level interface to the garbage collector has been changed to make it
  easier to write extension types that support garbage collection and to debug
  misuses of the functions. Various functions have slightly different semantics,
  so a bunch of functions had to be renamed. Extensions that use the old API will
  still compile but will *not* participate in garbage collection, so updating them
  for 2.2 should be considered fairly high priority.

  To upgrade an extension module to the new API, perform the following steps:

  * Rename :cfunc:`Py_TPFLAGS_GC` to :cfunc:`Py_TPFLAGS_HAVE_GC`.

  * Use :cfunc:`PyObject_GC_New` or :cfunc:`PyObject_GC_NewVar` to allocate
    objects, and :cfunc:`PyObject_GC_Del` to deallocate them.

  * Rename :cfunc:`PyObject_GC_Init` to :cfunc:`PyObject_GC_Track` and
    :cfunc:`PyObject_GC_Fini` to :cfunc:`PyObject_GC_UnTrack`.

  * Remove :cfunc:`PyGC_HEAD_SIZE` from object size calculations.

  * Remove calls to :cfunc:`PyObject_AS_GC` and :cfunc:`PyObject_FROM_GC`.

* A new ``et`` format sequence was added to :cfunc:`PyArg_ParseTuple`; ``et``
  takes both a parameter and an encoding name, and converts the parameter to the
  given encoding if the parameter turns out to be a Unicode string, or leaves it
  alone if it's an 8-bit string, assuming it to already be in the desired
  encoding. This differs from the ``es`` format character, which assumes that
  8-bit strings are in Python's default ASCII encoding and converts them to the
  specified new encoding. (Contributed by M.-A. Lemburg, and used for the MBCS
  support on Windows described in the following section.)

* A different argument parsing function, :cfunc:`PyArg_UnpackTuple`, has been
  added that's simpler and presumably faster. Instead of specifying a format
  string, the caller simply gives the minimum and maximum number of arguments
  expected, and a set of pointers to :ctype:`PyObject\*` variables that will be
  filled in with argument values.

* Two new flags :const:`METH_NOARGS` and :const:`METH_O` are available in method
  definition tables to simplify implementation of methods with no arguments or a
  single untyped argument. Calling such methods is more efficient than calling a
  corresponding method that uses :const:`METH_VARARGS`. Also, the old
  :const:`METH_OLDARGS` style of writing C methods is now officially deprecated.

* Two new wrapper functions, :cfunc:`PyOS_snprintf` and :cfunc:`PyOS_vsnprintf`
  were added to provide cross-platform implementations for the relatively new
  :cfunc:`snprintf` and :cfunc:`vsnprintf` C lib APIs. In contrast to the standard
  :cfunc:`sprintf` and :cfunc:`vsprintf` functions, the Python versions check the
  bounds of the buffer used to protect against buffer overruns. (Contributed by
  M.-A. Lemburg.)

* The :cfunc:`_PyTuple_Resize` function has lost an unused parameter, so now it
  takes 2 parameters instead of 3. The third argument was never used, and can
  simply be discarded when porting code from earlier versions to Python 2.2.
|
|
|
|
.. ======================================================================
|
|
|
|
|
|
Other Changes and Fixes
=======================

As usual there were a bunch of other improvements and bugfixes scattered
throughout the source tree. A search through the CVS change logs finds there
were 527 patches applied and 683 bugs fixed between Python 2.1 and 2.2; 2.2.1
applied 139 patches and fixed 143 bugs; 2.2.2 applied 106 patches and fixed 82
bugs. These figures are likely to be underestimates.

Some of the more notable changes are:

* The code for the MacOS port for Python, maintained by Jack Jansen, is now kept
  in the main Python CVS tree, and many changes have been made to support MacOS X.

  The most significant change is the ability to build Python as a framework,
  enabled by supplying the :option:`--enable-framework` option to the configure
  script when compiling Python. According to Jack Jansen, "This installs a
  self-contained Python installation plus the OS X framework "glue" into
  :file:`/Library/Frameworks/Python.framework` (or another location of choice).
  For now there is little immediate added benefit to this (actually, there is the
  disadvantage that you have to change your PATH to be able to find Python), but
  it is the basis for creating a full-blown Python application, porting the
  MacPython IDE, possibly using Python as a standard OSA scripting language and
  much more."

  Most of the MacPython toolbox modules, which interface to MacOS APIs such as
  windowing, QuickTime, scripting, etc. have been ported to OS X, but they've been
  left commented out in :file:`setup.py`. People who want to experiment with
  these modules can uncomment them manually.

  .. Jack's original comments:
     The main change is the possibility to build Python as a
     framework. This installs a self-contained Python installation plus the
     OSX framework "glue" into /Library/Frameworks/Python.framework (or
     another location of choice). For now there is little immedeate added
     benefit to this (actually, there is the disadvantage that you have to
     change your PATH to be able to find Python), but it is the basis for
     creating a fullblown Python application, porting the MacPython IDE,
     possibly using Python as a standard OSA scripting language and much
     more. You enable this with "configure --enable-framework".
     The other change is that most MacPython toolbox modules, which
     interface to all the MacOS APIs such as windowing, quicktime,
     scripting, etc. have been ported. Again, most of these are not of
     immedeate use, as they need a full application to be really useful, so
     they have been commented out in setup.py. People wanting to experiment
     can uncomment them. Gestalt and Internet Config modules are enabled by
     default.
|
|
|
|
* Keyword arguments passed to built-in functions that don't take them now cause a
  :exc:`TypeError` exception to be raised, with the message "*function* takes no
  keyword arguments".

* Weak references, added in Python 2.1 as an extension module, are now part of
  the core because they're used in the implementation of new-style classes. The
  :exc:`ReferenceError` exception has therefore moved from the :mod:`weakref`
  module to become a built-in exception.

* A new script, :file:`Tools/scripts/cleanfuture.py` by Tim Peters,
  automatically removes obsolete ``__future__`` statements from Python source
  code.

* An additional *flags* argument has been added to the built-in function
  :func:`compile`, so the behaviour of ``__future__`` statements can now be
  correctly observed in simulated shells, such as those presented by IDLE and
  other development environments. This is described in :pep:`264`. (Contributed
  by Michael Hudson.)

* The new license introduced with Python 1.6 wasn't GPL-compatible. This is
  fixed by some minor textual changes to the 2.2 license, so it's now legal to
  embed Python inside a GPLed program again. Note that Python itself is not
  GPLed, but instead is under a license that's essentially equivalent to the BSD
  license, same as it always was. The license changes were also applied to the
  Python 2.0.1 and 2.1.1 releases.

* When presented with a Unicode filename on Windows, Python will now convert it
  to an MBCS encoded string, as used by the Microsoft file APIs. As MBCS is
  explicitly used by the file APIs, Python's choice of ASCII as the default
  encoding turns out to be an annoyance. On Unix, the locale's character set is
  used if :func:`locale.nl_langinfo(CODESET)` is available. (Windows support was
  contributed by Mark Hammond with assistance from Marc-André Lemburg. Unix
  support was added by Martin von Löwis.)

* Large file support is now enabled on Windows. (Contributed by Tim Peters.)

* The :file:`Tools/scripts/ftpmirror.py` script now parses a :file:`.netrc`
  file, if you have one. (Contributed by Mike Romberg.)

* Some features of the object returned by the :func:`xrange` function are now
  deprecated, and trigger warnings when they're accessed; they'll disappear in
  Python 2.3. :class:`xrange` objects tried to pretend they were full sequence
  types by supporting slicing, sequence multiplication, and the :keyword:`in`
  operator, but these features were rarely used and therefore buggy. The
  :meth:`tolist` method and the :attr:`start`, :attr:`stop`, and :attr:`step`
  attributes are also being deprecated. At the C level, the fourth argument to
  the :cfunc:`PyRange_New` function, ``repeat``, has also been deprecated.

* There were a bunch of patches to the dictionary implementation, mostly to fix
  potential core dumps if a dictionary contains objects that sneakily changed
  their hash value, or mutated the dictionary they were contained in. For a while
  python-dev fell into a gentle rhythm of Michael Hudson finding a case that
  dumped core, Tim Peters fixing the bug, Michael finding another case, and round
  and round it went.

* On Windows, Python can now be compiled with Borland C thanks to a number of
  patches contributed by Stephen Hansen, though the result isn't fully functional
  yet. (But this *is* progress...)

* Another Windows enhancement: Wise Solutions generously offered PythonLabs use
  of their InstallerMaster 8.1 system. Earlier PythonLabs Windows installers used
  Wise 5.0a, which was beginning to show its age. (Packaged up by Tim Peters.)

* Files ending in ``.pyw`` can now be imported on Windows. ``.pyw`` is a
  Windows-only thing, used to indicate that a script needs to be run using
  PYTHONW.EXE instead of PYTHON.EXE in order to prevent a DOS console from popping
  up to display the output. This patch makes it possible to import such scripts,
  in case they're also usable as modules. (Implemented by David Bolen.)

* On platforms where Python uses the C :cfunc:`dlopen` function to load
  extension modules, it's now possible to set the flags used by :cfunc:`dlopen`
  using the :func:`sys.getdlopenflags` and :func:`sys.setdlopenflags` functions.
  (Contributed by Bram Stolk.)

* The :func:`pow` built-in function no longer supports 3 arguments when
  floating-point numbers are supplied. ``pow(x, y, z)`` returns ``(x**y) % z``,
  but this is never useful for floating point numbers, and the final result varies
  unpredictably depending on the platform. A call such as ``pow(2.0, 8.0, 7.0)``
  will now raise a :exc:`TypeError` exception.
|
|
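Two of the behaviour changes in the list above, the stricter :func:`pow` and
the rejection of unexpected keyword arguments by built-in functions, can be
checked with a short sketch. It is written for a modern Python 3 interpreter,
where both rules still hold; the flag variable names are made up for the
example.

```python
# Three-argument pow() is modular exponentiation and is defined for integers.
modular = pow(2, 8, 7)           # (2 ** 8) % 7

# With floating-point operands the three-argument form is rejected.
try:
    pow(2.0, 8.0, 7.0)
    float_form_rejected = False
except TypeError:
    float_form_rejected = True

# Built-ins that take no keyword arguments reject them with TypeError too.
try:
    len(obj=[1, 2, 3])
    keyword_rejected = False
except TypeError:
    keyword_rejected = True
```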
|
|
.. ======================================================================
|
|
|
|
|
|
Acknowledgements
================

The author would like to thank the following people for offering suggestions,
corrections and assistance with various drafts of this article: Fred Bremmer,
Keith Briggs, Andrew Dalke, Fred L. Drake, Jr., Carel Fellinger, David Goodger,
Mark Hammond, Stephen Hansen, Michael Hudson, Jack Jansen, Marc-André Lemburg,
Martin von Löwis, Fredrik Lundh, Michael McLay, Nick Mathewson, Paul Moore,
Gustavo Niemeyer, Don O'Donnell, Joonas Paalasma, Tim Peters, Jens Quade, Tom
Reinhardt, Neil Schemenauer, Guido van Rossum, Greg Ward, Edward Welbourne.