mirror of https://github.com/python/cpython.git
More Unicode corrections from MAL to match a post-2.2a1 change
Mention additional new imaplib.py features (Don't expect to see an updated version of the Web page until around the 28th of July. Vacation time!)
parent
6c6bfb7c70
commit
a6d2a04065
|
@ -339,33 +339,22 @@ and Tim Peters, with other fixes from the Python Labs crew.}
|
||||||
\section{Unicode Changes}
|
\section{Unicode Changes}
|
||||||
|
|
||||||
Python's Unicode support has been enhanced a bit in 2.2. Unicode
|
Python's Unicode support has been enhanced a bit in 2.2. Unicode
|
||||||
strings are usually stored as UTF-16, as 16-bit unsigned integers.
|
strings are usually stored as UCS-2, as 16-bit unsigned integers.
|
||||||
Python 2.2 can also be compiled to use UCS-4, 32-bit unsigned
|
Python 2.2 can also be compiled to use UCS-4, 32-bit unsigned
|
||||||
integers, as its internal encoding by supplying
|
integers, as its internal encoding by supplying
|
||||||
\longprogramopt{enable-unicode=ucs4} to the configure script. When
|
\longprogramopt{enable-unicode=ucs4} to the configure script. When
|
||||||
built to use UCS-4 (a ``wide Python''), the interpreter can natively
|
built to use UCS-4 (a ``wide Python''), the interpreter can natively
|
||||||
handle Unicode characters from U+000000 to U+110000. The range of
|
handle Unicode characters from U+000000 to U+110000, so the range of
|
||||||
legal values for the \function{unichr()} function has been expanded;
|
legal values for the \function{unichr()} function is expanded
|
||||||
it used to only accept values up to 65535, but in 2.2 will accept
|
accordingly. Using an interpreter compiled to use UCS-2 (a ``narrow
|
||||||
values from 0 to 0x110000. Using a ``narrow Python'', an interpreter
|
Python''), values greater than 65535 will still cause
|
||||||
compiled to use UTF-16, values greater than 65535 will result in
|
\function{unichr()} to raise a \exception{ValueError} exception.
|
||||||
\function{unichr()} returning a string of length 2:
|
|
||||||
|
|
||||||
\begin{verbatim}
|
|
||||||
>>> s = unichr(65536)
|
|
||||||
>>> s
|
|
||||||
u'\ud800\udc00'
|
|
||||||
>>> len(s)
|
|
||||||
2
|
|
||||||
\end{verbatim}
|
|
||||||
|
|
||||||
This possibly-confusing behaviour, breaking the intuitive invariant
|
|
||||||
that \function{chr()} and\function{unichr()} always return strings of
|
|
||||||
length 1, may be changed later in 2.2 depending on public reaction.
|
|
||||||
|
|
||||||
All this is the province of the still-unimplemented PEP 261, ``Support
|
All this is the province of the still-unimplemented PEP 261, ``Support
|
||||||
for `wide' Unicode characters''; consult it for further details, and
|
for `wide' Unicode characters''; consult it for further details, and
|
||||||
please offer comments and suggestions on the proposal it describes.
|
please offer comments on the PEP and on your experiences with the
|
||||||
|
2.2 alpha releases.
|
||||||
|
% XXX update previous line once 2.2 reaches beta.
|
||||||
|
|
||||||
Another change is much simpler to explain. Since their introduction,
|
Another change is much simpler to explain. Since their introduction,
|
||||||
Unicode strings have supported an \method{encode()} method to convert
|
Unicode strings have supported an \method{encode()} method to convert
|
||||||
|
@ -576,9 +565,10 @@ See \url{http://www.xmlrpc.com/} for more information about XML-RPC.
|
||||||
two. (SRE is maintained by Fredrik Lundh. The BIGCHARSET patch was
|
two. (SRE is maintained by Fredrik Lundh. The BIGCHARSET patch was
|
||||||
contributed by Martin von L\"owis.)
|
contributed by Martin von L\"owis.)
|
||||||
|
|
||||||
\item The \module{imaplib} module now has support for the IMAP
|
\item The \module{imaplib} module, maintained by Piers Lauder, has
|
||||||
NAMESPACE extension defined in \rfc{2342}. (Contributed by Michel
|
support for several new extensions: the NAMESPACE extension defined
|
||||||
Pelletier.)
|
in \rfc{2342}, SORT, GETACL and SETACL. (Contributed by Anthony
|
||||||
|
Baxter and Michel Pelletier.)
|
||||||
|
|
||||||
\item The \module{rfc822} module's parsing of email addresses is
|
\item The \module{rfc822} module's parsing of email addresses is
|
||||||
now compliant with \rfc{2822}, an update to \rfc{822}. The module's
|
now compliant with \rfc{2822}, an update to \rfc{822}. The module's
|
||||||
|
|
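Note on the behaviour change shown in the first hunk: the removed text described \function{unichr()} on a narrow build returning a two-character surrogate pair for values above 65535, and the added text says such values raise \exception{ValueError} instead. The sketch below illustrates both behaviours in modern Python 3. It is not the 2.2 interpreter's implementation; the helper names `utf16_surrogates` and `narrow_unichr` are hypothetical and exist only for this illustration.

```python
def utf16_surrogates(code_point):
    """Return the (high, low) UTF-16 surrogate pair for a supplementary code point.

    Illustrates the arithmetic behind the u'\\ud800\\udc00' result quoted in the
    removed text. Hypothetical helper, not part of any Python release.
    """
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is outside the supplementary range")
    offset = code_point - 0x10000          # 20-bit value split across the pair
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


def narrow_unichr(code_point):
    """Emulate the narrow-build behaviour described in the added text.

    Assumption: values above 65535 (or below 0) are rejected with ValueError
    rather than being turned into a surrogate pair.
    """
    if not 0 <= code_point <= 0xFFFF:
        raise ValueError("narrow build: code point must be in range(0x10000)")
    return chr(code_point)


if __name__ == "__main__":
    high, low = utf16_surrogates(65536)
    print(hex(high), hex(low))
    print(repr(narrow_unichr(0x41)))
```

The first helper corresponds to the old, removed behaviour (a string of length 2); the second corresponds to the behaviour the commit documents for narrow builds.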