\section{Standard Module \sectcode{rfc822}}
\label{module-rfc822}
\stmodindex{rfc822}

This module defines a class, \class{Message}, which represents a
collection of ``email headers'' as defined by the Internet standard
\rfc{822}.  It is used in various contexts, usually to read such
headers from a file.

Note that there's a separate module to read \UNIX{}, MH, and MMDF
style mailbox files: \module{mailbox}\refstmodindex{mailbox}.

\begin{classdesc}{Message}{file\optional{, seekable}}
A \class{Message} instance is instantiated with an open file object as
parameter.  The optional \var{seekable} parameter indicates whether the
file object is seekable; the default value is \code{1} for true.
Instantiation reads headers from the file up to a blank line and
stores them in the instance; after instantiation, the file is
positioned directly after the blank line that terminates the headers.

Input lines as read from the file may be terminated either by CR-LF or
by a single linefeed; a terminating CR-LF is replaced by a single
linefeed before the line is stored.

All header matching is done independent of upper or lower case;
e.g. \code{\var{m}['From']}, \code{\var{m}['from']} and
\code{\var{m}['FROM']} all yield the same result.
\end{classdesc}
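The header-reading behaviour described above can be sketched as follows.
This is a simplified illustration, not the module's implementation; the
helper name \code{read_header_lines} is hypothetical, and an in-memory
file object stands in for a real file.

```python
import io

def read_header_lines(fp):
    """Read header lines from fp, consuming the blank line that ends them.

    Hypothetical helper for illustration only.
    """
    headers = []
    while True:
        line = fp.readline()
        if not line:
            break                      # end of file before any blank line
        if line.endswith('\r\n'):
            line = line[:-2] + '\n'    # a terminating CR-LF becomes one linefeed
        if line == '\n':
            break                      # blank line: the headers are finished
        headers.append(line)
    return headers

fp = io.StringIO('From: jack@cwi.nl\r\nSubject: test\r\n\r\nbody text\r\n')
headers = read_header_lines(fp)
body = fp.read()   # the file is now positioned at the start of the body
```

After the call, \code{headers} holds the two header lines with plain
linefeeds, and the remaining data in \code{fp} is the message body.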

\begin{funcdesc}{parsedate}{date}
Attempts to parse a date according to the rules in \rfc{822}.
However, some mailers don't follow that format as specified, so
\function{parsedate()} tries to guess correctly in such cases.
\var{date} is a string containing an \rfc{822} date, such as
\code{'Mon, 20 Nov 1995 19:12:08 -0500'}.  If it succeeds in parsing
the date, \function{parsedate()} returns a 9-tuple that can be passed
directly to \function{time.mktime()}; otherwise \code{None} will be
returned.
\end{funcdesc}
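A short usage sketch for \function{parsedate()}.  The fallback import
is an assumption made so that the example also runs where
\module{rfc822} is not installed: it relies on
\code{email.utils.parsedate} behaving like the function documented
here.  Only the first six fields are inspected, since the remaining
three may differ between implementations.

```python
import time

try:
    from rfc822 import parsedate       # the module documented here
except ImportError:
    from email.utils import parsedate  # assumed equivalent where rfc822 is absent

fields = parsedate('Mon, 20 Nov 1995 19:12:08 -0500')
if fields is not None:
    year, month, day, hour, minute, second = fields[:6]
    # mktime() interprets the fields as local time; the -0500 is not applied.
    stamp = time.mktime(fields)

unparsable = parsedate('not a date')   # no date could be recognized
```

Because the timezone in the string is discarded, use
\function{parsedate_tz()} when the offset matters.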

\begin{funcdesc}{parsedate_tz}{date}
Performs the same function as \function{parsedate()}, but returns
either \code{None} or a 10-tuple; the first 9 elements make up a tuple
that can be passed directly to \function{time.mktime()}, and the tenth
is the offset of the date's timezone from UTC (which is the official
term for Greenwich Mean Time).  (Note that the sign of the timezone
offset is the opposite of the sign of the \code{time.timezone}
variable for the same timezone; the latter variable follows the
\POSIX{} standard while this module follows \rfc{822}.)  If the input
string has no timezone, the last element of the tuple returned is
\code{None}.
\end{funcdesc}
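The sign convention can be checked with a small example.  As before,
the fallback to \code{email.utils.parsedate_tz} is an assumption for
environments without \module{rfc822}; that replacement may not return
\code{None} for a missing timezone, so only a date with an explicit
zone is used here.

```python
try:
    from rfc822 import parsedate_tz       # the module documented here
except ImportError:
    from email.utils import parsedate_tz  # assumed equivalent where rfc822 is absent

result = parsedate_tz('Mon, 20 Nov 1995 19:12:08 -0500')
offset = result[9]        # seconds relative to UTC; negative means west of UTC
posix_style = -offset     # time.timezone uses the opposite sign for the same zone
```

For a zone five hours west of UTC the tenth element is negative,
whereas \code{time.timezone} would be positive.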

\begin{funcdesc}{mktime_tz}{tuple}
Turn a 10-tuple as returned by \function{parsedate_tz()} into a UTC
timestamp.  If the timezone item in the tuple is \code{None}, assume
local time.  Minor deficiency: this first interprets the first 8
elements as a local time and then compensates for the timezone
difference; this may yield a slight error around daylight saving time
switch dates.  The error is not large enough to matter for common use.
\end{funcdesc}
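A sketch of the round trip from a date string to a UTC timestamp.  The
fallback import is again an assumption; under the implementation
described above, results near a daylight saving time switch could
differ slightly, which the date chosen here is meant to avoid.

```python
import calendar
import time

try:
    from rfc822 import parsedate_tz, mktime_tz       # the module documented here
except ImportError:
    from email.utils import parsedate_tz, mktime_tz  # assumed equivalents

stamp = mktime_tz(parsedate_tz('Mon, 20 Nov 1995 19:12:08 -0500'))
utc_fields = tuple(time.gmtime(stamp)[:6])

# 19:12:08 at five hours west of UTC is 00:12:08 UTC on the following day.
expected = calendar.timegm((1995, 11, 21, 0, 12, 8, 0, 0, 0))
```

Comparing \code{stamp} with \code{expected} confirms that the timezone
offset has been applied.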

\subsection{Message Objects}

A \class{Message} instance has the following methods:

\begin{funcdesc}{rewindbody}{}
Seek to the start of the message body.  This only works if the file
object is seekable.
\end{funcdesc}

\begin{funcdesc}{getallmatchingheaders}{name}
Return a list of lines consisting of all headers matching
\var{name}, if any.  Each physical line, whether it is a continuation
line or not, is a separate list item.  Return the empty list if no
header matches \var{name}.
\end{funcdesc}
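The way continuation lines are returned as separate list items can be
illustrated with a stand-alone sketch.  The function
\code{matching_header_lines} is hypothetical and works on a plain list
of header lines; it is not the module's code.

```python
def matching_header_lines(headers, name):
    """Collect every physical line of every header whose name matches.

    Hypothetical sketch; matching ignores upper and lower case.
    """
    prefix = name.lower() + ':'
    lines = []
    in_match = False
    for line in headers:
        if line[:1] not in (' ', '\t'):
            # A new header starts here; continuation lines keep the old state.
            in_match = line.lower().startswith(prefix)
        if in_match:
            lines.append(line)
    return lines

headers = [
    'Received: from a.example\n',
    '\tby b.example\n',
    'Subject: test\n',
    'Received: from c.example\n',
]
received = matching_header_lines(headers, 'received')
missing = matching_header_lines(headers, 'X-Unknown')
```

Here \code{received} contains three physical lines taken from two
headers, and \code{missing} is the empty list.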

\begin{funcdesc}{getfirstmatchingheader}{name}
Return a list of lines comprising the first header matching
\var{name}, and its continuation line(s), if any.  Return \code{None}
if there is no header matching \var{name}.
\end{funcdesc}

\begin{funcdesc}{getrawheader}{name}
Return a single string consisting of the text after the colon in the
first header matching \var{name}.  This includes leading whitespace,
the trailing linefeed, and internal linefeeds and whitespace if
any continuation line(s) were present.  Return \code{None} if there is
no header matching \var{name}.
\end{funcdesc}

\begin{funcdesc}{getheader}{name}
Like \code{getrawheader(\var{name})}, but strip leading and trailing
whitespace.  Internal whitespace is not stripped.
\end{funcdesc}
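The difference between the raw and the stripped value is easiest to see
on a header with a continuation line.  The helper \code{raw_value} is
hypothetical; it operates on the list of physical lines that make up
one header and is only meant to mirror the description above.

```python
def raw_value(header_lines):
    """Text after the first colon, keeping all whitespace and linefeeds.

    Hypothetical helper for illustration only.
    """
    first = header_lines[0]
    return first[first.index(':') + 1:] + ''.join(header_lines[1:])

lines = ['Subject: a long\n', '\tsubject line\n']   # one header, one continuation
raw = raw_value(lines)       # leading space and trailing linefeed are kept
stripped = raw.strip()       # only the ends are stripped, not the inner '\n\t'
```

The stripped form still contains the internal linefeed and tab that
join the two physical lines.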

\begin{funcdesc}{getaddr}{name}
Return a pair \code{(\var{full name}, \var{email address})} parsed
from the string returned by \code{getheader(\var{name})}.  If no
header matching \var{name} exists, return \code{(None, None)};
otherwise both the full name and the address are (possibly empty)
strings.

Example: If \var{m}'s first \code{From} header contains the string
\code{'jack@cwi.nl (Jack Jansen)'}, then
\code{m.getaddr('From')} will yield the pair
\code{('Jack Jansen', 'jack@cwi.nl')}.
If the header contained
\code{'Jack Jansen <jack@cwi.nl>'} instead, it would yield the
exact same result.
\end{funcdesc}
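The two address forms from the example above can be tried without a
\class{Message} instance.  This sketch rests on two assumptions: that a
module-level \code{parseaddr} function is available (it is not
described in this section), and that \code{email.utils.parseaddr}
follows comparable parsing rules where \module{rfc822} is absent.

```python
try:
    from rfc822 import parseaddr       # assumption: present in the installed release
except ImportError:
    from email.utils import parseaddr  # assumption: comparable parsing rules

comment_form = parseaddr('jack@cwi.nl (Jack Jansen)')
angle_form = parseaddr('Jack Jansen <jack@cwi.nl>')
same_result = (comment_form == angle_form)
```

Both calls are expected to return the full name first and the address
second, in the same order as \method{getaddr()}.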

\begin{funcdesc}{getaddrlist}{name}
This is similar to \code{getaddr(\var{name})}, but parses a header
containing a list of email addresses (e.g. a \code{To} header) and
returns a list of \code{(\var{full name}, \var{email address})} pairs
(even if there was only one address in the header).  If there is no
header matching \var{name}, return an empty list.

XXX The current version of this function is not really correct.  It
yields bogus results if a full name contains a comma.
\end{funcdesc}

\begin{funcdesc}{getdate}{name}
Retrieve a header using \method{getheader()} and parse it into a 9-tuple
compatible with \function{time.mktime()}.  If there is no header matching
\var{name}, or it is unparsable, return \code{None}.

Date parsing appears to be a black art, and not all mailers adhere to
the standard.  While this function has been tested and found correct on
a large collection of email from many sources, it may still
occasionally yield an incorrect result.
\end{funcdesc}

\begin{funcdesc}{getdate_tz}{name}
Retrieve a header using \method{getheader()} and parse it into a
10-tuple; the first 9 elements will make a tuple compatible with
\function{time.mktime()}, and the 10th is a number giving the offset
of the date's timezone from UTC.  Similarly to \method{getdate()}, if
there is no header matching \var{name}, or it is unparsable, return
\code{None}.
\end{funcdesc}

\class{Message} instances also support a read-only mapping interface.
In particular: \code{\var{m}[name]} is the same as
\code{\var{m}.getheader(name)}; and \code{len(\var{m})},
\code{\var{m}.has_key(name)}, \code{\var{m}.keys()},
\code{\var{m}.values()} and \code{\var{m}.items()} act as expected
(and consistently).

Finally, \class{Message} instances have two public instance variables:

\begin{datadesc}{headers}
A list containing the entire set of header lines, in the order in
which they were read.  Each line contains a trailing newline.  The
blank line terminating the headers is not contained in the list.
\end{datadesc}

\begin{datadesc}{fp}
The file object passed at instantiation time.
\end{datadesc}