mirror of https://github.com/python/cpython.git
Updated string literals description to encompass Unicode literals and the
additional escape sequences defined for Unicode. This closes bug #117158.
parent 1367b83797
commit dea764d7f1
@@ -304,6 +304,9 @@ escapeseq: "\" <any ASCII character>
 \end{verbatim}
 \index{ASCII@\ASCII{}}
 
+\index{triple-quoted string}
+\index{Unicode Consortium}
+\index{string!Unicode}
 In plain English: String literals can be enclosed in matching single
 quotes (\code{'}) or double quotes (\code{"}). They can also be
 enclosed in matching groups of three single or double quotes (these
@@ -311,10 +314,12 @@ are generally referred to as \emph{triple-quoted strings}). The
 backslash (\code{\e}) character is used to escape characters that
 otherwise have a special meaning, such as newline, backslash itself,
 or the quote character. String literals may optionally be prefixed
-with a letter `r' or `R'; such strings are called raw strings and use
-different rules for backslash escape sequences.
-\index{triple-quoted string}
-\index{raw string}
+with a letter `r' or `R'; such strings are called
+\dfn{raw strings}\index{raw string} and use different rules for
+backslash escape sequences. A prefix of 'u' or 'U' makes the string
+a Unicode string. Unicode strings use the Unicode character set as
+defined by the Unicode Consortium and ISO~10646. Some additional
+escape sequences, described below, are available in Unicode strings.
 
 In triple-quoted strings,
 unescaped newlines and quotes are allowed (and are retained), except
@@ -339,25 +344,33 @@ to those used by Standard \C{}. The recognized escape sequences are:
 \lineii{\e b} {\ASCII{} Backspace (BS)}
 \lineii{\e f} {\ASCII{} Formfeed (FF)}
 \lineii{\e n} {\ASCII{} Linefeed (LF)}
+\lineii{\e N\{\var{name}\}}
+{Character named \var{name} in the Unicode database (Unicode only)}
 \lineii{\e r} {\ASCII{} Carriage Return (CR)}
 \lineii{\e t} {\ASCII{} Horizontal Tab (TAB)}
+\lineii{\e u\var{xxxx}}
+{Character with 16-bit hex value \var{xxxx} (Unicode only)}
+\lineii{\e U\var{xxxxxxxx}}
+{Character with 32-bit hex value \var{xxxxxxxx} (Unicode only)}
 \lineii{\e v} {\ASCII{} Vertical Tab (VT)}
-\lineii{\e\var{ooo}} {\ASCII{} character with octal value \emph{ooo}}
-\lineii{\e x\var{hh...}} {\ASCII{} character with hex value \emph{hh...}}
+\lineii{\e\var{ooo}} {\ASCII{} character with octal value \var{ooo}}
+\lineii{\e x\var{hh}} {\ASCII{} character with hex value \var{hh}}
 \end{tableii}
 \index{ASCII@\ASCII{}}
 
-In strict compatibility with Standard \C, up to three octal digits are
+In strict compatibility with Standard C, up to three octal digits are
 accepted, but an unlimited number of hex digits is taken to be part of
 the hex escape (and then the lower 8 bits of the resulting hex number
 are used in 8-bit implementations).
 
-Unlike Standard \C{},
+Unlike Standard \index{unrecognized escape sequence}C,
 all unrecognized escape sequences are left in the string unchanged,
-i.e., \emph{the backslash is left in the string.} (This behavior is
+i.e., \emph{the backslash is left in the string}. (This behavior is
 useful when debugging: if an escape sequence is mistyped, the
-resulting output is more easily recognized as broken.)
-\index{unrecognized escape sequence}
+resulting output is more easily recognized as broken.) It is also
+important to note that the escape sequences marked as ``(Unicode
+only)'' in the table above fall into the category of unrecognized
+escapes for non-Unicode string literals.
 
 When an `r' or `R' prefix is present, backslashes are still used to
 quote the following character, but \emph{all backslashes are left in
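The escape sequences this commit adds to the table can be tried out with a short sketch. It is not part of the commit. It assumes a current Python 3 interpreter, where every string literal is already a Unicode string, so the escapes marked "(Unicode only)" work without the 'u' prefix described in the diff. The variable names are illustrative only.

```python
# Illustrative sketch, assuming Python 3: every str literal is Unicode,
# so no u'' prefix is needed for the escapes the diff adds to the table.
import unicodedata

by_name = "\N{LATIN SMALL LETTER E WITH ACUTE}"  # \N{name}
by_u16 = "\u00e9"                                 # \uxxxx, four hex digits
by_u32 = "\U000000e9"                             # \Uxxxxxxxx, eight hex digits
by_octal = "\351"                                 # \ooo, up to three octal digits
by_hex = "\xe9"                                   # \xhh

# All five spellings denote the same single character, U+00E9.
spellings = [by_name, by_u16, by_u32, by_octal, by_hex]
all_same = all(s == chr(0xE9) for s in spellings)
print(all_same, unicodedata.name(by_u16))

# With an r prefix the backslash stays in the string, so under the
# Python 3 assumption the six characters are kept exactly as typed.
raw = r"\u00e9"
print(len(raw), raw[0] == "\\")
```

Results on the interpreter the commit documented may differ from this sketch, because string and Unicode-string literals were distinct types there.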