cpython/Lib/encodings/aliases.py

92 lines
2.1 KiB
Python
Raw Normal View History

2000-03-10 23:17:24 +00:00
""" Encoding Aliases Support
This module is used by the encodings package search function to
map encodings names to module names.
Note that the search function converts the encoding names to lower
case and replaces hyphens with underscores *before* performing the
lookup.
"""
aliases = {
# Latin-1
'latin': 'latin_1',
'latin1': 'latin_1',
# UTF-8
'utf': 'utf_8',
'utf8': 'utf_8',
'u8': 'utf_8',
'utf8@ucs2': 'utf_8',
'utf8@ucs4': 'utf_8',
2000-03-10 23:17:24 +00:00
# UTF-16
'utf16': 'utf_16',
'u16': 'utf_16',
'utf_16be': 'utf_16_be',
'utf_16le': 'utf_16_le',
'unicodebigunmarked': 'utf_16_be',
'unicodelittleunmarked': 'utf_16_le',
2000-03-10 23:17:24 +00:00
# ASCII
'us_ascii': 'ascii',
# ISO
'8859': 'latin_1',
'iso8859': 'latin_1',
2000-03-10 23:17:24 +00:00
'iso8859_1': 'latin_1',
'iso_8859_1': 'latin_1',
'iso_8859_10': 'iso8859_10',
'iso_8859_13': 'iso8859_13',
'iso_8859_14': 'iso8859_14',
'iso_8859_15': 'iso8859_15',
'iso_8859_2': 'iso8859_2',
'iso_8859_3': 'iso8859_3',
'iso_8859_4': 'iso8859_4',
'iso_8859_5': 'iso8859_5',
'iso_8859_6': 'iso8859_6',
'iso_8859_7': 'iso8859_7',
'iso_8859_8': 'iso8859_8',
'iso_8859_9': 'iso8859_9',
# Mac
'maclatin2': 'mac_latin2',
'maccentraleurope': 'mac_latin2',
'maccyrillic': 'mac_cyrillic',
'macgreek': 'mac_greek',
'maciceland': 'mac_iceland',
'macroman': 'mac_roman',
'macturkish': 'mac_turkish',
2000-03-10 23:17:24 +00:00
Marc-Andre's third try at this bulk patch seems to work (except that his copy of test_contains.py seems to be broken -- the lines he deleted were already absent). Checkin messages: New Unicode support for int(), float(), complex() and long(). - new APIs PyInt_FromUnicode() and PyLong_FromUnicode() - added support for Unicode to PyFloat_FromString() - new encoding API PyUnicode_EncodeDecimal() which converts Unicode to a decimal char* string (used in the above new APIs) - shortcuts for calls like int(<int object>) and float(<float obj>) - tests for all of the above Unicode compares and contains checks: - comparing Unicode and non-string types now works; TypeErrors are masked, all other errors such as ValueError during Unicode coercion are passed through (note that PyUnicode_Compare does not implement the masking -- PyObject_Compare does this) - contains now works for non-string types too; TypeErrors are masked and 0 returned; all other errors are passed through Better testing support for the standard codecs. Misc minor enhancements, such as an alias dbcs for the mbcs codec. Changes: - PyLong_FromString() now applies the same error checks as does PyInt_FromString(): trailing garbage is reported as error and not longer silently ignored. The only characters which may be trailing the digits are 'L' and 'l' -- these are still silently ignored. - string.ato?() now directly interface to int(), long() and float(). The error strings are now a little different, but the type still remains the same. These functions are now ready to get declared obsolete ;-) - PyNumber_Int() now also does a check for embedded NULL chars in the input string; PyNumber_Long() already did this (and still does) Followed by: Looks like I've gone a step too far there... (and test_contains.py seem to have a bug too). I've changed back to reporting all errors in PyUnicode_Contains() and added a few more test cases to test_contains.py (plus corrected the join() NameError).
2000-04-05 20:11:21 +00:00
# MBCS
'dbcs': 'mbcs',
# Code pages
'437': 'cp437',
# CJK
#
# The codecs for these encodings are not distributed with the
# Python core, but are included here for reference, since the
# locale module relies on having these aliases available.
#
'jis_7': 'jis_7',
'iso_2022_jp': 'jis_7',
'ujis': 'euc_jp',
'ajec': 'euc_jp',
'eucjp': 'euc_jp',
'tis260': 'tactis',
'sjis': 'shift_jis',
# Content transfer/compression encodings
'rot13': 'rot_13',
'base64': 'base64_codec',
'base_64': 'base64_codec',
'zlib': 'zlib_codec',
'zip': 'zlib_codec',
'hex': 'hex_codec',
'uu': 'uu_codec',
2000-03-10 23:17:24 +00:00
}