cpython/Lib/encodings/__init__.py

""" Standard "encodings" Package

    Standard Python encoding modules are stored in this package
    directory.

    Codec modules must have names corresponding to standard lower-case
    encoding names with hyphens mapped to underscores, e.g. 'utf-8' is
    implemented by the module 'utf_8.py'.

    Each codec module must export the following interface:

    * getregentry() -> (encoder, decoder, stream_reader, stream_writer)
    The getregentry() API must return callable objects which adhere to
    the Python Codec Interface Standard.

    In addition, a module may optionally also define the following
    APIs which are then used by the package's codec search function:

    * getaliases() -> sequence of encoding name strings to use as aliases

    Alias names returned by getaliases() must be lower-case.


Written by Marc-Andre Lemburg (mal@lemburg.com).

(c) Copyright CNRI, All Rights Reserved. NO WARRANTY.

"""#"

import codecs,aliases

_cache = {}
_unknown = '--unknown--'

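def search_function(encoding):
    """ Look up the codec for encoding and return its registry entry.

        Returns the (encoder, decoder, stream_reader, stream_writer)
        tuple obtained from the codec module's getregentry(), or None
        if no matching codec module can be imported. Results,
        including failed lookups, are cached.

    """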
    # Cache lookup
    entry = _cache.get(encoding,_unknown)
    if entry is not _unknown:
        return entry

    # Import the module
    modname = encoding.replace('-', '_')
    modname = aliases.aliases.get(modname,modname)
    try:
        mod = __import__(modname,globals(),locals(),'*')
    except ImportError:
        # Remember the failed lookup so the import is not retried
        _cache[encoding] = None
        return None
    
    # Now ask the module for the registry entry
    try:
        entry = tuple(mod.getregentry())
    except AttributeError:
        entry = ()
    if len(entry) != 4:
        raise SystemError('module "%s.%s" failed to register' %
                          (__name__,modname))
    for obj in entry:
        if not callable(obj):
            raise SystemError('incompatible codecs in module "%s.%s"' %
                              (__name__,modname))

    # Cache the encoding and its aliases
    _cache[encoding] = entry
    try:
        codecaliases = mod.getaliases()
    except AttributeError:
        pass
    else:
        for alias in codecaliases:
            _cache[alias] = entry
    return entry

# Register the search_function in the Python codec registry
codecs.register(search_function)

Blame history (each commit is listed once; the code itself appears above):

2000-03-10 23:17:24 +00:00
    Marc-Andre Lemburg: Unicode encodings.
    Lines: all lines not attributed to a later commit below.

2000-03-20 16:36:48 +00:00
    On 17-Mar-2000, Marc-Andre Lemburg said: Attached you find an update of the Unicode implementation. The patch is against the current CVS version. I would appreciate if someone with CVS checkin permissions could check the changes in. The patch contains all bugs and patches sent this week and also fixes a leak in the codecs code and a bug in the free list code for Unicode objects (which only shows up when compiling Python with Py_DEBUG; thanks to MarkH for spotting this one).
    Lines: `_unknown = '--unknown--'`, `entry = _cache.get(encoding,_unknown)`, `if entry is not _unknown:`

2000-04-05 20:11:21 +00:00
    Marc-Andre's third try at this bulk patch seems to work (except that his copy of test_contains.py seems to be broken -- the lines he deleted were already absent).

    Checkin messages:

    New Unicode support for int(), float(), complex() and long().
    - new APIs PyInt_FromUnicode() and PyLong_FromUnicode()
    - added support for Unicode to PyFloat_FromString()
    - new encoding API PyUnicode_EncodeDecimal() which converts Unicode to a decimal char* string (used in the above new APIs)
    - shortcuts for calls like int(<int object>) and float(<float obj>)
    - tests for all of the above

    Unicode compares and contains checks:
    - comparing Unicode and non-string types now works; TypeErrors are masked, all other errors such as ValueError during Unicode coercion are passed through (note that PyUnicode_Compare does not implement the masking -- PyObject_Compare does this)
    - contains now works for non-string types too; TypeErrors are masked and 0 returned; all other errors are passed through

    Better testing support for the standard codecs.

    Misc minor enhancements, such as an alias dbcs for the mbcs codec.

    Changes:
    - PyLong_FromString() now applies the same error checks as does PyInt_FromString(): trailing garbage is reported as error and not longer silently ignored. The only characters which may be trailing the digits are 'L' and 'l' -- these are still silently ignored.
    - string.ato?() now directly interface to int(), long() and float(). The error strings are now a little different, but the type still remains the same. These functions are now ready to get declared obsolete ;-)
    - PyNumber_Int() now also does a check for embedded NULL chars in the input string; PyNumber_Long() already did this (and still does)

    Followed by:

    Looks like I've gone a step too far there... (and test_contains.py seem to have a bug too). I've changed back to reporting all errors in PyUnicode_Contains() and added a few more test cases to test_contains.py (plus corrected the join() NameError).

    Lines: `encoding names with hyphens mapped to underscores, e.g. 'utf-8' is`, `implemented by the module 'utf_8.py'.`

2000-06-13 12:04:05 +00:00
    Marc-Andre Lemburg <mal@lemburg.com>: Removed import of string module -- use string methods directly. Thanks to Finn Bock.
    Lines: `import codecs,aliases`, `modname = encoding.replace('-', '_')`