cpython/Doc/library/fileinput.rst

171 lines
6.7 KiB
ReStructuredText
Raw Normal View History

2007-08-15 14:28:22 +00:00
:mod:`fileinput` --- Iterate over lines from multiple input streams
===================================================================
.. module:: fileinput
:synopsis: Loop over standard input or a list of files.
.. moduleauthor:: Guido van Rossum <guido@python.org>
.. sectionauthor:: Fred L. Drake, Jr. <fdrake@acm.org>
Merged revisions 58095-58132,58136-58148,58151-58197 via svnmerge from svn+ssh://pythondev@svn.python.org/python/trunk ........ r58096 | brett.cannon | 2007-09-10 23:38:27 +0200 (Mon, 10 Sep 2007) | 4 lines Fix a possible segfault from recursing too deep to get the repr of a list. Closes issue #1096. ........ r58097 | bill.janssen | 2007-09-10 23:51:02 +0200 (Mon, 10 Sep 2007) | 33 lines More work on SSL support. * Much expanded test suite: All protocols tested against all other protocols. All protocols tested with all certificate options. Tests for bad key and bad cert. Test of STARTTLS functionality. Test of RAND_* functions. * Fixes for threading/malloc bug. * Issue 1065 fixed: sslsocket class renamed to SSLSocket. sslerror class renamed to SSLError. Function "wrap_socket" now used to wrap an existing socket. * Issue 1583946 finally fixed: Support for subjectAltName added. Subject name now returned as proper DN list of RDNs. * SSLError exported from socket as "sslerror". * RAND_* functions properly exported from ssl.py. * Documentation improved: Example of how to create a self-signed certificate. Better indexing. ........ r58098 | guido.van.rossum | 2007-09-11 00:02:25 +0200 (Tue, 11 Sep 2007) | 9 lines Patch # 1140 (my code, approved by Effbot). Make sure the type of the return value of re.sub(x, y, z) is the type of y+x (i.e. unicode if either is unicode, str if they are both str) even if there are no substitutions or if x==z (which triggered various special cases in join_list()). Could be backported to 2.5; no need to port to 3.0. ........ r58099 | guido.van.rossum | 2007-09-11 00:36:02 +0200 (Tue, 11 Sep 2007) | 8 lines Patch # 1026 by Benjamin Aranguren (with Alex Martelli): Backport abc.py and isinstance/issubclass overloading to 2.6. I had to backport test_typechecks.py myself, and make one small change to abc.py to avoid duplicate work when x.__class__ and type(x) are the same. ........ r58100 | bill.janssen | 2007-09-11 01:41:24 +0200 (Tue, 11 Sep 2007) | 3 lines A better way of finding an open port to test with. ........ r58101 | bill.janssen | 2007-09-11 03:09:19 +0200 (Tue, 11 Sep 2007) | 4 lines Make sure test_ssl doesn't reference the ssl module in a context where it can't be imported. ........ r58102 | bill.janssen | 2007-09-11 04:42:07 +0200 (Tue, 11 Sep 2007) | 3 lines Fix some documentation bugs. ........ r58103 | nick.coghlan | 2007-09-11 16:01:18 +0200 (Tue, 11 Sep 2007) | 1 line Always use the -E flag when spawning subprocesses in test_cmd_line (Issue 1056) ........ r58106 | thomas.heller | 2007-09-11 21:17:48 +0200 (Tue, 11 Sep 2007) | 3 lines Disable some tests that fail on the 'ppc Debian unstable' buildbot to find out if they cause the segfault on the 'alpha Debian' machine. ........ r58108 | brett.cannon | 2007-09-11 23:02:28 +0200 (Tue, 11 Sep 2007) | 6 lines Generators had their throw() method allowing string exceptions. That's a no-no. Fixes issue #1147. Need to fix 2.5 to raise a proper warning if a string exception is passed in. ........ r58112 | georg.brandl | 2007-09-12 20:03:51 +0200 (Wed, 12 Sep 2007) | 3 lines New documentation page for the bdb module. (This doesn't need to be merged to Py3k.) ........ r58114 | georg.brandl | 2007-09-12 20:05:57 +0200 (Wed, 12 Sep 2007) | 2 lines Bug #1152: use non-deprecated name in example. ........ r58115 | georg.brandl | 2007-09-12 20:08:33 +0200 (Wed, 12 Sep 2007) | 2 lines Fix #1122: wrong return type documented for various _Size() functions. ........ r58117 | georg.brandl | 2007-09-12 20:10:56 +0200 (Wed, 12 Sep 2007) | 2 lines Fix #1139: PyFile_Encoding really is PyFile_SetEncoding. ........ r58119 | georg.brandl | 2007-09-12 20:29:18 +0200 (Wed, 12 Sep 2007) | 2 lines bug #1154: release memory allocated by "es" PyArg_ParseTuple format specifier. ........ r58121 | bill.janssen | 2007-09-12 20:52:05 +0200 (Wed, 12 Sep 2007) | 1 line root certificate for https://svn.python.org/, used in test_ssl ........ r58122 | georg.brandl | 2007-09-12 21:00:07 +0200 (Wed, 12 Sep 2007) | 3 lines Bug #1153: repr.repr() now doesn't require set and dictionary items to be orderable to properly represent them. ........ r58125 | georg.brandl | 2007-09-12 21:29:28 +0200 (Wed, 12 Sep 2007) | 4 lines #1120: put explicit version in the shebang lines of pydoc, idle and smtpd.py scripts that are installed by setup.py. That way, they work when only "make altinstall" is used. ........ r58139 | mark.summerfield | 2007-09-13 16:54:30 +0200 (Thu, 13 Sep 2007) | 9 lines Replaced variable o with obj in operator.rst because o is easy to confuse. Added a note about Python 3's collections.Mapping etc., above section that describes isMappingType() etc. Added xrefs between os, os.path, fileinput, and open(). ........ r58143 | facundo.batista | 2007-09-13 20:13:15 +0200 (Thu, 13 Sep 2007) | 7 lines Merged the decimal-branch (revisions 54886 to 58140). Decimal is now fully updated to the latests Decimal Specification (v1.66) and the latests test cases (v2.56). Thanks to Mark Dickinson for all his help during this process. ........ r58145 | facundo.batista | 2007-09-13 20:42:09 +0200 (Thu, 13 Sep 2007) | 7 lines Put the parameter watchexp back in (changed watchexp from an int to a bool). Also second argument to watchexp is now converted to Decimal, just as with all the other two-argument operations. Thanks Mark Dickinson. ........ r58147 | andrew.kuchling | 2007-09-14 00:49:34 +0200 (Fri, 14 Sep 2007) | 1 line Add various items ........ r58148 | andrew.kuchling | 2007-09-14 00:50:10 +0200 (Fri, 14 Sep 2007) | 1 line Make target unique ........ r58154 | facundo.batista | 2007-09-14 20:58:34 +0200 (Fri, 14 Sep 2007) | 3 lines Included the new functions, and new descriptions. ........ r58155 | thomas.heller | 2007-09-14 21:40:35 +0200 (Fri, 14 Sep 2007) | 2 lines ctypes.util.find_library uses dump(1) instead of objdump(1) on Solaris. Fixes issue #1777530; will backport to release25-maint. ........ r58159 | facundo.batista | 2007-09-14 23:29:52 +0200 (Fri, 14 Sep 2007) | 3 lines Some additions (examples and a bit on the tutorial). ........ r58160 | georg.brandl | 2007-09-15 18:53:36 +0200 (Sat, 15 Sep 2007) | 2 lines Remove bdb from the "undocumented modules" list. ........ r58164 | bill.janssen | 2007-09-17 00:06:00 +0200 (Mon, 17 Sep 2007) | 15 lines Add support for asyncore server-side SSL support. This requires adding the 'makefile' method to ssl.SSLSocket, and importing the requisite fakefile class from socket.py, and making the appropriate changes to it to make it use the SSL connection. Added sample HTTPS server to test_ssl.py, and test that uses it. Change SSL tests to use https://svn.python.org/, instead of www.sf.net and pop.gmail.com. Added utility function to ssl module, get_server_certificate, to wrap up the several things to be done to pull a certificate from a remote server. ........ r58173 | bill.janssen | 2007-09-17 01:16:46 +0200 (Mon, 17 Sep 2007) | 1 line use binary mode when reading files for testAsyncore to make Windows happy ........ r58175 | raymond.hettinger | 2007-09-17 02:55:00 +0200 (Mon, 17 Sep 2007) | 7 lines Sync-up named tuples with the latest version of the ASPN recipe. Allows optional commas in the field-name spec (help when named tuples are used in conjuction with sql queries). Adds the __fields__ attribute for introspection and to support conversion to dictionary form. Adds a __replace__() method similar to str.replace() but using a named field as a target. Clean-up spelling and presentation in doc-strings. ........ r58176 | brett.cannon | 2007-09-17 05:28:34 +0200 (Mon, 17 Sep 2007) | 5 lines Add a bunch of GIL release/acquire points in tp_print implementations and for PyObject_Print(). Closes issue #1164. ........ r58177 | sean.reifschneider | 2007-09-17 07:45:04 +0200 (Mon, 17 Sep 2007) | 2 lines issue1597011: Fix for bz2 module corner-case error due to error checking bug. ........ r58180 | facundo.batista | 2007-09-17 18:26:50 +0200 (Mon, 17 Sep 2007) | 3 lines Decimal is updated, :) ........ r58181 | facundo.batista | 2007-09-17 19:30:13 +0200 (Mon, 17 Sep 2007) | 5 lines The methods always return Decimal classes, even if they're executed through a subclass (thanks Mark Dickinson). Added a bit of testing for this. ........ r58183 | sean.reifschneider | 2007-09-17 22:53:21 +0200 (Mon, 17 Sep 2007) | 2 lines issue1082: Fixing platform and system for Vista. ........ r58185 | andrew.kuchling | 2007-09-18 03:36:16 +0200 (Tue, 18 Sep 2007) | 1 line Add item; sort properly ........ r58186 | raymond.hettinger | 2007-09-18 05:33:19 +0200 (Tue, 18 Sep 2007) | 1 line Handle corner cased on 0-tuples and 1-tuples. Add verbose option so people can see how it works. ........ r58192 | georg.brandl | 2007-09-18 09:24:40 +0200 (Tue, 18 Sep 2007) | 2 lines A bit of reordering, also show more subheadings in the lang ref index. ........ r58193 | facundo.batista | 2007-09-18 18:53:18 +0200 (Tue, 18 Sep 2007) | 4 lines Speed up of the various division operations (remainder, divide, divideint and divmod). Thanks Mark Dickinson. ........ r58197 | raymond.hettinger | 2007-09-19 00:18:02 +0200 (Wed, 19 Sep 2007) | 1 line Cleanup docs for NamedTuple. ........
2007-09-19 03:06:30 +00:00
This module implements a helper class and functions to quickly write a
loop over standard input or a list of files. If you just want to read or
write one file see :func:`open`.
2007-08-15 14:28:22 +00:00
The typical use is::
import fileinput
for line in fileinput.input():
process(line)
This iterates over the lines of all files listed in ``sys.argv[1:]``, defaulting
to ``sys.stdin`` if the list is empty. If a filename is ``'-'``, it is also
replaced by ``sys.stdin``. To specify an alternative list of filenames, pass it
as the first argument to :func:`.input`. A single file name is also allowed.
2007-08-15 14:28:22 +00:00
All files are opened in text mode by default, but you can override this by
specifying the *mode* parameter in the call to :func:`.input` or
2007-08-15 14:28:22 +00:00
:class:`FileInput()`. If an I/O error occurs during opening or reading a file,
:exc:`IOError` is raised.
If ``sys.stdin`` is used more than once, the second and further use will return
no lines, except perhaps for interactive use, or if it has been explicitly reset
(e.g. using ``sys.stdin.seek(0)``).
Empty files are opened and immediately closed; the only time their presence in
the list of filenames is noticeable at all is when the last file opened is
empty.
Lines are returned with any newlines intact, which means that the last line in
a file may not have one.
You can control how files are opened by providing an opening hook via the
*openhook* parameter to :func:`fileinput.input` or :class:`FileInput()`. The
hook must be a function that takes two arguments, *filename* and *mode*, and
returns an accordingly opened file-like object. Two useful hooks are already
provided by this module.
The following function is the primary interface of this module:
.. function:: input(files=None, inplace=False, backup='', bufsize=0, mode='r', openhook=None)
2007-08-15 14:28:22 +00:00
Create an instance of the :class:`FileInput` class. The instance will be used
as global state for the functions of this module, and is also returned to use
during iteration. The parameters to this function will be passed along to the
constructor of the :class:`FileInput` class.
The following functions use the global state created by :func:`fileinput.input`;
if there is no active state, :exc:`RuntimeError` is raised.
.. function:: filename()
Return the name of the file currently being read. Before the first line has
been read, returns ``None``.
.. function:: fileno()
Return the integer "file descriptor" for the current file. When no file is
opened (before the first line and between files), returns ``-1``.
.. function:: lineno()
Return the cumulative line number of the line that has just been read. Before
the first line has been read, returns ``0``. After the last line of the last
file has been read, returns the line number of that line.
.. function:: filelineno()
Return the line number in the current file. Before the first line has been
read, returns ``0``. After the last line of the last file has been read,
returns the line number of that line within the file.
.. function:: isfirstline()
Returns true if the line just read is the first line of its file, otherwise
returns false.
.. function:: isstdin()
Returns true if the last line was read from ``sys.stdin``, otherwise returns
false.
.. function:: nextfile()
Close the current file so that the next iteration will read the first line from
the next file (if any); lines not read from the file will not count towards the
cumulative line count. The filename is not changed until after the first line
of the next file has been read. Before the first line has been read, this
function has no effect; it cannot be used to skip the first file. After the
last line of the last file has been read, this function has no effect.
.. function:: close()
Close the sequence.
The class which implements the sequence behavior provided by the module is
available for subclassing as well:
.. class:: FileInput(files=None, inplace=False, backup='', bufsize=0, mode='r', openhook=None)
2007-08-15 14:28:22 +00:00
Class :class:`FileInput` is the implementation; its methods :meth:`filename`,
:meth:`fileno`, :meth:`lineno`, :meth:`filelineno`, :meth:`isfirstline`,
:meth:`isstdin`, :meth:`nextfile` and :meth:`close` correspond to the functions
of the same name in the module. In addition it has a :meth:`readline` method
which returns the next input line, and a :meth:`__getitem__` method which
implements the sequence behavior. The sequence must be accessed in strictly
sequential order; random access and :meth:`readline` cannot be mixed.
With *mode* you can specify which file mode will be passed to :func:`open`. It
must be one of ``'r'``, ``'rU'``, ``'U'`` and ``'rb'``.
The *openhook*, when given, must be a function that takes two arguments,
*filename* and *mode*, and returns an accordingly opened file-like object. You
cannot use *inplace* and *openhook* together.
**Optional in-place filtering:** if the keyword argument ``inplace=1`` is passed
to :func:`fileinput.input` or to the :class:`FileInput` constructor, the file is
moved to a backup file and standard output is directed to the input file (if a
file of the same name as the backup file already exists, it will be replaced
silently). This makes it possible to write a filter that rewrites its input
file in place. If the *backup* parameter is given (typically as
``backup='.<some extension>'``), it specifies the extension for the backup file,
and the backup file remains around; by default, the extension is ``'.bak'`` and
it is deleted when the output file is closed. In-place filtering is disabled
when standard input is read.
.. note::
2009-01-03 21:18:54 +00:00
The current implementation does not work for MS-DOS 8+3 filesystems.
2007-08-15 14:28:22 +00:00
The two following opening hooks are provided by this module:
2007-08-15 14:28:22 +00:00
.. function:: hook_compressed(filename, mode)
Transparently opens files compressed with gzip and bzip2 (recognized by the
extensions ``'.gz'`` and ``'.bz2'``) using the :mod:`gzip` and :mod:`bz2`
modules. If the filename extension is not ``'.gz'`` or ``'.bz2'``, the file is
opened normally (ie, using :func:`open` without any decompression).
Usage example: ``fi = fileinput.FileInput(openhook=fileinput.hook_compressed)``
.. function:: hook_encoded(encoding)
Returns a hook which opens each file with :func:`codecs.open`, using the given
*encoding* to read the file.
Usage example: ``fi =
fileinput.FileInput(openhook=fileinput.hook_encoded("iso-8859-1"))``