From 7cba23164cf82f6619db002cd30021b5dfb1f809 Mon Sep 17 00:00:00 2001 From: Jack DeVries <58614260+jdevries3133@users.noreply.github.com> Date: Tue, 24 Aug 2021 13:01:41 -0400 Subject: [PATCH] bpo-39452: Rewrite and expand __main__.rst (#26883) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Broadened scope of the document to explicitly discuss and differentiate between ``__main__.py`` in packages versus the ``__name__ == '__main__'`` expression (and the idioms that surround it), as well as ``import __main__``. Co-authored-by: Géry Ogam Co-authored-by: Éric Araujo Co-authored-by: Łukasz Langa --- Doc/library/__main__.rst | 377 +++++++++++++++++- Doc/reference/import.rst | 2 + Doc/tools/susp-ignored.csv | 1 + Doc/tutorial/modules.rst | 2 + .../2021-06-23-15-21-36.bpo-39452.o_I-6d.rst | 4 + 5 files changed, 369 insertions(+), 17 deletions(-) create mode 100644 Misc/NEWS.d/next/Documentation/2021-06-23-15-21-36.bpo-39452.o_I-6d.rst diff --git a/Doc/library/__main__.rst b/Doc/library/__main__.rst index a64faf1bbe3..116a9a9d1d7 100644 --- a/Doc/library/__main__.rst +++ b/Doc/library/__main__.rst @@ -1,25 +1,368 @@ - -:mod:`__main__` --- Top-level script environment -================================================ +:mod:`__main__` --- Top-level code environment +============================================== .. module:: __main__ - :synopsis: The environment where the top-level script is run. + :synopsis: The environment where top-level code is run. Covers command-line + interfaces, import-time behavior, and ``__name__ == '__main__'``. -------------- -``'__main__'`` is the name of the scope in which top-level code executes. -A module's __name__ is set equal to ``'__main__'`` when read from -standard input, a script, or from an interactive prompt. +In Python, the special name ``__main__`` is used for two important constructs: -A module can discover whether or not it is running in the main scope by -checking its own ``__name__``, which allows a common idiom for conditionally -executing code in a module when it is run as a script or with ``python --m`` but not when it is imported:: +1. the name of the top-level environment of the program, which can be + checked using the ``__name__ == '__main__'`` expression; and +2. the ``__main__.py`` file in Python packages. - if __name__ == "__main__": - # execute only if run as a script - main() +Both of these mechanisms are related to Python modules; how users interact with +them and how they interact with each other. They are explained in detail +below. If you're new to Python modules, see the tutorial section +:ref:`tut-modules` for an introduction. -For a package, the same effect can be achieved by including a -``__main__.py`` module, the contents of which will be executed when the -module is run with ``-m``. + +.. _name_equals_main: + +``__name__ == '__main__'`` +--------------------------- + +When a Python module or package is imported, ``__name__`` is set to the +module's name. Usually, this is the name of the Python file itself without the +``.py`` extension:: + + >>> import configparser + >>> configparser.__name__ + 'configparser' + +If the file is part of a package, ``__name__`` will also include the parent +package's path:: + + >>> from concurrent.futures import process + >>> process.__name__ + 'concurrent.futures.process' + +However, if the module is executed in the top-level code environment, +its ``__name__`` is set to the string ``'__main__'``. + +What is the "top-level code environment"? +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +``__main__`` is the name of the environment where top-level code is run. +"Top-level code" is the first user-specified Python module that starts running. +It's "top-level" because it imports all other modules that the program needs. +Sometimes "top-level code" is called an *entry point* to the application. + +The top-level code environment can be: + +* the scope of an interactive prompt:: + + >>> __name__ + '__main__' + +* the Python module passed to the Python interpreter as a file argument: + + .. code-block:: shell-session + + $ python3 helloworld.py + Hello, world! + +* the Python module or package passed to the Python interpreter with the + :option:`-m` argument: + + .. code-block:: shell-session + + $ python3 -m tarfile + usage: tarfile.py [-h] [-v] (...) + +* Python code read by the Python interpreter from standard input: + + .. code-block:: shell-session + + $ echo "import this" | python3 + The Zen of Python, by Tim Peters + + Beautiful is better than ugly. + Explicit is better than implicit. + ... + +* Python code passed to the Python interpreter with the :option:`-c` argument: + + .. code-block:: shell-session + + $ python3 -c "import this" + The Zen of Python, by Tim Peters + + Beautiful is better than ugly. + Explicit is better than implicit. + ... + +In each of these situations, the top-level module's ``__name__`` is set to +``'__main__'``. + +As a result, a module can discover whether or not it is running in the +top-level environment by checking its own ``__name__``, which allows a common +idiom for conditionally executing code when the module is not initialized from +an import statement:: + + if __name__ == '__main__': + # Execute when the module is not initialized from an import statement. + ... + +.. seealso:: + + For a more detailed look at how ``__name__`` is set in all situations, see + the tutorial section :ref:`tut-modules`. + + +Idiomatic Usage +^^^^^^^^^^^^^^^ + +Some modules contain code that is intended for script use only, like parsing +command-line arguments or fetching data from standard input. When a module +like this were to be imported from a different module, for example to unit test +it, the script code would unintentionally execute as well. + +This is where using the ``if __name__ == '__main__'`` code block comes in +handy. Code within this block won't run unless the module is executed in the +top-level environment. + +Putting as few statements as possible in the block below ``if __name___ == +'__main__'`` can improve code clarity and correctness. Most often, a function +named ``main`` encapsulates the program's primary behavior:: + + # echo.py + + import shlex + import sys + + def echo(phrase: str) -> None: + """A dummy wrapper around print.""" + # for demonstration purposes, you can imagine that there is some + # valuable and reusable logic inside this function + print(phrase) + + def main() -> int: + """Echo the input arguments to standard output""" + phrase = shlex.join(sys.argv) + echo(phrase) + return 0 + + if __name__ == '__main__': + sys.exit(main()) # next section explains the use of sys.exit + +Note that if the module didn't encapsulate code inside the ``main`` function +but instead put it directly within the ``if __name__ == '__main__'`` block, +the ``phrase`` variable would be global to the entire module. This is +error-prone as other functions within the module could be unintentionally using +the global variable instead of a local name. A ``main`` function solves this +problem. + +Using a ``main`` function has the added benefit of the ``echo`` function itself +being isolated and importable elsewhere. When ``echo.py`` is imported, the +``echo`` and ``main`` functions will be defined, but neither of them will be +called, because ``__name__ != '__main__'``. + + +Packaging Considerations +^^^^^^^^^^^^^^^^^^^^^^^^ + +``main`` functions are often used to create command-line tools by specifying +them as entry points for console scripts. When this is done, +`pip `_ inserts the function call into a template script, +where the return value of ``main`` is passed into :func:`sys.exit`. +For example:: + + sys.exit(main()) + +Since the call to ``main`` is wrapped in :func:`sys.exit`, the expectation is +that your function will return some value acceptable as an input to +:func:`sys.exit`; typically, an integer or ``None`` (which is implicitly +returned if your function does not have a return statement). + +By proactively following this convention ourselves, our module will have the +same behavior when run directly (i.e. ``python3 echo.py``) as it will have if +we later package it as a console script entry-point in a pip-installable +package. + +In particular, be careful about returning strings from your ``main`` function. +:func:`sys.exit` will interpret a string argument as a failure message, so +your program will have an exit code of ``1``, indicating failure, and the +string will be written to :data:`sys.stderr`. The ``echo.py`` example from +earlier exemplifies using the ``sys.exit(main())`` convention. + +.. seealso:: + + `Python Packaging User Guide `_ + contains a collection of tutorials and references on how to distribute and + install Python packages with modern tools. + + +``__main__.py`` in Python Packages +---------------------------------- + +If you are not familiar with Python packages, see section :ref:`tut-packages` +of the tutorial. Most commonly, the ``__main__.py`` file is used to provide +a command-line interface for a package. Consider the following hypothetical +package, "bandclass": + +.. code-block:: text + + bandclass + ├── __init__.py + ├── __main__.py + └── student.py + +``__main__.py`` will be executed when the package itself is invoked +directly from the command line using the :option:`-m` flag. For example: + +.. code-block:: shell-session + + $ python3 -m bandclass + +This command will cause ``__main__.py`` to run. How you utilize this mechanism +will depend on the nature of the package you are writing, but in this +hypothetical case, it might make sense to allow the teacher to search for +students:: + + # bandclass/__main__.py + + import sys + from .student import search_students + + student_name = sys.argv[2] if len(sys.argv) >= 2 else '' + print(f'Found student: {search_students(student_name)}') + +Note that ``from .student import search_students`` is an example of a relative +import. This import style must be used when referencing modules within a +package. For more details, see :ref:`intra-package-references` in the +:ref:`tut-modules` section of the tutorial. + +Idiomatic Usage +^^^^^^^^^^^^^^^ + +The contents of ``__main__.py`` typically isn't fenced with +``if __name__ == '__main__'`` blocks. Instead, those files are kept short, +functions to execute from other modules. Those other modules can then be +easily unit-tested and are properly reusable. + +If used, an ``if __name__ == '__main__'`` block will still work as expected +for a ``__main__.py`` file within a package, because its ``__name__`` +attribute will include the package's path if imported:: + + >>> import asyncio.__main__ + >>> asyncio.__main__.__name__ + 'asyncio.__main__' + +This won't work for ``__main__.py`` files in the root directory of a .zip file +though. Hence, for consistency, minimal ``__main__.py`` like the :mod:`venv` +one mentioned above are preferred. + +.. seealso:: + + See :mod:`venv` for an example of a package with a minimal ``__main__.py`` + in the standard library. It doesn't contain a ``if __name__ == '__main__'`` + block. You can invoke it with ``python3 -m venv [directory]``. + + See :mod:`runpy` for more details on the :option:`-m` flag to the + interpreter executable. + + See :mod:`zipapp` for how to run applications packaged as *.zip* files. In + this case Python looks for a ``__main__.py`` file in the root directory of + the archive. + + + +``import __main__`` +------------------- + +Regardless of which module a Python program was started with, other modules +running within that same program can import the top-level environment's scope +(:term:`namespace`) by importing the ``__main__`` module. This doesn't import +a ``__main__.py`` file but rather whichever module that received the special +name ``'__main__'``. + +Here is an example module that consumes the ``__main__`` namespace:: + + # namely.py + + import __main__ + + def did_user_define_their_name(): + return 'my_name' in dir(__main__) + + def print_user_name(): + if not did_user_define_their_name(): + raise ValueError('Define the variable `my_name`!') + + if '__file__' in dir(__main__): + print(__main__.my_name, "found in file", __main__.__file__) + else: + print(__main__.my_name) + +Example usage of this module could be as follows:: + + # start.py + + import sys + + from namely import print_user_name + + # my_name = "Dinsdale" + + def main(): + try: + print_user_name() + except ValueError as ve: + return str(ve) + + if __name__ == "__main__": + sys.exit(main()) + +Now, if we started our program, the result would look like this: + +.. code-block:: shell-session + + $ python3 start.py + Define the variable `my_name`! + +The exit code of the program would be 1, indicating an error. Uncommenting the +line with ``my_name = "Dinsdale"`` fixes the program and now it exits with +status code 0, indicating success: + +.. code-block:: shell-session + + $ python3 start.py + Dinsdale found in file /path/to/start.py + +Note that importing ``__main__`` doesn't cause any issues with unintentionally +running top-level code meant for script use which is put in the +``if __name__ == "__main__"`` block of the ``start`` module. Why does this work? + +Python inserts an empty ``__main__`` module in :attr:`sys.modules` at +interpreter startup, and populates it by running top-level code. In our example +this is the ``start`` module which runs line by line and imports ``namely``. +In turn, ``namely`` imports ``__main__`` (which is really ``start``). That's an +import cycle! Fortunately, since the partially populated ``__main__`` +module is present in :attr:`sys.modules`, Python passes that to ``namely``. +See :ref:`Special considerations for __main__ ` in the +import system's reference for details on how this works. + +The Python REPL is another example of a "top-level environment", so anything +defined in the REPL becomes part of the ``__main__`` scope:: + + >>> import namely + >>> namely.did_user_define_their_name() + False + >>> namely.print_user_name() + Traceback (most recent call last): + ... + ValueError: Define the variable `my_name`! + >>> my_name = 'Jabberwocky' + >>> namely.did_user_define_their_name() + True + >>> namely.print_user_name() + Jabberwocky + +Note that in this case the ``__main__`` scope doesn't contain a ``__file__`` +attribute as it's interactive. + +The ``__main__`` scope is used in the implementation of :mod:`pdb` and +:mod:`rlcompleter`. diff --git a/Doc/reference/import.rst b/Doc/reference/import.rst index 81a124f745a..39fcba015b6 100644 --- a/Doc/reference/import.rst +++ b/Doc/reference/import.rst @@ -975,6 +975,8 @@ should expose ``XXX.YYY.ZZZ`` as a usable expression, but .moduleY is not a valid expression. +.. _import-dunder-main: + Special considerations for __main__ =================================== diff --git a/Doc/tools/susp-ignored.csv b/Doc/tools/susp-ignored.csv index 211b49a5449..35bcbd04049 100644 --- a/Doc/tools/susp-ignored.csv +++ b/Doc/tools/susp-ignored.csv @@ -110,6 +110,7 @@ howto/pyporting,,::,Programming Language :: Python :: 3 howto/regex,,::, howto/regex,,:foo,(?:foo) howto/urllib2,,:password,"""joe:password@example.com""" +library/__main__,,`, library/ast,,:upper,lower:upper library/ast,,:step,lower:upper:step library/audioop,,:ipos,"# factor = audioop.findfactor(in_test[ipos*2:ipos*2+len(out_test)]," diff --git a/Doc/tutorial/modules.rst b/Doc/tutorial/modules.rst index af595e5ca04..a495c50cbde 100644 --- a/Doc/tutorial/modules.rst +++ b/Doc/tutorial/modules.rst @@ -533,6 +533,8 @@ importing module needs to use submodules with the same name from different packages. +.. _intra-package-references: + Intra-package References ------------------------ diff --git a/Misc/NEWS.d/next/Documentation/2021-06-23-15-21-36.bpo-39452.o_I-6d.rst b/Misc/NEWS.d/next/Documentation/2021-06-23-15-21-36.bpo-39452.o_I-6d.rst new file mode 100644 index 00000000000..5c8cbd8e652 --- /dev/null +++ b/Misc/NEWS.d/next/Documentation/2021-06-23-15-21-36.bpo-39452.o_I-6d.rst @@ -0,0 +1,4 @@ +Rewrote ``Doc/library/__main__.rst``. Broadened scope of the document to +explicitly discuss and differentiate between ``__main__.py`` in packages +versus the ``__name__ == '__main__'`` expression (and the idioms that +surround it).