mirror of https://github.com/python/cpython.git
135 lines
6.7 KiB
Markdown
# The JIT

The [adaptive interpreter](interpreter.md) consists of a main loop that
executes the bytecode instructions generated by the
[bytecode compiler](compiler.md) and their
[specializations](interpreter.md#Specialization). Runtime optimization in
this interpreter can only be done for one instruction at a time. The JIT
is based on a mechanism to replace an entire sequence of bytecode instructions,
and this enables optimizations that span multiple instructions.

Historically, the adaptive interpreter was referred to as `tier 1` and
the JIT as `tier 2`. You will see remnants of this in the code.

## The Optimizer and Executors

The program begins running on the adaptive interpreter, until a `JUMP_BACKWARD`
instruction determines that it is "hot" because the counter in its
[inline cache](interpreter.md#inline-cache-entries) indicates that it
executed more than some threshold number of times (see
[`backoff_counter_triggers`](../Include/internal/pycore_backoff.h)).
It then calls the function `_PyOptimizer_Optimize()` in
[`Python/optimizer.c`](../Python/optimizer.c), passing it the current
[frame](frames.md) and instruction pointer. `_PyOptimizer_Optimize()`
constructs an object of type
[`_PyExecutorObject`](../Include/internal/pycore_optimizer.h) which implements
an optimized version of the instruction trace beginning at this jump.

The optimizer determines where the trace ends, and the executor is set up
to either return to the adaptive interpreter and resume execution, or
transfer control to another executor (see `_PyExitData` in
[`Include/internal/pycore_optimizer.h`](../Include/internal/pycore_optimizer.h)).

The executor is stored on the [`code object`](code_objects.md) of the frame,
in the `co_executors` field, which is an array of executors. The start
instruction of the trace (the `JUMP_BACKWARD`) is replaced by an
`ENTER_EXECUTOR` instruction whose `oparg` is equal to the index of the
executor in `co_executors`.
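
The flow described in this section can be sketched as a toy model in Python.
This is only an illustration, not the C implementation: the classes `Instr`
and `Code`, the helpers `optimize()` and `on_jump_backward()`, and the value of
`HOT_THRESHOLD` are all hypothetical stand-ins, and the real backoff counter is
more elaborate than a simple increment.

```python
from dataclasses import dataclass, field

JUMP_BACKWARD = "JUMP_BACKWARD"
ENTER_EXECUTOR = "ENTER_EXECUTOR"
HOT_THRESHOLD = 3  # hypothetical value; CPython uses a backoff counter


@dataclass
class Instr:
    opcode: str
    oparg: int = 0
    counter: int = 0  # stands in for the counter in the inline cache


@dataclass
class Code:
    instrs: list
    co_executors: list = field(default_factory=list)


def optimize(code, start):
    """Stand-in for _PyOptimizer_Optimize(): build an 'executor' for a trace."""
    return {"trace_start": start, "replaced": code.instrs[start]}


def on_jump_backward(code, index):
    """Model one execution of the JUMP_BACKWARD at position `index`."""
    instr = code.instrs[index]
    instr.counter += 1
    if instr.counter <= HOT_THRESHOLD:
        return False  # not hot yet: stay in the adaptive interpreter
    executor = optimize(code, index)
    code.co_executors.append(executor)
    # Replace the jump; oparg is the executor's index in co_executors.
    code.instrs[index] = Instr(ENTER_EXECUTOR, oparg=len(code.co_executors) - 1)
    return True


code = Code([Instr("LOAD_FAST"), Instr(JUMP_BACKWARD, oparg=1)])
results = [on_jump_backward(code, 1) for _ in range(4)]
```

In this model the first three executions of the jump only bump the counter,
and the fourth one creates the executor and patches the instruction stream.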

## The micro-op optimizer

The optimizer that `_PyOptimizer_Optimize()` runs is configurable via the
`_Py_SetTier2Optimizer()` function (this is used in tests via
`_testinternalcapi.set_optimizer()`).

The micro-op (abbreviated `uop` to approximate `μop`) optimizer is defined in
[`Python/optimizer.c`](../Python/optimizer.c) as the type `_PyUOpOptimizer_Type`.
It translates an instruction trace into a sequence of micro-ops by replacing
each bytecode by an equivalent sequence of micro-ops (see
`_PyOpcode_macro_expansion` in
[pycore_opcode_metadata.h](../Include/internal/pycore_opcode_metadata.h)
which is generated from [`Python/bytecodes.c`](../Python/bytecodes.c)).
The micro-op sequence is then optimized by
`_Py_uop_analyze_and_optimize` in
[`Python/optimizer_analysis.c`](../Python/optimizer_analysis.c)
and an instance of `_PyUOpExecutor_Type` is created to contain it.
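
As a rough illustration of the translation step, the sketch below expands a
short bytecode trace into micro-ops using a lookup table. The table
`MACRO_EXPANSION` and the function `translate()` are hypothetical; the uop
names are only illustrative and differ between CPython versions, and the real
table is `_PyOpcode_macro_expansion`. The optimization pass is not modelled.

```python
# Hypothetical expansion table: one bytecode maps to one or more uops.
MACRO_EXPANSION = {
    "LOAD_FAST": ["_LOAD_FAST"],
    "BINARY_OP_ADD_INT": ["_GUARD_BOTH_INT", "_BINARY_OP_ADD_INT"],
    "STORE_FAST": ["_STORE_FAST"],
}


def translate(trace):
    """Replace each (opcode, oparg) pair by its sequence of micro-ops."""
    uops = []
    for opcode, oparg in trace:
        for uop in MACRO_EXPANSION[opcode]:
            uops.append((uop, oparg))
    # Assumption of this sketch: every trace ends by leaving the executor.
    uops.append(("_EXIT_TRACE", 0))
    return uops


trace = [
    ("LOAD_FAST", 0),
    ("LOAD_FAST", 1),
    ("BINARY_OP_ADD_INT", 0),
    ("STORE_FAST", 2),
]
uops = translate(trace)
```

Note how the specialized instruction becomes a guard followed by the actual
operation. Having guards as separate uops is what lets a later pass reason
about them individually.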

## The JIT interpreter

After a `JUMP_BACKWARD` instruction invokes the uop optimizer to create a uop
executor, it transfers control to this executor via the `GOTO_TIER_TWO` macro.

CPython implements two executors. Here we describe the JIT interpreter,
which is the simpler of them and is therefore useful for debugging and analyzing
the uop generation and optimization stages. To run it, we configure the
JIT to run on its interpreter (i.e., Python is configured with
[`--enable-experimental-jit=interpreter`](https://docs.python.org/dev/using/configure.html#cmdoption-enable-experimental-jit)).

When invoked, the executor jumps to the `tier2_dispatch:` label in
[`Python/ceval.c`](../Python/ceval.c), where there is a loop that
executes the micro-ops. The body of this loop is a switch statement over
the uop IDs, resembling the one used in the adaptive interpreter.

The switch implementing the uops is in [`Python/executor_cases.c.h`](../Python/executor_cases.c.h),
which is generated by the build script
[`Tools/cases_generator/tier2_generator.py`](../Tools/cases_generator/tier2_generator.py)
from the bytecode definitions in
[`Python/bytecodes.c`](../Python/bytecodes.c).

When an `_EXIT_TRACE` or `_DEOPT` uop is reached, the uop interpreter exits
and execution returns to the adaptive interpreter.
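
The shape of such a dispatch loop can be sketched in Python as follows. This is
a simplified model under several assumptions: `run_uops()` is a hypothetical
function, the uop names are illustrative, an `if`/`elif` chain stands in for
the C switch statement, and a failed guard is modelled as leaving the trace
immediately instead of going through a `_DEOPT` uop.

```python
def run_uops(uops, local_vars):
    """Execute a uop trace; return (reason, index of the uop that exited)."""
    stack = []
    pc = 0
    while True:
        name, operand = uops[pc]
        pc += 1
        if name == "_LOAD_FAST":
            stack.append(local_vars[operand])
        elif name == "_GUARD_BOTH_INT":
            # Guard: leave the trace if the assumption does not hold.
            if type(stack[-1]) is not int or type(stack[-2]) is not int:
                return ("deopt", pc - 1)
        elif name == "_BINARY_OP_ADD_INT":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif name == "_STORE_FAST":
            local_vars[operand] = stack.pop()
        elif name == "_EXIT_TRACE":
            return ("exit", pc - 1)
        else:
            raise ValueError(f"unknown uop: {name}")


uops = [
    ("_LOAD_FAST", 0),
    ("_LOAD_FAST", 1),
    ("_GUARD_BOTH_INT", 0),
    ("_BINARY_OP_ADD_INT", 0),
    ("_STORE_FAST", 2),
    ("_EXIT_TRACE", 0),
]
frame_locals = [2, 3, None]
result = run_uops(uops, frame_locals)
```

With integer operands the trace runs to its `_EXIT_TRACE`. With a non-integer
operand the guard fails and control goes back to the caller, which in CPython
would be the adaptive interpreter.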

## Invalidating Executors

In addition to being stored on the code object, each executor is also
inserted into a list of all executors, which is stored in the interpreter
state's `executor_list_head` field. This list is used when it is necessary
to invalidate executors because values they used in their construction may
have changed.
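
The idea can be illustrated with a small Python model. Everything here is a
simplifying assumption made for the sketch: the classes `Executor` and
`InterpreterState`, the `dependencies` set, and the method names are
hypothetical, and a plain Python list stands in for the linked list headed by
`executor_list_head`. The way CPython actually records dependencies is not
shown.

```python
class Executor:
    def __init__(self, name, dependencies):
        self.name = name
        # Hypothetical: values the trace relied on when it was built.
        self.dependencies = set(dependencies)
        self.valid = True


class InterpreterState:
    def __init__(self):
        # A plain list stands in for the linked list of all executors.
        self.executor_list_head = []

    def register(self, executor):
        self.executor_list_head.append(executor)

    def invalidate_dependents(self, changed):
        """Mark every executor that relied on `changed` as invalid."""
        remaining = []
        for executor in self.executor_list_head:
            if changed in executor.dependencies:
                executor.valid = False
            else:
                remaining.append(executor)
        self.executor_list_head = remaining


interp = InterpreterState()
first = Executor("loop_in_f", {"globals_of_f"})
second = Executor("loop_in_g", {"globals_of_g"})
interp.register(first)
interp.register(second)
interp.invalidate_dependents("globals_of_f")
```

Keeping one list per interpreter means a change can be handled by walking all
executors, without having to find the code objects that own them.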

## The JIT

When the full JIT is enabled (Python was configured with
[`--enable-experimental-jit`](https://docs.python.org/dev/using/configure.html#cmdoption-enable-experimental-jit)),
the uop executor's `jit_code` field is populated with a pointer to a compiled
C function that implements the executor logic. This function's signature is
defined by `jit_func` in [`pycore_jit.h`](../Include/internal/pycore_jit.h).
When the executor is invoked by `ENTER_EXECUTOR`, instead of jumping to
the uop interpreter at `tier2_dispatch`, the executor runs the function
that `jit_code` points to. This function returns the instruction pointer
of the next Tier 1 instruction that needs to execute.

The generation of the jitted functions uses the copy-and-patch technique
which is described in
[Haoran Xu's article](https://sillycross.github.io/2023/05/12/2023-05-12/).
At its core are statically generated `stencils` for the implementation
of the micro-ops, which are completed with runtime information while
the jitted code is constructed for an executor by
[`_PyJIT_Compile`](../Python/jit.c).
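
The core of copy-and-patch can be shown with a toy example in Python. The
stencils below are made-up byte strings, not real machine code, and the names
`STENCILS` and `compile_trace()` are hypothetical. Each stencil is a
precomputed template with a "hole" at a known offset. Compilation copies the
template and then writes the runtime value into the hole.

```python
import struct

HOLE = b"\x00\x00\x00\x00"  # 4-byte placeholder filled in at runtime

# Hypothetical stencils, prepared ahead of time: (template bytes, hole offset).
STENCILS = {
    "_LOAD_FAST": (b"\x01" + HOLE, 1),
    "_STORE_FAST": (b"\x02" + HOLE, 1),
    "_EXIT_TRACE": (b"\xff" + HOLE, 1),
}


def compile_trace(uops):
    """Build 'jitted code' for a uop trace by copying and patching stencils."""
    code = bytearray()
    for name, operand in uops:
        template, hole_offset = STENCILS[name]
        base = len(code)
        code += template  # copy the stencil
        # Patch: write the operand into the hole as a little-endian uint32.
        struct.pack_into("<I", code, base + hole_offset, operand)
    return bytes(code)


jitted = compile_trace([("_LOAD_FAST", 3), ("_STORE_FAST", 7), ("_EXIT_TRACE", 0)])
```

No code generation logic runs per uop beyond a copy and a few writes, which is
why this approach keeps compilation cheap.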

The stencils are generated at build time under the Makefile target `regen-jit`
by the scripts in [`Tools/jit`](../Tools/jit). These scripts read
[`Python/executor_cases.c.h`](../Python/executor_cases.c.h) (which is
generated from [`Python/bytecodes.c`](../Python/bytecodes.c)). For
each opcode, they construct a `.c` file that contains a function for
implementing this opcode, with some runtime information injected.
This is done by replacing `CASE` with the bytecode definition in the
template file [`Tools/jit/template.c`](../Tools/jit/template.c).

Each of the `.c` files is compiled by LLVM, to produce an object file
that contains a function that executes the opcode. These compiled
functions are used to generate the file
[`jit_stencils.h`](../jit_stencils.h), which contains the functions
that the JIT can use to emit code for each of the bytecodes.

For Python maintainers this means that changes to the bytecodes and
their implementations do not require changes related to the stencils,
because everything is automatically generated from
[`Python/bytecodes.c`](../Python/bytecodes.c) at build time.

See Also:

* [Copy-and-Patch Compilation: A fast compilation algorithm for high-level languages and bytecode](https://arxiv.org/abs/2011.13127)

* [PyCon 2024: Building a JIT compiler for CPython](https://www.youtube.com/watch?v=kMO3Ju0QCDo)