The JIT
The adaptive interpreter consists of a main loop that executes the bytecode instructions generated by the bytecode compiler and their specializations. Runtime optimization in this interpreter can only be done for one instruction at a time. The JIT is based on a mechanism to replace an entire sequence of bytecode instructions, and this enables optimizations that span multiple instructions.
Historically, the adaptive interpreter was referred to as tier 1 and the JIT as tier 2. You will see remnants of this in the code.
The Optimizer and Executors
The program begins running on the adaptive interpreter, until a JUMP_BACKWARD instruction determines that it is "hot" because the counter in its inline cache indicates that it executed more than some threshold number of times (see backoff_counter_triggers).
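The behaviour of such a trigger counter can be sketched in Python. This is an illustrative model only, under the assumption that the counter counts down and reloads with an exponentially growing threshold after each trigger; the class name and constants here are invented, not CPython's actual implementation:

```python
# Illustrative model of an exponential-backoff trigger counter.
# (Hypothetical names; the real counters are packed into inline caches.)
class BackoffCounter:
    def __init__(self, value=16, backoff=4):
        self.value = value      # countdown until the next trigger
        self.backoff = backoff  # log2 of the value to reload after a trigger

    def tick(self):
        """Count one event; return True when the threshold is hit."""
        if self.value == 0:
            # Trigger: reload with a larger threshold so that repeated
            # optimization attempts become progressively rarer.
            self.value = (1 << self.backoff) - 1
            self.backoff = min(self.backoff + 1, 12)
            return True
        self.value -= 1
        return False

counter = BackoffCounter()
triggers = sum(counter.tick() for _ in range(1000))
```

The exponential reload means a loop that repeatedly fails to produce a useful executor costs less and less over time.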
It then calls the function _PyOptimizer_Optimize() in Python/optimizer.c, passing it the current frame and instruction pointer. _PyOptimizer_Optimize() constructs an object of type _PyExecutorObject which implements an optimized version of the instruction trace beginning at this jump. The optimizer determines where the trace ends, and the executor is set up to either return to the adaptive interpreter and resume execution, or transfer control to another executor (see _PyExitData in Include/internal/pycore_optimizer.h).
The executor is stored on the code object of the frame, in the co_executors field, which is an array of executors. The start instruction of the trace (the JUMP_BACKWARD) is replaced by an ENTER_EXECUTOR instruction whose oparg is equal to the index of the executor in co_executors.
The micro-op optimizer
The optimizer that _PyOptimizer_Optimize() runs is configurable via the _Py_SetTier2Optimizer() function (this is used in tests via _testinternalcapi.set_optimizer()).
The micro-op (abbreviated uop, to approximate μop) optimizer is defined in Python/optimizer.c as the type _PyUOpOptimizer_Type. It translates an instruction trace into a sequence of micro-ops by replacing each bytecode by an equivalent sequence of micro-ops (see _PyOpcode_macro_expansion in pycore_opcode_metadata.h, which is generated from Python/bytecodes.c).
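The translation step can be pictured as table-driven expansion. This is an illustrative Python sketch: the table entries below are simplified examples, not the real generated table, and unknown opcodes are passed through unchanged for brevity:

```python
# Illustrative, hand-written expansion table: each bytecode maps to an
# equivalent sequence of micro-ops. (The real table is generated into
# pycore_opcode_metadata.h; these entries are simplified examples.)
MACRO_EXPANSION = {
    "BINARY_OP_ADD_INT": ["_GUARD_BOTH_INT", "_BINARY_OP_ADD_INT"],
    "LOAD_ATTR_INSTANCE_VALUE": [
        "_GUARD_TYPE_VERSION",
        "_LOAD_ATTR_INSTANCE_VALUE",
    ],
}

def translate(trace):
    """Expand a bytecode trace into a flat micro-op sequence."""
    uops = []
    for opname in trace:
        uops.extend(MACRO_EXPANSION.get(opname, [opname]))
    return uops

print(translate(["BINARY_OP_ADD_INT"]))
```

Note how the specialized bytecode splits into guard uops (which verify the assumptions the specialization made) followed by the actual operation; this separation is what later lets the optimizer remove redundant guards.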
The micro-op sequence is then optimized by _Py_uop_analyze_and_optimize in Python/optimizer_analysis.c, and an instance of _PyUOpExecutor_Type is created to contain it.
The JIT interpreter
After a JUMP_BACKWARD instruction invokes the uop optimizer to create a uop executor, it transfers control to this executor via the GOTO_TIER_TWO macro. CPython implements two executors. Here we describe the JIT interpreter, which is the simpler of them and is therefore useful for debugging and analyzing the uop generation and optimization stages. To run it, we configure the JIT to run on its interpreter (i.e., python is configured with --enable-experimental-jit=interpreter).
When invoked, the executor jumps to the tier2_dispatch: label in Python/ceval.c, where there is a loop that executes the micro-ops. The body of this loop is a switch statement over the uop IDs, resembling the one used in the adaptive interpreter. The switch implementing the uops is in Python/executor_cases.c.h, which is generated by the build script Tools/cases_generator/tier2_generator.py from the bytecode definitions in Python/bytecodes.c.
When an _EXIT_TRACE or _DEOPT uop is reached, the uop interpreter exits and execution returns to the adaptive interpreter.
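The shape of that dispatch loop can be sketched in Python. This is a toy model only: the real dispatch is a C switch in Python/executor_cases.c.h, and the _LOAD_CONST_42 micro-op below is invented for the sketch:

```python
def run_executor(uops, stack):
    """Toy model of the tier-2 dispatch loop: execute micro-ops until an
    exit uop hands control back to the adaptive interpreter."""
    pc = 0
    while True:
        op = uops[pc]
        if op == "_EXIT_TRACE":
            return "exit", stack      # resume tier 1 at the exit target
        if op == "_DEOPT":
            return "deopt", stack     # assumptions failed; fall back
        if op == "_LOAD_CONST_42":    # invented micro-op for this sketch
            stack.append(42)
        pc += 1

print(run_executor(["_LOAD_CONST_42", "_EXIT_TRACE"], []))
```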
Invalidating Executors
In addition to being stored on the code object, each executor is also inserted into a list of all executors, which is stored in the interpreter state's executor_list_head field. This list is used when it is necessary to invalidate executors because values they used in their construction may have changed.
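A minimal sketch of that invalidation walk, assuming each executor records the objects whose state its trace depended on (the class and field names here are invented stand-ins, not the C structs):

```python
class Executor:
    """Toy stand-in for _PyExecutorObject: records validity and the
    objects whose state the trace's construction assumed."""
    def __init__(self, dependencies):
        self.valid = True
        self.dependencies = set(dependencies)

def invalidate_dependents(executor_list, changed):
    """Walk the interpreter-wide executor list and invalidate every
    executor whose assumptions involved the changed object."""
    for ex in executor_list:
        if changed in ex.dependencies:
            ex.valid = False

executors = [Executor({"type_A"}), Executor({"type_B"})]
invalidate_dependents(executors, "type_A")  # e.g. type_A was mutated
```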
The JIT
When the full JIT is enabled (python was configured with --enable-experimental-jit), the uop executor's jit_code field is populated with a pointer to a compiled C function that implements the executor logic. This function's signature is defined by jit_func in pycore_jit.h.
When the executor is invoked by ENTER_EXECUTOR, instead of jumping to the uop interpreter at tier2_dispatch, the executor runs the function that jit_code points to. This function returns the instruction pointer of the next Tier 1 instruction that needs to execute.
The generation of the jitted functions uses the copy-and-patch technique, which is described in Haoran Xu's article. At its core are statically generated stencils for the implementation of the micro-ops, which are completed with runtime information while the jitted code is constructed for an executor by _PyJIT_Compile.
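Copy-and-patch itself can be illustrated in miniature. The Python sketch below uses a single hand-made stencil with one "hole"; real stencils are compiler-generated machine-code fragments with relocation records describing where each hole is:

```python
import struct

# A toy "stencil": machine-code bytes with an 8-byte hole where a
# runtime operand (e.g. the address of a cached object) gets patched in.
HOLE = b"\x00" * 8
STENCIL = b"\x48\xb8" + HOLE + b"\xc3"  # movabs rax, <imm64>; ret (x86-64)

def emit(stencil, value):
    """Copy the stencil and patch the hole with a runtime value."""
    code = bytearray(stencil)
    offset = stencil.index(HOLE)
    code[offset:offset + 8] = struct.pack("<Q", value)
    return bytes(code)

jitted = emit(STENCIL, 0xDEADBEEF)
```

Because copying and patching bytes is cheap compared to running a compiler, code generation at this stage is fast; all the expensive compilation happened at build time when the stencils were produced.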
The stencils are generated at build time under the Makefile target regen-jit by the scripts in Tools/jit. These scripts read Python/executor_cases.c.h (which is generated from Python/bytecodes.c). For each opcode, they construct a .c file that contains a function implementing this opcode, with some runtime information injected. This is done by replacing CASE with the bytecode definition in the template file Tools/jit/template.c.
Each of the .c files is compiled by LLVM to produce an object file that contains a function that executes the opcode. These compiled functions are used to generate the file jit_stencils.h, which contains the functions that the JIT can use to emit code for each of the bytecodes.
For Python maintainers this means that changes to the bytecodes and their implementations do not require changes related to the stencils, because everything is automatically generated from Python/bytecodes.c at build time.