Ansible Extension
=================

.. image:: images/ansible/cell_division.png
   :align: right

An experimental extension to `Ansible`_ is included that implements host
connections over Mitogen, replacing embedded shell invocations with pure-Python
equivalents invoked via highly efficient remote procedure calls tunnelled over
SSH. No changes are required to the target hosts.

The extension is not yet generally dependable; however, it already works well
enough for testing against real-world playbooks. `Bug reports`_ in this area
are very welcome: Ansible is a huge beast, and only significant testing will
prove the extension's soundness.

.. _Ansible: https://www.ansible.com/

.. _Bug reports: https://goo.gl/yLKZiJ


Overview
--------

You should **expect a 1.25x - 7x speedup** and a **CPU usage reduction of at
least 2x**, depending on network conditions, the specific modules executed, and
time spent by the target host already doing useful work. Mitogen cannot speed
up a module once it is executing; it can only ensure the module starts
executing as quickly as possible.

* **A single SSH connection is used for each target host**, in addition to one
  sudo invocation per distinct user account. Subsequent playbook steps always
  reuse the same connection. This is much better than SSH multiplexing combined
  with pipelining, as significant state can be maintained in RAM between steps,
  and the system logs aren't filled with spam from repeated SSH and sudo
  invocations.

* **A single Python interpreter is used** per host and sudo account combination
  for the duration of the run, avoiding the repeated cost of invoking multiple
  interpreters and recompiling imports, saving 300-800 ms for every playbook
  step.

* Remote interpreters reuse Mitogen's module import mechanism, caching uploaded
  dependencies between steps at the host and user account level. As a
  consequence, **bandwidth usage is consistently an order of magnitude lower**
  compared to SSH pipelining, and around 5x fewer frames are required to
  traverse the wire for a run to complete successfully.

* **No writes to the target host's filesystem occur**, unless explicitly
  triggered by a playbook step. By contrast, in all typical configurations
  stock Ansible repeatedly rewrites and extracts ZIP files to multiple
  temporary directories on the target host. Since the extension uses no
  temporary files, security issues relating to those files in cross-account
  scenarios are entirely avoided.

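As a rough illustration of the per-step saving quoted above, the following
sketch multiplies the 300-800 ms range by a step count. The function name and
the 100-step figure are illustrative assumptions; real savings depend on the
factors listed at the top of this section.

```python
def interpreter_startup_saving(steps, low_ms=300, high_ms=800):
    """Return (low, high) estimated seconds saved over `steps` playbook steps.

    The 300-800 ms defaults are the per-step range quoted in the text;
    the function itself is only an illustration.
    """
    return (steps * low_ms / 1000.0, steps * high_ms / 1000.0)


low, high = interpreter_startup_saving(100)
print("100 steps: roughly %.0f-%.0f seconds saved" % (low, high))
```

The estimate is an upper-level back-of-envelope figure only; it says nothing
about time spent inside the modules themselves.
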

Limitations
-----------

This is a proof of concept: issues below are exclusively due to code immaturity.

High Risk
~~~~~~~~~

* Connection establishment is single-threaded until more pressing issues are
  solved. To evaluate performance, target only one host. Targeting many hosts
  still works, but the first playbook step will run unnecessarily slowly.

* `Asynchronous Actions And Polling
  <http://docs.ansible.com/ansible/latest/playbooks_async.html>`_ has received
  minimal testing.

* Transfer of large (i.e. GB-sized) files using certain Ansible-internal APIs,
  such as those triggered via the ``copy`` module, will cause corresponding
  temporary memory and CPU spikes on both host and target machine, due to
  delivering the file as a single large message. If many machines are targeted
  with a large file, the host machine could easily exhaust available RAM. This
  will be fixed soon, as it's likely to be triggered by common playbook use
  cases.

* Situations may exist where the playbook's execution conditions are not
  respected; however, ``delegate_to``, ``connection: local``, ``become``,
  ``become_user``, and ``local_action`` have all been tested.

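The memory spike described for large transfers follows from holding the entire
file in one message. The sketch below is not Mitogen's code and not the planned
fix; it only contrasts a whole-file read with a chunked generator, using a
hypothetical ``CHUNK_SIZE`` and hypothetical function names, to show how
chunking bounds peak memory per transfer.

```python
import io

CHUNK_SIZE = 128 * 1024  # hypothetical chunk size, chosen for illustration


def read_as_single_message(fp):
    """Read the whole file at once: peak memory grows with file size."""
    return fp.read()


def iter_chunks(fp, chunk_size=CHUNK_SIZE):
    """Yield the file in fixed-size pieces: peak memory is about one chunk."""
    while True:
        chunk = fp.read(chunk_size)
        if not chunk:
            break
        yield chunk


# Stand-in for a large file; a real transfer would open the file on disk.
payload = b"x" * (CHUNK_SIZE * 3 + 10)
chunks = list(iter_chunks(io.BytesIO(payload)))
largest = max(len(c) for c in chunks)
print("chunks: %d, largest: %d bytes" % (len(chunks), largest))
```

With the single-message approach, sending one file to many targets multiplies
the whole file size by the number of in-flight copies, which is the RAM
exhaustion scenario mentioned above.
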

Medium Risk
~~~~~~~~~~~

* In some cases ``remote_tmp`` may not be respected.

* Interaction with modules employing special action plugins is minimally
  tested, except for the ``synchronize``, ``template`` and ``copy`` modules.

* For now only Python command modules work; however, almost all modules shipped
  with Ansible are Python-based.


Low Risk
~~~~~~~~

* Only UNIX machines running Python 2.x are supported; Windows will come later.

* Only the ``sudo`` become method is available; however, adding new methods is
  straightforward, and eventually at least ``su`` will be included.

* The only supported strategy is ``linear``, which is Ansible's default.

* Ansible defaults to requiring pseudo TTYs for most SSH invocations in order
  to allow it to handle ``sudo`` with ``requiretty`` enabled; however, it
  disables pseudo TTYs for certain commands where standard input is required or
  ``sudo`` is not in use. Mitogen does not require this, as it can simply call
  :py:func:`pty.openpty` from the SSH user account during ``sudo`` setup.

  A major downside to Ansible's default is that stdout and stderr of any
  resulting executed command are merged, with additional carriage return
  characters synthesized in the output by the TTY layer. Neither of these
  problems occurs with the Mitogen extension, and this difference in output
  may break some playbooks.

  A future version will emulate Ansible's behaviour, once it is clear precisely
  what that behaviour is supposed to be. See `Ansible#14377`_ for related
  discussion.

.. _Ansible#14377: https://github.com/ansible/ansible/issues/14377

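The stdout/stderr merging and carriage-return synthesis described above can be
reproduced locally with the standard library. This is a minimal sketch for
Linux-like systems, not code from Ansible or Mitogen; the helper names and the
child command are hypothetical, and only ``pty.openpty`` is taken from the
text.

```python
import os
import pty
import subprocess
import sys

# Child process that writes one line to stdout and one to stderr.
CHILD = [
    sys.executable, "-c",
    "import sys; sys.stdout.write('out\\n'); sys.stdout.flush(); "
    "sys.stderr.write('err\\n'); sys.stderr.flush()",
]


def run_with_pipes(argv):
    """Run argv with separate pipes: the two streams stay distinct."""
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    return proc.communicate()


def run_with_pty(argv):
    """Run argv with stdout and stderr attached to one pseudo TTY."""
    master, slave = pty.openpty()
    proc = subprocess.Popen(argv, stdout=slave, stderr=slave, close_fds=True)
    os.close(slave)  # keep only the child's copy of the slave end open
    chunks = []
    while True:
        try:
            data = os.read(master, 4096)
        except OSError:  # Linux reports EIO once the slave side is closed
            break
        if not data:
            break
        chunks.append(data)
    os.close(master)
    proc.wait()
    return b"".join(chunks)


out, err = run_with_pipes(CHILD)
merged = run_with_pty(CHILD)
print("pipes:", repr(out), repr(err))
print("pty:  ", repr(merged))
```

In the pipe case the caller can tell the streams apart; in the TTY case it
receives a single byte stream in which each newline has been expanded by the
terminal layer.
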

Behavioural Differences
-----------------------

* Ansible with SSH multiplexing enabled causes a string like ``Shared
  connection to host closed`` to appear in ``stderr`` output of every executed
  command. This never manifests with the Mitogen extension.

* Asynchronous jobs execute in a thread of the single target Python
  interpreter. In future this will be replaced with subprocesses, as it's
  likely some use cases spawn many asynchronous jobs.


Configuration
-------------

.. warning::

   Don't test the prototype in a live environment until this notice is removed.

1. Ensure the host machine is using Python 2.x for Ansible by verifying the
   output of ``ansible --version``. Ensure the ``python`` command starts a
   Python 2.x interpreter. If not, replace ``python`` with the correct
   command in steps 2 and 3.
2. Run ``python -m pip install -U git+https://github.com/dw/mitogen.git``
   **on the host machine only**.
3. Run ``python -c 'import ansible_mitogen as a; print a.__path__'`` to print
   the installed package path.
4. Add ``strategy_plugins = /path/to/../ansible_mitogen/plugins/strategy``,
   using the path from above, to the ``[defaults]`` section of ``ansible.cfg``.
5. Add ``strategy = mitogen`` to the ``[defaults]`` section of ``ansible.cfg``.
6. Cross your fingers and try it out.

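Steps 3 to 5 above can be sketched as a small helper that turns a package
directory into the two ``[defaults]`` entries. This is only an illustration:
the function names and the example install path are hypothetical, and in
practice the directory would come from ``ansible_mitogen.__path__`` as printed
in step 3.

```python
import os


def strategy_plugin_dir(package_dir):
    """Return the strategy plugin directory inside an ansible_mitogen package."""
    return os.path.join(package_dir, "plugins", "strategy")


def ansible_cfg_defaults(package_dir):
    """Build the ``[defaults]`` lines from steps 4 and 5 as text."""
    return "\n".join([
        "[defaults]",
        "strategy_plugins = " + strategy_plugin_dir(package_dir),
        "strategy = mitogen",
    ])


# Hypothetical install location, used here only so the example prints something.
print(ansible_cfg_defaults("/usr/lib/python2.7/site-packages/ansible_mitogen"))
```

The printed text is meant to be merged by hand into an existing
``ansible.cfg``; the sketch deliberately does not write any file.
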

Demo
----

Local VM connection
~~~~~~~~~~~~~~~~~~~

This demonstrates Mitogen vs. connection pipelining to a local VM, executing
the 100 simple repeated steps of ``run_hostname_100_times.yml`` from the
examples directory. Mitogen requires **43x less bandwidth and 4.25x less
time**.

.. image:: images/ansible/run_hostname_100_times.png

Kathmandu to Paris
~~~~~~~~~~~~~~~~~~

This is a full Django application playbook over a ~180 ms link between
Kathmandu and Paris. Aside from large pauses where the host performs useful
work, the high latency of this link means Mitogen only manages a 1.7x speedup.

Many early roundtrips are due to inefficiencies in Mitogen's importer that will
be fixed over time. The majority, however, comprising at least 10 seconds, are
due to idling while the host's previous result and next command are in flight
on the network.

The initial extension lays groundwork for exciting structural changes to the
execution model: a future version will tackle latency head-on by delegating
some control flow to the target host, melding the performance and scalability
benefits of pull-based operation with the management simplicity of push-based
operation.

.. image:: images/ansible/costapp.png


SSH Variables
-------------

This list will grow as more missing pieces are discovered.

* ansible_python_interpreter
* ansible_ssh_timeout
* ansible_host, ansible_ssh_host
* ansible_user, ansible_ssh_user
* ansible_port, ssh_port
* ansible_ssh_executable, ssh_executable
* password (default: assume passwordless)

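Several entries above list two names for the same setting. One way to picture
this is a lookup that tries each name in turn. The sketch below is an
illustration only, not the extension's code: the precedence (first listed name
wins), the helper name, the example values and the fallback port are all
assumptions, and only the variable names come from the list.

```python
# Variable aliases as listed above; order within each tuple is the listed order.
SSH_ALIASES = {
    "host": ("ansible_host", "ansible_ssh_host"),
    "user": ("ansible_user", "ansible_ssh_user"),
    "port": ("ansible_port", "ssh_port"),
    "ssh_executable": ("ansible_ssh_executable", "ssh_executable"),
}


def resolve(hostvars, setting, default=None):
    """Return the value of the first alias of `setting` found in `hostvars`.

    ASSUMPTION: the first listed name takes precedence; the documentation
    does not say which name wins when both are set.
    """
    for name in SSH_ALIASES[setting]:
        if name in hostvars:
            return hostvars[name]
    return default


# Hypothetical host variables for one inventory entry.
hostvars = {"ansible_ssh_host": "10.0.0.5", "ansible_user": "deploy"}
print(resolve(hostvars, "host"))
print(resolve(hostvars, "user"))
print(resolve(hostvars, "port", default=22))  # 22 is an assumed fallback
```
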

Sudo Variables
--------------

* ansible_python_interpreter
* ansible_sudo_exe, ansible_become_exe
* ansible_sudo_user, ansible_become_user (default: root)
* ansible_sudo_pass, ansible_become_pass (default: assume passwordless)

Unsupported:

* sudo_flags


Debugging
---------

See :ref:`logging-env-vars` in the Getting Started guide for environment
variables that activate debug logging.