490 Commits

Author SHA1 Message Date
Steven Robertson e8f3154cab Merge branch 'master' into complexAnsiblePythonInterpreterArg 2019-11-15 16:08:51 -08:00
David Wilson d6329f3446 Merge devel/290 @ 79b979ec8544ef5d8620c64068d4a42fabf50415 2019-11-02 16:46:59 +00:00
Steven Robertson 4669c8774f handles templating ansible_python_interpreter values 2019-10-30 13:43:48 -07:00
Steven Robertson cc11864b7d code cleanup 2019-10-29 17:51:27 -07:00
Steven Robertson 24b170311a able to get to 'sudo: source not found' after preventing escape of && so python connects 2019-10-29 14:44:49 -07:00
David Wilson be4f1bdb50 issue #646: add extra logging to assertions and start_child() 2019-09-11 19:58:20 +01:00
David Wilson efd82dd35a issue #633: various task_vars fixes
- take host_vars from task_vars too
- make missing task_vars a hard error
- update tests to provide stub task_vars
2019-08-20 14:47:33 +01:00
David Wilson fc09b81949 issue #633: handle meta: reset_connection when become is active
- don't create a new connection during reset if no existing connection
  exists
- strip off last hop in connection stack if PlayContext.become is True.
- log a debug message if reset cannot find an existing connection
2019-08-20 14:04:45 +01:00
David Wilson b6d1df749c issue #633: take inventory_hostname from task_vars
It used to be set by on_action_run() from task_vars, but this doesn't
work for meta: reset_connection. That meant MITOGEN_CPU_COUNT>1 would
pick the wrong mux to reset the connection on.
2019-08-20 13:59:01 +01:00
David Wilson f4cf67f0bd issue #615: remove meaningless test
It has been dead code since at least 2015
2019-08-17 12:56:21 +01:00
David Wilson e02be89879 issue #625: ignore SIGINT within MuxProcess
Without this, MuxProcess will start dying too early, before Ansible /
TaskQueueManager.cleanup() has a chance to wait on worker processes.
That would allow WorkerProcess to see ECONNREFUSED from the MuxProcess
socket much more easily.
2019-08-17 12:56:16 +01:00
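A minimal sketch of what ignoring SIGINT in a helper process looks like in Python, as described in e02be89879. The function name `ignore_sigint` is hypothetical and is not taken from the commit.

```python
import signal


def ignore_sigint():
    """Make the calling process ignore SIGINT (CTRL+C).

    Returns the previously installed handler so a caller can restore it.
    Must be called from the main thread, as required by signal.signal().
    """
    return signal.signal(signal.SIGINT, signal.SIG_IGN)
```

With SIGINT ignored, a process like this only exits when something else tells it to, which gives its parent time to wait on other workers first.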
David Wilson 67759371f9 issue #615: ensure 4GB max_message_size is configured for task workers.
This 4GB limit was already set for MuxProcess and inherited by all
descendants including the context running on the target host, but it was
not applied to the WorkerProcess router.

That explains why the error from the ticket is being raised by the
router within the WorkerProcess rather than the router on the original
target.
2019-08-17 03:19:32 +01:00
David Wilson 151b490890 issue #615: fetch_file() might be called with AnsibleUnicode. 2019-08-17 02:23:58 +01:00
David Wilson 03d2bc6c59 issue #615: redirect 'fetch' action to 'mitogen_fetch'. 2019-08-17 02:23:46 +01:00
David Wilson 52c8ed7715 issue #615: extricate slurp brainwrong from mitogen_fetch 2019-08-17 02:20:09 +01:00
David Wilson 069285a588 issue #615: ansible: import Ansible fetch.py action plug-in
From ansible/ansible#9773a1f2896a914d237cb9926e3b5cdc0f004d1a
2019-08-17 02:13:35 +01:00
David Wilson 8dfb3966df issue #558, #582: preserve remote tmpdir if caller did not supply one
The undocumented 'tmp' parameter controls whether _execute_module()
would delete anything on 2.3, so mimic that. This means
_execute_remote_stat() calls will not blow away the temp directory,
which broke the unarchive plugin.
2019-08-12 15:41:17 +01:00
David Wilson 3b63da670f Fix up another handful of LGTM errors. 2019-08-12 11:46:37 +01:00
David Wilson f78a5f08c6 issue #605: ansible: share a sem_t instead of a pthread_mutex_t
The previous version quite reliably causes worker deadlocks within 10
minutes running:

    # 100 times:
    - import_playbook: integration/async/runner_one_job.yml
    # 100 times:
    - import_playbook: integration/module_utils/adjacent_to_playbook.yml

via .ci/soak/mitogen.sh with PLAYBOOK= set to the above playbook.

Attaching to the worker with gdb reveals it in an instruction
immediately following a futex() call, which likely returned EINTR due to
attaching gdb. Examining the pthread_mutex_t state reveals it to be
completely unlocked.

pthread_mutex_t on Linux should have zero trouble living in shmem, so
it's not clear how this deadlock is happening. Meanwhile POSIX
semaphores are explicitly designed for cross-process use and have a
completely different internal implementation, so try those instead. 1
hour of soaking reveals no deadlock.

This is about avoiding managing a lockable temporary file on disk to
contain our counter, and somehow communicating a reference to it into
subprocesses (despite the subprocess module closing inherited fds, etc),
somehow deleting it reliably at exit, and somehow avoiding concurrent
Ansible runs stepping on the same file. For now ctypes is still less
pain.

A final possibility would be to abandon a shared counter and instead
pick a CPU based on the hash of e.g. the new child's process ID. That
would likely balance equally well, and might be worth exploring when
making this code work on BSD.
2019-08-10 23:40:36 +00:00
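The last alternative floated in f78a5f08c6, choosing a CPU from a hash of the child's process ID instead of a shared counter, could look roughly like the sketch below. The name `pick_cpu` is hypothetical, and plain modulo stands in for the hash; neither comes from the commit.

```python
import os


def pick_cpu(pid, cpu_count):
    """Map a process ID onto a CPU index in the range [0, cpu_count)."""
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    # Modulo is the simplest possible "hash" of an integer PID.
    return pid % cpu_count


# Example: choose a CPU for the current process, assuming 4 CPUs.
chosen = pick_cpu(os.getpid(), 4)
```

Because no state is shared between processes, this needs neither shared memory nor ctypes, at the cost of only statistically even balancing.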
David Wilson 5af6c9b26f issue #615: use FileService for target->controller file transfers 2019-08-10 00:37:17 +01:00
David Wilson 6f12980611 [linear2] merge fallout: re-enable _send_module_forwards(). 2019-08-04 20:43:14 +01:00
David Wilson 5298e87548 Split out and make readable more log messages across both packages 2019-08-04 14:41:47 +01:00
David Wilson 0f23a90d50 ansible: log affinity assignments 2019-08-04 14:37:59 +01:00
David Wilson 4f051a38a7 ansible: improve docstring 2019-08-04 12:14:48 +01:00
David Wilson 5811909c8d [linear2] simplify _listener_for_name() 2019-08-04 12:14:48 +01:00
David Wilson c68dbdd569 ansible: stop relying on SIGTERM to shut down service pool
It's no longer necessary, since connection attempts are no longer truly
blocking. When CTRL+C is hit in the top-level process, broker will begin
shutdown, which will cancel all pending connection attempts, causing
pool threads to wake. The pool can't block during shutdown anymore.
2019-08-04 12:14:48 +01:00
David Wilson f4ca926b21 ansible: cleanup various docstrings 2019-08-04 12:14:48 +01:00
David Wilson edde251d58 issue #549: ansible: reduce risk by capping RLIM_INFINITY 2019-08-03 21:40:57 +01:00
David Wilson d408caccf5 issue #573: guard against a forked top-level Ansible process
See comment.
2019-08-03 18:43:18 +01:00
David Wilson 3ceac2c9ed [linear2] simplify ClassicWorkerModel and fix repeat initialization
"self.initialized = False" slipped in a few days ago. On second thought,
that flag is not needed at all: simply rearrange ClassicWorkerModel to
have a regular constructor.

This hierarchy is still squishy, it needs more love. Remaining
MuxProcess class attributes should be eliminated.
2019-08-03 18:24:52 +01:00
David Wilson 395b03a77d issue #549: fix setrlimit() crash and hard-wire OS X default
OS X advertised unlimited, but really it means kern.maxfilesperproc.
2019-08-02 22:30:58 +01:00
David Wilson 33bceb6eb4 issue #602: recover task_vars for synchronize and meta: reset_connection 2019-08-02 04:05:34 +01:00
David Wilson 6b4bcf4fe0 ansible: remove cutpasted docstring 2019-08-02 04:05:28 +01:00
David Wilson 619f4dee07 [linear2] merge fallout: restore optimization from #491 / 7b129e857 2019-08-02 04:05:19 +01:00
David Wilson e4321f81a0 issue #600: /etc/environment may be non-ASCII in an unknown encoding 2019-08-01 12:12:18 +01:00
David Wilson 75d179e4b9 remove unused imports flagged by lgtm 2019-07-31 11:46:23 +01:00
David Wilson c80fddd487 [linear2]: merge fallout flagged by LGTM 2019-07-31 11:41:29 +01:00
David Wilson eeb7150f24 issue #549: increase open file limit automatically if possible
While it is not possible to catch every case where "open file limit
exceeded" can occur, we can at least increase the soft limit to the
available hard limit without any user effort.

Do this in the Ansible top-level process, even though we probably only
need it in the MuxProcess. There seems to be no reason this could hurt.
2019-07-31 04:20:04 +01:00
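Raising the soft open-file limit to the hard limit, as eeb7150f24 describes, can be sketched with the standard `resource` module. The function name and the `FALLBACK_NOFILE` value are assumptions for illustration. The fallback covers the case where the hard limit is reported as unlimited, which the later #549 commits also deal with.

```python
import resource

# Hypothetical cap used when the hard limit is reported as unlimited.
# The value is an assumption for illustration, not taken from the commits.
FALLBACK_NOFILE = 10240


def raise_open_file_limit(fallback=FALLBACK_NOFILE):
    """Try to raise the soft RLIMIT_NOFILE towards the hard limit.

    Returns the (soft, hard) pair in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # An "unlimited" hard limit may not be accepted as a soft limit,
    # so substitute a finite target in that case.
    target = fallback if hard == resource.RLIM_INFINITY else hard
    if soft != resource.RLIM_INFINITY and soft < target:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
        except (ValueError, OSError):
            # Keep the existing limit if the kernel refuses the new one.
            pass
    return resource.getrlimit(resource.RLIMIT_NOFILE)
```

Only the soft limit changes, so no extra privileges are needed, and a refused request leaves the process exactly as it was.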
David Wilson acab26d796 ansible: improve process.py docs 2019-07-31 04:09:48 +01:00
David Wilson 4dfbe82e76 tests: hide ugly error during Ansible tests 2019-07-31 01:50:37 +01:00
David Wilson 108015aa22 ansible: gracefully handle failure to connect to MuxProcess
It's possible to hit an ugly exception during early CTRL+C
2019-07-31 01:50:37 +01:00
David Wilson bf1f3682aa ansible: pin per-CPU muxes to their corresponding CPU
This slightly breaks the old scheme, in that CPU 1 may now end up with a
mux and the top-level process pinned to it.
2019-07-31 01:50:37 +01:00
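For readers unfamiliar with CPU pinning as used in bf1f3682aa, here is a minimal sketch using `os.sched_setaffinity`, which is only available on some platforms (notably Linux). The helper name `pin_to_cpu` is hypothetical, and the commit itself may use a different mechanism.

```python
import os


def pin_to_cpu(cpu):
    """Restrict the calling process to a single CPU.

    Returns False without doing anything on platforms that do not
    provide os.sched_setaffinity().
    """
    if not hasattr(os, "sched_setaffinity"):
        return False
    # PID 0 means "the calling process".
    os.sched_setaffinity(0, {cpu})
    return True
```

The CPU index must be one the process is already allowed to run on, otherwise the call raises `OSError`.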
David Wilson dc9f4e89e6 ansible: reap mux processes on shut down
Previously we exited without calling waitpid(), which meant the
top-level process struct rusage did not reflect the resource usage
consumed by the multiplexer processes.

Existing benchmarks are made using perf so this never created a problem,
but it could be confusing to others using the "time" command. Reaping
also allows logging the final exit status of the process.
2019-07-31 01:50:37 +01:00
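The effect described in dc9f4e89e6 can be illustrated with the standard library: a child's resource usage is only folded into the parent's `RUSAGE_CHILDREN` totals once the child has been waited on. This is a sketch under that assumption; `run_and_reap` is a hypothetical helper, not code from the commit.

```python
import resource
import subprocess
import sys


def run_and_reap(argv):
    """Run a child process, reap it, and return (exit_status, children_usage)."""
    proc = subprocess.Popen(argv)
    # wait() reaps the child (via waitpid() on POSIX), which is what lets
    # the kernel account its usage to this process's children totals.
    status = proc.wait()
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    return status, usage


status, usage = run_and_reap([sys.executable, "-c", "pass"])
```

The returned status is also what makes it possible to log how the child exited.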
David Wilson 136dee1fb4 [linear2] more merge fallout, fix Connection._mitogen_reset(mode=) 2019-07-29 17:52:44 +01:00
David Wilson a9755d4ad0 [linear2] update mitogen_get_stack for new _build_stack() return value 2019-07-29 16:30:01 +01:00
David Wilson 1fca0b7a94 [linear2] fix MuxProcess test fixture and some merge fallout 2019-07-29 16:10:36 +01:00
David Wilson 0f63ca4c68 Make setting affinity optional. 2019-07-29 13:52:30 +01:00
David Wilson 9035884c77 ansible: abstract worker process model.
Move all details of broker/router setup out of connection.py, instead
deferring it to a WorkerModel class exported by process.py via
get_worker_model(). The running strategy can override the configured
worker model via _get_worker_model().

ClassicWorkerModel is installed by default, which implements the
extension's existing process model.

Add optional support for the third party setproctitle module, so
children have pretty names in ps output.

Add optional support for per-CPU multiplexers to classic runs.
2019-07-29 13:52:30 +01:00
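Optional support for a third-party module such as setproctitle, mentioned in 9035884c77, is commonly written as a guarded import. The sketch below shows that general pattern only; the wrapper name `set_process_title` is hypothetical and the commit's actual code may differ.

```python
try:
    import setproctitle  # third-party, may not be installed
except ImportError:
    setproctitle = None


def set_process_title(title):
    """Set the process title shown by ps, if setproctitle is available.

    Returns True if the title was set, False if the module is missing.
    """
    if setproctitle is None:
        return False
    setproctitle.setproctitle(title)
    return True
```

Callers can invoke the wrapper unconditionally, so the rest of the code never needs to know whether the module is installed.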
David Wilson 402dba4197 module_finder: pass raw file to compile()
Newer Ansibles have e.g. UTF-8 present in apt.py.
2019-07-23 16:04:44 +01:00
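As background to 402dba4197: Python's built-in `compile()` accepts source as raw bytes and works out the encoding itself (from a coding declaration, or UTF-8 by default on Python 3), so the caller does not have to guess how to decode the file. A minimal sketch, with a hypothetical helper name and file name:

```python
def compile_source_bytes(raw, filename):
    """Compile module source given as undecoded bytes."""
    # compile() detects the source encoding from the bytes themselves.
    return compile(raw, filename, "exec")


# Source containing a non-ASCII character, encoded as UTF-8.
raw_source = '# -*- coding: utf-8 -*-\nname = "caf\u00e9"\n'.encode("utf-8")
code = compile_source_bytes(raw_source, "example_module.py")

namespace = {}
exec(code, namespace)
```

Passing bytes avoids a decode step that could fail or corrupt text when the file is not pure ASCII.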
David Wilson 1aceacf89e [stream-refactor] replace old detach_popen() reference 2019-07-23 14:07:00 +01:00