Commit Graph

13 Commits

Author SHA1 Message Date
David Wilson d6329f3446 Merge devel/290 @ 79b979ec8544ef5d8620c64068d4a42fabf50415 2019-11-02 16:46:59 +00:00
David Wilson f78a5f08c6 issue #605: ansible: share a sem_t instead of a pthread_mutex_t
The previous version quite reliably causes worker deadlocks within 10
minutes running:

    # 100 times:
    - import_playbook: integration/async/runner_one_job.yml
    # 100 times:
    - import_playbook: integration/module_utils/adjacent_to_playbook.yml

via .ci/soak/mitogen.sh with PLAYBOOK= set to the above playbook.

Attaching to the worker with gdb reveals it in an instruction
immediately following a futex() call, which likely returned EINTR due to
attaching gdb. Examining the pthread_mutex_t state reveals it to be
completely unlocked.

pthread_mutex_t on Linux should have zero trouble living in shmem, so
it's not clear how this deadlock is happening. Meanwhile POSIX
semaphores are explicitly designed for cross-process use and have a
completely different internal implementation, so try those instead. 1
hour of soaking reveals no deadlock.

This is about avoiding managing a lockable temporary file on disk to
contain our counter, and somehow communicating a reference to it into
subprocesses (despite the subprocess module closing inherited fds, etc),
somehow deleting it reliably at exit, and somehow avoiding concurrent
Ansible runs stepping on the same file. For now ctypes is still less
pain.

A final possibility would be to abandon a shared counter and instead
pick a CPU based on the hash of e.g. the new child's process ID. That
would likely balance equally well, and might be worth exploring when
making this code work on BSD.
2019-08-10 23:40:36 +00:00
David Wilson 0f23a90d50 ansible: log affinity assignments 2019-08-04 14:37:59 +01:00
David Wilson bf1f3682aa ansible: pin per-CPU muxes to their corresponding CPU
This slightly breaks the old scheme, in that CPU 1 may now end up with a
mux and the top-level process pinned to it.
2019-07-31 01:50:37 +01:00
David Wilson 1aceacf89e [stream-refactor] replace old detach_popen() reference 2019-07-23 14:07:00 +01:00
David Wilson 3ff6123483 issue #557: support correct cpu_set_t size 2019-03-06 17:55:31 +00:00
David Wilson 8f9c67daf1 ansible: refactor affinity class and add abstract tests. 2019-02-14 00:35:16 +00:00
David Wilson 1f77d24bec Update copyright year everywhere. 2019-02-13 16:16:49 +00:00
David Wilson a2ae4ed696 SyntaxError. 2019-02-09 22:04:19 +00:00
David Wilson a9d48a8fdc ansible: don't pin controller if <4 cores. 2019-02-09 22:04:19 +00:00
David Wilson 4531338b12 ansible: document and make affinity stuff portable to non-Linux
Portable as in does nothing for the time at least for now.
2019-02-09 22:04:18 +00:00
David Wilson de5c050707 ansible: fix affinity.py test failure on 2 cores. 2019-02-09 22:04:18 +00:00
David Wilson c6d5aa29ba ansible: new multiplexer/workers configuration
Following on from 152effc26c9a5918cb7ead7a97fe7fa7f81b6764,

* Pin mux to CPU 0
* Pin top-level CPU 1
* Pin workers sequentially to CPU 2..n

Nets 19.5% improvement on issue_140__thread_pileup.yml when targetting
64 Docker containers on the same 8 core/16 thread machine.

Before (prior to last scheme, no affinity at all):

    2294528.731458      task-clock (msec)         #    6.443 CPUs utilized
        10,429,745      context-switches          #    0.005 M/sec
         2,049,618      cpu-migrations            #    0.893 K/sec
         8,258,952      page-faults               #    0.004 M/sec
 5,532,719,253,824      cycles                    #    2.411 GHz                      (83.35%)
 3,267,471,616,230      instructions              #    0.59  insn per cycle
                                                  #    1.22  stalled cycles per insn  (83.35%)
   662,006,455,943      branches                  #  288.515 M/sec                    (83.33%)
    39,453,895,977      branch-misses             #    5.96% of all branches          (83.37%)

     356.148064576 seconds time elapsed

After:

    2226463.958975      task-clock (msec)         #    7.784 CPUs utilized
         9,831,466      context-switches          #    0.004 M/sec
           180,065      cpu-migrations            #    0.081 K/sec
         5,082,278      page-faults               #    0.002 M/sec
 5,592,548,587,259      cycles                    #    2.512 GHz                      (83.35%)
 3,135,038,855,414      instructions              #    0.56  insn per cycle
                                                  #    1.32  stalled cycles per insn  (83.32%)
   636,397,509,232      branches                  #  285.833 M/sec                    (83.30%)
    39,135,441,790      branch-misses             #    6.15% of all branches          (83.35%)

     286.036681644 seconds time elapsed
2019-02-09 22:04:18 +00:00