Old: each scheduler process holds a semaphore
while scanning the shared-mem job array.
On machines with many CPUs
there seems to be contention for this semaphore,
causing slow scheduler response and possibly connection failures.
New: don't hold the semaphore while scanning the array.
Instead, if we find a job that passes quick_check(),
acquire the semaphore and recheck that the job is still present in the array
and still passes quick_check().
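
The scan-then-recheck scheme above can be sketched roughly as follows.
This is only an illustration, not the scheduler's actual code:
Job, JobArray, claim_job() and the version criterion are hypothetical,
and a std::mutex stands in for the semaphore.
Only the name quick_check() comes from the note above; its body here is made up.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical stand-in for one slot of the shared-memory job array.
struct Job {
    bool present = false;   // slot currently holds a job
    int  min_version = 0;   // made-up criterion used by quick_check()
};

// Hypothetical stand-in for the shared array plus its semaphore.
struct JobArray {
    std::vector<Job> slots;
    std::mutex sema;        // plays the role of the semaphore
};

// Cheap test of whether a job might be usable for this request.
// The real quick_check() presumably looks at more than this.
bool quick_check(const Job& j, int host_version) {
    return j.present && j.min_version <= host_version;
}

// Scan without the lock; take the lock only to confirm and claim a candidate.
// Returns the claimed slot index, or -1 if nothing suitable was found.
int claim_job(JobArray& a, int host_version) {
    for (std::size_t i = 0; i < a.slots.size(); i++) {
        // Unlocked read: may be stale, so it is only a hint.
        // (With real concurrent writers these fields would need to be
        // safe to read without the lock.)
        if (!quick_check(a.slots[i], host_version)) continue;

        std::lock_guard<std::mutex> lock(a.sema);
        // Recheck under the lock: another scheduler may have taken the job.
        if (!quick_check(a.slots[i], host_version)) continue;
        assert(a.slots[i].present);   // implied by quick_check()
        a.slots[i].present = false;   // claim it
        return static_cast<int>(i);
    }
    return -1;
}
```

The lock is held only for the short recheck-and-claim step,
so schedulers that find nothing suitable never touch it.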
- client: show messages if app_config.xml has unrecognized tags
- Win process control (affects API and wrapper):
Since Win doesn't have an API for process suspend/resume,
we were suspending processes by
1) enumerating all the threads in the system (typically several thousand)
2) suspending those belonging to the given process
The problem: for each thread, the code was calling a function
in diagnostics_win.cpp to see if the thread was exempted from suspension.
This check (which is unnecessary anyway if we're suspending another process)
was surrounded by a semaphore acquire/release.
The result: performance problems.
It could take a minute to suspend the threads.
Solution:
1) do the check for exemption only if we're suspending threads
in our own process (i.e. from the API)
2) if we're suspending multiple processes, enumerate the threads
only once, and see if each one belongs to any of the processes
3) have the wrapper elevate itself to normal priority.
Otherwise it can get preempted for long periods,
sometimes in the middle of scanning the threads.
Note: post-9x versions of Win have a process group API that includes suspend/resume.
We'll switch to this soon.
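
Steps 1 and 2 of the solution above amount to a single pass over the thread list.
The sketch below is portable and deliberately avoids the Win API:
ThreadInfo, for_each_target_thread() and the two callbacks are hypothetical
stand-ins for the thread snapshot, the exemption check and the suspend/resume call.

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <vector>

// Hypothetical record for one entry of a system-wide thread snapshot.
struct ThreadInfo {
    int thread_id;
    int owner_pid;
};

// Act on the threads of every process in `pids` using one pass over
// the snapshot. `is_exempt` is consulted only for threads of our own
// process (`self_pid`); `act` stands in for the real suspend/resume call.
// Returns the number of threads acted on.
int for_each_target_thread(
    const std::vector<ThreadInfo>& snapshot,
    const std::set<int>& pids,
    int self_pid,
    const std::function<bool(int)>& is_exempt,
    const std::function<void(int)>& act
) {
    assert(is_exempt && act);   // both callbacks must be set
    int n = 0;
    for (const ThreadInfo& t : snapshot) {
        // Skip threads that belong to none of the target processes.
        if (pids.count(t.owner_pid) == 0) continue;
        // The exemption check only matters inside our own process.
        if (t.owner_pid == self_pid && is_exempt(t.thread_id)) continue;
        act(t.thread_id);
        n++;
    }
    return n;
}
```

With several thousand threads in the system, this does one enumeration
regardless of how many processes are being suspended,
and skips the (locked) exemption check for threads of other processes.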
PRINCIPLE: AVOID PER-GPU-TYPE VARIABLES
- get rid of alloca() stuff in gutil.cpp; almost certainly not needed
- don't include malloc.h; it doesn't exist on BSD systems
If the user has set things up so that slots/ is a symlink
to a different filesystem, the client fails to move output files
from the slot dir to the project dir,
because rename() doesn't work across filesystems.
Solution: if rename() fails, try system("mv ..."),
since mv works across filesystems.
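
A minimal sketch of this fallback, assuming a POSIX shell and mv.
move_file() and shell_quote() are hypothetical names, not the client's actual functions,
and the quoting is an addition not mentioned in the note above.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <string>

// Quote a path for a POSIX shell: wrap it in single quotes and
// write any embedded single quote as '\''.
std::string shell_quote(const std::string& s) {
    std::string out = "'";
    for (char c : s) {
        if (c == '\'') out += "'\\''";
        else out += c;
    }
    out += "'";
    return out;
}

// Move `from` to `to`. Try rename() first; if it fails (for example
// with EXDEV when the paths are on different filesystems), fall back
// to running mv. Returns 0 on success, nonzero on failure.
int move_file(const std::string& from, const std::string& to) {
    assert(!from.empty() && !to.empty());
    if (std::rename(from.c_str(), to.c_str()) == 0) return 0;
    std::string cmd = "mv -- " + shell_quote(from) + " " + shell_quote(to);
    return std::system(cmd.c_str());
}
```

Trying rename() first keeps the common same-filesystem case cheap;
the mv process is spawned only when rename() has already failed.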