It turns out we have two different encodings of process priority:
1) specified in cc_config.xml and used by the client: 0 (low) to 4 (high)
2) specified in job.xml and used by the wrapper: 1 (low) to 5 (high).
This didn't cause any problems until recently when I added code
to pass the cc_config.xml info to the wrapper;
it was interpreting it on the 1-5 scale.
Fix: have the wrapper convert it (add one).
Also: I forgot to have the client actually put the priority
into the app_init_data.xml file.
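
The conversion described above can be sketched as follows. The function and constant names are hypothetical, and clamping out-of-range input is an assumption of this sketch, not something the commit states; only the two scales and the "add one" rule come from the message.

```cpp
// Client scale (cc_config.xml): 0 (low) .. 4 (high)
// Wrapper scale (job.xml):      1 (low) .. 5 (high)
// Names below are hypothetical, chosen for illustration only.
constexpr int CLIENT_PRIORITY_MIN = 0;
constexpr int CLIENT_PRIORITY_MAX = 4;

// Convert a client-scale priority to the wrapper scale by adding one.
// Out-of-range input is clamped first (an assumption of this sketch).
int client_to_wrapper_priority(int client_priority) {
    if (client_priority < CLIENT_PRIORITY_MIN) client_priority = CLIENT_PRIORITY_MIN;
    if (client_priority > CLIENT_PRIORITY_MAX) client_priority = CLIENT_PRIORITY_MAX;
    return client_priority + 1;
}
```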
To quote from comments in lib/proc_control.cpp:
// The only way to do this on Windows is to enumerate
// all the threads in the entire system,
// and identify those belonging to one of the processes (ugh!!)
//
// In the suspend case, this creates a potential synch problem:
// - CPU throttling sends suspend message
// - we enumerate threads
// - one of those threads creates a new thread T
// - we suspend the enumerated threads
//
// In this case, T will run, which is undesirable but not an error.
// But suppose that
// - the app uses a mutex,
// - at the start of the above sequence some thread holds the mutex
// - T immediately tries to acquire the mutex (and is suspended).
// Then when the client sends a resume message,
// T resumes and there are two threads in the mutex section. Error!
//
// There are a couple of solutions to this.
// 1) enumerate all the threads twice.
// 2) have suspend() make a record of the threads it suspends,
// and have resume() resume only these threads.
//
// 1) doubles the overhead, so I'm going with 2) for now.
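
A minimal sketch of the bookkeeping in option 2), not the code in lib/proc_control.cpp. The OS calls are replaced by a hypothetical ThreadOps struct so the logic can be read and run on any platform; on Windows the callbacks would wrap thread enumeration and per-thread suspend/resume.

```cpp
#include <functional>
#include <set>
#include <utility>
#include <vector>

// Hypothetical stand-in for the platform calls.
struct ThreadOps {
    std::function<std::vector<int>()> enumerate;  // thread IDs of the target process
    std::function<void(int)> suspend;             // suspend one thread
    std::function<void(int)> resume;              // resume one thread
};

class SuspendRecorder {
public:
    explicit SuspendRecorder(ThreadOps ops) : ops_(std::move(ops)) {}

    // Suspend every thread currently enumerated and remember its ID.
    // A thread already in the record is not suspended a second time.
    void suspend_all() {
        for (int tid : ops_.enumerate()) {
            if (suspended_.insert(tid).second) {
                ops_.suspend(tid);
            }
        }
    }

    // Resume only the recorded threads, then clear the record.
    // Threads created after suspend_all() are never touched here.
    void resume_all() {
        for (int tid : suspended_) {
            ops_.resume(tid);
        }
        suspended_.clear();
    }

    const std::set<int>& suspended() const { return suspended_; }

private:
    ThreadOps ops_;
    std::set<int> suspended_;
};
```

With this shape, a thread T created between enumeration and suspension is absent from the record, so resume_all() leaves it alone.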
Add optional <priority>N</priority> to the <task> element in job.xml.
Lets you specify the process priority of the task;
in particular, task can run at high priority.
Apparently this is needed to make bitcoin ASIC apps perform well.
- Win process control (affects API and wrapper):
Since Win doesn't have an API for process suspend/resume,
we were suspending processes by
1) enumerating all the threads in the system (typically several thousand)
2) suspending those belonging to the given process
The problem: for each thread, the code was calling a function
in diagnostics_win.cpp to see if the thread was exempted from suspension.
This check (which is unnecessary anyway if we're suspending another process)
was surrounded by a semaphore acquire/release.
The result: performance problems.
It could take a minute to suspend the threads.
Solution:
1) do the check for exemption only if we're suspending threads
in our own process (i.e. from the API)
2) if we're suspending multiple processes, enumerate the threads
only once, and see if each one belongs to any of the processes
3) have the wrapper elevate itself to normal priority.
Otherwise it can get preempted for long periods,
sometimes in the middle of scanning the threads.
Note: post-9x versions of Win have a process group API that includes suspend/resume.
We'll switch to this soon.
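
Point 2) of the solution, one pass over the thread list for several processes, can be sketched like this. ThreadEntry and threads_of_processes are hypothetical names; the snapshot is passed in as plain data, so the sketch does not depend on any Windows API.

```cpp
#include <set>
#include <vector>

// Hypothetical record for one entry of a system-wide thread snapshot.
struct ThreadEntry {
    int thread_id;
    int owner_pid;
};

// Scan the snapshot once and collect the threads owned by any of the
// given processes, instead of rescanning the snapshot per process.
std::vector<int> threads_of_processes(
    const std::vector<ThreadEntry>& snapshot,
    const std::set<int>& pids
) {
    std::vector<int> matches;
    for (const ThreadEntry& t : snapshot) {
        if (pids.count(t.owner_pid) != 0) {
            matches.push_back(t.thread_id);
        }
    }
    return matches;
}
```

Using a set for the PIDs keeps the per-thread membership test cheap even when the snapshot holds several thousand threads.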
Old: heartbeat mechanism
Problem: if the client is blocked for > 30 secs
(e.g. because it takes a long time to write the state file,
or because it's stopped in a debugger)
then apps exit.
This is bad if the app doesn't checkpoint and has been
running for a long time.
New: the client passes its PID to the app.
The app periodically (10 sec) checks that the process still exists.
Notes:
- For backward compatibility (e.g. new API w/ old client,
or vice versa) the client still sends heartbeats,
and the API checks heartbeats if the client doesn't pass a PID.
- The new mechanism works only if the client's PID isn't assigned
to a new process within 10 secs of the client exiting.
Windows 2000 reuses PIDs immediately, so check for Win2K
and don't use this mechanism if so.
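
A rough POSIX-only sketch of the "does the client process still exist" check. This is not the API's actual code, and process_exists is a hypothetical name; it relies on the standard behavior of kill() with signal 0, which performs the existence and permission checks without delivering a signal. The Windows side would need a different call and is not shown.

```cpp
#include <cerrno>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

// Return true if a process with the given PID currently exists.
bool process_exists(pid_t pid) {
    if (pid <= 0) {
        return false;  // 0 and negative values address process groups, not one process
    }
    if (kill(pid, 0) == 0) {
        return true;   // process exists and we may signal it
    }
    // EPERM: the process exists but we lack permission to signal it.
    // ESRCH: no such process.
    return errno == EPERM;
}
```

The app would call this with the PID passed by the client every 10 seconds or so, and exit once it returns false. As the notes say, the check is only meaningful if the PID is not reused within that interval.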
TODO: For Unix multithread apps,
critical sections aren't currently being enforced.
Need to fix this by masking signals.
svn path=/trunk/boinc/; revision=26147
proc_control: controlling processes
procinfo: enumerating and querying processes
run_app_windows: launching apps as other users on Win
svn path=/trunk/boinc/; revision=24120
building in a Unicode enabled environment.
NOTE: For files that are shared between the core client and
the manager, it was simpler to just call the ANSI versions
of the specific Windows API functions than to monkey with
all of the string handling code and convert between ANSI
and UCS-2 strings. CreateFile becomes CreateFileA instead
of the default of CreateFileW.
Down to 11 compile time errors from over 100.
clientgui/
BOINCBaseFrame.cpp
BOINCTaskBar.cpp
browser.cpp
browser.h
sg_StatImageLoader.cpp
lib/
boinc_win.h
diagnostics_win.cpp
filesys.cpp
gui_rpc_client_ops.cpp
proc_control.cpp
stackwalker_imports.h
stackwalker_win.cpp
str_util.cpp
util.cpp
win_util.cpp, .h
svn path=/trunk/boinc/; revision=17859