Per-thread signal masking doesn't work in Android pre-4.1.
As a result, the SIGALRM signals used by the BOINC runtime system,
which are supposed to be handled by the worker thread,
sometimes are handled by the timer thread.
As a result, suspended apps never resume.
Workaround: in the SIGALRM handler, see if we're the timer thread.
If so, use pthread_kill() to send a SIGALRM to the work thread, and return.
Trying to fix bug where timer thread stops doing anything
after first suspend on Android (old, 1-core devices).
I suspect that the sleep() in the worker thread's signal handle
is sleeping the entire process.
Insert a sched_yield() before the sleep so that the time thread will run.
- If you run the client with --run_test_app,
runs "test_app" in the current directory and interacts with it
(and does nothing else).
It can suspend/resume it with arbitrary timing;
this is controlled in run_test_app() (app_start.cpp).
- example app: add --critical_section option.
This lets you test the runtime system for apps that do
most of their work in a critical section (like GPU apps).
- Add some logging messages (conditioned by DEBUG_BOINC_API)
to the runtime system.
- boinc_finish() waits for the timer thread to write final messages;
make sure it doesn't do anything else
(like suspend the worker thread) during this period
Old: if the timer thread gets a <suspend> message while we're in
a critical section, it sets a "suspend_request" flag.
The timer then periodically (10X/sec) checks whether
suspend_request is set and we're no longer in a critical section;
if so it suspends the worker thread.
Problem (pointed out by Oliver): this doesn't work if the worker thread
is almost always in a critical section
(as is the case for GPU apps, which treat GPU kernels as critical sections).
The app never gets suspended.
New:
1) boinc_end_critical_section() checks suspend_request;
if set, it calls suspend_activities()
2) On Unix, if suspend_activities() is called from the worker thread,
it calls sleep() in a loop until the suspension is over.
(Note: pthreads has no suspend/resume).
3) Add a mutex to protect the data structures shared between
the timer and worker threads.
Oliver pointed out that
* Move the windows_format_error_string function to win_util.cpp, .h instead of it being scattered between util.h and str_util.cpp.
* Convert the Windows error string into UTF8 before allowing it to be used by the caller
* Remove windows_error_string from library
- Win process control (affects API and wrapper):
Since Win doesn't have an API for process suspend/resume,
we were suspending processes by
1) enumerating all the threads in the system (typically several thousand)
2) suspending those belonging to the given process
The problem: for each thread, the code was calling a function
in diagnostics_win.cpp to see if the thread was exempted from suspension.
This check (which is unnecessary anyway if we're suspending another process)
was surrounded by a semaphore acquire/release.
The result: performance problems.
It could take a minute to suspend the threads.
Solution:
1) do the check for exemption only if we're suspending threads
in our own process (i.e. from the API)
2) if we're suspending multiple processes, enumerate the threads
only once, and see if each one belongs to any of the processes
3) have the wrapper elevate itself to normal priority.
Otherwise it can get preempted for long periods,
sometimes in the middle of scanning the threads.
Note: post-9x versions of Win have a process group API that includes suspend/resume.
We'll switch to this soon.
PRINCIPLE: AVOID PER-GPU-TYPE VARIABLES
- get rid of alloca() stuff in gutil.cpp; almost certainly not needed
- don't include malloc.h; it doesn't exist on BSD systems
Old: heartbeat mechanism
Problem: if the client is blocked for > 30 secs
(e.g. because it takes a long time to write the state file,
of because it's stopped in a debugger)
then apps exit.
This is bad is the app doesn't checkpoint and has been
running for a long time.
New: the client passes its PID to the app.
The app periodically (10 sec) checks that the process still exists.
Notes:
- For backward compatibility (e.g. new API w/ old client,
or vice versa) the client still sends heartbeats,
and the API checks heartbeats if the client doesn't pass a PID.
- The new mechanism works only if the client's PID isn't assigned
to a new process within 10 secs of the client exiting.
Windows 2000 reuses PIDs immediately, so check for Win2K
and don't use this mechanism if so.
TODO: For Unix multithread apps,
critical sections aren't currently being enforced.
Need to fix this by masking signals.
svn path=/trunk/boinc/; revision=26147
clear the suspend_request flag.
Otherwise we'll end up doing two suspends,
and on Win the app will be suspended forever.
svn path=/trunk/boinc/; revision=26143
Lets application specify a min checkpoint interval.
The actual min checkpoint interval is the max of this
and the user-specified pref for min disk interval.
svn path=/trunk/boinc/; revision=26005