- If you run the client with --run_test_app,
it runs "test_app" in the current directory and interacts with it
(and does nothing else).
It can suspend/resume the app with arbitrary timing;
this is controlled in run_test_app() (app_start.cpp).
- example app: add --critical_section option.
This lets you test the runtime system for apps that do
most of their work in a critical section (like GPU apps).
- Add some logging messages (conditioned by DEBUG_BOINC_API)
to the runtime system.
- boinc_finish() waits for the timer thread to write final messages;
make sure it doesn't do anything else
(like suspend the worker thread) during this period.
Old: if the timer thread gets a <suspend> message while we're in
a critical section, it sets a "suspend_request" flag.
The timer then periodically (10X/sec) checks whether
suspend_request is set and we're no longer in a critical section;
if so it suspends the worker thread.
Problem (pointed out by Oliver): this doesn't work if the worker thread
is almost always in a critical section
(as is the case for GPU apps, which treat GPU kernels as critical sections).
The app never gets suspended.
New:
1) boinc_end_critical_section() checks suspend_request;
if set, it calls suspend_activities()
2) On Unix, if suspend_activities() is called from the worker thread,
it calls sleep() in a loop until the suspension is over.
(Note: pthreads has no suspend/resume).
3) Add a mutex to protect the data structures shared between
the timer and worker threads.
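The deferred-suspend logic above can be sketched roughly as follows. This is a simplified model, not the actual runtime code: RuntimeState and the function names are hypothetical stand-ins, and "suspending" is reduced to setting a flag (the real code suspends the worker thread, or on Unix sleeps in a loop).

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-in for the state shared by the timer and worker threads.
struct RuntimeState {
    std::mutex mtx;                    // protects all fields below
    bool in_critical_section = false;
    bool suspend_request = false;      // suspend arrived during a critical section
    bool suspended = false;
    int suspend_count = 0;             // number of suspends actually performed
};

// Timer thread: called when a <suspend> message arrives.
void on_suspend_message(RuntimeState& s) {
    std::lock_guard<std::mutex> lock(s.mtx);
    if (s.in_critical_section) {
        s.suspend_request = true;      // defer until the critical section ends
    } else {
        s.suspended = true;
        ++s.suspend_count;
    }
}

// Worker thread: enter a critical section.
void begin_critical_section(RuntimeState& s) {
    std::lock_guard<std::mutex> lock(s.mtx);
    s.in_critical_section = true;
}

// Worker thread: leave a critical section and honor any deferred suspend.
void end_critical_section(RuntimeState& s) {
    std::lock_guard<std::mutex> lock(s.mtx);
    assert(s.in_critical_section);
    s.in_critical_section = false;
    if (s.suspend_request) {
        s.suspend_request = false;     // clear it so the suspend happens only once
        s.suspended = true;
        ++s.suspend_count;
    }
}
```

Because the worker itself checks the flag when it leaves the critical section, the suspend takes effect even if the worker re-enters a critical section almost immediately.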
Oliver pointed out that
* Move the windows_format_error_string function to win_util.cpp, .h instead of it being scattered between util.h and str_util.cpp.
* Convert the Windows error string into UTF8 before allowing it to be used by the caller
* Remove windows_error_string from library
- Win process control (affects API and wrapper):
Since Win doesn't have an API for process suspend/resume,
we were suspending processes by
1) enumerating all the threads in the system (typically several thousand)
2) suspending those belonging to the given process
The problem: for each thread, the code was calling a function
in diagnostics_win.cpp to see if the thread was exempted from suspension.
This check (which is unnecessary anyway if we're suspending another process)
was surrounded by a semaphore acquire/release.
The result: performance problems.
It could take a minute to suspend the threads.
Solution:
1) do the check for exemption only if we're suspending threads
in our own process (i.e. from the API)
2) if we're suspending multiple processes, enumerate the threads
only once, and see if each one belongs to any of the processes
3) have the wrapper elevate itself to normal priority.
Otherwise it can get preempted for long periods,
sometimes in the middle of scanning the threads.
Note: post-9x versions of Win have a process group API that includes suspend/resume.
We'll switch to this soon.
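A platform-neutral sketch of points 1) and 2) above, assuming the system-wide thread snapshot has already been taken. ThreadEntry, threads_to_suspend() and the exempt-thread set are hypothetical names for illustration; the real code uses the Windows thread enumeration calls and the exemption check in diagnostics_win.cpp.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical record for one entry of a system-wide thread snapshot.
struct ThreadEntry {
    int thread_id;
    int owner_pid;
};

// Walk the snapshot once and return the threads to suspend.
// The exemption check is applied only to threads of our own process.
std::vector<int> threads_to_suspend(
    const std::vector<ThreadEntry>& snapshot,
    const std::set<int>& target_pids,
    int own_pid,
    const std::set<int>& exempt_threads
) {
    assert(own_pid > 0);
    std::vector<int> out;
    for (const ThreadEntry& t : snapshot) {
        if (!target_pids.count(t.owner_pid)) continue;   // not one of ours
        if (t.owner_pid == own_pid && exempt_threads.count(t.thread_id)) {
            continue;                                    // exempt thread in our own process
        }
        out.push_back(t.thread_id);
    }
    return out;
}
```

The snapshot is scanned once regardless of how many processes are being suspended, and threads of other processes never reach the exemption check.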
Old: heartbeat mechanism
Problem: if the client is blocked for > 30 secs
(e.g. because it takes a long time to write the state file,
or because it's stopped in a debugger)
then apps exit.
This is bad if the app doesn't checkpoint and has been
running for a long time.
New: the client passes its PID to the app.
The app periodically (10 sec) checks that the process still exists.
Notes:
- For backward compatibility (e.g. new API w/ old client,
or vice versa) the client still sends heartbeats,
and the API checks heartbeats if the client doesn't pass a PID.
- The new mechanism works only if the client's PID isn't assigned
to a new process within 10 secs of the client exiting.
Windows 2000 reuses PIDs immediately, so check for Win2K
and don't use this mechanism if so.
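One possible way to do the existence check on POSIX systems is sketched below. This log does not say which call the API actually uses, so process_exists() is a hypothetical helper; the Windows side is not shown.

```cpp
#include <cassert>
#include <cerrno>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

// Returns true if a process with the given PID currently exists (POSIX only).
// Signal 0 performs the existence and permission checks without sending anything.
bool process_exists(pid_t pid) {
    assert(pid > 0);
    if (kill(pid, 0) == 0) return true;
    // EPERM means the process exists but we may not signal it;
    // ESRCH means there is no such process.
    return errno == EPERM;
}
```

The app would call this every 10 seconds with the PID passed by the client. The check cannot tell the client apart from an unrelated process that has been given the same PID, which is why PID reuse matters.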
TODO: For Unix multithread apps,
critical sections aren't currently being enforced.
Need to fix this by masking signals.
svn path=/trunk/boinc/; revision=26147
clear the suspend_request flag.
Otherwise we'll end up doing two suspends,
and on Win the app will be suspended forever.
svn path=/trunk/boinc/; revision=26143
Lets application specify a min checkpoint interval.
The actual min checkpoint interval is the max of this
and the user-specified pref for min disk interval.
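As a minimal sketch of the rule above (the function and parameter names are hypothetical, not the API's):

```cpp
#include <algorithm>
#include <cassert>

// Effective minimum checkpoint interval, in seconds: the larger of the
// app-specified minimum and the user's min disk interval preference.
double effective_checkpoint_interval(double app_min_secs, double pref_min_disk_secs) {
    assert(app_min_secs >= 0 && pref_min_disk_secs >= 0);
    return std::max(app_min_secs, pref_min_disk_secs);
}
```

So an app can lengthen the interval beyond the user's preference but cannot shorten it.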
svn path=/trunk/boinc/; revision=26005
lets an application report its network usage to BOINC,
and hence take it into account with monthly limits etc.
- API: get rid of deprecated boinc_ops_per_cpu_sec(),
boinc_ops_cumulative(), and
boinc_set_credit_claim();
- admin web: update manage_apps.php;
add the ability to set homogeneous app version
svn path=/trunk/boinc/; revision=25700
boinc_temporary_exit(),
explaining why the app is exiting.
Convey this to the client, and then to the Manager,
and display it there and in the log.
clientgui/
MainDocument.cpp
lib/
gui_rpc_client_ops.cpp
gui_rpc_client.h
api/
boinc_api.cpp,h
client/
client_types.cpp,h
app.h
app_control.cpp
svn path=/trunk/boinc/; revision=25315
core client. Next commit will create an extra "VM Console"
button in the manager when detected. Volunteers will just have
to click the button to see what is going on with the VM.
api/
boinc_api.cpp, .h
samples/vboxwrapper
vbox.cpp, .h
vboxwrapper.cpp, .h
svn path=/trunk/boinc/; revision=25035
allow applications to supply a "web graphics URL",
in which case the manager's "Show Graphics" button
opens a browser at that URL.
This would typically be used for applications that
implement a web server that serves pages showing
job information in HTML.
- vboxwrapper: if <pf_guest_port> is specified in the config file,
set up port forwarding to that port
and use the above API call with URL "http://localhost:port"
svn path=/trunk/boinc/; revision=24898
add a mechanism so that apps can report sub-processes
that are not descendants (e.g., virtual machines)
These processes are then counted as part of the app,
not as "non-BOINC CPU time".
This fixes a bug where processing was incorrectly suspended
because CPU usage by VM apps exceeded the "CPU usage limit" pref.
Implementation:
- the PIDs of the processes in question
are passed from app to client via shared-memory,
in the app_status channel.
A new variant of boinc_report_app_status() supports this.
- the VBox wrapper queries the PID of the VM,
and reports it in this way.
- procinfo_app() includes a new argument: a list of PIDs
that are part of the app, although not ancestrally
related to the main process.
- in the client, ACTIVE_TASK now includes a vector "other_pids".
If this is nonempty, it's passed to procinfo_app().
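The accounting idea can be sketched as follows. ProcRecord, descends_from() and app_cpu_secs() are hypothetical names for illustration; the real procinfo_app() signature and data structures are not reproduced here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical per-process record, keyed by PID in the map below.
struct ProcRecord {
    int parent_pid;
    double cpu_secs;
};

// True if pid is root itself or a descendant of root.
bool descends_from(const std::map<int, ProcRecord>& procs, int pid, int root) {
    // Walk up the parent chain; the step bound guards against cycles.
    for (std::size_t steps = 0; steps <= procs.size(); ++steps) {
        if (pid == root) return true;
        std::map<int, ProcRecord>::const_iterator it = procs.find(pid);
        if (it == procs.end()) return false;
        pid = it->second.parent_pid;
    }
    return false;
}

// CPU time charged to the app: the main process and its descendants,
// plus any explicitly reported PIDs that are not descendants.
double app_cpu_secs(
    const std::map<int, ProcRecord>& procs,
    int main_pid,
    const std::vector<int>& other_pids
) {
    assert(procs.count(main_pid) == 1);
    double total = 0;
    for (const auto& kv : procs) {
        bool reported = std::find(other_pids.begin(), other_pids.end(),
                                  kv.first) != other_pids.end();
        if (reported || descends_from(procs, kv.first, main_pid)) {
            total += kv.second.cpu_secs;
        }
    }
    return total;
}
```

With an empty other_pids list this reduces to the usual descendant-only accounting, so a VM process started outside the app's process tree would be counted as non-BOINC CPU time.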
svn path=/trunk/boinc/; revision=24123
proc_control: controlling processes
procinfo: enumerating and querying processes
run_app_windows: launching apps as other users on Win
svn path=/trunk/boinc/; revision=24120
could use the following for safe exit checking.
#ifdef _WIN32
// Jason: Safe exit check macro to play nicer with Cuda & MS-CRT
#ifdef USE_CUDA
#define SAFE_EXIT_CHECK do { \
    if (worker_thread_exit_request) { \
        fprintf(stderr, "-> Worker received exit request, syncing Cuda..."); \
        cudaThreadSynchronize(); \
        fprintf(stderr, "Done.\n"); \
        fprintf(stderr, " Worker Freeing Cuda data..."); \
        cudaAcc_free(); \
        fprintf(stderr, "Done.\n"); \
        fprintf(stderr, " Worker Acknowledging exit request, spinning->\n"); \
        worker_thread_exit_ack = true; \
        while (1) Sleep(10); \
    } \
} while (0)
#else
#define SAFE_EXIT_CHECK do { \
    if (worker_thread_exit_request) { \
        fprintf(stderr, " Worker Acknowledging exit request, spinning-> "); \
        worker_thread_exit_ack = true; \
        while (1) Sleep(10); \
    } \
} while (0)
#endif
#else
// Linux or other probably have their own safe exit handling;
// defined as blank, do nothing
#define SAFE_EXIT_CHECK
#endif
and install at the top of the cffft loop, and more locations if desired:
SAFE_EXIT_CHECK;
I'd like to implement these as BOINC API functions, but have not yet done so.
svn path=/trunk/boinc/; revision=23646
otherwise non-ASCII characters in client_state.xml
make it invalid XML
- client: fix (I think) to scheduling logic.
a job is preemptable if it's finished its time slice and
Old: has checkpointed in last 10 sec
New: has checkpointed since the end of the time slice
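The difference between the two rules, as a small sketch. JobTimes and both function names are hypothetical, and times are seconds on a common clock.

```cpp
#include <cassert>

// Hypothetical per-job timestamps, in seconds on a common clock.
struct JobTimes {
    bool slice_finished;     // job has used up its time slice
    double slice_end;        // when the time slice ended
    double last_checkpoint;  // when the job last checkpointed
};

// Old rule: slice finished and a checkpoint within the last 10 seconds.
bool preemptable_old(const JobTimes& j, double now) {
    assert(now >= j.last_checkpoint);
    return j.slice_finished && (now - j.last_checkpoint) <= 10.0;
}

// New rule: slice finished and a checkpoint at or after the end of the slice.
bool preemptable_new(const JobTimes& j) {
    return j.slice_finished && j.last_checkpoint >= j.slice_end;
}
```

For example, a job whose slice ended at t=100 and which checkpointed at t=150 is not preemptable at t=200 under the old rule, but is under the new one.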
svn path=/trunk/boinc/; revision=23551