Added safety features requested by Rom Walton:
* Change COPROC_ATI::get_available_ram and COPROC_NVIDIA::get_available_ram to static routines to prevent calling them without first loading CAL or CUDA libraries.
* Add tests for NULL library calls in these routines.
* Add comments warning about need to call from a separate child process on dual-GPU laptops, proper library initialization, etc.
Some dual-GPU laptops (e.g., Macbook Pro) don't power down the more powerful GPU until all applications which used them exit. To save battery life, the client launches a second instance of the client as a child process to detect and get info about the GPUs.
The child process writes the info to a temp file which our main client then reads.
This option is enabled at compile time by defining USE_CHILD_PROCESS_TO_DETECT_GPUS as non-zero in gpu_detect.cpp
Old: if the timer thread gets a <suspend> message while we're in
a critical section, it sets a "suspend_request" flag.
The timer then periodically (10X/sec) checks whether
suspend_request is set and we're no longer in a critical section;
if so it suspends the worker thread.
Problem (pointed out by Oliver): this doesn't work if the worker thread
is almost always in a critical section
(as is the case for GPU apps, which treat GPU kernels as critical sections).
The app never gets suspended.
New:
1) boinc_end_critical_section() checks suspend_request;
if set, it calls suspend_activities()
2) On Unix, if suspend_activities() is called from the worker thread,
it calls sleep() in a loop until the suspension is over.
(Note: pthreads has no suspend/resume).
3) Add a mutex to protect the data structures shared between
the timer and worker threads.
Oliver pointed out that
BOINC_QUERY_BATCHES now prints, for each queried batch,
a count of jobs followed by the jobs
BOINC_ABORT_JOBS takes a list of jobs, which may belong to different batches.
The handler for this looks up the batches and makes sure
the jobs belong to the user.
If the user typed an extremely long URL into the
Attach to Account Manager wizard, a buffer overrun could result.
There were several places in the code that assumed user-entered
URLs are small (e.g. 256 chars):
- canonicalize_master_url.cpp()
- several GUI RPC interfaces, when generating XML request message
- URL-escaping (not relevant here, but fix anyway)
Change all these to stay within buffers regardless of URL size.
Note: do this by truncation.
This will cause error messages like "can't connect to project"
rather than saying the URL is too long. That's OK.
- XML parser: for parse_string(), malloc the 256KB buffer instead of
allocating it on the stack; the latter crashes threads with 32KB stacks.
However, do the malloc() only if we've actually seen the start tag
(this required a bit of code shuffle).
- BOINC GAHP: escape spaces in error msgs
We want to track the product name (e.g. "HTC One X") of Android devices.
On Android, the API to get this is Java,
so we need to do it in the GUI rather than the client.
- Add product_name field to HOST_INFO
- Add a GUI RPC for passing this info from the GUI to the client.
- Store it in client_state.xml, so that the client knows it initially.
The product name is included in scheduler RPC requests, as part of <host_info>.
TODO: add server-side support for parsing it and storing in DB.
Also: move DEVICE_STATUS out of HOST_INFO; it didn't belong there.
if a project sends us <no_rsc_apps> flags for all processor types,
then by default the client will never do a scheduler RPC to that project again.
This could happen because of a transient condition in the project,
e.g. it deprecates all its app versions for a while.
To avoid this situation, the client now checks whether the no_rsc_apps flags
are set for all processor types.
If they are, it clears them all.
This will cause work fetch to use backoff,
and the client will occasionally contact the project.
Previously the client had (C++) code to
- check whether on AC or USB power
- get battery status and temperature
- check whether on wifi
These functions looked in various places under /sys.
Problem: the paths are system-dependent,
so whatever we do won't work on all devices.
The Android APIs for getting this info are in Java,
so we can't call them from the client.
Solution: have the GUI periodically get this info
and report it to the client via a GUI RPC.
The GUI must make this RPC periodically:
if the client doesn't get one within some period of time
(currently 30 sec) it suspends computing and network.
Also: if suspending jobs because of battery charge level
or temperature, leave them in memory.
Add OPENCL_DEVICE_PROP cpu_opencl_prop to HOST_INFO;
this store info about the host's ability to run CPU OpenCL apps.
Detect this, and report it in scheduler requests.