We were using an int bitmap to store flags for the instances of a coproc.
Furthermore, because of the use of 2^n-1 to generate a bitmap of 1s,
the limit on instances was 31.
Use a long long for the bitmap instead, and don't use 2^n-1.
This increases the limit to 64.
For now, handle AMD/ATI, NVIDIA or Intel GPUs as before. But for other, "new" vendors, we treat each device as a separate resource, creating an entry for each instance in the COPROCS::coprocs[] array and copying the device name COPROC::opencl_prop.name into the COPROC::type field (instead of the vendor name.)
For devices from "new" vendors, set <gpu_type> field in init_data.xml file to the vendor string supplied by OpenCL. This should allow boinc_get_opencl_ids() to work correctly with these "new" devices without modification.
- Show the OpenCL platform vendor for each OpenCL CPU description.
- OpenCL may not reliably report total RAM, available RAM and max FLOPS for CPUs, so exclude these from the OpenCL CPU descriptions; that information is available elsewhere.
Notes:
- The same CPU can have a different cpu_opencl_prop for each of multiple OpenCL platforms. We send them all to the project server because:
- Different OpenCL platforms report different values for the same CPU.
- Some OpenCL CPU apps may work better with certain OpenCL platforms.
- OpenCL has only 64 bits for global_mem_size, so it can report a max of only 4GB; get the CPU RAM size from gstate.hostinfo.m_nbytes.
Some dual-GPU laptops (e.g., Macbook Pro) don't power down the more powerful GPU until all applications which used them exit. To save battery life, the client launches a second instance of the client as a child process to detect and get info about the GPUs.
The child process writes the info to a temp file which our main client then reads.
This option is enabled at compile time by defining USE_CHILD_PROCESS_TO_DETECT_GPUS as non-zero in gpu_detect.cpp
- Win process control (affects API and wrapper):
Since Win doesn't have an API for process suspend/resume,
we were suspending processes by
1) enumerating all the threads in the system (typically several thousand)
2) suspending those belonging to the given process
The problem: for each thread, the code was calling a function
in diagnostics_win.cpp to see if the thread was exempted from suspension.
This check (which is unnecessary anyway if we're suspending another process)
was surrounded by a semaphore acquire/release.
The result: performance problems.
It could take a minute to suspend the threads.
Solution:
1) do the check for exemption only if we're suspending threads
in our own process (i.e. from the API)
2) if we're suspending multiple processes, enumerate the threads
only once, and see if each one belongs to any of the processes
3) have the wrapper elevate itself to normal priority.
Otherwise it can get preempted for long periods,
sometimes in the middle of scanning the threads.
Note: post-9x versions of Win have a process group API that includes suspend/resume.
We'll switch to this soon.
PRINCIPLE: AVOID PER-GPU-TYPE VARIABLES
- get rid of alloca() stuff in gutil.cpp; almost certainly not needed
- don't include malloc.h; it doesn't exist on BSD systems