Various problems could occur when CPU throttling was used together with GPU apps.
Examples:
- on a multi-GPU system, several GPU tasks are assigned to the same GPU
- a suspended GPU task remains in memory (tying up its GPU resources)
while other tasks try to use the GPU.
The problem was that parts of the code assumed that suspended
GPU processes don't exist - i.e. that when a GPU task is suspended
it's always removed from memory.
This isn't true in the presence of CPU throttling.
So I made the following changes:
- When assigning GPUs to tasks, treat suspended tasks like running tasks
(i.e. reserve their GPUs)
- At the end of the CPU-scheduling logic, if there are any GPU tasks
that are suspended and not scheduled, remove them from memory,
and trigger a reschedule so we can reallocate their GPUs.
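The two scheduling changes above can be sketched as follows. This is a minimal illustration, not the client's actual code: the `GpuTask` struct, the `TaskState` enum, and the functions `reserved_gpus()` and `unload_idle_gpu_tasks()` are all hypothetical stand-ins, and "removing from memory" is modeled as a simple state change.

```cpp
#include <vector>

// Hypothetical, simplified task model (not BOINC's ACTIVE_TASK).
enum class TaskState { Unstarted, Running, Suspended, Removed };

struct GpuTask {
    int id;
    TaskState state;
    int gpu;         // assigned GPU index, or -1 if none
    bool scheduled;  // chosen to run by the current scheduling pass
};

// Reserve GPUs held by running AND suspended tasks: with CPU throttling,
// a suspended task may still be in memory, tying up its GPU.
std::vector<bool> reserved_gpus(const std::vector<GpuTask>& tasks, int ngpus) {
    std::vector<bool> used(ngpus, false);
    for (const auto& t : tasks) {
        bool holds_gpu = t.state == TaskState::Running ||
                         t.state == TaskState::Suspended;
        if (holds_gpu && t.gpu >= 0 && t.gpu < ngpus) used[t.gpu] = true;
    }
    return used;
}

// End of the scheduling pass: unload suspended GPU tasks that were not
// scheduled, free their GPUs, and report whether a reschedule is needed.
bool unload_idle_gpu_tasks(std::vector<GpuTask>& tasks) {
    bool need_reschedule = false;
    for (auto& t : tasks) {
        if (t.state == TaskState::Suspended && !t.scheduled) {
            t.state = TaskState::Removed;  // stands in for removal from memory
            t.gpu = -1;                    // GPU can now be reallocated
            need_reschedule = true;
        }
    }
    return need_reschedule;
}
```

Returning a flag rather than rescheduling inline keeps the cleanup step separate from the allocation step, so the next pass sees the freed GPUs.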
Also, a cosmetic change: in the resource usage string shown in the GUI,
include "(device X)" even if the task is suspended (i.e. because of throttling).
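A minimal sketch of the cosmetic change, assuming a hypothetical helper `resource_usage()` whose output format only approximates the GUI string. The point is that the device suffix depends only on whether a device is assigned; the task's suspended state is deliberately not an input.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: builds a string like "0.5 NVIDIA GPU (device 1)".
// The "(device X)" suffix is added whenever a device is assigned,
// whether or not the task is currently suspended.
std::string resource_usage(double usage, const std::string& gpu_type, int device) {
    char buf[128];
    if (device >= 0) {
        std::snprintf(buf, sizeof buf, "%g %s (device %d)", usage, gpu_type.c_str(), device);
    } else {
        std::snprintf(buf, sizeof buf, "%g %s", usage, gpu_type.c_str());
    }
    return buf;
}
```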
Also: zero out COPROC::opencl_device_indexes[] so we don't write
a garbage number to init_data.xml for non-OpenCL jobs.
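A sketch of the zeroing, assuming a hypothetical `CoprocSketch` struct that mirrors only the relevant field of `COPROC`; the array bound `MAX_COPROC_INSTANCES = 64` is an assumed value for illustration.

```cpp
#include <cstring>

constexpr int MAX_COPROC_INSTANCES = 64;  // assumed bound, for illustration

// Hypothetical stand-in for COPROC; only the relevant field is shown.
struct CoprocSketch {
    int count = 0;
    int opencl_device_indexes[MAX_COPROC_INSTANCES];

    CoprocSketch() {
        // Zero the array so jobs that never fill it in (non-OpenCL jobs)
        // don't write leftover garbage into init_data.xml.
        std::memset(opencl_device_indexes, 0, sizeof(opencl_device_indexes));
    }
};
```

A default member initializer (`= {}`) would achieve the same; `memset` is shown because it also works for C-style structs that lack constructors.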
This makes the host CPID stable; if you repeatedly install BOINC
on a particular node, it will get the same host CPID each time,
and your host table won't get lots of redundant entries.
A host can have multiple NICs;
we use the MAC address of the first Ethernet controller we find,
or, if there is no Ethernet controller, that of the last NIC.
Of course, this will create problems if we get the same MAC address
for different hosts; in principle this shouldn't happen.
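The NIC-selection rule can be sketched as below. The `Nic` struct and `pick_mac()` are hypothetical; real code would enumerate adapters through OS-specific APIs, and deriving the CPID from the chosen MAC address is not shown.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical NIC description; real code gets this from the OS.
struct Nic {
    std::string mac;
    bool is_ethernet;
};

// Prefer the first Ethernet controller; if there is none, fall back
// to the last NIC. Returns nothing if the host has no NICs at all.
std::optional<std::string> pick_mac(const std::vector<Nic>& nics) {
    for (const auto& n : nics) {
        if (n.is_ethernet) return n.mac;
    }
    if (nics.empty()) return std::nullopt;
    return nics.back().mac;
}
```

Because the choice depends only on the adapter list, reinstalling on the same node picks the same MAC address as long as the adapters and their enumeration order don't change.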
Remove the unused file hostinfo_network.h
This will allow the core client to kill VirtualBox VMs launched indirectly by vboxwrapper: vboxwrapper launches vboxsvc.exe, which launches vboxheadless.exe. It should also let the core client kill child processes of the regular wrapper. I don't know the full scope of this type of issue; maybe the default ACLs for a process changed in one of the last couple of Windows versions.
On Windows, the working-set size reported by the OS for VM apps is too low.
Apparently the RAM usage is in fact roughly the VM size.
This can lead the client to run several VM apps at once
whose combined RAM usage exceeds what is available, causing performance problems.
Solution: use workunit.rsc_memory_bound as the working set size for VM apps.
(Note: for now, a VM app is one where the plan class includes "vbox").
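A minimal sketch of the working-set rule, assuming a hypothetical function `effective_working_set()` that receives the OS-reported working set and the workunit's `rsc_memory_bound`, both in bytes.

```cpp
#include <string>

// Hypothetical: choose the working-set size used for RAM accounting.
double effective_working_set(const std::string& plan_class,
                             double os_working_set,
                             double rsc_memory_bound) {
    // Current heuristic: a VM app is one whose plan class contains "vbox".
    bool is_vm_app = plan_class.find("vbox") != std::string::npos;
    // For VM apps the OS figure is too low, so use the workunit's bound.
    return is_vm_app ? rsc_memory_bound : os_working_set;
}
```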
- Show the OpenCL platform vendor for each OpenCL CPU description.
- OpenCL may not reliably report total RAM, available RAM and max FLOPS for CPUs, so exclude these from the OpenCL CPU descriptions; that information is available elsewhere.
- Batches now have an optional "expire time".
If this time passes and the batch is not retired, abort and retire it.
- Add script "expire_batches" which enforces the above.
Run it as a periodic task.
- Add a web RPC for setting the expire time of a batch
(it can be changed multiple times)
- Add a C++ interface for this RPC
- Add a BOINC_SET_LEASE command to the BOINC GAHP
("lease" is the Condor term for expire time)
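The logic that expire_batches enforces can be sketched in C++ (the script itself may be written in another language). Everything here is hypothetical: the `Batch` fields, the convention that an expire time of 0 means "not set", and the functions `expire_batches()` and `set_expire_time()`.

```cpp
#include <vector>

// Hypothetical batch record; field names are illustrative only.
struct Batch {
    int id;
    double expire_time;  // Unix time; 0 means "no expire time" (assumed)
    bool retired;
    bool aborted;
};

// One pass of the periodic task: abort and retire every unretired batch
// whose expire time has passed. Returns the number of batches expired.
int expire_batches(std::vector<Batch>& batches, double now) {
    int n = 0;
    for (auto& b : batches) {
        if (b.retired || b.expire_time <= 0 || b.expire_time > now) continue;
        b.aborted = true;  // stands in for aborting the batch's jobs
        b.retired = true;
        ++n;
    }
    return n;
}

// Like the web RPC, this may be called any number of times;
// the latest value wins.
void set_expire_time(Batch& b, double t) { b.expire_time = t; }
```

Because the pass is idempotent (retired batches are skipped), running it repeatedly as a periodic task is safe.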
Notes:
- The same CPU can have a different cpu_opencl_prop for each of multiple OpenCL platforms. We send them all to the project server because:
  - Different OpenCL platforms report different values for the same CPU.
  - Some OpenCL CPU apps may work better with certain OpenCL platforms.
- OpenCL may report a maximum of only 4GB for global_mem_size, so get the CPU RAM size from gstate.hostinfo.m_nbytes instead.
- This eliminates the need to include global_prefs_override.xml in the Android distribution.
- Deferring task reporting confuses some Android users; since WiFi connectivity is likely to be sporadic and jobs tend to be long, it's probably better to report tasks ASAP.
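The notes on per-platform cpu_opencl_prop records and on global_mem_size can be illustrated together. This is a sketch only: `CpuOpenclProp`, `CpuReport`, and `build_cpu_reports()` are hypothetical and do not reflect the real structures or the XML actually sent to the server. It keeps one record per OpenCL platform and, where a RAM size is needed, takes it from host info rather than from global_mem_size.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical per-platform record for one CPU.
struct CpuOpenclProp {
    std::string platform_vendor;
    std::uint64_t global_mem_size;  // as reported by this platform; may be capped
};

// Hypothetical record sent to the server, one per OpenCL platform.
struct CpuReport {
    std::string platform_vendor;
    double ram_bytes;  // from host info (gstate.hostinfo.m_nbytes in the client)
};

std::vector<CpuReport> build_cpu_reports(const std::vector<CpuOpenclProp>& props,
                                         double host_m_nbytes) {
    std::vector<CpuReport> reports;
    reports.reserve(props.size());
    for (const auto& p : props) {
        // Keep every platform's entry; don't trust global_mem_size for RAM.
        reports.push_back({p.platform_vendor, host_m_nbytes});
    }
    return reports;
}
```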