for the things that BOINC actually needs
(fixes server compile problems)
- client: fix various compile errors in coproc_detect.cpp
svn path=/trunk/boinc/; revision=23310
was causing a crash on Windows. Remove for now.
- client: Fix ATI OpenCL detection so that the coproc test messages will appear.
client/
coproc_detect.cpp
lib/
coproc.h
svn path=/trunk/boinc/; revision=23298
- new GPU types can be added easily
- users can specify GPUs in cc_config.xml
and refer to them in app_info.xml;
BOINC will schedule them
and pass --device N options to the apps
Note: the parsing of cc_config.xml is not done yet.
- RPC protocols (account manager and scheduler)
can now specify GPU types in separate elements
rather than embedding them in tag names
e.g. <no_rsc>NVIDIA</no_rsc> rather than <no_cuda/>
- client: in account manager replies, parse elements of the form
<no_rsc>NAME</no_rsc>
indicating that GPUs of type NAME should not be used.
This allows account managers to control GPU types
not hardwired into the client.
Note: <no_cuda/> and <no_ati/> will continue to be supported.
- scheduler RPC reply: add
<no_rsc_apps>NAME</no_rsc_apps>
(NAME = GPU name)
to indicate that the project has no jobs for the indicated GPU type.
<no_cuda_apps> etc. are still supported
- client/lib: remove set_debts() GUI RPC
- client/scheduler RPC
remove <cuda_backoff> etc. (superseded by no_app)
Exception: <ip_result> elements in sched request
still have <ncudas> and <natis>.
Fix this later.
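A minimal sketch of how a reply parser might accept both the new and the legacy elements. This is not the client's actual XML parser: parse_excluded_gpu_types() and the plain string scanning are hypothetical, and mapping <no_ati/> to the name "ATI" is an assumption (the NVIDIA mapping follows the example above).

```cpp
#include <set>
#include <string>

// Hypothetical helper (not BOINC's real parser): collect the GPU type
// names that a reply says should not be used.
std::set<std::string> parse_excluded_gpu_types(const std::string& reply) {
    std::set<std::string> excluded;
    const std::string open = "<no_rsc>";
    const std::string close = "</no_rsc>";

    // New form: <no_rsc>NAME</no_rsc>, possibly repeated.
    std::string::size_type pos = 0;
    while ((pos = reply.find(open, pos)) != std::string::npos) {
        std::string::size_type start = pos + open.size();
        std::string::size_type end = reply.find(close, start);
        if (end == std::string::npos) break;   // malformed: stop scanning
        excluded.insert(reply.substr(start, end - start));
        pos = end + close.size();
    }

    // Legacy forms are still accepted. The names they map to are assumptions.
    if (reply.find("<no_cuda/>") != std::string::npos) excluded.insert("NVIDIA");
    if (reply.find("<no_ati/>") != std::string::npos) excluded.insert("ATI");
    return excluded;
}
```

Because the result is a set of names rather than a fixed pair of flags, a GPU type unknown to the client at compile time can still be excluded.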
Implementation notes:
- client/lib: change "CUDA" to "NVIDIA" in type/variable names, and in XML
Continue to recognize "CUDA" for compatibility
- host_info.coprocs no longer used within the client;
use a global var (COPROCS coprocs) instead.
COPROCS now has an array of COPROCs;
GPU types are identified by the array index.
Index zero means CPU.
- a bunch of other resource-specific structs (like RSC_WORK_FETCH)
are now stored in arrays, with the same indices as COPROCS
(i.e. index 0 is CPU)
- COPROCS still has COPROC_NVIDIA and COPROC_ATI structs to hold vendor-specific info
- APP_VERSION now has a struct GPU_USAGE to describe its GPU usage
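The index convention in the notes above can be sketched as follows. The types here (Coproc, Coprocs, RscWorkFetch) are simplified, hypothetical stand-ins, not the real COPROC, COPROCS and RSC_WORK_FETCH definitions, and the field names are assumptions.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for one resource type.
struct Coproc {
    std::string type;   // e.g. "NVIDIA", "ATI"; slot 0 is a placeholder for the CPU
    int count;          // number of instances
};

// Hypothetical per-resource state, kept in a parallel array.
struct RscWorkFetch {
    double backoff_time;
    RscWorkFetch() : backoff_time(0) {}
};

struct Coprocs {
    std::vector<Coproc> coprocs;

    Coprocs() { coprocs.push_back(Coproc{"CPU", 1}); }   // index 0 means CPU

    // Add a GPU type and return its index.
    int add(const std::string& type, int count) {
        coprocs.push_back(Coproc{type, count});
        return static_cast<int>(coprocs.size()) - 1;
    }

    // Return the index for a type name, or -1 if unknown.
    // "CUDA" is accepted as a legacy alias for "NVIDIA".
    int lookup(const std::string& type) const {
        const std::string name = (type == "CUDA") ? "NVIDIA" : type;
        for (std::size_t i = 0; i < coprocs.size(); i++) {
            if (coprocs[i].type == name) return static_cast<int>(i);
        }
        return -1;
    }
};
```

A parallel array such as std::vector<RscWorkFetch> sized to coprocs.size() can then be indexed with the value returned by lookup(), so adding a GPU type needs no new named fields.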
svn path=/trunk/boinc/; revision=23253
(either at startup or during execution)
reset a number of "wait until X" variables;
otherwise we might wait years to contact a project, restart a file xfer, etc.
Notes:
- there is no problem setting clocks forward; things just happen prematurely
- some variables (e.g. task deadlines) are not reset,
because it's not clear what to set them to
- sched: remove ati_opencl plan class until we understand what it is
svn path=/trunk/boinc/; revision=22842
as the major criterion in choosing non-EDF GPU jobs.
GPU scheduling now respects resource share,
and as a result STD should no longer diverge.
- client simulator: various improvements, most notably
that we now generate gnuplot graphs of all debt types
NOTE: the client problem was found and fixed using the simulator!
svn path=/trunk/boinc/; revision=22536
Old: various redundant and/or misleading messages were sent.
New:
- if host w/ no GPU contacts a GPU-only project,
send high-pri message saying they need a GPU
- if host w/ GPU has driver too old for all versions,
send high-pri message saying to update driver
- if host w/ GPU has driver too old for some versions,
send low-pri message saying to update driver
- if host has GPU but too little RAM for any app,
send low-pri message saying so
- scheduler: revamp GPU plan class functions
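The message rules listed under "New:" above can be sketched as a single decision function. GpuRequest, Notice and choose_notice() are hypothetical names, the message texts are placeholders, and the order in which the low-priority cases are tested is an assumption.

```cpp
#include <string>

enum class Priority { None, Low, High };

struct Notice {
    Priority priority;
    std::string text;
};

// Hypothetical summary of one scheduler request.
struct GpuRequest {
    bool host_has_gpu;
    bool project_is_gpu_only;
    int n_versions;              // GPU app versions the project offers
    int n_driver_too_old;        // versions rejected because the driver is too old
    bool ram_too_small_for_all;  // GPU RAM is below every version's requirement
};

// Pick at most one message for the user.
Notice choose_notice(const GpuRequest& r) {
    if (!r.host_has_gpu) {
        if (r.project_is_gpu_only) {
            return {Priority::High, "This project requires a GPU"};
        }
        return {Priority::None, ""};
    }
    if (r.n_versions > 0 && r.n_driver_too_old == r.n_versions) {
        return {Priority::High, "Update your GPU driver to get jobs"};
    }
    if (r.n_driver_too_old > 0) {
        return {Priority::Low, "Update your GPU driver to use all app versions"};
    }
    if (r.ram_too_small_for_all) {
        return {Priority::Low, "Your GPU has too little RAM for this project's apps"};
    }
    return {Priority::None, ""};
}
```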
svn path=/trunk/boinc/; revision=21760
avoid conflict with nvidia's structure.
Note: these structures don't have to be the same,
since we populate our struct one item at a time.
svn path=/trunk/boinc/; revision=21668
pointers to dynamically allocated COPROC-derived objects,
just have the objects themselves.
Dynamic allocation should be avoided at all costs.
svn path=/trunk/boinc/; revision=21564
and default it to off
- client: if we print available GPU RAM (which we currently don't),
use a separate timer per GPU type
- scheduler: add new plan classes cuda_opencl (sic) and ati_opencl
svn path=/trunk/boinc/; revision=21498
Some of them allow only 1 CUDA context at a time.
You need to create a CUDA context to get available VRAM.
So the client would run a CUDA job, then immediately kill it.
Solution:
- If a GPU app is running,
let it keep running regardless of available VRAM
(if it's still running, it has enough VRAM).
- But don't start new apps if there's not enough available VRAM,
or if the amount is unknown
(if the client can't create a CUDA context,
the app won't be able to either)
- client: if <coproc_debug> is set, print available GPU RAM periodically
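The run/start policy in the Solution above can be sketched as one predicate. GpuJob and may_run() are hypothetical names, and using a negative value to mean "available VRAM unknown" is an assumption made for this sketch.

```cpp
// Hypothetical view of a GPU job.
struct GpuJob {
    bool running;
    double vram_needed;   // bytes
};

// Decide whether a job may run (or keep running).
// available_vram < 0 means the amount could not be determined,
// e.g. because the client could not create a CUDA context.
bool may_run(const GpuJob& job, double available_vram) {
    // A running job already holds its VRAM, so never preempt it on this basis.
    if (job.running) return true;
    // Unknown amount: the app would not be able to create a context either.
    if (available_vram < 0) return false;
    return available_vram >= job.vram_needed;
}
```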
svn path=/trunk/boinc/; revision=21253
of other jobs of that type.
They're waiting for GPU RAM, which may now be available.
- client: bug fix in GPU RAM availability
- client: fix testing setup for GPU RAM availability
svn path=/trunk/boinc/; revision=21206
old: assign GPUs, then check available RAM
Problem: may cause starvation on multi-GPU systems.
new: use available RAM info in the assignment process.
Prevents starvation, also reduces the number of driver calls.
svn path=/trunk/boinc/; revision=21205
RAM to run job, but when we actually run the job
not enough GPU RAM is free, so the application fails.
This can cause a large number of jobs to fail.
Solution:
- app_plan() can specify the GPU RAM requirements of an app version.
This is passed to the client in a new field
<gpu_ram> of the <app_version> element.
- prior to starting or restarting a GPU app, the client
checks the amount of free RAM on the particular GPU.
If it's not enough for the app version,
the client doesn't start it,
and arranges for the scheduler to ignore it for 5 minutes
(by which point there might be more free GPU RAM)
Notes:
1) this change will take effect only when
both client and scheduler are updated.
2) the check is done in enforce_schedule(),
rather than schedule_cpus(),
because only at that point
have we assigned a specific GPU to the job.
3) there's another case to deal with:
a GPU app's malloc of GPU RAM fails in the middle of the job.
Currently the job fails.
I plan to add an API call boinc_temporary_exit(x) so
that the job can exit and potentially restart in x seconds.
(In principle this mechanism is sufficient for all cases,
but it could lead to a lot of starting/exiting,
so the current change is worthwhile).
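A minimal sketch of the client-side check and the 5-minute deferral described in the Solution above. GpuJob, try_start() and schedulable() are hypothetical names; only the <gpu_ram> requirement and the 5-minute interval come from the notes.

```cpp
const double GPU_RAM_DEFER_SECS = 300;   // 5 minutes

// Hypothetical view of a GPU job.
struct GpuJob {
    double gpu_ram_needed;   // bytes, from <gpu_ram> of the <app_version>
    double ignore_until;     // the scheduler skips the job before this time; 0 = not deferred
};

// Called just before starting or restarting a job on its assigned GPU.
// Returns true if the job may start now.
bool try_start(GpuJob& job, double free_gpu_ram, double now) {
    if (free_gpu_ram < job.gpu_ram_needed) {
        // Not enough free RAM: don't start, and have the scheduler ignore
        // the job until more GPU RAM may have been freed.
        job.ignore_until = now + GPU_RAM_DEFER_SECS;
        return false;
    }
    job.ignore_until = 0;
    return true;
}

// Used by the scheduling pass to skip deferred jobs.
bool schedulable(const GpuJob& job, double now) {
    return now >= job.ignore_until;
}
```

The check belongs at the point where a specific GPU has been assigned, since free RAM is a per-device quantity.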
svn path=/trunk/boinc/; revision=19864
<ignore_cuda_dev>n</ignore_cuda_dev>
<ignore_ati_dev>n</ignore_ati_dev>
to ignore (not use) specific NVIDIA or ATI GPUs.
You can ignore more than one.
svn path=/trunk/boinc/; revision=19566
Make them both peak FLOPS,
according to the formula supplied by the manufacturer.
The impact on the client is minor:
- the startup message describing the GPU
- the weight of the resource type in computing long-term debt
On the server, I changed the example app_plan() function
to assume that app FLOPS is 20% of peak FLOPS
(that's about what it is for SETI@home)
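As an illustration of the 20% assumption, here is a sketch of the estimate; estimated_app_flops() is a hypothetical helper, not the example app_plan() function itself.

```cpp
// Hypothetical helper: estimate an app's FLOPS from the GPU's peak FLOPS.
// The default efficiency of 0.2 is the 20% figure quoted above.
double estimated_app_flops(double peak_flops, double efficiency = 0.2) {
    return peak_flops * efficiency;
}
```

For a GPU with a peak of 1e12 FLOPS this gives an estimate of about 2e11 FLOPS.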
svn path=/trunk/boinc/; revision=19310
for certain periods (e.g. when Remote Desktop is used on Win).
- add is_usable() member function to COPROC.
Currently this just calls the respective (CUDA or CAL)
initialization function.
We need to check whether this works and/or causes problems.
- in enforce_schedule(), check whether usability has changed
for each GPU type.
If we've gone from usable to unusable,
flag all jobs for that GPU as coproc_missing
(so they won't get run, and will quit if they're running).
If we've gone from unusable to usable, clear the flag.
This should deal with all cases except where
the client is started up with GPUs unusable.
- scheduler: more query optimizations for locality scheduling
(from Oliver Bock)
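The usable/unusable transition handling described for enforce_schedule() above can be sketched as follows. Job, GpuType and update_usability() are hypothetical names; usable_now stands in for the result of calling is_usable() on each GPU type and must have the same length as types.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical view of a job.
struct Job {
    int rsc_type;          // index of the GPU type the job uses
    bool coproc_missing;   // set while the job's GPU type is unusable
};

// Hypothetical per-type state.
struct GpuType {
    bool was_usable;       // state seen on the previous pass
};

// Flag or unflag jobs whose GPU type changed usability since the last pass.
void update_usability(std::vector<GpuType>& types,
                      const std::vector<bool>& usable_now,
                      std::vector<Job>& jobs) {
    for (std::size_t t = 0; t < types.size(); t++) {
        if (usable_now[t] == types[t].was_usable) continue;   // no change
        for (Job& j : jobs) {
            if (j.rsc_type == static_cast<int>(t)) {
                // usable -> unusable sets the flag; unusable -> usable clears it
                j.coproc_missing = !usable_now[t];
            }
        }
        types[t].was_usable = usable_now[t];
    }
}
```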
svn path=/trunk/boinc/; revision=19301