- new GPU types can be added easily
- users can specify GPUs in cc_config.xml,
refer to them in app_info.xml,
and BOINC will schedule them
and pass the corresponding --device N options to apps
Note: the parsing of cc_config.xml is not done yet.
- RPC protocols (account manager and scheduler)
can now specify GPU types in separate elements
rather than embedding them in tag names
e.g. <no_rsc>NVIDIA</no_rsc> rather than <no_cuda/>
- client: in account manager replies, parse elements of the form
<no_rsc>NAME</no_rsc>
indicating the GPUs of type NAME should not be used.
This allows account managers to control GPU types
not hardwired into the client.
Note: <no_cuda/> and <no_ati/> will continue to be supported.
- scheduler RPC reply: add
<no_rsc_apps>NAME</no_rsc_apps>
(NAME = GPU name)
to indicate that the project has no jobs for the indicated GPU type.
<no_cuda_apps> etc. are still supported
- client/lib: remove set_debts() GUI RPC
- client/scheduler RPC
remove <cuda_backoff> etc. (superseded by no_app)
Exception: <ip_result> elements in sched request
still have <ncudas> and <natis>.
Fix this later.
Implementation notes:
- client/lib: change "CUDA" to "NVIDIA" in type/variable names, and in XML
Continue to recognize "CUDA" for compatibility
- host_info.coprocs no longer used within the client;
use a global var (COPROCS coprocs) instead.
COPROCS now has an array of COPROCs;
GPU types are identified by the array index.
Index zero means CPU.
- a bunch of other resource-specific structs (like RSC_WORK_FETCH)
are now stored in arrays, with same indices as COPROCS
(i.e. index 0 is CPU)
- COPROCS still has COPROC_NVIDIA and COPROC_ATI structs to hold vendor-specific info
- APP_VERSION now has a struct GPU_USAGE to describe its GPU usage
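Illustrative sketch of the indexing scheme and the GPU_USAGE hook
(field names are assumptions, not the exact declarations):

    #include <string>
    #include <vector>

    // Index 0 is the CPU; each GPU type gets the next index.
    struct COPROC {
        std::string type;   // "NVIDIA", "ATI", or a new type from cc_config.xml
        int count;          // number of instances of this type
    };

    struct COPROCS {
        std::vector<COPROC> coprocs;   // coprocs[0] = CPU
        // plus COPROC_NVIDIA / COPROC_ATI members for vendor-specific info
    };

    // Per-resource scheduling structs (e.g. RSC_WORK_FETCH) are kept in
    // arrays using the same indices, so index 0 is again the CPU.

    // APP_VERSION's GPU usage: which resource, and how much of one instance.
    struct GPU_USAGE {
        int rsc_type;    // index into COPROCS
        double usage;    // fraction of one GPU instance used per job
    };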
svn path=/trunk/boinc/; revision=23253
(either at startup or during execution)
reset a number of "wait until X" variables;
otherwise we might wait years to contact a project, restart a file xfer, etc.
Notes:
- there is no problem setting clocks forward; things just happen prematurely
- some variables (e.g. task deadlines) are not reset,
because it's not clear what to set them to
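A minimal sketch of the kind of clamp this implies (hypothetical helper,
not the actual client code): after the clock moves backwards, a saved
"wait until X" timestamp may now be absurdly far in the future.

    // Clamp a deferred-action timestamp so we retry promptly instead of
    // waiting years after the system clock has been set backwards.
    static void clamp_wait_until(double& wait_until, double now, double max_delay) {
        if (wait_until > now + max_delay) {
            wait_until = now;
        }
    }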
- sched: remove ati_opencl plan class until we understand what it is
svn path=/trunk/boinc/; revision=22842
as the major criterion in choosing non-EDF GPU jobs.
GPU scheduling now respects resource share,
and as a result STD should no longer diverge.
- client simulator: various improvements, most notably
that we now generate gnuplot graphs of all debt types
NOTE: the client problem was found and fixed using the simulator!
svn path=/trunk/boinc/; revision=22536
Old: various redundant and/or misleading messages were sent.
New:
- if host w/ no GPU contacts a GPU-only project,
send high-pri message saying they need a GPU
- if host w/ GPU has driver too old for all versions,
send high-pri message saying to update driver
- if host w/ GPU has driver too old for some versions,
send low-pri message saying to update driver
- if host has GPU but too little RAM for any app,
send low-pri message saying so
- scheduler: revamp GPU plan class functions
svn path=/trunk/boinc/; revision=21760
avoid conflict with nvidia's structure.
Note: these structures don't have to be the same,
since we populate our struct one item at a time.
svn path=/trunk/boinc/; revision=21668
pointers to dynamically allocated COPROC-derived objects,
just have the objects themselves.
Dynamic allocation should be avoided at all costs.
svn path=/trunk/boinc/; revision=21564
and default it to off
- client: if we print available GPU RAM (which we now don't)
have a separate timer per GPU type
- scheduler: add new plan classes cuda_opencl (sic) and ati_opencl
svn path=/trunk/boinc/; revision=21498
Some of them allow only 1 CUDA context at a time.
You need to create a CUDA context to get available VRAM.
So the client would run a CUDA job, then immediately kill it.
Solution:
- If a GPU app is running,
let it keep running regardless of available VRAM
(if it's still running, it has enough VRAM).
- But don't start new apps if there's not enough available VRAM,
or if the amount is unknown
(if the client can't create a CUDA context,
the app won't be able to either)
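A sketch of that gate with illustrative names (the real check is part of
the client's GPU job scheduling):

    // Decide whether a GPU job may run, per the rules above.
    // vram_known is false when the client couldn't create a CUDA context.
    bool gpu_job_allowed(
        bool already_running, bool vram_known,
        double available_vram, double vram_needed
    ) {
        if (already_running) return true;   // it's running, so it has its VRAM
        if (!vram_known) return false;      // the app couldn't get a context either
        return available_vram >= vram_needed;
    }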
- client: if <coproc_debug> is set, print available GPU RAM periodically
svn path=/trunk/boinc/; revision=21253
of other jobs of that type.
They're waiting for GPU RAM, which may now be available.
- client: bug fix in GPU RAM availability
- client: fix testing setup for GPU RAM availability
svn path=/trunk/boinc/; revision=21206
old: assign GPUs, then check available RAM
Problem: may cause starvation on multi-GPU systems.
new: use available RAM info in the assignment process.
Prevents starvation, also reduces the number of driver calls.
svn path=/trunk/boinc/; revision=21205
RAM to run job, but when we actually run the job
not enough GPU RAM is free, so the application fails.
This can cause a large number of jobs to fail.
Solution:
- app_plan() can specify the GPU RAM requirements of an app version.
This is passed to the client in a new field
<gpu_ram> of the <app_version> element.
- prior to starting or restarting a GPU app, the client
checks the amount of free RAM on the particular GPU.
If it's not enough for the app version,
the client doesn't start it,
and arranges for the scheduler to ignore it for 5 minutes
(by which point there might be more free GPU RAM)
Notes:
1) this change will have effect only when
both client and scheduler are updated.
2) the check is done in enforce_schedule(),
rather than schedule_cpus(),
because only at that point
have we assigned a specific GPU to the job.
3) there's another case to deal with:
a GPU app's malloc of GPU RAM fails in the middle of the job.
Currently the job fails.
I plan to add an API call boinc_temporary_exit(x) so
that the job can exit and potentially restart in x seconds.
(In principle this mechanism is sufficient for all cases,
but it could lead to a lot of starting/exiting,
so the current change is worthwhile).
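A compressed sketch of the client-side check (illustrative names; per
note 2 above, the real check runs in enforce_schedule()):

    // Before (re)starting a GPU task, compare the app version's declared
    // <gpu_ram> against the free RAM on the GPU the task was assigned to.
    struct GPU_TASK {
        double gpu_ram_required;         // from <gpu_ram> in <app_version>
        double schedule_backoff_until;   // scheduler ignores the task until then
    };

    bool ok_to_start(GPU_TASK& t, double free_gpu_ram, double now) {
        if (free_gpu_ram >= t.gpu_ram_required) return true;
        t.schedule_backoff_until = now + 300;   // retry in ~5 minutes
        return false;
    }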
svn path=/trunk/boinc/; revision=19864
<ignore_cuda_dev>n</ignore_cuda_dev>
<ignore_ati_dev>n</ignore_ati_dev>
to ignore (not use) specific NVIDIA or ATI GPUs.
You can ignore more than one.
svn path=/trunk/boinc/; revision=19566
Make them both peak FLOPS,
according to the formula supplied by the manufacturer.
The impact on the client is minor:
- the startup message describing the GPU
- the weight of the resource type in computing long-term debt
On the server, I changed the example app_plan() function
to assume that app FLOPS is 20% of peak FLOPS
(that's about what it is for SETI@home)
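For illustration only (the actual per-vendor peak formula comes from the
manufacturer; these names are made up):

    // Generic shape of a peak-FLOPS estimate: cores x clock x FLOPs/core/cycle.
    double peak_flops(int cores, double clock_hz, double flops_per_core_per_cycle) {
        return cores * clock_hz * flops_per_core_per_cycle;
    }

    // The example app_plan() then assumes app FLOPS is 20% of peak.
    double estimated_app_flops(double peak) {
        return 0.2 * peak;
    }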
svn path=/trunk/boinc/; revision=19310
for certain periods (e.g. when Remote Desktop is used on Win).
- add is_usable() member function to COPROC.
Currently this just calls the respective (CUDA or CAL)
initialization function.
We need to check whether this works and/or causes problems.
- in enforce_schedule(), check whether usability has changed
for each GPU type.
If we've gone from usable to unusable,
flag all jobs for that GPU as coproc_missing
(so they won't get run, and will quit if they're running).
If we've gone from unusable to usable, clear the flag.
This should deal with all cases except where
the client is started up with GPUs unusable.
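A sketch of the per-type transition handling (type and field names are
illustrative):

    #include <vector>

    struct RESULT_SKETCH {
        bool coproc_missing = false;   // set => job won't run / will quit
    };

    struct COPROC_SKETCH {
        bool usable = true;            // last known usability
        bool is_usable() {
            // in the client this calls the CUDA or CAL init function
            return true;               // placeholder for the sketch
        }
    };

    // Called from enforce_schedule() for each GPU type.
    void update_usability(COPROC_SKETCH& cp, std::vector<RESULT_SKETCH*>& gpu_jobs) {
        bool now_usable = cp.is_usable();
        if (now_usable == cp.usable) return;
        for (RESULT_SKETCH* rp : gpu_jobs) {
            rp->coproc_missing = !now_usable;   // flag when unusable, clear when usable
        }
        cp.usable = now_usable;
    }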
- scheduler: more query optimizations for locality scheduling
(from Oliver Bock)
svn path=/trunk/boinc/; revision=19301
start only enough jobs to fill CPUs per project,
not all the CPU jobs at once.
I'm not sure how much difference this makes,
but this is how it's supposed to work.
- client: if app_info.xml doesn't specify flops,
use an estimate that takes GPUs into account.
- client: if it's been more than 2 weeks since time stats update,
don't decay on_frac at all.
svn path=/trunk/boinc/; revision=19035
is running a graphics application.
Change the semantics of the "don't use GPU while computer in use" pref
to "don't use a GPU that's running a graphics app while
computer is in use".
This will increase GPU utilization on multi-GPU systems.
svn path=/trunk/boinc/; revision=18942
- different data structure for keeping track of coproc usage;
instead of COPROC having per-instance pointers to ACTIVE_TASK,
ACTIVE_TASK now has an array of device number indices
for each instance that it's using.
- in enforce_schedule(), we call a new function assign_coprocs()
that decides what coproc instances each job will use,
and prunes jobs for which we can't get an assignment.
This function embodies lots of subtlety.
- coproc_cmdline() no longer deals with reserving instances;
it just has to generate the --device X cmdline
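A sketch of the new bookkeeping and of what coproc_cmdline() now emits
(names are illustrative):

    #include <string>
    #include <vector>

    struct ACTIVE_TASK_SKETCH {
        // device numbers of the coproc instances assign_coprocs() gave this task
        std::vector<int> gpu_device_nums;
    };

    // No reservation logic here any more; just build the command-line flags.
    std::string gpu_cmdline(const ACTIVE_TASK_SKETCH& at) {
        std::string cmd;
        for (int d : at.gpu_device_nums) {
            cmd += " --device " + std::to_string(d);
        }
        return cmd;
    }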
svn path=/trunk/boinc/; revision=18880
e.g. the Milkyway@home ATI app, of which we can typically run
2 or 3 instances at once on a GPU.
Changes include:
- In APP_VERSION, don't use a COPROCS to represent the GPU
requirements; just use doubles ncudas and natis.
- sufficient_coprocs() etc. are no longer members of COPROCS
- in HOST_USAGE, ncudas and natis are doubles
- in scheduler request, req_instances is now a double
This checkin doesn't include the job scheduling logic,
i.e. assigning jobs to GPUs. That will follow.
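A sketch of what fractional usage means for packing jobs onto one GPU
(illustrative only, since the assignment logic comes in a later checkin):

    // ncudas/natis are now doubles, so e.g. 0.33 lets 3 instances share a GPU.
    struct GPU_REQUIREMENT {
        double ncudas;   // fraction of an NVIDIA GPU one job uses
        double natis;    // fraction of an ATI GPU one job uses
    };

    // Can a job needing 'need' of a GPU start when 'in_use' is already committed?
    bool fits_on_gpu(double in_use, double need) {
        return in_use + need <= 1.0 + 1e-9;   // epsilon for rounding
    }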
svn path=/trunk/boinc/; revision=18868
old: find fastest GPU, and pretend that others are the same.
Problem: other GPUs might be less capable,
and not able to handle jobs sent by server.
new: find the most "capable" GPU, use others that are equivalent,
don't use those that are not.
"Capable" is defined by
- compute capability (i.e., hardware version)
- driver version
- memory size
- FLOPs
in that priority order.
See comments in lib/coproc.h
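An illustrative version of the comparison (the real one is in
lib/coproc.h, as noted above):

    struct GPU_CAPABILITY {
        int compute_capability;   // hardware version
        int driver_version;
        double mem_bytes;
        double peak_flops;
    };

    // true if a is strictly more "capable" than b, in the priority order above
    bool more_capable(const GPU_CAPABILITY& a, const GPU_CAPABILITY& b) {
        if (a.compute_capability != b.compute_capability) {
            return a.compute_capability > b.compute_capability;
        }
        if (a.driver_version != b.driver_version) {
            return a.driver_version > b.driver_version;
        }
        if (a.mem_bytes != b.mem_bytes) {
            return a.mem_bytes > b.mem_bytes;
        }
        return a.peak_flops > b.peak_flops;
    }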
svn path=/trunk/boinc/; revision=17855
(say what kind of job and why we're scheduling it)
- client: log messages describing GPUs: one line per GPU; fixes #879
svn path=/trunk/boinc/; revision=17847
and passing them the corresponding --device N cmdline args.
This fixes a bug introduced in 17402 (Feb 26)
that broke the --device feature,
presumably causing problems on systems with multiple GPUs.
svn path=/trunk/boinc/; revision=17549
(app versions don't have a <coprocs> wrapper around coproc elements;
maybe an oversight, but let's stick with it).
Anyway, I think it's working now.
- lib: remove "owner" array from COPROC.
This was used in client to keep track of assignment of
coprocessors to tasks, but we got rid of the reserve/free scheme.
NOTE: this breaks the mechanism for passing --device N to apps;
I'll have to do this another way. Stay tuned.
svn path=/trunk/boinc/; revision=17543
There are two mechanisms to prevent the scheduler from
sending jobs that won't finish by their deadline.
Simple mechanism:
The client sends the interval x for which CPUs are projected
to be saturated.
Given a job with estimated duration y,
the scheduler doesn't send it if x + y exceeds the delay bound.
If it does send it, x is incremented by y.
Complex mechanism:
Client sends workload description.
Scheduler does EDF simulation, sees if deadlines are missed.
The only project using this AFAIK is BOINC alpha test.
Neither of these mechanisms takes coprocessors into account,
and as a result jobs could be sent that are doomed to
miss their deadline.
This checkin adds coprocessor awareness to the Simple mechanism.
Changes:
Client:
compute estimated delay (i.e. time until non-saturation)
for coprocessors as well as CPU.
Send them in scheduler request as part of coproc descriptor.
Scheduler:
Keep track of estimated delays separately for different resources
- client: fixed bug that computed CPU estimated delay incorrectly
- client: the work request (req_secs) for a resource is the min
of the project's share and the shortfall.
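A sketch of the Simple mechanism with per-resource state, as described
above (names are illustrative):

    // x = projected saturated interval for a resource (sent by the client),
    // y = estimated duration of the candidate job.
    struct RESOURCE_ESTIMATE {
        double estimated_delay;   // x, in seconds
    };

    // Don't send a job that can't finish within its delay bound; if we do
    // send it, charge its duration against the resource's estimated delay.
    bool try_send_job(RESOURCE_ESTIMATE& r, double est_duration, double delay_bound) {
        if (r.estimated_delay + est_duration > delay_bound) return false;
        r.estimated_delay += est_duration;
        return true;
    }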
svn path=/trunk/boinc/; revision=17086
put a textual summary of them in host.serialnum (currently unused)
- web: show coprocs on host detail page
- db_dump: include coproc info in host XML
svn path=/trunk/boinc/; revision=16697
a modified boinc version.
- Added new header "boinc_fcgi.h" to be used instead of "fcgi_stdio.h".
This header defines I/O functions in the namespace FCGI rather than using
redefined functions the way "fcgi_stdio.h" does. This was causing a lot
of headaches when both <cstdio> and "fcgi_stdio.h" were included. Using
overloaded functions fixes this problem, except when the only difference
between functions is the return type (for example ::fopen() returns FILE*
and FCGI::fopen() returns FCGI_FILE*).
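For example (FCGI::fopen() is named above; the other wrappers are
assumed to be declared the same way in "boinc_fcgi.h"):

    #include "boinc_fcgi.h"

    void write_marker() {
        FCGI_FILE* f = FCGI::fopen("marker.txt", "w");
        if (f) {
            FCGI::fputs("done\n", f);    // assumed wrapper
            FCGI::fclose(f);             // assumed wrapper
        }
    }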
- Fixed some missing "#ifdef _WIN32" blocks in filesys.C
svn path=/trunk/boinc/; revision=15984
should be 2.0. This avoids crashes related to data structure
changes in the Runtime.
coprocs/CUDA/mswin/Win32/Debug/bin/
cudart.dll
coprocs/CUDA/mswin/Win32/Release/bin/
cudart.dll
coprocs/CUDA/mswin/Win32/ReleaseSigned/bin/
cudart.dll
coprocs/CUDA/mswin/x64/Debug/bin/
cudart.dll
coprocs/CUDA/mswin/x64/Release/bin/
cudart.dll
coprocs/CUDA/mswin/x64/ReleaseSigned/bin/
cudart.dll
lib/
coproc.C, .h
svn path=/trunk/boinc/; revision=15925
Old: when checking whether an app can be run,
check for sufficient coprocessors relative to
the current coprocessor usage.
Bug: if there are 2 CUDA jobs,
the scheduler will decide to run both.
enforce_schedule() will only be able to run one,
and the other CPU will be idle.
New: include coprocessor usage (along with RAM and CPUs)
in the check, and do a simulated reservation.
In the above scenario, the scheduler will select
one CUDA app and one non-CUDA app.
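A sketch of the simulated reservation (illustrative; the real check also
covers RAM and CPUs):

    // While choosing jobs, tentatively reserve coproc instances so we never
    // pick more CUDA jobs than the host's GPUs can actually run at once.
    struct COPROC_POOL {
        double total;      // instances the host has
        double reserved;   // instances claimed by already-chosen jobs
    };

    bool reserve_if_possible(COPROC_POOL& p, double needed) {
        if (p.reserved + needed > p.total) return false;   // would overcommit
        p.reserved += needed;
        return true;
    }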
svn path=/trunk/boinc/; revision=15904
- scheduler: fix bug in adaptive replication:
if we send an unreplicated job to an untrusted host,
set both wu.target_nresults and wu.min_quorum to app.target_nresults.
svn path=/trunk/boinc/; revision=15762
- client: better messages reporting coprocessors
- manager: bounds checks to avoid wxwidgets asserts
when job CPU estimates are absurdly large
svn path=/trunk/boinc/; revision=15644
libcudart{32,64}.so is bundled with client.
client loads it and if successful calls the device-query functions.
- client, Linux: append the current directory
(i.e., the BOINC data directory) to the LD_LIBRARY_PATH for apps.
This goes after the project dir and the slot dir.
This lets apps link to libcudartX.so.
NOTE: this is not recommended; better to include it with your app.
- client: allow for multiple messages from coproc probing
- fixed indentation in cs_platforms.C
svn path=/trunk/boinc/; revision=15591
to avoid confusion with "name" field of CUDA.
This is a bug fix - please port.
- start script: don't error out if run_state.xml file is empty
(which happens if project runs out of disk space)
svn path=/trunk/boinc/; revision=15168
in <app_version>s from the server,
keep track of the number free of each type of coproc,
and don't run an app that needs more than are available.
(not quite working yet)
svn path=/trunk/boinc/; revision=14992
and change the corresponding structure field from 64KB to 256KB
(could increase this if needed).
This is needed to handle app versions with lots (> 100) of files
- change LARGE_BLOB_SIZE to BLOB_SIZE a bunch of places
- Change COPROCS from vector<COPROC> to vector<COPROC*>.
Otherwise the right virtual functions of COPROCs don't get called
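Why the pointer change matters, in a minimal illustrative example
(COPROC_CUDA and write_xml() stand in for the real derived types):

    #include <cstdio>
    #include <vector>

    struct COPROC {
        virtual void write_xml() { std::printf("<coproc/>\n"); }
        virtual ~COPROC() {}
    };
    struct COPROC_CUDA : COPROC {
        void write_xml() override { std::printf("<coproc_cuda/>\n"); }
    };

    int main() {
        std::vector<COPROC> by_value;
        by_value.push_back(COPROC_CUDA());   // sliced: base write_xml() runs
        by_value[0].write_xml();

        std::vector<COPROC*> by_pointer;     // what this checkin switches to
        by_pointer.push_back(new COPROC_CUDA);
        by_pointer[0]->write_xml();          // derived write_xml() runs
        delete by_pointer[0];
        return 0;
    }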
svn path=/trunk/boinc/; revision=14986