of other jobs of that type.
They're waiting for GPU RAM, which may now be available.
- client: bug fix in GPU RAM availability
- client: fix testing setup for GPU RAM availability
svn path=/trunk/boinc/; revision=21206
old: assign GPUs, then check available RAM
Problem: may cause starvation on multi-GPU systems.
new: use available RAM info in the assignment process.
Prevents starvation, also reduces the number of driver calls.
svn path=/trunk/boinc/; revision=21205
that way it will query server for file length when it resumes,
rather than uploading from the beginning
- client: back out SEH handling for GPU detection
svn path=/trunk/boinc/; revision=20750
routines. Take care of situations where something within
the vendors functions cause a crash.
client/
coproc_detect.cpp
svn path=/trunk/boinc/; revision=20744
RAM to run job, but when we actually run the job
not enough GPU RAM is free, so the application fails.
This can cause a large number of jobs to fail.
Solution:
- app_plan() can specify the GPU RAM requirements of an app version.
This is passed to the client in a new field
<gpu_ram> of the <app_version> element.
- prior to starting or restarting a GPU app, the client
checks the amount of free RAM on the particular GPU.
If it's not enough for the app version,
the client doesn't start it,
and arranges for the scheduler to ignore it for 5 minutes
(by which point there might be more free GPU RAM)
Notes:
1) this change will have effect only when
both client and scheduler are updated.
2) the check is done in enforce_schedule(),
rather than schedule_cpus(),
because only at that point
have we assigned a specific GPU to the job.
3) there's another case to deal with:
a GPU app's malloc of GPU RAM fails in the middle of the job.
Currently the job fails.
I plan to add an API call boinc_temporary_exit(x) so
that the job can exit and potentially restart in x seconds.
(In principle this mechanism is sufficient for all cases,
but it could lead to a lot of starting/exiting,
so the current change is worthwhile).
svn path=/trunk/boinc/; revision=19864