A "generic" coprocessor is one that's reported by the client,
but's not of a type that the scheduler knows about (NVIDIA, AMD, Intel).
With this commit the following works:
- On the client, define a <coproc> in your cc_config.xml
with a custom name, say 'miner_asic'.
- define a plan class such as
<plan_class>
<name>foobar</name>
<gpu_type>miner_asic</gpu_type>
<cpu_frac>0.5</cpu_frac>
</plan_class>
- App versions of this plan class will be sent only to hosts
that report a coproc of type "miner_asic".
The <app_version>s in the scheduler reply will include
a <coproc> element with the given name and count=1.
This will cause the client (at least the current client)
to run only one of these jobs at a time,
and to schedule the CPU appropriately.
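A rough sketch of the matching step described above (the struct and
function names here are illustrative, not the scheduler's actual types):

    #include <cstring>

    // Illustrative stand-in for a coprocessor entry reported by the client.
    struct REPORTED_COPROC {
        char type[256];   // e.g. "miner_asic"
        int count;
    };

    // Send an app version whose plan class names a custom gpu_type only if
    // the host reported a coproc of that type.
    bool host_has_coproc_type(
        const REPORTED_COPROC* coprocs, int ncoprocs, const char* gpu_type
    ) {
        for (int i = 0; i < ncoprocs; i++) {
            if (!strcmp(coprocs[i].type, gpu_type)) return true;
        }
        return false;
    }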
Note: there's a lot missing from this:
- app version FLOPS will be those of a CPU app;
- jobs will be sent only if CPU work is requested
... and many other things.
Fixing these issues requires a significant re-architecture of the scheduler,
in particular getting rid of the PROC_TYPE_* constants
and the associated arrays,
which hard-wire the 3 fixed GPU types.
This is meant not to break anything, just add some
(optional) logging and features needed for Einstein@Home.
Please contact me before changing or removing any of this.
Conflicts:
sched/db_dump.cpp
sched/file_deleter.cpp
sched/validator.cpp
Previously, if a project specified a limit on GPU jobs in progress,
it would be enforced across GPU types.
This could lead to starvation for hosts with multiple GPU types.
E.g. the limit is 10, and a host has 10 NVIDIA jobs and no AMD jobs.
Fix this by enforcing limits separately for each GPU type.
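A sketch of what per-type enforcement looks like (type names and the
indexing scheme are illustrative):

    // One counter and one limit per processor type, instead of a single
    // counter summed across all GPU types.
    const int NPROC_TYPES = 4;               // CPU + 3 GPU vendors, illustrative

    struct JOB_LIMITS {
        int max_jobs_in_progress[NPROC_TYPES];
        int jobs_in_progress[NPROC_TYPES];
    };

    // Old behavior: 10 NVIDIA jobs could exhaust a shared GPU limit and
    // starve an idle AMD GPU. New behavior: check only the counter for
    // the type the candidate app version uses.
    bool may_send_job(const JOB_LIMITS& lim, int proc_type) {
        return lim.jobs_in_progress[proc_type] < lim.max_jobs_in_progress[proc_type];
    }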
nvidia plan classes in plan_class_spec.xml
- SCHED: Scheduler was not using properly estimated performance when assigning
work. It was using theoretical performance to choose version and actual
performance to determine how long it would take. I've changed that to start
with theoretical performance and converge to actual performance as
host_app_version pfc_n increases.
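One way to express that convergence (the weighting function below is only
illustrative; the scheduler's actual formula may differ):

    // Blend the theoretical (plan-function) estimate with the measured one,
    // giving the measurement more weight as host_app_version.pfc_n grows.
    double blended_flops_estimate(
        double theoretical_flops,   // from the app plan function
        double measured_flops,      // from host_app_version PFC statistics
        double pfc_n                // number of samples behind the measurement
    ) {
        // With zero samples, use theory only; with many samples,
        // the measured value dominates. The constant 10 is arbitrary here.
        double w = pfc_n / (pfc_n + 10.0);
        return w * measured_flops + (1.0 - w) * theoretical_flops;
    }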
- SCHED: Added some additional app version selection debugging output.
allows projected_flops to be calculated from host_app_version pfc rather
than elapsed time. This is valuable if result elapsed times are highly
variable and dependent on input.
We were using a static BEST_APP_VERSION in
check_homogeneous_app_version(),
and it wasn't being initialized on each call
(e.g. its HOST_USAGE was not being cleared).
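The fix pattern, sketched with abbreviated types (the real structs have
many more fields):

    // Illustrative stand-ins for the real structs.
    struct HOST_USAGE { double avg_ncpus = 0; double gpu_usage = 0; };
    struct BEST_APP_VERSION { HOST_USAGE host_usage; int appid = 0; };

    BEST_APP_VERSION* check_homogeneous_app_version_sketch() {
        static BEST_APP_VERSION bav;
        // Without this reset, values from the previous call (including the
        // embedded HOST_USAGE) leak into the current one.
        bav = BEST_APP_VERSION();
        // ... fill in bav for the current workunit ...
        return &bav;
    }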
svn path=/trunk/boinc/; revision=26076
"cpu" in XML, and other code was looking for "CPU".
To fix this and prevent similar problems,
processor type names are now encapsulated in proc_type_name_xml().
Code should use this rather than having hard-wired names.
Redefine GPU_TYPE_* as macros that call proc_type_name_xml().
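Roughly the shape of the change (the string values and enum layout shown
are illustrative; the point is that each name exists in exactly one place):

    enum {
        PROC_TYPE_CPU,
        PROC_TYPE_NVIDIA_GPU,
        PROC_TYPE_AMD_GPU,
        PROC_TYPE_INTEL_GPU
    };

    // Single source of truth for processor-type names as written in XML.
    inline const char* proc_type_name_xml(int pt) {
        switch (pt) {
        case PROC_TYPE_CPU:        return "cpu";
        case PROC_TYPE_NVIDIA_GPU: return "nvidia";
        case PROC_TYPE_AMD_GPU:    return "ati";
        case PROC_TYPE_INTEL_GPU:  return "intel_gpu";
        }
        return "unknown";
    }

    // GPU_TYPE_* become thin wrappers, so no name is hard-wired twice.
    #define GPU_TYPE_NVIDIA proc_type_name_xml(PROC_TYPE_NVIDIA_GPU)
    #define GPU_TYPE_ATI    proc_type_name_xml(PROC_TYPE_AMD_GPU)
    #define GPU_TYPE_INTEL  proc_type_name_xml(PROC_TYPE_INTEL_GPU)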
svn path=/trunk/boinc/; revision=25996
performed for a particular app version. It is not necessary
to tell the user to upgrade the client just to suit the needs of
a particular app version if that app version requires resources
that the host doesn't have or didn't request work for.
Actually I don't think it's good to tell the user he needs to
upgrade the client if there is only one particular app version
that requires a more recent one than he has. I think that the
purpose of the g_wreq->outdated_client flag was checking the
min_core_version in the project configuration. For that purpose, the
flag and the notice/message that it triggers are still OK. But
in the app version checks, setting this flag leads to misleading
messages in most cases, so I commented that out for now.
I'm not sure, though, that both of these measurements are needed.
svn path=/trunk/boinc/; revision=25742
by default we skip app versions that use a resource
for which work has not been requested.
This is determined by the "check_req" arg to get_app_version().
This flag is cleared whenever we want to send a job
regardless of whether a requested resource can be used:
namely, when resending lost jobs, and when sending assigned jobs.
Fix a bug that could skip unrequested versions even
when check_req is false.
NOTES:
1) The current semantics aren't right.
When check_req is false, we select the fastest of all app versions,
including those for which no work is requested.
Instead, we should select the fastest of the versions
for which work is requested if there are any;
otherwise, select the fastest version.
2) The mechanism isn't implemented for anonymous platform.
It should be.
3) If we've cached an answer (including NULL) for a given
value of check_req, that answer may be wrong for a different value.
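The selection rule intended in note 1, in sketch form (types are
illustrative):

    #include <vector>

    struct AV_CHOICE {
        double projected_flops;    // speed estimate for this host
        bool resource_requested;   // did the client ask for work for this resource?
    };

    // Prefer the fastest version among those whose resource was requested;
    // fall back to the fastest overall only if none was requested.
    const AV_CHOICE* choose_version(const std::vector<AV_CHOICE>& avs) {
        const AV_CHOICE *best_req = nullptr, *best_any = nullptr;
        for (const AV_CHOICE& av : avs) {
            if (!best_any || av.projected_flops > best_any->projected_flops) {
                best_any = &av;
            }
            if (av.resource_requested &&
                (!best_req || av.projected_flops > best_req->projected_flops)
            ) {
                best_req = &av;
            }
        }
        return best_req ? best_req : best_any;
    }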
svn path=/trunk/boinc/; revision=25342
we multiply projected FLOPS by a normal random variable
with mean 1 and stddev 0.1.
Make the stddev configurable; in particular it can be zero.
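Roughly what this amounts to (names are illustrative):

    #include <random>

    // Multiply projected FLOPS by a normally distributed factor with mean 1
    // and a configurable stddev; stddev == 0 disables the jitter entirely.
    double randomize_projected_flops(double projected_flops, double stddev) {
        if (stddev <= 0) return projected_flops;
        static std::mt19937 rng{std::random_device{}()};
        std::normal_distribution<double> noise(1.0, stddev);
        return projected_flops * noise(rng);
    }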
svn path=/trunk/boinc/; revision=25311
Some credit cheats (e.g. with credit_by_runtime) can be done
by reporting a huge value.
Fix this by capping the value at 1.1 times the 95th percentile
of host.p_fpops, taken over active hosts.
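Sketch of the cap (gathering p_fpops values for active hosts from the DB
is omitted):

    #include <algorithm>
    #include <vector>

    // Cap a reported p_fpops at 1.1x the 95th percentile over active hosts,
    // so a single inflated report can't buy outsized credit.
    double capped_p_fpops(double reported, std::vector<double> active_host_fpops) {
        if (active_host_fpops.empty()) return reported;
        std::sort(active_host_fpops.begin(), active_host_fpops.end());
        size_t idx = (size_t)(0.95 * (active_host_fpops.size() - 1));
        double cap = 1.1 * active_host_fpops[idx];
        return std::min(reported, cap);
    }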
svn path=/trunk/boinc/; revision=25017
to a superseded or deprecated app version, use it anyway.
The current app version may not validate against the old one.
svn path=/trunk/boinc/; revision=24823
- client: msg tweak
- client: minimum work buffer lower bound is 180 sec
- scheduler: in computing HOST_USAGE::project_flops for a job,
if we don't have sufficient elapsed_time statistics
for either the (host, app_version) or the app_version,
use a conservative estimate (p_fpops*(#cpus+#ngpus))
rather than the number returned by app_plan().
This avoids "time limit exceeded" errors when the latter is way off.
svn path=/trunk/boinc/; revision=24820
reduce backoff intervals somewhat
- vboxwrapper: fix buffer size typo (from Attila)
- scheduler: fix crash if using homogeneous app version,
and a WU is committed to an old or deprecated app version.
From Kevin Reed.
svn path=/trunk/boinc/; revision=24775
Plan class names containing 'nvidia' are assumed to be for NVIDIA GPU apps;
plan class names containing 'ati' are assumed to be for AMD GPU apps.
Clauses for 'nvidia' were missing in a couple of places.
svn path=/trunk/boinc/; revision=24512
(in sched_customize.cpp)
the flops_scale argument is intended to express the
GPU efficiency (actual/peak).
Pass appropriate values.
svn path=/trunk/boinc/; revision=24405
The problem: the choice of app version was based on
the "projected FLOPS" return by estimate_flops(av).
If usage stats exist for the host / app version,
this returns a number X such that
WU.rsc_fpops_est/X approximates the runtime of a job
using the given app version.
(If WU.rsc_fpops_est is way off, this will be correspondingly way off
from the actual FLOPS the app version will get.)
However, if there are no usage stats,
it returns an estimate based on host hardware speed,
which might be 100X less.
Hence, in some cases a new app version would never get used.
Solution: choose app versions based on the values
returned by the app plan functions.
Use estimate_flops() AFTER choosing the version.
- scheduler: improve the accuracy of FLOPS estimation for GPU apps.
The "flops_scale" argument to coproc_perf
(which expresses the difference between peak GPU FLOPS
and actual FLOPS) should be used to scale GPU FLOPS
prior to calling coproc_perf(),
rather than scaling the estimate returned by coproc_perf().
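The ordering issue, sketched with an illustrative coproc_perf() model (the
real function's signature and formula may differ):

    // Simple model: a job spends cpu_frac of its FLOPS on the CPU and the
    // rest on the GPU; overall throughput follows from the combined time.
    void coproc_perf_sketch(
        double cpu_flops, double gpu_flops, double cpu_frac,
        double& projected_flops
    ) {
        double t = cpu_frac / cpu_flops + (1 - cpu_frac) / gpu_flops;
        projected_flops = 1 / t;
    }

    void estimate_gpu_job_flops(
        double cpu_flops, double gpu_peak_flops, double cpu_frac,
        double flops_scale,   // GPU efficiency, actual/peak
        double& projected_flops
    ) {
        // Right: scale the GPU term before combining it with the CPU term.
        coproc_perf_sketch(
            cpu_flops, flops_scale * gpu_peak_flops, cpu_frac, projected_flops
        );
        // Wrong (old behavior): call coproc_perf_sketch() with the peak GPU
        // FLOPS and then multiply the result by flops_scale, which wrongly
        // scales the CPU contribution as well.
    }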
- show_shmem: show have_X_apps flags
svn path=/trunk/boinc/; revision=24385
- scheduler: when using elapsed time stats to predict runtime,
cap the estimated FLOPS at twice the peak FLOPS;
otherwise, if a host has received a lot of very short jobs
recently, it will get a too-high FLOPS estimate and
will exceed the rsc_fpops_bound limit.
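The cap, in sketch form:

    #include <algorithm>

    // Elapsed-time stats skewed by a run of very short jobs can yield an
    // absurdly high FLOPS estimate; clamp it to twice the host's peak FLOPS
    // so job runtime estimates don't blow past rsc_fpops_bound.
    double capped_estimated_flops(double stats_based_flops, double peak_flops) {
        return std::min(stats_based_flops, 2.0 * peak_flops);
    }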
svn path=/trunk/boinc/; revision=24128
app version for their platform for a particular app.
There may be versions for other apps which don't have jobs right now.
TODO: send a message if there are no versions of ANY app
for any platform.
- fix makefile indentation, which caused the manager not to be built
svn path=/trunk/boinc/; revision=24052
Lets you specify, on a per-app basis,
that all instances should be done using the same app version.
This is for validation in the presence of GPUs.
- scheduler: code cleanup
- Instead of adding a bunch of non-DB fields to RESULT,
use a derived class SCHED_DB_RESULT.
- Instead of storing a pointer to BEST_APP_VERSION in RESULT,
store the structure itself.
This simplifies the memory allocation situation.
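A sketch of the cleanup described above (fields abbreviated):

    struct RESULT {
        int id;
        // ... DB fields only ...
    };

    struct BEST_APP_VERSION {
        // ...
    };

    // Scheduler-only state lives in a derived class, keeping the DB row type
    // clean; BEST_APP_VERSION is stored by value, not via a pointer that has
    // to be allocated and tracked separately.
    struct SCHED_DB_RESULT : RESULT {
        BEST_APP_VERSION bav;
        // ... other non-DB, scheduler-only fields ...
    };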
- client: condition "Got server request to delete file" messages
on <file_xfer_debug>
svn path=/trunk/boinc/; revision=23636
in the case where we don't have enough elapsed-time stats
for the host/app_version.
The right formula is (peak FLOPS)/app_version.avg_pfc
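In code form (names abbreviated):

    // With too few elapsed-time samples for this host/app_version,
    // fall back to the app version's average PFC:
    //   projected_flops = peak_flops / app_version.avg_pfc
    double projected_flops_from_avg_pfc(double host_peak_flops, double av_avg_pfc) {
        return host_peak_flops / av_avg_pfc;
    }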
svn path=/trunk/boinc/; revision=23634