When we first send a job, we pick an app version,
then call wu_is_infeasible_fast()
to see if the host is able to run the job with that app version.
In addition to checking disk space etc.,
this calls wu_is_infeasible_custom() to do project-specific checks
(e.g. for SETI@home: don't use GPUs for VLAR jobs).
However, when we resend a job, we pick an app version
(possibly different from the original one)
and send the job without any checking.
So, for example, we might send a VLAR job to a GPU,
or send a job to a host with insufficient disk space
(because free space has changed since original send).
Solution: call wu_is_infeasible_fast() before resending a job,
and if it returns true, mark the job as done and don't resend it.
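A minimal sketch of the resend check described above; the struct fields and the signature of the feasibility check here are illustrative stand-ins, not the actual scheduler types:

    // Illustrative sketch only; not the real BOINC scheduler code.
    #include <cstdio>

    struct Workunit   { int id; double rsc_disk_bound; bool is_vlar; };
    struct AppVersion { int id; bool uses_gpu; };
    struct Host       { double d_free; };   // free disk space, bytes

    // Stand-in for wu_is_infeasible_fast(): nonzero return means infeasible.
    // The real function also calls wu_is_infeasible_custom() for
    // project-specific checks; both ideas are collapsed into one here.
    int wu_is_infeasible_fast(const Workunit& wu, const AppVersion& av, const Host& host) {
        if (wu.rsc_disk_bound > host.d_free) return 1;   // not enough disk
        if (wu.is_vlar && av.uses_gpu)       return 2;   // e.g. SETI@home: no VLAR jobs on GPUs
        return 0;
    }

    // Resend path: previously jobs were resent without any such check.
    bool maybe_resend(const Workunit& wu, const AppVersion& av, const Host& host) {
        if (wu_is_infeasible_fast(wu, av, host)) {
            // Infeasible with the newly picked app version:
            // mark the job as done instead of resending it.
            printf("job %d infeasible on this host; not resending\n", wu.id);
            return false;
        }
        // ... otherwise add it to the scheduler reply as on the original send ...
        return true;
    }

    int main() {
        Host h{1e9};
        maybe_resend(Workunit{1, 2e9, false}, AppVersion{10, false}, h);  // fails the disk check
        maybe_resend(Workunit{2, 1e8, true},  AppVersion{11, true},  h);  // fails the VLAR/GPU check
    }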
svn path=/trunk/boinc/; revision=23098
for some WUs
- back end: fix the way "report grace period" is implemented
old: result.report_deadline (i.e. what's in the DB) and
the deadline sent to the client are the same.
Some confusing and incorrect logic in the transitioner
tries to provide the desired semantics.
new: result.report_deadline is the deadline sent to the client,
plus the grace period.
No logic in the transitioner is needed.
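A sketch of the new deadline bookkeeping, with assumed field and variable names rather than the actual BOINC schema:

    #include <cstdio>
    #include <ctime>

    struct Result {
        time_t report_deadline;   // what is stored in the DB
    };

    // Old scheme: report_deadline equaled the deadline sent to the client, and
    // the transitioner tried to account for the grace period itself.
    // New scheme: add the grace period once, when the result is sent.
    void set_deadlines(Result& r, time_t sent_deadline, int grace_period_secs) {
        r.report_deadline = sent_deadline + grace_period_secs;
        // The client is still told sent_deadline; the transitioner just compares
        // the current time against report_deadline, with no special logic.
    }

    int main() {
        Result r{};
        time_t sent = time(nullptr) + 7*24*3600;   // deadline told to the client: one week out
        set_deadlines(r, sent, 24*3600);           // e.g. a one-day grace period
        printf("client deadline %ld, DB deadline %ld\n", (long)sent, (long)r.report_deadline);
    }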
svn path=/trunk/boinc/; revision=23040
so that we can look for memory leaks.
- client: enable bandwidth quota limit only if both
#MB and #days are nonzero.
- scheduler: when resending work, don't send more than
the client is requesting
- scheduler: restore Cobblestone factor to 100
svn path=/trunk/boinc/; revision=21460
- daily quota mechanism
- reliable mechanism (accelerated retries)
- "trusted" mechanism (adaptive replication)
- scheduler: enforce host scale probation only for apps with
host_scale_check set.
- validator: do scale probation on invalid results
(need this in addition to error and timeout cases)
- feeder: update app version scales every 10 min, not 10 sec
- back-end apps: support --foo as well as -foo for options
Notes:
- If you have, say, cuda, cuda23 and cuda_fermi plan classes,
a host will have separate quotas for each one.
That means it could error out on 100 jobs for cuda_fermi,
and when its quota goes to zero,
error out on 100 jobs for cuda23, etc.
This is intentional; there may be cases where one version
works but not the others.
- host.error_rate and host.max_results_day are deprecated
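A sketch of what a per-(host, app version) record replacing host.error_rate and host.max_results_day might hold; the field names are assumptions, not the actual host_app_version table:

    #include <map>
    #include <utility>

    struct HostAppVersion {
        int host_id = 0;
        int app_version_id = 0;       // one record per plan class (cuda, cuda23, cuda_fermi, ...)
        double error_rate = 0.1;      // replaces the deprecated host.error_rate
        int max_jobs_per_day = 100;   // replaces the deprecated host.max_results_day
        int n_jobs_today = 0;
        bool reliable = false;        // eligible for accelerated retries
        bool trusted = false;         // eligible for adaptive replication
    };

    // Keyed by (host, app version), so each plan class has its own quota;
    // a host that errors out its cuda_fermi quota can still get cuda23 jobs.
    std::map<std::pair<int,int>, HostAppVersion> host_app_versions;

    int main() {
        HostAppVersion hav;
        hav.host_id = 42;
        hav.app_version_id = 7;
        host_app_versions[{hav.host_id, hav.app_version_id}] = hav;
    }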
TODO:
- the values in the app table for limits on jobs in progress etc.
should override the corresponding values in config.xml, not the other way around.
Implementation notes:
scheduler:
process_request():
read all host_app_versions for the host at start;
compute "reliable" and "trusted" for each one;
write modified records at the end.
get_app_version():
add "reliable_only" arg; if set, use only reliable versions
skip over-quota versions
Multi-pass scheduling: if the host has at least one reliable version,
do a pass for jobs that need a reliable version,
using only reliable versions;
then clear the best_app_versions cache.
Score-based scheduling: for need-reliable jobs,
it will pick the fastest version,
then give a score bonus if that version happens to be reliable.
When we get back a successful result from the client:
increase the daily quota.
When we get back an error result from the client:
impose scale probation;
decrease the daily quota if the job was not aborted.
(These quota/probation updates are sketched below, after the Transitioner notes.)
Validator:
when handling a WU, create a vector of HOST_APP_VERSION
parallel to vector of RESULT.
Pass it to assign_credit_set().
Make copies of the originals so we can update only the modified ones;
update HOST_APP_VERSION error rates.
Transitioner:
decrease quota on timeout
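A sketch of the quota and probation updates above (scheduler on success/error, transitioner on timeout); the field names, cap, and halving policy are illustrative assumptions, not the actual BOINC code:

    struct HostAppVersion {
        int max_jobs_per_day = 100;   // per-(host, app version) daily quota
        bool scale_probation = false; // credit-scale probation in effect
    };

    const int DAILY_QUOTA_CAP = 500;  // assumed cap, not a BOINC constant

    // Scheduler: successful result returned by the client.
    void on_success(HostAppVersion& hav) {
        if (hav.max_jobs_per_day < DAILY_QUOTA_CAP) hav.max_jobs_per_day++;
    }

    // Scheduler: error result returned by the client.
    void on_error(HostAppVersion& hav, bool aborted) {
        hav.scale_probation = true;              // impose scale probation
        if (!aborted && hav.max_jobs_per_day > 1) {
            hav.max_jobs_per_day /= 2;           // decrease the daily quota
        }
    }

    // Transitioner: result timed out.
    void on_timeout(HostAppVersion& hav) {
        if (hav.max_jobs_per_day > 1) hav.max_jobs_per_day /= 2;
    }

    int main() {
        HostAppVersion hav;
        on_error(hav, false);   // quota drops, probation imposed
        on_timeout(hav);        // quota drops again
        on_success(hav);        // quota creeps back up
    }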
svn path=/trunk/boinc/; revision=21181
- scheduler: when calculating scheduler runtime,
don't include the time spent reading the request message from the client;
that can be misleadingly long.
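A small sketch of the measurement change, using std::chrono for illustration rather than the scheduler's own timing code:

    #include <chrono>
    #include <cstdio>

    using clk = std::chrono::steady_clock;

    void read_request()   { /* read and parse the request message from the client */ }
    void handle_request() { /* the actual scheduling work */ }

    int main() {
        read_request();             // excluded from the runtime: can block on a slow client link
        auto start = clk::now();    // start the clock only after the request has been read
        handle_request();
        double elapsed = std::chrono::duration<double>(clk::now() - start).count();
        printf("scheduler runtime: %.3f s\n", elapsed);
    }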
svn path=/trunk/boinc/; revision=20781
The limit on jobs in progress is now
max_wus_in_progress * NCPUS
+ max_wus_in_progress_gpu * NGPUS
where NCPUS and NGPUS reflect prefs and are capped.
Furthermore: if the client reports plan class for in-progress jobs
(see checkin of 31 May 2009)
then these limits are enforced separately;
i.e. the # of in-progress CPU jobs is <= max_wus_in_progress*NCPUS,
and the # of in-progress GPU jobs is <= max_wus_in_progress_gpu*NGPUS
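A sketch of the limit arithmetic; the config field names are modeled on the tags mentioned here, and the caps on NCPUS/NGPUS are assumed values:

    #include <algorithm>
    #include <cstdio>

    struct Config {
        int max_wus_in_progress = 2;       // per CPU
        int max_wus_in_progress_gpu = 4;   // per GPU
        int gpu_multiplier = 8;            // renamed from <cuda_multiplier>
        int max_wus_to_send = 10;
    };

    int main() {
        Config c;
        // NCPUS and NGPUS reflect the user's prefs and are capped (caps assumed here).
        int ncpus = std::min(8, 64);
        int ngpus = std::min(1, 8);

        int cpu_limit   = c.max_wus_in_progress * ncpus;       // in-progress CPU jobs
        int gpu_limit   = c.max_wus_in_progress_gpu * ngpus;   // in-progress GPU jobs (if plan class reported)
        int total_limit = cpu_limit + gpu_limit;               // otherwise a single combined limit
        int per_request = c.max_wus_to_send * (ncpus + c.gpu_multiplier * ngpus);

        printf("in progress <= %d (CPU %d, GPU %d); per request <= %d\n",
               total_limit, cpu_limit, gpu_limit, per_request);
    }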
- scheduler config: rename <cuda_multiplier> to <gpu_multiplier>
- scheduler: <max_wus_to_send> is now scaled by
(NCPUS + gpu_multiplier*NGPUS)
- scheduler: don't keep scanning array if !work_needed()
- scheduler: moved array-scan logic from sched_send.cpp to sched_array.cpp
- scheduler: don't say "no work available" if jobs are available
but work_needed() is initially false
svn path=/trunk/boinc/; revision=18255
- web: check whether to show a profile in a separate function
from the one that displays it; eliminate double headers
- scheduler: finish purge of redundant arguments
svn path=/trunk/boinc/; revision=16726
- scheduler: base job runtime estimates on the FLOPS of the selected APP_VERSION,
rather than on the CPU benchmarks.
Otherwise estimates are wrong for GPU or multi-thread apps.
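A sketch of the estimate change; the field names mirror BOINC's (rsc_fpops_est, flops, p_fpops) but the code is illustrative, not the scheduler's:

    #include <cstdio>

    struct Workunit   { double rsc_fpops_est; };   // estimated FP ops for the job
    struct AppVersion { double flops; };           // effective speed of the selected version
    struct Host       { double p_fpops; };         // CPU benchmark result

    int main() {
        Workunit wu{1e14};
        Host host{3e9};                  // 3 GFLOPS per the CPU benchmark
        AppVersion gpu_version{1.5e11};  // e.g. a GPU version far faster than the benchmark

        double old_estimate = wu.rsc_fpops_est / host.p_fpops;       // wrong for GPU/multi-thread apps
        double new_estimate = wu.rsc_fpops_est / gpu_version.flops;  // based on the selected APP_VERSION

        printf("old estimate %.0f s, new estimate %.0f s\n", old_estimate, new_estimate);
    }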
- scheduler: start switching to having SCHED_REQUEST and
SCHED_REPLY as globals instead of passing them around as args;
to be continued.
svn path=/trunk/boinc/; revision=16691