where the client tells the scheduler which app versions
its queued jobs use
(this is needed, e.g., to enforce per-app or per-resource job limits).
In this mechanism, the client sends an array of <app_version>s,
and each <other_result> includes an index into this array.
- The wrong index was being sent (client).
- If an <app_version> had a non-existent app name
(e.g. because that app had been deprecated)
it wasn't getting put in the array, invalidating array indices
Furthermore, an erroneous message was being sent to the user
Fix: if parse error for <app_version>,
put it in the array anyway, but with cav.app = NULL,
meaning that it's a place-holder.
Send a message to user only if anon platform.
- manager: increase notice buffers to 64K
svn path=/trunk/boinc/; revision=22052
only if the previous request was on a different day
AND the current request asks for work.
Sometimes it wasn't getting cleared when it should have.
svn path=/trunk/boinc/; revision=21824
- daily quota mechanism
- reliable mechanism (accelerated retries)
- "trusted" mechanism (adaptive replication)
- scheduler: enforce host scale probation only for apps with
host_scale_check set.
- validator: do scale probation on invalid results
(need this in addition to error and timeout cases)
- feeder: update app version scales every 10 min, not 10 sec
- back-end apps: support --foo as well as -foo for options
Notes:
- If you have, say, cuda, cuda23 and cuda_fermi plan classes,
a host will have separate quotas for each one.
That means it could error out on 100 jobs for cuda_fermi,
and when its quota goes to zero,
error out on 100 jobs for cuda23, etc.
This is intentional; there may be cases where one version
works but not the others.
- host.error_rate and host.max_results_day are deprecated
TODO:
- the values in the app table for limits on jobs in progress etc.
should override rather than config.xml.
Implementation notes:
scheduler:
process_request():
read all host_app_versions for host at start;
Compute "reliable" and "trusted" for each one.
write modified records at end
get_app_version():
add "reliable_only" arg; if set, use only reliable versions
skip over-quota versions
Multi-pass scheduling: if have at least one reliable version,
do a pass for jobs that need reliable,
and use only reliable versions.
Then clear best_app_versions cache.
Score-based scheduling: for need-reliable jobs,
it will pick the fastest version,
then give a score bonus if that version happens to be reliable.
When get back a successful result from client:
increase daily quota
When get back an error result from client:
impose scale probation
decrease daily quota if not aborted
Validator:
when handling a WU, create a vector of HOST_APP_VERSION
parallel to vector of RESULT.
Pass it to assign_credit_set().
Make copies of originals so we can update only modified ones
update HOST_APP_VERSION error rates
Transitioner:
decrease quota on timeout
svn path=/trunk/boinc/; revision=21181
Old:
1) check deadline based on wu.delay_bound
2) in add_result_to_reply(), potentially modify wu.delay_bound,
e.g. because of retry acceleration
problem: reducing delay bound may cause deadline miss
New:
1) new function get_delay_bound_range()
(called from wu_is_infeasible_fast())
returns optimistic and pessimistic delay bounds.
Retry acceleration logic is here.
2) check deadline based on optimistic bound;
if that fails, check based on pessimistic bound.
Set wu.delay_bound to the one that worked.
Notes:
- get_delay_bound_range() needs result priority and report deadline,
and it's called before we read the full result.
So add these items to WORK_ITEM and WU_RESULT.
- get_delay_bound_range() could be customized for
project-specific deadline policy.
- add_result_to_reply() was becoming a toxic waste dump.
Deadline-related stuff should have been factored out in any case.
svn path=/trunk/boinc/; revision=18946
The limit on jobs in progress is now
max_wus_in_progress * NCPUS
+ max_wus_in_progress * NGPUS
where NCPUS and NGPUS reflect prefs and are capped.
Furthermore: if the client reports plan class for in-progress jobs
(see checkin of 31 May 2009)
then these limits are enforced separately;
i.e. the # of in-progress CPU jobs is <= max_wus_in_progress*NCPUS,
and the # of in-progress GPU jobs is <= max_wus_in_progress_gpu*NGPUS
- scheduler config: rename <cuda_multiplier> to <gpu_multiplier>
- scheduler: <max_wus_to_send> is now scaled by
(NCPUS + gpu_multiplier*NGPUS)
- scheduler: don't keep scanning array if !work_needed()
- scheduler: moved array-scan logic from sched_send.cpp to sched_array.cpp
- scheduler: don't say "no work available" if jobs are available
but work_needed() is initially false
svn path=/trunk/boinc/; revision=18255
- web: check whether to show profile in separate function
from displaying profile; eliminate double headers
- scheduler: finish purge of redundant arguments
svn path=/trunk/boinc/; revision=16726
for the selected APP_VERSION, rather than on the CPU benchmarks.
Otherwise estimates are wrong for GPU or multi-thread apps.
- scheduler: start switching from having SCHED_REQUEST and
SCHED_REPLY as globals instead of passing them around as args;
to be continued.
svn path=/trunk/boinc/; revision=16691
If set the "effective NCPUS" (which is used to scale
daily_result_quota and max_wus_in_progress)
is max'd with the # of CUDA GPUs.
svn path=/trunk/boinc/; revision=16246
- scheduler: fix bug in adaptive replication:
if send an unreplicated job to untrusted host,
set both wu.target_nresults and wu.min_quorum to app.target_nresults.
svn path=/trunk/boinc/; revision=15762
before looking for CUDA library
- scheduler: some additional work on matchmaker scheduling
Changed check_app_filter() so that it doesn't depend on
the current multi-phase approach;
move that logic to scan_array()
svn path=/trunk/boinc/; revision=15109
- update_versions: use __ (not :) as separator for plan class
- client: add plan_class to APP_VERSION;
an app version is now identified by platform/version/plan_class
- client CPU scheduler: don't assume apps use 1 CPU
- client: add avg_ncpus, max_cpus, flops, cmdline to RESULT
- scheduler: implement app planning scheme
Other changes:
- client: if symlink() fails, make a XML soft link instead
(for Unix running off a FAT32 FS)
- client: don't accept nonpositive resource share from AMS
- daemons and DB: check for error returns from enumerations,
and exit if so. Thus, if the MySQL server goes down,
all the daemons will soon exit.
The cron script will restart them every 5 min,
so when the DB server comes back up so will the project.
- web: show empty max CPU % as ---
- API: get rid of all_threads_cpu_time option (always the case now)
svn path=/trunk/boinc/; revision=14966
- DB code: remove "is_high_priority" stuff.
- scheduler: merge find_app_version() into get_app_version().
Have the latter memoize its results (both positive and negative).
Have it call app_plan() for apps with nonempty plan_class.
- scheduler: first steps towards improved selectability of log messages.
It will eventually be like the client,
where you can select among various types of messages.
- feeder: if can't unlink the reread_db trigger file, exit
(else we'd go into an infinite loop)
svn path=/trunk/boinc/; revision=14940
and apps that use coprocessors.
There now can be several app_versions for the same
(app, platform, version_num) combination.
This changes a number of things.
- Added app_version.plan_class field to DB
- update_versions now looks for a :plan-class in the
file or directory name, and puts it in the app_version's DB record
- Change uniqueness constraint to include plan_class
- Feeder: the feeder was putting non-deprecated app_versions
in shared mem, and leaving it to the scheduler to
find the latest version for a given platform.
This is dumb.
Instead, for each app/platform pair the feeder now
finds the highest version number of a non-deprecated app version,
and enumerates all non-deprecated app_versions with that
app/platform/version
- Scheduler: add a BEST_APP_VERSION data structure that keeps track,
for each app, what the best app_version is for this host.
This saves the work of recomputing it for each job.
svn path=/trunk/boinc/; revision=14906
The design has been changed to constant #threads per app version
Various changes from Kevin Reed/WCG:
- server: add workunit.rsc_bandwidth_bound: if nonzero,
send this WU only to hosts with that much download bandwidth
- assimilators: if a handler returns DEFER_ASSIMILATION,
the WU remains in INIT state and will be handled when the
next instance completes.
Useful if you want the assimilator to see all instances.
- scheduler: when setting result.outcome = DETACHED,
set received_time to now
- scheduler: removed the reliable_time and reliable_min_avg_credit
options
- scheduler/web: add optional <allow_non_preferred_projects>
in project preferences.
If present, user will accept work from non-selected apps
if no work is available for selected apps
- scheduler: improved messages for projects with multiple apps
- scheduler: added config options
<granted_credit_weight> and <granted_credit_ramp_up>.
Used in calculating host.claimed_credit_per_cpu_sec,
but I'm not sure how.
- Added two new credit-granting formulas (validate_util.C):
stddev_credit() and two_credit()
- server DB: add rollback_transaction() and affected_rows() to DB_CONN
NOTE: DB update required
svn path=/trunk/boinc/; revision=14870
Lets you assign a WU to a particular host,
to one or all hosts belonging to a user or team, or to all hosts.
See http://boinc.berkeley.edu/trac/wiki/AssignedWork
Disabled unless you include <enable_assignment> in config.xml
Uses a new DB table.
Tested but only a little.
- Server: code cleanup; moved result-handling to a new file,
and removed the PLATFORM_LIST arg to everything
(put it in SCHEDULER_REQUEST instead)
svn path=/trunk/boinc/; revision=14767
in wu_is_infeasible(), check whether host type is unknown
before seeing if WU is already committed to different type
svn path=/trunk/boinc/; revision=13777
In principle, a project can now use both
locality scheduling and homogeneous redundancy.
- scheduler: do HR check before deadline check,
since the latter is slower.
- scheduler: wu_is_infeasible() doesn't return a bitmap.
Change its return values to sequential numbers.
- scheduler: ignore <accelerator> and <p_capabilities> tags
sched/
sched_send.C,h
sched_array.C
sched_locality.C
server_types.C
svn path=/trunk/boinc/; revision=12791
- core client: fix bug in set_debt() GUI RPC
- scheduler: some of the "quick checks" in scan_work_array()
are applicable to locality scheduling also,
so they should be moved to wu_is_infeasible().
I did this for one: the check for one result
per user (or host) per WU. Should do for others.
client/
gui_rpc_server_ops.C
html/
host_edit_action.php
host_edit_form.php
sched/
sched_array.C
sched_send.C,h
svn path=/trunk/boinc/; revision=12784
If set, the scheduler will use EDF simulation,
together with the in-progress workload reported by the client,
to avoid sending results that
1) will miss their deadline, or
2) will cause an in-progress result to miss its deadline, or
3) will make an in-progress result miss its deadline
by more than is already predicted.
If this option is not set, or if the client request doesn't
include a workload description (i.e. the client is old)
use the existing approach, which assumes there's no workload.
NOTE: this is experimental. Production projects should not use it.
- EDF sim: write debug stuff to stderr instead of stdout
- Account manager:
- if an account is detach_when_done, set dont_request_more_work
- check done_request_more_work even for first-time projects
- update_uotd: generate a file for use by Google gadget
- user_links(): use full URLs (so can use in Google gadget)
client/
acct_mgr.C
work_fetch.C
html/
inc/
uotd.inc
util.inc
user/
uotd_gadget.php (new)
sched/
Makefile.am
edf_sim.C
sched_config.C,h
sched_resend.C
sched_send.C,h
server_types.C,h
svn path=/trunk/boinc/; revision=12639