This now supports two main use cases:
1) There's a job that you want to run once on all hosts,
present and future
(or all hosts belonging to a user, or to a team).
The job is never transitioned, validated, or assimilated.
2) There's a normal job for which you want to use only
hosts belonging to a specific user (e.g. cluster or cloud hosts).
This restriction can be made either when the job is created,
or on the fly,
e.g. as part of a scheme for accelerating batch completion.
For the latter purposes we now provide a function
restrict_wu_to_user(DB_WORKUNIT&, int userid);
The job goes through the standard
transitioner/validator/assimilator path.
These cases are enabled by config flags
<enable_assignment_multi/>
<enable_assignment/>
respectively.
Assignments of type 2) are no longer stored in shared memory,
so there is no limit on their number.
There is no longer a rule that assigned job names must contain "asgn".
NOTE: this requires a database update.
svn path=/trunk/boinc/; revision=25169
Some credit cheats (e.g. with credit_by_runtime) can be done
by reporting a huge p_fpops value.
Fix this by capping the value at 1.1 times the 95th percentile
of host.p_fpops, taken over active hosts.
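The capping rule can be sketched as follows. This is only an illustration, not the committed code: percentile() and capped_p_fpops() are hypothetical names, the nearest-rank percentile method is an assumption, and the list of active hosts' p_fpops values is assumed to be already loaded.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: value at the given percentile (0..100) of a
// non-empty sample, using the nearest-rank method (an assumption).
double percentile(std::vector<double> values, double pct) {
    assert(!values.empty());
    std::sort(values.begin(), values.end());
    // Nearest rank: smallest 1-based rank r with r/n >= pct/100.
    double exact = pct * static_cast<double>(values.size()) / 100.0;
    std::size_t rank = static_cast<std::size_t>(exact);
    if (static_cast<double>(rank) < exact) ++rank;  // round up
    if (rank == 0) rank = 1;
    return values[rank - 1];
}

// Cap a reported p_fpops at 1.1 times the 95th percentile
// taken over the active hosts' values.
double capped_p_fpops(double reported,
                      const std::vector<double>& active_host_fpops) {
    double cap = 1.1 * percentile(active_host_fpops, 95.0);
    return std::min(reported, cap);
}
```

A host reporting a value below the cap is unaffected; only outliers are clamped.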
svn path=/trunk/boinc/; revision=25017
Tells multicore apps how many cores to use.
The --nthreads command line arg to the app is now deprecated
though we'll keep it around for the time being.
svn path=/trunk/boinc/; revision=24708
- daily quota mechanism
- reliable mechanism (accelerated retries)
- "trusted" mechanism (adaptive replication)
- scheduler: enforce host scale probation only for apps with
host_scale_check set.
- validator: do scale probation on invalid results
(need this in addition to error and timeout cases)
- feeder: update app version scales every 10 min, not 10 sec
- back-end apps: support --foo as well as -foo for options
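Accepting both -foo and --foo can be sketched with a small comparison helper. The name is_option() is hypothetical and is not claimed to match the helper used in the back-end programs.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: true if `arg` is "-name" or "--name".
bool is_option(const char* arg, const char* name) {
    assert(arg != nullptr && name != nullptr);
    if (arg[0] != '-') return false;  // options must start with a dash
    const char* p = arg + 1;
    if (*p == '-') ++p;               // accept one optional second dash
    return std::strcmp(p, name) == 0;
}
```

In an argv loop this would be used as `if (is_option(argv[i], "foo")) { ... }`.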
Notes:
- If you have, say, cuda, cuda23 and cuda_fermi plan classes,
a host will have separate quotas for each one.
That means it could error out on 100 jobs for cuda_fermi,
and when its quota goes to zero,
error out on 100 jobs for cuda23, etc.
This is intentional; there may be cases where one version
works but not the others.
- host.error_rate and host.max_results_day are deprecated
TODO:
- the values in the app table for limits on jobs in progress etc.
should override those in config.xml.
Implementation notes:
scheduler:
process_request():
read all host_app_versions for host at start;
Compute "reliable" and "trusted" for each one.
write modified records at end
get_app_version():
add "reliable_only" arg; if set, use only reliable versions
skip over-quota versions
Multi-pass scheduling: if the host has at least one reliable version,
do a pass for jobs that need reliable,
and use only reliable versions.
Then clear best_app_versions cache.
Score-based scheduling: for need-reliable jobs,
it will pick the fastest version,
then give a score bonus if that version happens to be reliable.
When we get back a successful result from client:
increase daily quota
When we get back an error result from client:
impose scale probation
decrease daily quota if not aborted
Validator:
when handling a WU, create a vector of HOST_APP_VERSION
parallel to vector of RESULT.
Pass it to assign_credit_set().
Make copies of originals so we can update only modified ones
update HOST_APP_VERSION error rates
Transitioner:
decrease quota on timeout
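The per-version quota bookkeeping described above might look roughly like this. It is a sketch under assumptions: the struct, the field name daily_quota, and the table keyed by app version ID are hypothetical, and the adjustment amounts (double on success, subtract one on error, floor of 1) are assumed, since the notes only say "increase" and "decrease".

```cpp
#include <algorithm>
#include <cassert>
#include <map>

// Hypothetical stand-in for one host_app_version record.
struct HostAppVersionQuota {
    int daily_quota;
};

// One record per app version ID, so each plan class
// (cuda, cuda23, cuda_fermi, ...) has its own quota.
using QuotaTable = std::map<int, HostAppVersionQuota>;

// Assumed policy: double on success, up to the project limit.
void on_success(HostAppVersionQuota& q, int project_limit) {
    assert(project_limit >= 1);
    q.daily_quota = std::min(q.daily_quota * 2, project_limit);
}

// Assumed policy: subtract one on error, never below 1.
// Per the notes, aborted jobs do not reduce the quota.
void on_error(HostAppVersionQuota& q, bool aborted) {
    if (aborted) return;
    q.daily_quota = std::max(q.daily_quota - 1, 1);
}
```

Because the table is keyed by app version, errors on one plan class leave the quotas of the others untouched, which matches the behavior described in the notes.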
svn path=/trunk/boinc/; revision=21181
we're the main program (otherwise we didn't lock it in
the first place, and a crash results). From Artyom Sharov.
- scheduler: add support for the GCL simulator,
which uses special versions of backend programs
that use virtual time,
and that wait for signals instead of sleep()ing.
To compile:
make clean
configure CXXFLAGS="-DGCL_SIMULATOR"
make
svn path=/trunk/boinc/; revision=16036
- scheduler: fix bug in adaptive replication:
if we send an unreplicated job to an untrusted host,
set both wu.target_nresults and wu.min_quorum to app.target_nresults.
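In code, the fix amounts to something like the following. The App and Workunit structs here are minimal hypothetical stand-ins carrying only the fields named above, and the function name is invented for illustration.

```cpp
#include <cassert>

// Minimal stand-ins with only the fields mentioned in the fix.
struct App {
    int target_nresults;
};
struct Workunit {
    int target_nresults;
    int min_quorum;
};

// An unreplicated job is going to an untrusted host:
// fall back to the app's normal replication level.
void replicate_for_untrusted_host(Workunit& wu, const App& app) {
    assert(app.target_nresults >= 1);
    wu.target_nresults = app.target_nresults;
    wu.min_quorum = app.target_nresults;
}
```

Setting both fields together means the job is both sent to enough hosts and not validated until that many results agree.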
svn path=/trunk/boinc/; revision=15762
wish to use it.
- The script calculate_credit_multiplier (expected to be run daily as
a config.xml task) looks at the ratio of granted credit to CPU time
for recent results for each app. The multiplier is calculated to cause
the median host's granted credit per CPU second to equal that
expected from its benchmarks. This is exponentially averaged over 30 days
with the previous value of the multiplier and stored in the table
credit_multiplier.
- When a result is received the server adjusts claimed credit by the
value the multiplier had when the result was sent.
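A rough sketch of the two steps (averaging the multiplier, then applying the value in effect at send time) is given below. All names are hypothetical, the weighting formula is one common form of exponential averaging rather than the script's actual formula, and the default of 1.0 when no entry precedes the send time is an assumption.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical history row: multiplier value and when it took effect.
struct MultiplierEntry {
    double time;
    double multiplier;
};

// Exponentially average a new sample with the previous value.
// dt_days is the time since the previous update (assumed weighting).
double exp_average(double previous, double sample,
                   double dt_days, double window_days = 30.0) {
    assert(window_days > 0 && dt_days >= 0);
    double w = 1.0 - std::exp(-dt_days / window_days);
    return previous + w * (sample - previous);
}

// Multiplier in effect at sent_time; history is sorted by time ascending.
double multiplier_at(const std::vector<MultiplierEntry>& history,
                     double sent_time) {
    double m = 1.0;  // assumed default if no entry precedes sent_time
    for (const MultiplierEntry& e : history) {
        if (e.time > sent_time) break;
        m = e.multiplier;
    }
    return m;
}

// Adjust claimed credit by the multiplier that applied when the result was sent.
double adjusted_claimed_credit(double claimed,
                               const std::vector<MultiplierEntry>& history,
                               double sent_time) {
    return claimed * multiplier_at(history, sent_time);
}
```

Using the send-time value rather than the current one keeps a result's credit independent of how long the host took to report it.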
svn path=/trunk/boinc/; revision=15661
for WUs of different applications
(need to count unsent results separately by app)
- feeder: major code cleanup
- application interleaving (for -allapps) is now done
by building a static slot-to-app array "app_indices".
Fractional weights now work correctly.
- enum sizes (for -allapps) are now precomputed
in an array "enum_sizes"
- rename "found" (confusing!!) to "collision"
- swapped the names of mod_select_clause and select_clause,
to reflect what they actually are
- file deleter: in finding oldest WU, order by id instead of create_time
(there's no index on create_time)
- user web: show "merge by name" only to host owner
- add cpu_scheduler_period() member to GLOBAL_PREFS
(so you don't have to multiply by 60 everywhere)
- infinite() fix for HPUX
client/
cpu_sched.C
cs_cmdline.C
cs_scheduler.C
rrsim_test.C
sim.C
work_fetch.C
html/user/
hosts_user.php
lib/
parse.h
prefs.h
sched/
feeder.C
file_deleter.C
make_work.C
sample_work_generator.C
sched_util.C,h
tools/
updater.C
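The idea of a static slot-to-app array that honors fractional weights can be sketched as below. This is not the feeder's code: build_app_indices() is a hypothetical name and the credit-based assignment is just one simple way to spread slots in proportion to weights.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assign each of nslots slots to an app index so that app i receives
// roughly weights[i] / sum(weights) of the slots, interleaved.
std::vector<int> build_app_indices(const std::vector<double>& weights,
                                   int nslots) {
    assert(!weights.empty() && nslots >= 0);
    double total = 0;
    for (double w : weights) {
        assert(w >= 0);
        total += w;
    }
    assert(total > 0);

    std::vector<double> credit(weights.size(), 0.0);
    std::vector<int> app_indices;
    app_indices.reserve(static_cast<std::size_t>(nslots));
    for (int slot = 0; slot < nslots; ++slot) {
        // Every app earns its fractional share for this slot;
        // the app with the most accumulated credit takes the slot.
        std::size_t best = 0;
        for (std::size_t i = 0; i < weights.size(); ++i) {
            credit[i] += weights[i] / total;
            if (credit[i] > credit[best]) best = i;
        }
        credit[best] -= 1.0;
        app_indices.push_back(static_cast<int>(best));
    }
    return app_indices;
}
```

With weights {3, 1} and 8 slots, app 0 gets 6 slots and app 1 gets 2. The array is built once, and each slot is afterwards filled only from its assigned app.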
svn path=/trunk/boinc/; revision=12968
name of elapsed_time() to elapsed_wallclock_time().
- Backend logging statements on exit() which echo elapsed run time to logs now
do this with much higher printed precision.
- Backend logging: separate scheduler requests with an almost blank line
svn path=/trunk/boinc/; revision=8027
- Address David's comment of Feb 2. Now properly reduce the
disk size resource requirements of a WU being sent if the
file is already on the host, or already included in a previous
WU being sent. DAVID: please check that reply_copy.wus.pop_back()
is right.
- For this, define a function host_has_file(). This can also
be used in the future for more intelligent file deletion
schemes.
- Make warnings to upgrade old clients have low priority until
3 days before the deadline, then high priority.
- Fix sign error in messages sent to users about insufficient
disk space.
- Move extract_filename() from sched_locality.C to sched_util.C
- Pretty up the ordered list of URLs printed for a given host.
- I've even tested these changes before committing them!
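The disk accounting in the first two items can be illustrated as follows. The signature of host_has_file() shown here is invented for the sketch (the real function takes scheduler request data), and FileInfo and extra_disk_needed() are hypothetical.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical description of one input file of a WU.
struct FileInfo {
    std::string name;
    double nbytes;
};

// Assumed signature: true if the host already has the named file.
bool host_has_file(const std::set<std::string>& files_on_host,
                   const std::string& name) {
    return files_on_host.count(name) > 0;
}

// Disk space a WU needs beyond files already on the host or already
// included in a WU sent earlier in this reply. Files counted here are
// added to already_sending so later WUs do not count them again.
double extra_disk_needed(const std::vector<FileInfo>& wu_files,
                         const std::set<std::string>& files_on_host,
                         std::set<std::string>& already_sending) {
    double total = 0;
    for (const FileInfo& f : wu_files) {
        assert(f.nbytes >= 0);
        if (host_has_file(files_on_host, f.name)) continue;
        if (!already_sending.insert(f.name).second) continue;
        total += f.nbytes;
    }
    return total;
}
```

Calling extra_disk_needed() once per WU, in send order, with the same already_sending set gives each WU credit for files shared with the WUs before it.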
svn path=/trunk/boinc/; revision=5382