boinc/sched
David Anderson b2451544e1 - server: change the following from per-host to per-(host, app version):
    - daily quota mechanism
    - reliable mechanism (accelerated retries)
    - "trusted" mechanism (adaptive replication)
- scheduler: enforce host scale probation only for apps with
    host_scale_check set.
- validator: do scale probation on invalid results
    (need this in addition to error and timeout cases)
- feeder: update app version scales every 10 min, not 10 sec
- back-end apps: support --foo as well as -foo for options
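
A minimal sketch of accepting both option spellings, assuming a hypothetical
helper named matches_option() (the name and signature are illustrative and
are not taken from the BOINC sources):

```cpp
#include <cstring>

// Hypothetical helper, not BOINC's API: returns true if the command-line
// token `arg` names option `name`, written either as -name or --name.
bool matches_option(const char* arg, const char* name) {
    if (arg[0] != '-') return false;   // options must start with a dash
    const char* p = arg + 1;
    if (*p == '-') ++p;                // accept one optional second dash
    return std::strcmp(p, name) == 0;  // the rest must match exactly
}
```

With this helper, a back-end program could test each argv entry once
instead of comparing against both "-foo" and "--foo".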

Notes:
- If you have, say, cuda, cuda23 and cuda_fermi plan classes,
    a host will have separate quotas for each one.
    That means it could error out on 100 jobs for cuda_fermi,
    and when its quota goes to zero,
    error out on 100 jobs for cuda23, etc.
    This is intentional; there may be cases where one version
    works but not the others.
- host.error_rate and host.max_results_day are deprecated
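
The per-plan-class behaviour described in the notes above can be sketched
as a quota table keyed by (host, app version). All type and field names
here are assumptions made for illustration, and the "one error lowers the
quota by one" rule is chosen only to mirror the 100-jobs example; it is not
claimed to be the actual BOINC rule:

```cpp
#include <map>
#include <utility>

// Illustrative only: names are assumptions, not BOINC's schema.
struct VersionQuota {
    int max_jobs_per_day = 100;  // current daily quota for this pair
    int n_jobs_today = 0;        // jobs already sent today
};

using HostVersionKey = std::pair<int, int>;  // (host ID, app version ID)

struct QuotaTable {
    std::map<HostVersionKey, VersionQuota> quotas;

    // A version is usable while it is still under its own quota.
    bool can_send(int host_id, int app_version_id) {
        VersionQuota& q = quotas[{host_id, app_version_id}];
        return q.n_jobs_today < q.max_jobs_per_day;
    }

    // An error shrinks the quota of the failing version only;
    // other versions on the same host are untouched.
    void on_error(int host_id, int app_version_id) {
        VersionQuota& q = quotas[{host_id, app_version_id}];
        if (q.max_jobs_per_day > 0) --q.max_jobs_per_day;
    }
};
```

Because each key carries its own quota, a host that exhausts its quota for
one plan class can still be sent jobs for another.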

TODO:
    - the values in the app table for limits on jobs in progress etc.
        should override those in config.xml.

Implementation notes:
scheduler:
    process_request():
        read all host_app_versions for the host at start;
        compute "reliable" and "trusted" for each one;
        write modified records at end
    get_app_version():
        add "reliable_only" arg; if set, use only reliable versions
        skip over-quota versions
    Multi-pass scheduling: if the host has at least one reliable version,
        do a pass for jobs that need reliable,
        and use only reliable versions.
        Then clear best_app_versions cache.
    Score-based scheduling: for need-reliable jobs,
        it will pick the fastest version,
        then give a score bonus if that version happens to be reliable.
    When we get back a successful result from the client:
        increase daily quota
    When we get back an error result from the client:
        impose scale probation
        decrease daily quota if not aborted
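
The version selection described for get_app_version() might look roughly
like the sketch below. Candidate, its fields, and pick_version() are
hypothetical names introduced here; the real function has a different
signature and considers more factors:

```cpp
#include <vector>

// Hypothetical summary of one app version as seen by a given host.
struct Candidate {
    int app_version_id;
    double flops;     // projected speed; higher means faster
    bool reliable;    // computed per (host, app version)
    bool over_quota;  // daily quota for this pair is used up
};

// Returns the fastest usable version, or nullptr if none qualifies.
// With reliable_only set, unreliable versions are ignored.
const Candidate* pick_version(const std::vector<Candidate>& versions,
                              bool reliable_only) {
    const Candidate* best = nullptr;
    for (const Candidate& v : versions) {
        if (v.over_quota) continue;                 // skip over-quota versions
        if (reliable_only && !v.reliable) continue; // reliable pass only
        if (!best || v.flops > best->flops) best = &v;
    }
    return best;
}
```

In a multi-pass scheme, the first pass would call this with reliable_only
set for jobs that need a reliable host, and later passes would call it
without the flag.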
Validator:
    when handling a WU, create a vector of HOST_APP_VERSION
        parallel to vector of RESULT.
        Pass it to assign_credit_set().
        Make copies of originals so we can update only modified ones
    update HOST_APP_VERSION error rates
Transitioner:
    decrease quota on timeout
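
The validator's "make copies of originals so we can update only modified
ones" step can be sketched as a comparison between the saved copies and the
working records. HostAppVersionRec and modified_indices() are hypothetical
stand-ins for illustration, not the actual HOST_APP_VERSION type or any
BOINC function:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a HOST_APP_VERSION record.
struct HostAppVersionRec {
    int host_id;
    int app_version_id;
    double error_rate;  // assumed field, for illustration only

    bool operator==(const HostAppVersionRec& o) const {
        return host_id == o.host_id &&
               app_version_id == o.app_version_id &&
               error_rate == o.error_rate;
    }
};

// Returns the indices of records that differ from their saved copies,
// so that only those records need to be written back to the database.
std::vector<std::size_t> modified_indices(
        const std::vector<HostAppVersionRec>& originals,
        const std::vector<HostAppVersionRec>& current) {
    std::vector<std::size_t> changed;
    const std::size_t n = std::min(originals.size(), current.size());
    for (std::size_t i = 0; i < n; ++i) {
        if (!(originals[i] == current[i])) changed.push_back(i);
    }
    return changed;
}
```

Keeping the record vector parallel to the result vector means index i in
the returned list refers to the same host and app version as result i.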


svn path=/trunk/boinc/; revision=21181
2010-04-15 03:13:56 +00:00
Makefile.am - server: various changes; 2010-03-29 22:28:20 +00:00
assimilate_handler.h - client: error if a <file_info> in app_info.xml has any URLs 2009-07-09 20:18:56 +00:00
assimilator.cpp - server: change the following from per-host to per-(host, app version): 2010-04-15 03:13:56 +00:00
assimilator.py - python assimilator fix 2009-06-16 19:38:35 +00:00
census.cpp - server: change the following from per-host to per-(host, app version): 2010-04-15 03:13:56 +00:00
credit.cpp - server: change the following from per-host to per-(host, app version): 2010-04-15 03:13:56 +00:00
credit.h - server: change the following from per-host to per-(host, app version): 2010-04-15 03:13:56 +00:00
credit_test.cpp - server: various changes; 2010-03-29 22:28:20 +00:00
db_dump.cpp - server: change the following from per-host to per-(host, app version): 2010-04-15 03:13:56 +00:00
db_dump_spec.xml *** empty log message *** 2006-03-06 21:40:07 +00:00
db_purge.cpp - server: change the following from per-host to per-(host, app version): 2010-04-15 03:13:56 +00:00
delete_file.cpp - server: change the following from per-host to per-(host, app version): 2010-04-15 03:13:56 +00:00
edf_sim.cpp - scheduler: improve message formatting; add <debug_locality> flag 2009-01-15 20:23:20 +00:00
edf_sim.h - changed some comments for Doxygen 2008-10-04 23:44:24 +00:00
feeder.cpp - server: change the following from per-host to per-(host, app version): 2010-04-15 03:13:56 +00:00
file_deleter.cpp - server: change the following from per-host to per-(host, app version): 2010-04-15 03:13:56 +00:00
file_upload_handler.cpp - server: change the following from per-host to per-(host, app version): 2010-04-15 03:13:56 +00:00
get_file.cpp - server: change the following from per-host to per-(host, app version): 2010-04-15 03:13:56 +00:00
handle_request.cpp - server: change the following from per-host to per-(host, app version): 2010-04-15 03:13:56 +00:00
handle_request.h - code shuffling 2009-08-10 04:56:46 +00:00
hr.cpp - validator: -d 4 means -d 3 plus print all DB queries 2010-04-05 18:59:16 +00:00
hr.h - validator: -d 4 means -d 3 plus print all DB queries 2010-04-05 18:59:16 +00:00
hr_info.cpp - validator: -d 4 means -d 3 plus print all DB queries 2010-04-05 18:59:16 +00:00
hr_info.h - added copyright and license info to .C, .cpp, .h files 2008-08-06 18:36:30 +00:00
make_work.cpp - server: change the following from per-host to per-(host, app version): 2010-04-15 03:13:56 +00:00
message_handler.cpp - server: change the following from per-host to per-(host, app version): 2010-04-15 03:13:56 +00:00
pymw_assimilator.py - PyMW assimilator fixes from Jeremy 2009-07-01 23:58:04 +00:00
request_file_list.cpp - server: change the following from per-host to per-(host, app version): 2010-04-15 03:13:56 +00:00
sample_assimilator.cpp - client: don't print error message if output file with <copy_file> 2009-09-17 21:06:11 +00:00
sample_bitwise_validator.cpp - sample bitwise validator: make it work for binary files 2009-05-01 18:25:17 +00:00
sample_db_dump_spec.xml *** empty log message *** 2004-06-21 05:03:56 +00:00
sample_dummy_assimilator.cpp - rename .C files to .cpp so that Doxygen will work 2008-09-26 18:20:24 +00:00
sample_hr_info.txt - added sample host-distribution file for HR 2007-07-06 18:19:10 +00:00
sample_trivial_validator.cpp - rename .C files to .cpp so that Doxygen will work 2008-09-26 18:20:24 +00:00
sample_work_generator.cpp - server: change the following from per-host to per-(host, app version): 2010-04-15 03:13:56 +00:00
sched_array.cpp - server: change the following from per-host to per-(host, app version): 2010-04-15 03:13:56 +00:00
sched_array.h - scheduler: add new config option <max_wus_in_progress_gpus>. 2009-06-01 22:15:14 +00:00
sched_assign.cpp - server: change the following from per-host to per-(host, app version): 2010-04-15 03:13:56 +00:00
sched_assign.h - API: remove BOINC_OPTIONS::worker_thread_stack_size 2008-12-19 18:14:02 +00:00
sched_config.cpp - server: various changes; 2010-03-29 22:28:20 +00:00
sched_config.h - server: various changes; 2010-03-29 22:28:20 +00:00
sched_customize.cpp - server: change the following from per-host to per-(host, app version): 2010-04-15 03:13:56 +00:00
sched_customize.h - client/scheduler: deal with situation where GPU has enough 2009-12-11 22:45:59 +00:00
sched_driver.cpp - server: change the following from per-host to per-(host, app version): 2010-04-15 03:13:56 +00:00
sched_hr.cpp - validator: -d 4 means -d 3 plus print all DB queries 2010-04-05 18:59:16 +00:00
sched_hr.h - scheduler: move app-version selection and score-based scheduling 2009-03-19 16:35:35 +00:00
sched_locality.cpp - server: change the following from per-host to per-(host, app version): 2010-04-15 03:13:56 +00:00
sched_locality.h - API: remove BOINC_OPTIONS::worker_thread_stack_size 2008-12-19 18:14:02 +00:00
sched_main.cpp - server: make the -d 4 feature work with FCGI 2010-04-05 23:12:02 +00:00
sched_main.h - code shuffling 2009-08-10 04:56:46 +00:00
sched_msgs.cpp - scheduler: improve message formatting; add <debug_locality> flag 2009-01-15 20:23:20 +00:00
sched_msgs.h - STILL WORK TO BE DONE TO GET locale STUFF INSTALLED PROPERLY!!! 2009-01-13 23:06:02 +00:00
sched_resend.cpp - server: change the following from per-host to per-(host, app version): 2010-04-15 03:13:56 +00:00
sched_resend.h - API: remove BOINC_OPTIONS::worker_thread_stack_size 2008-12-19 18:14:02 +00:00
sched_result.cpp - server: change the following from per-host to per-(host, app version): 2010-04-15 03:13:56 +00:00
sched_result.h - API: remove BOINC_OPTIONS::worker_thread_stack_size 2008-12-19 18:14:02 +00:00
sched_score.cpp - scheduler: compute no_jobs_available correctly 2009-11-12 21:30:33 +00:00
sched_score.h svn path=/trunk/boinc/; revision=18765 2009-07-29 18:34:27 +00:00
sched_send.cpp - server: change the following from per-host to per-(host, app version): 2010-04-15 03:13:56 +00:00
sched_send.h - server: change the following from per-host to per-(host, app version): 2010-04-15 03:13:56 +00:00
sched_shmem.cpp - scheduler: sweeping changes to the way job runtimes are estimated: 2010-04-08 23:14:47 +00:00
sched_shmem.h - scheduler: sweeping changes to the way job runtimes are estimated: 2010-04-08 23:14:47 +00:00
sched_timezone.cpp svn path=/trunk/boinc/; revision=18825 2009-08-10 04:49:02 +00:00
sched_timezone.h - API: remove BOINC_OPTIONS::worker_thread_stack_size 2008-12-19 18:14:02 +00:00
sched_types.cpp - server: change the following from per-host to per-(host, app version): 2010-04-15 03:13:56 +00:00
sched_types.h - server: change the following from per-host to per-(host, app version): 2010-04-15 03:13:56 +00:00
sched_util.cpp - server: change the following from per-host to per-(host, app version): 2010-04-15 03:13:56 +00:00
sched_util.h - server: change the following from per-host to per-(host, app version): 2010-04-15 03:13:56 +00:00
sched_version.cpp - server: change the following from per-host to per-(host, app version): 2010-04-15 03:13:56 +00:00
sched_version.h - server: change the following from per-host to per-(host, app version): 2010-04-15 03:13:56 +00:00
send_file.cpp - server: change the following from per-host to per-(host, app version): 2010-04-15 03:13:56 +00:00
show_shmem.cpp - server: change the following from per-host to per-(host, app version): 2010-04-15 03:13:56 +00:00
single_job_assimilator.cpp - client, Mac: don't do res_init(). It causes a crash. 2009-05-07 13:54:51 +00:00
start - backend programs: change the way PFC and elapsed-time statistics 2010-04-02 19:10:37 +00:00
testasm.py - server: improve the Python assimilator framework; 2009-06-12 03:06:01 +00:00
time_stats_log.cpp - code shuffling 2009-08-10 04:56:46 +00:00
time_stats_log.h - code shuffling 2009-08-10 04:56:46 +00:00
transitioner.cpp - server: change the following from per-host to per-(host, app version): 2010-04-15 03:13:56 +00:00
trickle_handler.cpp - server: change the following from per-host to per-(host, app version): 2010-04-15 03:13:56 +00:00
update_stats.cpp - server: change the following from per-host to per-(host, app version): 2010-04-15 03:13:56 +00:00
validate_util.cpp - server code: moved everything related to credit-granting to credit.cpp, 2009-08-12 16:26:43 +00:00
validate_util.h - server code: moved everything related to credit-granting to credit.cpp, 2009-08-12 16:26:43 +00:00
validate_util2.cpp - server: various changes; 2010-03-29 22:28:20 +00:00
validate_util2.h - server: various changes; 2010-03-29 22:28:20 +00:00
validator.cpp - server: change the following from per-host to per-(host, app version): 2010-04-15 03:13:56 +00:00
validator.h - validator: -d 4 means -d 3 plus print all DB queries 2010-04-05 18:59:16 +00:00
wu_check.cpp - server: change the following from per-host to per-(host, app version): 2010-04-15 03:13:56 +00:00