Commit Graph

49 Commits

Author SHA1 Message Date
David Anderson 6b8a569d6d - client/scheduler: fix a group of bugs related to the new mechanism
where the client tells the scheduler which app versions
    its queued jobs use
    (this is needed, e.g., to enforce per-app or per-resource job limits).
    In this mechanism, the client sends an array of <app_version>s,
    and each <other_result> includes an index into this array.

    - The wrong index was being sent (client).
    - If an <app_version> had a non-existent app name
        (e.g. because that app had been deprecated)
        it wasn't getting put in the array, invalidating array indices
        Furthermore, an erroneous message was being sent to the user

        Fix: if parse error for <app_version>,
        put it in the array anyway, but with cav.app = NULL,
        meaning that it's a place-holder.
        Send a message to user only if anon platform.

- manager: increase notice buffers to 64K

svn path=/trunk/boinc/; revision=22052
2010-07-23 17:43:20 +00:00
David Anderson f7ce13cdd4 - scheduler: host_app_version.n_jobs_today was being cleared
only if the previous request was on a different day
    AND the current request asks for work.
    Sometimes it wasn't getting cleared when it should have.

svn path=/trunk/boinc/; revision=21824
2010-06-25 22:00:09 +00:00
David Anderson b2451544e1 - server: change the following from per-host to per-(host, app version):
- daily quota mechanism
    - reliable mechanism (accelerated retries)
    - "trusted" mechanism (adaptive replication)
- scheduler: enforce host scale probation only for apps with
    host_scale_check set.
- validator: do scale probation on invalid results
    (need this in addition to error and timeout cases)
- feeder: update app version scales every 10 min, not 10 sec
- back-end apps: support --foo as well as -foo for options

Notes:
- If you have, say, cuda, cuda23 and cuda_fermi plan classes,
    a host will have separate quotas for each one.
    That means it could error out on 100 jobs for cuda_fermi,
    and when its quota goes to zero,
    error out on 100 jobs for cuda23, etc.
    This is intentional; there may be cases where one version
    works but not the others.
- host.error_rate and host.max_results_day are deprecated

TODO:
    - the values in the app table for limits on jobs in progress etc.
        should override rather than config.xml.

Implementation notes:
scheduler:
    process_request():
        read all host_app_versions for host at start;
        Compute "reliable" and "trusted" for each one.
        write modified records at end
    get_app_version():
        add "reliable_only" arg; if set, use only reliable versions
        skip over-quota versions
    Multi-pass scheduling: if have at least one reliable version,
        do a pass for jobs that need reliable,
        and use only reliable versions.
        Then clear best_app_versions cache.
    Score-based scheduling: for need-reliable jobs,
        it will pick the fastest version,
        then give a score bonus if that version happens to be reliable.
    When get back a successful result from client:
        increase daily quota
    When get back an error result from client:
        impose scale probation
        decrease daily quota if not aborted
Validator:
    when handling a WU, create a vector of HOST_APP_VERSION
        parallel to vector of RESULT.
        Pass it to assign_credit_set().
        Make copies of originals so we can update only modified ones
    update HOST_APP_VERSION error rates
Transitioner:
    decrease quota on timeout


svn path=/trunk/boinc/; revision=21181
2010-04-15 03:13:56 +00:00
David Anderson 8b701fc73f - scheduler: fix messed-up deadline check logic.
Old:
        1) check deadline based on wu.delay_bound
        2) in add_result_to_reply(), potentially modify wu.delay_bound,
            e.g. because of retry acceleration
        problem: reducing delay bound may cause deadline miss
    New:
        1) new function get_delay_bound_range()
            (called from wu_is_infeasible_fast())
            returns optimistic and pessimistic delay bounds.
            Retry acceleration logic is here.
        2) check deadline based on optimistic bound;
            if that fails, check based on pessimistic bound.
            Set wu.delay_bound to the one that worked.
    Notes:
    - get_delay_bound_range() needs result priority and report deadline,
        and it's called before we read the full result.
        So add these items to WORK_ITEM and WU_RESULT.
    - get_delay_bound_range() could be customized for
        project-specific deadline policy.
    - add_result_to_reply() was becoming a toxic waste dump.
        Deadline-related stuff should have been factored out in any case.

svn path=/trunk/boinc/; revision=18946
2009-08-31 19:35:46 +00:00
David Anderson a525453b5e - code shuffling
svn path=/trunk/boinc/; revision=18826
2009-08-10 04:56:46 +00:00
David Anderson 77055d17e7 svn path=/trunk/boinc/; revision=18765 2009-07-29 18:34:27 +00:00
David Anderson 2e5d9bd778 - scheduler: add new config option <max_wus_in_progress_gpus>.
The limit on jobs in progress is now
        max_wus_in_progress * NCPUS
        + max_wus_in_progress * NGPUS
    where NCPUS and NGPUS reflect prefs and are capped.
    Furthermore: if the client reports plan class for in-progress jobs
    (see checkin of 31 May 2009)
    then these limits are enforced separately;
    i.e. the # of in-progress CPU jobs is <= max_wus_in_progress*NCPUS,
    and the # of in-progress GPU jobs is <= max_wus_in_progress_gpu*NGPUS
- scheduler config: rename <cuda_multiplier> to <gpu_multiplier>
- scheduler: <max_wus_to_send> is now scaled by
    (NCPUS + gpu_multiplier*NGPUS)
- scheduler: don't keep scanning array if !work_needed()
- scheduler: moved array-scan logic from sched_send.cpp to sched_array.cpp
- scheduler: don't say "no work available" if jobs are available
    but work_needed() is initially false


svn path=/trunk/boinc/; revision=18255
2009-06-01 22:15:14 +00:00
David Anderson 84afd18450 - scheduler: move app-version selection and score-based scheduling
to new files.

svn path=/trunk/boinc/; revision=17630
2009-03-19 16:35:35 +00:00
David Anderson 41ed82f791 - scheduler: fix bugs that caused only 1 job to be sent
svn path=/trunk/boinc/; revision=17555
2009-03-07 01:00:05 +00:00
David Anderson efbe5f616f - scheduler: move all send-work setup stuff (including messages)
into a function that's called before resend_lost_results()

svn path=/trunk/boinc/; revision=17515
2009-03-05 23:08:53 +00:00
David Anderson 012bf4c696 - scheduler: get work request parameters before resend_lost_jobs();
otherwise get NaNs for CPU fraction, etc.
- scheduler: show reasons in English when send job aborts


svn path=/trunk/boinc/; revision=17514
2009-03-05 22:12:21 +00:00
David Anderson 33d5a81cf6 - scheduler: add locality_scheduling arg to add_result_to_reply();
eliminate the need to diddle around with config.locality_scheduling.

svn path=/trunk/boinc/; revision=17445
2009-03-03 16:38:54 +00:00
David Anderson 312ffba708 - API: remove BOINC_OPTIONS::worker_thread_stack_size
- web: check whether to show profile in separate function
    from displaying profile; eliminate double headers
- scheduler: finish purge of redundant arguments

svn path=/trunk/boinc/; revision=16726
2008-12-19 18:14:02 +00:00
David Anderson ef52366c1b - web: fix bug that caused login to fail
- sched: more global vars

svn path=/trunk/boinc/; revision=16695
2008-12-16 16:29:54 +00:00
David Anderson 49a69de194 - scheduler: estimate job durations based on the FLOPS estimate
for the selected APP_VERSION, rather than on the CPU benchmarks.
    Otherwise estimates are wrong for GPU or multi-thread apps.
- scheduler: start switching from having SCHED_REQUEST and
    SCHED_REPLY as globals instead of passing them around as args;
    to be continued.

svn path=/trunk/boinc/; revision=16691
2008-12-15 21:14:32 +00:00
David Anderson 5039207e2c - scheduler: add <have_cuda_apps> config flag.
If set the "effective NCPUS" (which is used to scale
    daily_result_quota and max_wus_in_progress)
    is max'd with the # of CUDA GPUs.

svn path=/trunk/boinc/; revision=16246
2008-10-21 23:16:07 +00:00
David Anderson 4f66bb4c95 - added copyright and license info to .C, .cpp, .h files
- scheduler: fix bug in adaptive replication:
    if send an unreplicated job to untrusted host,
    set both wu.target_nresults and wu.min_quorum to app.target_nresults.

svn path=/trunk/boinc/; revision=15762
2008-08-06 18:36:30 +00:00
David Anderson 55c6b2fc64 - scheduler: added a debug msg for anonymous platform
svn path=/trunk/boinc/; revision=15749
2008-08-04 18:48:26 +00:00
David Anderson 938d77ce4c - client (Unix) Add /usr/local/cuda/lib/ to LD_LIBRARY_PATH
before looking for CUDA library
- scheduler: some additional work on matchmaker scheduling
    Changed check_app_filter() so that it doesn't depend on
    the current multi-phase approach;
    move that logic to scan_array()


svn path=/trunk/boinc/; revision=15109
2008-04-30 20:31:33 +00:00
David Anderson 13400c9516 Changes for multithread app support:
- update_versions: use __ (not :) as separator for plan class
- client: add plan_class to APP_VERSION;
    an app version is now identified by platform/version/plan_class
- client CPU scheduler: don't assume apps use 1 CPU
- client: add avg_ncpus, max_cpus, flops, cmdline to RESULT
- scheduler: implement app planning scheme

Other changes:

- client: if symlink() fails, make a XML soft link instead
    (for Unix running off a FAT32 FS)
- client: don't accept nonpositive resource share from AMS
- daemons and DB: check for error returns from enumerations,
    and exit if so.  Thus, if the MySQL server goes down,
    all the daemons will soon exit.
    The cron script will restart them every 5 min,
    so when the DB server comes back up so will the project.
- web: show empty max CPU % as ---
- API: get rid of all_threads_cpu_time option (always the case now)


svn path=/trunk/boinc/; revision=14966
2008-03-27 18:25:29 +00:00
David Anderson 4e9fbac5e0 - admin web: touch reread_db in manage_app_versions.php
- DB code: remove "is_high_priority" stuff.
- scheduler: merge find_app_version() into get_app_version().
    Have the latter memoize its results (both positive and negative).
    Have it call app_plan() for apps with nonempty plan_class.
- scheduler: first steps towards improved selectability of log messages.
    It will eventually be like the client,
    where you can select among various types of messages.
- feeder: if can't unlink the reread_db trigger file, exit
    (else we'd go into an infinite loop)

svn path=/trunk/boinc/; revision=14940
2008-03-18 21:22:44 +00:00
David Anderson 815b8fc043 Various preparation for handling multithreaded apps
and apps that use coprocessors.
There now can be several app_versions for the same
(app, platform, version_num) combination.
This changes a number of things.

- Added app_version.plan_class field to DB
- update_versions now looks for a :plan-class in the
    file or directory name, and puts it in the app_version's DB record
- Change uniqueness constraint to include plan_class
- Feeder: the feeder was putting non-deprecated app_versions
    in shared mem, and leaving it to the scheduler to
    find the latest version for a given platform.
    This is dumb.
    Instead, for each app/platform pair the feeder now
    finds the highest version number of a non-deprecated app version,
    and enumerates all non-deprecated app_versions with that
    app/platform/version
- Scheduler: add a BEST_APP_VERSION data structure that keeps track,
    for each app, what the best app_version is for this host.
    This saves the work of recomputing it for each job.

svn path=/trunk/boinc/; revision=14906
2008-03-13 22:57:24 +00:00
David Anderson 95772cba77 - removed boinc_ncpus_available() and boinc_nthreads() calls.
The design has been changed to constant #threads per app version
    Various changes from Kevin Reed/WCG:
    - server: add workunit.rsc_bandwidth_bound: if nonzero,
        send this WU only to hosts with that much download bandwidth
    - assimilators: if a handler returns DEFER_ASSIMILATION,
        the WU remains in INIT state and will be handled when the
        next instance completes.
        Useful if you want the assimilator to see all instances.
    - scheduler: when setting result.outcome = DETACHED,
        set received_time to now
    - scheduler: removed the reliable_time and reliable_min_avg_credit
        options
    - scheduler/web: add optional <allow_non_preferred_projects>
        in project preferences.
        If present, user will accept work from non-selected apps
        if no work is available for selected apps
    - scheduler: improved messages for projects with multiple apps
    - scheduler: added config options
        <granted_credit_weight> and <granted_credit_ramp_up>.
        Used in calculating host.claimed_credit_per_cpu_sec,
        but I'm not sure how.
    - Added two new credit-granting formulas (validate_util.C):
        stddev_credit() and two_credit()
    - server DB: add rollback_transaction() and affected_rows() to DB_CONN

    NOTE: DB update required

svn path=/trunk/boinc/; revision=14870
2008-03-07 21:13:01 +00:00
David Anderson e5f1f2f9cb - scheduler: code cleanup: use global var for SCHED_SHMEM
instead of passing it around as argument
    (should do same for request and reply at some point)

svn path=/trunk/boinc/; revision=14781
2008-02-22 22:21:00 +00:00
David Anderson 54519a4ee1 - Server: add "job assignment" feature.
Lets you assign a WU to a particular host,
    to one or all hosts belonging to a user or team, or to all hosts.
    See http://boinc.berkeley.edu/trac/wiki/AssignedWork
    Disabled unless you include <enable_assignment> in config.xml
    Uses a new DB table.
    Tested but only a little.
- Server: code cleanup; moved result-handling to a new file,
    and removed the PLATFORM_LIST arg to everything
    (put it in SCHEDULER_REQUEST instead)

svn path=/trunk/boinc/; revision=14767
2008-02-21 00:47:50 +00:00
David Anderson df8cbdb294 - scheduler:
- if WU is infeasible, print message instead of number
    - remove useless messages
    - remove EDF simulations printfs
    - don't update nresults_on_host in resend_lost_work()
        (it's already done in add_result_to_reply())

svn path=/trunk/boinc/; revision=14336
2007-11-30 23:02:55 +00:00
Frank Thomas 3bfc78b511 Updated the postal address of the Free Software Foundation in all license headers. See http://lists.ssl.berkeley.edu/pipermail/boinc_dev/2007-October/008939.html for reference.
svn path=/trunk/boinc/; revision=13804
2007-10-09 11:35:47 +00:00
David Anderson 64db0fa655 - scheduler: bug fix in HR code:
in wu_is_infeasible(), check whether host type is unknown
    before seeing if WU is already committed  to different type

svn path=/trunk/boinc/; revision=13777
2007-10-04 21:55:37 +00:00
David Anderson fcab43c4da - scheduler: support per-app HR type, specified in the DB;
this supercedes the global HR type specified in config.xml
- API: eliminate small memory leak
    (if reread app init file, free malloc'd project prefs from last time)
- file upload handler: parse <md5_cksum> to avoid error msg

lib/
    app_ipc.C
sched/
    file_upload_handler.C
    sched_array.C
    sched_hr.C,h
    sched_send.C,h

svn path=/trunk/boinc/; revision=12936
2007-06-14 18:02:00 +00:00
David Anderson 2fa5db2775 - scheduler: move HR check to wu_is_infeasible()
In principle, a project can now use both
    locality scheduling and homogeneous redundancy.
- scheduler: do HR check before deadline check,
    since the latter is slower.
- scheduler: wu_is_infeasible() doesn't return a bitmap.
    Change its return values to sequential numbers.
- scheduler: ignore <accelerator> and <p_capabilities> tags

sched/
    sched_send.C,h
    sched_array.C
    sched_locality.C
    server_types.C

svn path=/trunk/boinc/; revision=12791
2007-05-31 18:14:45 +00:00
David Anderson 88860ed316 - user web: fix bug in host merge function
- core client: fix bug in set_debt() GUI RPC
- scheduler: some of the "quick checks" in scan_work_array()
    are applicable to locality scheduling also,
    so they should be moved to wu_is_infeasible().
    I did this for one: the check for one result
    per user (or host) per WU.  Should do for others.
    
client/
    gui_rpc_server_ops.C
html/
    host_edit_action.php
    host_edit_form.php
sched/
    sched_array.C
    sched_send.C,h

svn path=/trunk/boinc/; revision=12784
2007-05-30 17:25:51 +00:00
David Anderson a37403a673 - scheduler: add <workload_sim> config option.
If set, the scheduler will use EDF simulation,
    together with the in-progress workload reported by the client,
    to avoid sending results that
    1) will miss their deadline, or
    2) will cause an in-progress result to miss its deadline, or
    3) will make an in-progress result miss its deadline
        by more than is already predicted.
    If this option is not set, or if the client request doesn't
    include a workload description (i.e. the client is old)
    use the existing approach, which assumes there's no workload.
    NOTE: this is experimental.  Production projects should not use it.
- EDF sim: write debug stuff to stderr instead of stdout
- Account manager:
    - if an account is detach_when_done, set dont_request_more_work
    - check done_request_more_work even for first-time projects
- update_uotd: generate a file for use by Google gadget
- user_links(): use full URLs (so can use in Google gadget)

client/
    acct_mgr.C
    work_fetch.C
html/
    inc/
        uotd.inc
        util.inc
    user/
        uotd_gadget.php (new)
sched/
    Makefile.am
    edf_sim.C
    sched_config.C,h
    sched_resend.C
    sched_send.C,h
    server_types.C,h

svn path=/trunk/boinc/; revision=12639
2007-05-10 21:50:52 +00:00
David Anderson 2e7b82b631 - scheduler: added (correct this time!) support for
<alt_platform> tags in scheduler requests.
    - file_deleter: add check for -dont_delete_batches 

    sched/
        file_deleter.C
        handle_request.C
        sched_array.C,h
        sched_locality.C,h
        sched_resend.C,h
        sched_send.C,h
        server_types.h

svn path=/trunk/boinc/; revision=12512
2007-04-30 21:19:24 +00:00
David Anderson 7767ca1eb8 *** empty log message ***
svn path=/trunk/boinc/; revision=11492
2006-11-07 17:40:55 +00:00
David Anderson c98a2415af *** empty log message ***
svn path=/trunk/boinc/; revision=11336
2006-10-22 00:42:44 +00:00
David Anderson 2e7b35b9ec feeder/scheduler enhancements
svn path=/trunk/boinc/; revision=10083
2006-05-02 22:17:09 +00:00
David Anderson 18bf9ebf22 *** empty log message ***
svn path=/trunk/boinc/; revision=7177
2005-08-04 03:58:00 +00:00
David Anderson 9898cdd8e3 split up scheduler code
svn path=/trunk/boinc/; revision=7171
2005-08-04 03:50:04 +00:00
David Anderson 3f785e8bdd resend lost results
svn path=/trunk/boinc/; revision=6866
2005-07-28 10:13:30 +00:00
David Anderson 4710fe1771 *** empty log message ***
svn path=/trunk/boinc/; revision=6863
2005-07-28 09:29:12 +00:00
David Anderson 1c119cb037 *** empty log message ***
svn path=/trunk/boinc/; revision=5889
2005-04-18 18:42:29 +00:00
David Anderson 647c8122b3 *** empty log message ***
svn path=/trunk/boinc/; revision=5886
2005-04-18 17:54:03 +00:00
David Anderson d01e0c50ab *** empty log message ***
svn path=/trunk/boinc/; revision=5363
2005-02-08 19:54:10 +00:00
David Anderson 694ae2973e *** empty log message ***
svn path=/trunk/boinc/; revision=5341
2005-02-07 06:24:14 +00:00
David Anderson 981799643c *** empty log message ***
svn path=/trunk/boinc/; revision=5284
2005-02-02 22:58:46 +00:00
David Anderson 4a0fb78aa6 *** empty log message ***
svn path=/trunk/boinc/; revision=5258
2005-01-31 23:20:49 +00:00
David Anderson 435f8edd47 *** empty log message ***
svn path=/trunk/boinc/; revision=5161
2005-01-20 23:22:22 +00:00
David Anderson 76a5940333 *** empty log message ***
svn path=/trunk/boinc/; revision=4178
2004-09-10 00:41:48 +00:00
David Anderson d240b170b1 *** empty log message ***
svn path=/trunk/boinc/; revision=3215
2004-04-04 01:59:47 +00:00