Commit Graph

23 Commits

Author SHA1 Message Date
David Anderson b169e5ab0f - server programs: print error message instead of numeric retval
in log messages

svn path=/trunk/boinc/; revision=22647
2010-11-08 17:51:57 +00:00
David Anderson 81973a9fff - scheduler: fix structural problems with sending user messages.
Old: various redundant and/or misleading messages were sent.
    New:
        - if host w/ no GPU contacts a GPU-only project,
            send high-pri message saying they need a GPU
        - if host w/ GPU has driver too old for all versions,
            send high-pri message saying to update driver
        - if host w/ GPU has driver too old for some versions,
            send low-pri message saying to update driver
        - if host has GPU but too little RAM for any app,
            send low-pri message saying so
- scheduler: revamp GPU plan class functions

svn path=/trunk/boinc/; revision=21760
2010-06-16 22:07:19 +00:00
David Anderson b2451544e1 - server: change the following from per-host to per-(host, app version):
- daily quota mechanism
    - reliable mechanism (accelerated retries)
    - "trusted" mechanism (adaptive replication)
- scheduler: enforce host scale probation only for apps with
    host_scale_check set.
- validator: do scale probation on invalid results
    (need this in addition to error and timeout cases)
- feeder: update app version scales every 10 min, not 10 sec
- back-end apps: support --foo as well as -foo for options

Notes:
- If you have, say, cuda, cuda23 and cuda_fermi plan classes,
    a host will have separate quotas for each one.
    That means it could error out on 100 jobs for cuda_fermi,
    and when its quota goes to zero,
    error out on 100 jobs for cuda23, etc.
    This is intentional; there may be cases where one version
    works but not the others.
- host.error_rate and host.max_results_day are deprecated

TODO:
    - the values in the app table for limits on jobs in progress etc.
        should override rather than config.xml.

Implementation notes:
scheduler:
    process_request():
        read all host_app_versions for host at start;
        Compute "reliable" and "trusted" for each one.
        write modified records at end
    get_app_version():
        add "reliable_only" arg; if set, use only reliable versions
        skip over-quota versions
    Multi-pass scheduling: if have at least one reliable version,
        do a pass for jobs that need reliable,
        and use only reliable versions.
        Then clear best_app_versions cache.
    Score-based scheduling: for need-reliable jobs,
        it will pick the fastest version,
        then give a score bonus if that version happens to be reliable.
    When get back a successful result from client:
        increase daily quota
    When get back an error result from client:
        impose scale probation
        decrease daily quota if not aborted
Validator:
    when handling a WU, create a vector of HOST_APP_VERSION
        parallel to vector of RESULT.
        Pass it to assign_credit_set().
        Make copies of originals so we can update only modified ones
    update HOST_APP_VERSION error rates
Transitioner:
    decrease quota on timeout


svn path=/trunk/boinc/; revision=21181
2010-04-15 03:13:56 +00:00
David Anderson 5b7f8b8348 - web: fix bug that caused "send email" and "show hosts"
in project prefs to always select "no"


svn path=/trunk/boinc/; revision=20786
2010-03-04 04:16:00 +00:00
David Anderson 12a85e5ced - scheduler: code cleanup: goto considered harmful
- scheduler: when calculate scheduler runtime,
    don't include the part reading request msg from client.
    That can be misleadingly long

svn path=/trunk/boinc/; revision=20781
2010-03-03 19:29:23 +00:00
David Anderson 56a8296b5b - scheduler: compute no_jobs_available correctly
in the presence of multiple scheduling types
    (e.g., locality and job array)
    From Nils Brause

svn path=/trunk/boinc/; revision=19559
2009-11-12 21:30:33 +00:00
David Anderson 8b701fc73f - scheduler: fix messed-up deadline check logic.
Old:
        1) check deadline based on wu.delay_bound
        2) in add_result_to_reply(), potentially modify wu.delay_bound,
            e.g. because of retry acceleration
        problem: reducing delay bound may cause deadline miss
    New:
        1) new function get_delay_bound_range()
            (called from wu_is_infeasible_fast())
            returns optimistic and pessimistic delay bounds.
            Retry acceleration logic is here.
        2) check deadline based on optimistic bound;
            if that fails, check based on pessimistic bound.
            Set wu.delay_bound to the one that worked.
    Notes:
    - get_delay_bound_range() needs result priority and report deadline,
        and it's called before we read the full result.
        So add these items to WORK_ITEM and WU_RESULT.
    - get_delay_bound_range() could be customized for
        project-specific deadline policy.
    - add_result_to_reply() was becoming a toxic waste dump.
        Deadline-related stuff should have been factored out in any case.

svn path=/trunk/boinc/; revision=18946
2009-08-31 19:35:46 +00:00
David Anderson b300519444 svn path=/trunk/boinc/; revision=18825 2009-08-10 04:49:02 +00:00
David Anderson 2e5d9bd778 - scheduler: add new config option <max_wus_in_progress_gpus>.
The limit on jobs in progress is now
        max_wus_in_progress * NCPUS
        + max_wus_in_progress * NGPUS
    where NCPUS and NGPUS reflect prefs and are capped.
    Furthermore: if the client reports plan class for in-progress jobs
    (see checkin of 31 May 2009)
    then these limits are enforced separately;
    i.e. the # of in-progress CPU jobs is <= max_wus_in_progress*NCPUS,
    and the # of in-progress GPU jobs is <= max_wus_in_progress_gpu*NGPUS
- scheduler config: rename <cuda_multiplier> to <gpu_multiplier>
- scheduler: <max_wus_to_send> is now scaled by
    (NCPUS + gpu_multiplier*NGPUS)
- scheduler: don't keep scanning array if !work_needed()
- scheduler: moved array-scan logic from sched_send.cpp to sched_array.cpp
- scheduler: don't say "no work available" if jobs are available
    but work_needed() is initially false


svn path=/trunk/boinc/; revision=18255
2009-06-01 22:15:14 +00:00
David Anderson 84afd18450 - scheduler: move app-version selection and score-based scheduling
to new files.

svn path=/trunk/boinc/; revision=17630
2009-03-19 16:35:35 +00:00
David Anderson 76da7d8653 - scheduler: msg tweak
svn path=/trunk/boinc/; revision=17584
2009-03-10 21:34:49 +00:00
David Anderson 41ed82f791 - scheduler: fix bugs that caused only 1 job to be sent
svn path=/trunk/boinc/; revision=17555
2009-03-07 01:00:05 +00:00
David Anderson c22b62f25b - scheduler: fix bugs in support for anonymous platform + coprocs
(app versions don't have a <coprocs> around coproc elements,
    may an oversight but let's stick with it).
    Anyway, I think it's working now.
- lib: remove "owner" array from COPROC.
    This was used in client to keep track of assignment of
    coprocessors to tasks, but we got rid of the reserve/free scheme.
    NOTE: this breaks the mechanism for passing --device N to apps;
    I'll have to do this another way.  Stay tuned.

svn path=/trunk/boinc/; revision=17543
2009-03-06 22:21:47 +00:00
David Anderson 33d5a81cf6 - scheduler: add locality_scheduling arg to add_result_to_reply();
eliminate the need to diddle around with config.locality_scheduling.

svn path=/trunk/boinc/; revision=17445
2009-03-03 16:38:54 +00:00
David Anderson 4cd3c530b0 - scheduler: reduce frequency of calls to work_needed()
svn path=/trunk/boinc/; revision=17003
2009-01-23 22:52:35 +00:00
David Anderson 91e120b3f4 - scheduler: improve message formatting; add <debug_locality> flag
for locality scheduling messages

svn path=/trunk/boinc/; revision=16921
2009-01-15 20:23:20 +00:00
David Anderson 74423f23b6 - scheduler: if no jobs available to send, inform the user
svn path=/trunk/boinc/; revision=16730
2008-12-22 00:10:02 +00:00
David Anderson 312ffba708 - API: remove BOINC_OPTIONS::worker_thread_stack_size
- web: check whether to show profile in separate function
    from displaying profile; eliminate double headers
- scheduler: finish purge of redundant arguments

svn path=/trunk/boinc/; revision=16726
2008-12-19 18:14:02 +00:00
David Anderson ef52366c1b - web: fix bug that caused login to fail
- sched: more global vars

svn path=/trunk/boinc/; revision=16695
2008-12-16 16:29:54 +00:00
David Anderson 49a69de194 - scheduler: estimate job durations based on the FLOPS estimate
for the selected APP_VERSION, rather than on the CPU benchmarks.
    Otherwise estimates are wrong for GPU or multi-thread apps.
- scheduler: start switching from having SCHED_REQUEST and
    SCHED_REPLY as globals instead of passing them around as args;
    to be continued.

svn path=/trunk/boinc/; revision=16691
2008-12-15 21:14:32 +00:00
David Anderson 8ea8081626 - scheduler: fix memory leak when reporting time stats logs
- scheduler: fix egregious bug where wu_is_infeasible_fast() result
    is ignored, and we send jobs to hosts that can't handle them.
- scheduler: don't check for disk space in work_needed();
    do it in check_disk(), which generates a message to user.
- scheduler: add -debug_log flag, which sends stderr to
    "debug_log" rather than scheduler_log.txt (for debugging)

svn path=/trunk/boinc/; revision=16578
2008-11-26 21:49:36 +00:00
David Anderson 5039207e2c - scheduler: add <have_cuda_apps> config flag.
If set the "effective NCPUS" (which is used to scale
    daily_result_quota and max_wus_in_progress)
    is max'd with the # of CUDA GPUs.

svn path=/trunk/boinc/; revision=16246
2008-10-21 23:16:07 +00:00
David Anderson 98cfb8d3b0 - rename .C files to .cpp so that Doxygen will work
svn path=/trunk/boinc/; revision=16069
2008-09-26 18:20:24 +00:00