Commit Graph

28 Commits

Author SHA1 Message Date
David Anderson 81faa12ff3 scheduler: make matchmaker scheduling the default
Remove <matchmaker> config option; add <sched_old> option.
2014-07-08 12:35:45 -07:00
David Anderson 5fd1656935 scheduler: fix bug in score-based scheduler that cause last job to not be sent
This is the problem where the scheduler marks one job with its PID
to reserve it.
It needs to be able to send this job.
I fixed this recently in old scheduler, neglected to fix in score-based as well
2014-06-13 00:30:03 -07:00
David Anderson 56934f8fbe scheduler: clean up job dispatch logging
There are now 3 flags for job dispatch logging:
<debug_send/>: info about work request, jobs sent, other high-level stuff
<debug_send_scan/>: info about scans through job cache
<debug_send_job/>: info about individual jobs (e.g. reason for not sending)
2014-06-12 11:33:11 -07:00
David Anderson 52bd196c4f scheduler: fix bugs in sending non-compute-intensive jobs 2014-05-26 21:07:07 -07:00
David Anderson 641099040e scheduler: add log message for beta-test pref, score-based case 2014-05-12 10:21:28 -07:00
David Anderson 6216673eca web: fix missing mysqli change 2014-03-22 09:04:58 -07:00
David Anderson 5381def663 server: use gpu_active_frac in scheduling decisions
On some hosts, gpu_active_frac may be much less than active_frac
(i.e., GPUs may be available much less than CPUs).
Use gpu_active_frac in the following places:

- scheduler: in estimating the elapsed time of jobs,
    to decide whether they can meet deadline
- scheduler: in computing the effective speed of a (host, app version),
    when deciding what size class it belongs to
- size_census: in computing effective speed of (host, app versions)

(Previously, we were just using active_frac in all these cases)
2014-03-06 21:23:02 -08:00
David Anderson 6d4999767f example app: print "starting" message after boinc_init, so that it appears in stdferr file
Also remove old score-based sched code
2013-12-10 14:00:31 -08:00
David Anderson 1c31f6feaa Condor: fix bug when 2 input files have same contents; fix error messages 2013-08-09 16:06:36 -07:00
Eric J Korpela 244ba5bc85 SCHED: modified scheduled log output to use unsigned format for WU and RESULT
ids.  This allows IDs greater than 2^31 to be printed.
2013-06-19 10:15:08 -07:00
David Anderson bf96878fe8 scheduler: comparison function for score-based scheduling was backwards 2013-05-22 10:20:54 -07:00
David Anderson fa43e0fe6e scheduler: enforce job limits in score-based scheduling 2013-05-21 20:14:39 -07:00
David Anderson 0c430ce1fa Add support for multi-size apps
See http://boinc.berkeley.edu/trac/wiki/MultiSize
The components of this include:
- DB changes:
    add size_class to workunit and result
    n_size_classes to app; >1 means multi-size
- size_regulator daemon program: change results states
    from INACTIVE to UNSENT carefully
- size_census program; writes quantile info in flat files
- transitioner: when creating results for multi-size apps,
    set server state to INACTIVE
- sched shmem (feeder): read quantile info from flat files,
    store in shared memory
- scheduler (score-based scheduling): for multi-size apps,
    add component to score function for size class.
- show_shmem: show result size class
- make_work (and other callers of count_unsent_results()):
    count both INACTIVE and UNSENT
- create_work: add --size_class cmdline option

Also:
- if get MySQL errors in upgrade, don't rewrite db_version
2013-04-25 00:27:35 -07:00
David Anderson 55f45c8d22 - scheduler: fixes for new score-based scheduling 2013-04-10 15:52:53 -07:00
David Anderson c9c9f2bae0 - scheduler: code shuffle; new file sched_check.cpp contains functions
that decide whether a job can be sent to a host
2013-04-09 12:19:00 -07:00
David Anderson 12319ca82b - scheduler: add code (commented out for now) for new implementation
of score-based scheduling.
2013-04-09 11:10:50 -07:00
David Anderson 4b826b52a0 - scheduler: fix bug in the "homogeneous app version" (HAV) feature
(reported by Kevin Reed).
    The problem: cache inconsistency.
    If there are 2 results for the same WU in shared mem,
    and 2 scheduler instances get them around the same time,
    they can send them with different app versions.
    We already fixed this problem for HR by
    1) rereading the relevant WU fields while deciding
        whether to send the result
    2) doing a "careful update" of the WU field using a where clause
        to make sure it wasn't modified in the (short) interval
        since rereading it.
    I fixed the HAV problem in the same way,
    and merged the two mechanisms to combine the DB queries.

    Also:
    - The rereads are done in slow_check() (see below).
    - The careful updates are done in update_wu_on_send(),
        and this is called *before* doing careful updates on result fields.
        That way, if the WU updates fail, we don't have orphaned results.
    - already_sent_to_different_platform_careful() (sic)
        no longer does DB stuff, so it's merged with
        already_send_to_different_hr_class() (better name)

    NOTE: slow_check() is used in array scheduling only.
        Score-based scheduling uses other code,
        in which this bug is not yet fixed.
        Locality scheduling doesn't support HR or HAV at all.
        This should be unified.


svn path=/trunk/boinc/; revision=24484
2011-10-26 07:15:22 +00:00
David Anderson 3b73c8dc0a - db_purge: make zip compression work (from Teemu Mannermaa)
- get rid of a few compile warnings


svn path=/trunk/boinc/; revision=23789
2011-07-01 02:12:11 +00:00
David Anderson 436415cfe1 - scheduler, back end: add "homogeneous app version" feature.
Lets you specify, on a per-app basis,
    that all instances should be done using the same app version.
    This is for validation in the presence of GPUs.
- scheduler: code cleanup
    - Instead of adding a bunch of non-DB fields to RESULT,
        used a derived class SCHED_DB_RESULT.
    - Instead of storing a pointer to BEST_APP_VERSION in RESULT,
        store the structure itself.
        This simplifies the memory allocation situation.
- client: condition "Got server request to delete file" messages
    on <file_xfer_debug>


svn path=/trunk/boinc/; revision=23636
2011-06-06 03:40:42 +00:00
David Anderson b169e5ab0f - server programs: print error message instead of numeric retval
in log messages

svn path=/trunk/boinc/; revision=22647
2010-11-08 17:51:57 +00:00
David Anderson 40c50852f5 - scheduler: fix logic that deals with jobs that need > 2GB RAM.
My change of 1 Oct ([22440]) required that such jobs
    be processed with 64-bit apps,
    on the assumption that 32-bit apps have a 2 GB user address space limit.
    However, it turns out this limit applies only to Windows
    (kernel and user mode share the 4GB address space; each gets half).
    On Linux, the split is 3GB user / 1 GB kernel.
    On Mac OS X, user mode and kernel mode have separate address spaces,
    each of them 4 GB.


svn path=/trunk/boinc/; revision=22599
2010-10-27 22:58:16 +00:00
David Anderson 7c51512cbf - transitioner: the format string for a DB query had %.15d instead of %.15e.
That produced a messed-up query that assigned garbage values to:
        host_app_version.turnaround_var
        host_app_version.turnaround_q
        host_app_version.max_jobs_per_day
        host_app_version.consecutive_valid
    To repair these:
        - set turnaround_var and turnaround_q to zero
        - if max_jobs_per_day is outside of
            (0..config.daily_result_quota)
            set it to config.daily_result_quota
        - if consecutive_valid is outside (0..1000), set it to zero
    I added a script, html/ops/repair_21812.php, that does this;
    if you ran server code between [21181] and [21812], run this script.
- scheduler/transitioner: add <debug_quota> log flag
- changed the build system to always use -Wall
    (if we'd done this before, this bug wouldn't have happened)
- fixed a bunch of other compile warnings


svn path=/trunk/boinc/; revision=21812
2010-06-25 18:54:37 +00:00
David Anderson 56a8296b5b - scheduler: compute no_jobs_available correctly
in the presence of multiple scheduling types
    (e.g., locality and job array)
    From Nils Brause

svn path=/trunk/boinc/; revision=19559
2009-11-12 21:30:33 +00:00
David Anderson b300519444 svn path=/trunk/boinc/; revision=18825 2009-08-10 04:49:02 +00:00
David Anderson 77055d17e7 svn path=/trunk/boinc/; revision=18765 2009-07-29 18:34:27 +00:00
David Anderson 4c070e3bfb - scheduler: Gianni requested a feature where jobs have a
"min # of GPU processors" attribute (stored in batch)
    and are sent only to hosts whose GPUs have at least this #.

    The logical place for this is in the scoring function, JOB::get_score().
    I added a clause (#ifdef'd out) that does this.
    It rejects the WU if #procs is too small,
    otherwise it adds min/actual to the score.
    This favors sending jobs that need lots of procs to GPUs that have them.

svn path=/trunk/boinc/; revision=18764
2009-07-29 17:29:56 +00:00
David Anderson 2e5d9bd778 - scheduler: add new config option <max_wus_in_progress_gpus>.
The limit on jobs in progress is now
        max_wus_in_progress * NCPUS
        + max_wus_in_progress * NGPUS
    where NCPUS and NGPUS reflect prefs and are capped.
    Furthermore: if the client reports plan class for in-progress jobs
    (see checkin of 31 May 2009)
    then these limits are enforced separately;
    i.e. the # of in-progress CPU jobs is <= max_wus_in_progress*NCPUS,
    and the # of in-progress GPU jobs is <= max_wus_in_progress_gpu*NGPUS
- scheduler config: rename <cuda_multiplier> to <gpu_multiplier>
- scheduler: <max_wus_to_send> is now scaled by
    (NCPUS + gpu_multiplier*NGPUS)
- scheduler: don't keep scanning array if !work_needed()
- scheduler: moved array-scan logic from sched_send.cpp to sched_array.cpp
- scheduler: don't say "no work available" if jobs are available
    but work_needed() is initially false


svn path=/trunk/boinc/; revision=18255
2009-06-01 22:15:14 +00:00
David Anderson 84afd18450 - scheduler: move app-version selection and score-based scheduling
to new files.

svn path=/trunk/boinc/; revision=17630
2009-03-19 16:35:35 +00:00