Commit Graph

37 Commits

Author SHA1 Message Date
David Anderson 5867396956 scheduler: fix bug in daily (host, app version) job limit.
I was doing the check on each call to get_app_version().
But it turns out this gets called during the job scan,
before calling add_result_to_reply(),
which is where n_jobs_today gets incremented.

Solution:move the check to the loop (in sched_score.cpp)
where add_result_reply() is called.

Also: fix bad logic in work_needed() where a global check
(on job limit in user project prefs)
was being done inside a loop over resources.
2019-04-09 22:58:18 -07:00
David Anderson 6144b0aaee scheduler: fix crashing bug in keyword scheduling code 2017-07-27 23:36:32 -07:00
David Anderson 20d07be2b8 back end: add keyword-based component to job scheduling score.
- add DB field for storing job keywords: workunit.keywords
    add this to various DB parse/write functions
- add --keywords option to create_work for specifying job keywords
- add <keyword_sched> option in config.xml for enabling keyword score
    (it's disabled by default).
    If set, increment score for "yes" keyword matches,
    and disallow jobs with "no" matches
- in scheduler, add array job_keywords_array for parsed versions
    of job keywords (vector<int>)

also:
- use symbols instead of numbers for slow_check() return values
- parse unused fields in req message to remove unparsed-XML warnings
2017-07-22 00:48:38 -07:00
David Anderson 5c573ddae8 scheduler: use "allow_non_preferred_apps" to match PHP code
Fix bug introduced in 671446c
2017-06-27 23:45:53 -07:00
David Anderson 671446ced1 code cleanup: in scheduler, factor project prefs into their own struct 2016-07-27 15:15:08 -07:00
David Anderson 24b52bd4a9 scheduler: add result priority to job score 2015-10-25 23:34:45 -07:00
David Anderson 8cd8c8e7ee server software: handle 64-bit database IDs
The SETI@home result table is about to run out of 32-bit IDs,
so we need to move to 64-bit result IDs.
This will happen to the workunit table at some point too.

I changed the server C++ code to use the "long" type for all DB IDs
(and to use appropriate conversion codes like %lu).
"long" is 64 bit on 64-bit machines.
For uniformity I did this for all tables,
even ones (like app) that will never get big.

I chose NOT to change the DB schema for now.
The new code will work with 32-bit ID fields in the DB.
As projects approach the 32-bit limit on a table they can change
its ID field, and fields that reference this table, to BIGINT.
This is likely to happen only on the result and workunit tables.
I put functions in html/ops/db_update.php
to change the IDs of these tables.
2015-07-23 10:11:08 -07:00
David Anderson c2a0421074 scheduler: add support for miner_asic coprocessor type
I.e. treat miner ASICs as a distinct processor type;
send miner_asic jobs only if the client requests them.

Note: I was planning to do this in a more general way,
in which the scheduler wouldn't have a hard-wired list of processor types.
However, that would be a large code change,
so for now I just added miner_asic to the list of processor types
(nvidia, ati, intel_gpu),
and made various changes to get things to work.

Also: in the job dispatch logic, try to send coproc jobs
before CPU jobs.
That way if e.g. there's a limit on jobs in progress,
we'll preferentially send coproc jobs.
2014-09-21 21:08:09 -07:00
David Anderson 7a84dabbbc scheduler: fix bug that broke enforcement of per-app job limits
The old scheduler worked as follows:
    scan jobs; for each job
        get_app_version
        do various checks
        add_job_to_reply
The check for per-app job limits was in get_app_version,
and the incrementing of per-app job count is in add_job_to_reply.

The new (score-based) scheduler works as follows:
    scan jobs; for each job
        get_app_version
        add to list
    sort list by score
    scan list; for each job
        do various checks
        add_job_to_reply

So the limit check (in get_app_version) was ineffective
because it happens before we've incremented counts.

Fix: do the limit check (also) in the "scan list" loop.

Bigger picture: we need to restructure app version selection;
job limit enforcement doesn't belong there any more.
2014-09-16 11:07:36 -07:00
David Anderson 81faa12ff3 scheduler: make matchmaker scheduling the default
Remove <matchmaker> config option; add <sched_old> option.
2014-07-08 12:35:45 -07:00
David Anderson 5fd1656935 scheduler: fix bug in score-based scheduler that cause last job to not be sent
This is the problem where the scheduler marks one job with its PID
to reserve it.
It needs to be able to send this job.
I fixed this recently in old scheduler, neglected to fix in score-based as well
2014-06-13 00:30:03 -07:00
David Anderson 56934f8fbe scheduler: clean up job dispatch logging
There are now 3 flags for job dispatch logging:
<debug_send/>: info about work request, jobs sent, other high-level stuff
<debug_send_scan/>: info about scans through job cache
<debug_send_job/>: info about individual jobs (e.g. reason for not sending)
2014-06-12 11:33:11 -07:00
David Anderson 52bd196c4f scheduler: fix bugs in sending non-compute-intensive jobs 2014-05-26 21:07:07 -07:00
David Anderson 641099040e scheduler: add log message for beta-test pref, score-based case 2014-05-12 10:21:28 -07:00
David Anderson 6216673eca web: fix missing mysqli change 2014-03-22 09:04:58 -07:00
David Anderson 5381def663 server: use gpu_active_frac in scheduling decisions
On some hosts, gpu_active_frac may be much less than active_frac
(i.e., GPUs may be available much less than CPUs).
Use gpu_active_frac in the following places:

- scheduler: in estimating the elapsed time of jobs,
    to decide whether they can meet deadline
- scheduler: in computing the effective speed of a (host, app version),
    when deciding what size class it belongs to
- size_census: in computing effective speed of (host, app versions)

(Previously, we were just using active_frac in all these cases)
2014-03-06 21:23:02 -08:00
David Anderson 6d4999767f example app: print "starting" message after boinc_init, so that it appears in stdferr file
Also remove old score-based sched code
2013-12-10 14:00:31 -08:00
David Anderson 1c31f6feaa Condor: fix bug when 2 input files have same contents; fix error messages 2013-08-09 16:06:36 -07:00
Eric J Korpela 244ba5bc85 SCHED: modified scheduled log output to use unsigned format for WU and RESULT
ids.  This allows IDs greater than 2^31 to be printed.
2013-06-19 10:15:08 -07:00
David Anderson bf96878fe8 scheduler: comparison function for score-based scheduling was backwards 2013-05-22 10:20:54 -07:00
David Anderson fa43e0fe6e scheduler: enforce job limits in score-based scheduling 2013-05-21 20:14:39 -07:00
David Anderson 0c430ce1fa Add support for multi-size apps
See http://boinc.berkeley.edu/trac/wiki/MultiSize
The components of this include:
- DB changes:
    add size_class to workunit and result
    n_size_classes to app; >1 means multi-size
- size_regulator daemon program: change results states
    from INACTIVE to UNSENT carefully
- size_census program; writes quantile info in flat files
- transitioner: when creating results for multi-size apps,
    set server state to INACTIVE
- sched shmem (feeder): read quantile info from flat files,
    store in shared memory
- scheduler (score-based scheduling): for multi-size apps,
    add component to score function for size class.
- show_shmem: show result size class
- make_work (and other callers of count_unsent_results()):
    count both INACTIVE and UNSENT
- create_work: add --size_class cmdline option

Also:
- if get MySQL errors in upgrade, don't rewrite db_version
2013-04-25 00:27:35 -07:00
David Anderson 55f45c8d22 - scheduler: fixes for new score-based scheduling 2013-04-10 15:52:53 -07:00
David Anderson c9c9f2bae0 - scheduler: code shuffle; new file sched_check.cpp contains functions
that decide whether a job can be sent to a host
2013-04-09 12:19:00 -07:00
David Anderson 12319ca82b - scheduler: add code (commented out for now) for new implementation
of score-based scheduling.
2013-04-09 11:10:50 -07:00
David Anderson 4b826b52a0 - scheduler: fix bug in the "homogeneous app version" (HAV) feature
(reported by Kevin Reed).
    The problem: cache inconsistency.
    If there are 2 results for the same WU in shared mem,
    and 2 scheduler instances get them around the same time,
    they can send them with different app versions.
    We already fixed this problem for HR by
    1) rereading the relevant WU fields while deciding
        whether to send the result
    2) doing a "careful update" of the WU field using a where clause
        to make sure it wasn't modified in the (short) interval
        since rereading it.
    I fixed the HAV problem in the same way,
    and merged the two mechanisms to combine the DB queries.

    Also:
    - The rereads are done in slow_check() (see below).
    - The careful updates are done in update_wu_on_send(),
        and this is called *before* doing careful updates on result fields.
        That way, if the WU updates fail, we don't have orphaned results.
    - already_sent_to_different_platform_careful() (sic)
        no longer does DB stuff, so it's merged with
        already_send_to_different_hr_class() (better name)

    NOTE: slow_check() is used in array scheduling only.
        Score-based scheduling uses other code,
        in which this bug is not yet fixed.
        Locality scheduling doesn't support HR or HAV at all.
        This should be unified.


svn path=/trunk/boinc/; revision=24484
2011-10-26 07:15:22 +00:00
David Anderson 3b73c8dc0a - db_purge: make zip compression work (from Teemu Mannermaa)
- get rid of a few compile warnings


svn path=/trunk/boinc/; revision=23789
2011-07-01 02:12:11 +00:00
David Anderson 436415cfe1 - scheduler, back end: add "homogeneous app version" feature.
Lets you specify, on a per-app basis,
    that all instances should be done using the same app version.
    This is for validation in the presence of GPUs.
- scheduler: code cleanup
    - Instead of adding a bunch of non-DB fields to RESULT,
        used a derived class SCHED_DB_RESULT.
    - Instead of storing a pointer to BEST_APP_VERSION in RESULT,
        store the structure itself.
        This simplifies the memory allocation situation.
- client: condition "Got server request to delete file" messages
    on <file_xfer_debug>


svn path=/trunk/boinc/; revision=23636
2011-06-06 03:40:42 +00:00
David Anderson b169e5ab0f - server programs: print error message instead of numeric retval
in log messages

svn path=/trunk/boinc/; revision=22647
2010-11-08 17:51:57 +00:00
David Anderson 40c50852f5 - scheduler: fix logic that deals with jobs that need > 2GB RAM.
My change of 1 Oct ([22440]) required that such jobs
    be processed with 64-bit apps,
    on the assumption that 32-bit apps have a 2 GB user address space limit.
    However, it turns out this limit applies only to Windows
    (kernel and user mode share the 4GB address space; each gets half).
    On Linux, the split is 3GB user / 1 GB kernel.
    On Mac OS X, user mode and kernel mode have separate address spaces,
    each of them 4 GB.


svn path=/trunk/boinc/; revision=22599
2010-10-27 22:58:16 +00:00
David Anderson 7c51512cbf - transitioner: the format string for a DB query had %.15d instead of %.15e.
That produced a messed-up query that assigned garbage values to:
        host_app_version.turnaround_var
        host_app_version.turnaround_q
        host_app_version.max_jobs_per_day
        host_app_version.consecutive_valid
    To repair these:
        - set turnaround_var and turnaround_q to zero
        - if max_jobs_per_day is outside of
            (0..config.daily_result_quota)
            set it to config.daily_result_quota
        - if consecutive_valid is outside (0..1000), set it to zero
    I added a script, html/ops/repair_21812.php, that does this;
    if you ran server code between [21181] and [21812], run this script.
- scheduler/transitioner: add <debug_quota> log flag
- changed the build system to always use -Wall
    (if we'd done this before, this bug wouldn't have happened)
- fixed a bunch of other compile warnings


svn path=/trunk/boinc/; revision=21812
2010-06-25 18:54:37 +00:00
David Anderson 56a8296b5b - scheduler: compute no_jobs_available correctly
in the presence of multiple scheduling types
    (e.g., locality and job array)
    From Nils Brause

svn path=/trunk/boinc/; revision=19559
2009-11-12 21:30:33 +00:00
David Anderson b300519444 svn path=/trunk/boinc/; revision=18825 2009-08-10 04:49:02 +00:00
David Anderson 77055d17e7 svn path=/trunk/boinc/; revision=18765 2009-07-29 18:34:27 +00:00
David Anderson 4c070e3bfb - scheduler: Gianni requested a feature where jobs have a
"min # of GPU processors" attribute (stored in batch)
    and are sent only to hosts whose GPUs have at least this #.

    The logical place for this is in the scoring function, JOB::get_score().
    I added a clause (#ifdef'd out) that does this.
    It rejects the WU if #procs is too small,
    otherwise it adds min/actual to the score.
    This favors sending jobs that need lots of procs to GPUs that have them.

svn path=/trunk/boinc/; revision=18764
2009-07-29 17:29:56 +00:00
David Anderson 2e5d9bd778 - scheduler: add new config option <max_wus_in_progress_gpus>.
The limit on jobs in progress is now
        max_wus_in_progress * NCPUS
        + max_wus_in_progress * NGPUS
    where NCPUS and NGPUS reflect prefs and are capped.
    Furthermore: if the client reports plan class for in-progress jobs
    (see checkin of 31 May 2009)
    then these limits are enforced separately;
    i.e. the # of in-progress CPU jobs is <= max_wus_in_progress*NCPUS,
    and the # of in-progress GPU jobs is <= max_wus_in_progress_gpu*NGPUS
- scheduler config: rename <cuda_multiplier> to <gpu_multiplier>
- scheduler: <max_wus_to_send> is now scaled by
    (NCPUS + gpu_multiplier*NGPUS)
- scheduler: don't keep scanning array if !work_needed()
- scheduler: moved array-scan logic from sched_send.cpp to sched_array.cpp
- scheduler: don't say "no work available" if jobs are available
    but work_needed() is initially false


svn path=/trunk/boinc/; revision=18255
2009-06-01 22:15:14 +00:00
David Anderson 84afd18450 - scheduler: move app-version selection and score-based scheduling
to new files.

svn path=/trunk/boinc/; revision=17630
2009-03-19 16:35:35 +00:00