Commit Graph

137 Commits

Author SHA1 Message Date
David Anderson 3b73c8dc0a - db_purge: make zip compression work (from Teemu Mannermaa)
- get rid of a few compile warnings


svn path=/trunk/boinc/; revision=23789
2011-07-01 02:12:11 +00:00
David Anderson 9cfb88c3ea - scheduler: when creating HOST_APP_VERSION records,
initialize the n_jobs_today field correctly


svn path=/trunk/boinc/; revision=23637
2011-06-06 04:10:59 +00:00
David Anderson 436415cfe1 - scheduler, back end: add "homogeneous app version" feature.
Lets you specify, on a per-app basis,
    that all instances should be done using the same app version.
    This is for validation in the presence of GPUs.
- scheduler: code cleanup
    - Instead of adding a bunch of non-DB fields to RESULT,
        used a derived class SCHED_DB_RESULT.
    - Instead of storing a pointer to BEST_APP_VERSION in RESULT,
        store the structure itself.
        This simplifies the memory allocation situation.
- client: condition "Got server request to delete file" messages
    on <file_xfer_debug>


svn path=/trunk/boinc/; revision=23636
2011-06-06 03:40:42 +00:00
David Anderson 6d1133fb1d - scheduler: add <user_filter> config option.
If set, and a WU has nonzero batch,
    it is interpreted as a user ID,
    and the job will be sent only to hosts with that user ID.

    Note: the use of workunit.batch is arbitrary;
    we could also use workunit.opaque or other deprecated field.


svn path=/trunk/boinc/; revision=23556
2011-05-17 21:11:39 +00:00
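A minimal sketch of the <user_filter> rule described in this commit, written in Python for illustration only (the scheduler itself is not Python). The function and argument names are hypothetical and not taken from the BOINC source.

```python
def passes_user_filter(user_filter_enabled: bool, wu_batch: int, host_userid: int) -> bool:
    """Return True if a job may be sent to a host under the <user_filter> rule."""
    if not user_filter_enabled:
        return True          # option not set: no filtering
    if wu_batch == 0:
        return True          # zero batch: job is not restricted to a user
    # Nonzero batch is interpreted as a user ID.
    return wu_batch == host_userid


if __name__ == "__main__":
    print(passes_user_filter(True, 42, 42))   # job restricted to user 42, host owned by 42
    print(passes_user_filter(True, 42, 7))    # same job, host owned by user 7
```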
David Anderson 8a4c3dccf3 - scheduler: if an in-progress limit is given in config_aux.xml,
and <per_proc> is not specified, default it to false.
- scheduler: add some log messages


svn path=/trunk/boinc/; revision=23555
2011-05-17 19:11:44 +00:00
David Anderson a2fc8edcae - scheduler: per-processor limits should be based on
"effective" # of processors (taking prefs into account)
    rather than the physical number


svn path=/trunk/boinc/; revision=23547
2011-05-13 22:04:10 +00:00
David Anderson 597320db39 - scheduler: compile fixes
svn path=/trunk/boinc/; revision=23281
2011-03-25 22:47:49 +00:00
David Anderson e480cef000 - scheduler: if we're not sending jobs because of user prefs
(no CPU, no GPU, selected apps)
    send a message, not a notice.
    Assume the user knew what they were doing,
    and doesn't want to be nagged.
- scheduler: check for the existence of an app version
    before checking for user selected-app prefs.
    This prevents sending "no jobs available for selected apps"
    message when no app versions exist for non-selected apps
- scheduler: use "tasks" instead of "work" in user messages


svn path=/trunk/boinc/; revision=23168
2011-03-04 19:40:59 +00:00
David Anderson 3b05dc6203 - scheduler: fix a problem with job resend.
When we first send a job, we pick an app version,
    then call wu_is_infeasible_fast()
    to see if the host is able to run the job with that app version.
    In addition to checking disk space etc.
    this calls wu_is_infeasible_custom() to do project-specific checks
    (e.g. for SETI@home: don't use GPUs for VLAR jobs).

    However, when we resend a job, we pick an app version
    (possibly different from the original one)
    and send the job without any checking.
    So, for example, we might send a VLAR job to a GPU,
    or send a job to a host with insufficient disk space
    (because free space has changed since original send).

    Solution: call wu_is_infeasible_fast() before resending a job,
    and if it returns true, mark the job as done and don't resend it.


svn path=/trunk/boinc/; revision=23098
2011-02-24 19:30:43 +00:00
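The resend fix in this commit can be sketched as follows, in Python for illustration only. The Job class and try_resend() are hypothetical; the feasibility check is passed in as a callable that stands in for wu_is_infeasible_fast().

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Job:
    name: str
    done: bool = False


def try_resend(job: Job, is_infeasible: Callable[[Job], bool]) -> bool:
    """Resend a job only if the feasibility check passes.

    If the check says the host cannot run the job, mark the job as done
    and report that it was not resent.
    """
    if is_infeasible(job):
        job.done = True
        return False
    return True


if __name__ == "__main__":
    vlar = Job("vlar_job")
    print(try_resend(vlar, lambda j: True), vlar.done)
```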
David Anderson 5421335dbb - transitioner: fix bug that could cause file deletion to not be done
for some WUs
- back end: fix the way "report grace period" is implemented
    old: result.report_deadline (i.e. what's in the DB) and
        the deadline sent to the client are the same.
        Some confusing and incorrect logic in the transitioner
        tries to provide the desired semantics.
    new: result.report_deadline is the deadline sent to the client,
        plus the grace period.
        No logic in the transitioner is needed.


svn path=/trunk/boinc/; revision=23040
2011-02-15 22:07:14 +00:00
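A small sketch of the new "report grace period" semantics, in Python for illustration only. The function name and the sent_time, delay_bound and grace_period parameters are assumptions made for the example; only the relationship between the two deadlines comes from the commit message.

```python
def compute_deadlines(sent_time: float, delay_bound: float, grace_period: float):
    """Return (deadline sent to the client, report_deadline stored in the DB)."""
    client_deadline = sent_time + delay_bound
    # New behavior: the DB value already includes the grace period,
    # so no extra transitioner logic is needed.
    db_report_deadline = client_deadline + grace_period
    return client_deadline, db_report_deadline


if __name__ == "__main__":
    print(compute_deadlines(1000.0, 86400.0, 3600.0))
```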
David Anderson b169e5ab0f - server programs: print error message instead of numeric retval
in log messages

svn path=/trunk/boinc/; revision=22647
2010-11-08 17:51:57 +00:00
David Anderson 7d3d8adc73 - client: comment out update_rec() call
svn path=/trunk/boinc/; revision=22631
2010-11-05 18:02:34 +00:00
David Anderson ef472e3df7 - client simulator: model the scheduler's deadline check mechanism
- scheduler: improve the deadline check mechanism slightly.
    When updating "estimated delay" (a rough measure of how long
    a resource is saturated with high-priority work)
    take into account the # of instances used by the job,
    and the # of total instances


svn path=/trunk/boinc/; revision=22612
2010-11-01 16:53:41 +00:00
David Anderson 3de5a1d410 - client: remove spurious msgs about deleting files while in use
- scheduler: add log messages related to max jobs in progress

svn path=/trunk/boinc/; revision=22342
2010-09-13 21:20:30 +00:00
David Anderson 6df96d612a - boinc_cmd: don't crash if can't connect to local client
svn path=/trunk/boinc/; revision=22338
2010-09-12 01:10:39 +00:00
David Anderson 23de5a887f - client/scheduler: tweak translatable messages
svn path=/trunk/boinc/; revision=22129
2010-08-04 18:41:24 +00:00
David Anderson 6b8a569d6d - client/scheduler: fix a group of bugs related to the new mechanism
where the client tells the scheduler which app versions
    its queued jobs use
    (this is needed, e.g., to enforce per-app or per-resource job limits).
    In this mechanism, the client sends an array of <app_version>s,
    and each <other_result> includes an index into this array.

    - The wrong index was being sent (client).
    - If an <app_version> had a non-existent app name
        (e.g. because that app had been deprecated)
        it wasn't getting put in the array, invalidating array indices.
        Furthermore, an erroneous message was being sent to the user.

        Fix: if parse error for <app_version>,
        put it in the array anyway, but with cav.app = NULL,
        meaning that it's a place-holder.
        Send a message to user only if anon platform.

- manager: increase notice buffers to 64K

svn path=/trunk/boinc/; revision=22052
2010-07-23 17:43:20 +00:00
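The placeholder fix in this commit can be illustrated with a short Python sketch. The function name and data shapes are hypothetical; None plays the role of cav.app = NULL.

```python
from typing import List, Optional, Set


def build_app_version_array(client_app_names: List[str],
                            known_apps: Set[str]) -> List[Optional[str]]:
    """Keep one slot per <app_version> sent by the client.

    Entries whose app is unknown become None placeholders instead of
    being dropped, so indices sent in <other_result> stay valid.
    """
    return [name if name in known_apps else None for name in client_app_names]


if __name__ == "__main__":
    arr = build_app_version_array(["app_a", "retired_app", "app_b"], {"app_a", "app_b"})
    # Index 2 still refers to app_b even though index 1 could not be resolved.
    print(arr, arr[2])
```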
David Anderson c0776ea188 - user web: put RSS item titles in CDATA
- sched: get rid of unused config items
- manager: msg tweak

svn path=/trunk/boinc/; revision=22045
2010-07-22 22:57:15 +00:00
David Anderson faab0991f7 - scheduler: fix and restore fpops scaling for anonymous platform jobs
svn path=/trunk/boinc/; revision=21962
2010-07-15 21:38:24 +00:00
David Anderson 55e0e86c90 - scheduler: make messages translatable
svn path=/trunk/boinc/; revision=21896
2010-07-13 02:49:35 +00:00
David Anderson e53e9710e8 - scheduler: make some "notice"-priority messages translatable
- scheduler: add a clause to wu_is_infeasible_custom() for SETI@home:
    don't process VLAR jobs using CUDA apps.
    Note: this is implemented in a slightly non-optimal way.
    If the request asks for both GPU and CPU jobs,
    the scheduler will first decide to use the GPU version.
    It will scan jobs, skipping over VLAR jobs.
    When the GPU request is satisfied, it will switch to the CPU version
    and continue scanning, accepting VLAR jobs.
    But the jobs that were skipped initially won't be rescanned.
    Also, it would be slightly nice to preferentially send
    VLAR jobs to hosts asking for CPU work.
    (This could be done in the scoring function).

svn path=/trunk/boinc/; revision=21895
2010-07-12 22:43:53 +00:00
David Anderson 7e121f35bf - fix gcc 4 compiler warnings
svn path=/trunk/boinc/; revision=21882
2010-07-08 18:02:07 +00:00
David Anderson 114f4f15cf - scheduler and client: use "notice" rather than "high" priority
for messages intended as notices.
    This will avoid showing lots of obscure stuff as notices
    for projects with old server code.

svn path=/trunk/boinc/; revision=21836
2010-06-29 03:23:13 +00:00
David Anderson d756994bda - scheduler and back end: message tweaks and fixes
svn path=/trunk/boinc/; revision=21835
2010-06-29 03:20:19 +00:00
David Anderson f7ce13cdd4 - scheduler: host_app_version.n_jobs_today was being cleared
only if the previous request was on a different day
    AND the current request asks for work.
    Sometimes it wasn't getting cleared when it should have.

svn path=/trunk/boinc/; revision=21824
2010-06-25 22:00:09 +00:00
David Anderson 7c51512cbf - transitioner: the format string for a DB query had %.15d instead of %.15e.
That produced a messed-up query that assigned garbage values to:
        host_app_version.turnaround_var
        host_app_version.turnaround_q
        host_app_version.max_jobs_per_day
        host_app_version.consecutive_valid
    To repair these:
        - set turnaround_var and turnaround_q to zero
        - if max_jobs_per_day is outside of
            (0..config.daily_result_quota)
            set it to config.daily_result_quota
        - if consecutive_valid is outside (0..1000), set it to zero
    I added a script, html/ops/repair_21812.php, that does this;
    if you ran server code between [21181] and [21812], run this script.
- scheduler/transitioner: add <debug_quota> log flag
- changed the build system to always use -Wall
    (if we'd done this before, this bug wouldn't have happened)
- fixed a bunch of other compile warnings


svn path=/trunk/boinc/; revision=21812
2010-06-25 18:54:37 +00:00
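The repair rules listed in this commit can be sketched as below, in Python for illustration only. This is not the content of html/ops/repair_21812.php; the function name and the dict-based record are hypothetical, and the ranges are treated as inclusive, which is an assumption.

```python
def repair_host_app_version(rec: dict, daily_result_quota: int) -> dict:
    """Return a repaired copy of a host_app_version record."""
    fixed = dict(rec)
    fixed["turnaround_var"] = 0.0
    fixed["turnaround_q"] = 0.0
    if not 0 <= fixed["max_jobs_per_day"] <= daily_result_quota:
        fixed["max_jobs_per_day"] = daily_result_quota
    if not 0 <= fixed["consecutive_valid"] <= 1000:
        fixed["consecutive_valid"] = 0
    return fixed


if __name__ == "__main__":
    garbage = {"turnaround_var": 1e300, "turnaround_q": -5.0,
               "max_jobs_per_day": 2000000000, "consecutive_valid": -17}
    print(repair_host_app_version(garbage, daily_result_quota=100))
```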
David Anderson cae5d08407 - scheduler: the way rsc_fpops_est and rsc_fpops_bound were
being scaled for anon platform was messed up.
    Turn off this scaling until I can figure out the problem.

svn path=/trunk/boinc/; revision=21806
2010-06-24 23:24:51 +00:00
David Anderson 587a4cde3f - scheduler: msg tweaks
svn path=/trunk/boinc/; revision=21805
2010-06-24 22:58:05 +00:00
David Anderson 81973a9fff - scheduler: fix structural problems with sending user messages.
Old: various redundant and/or misleading messages were sent.
    New:
        - if host w/ no GPU contacts a GPU-only project,
            send high-pri message saying they need a GPU
        - if host w/ GPU has driver too old for all versions,
            send high-pri message saying to update driver
        - if host w/ GPU has driver too old for some versions,
            send low-pri message saying to update driver
        - if host has GPU but too little RAM for any app,
            send low-pri message saying so
- scheduler: revamp GPU plan class functions

svn path=/trunk/boinc/; revision=21760
2010-06-16 22:07:19 +00:00
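A rough sketch of the four message rules listed in this commit, in Python for illustration only. The function name, the input shape (one dict per GPU app version with driver_ok and ram_ok flags) and the message texts are all hypothetical.

```python
from typing import Dict, List, Tuple


def gpu_user_messages(host_has_gpu: bool, project_gpu_only: bool,
                      versions: List[Dict[str, bool]]) -> List[Tuple[str, str]]:
    """Return (priority, text) pairs following the rules in the commit message."""
    if not host_has_gpu:
        return [("high", "a GPU is required")] if project_gpu_only else []
    msgs = []
    n_old = sum(1 for v in versions if not v["driver_ok"])
    if versions and n_old == len(versions):
        msgs.append(("high", "update GPU driver"))      # too old for all versions
    elif n_old > 0:
        msgs.append(("low", "update GPU driver"))       # too old for some versions
    if versions and not any(v["ram_ok"] for v in versions):
        msgs.append(("low", "GPU has too little RAM"))  # no app version fits
    return msgs


if __name__ == "__main__":
    print(gpu_user_messages(True, False, [{"driver_ok": False, "ram_ok": True},
                                          {"driver_ok": True, "ram_ok": True}]))
```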
David Anderson 5a28b5672e - client: user-visible text (message body, notice title and description)
is now translatable,
    using the convention that any substring enclosed in _(" ... ")
    should be passed through wxGetTranslation() or the equivalent.
- client: when writing messages to stdout, strip out _(...)
- manager: translate strings from client
- scheduler: message tweaks

svn path=/trunk/boinc/; revision=21706
2010-06-08 18:56:53 +00:00
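A minimal sketch of stripping the _(" ... ") markers before writing a message to stdout, in Python for illustration only. It assumes that stripping means removing the marker and keeping the enclosed text; the function name and regular expression are hypothetical, not the client's implementation.

```python
import re

# Matches _("...") and captures the enclosed text (non-greedy).
_MARKER = re.compile(r'_\("(.*?)"\)')


def strip_translation_markers(message: str) -> str:
    """Replace every _("text") marker with the bare text."""
    return _MARKER.sub(r"\1", message)


if __name__ == "__main__":
    print(strip_translation_markers('_("No tasks sent") for project X'))
```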
David Anderson db667b77e8 - scheduler: fix anon platform bug that caused zero FPOPS est
svn path=/trunk/boinc/; revision=21689
2010-06-03 19:24:57 +00:00
David Anderson 356327d88c - scheduler: change backoff policy if a host has reached daily job quota.
Old: back off until random time in 1st hour of next day
    New: no server-dictated backoff; rely on client backoff
    This is needed to let hosts recover in a reasonable amount of time
    after a burst of errors.
- scheduler config: it turns out we can't put arbitrary XML in config.xml;
    The Python code is set up to parse only 1 level of tags (??),
    and I'm not up to the task of changing this.
    So the fine-grained job limit feature [21674] needs to use
    a different file, namely config_aux.xml

svn path=/trunk/boinc/; revision=21686
2010-06-03 04:59:27 +00:00
David Anderson 89fab4ece5 - back end: change "daily result quota" mechanism.
Old: config.xml specifies an initial daily quota (say, 100).
        Each host_app_version starts out with this quota.
        On the return of a SUCCESS result,
        the quota is doubled, up to the initial value.
        On the return of an error result, or a timeout,
        the quota is decremented down to 1.
    Problem:
        Doesn't accommodate hosts that can do more than 100 jobs/day.
    New: similar, but
        - on validation of a job, daily quota is incremented.
        - on invalidation of a job, daily quota is decremented.
        - on return of an error result, or a timeout,
            daily quota is min'd with initial quota, then decremented.
    Notes:
        - This allows a host to have an unboundedly large quota
            as long as it continues to return more valid
            than invalid results.
        - Even with this change, hosts that return SUCCESS but
            invalid results will continue to get the initial daily quota.
            It would be desirable to reduce their quota to 1.

svn path=/trunk/boinc/; revision=21675
2010-06-02 00:11:01 +00:00
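The new quota rules in this commit can be sketched as three update functions, in Python for illustration only. The function names are hypothetical, and the floor of 1 is an assumption carried over from the description of the old mechanism.

```python
def on_valid(quota: int) -> int:
    """Job validated: quota grows without an upper bound."""
    return quota + 1


def on_invalid(quota: int) -> int:
    """Job found invalid: quota shrinks (floor of 1 is an assumption)."""
    return max(1, quota - 1)


def on_error_or_timeout(quota: int, initial_quota: int) -> int:
    """Error or timeout: clamp to the initial quota, then decrement."""
    return max(1, min(quota, initial_quota) - 1)


if __name__ == "__main__":
    q = 100
    for _ in range(400):
        q = on_valid(q)                  # a fast, reliable host earns a larger quota
    print(q, on_error_or_timeout(q, 100))
```

Clamping to the initial quota on an error means a host that built up a large quota loses it after a single error result, rather than being allowed to fail on many jobs first.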
David Anderson cf7fb29227 - scheduler: add fine-grained "max jobs in progress" control.
You can now specify limits for specific apps,
    and/or for the project as a whole.
    Within each of these, you can specify limits on
    CPU jobs, GPU jobs, or total jobs.
    In the case of CPU and GPU limits, you can specify
    whether the limit should be scaled by the number of devices.

    Note: the enforcement of this is done in get_app_version(),
    since per-resource-type limits may dictate what app versions
    we can use for a particular job.

svn path=/trunk/boinc/; revision=21674
2010-06-01 23:41:07 +00:00
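A short sketch of a per-resource limit that may be scaled by the number of devices, in Python for illustration only. The JobLimit class and function names are hypothetical; the per_proc flag is named after the <per_proc> tag mentioned elsewhere in this log.

```python
from dataclasses import dataclass


@dataclass
class JobLimit:
    max_jobs: int
    per_proc: bool = False   # if True, scale the limit by the device count


def effective_limit(limit: JobLimit, n_devices: int) -> int:
    """Limit on jobs in progress after optional per-device scaling."""
    return limit.max_jobs * n_devices if limit.per_proc else limit.max_jobs


def limit_reached(limit: JobLimit, n_devices: int, jobs_in_progress: int) -> bool:
    return jobs_in_progress >= effective_limit(limit, n_devices)


if __name__ == "__main__":
    gpu_limit = JobLimit(max_jobs=2, per_proc=True)
    print(effective_limit(gpu_limit, 3), limit_reached(gpu_limit, 3, 5))
```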
David Anderson 64def3d588 - scheduler: fix bug that caused resent jobs with anonymous platform
to have zero FPOPS est and bound

svn path=/trunk/boinc/; revision=21671
2010-06-01 19:56:54 +00:00
David Anderson 2b33429f18 - scheduler: fix bug in single-replication decision (from Rytis)
svn path=/trunk/boinc/; revision=21576
2010-05-18 22:32:05 +00:00
David Anderson 40eebe00af - client/scheduler: in COPROCS, instead of having a vector of
pointers to dynamically allocated COPROC-derived objects,
    just have the objects themselves.
    Dynamic allocation should be avoided at all costs.

svn path=/trunk/boinc/; revision=21564
2010-05-18 19:22:34 +00:00
David Anderson 6fbfee024b - client: day boundary for "transfer at most X in N days"
is midnight local time, not UTC
- update translation templates

svn path=/trunk/boinc/; revision=21362
2010-05-03 17:20:44 +00:00
David Anderson 5035007b90 - back end: new way of deciding:
- whether host is "reliable" for an app version
    - whether host is eligible for single replication for an app version
    - whether to use host scaling
    In each case, the answer is yes if the number of
    consecutive valid results is above a threshold.
    This replaces existing "error rate" and "scale probation" mechanisms.

    TODO: the # of consecutive valid results should also determine
        a limit on jobs in progress for an app version.
        Namely, if N is the threshold for host scaling, the limit should be
            ndevices*(max(1, consecutive_valid - N))
        The client currently doesn't supply enough
        app version info to do this.
        It could be approximated; that would give some protection
        against cherry-picking.
- credit: more conservative formulas for combining claimed credit
    among replicas.
    If there are normal replicas, we use a "low average"
    that weights each sample by the sum of the other samples.
    Otherwise we use the min (not the average) of the approximate samples.

NOTE: a DB update is required


svn path=/trunk/boinc/; revision=21230
2010-04-21 19:33:20 +00:00
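One possible reading of the "low average" described in this commit, sketched in Python for illustration only: each sample is weighted by the sum of the other samples, so an unusually large claim receives a small weight. The function name and the handling of the single-sample and all-zero cases are assumptions.

```python
from typing import Sequence


def low_average(samples: Sequence[float]) -> float:
    """Weighted mean where sample i has weight (sum of all samples) - samples[i]."""
    total = sum(samples)
    weights = [total - x for x in samples]
    weight_sum = sum(weights)
    if weight_sum == 0:
        # Single sample, or all samples zero: fall back to the plain mean.
        return total / len(samples)
    return sum(x * w for x, w in zip(samples, weights)) / weight_sum


if __name__ == "__main__":
    claims = [1.0, 1.0, 10.0]
    print(low_average(claims), sum(claims) / len(claims))
```

For the claims 1, 1 and 10 the weights are 11, 11 and 2, giving 42 / 24 = 1.75, well below the plain mean of 4.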
David Anderson 021edb02c2 - back end programs: improve log msgs
svn path=/trunk/boinc/; revision=21193
2010-04-16 18:07:08 +00:00
David Anderson 02717af2f3 - bug fixes
svn path=/trunk/boinc/; revision=21187
2010-04-15 21:58:44 +00:00
David Anderson b2451544e1 - server: change the following from per-host to per-(host, app version):
- daily quota mechanism
    - reliable mechanism (accelerated retries)
    - "trusted" mechanism (adaptive replication)
- scheduler: enforce host scale probation only for apps with
    host_scale_check set.
- validator: do scale probation on invalid results
    (need this in addition to error and timeout cases)
- feeder: update app version scales every 10 min, not 10 sec
- back-end apps: support --foo as well as -foo for options

Notes:
- If you have, say, cuda, cuda23 and cuda_fermi plan classes,
    a host will have separate quotas for each one.
    That means it could error out on 100 jobs for cuda_fermi,
    and when its quota goes to zero,
    error out on 100 jobs for cuda23, etc.
    This is intentional; there may be cases where one version
    works but not the others.
- host.error_rate and host.max_results_day are deprecated

TODO:
    - the values in the app table for limits on jobs in progress etc.
        should override those in config.xml.

Implementation notes:
scheduler:
    process_request():
        read all host_app_versions for host at start;
        Compute "reliable" and "trusted" for each one.
        write modified records at end
    get_app_version():
        add "reliable_only" arg; if set, use only reliable versions
        skip over-quota versions
    Multi-pass scheduling: if have at least one reliable version,
        do a pass for jobs that need reliable,
        and use only reliable versions.
        Then clear best_app_versions cache.
    Score-based scheduling: for need-reliable jobs,
        it will pick the fastest version,
        then give a score bonus if that version happens to be reliable.
    When get back a successful result from client:
        increase daily quota
    When get back an error result from client:
        impose scale probation
        decrease daily quota if not aborted
Validator:
    when handling a WU, create a vector of HOST_APP_VERSION
        parallel to vector of RESULT.
        Pass it to assign_credit_set().
        Make copies of originals so we can update only modified ones
    update HOST_APP_VERSION error rates
Transitioner:
    decrease quota on timeout


svn path=/trunk/boinc/; revision=21181
2010-04-15 03:13:56 +00:00
David Anderson 2e41153d8b - scheduler: fix egregious bug which limited sending to 1 job per RPC
- scheduler: fix bug that broke anon platform

Note: Bruce Allen once advised me to take a few days and just
observe BOINC in action.
I should really do this more often; it always turns up bugs
and/or design flaws.


svn path=/trunk/boinc/; revision=21165
2010-04-11 04:42:52 +00:00
David Anderson e05a479f42 - scheduler and validator: distinguish between
1) peak FLOPS (based on benchmarks or GPU attributes).
        This does not change over time.
        It's not adjusted on the basis of statistics.
        It's not affected by wu.rsc_fpops_est.
        It can be compared across projects.
    versus
    2) projected FLOPS: the scheduler's best guess as to what will satisfy
        X * elapsed_time = wu.rsc_fpops_est;
        this is used to make server-side runtime estimates,
        and it's sent to the client and used for its runtime estimates.
        It may be based on the (host, app version) elapsed time average.
    My checkin [21153] mistakenly conflated these two.

    Notes:
    1) app_plan() now must return both peak and projected FLOPS.
    2) result.flops_estimate stores peak FLOPS
    3) the <flops> field in app_info.xml files should be
        projected FLOPS.  But its accuracy is not important;
        it's not used once the server has statistics
        for the (host, app version)

svn path=/trunk/boinc/; revision=21164
2010-04-10 05:49:51 +00:00
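The relation X * elapsed_time = wu.rsc_fpops_est from this commit can be worked through in a few lines of Python, for illustration only. The function names and the numbers are hypothetical.

```python
def projected_flops(rsc_fpops_est: float, mean_elapsed_time: float) -> float:
    """Solve X * elapsed_time = rsc_fpops_est for X."""
    return rsc_fpops_est / mean_elapsed_time


def estimated_runtime(rsc_fpops_est: float, flops: float) -> float:
    """Runtime estimate for a new job given projected FLOPS."""
    return rsc_fpops_est / flops


if __name__ == "__main__":
    # Past jobs estimated at 1e12 FLOPs took 500 s on average on this host/app version.
    x = projected_flops(1e12, 500.0)
    # A new job estimated at 3e12 FLOPs is then expected to take three times as long.
    print(x, estimated_runtime(3e12, x))
```

Peak FLOPS does not enter this calculation; it is a fixed property of the hardware, while projected FLOPS depends on wu.rsc_fpops_est and on observed elapsed times.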
David Anderson 1d765245ed - scheduler: sweeping changes to the way job runtimes are estimated:
see http://boinc.berkeley.edu/trac/wiki/RuntimeEstimation


svn path=/trunk/boinc/; revision=21153
2010-04-08 23:14:47 +00:00
David Anderson 85e06afe4b - scheduler: app_plan() no longer has to guess how efficiently
an app version will run on a particular host.
- scheduler: fix memory leak: BEST_APP_VERSIONs weren't being freed


svn path=/trunk/boinc/; revision=21148
2010-04-08 18:27:27 +00:00
David Anderson 4462fe534b - client: don't do RSS fetch if network suspended
svn path=/trunk/boinc/; revision=21123
2010-04-06 20:32:02 +00:00
David Anderson a2a661993b - validator: -d 4 means -d 3 plus print all DB queries
(todo: do this for all daemons)
- validator: change cmdline args from -foo to --foo
    (todo: do this for all daemons)
- validator: pass max_granted_credit to assign_credit_set()

svn path=/trunk/boinc/; revision=21093
2010-04-05 18:59:16 +00:00
David Anderson fb851311e0 - server: various changes;
see http://boinc.berkeley.edu/trac/wiki/CreditNew

    Projects will need to update DB and recompile all back-end programs.

    Summary:
    - new way of computing credit
    - "reliable host" mechanism is per app version
    - "host punishment" mechanism is per app version
    - adjustment of wu.rsc_fpops_est provides the
        equivalent of per app version DCF
    - max jobs in progress is now per app
    - max jobs per RPC is now per app

    TODO:
    - reliable mechanism:
        - populate and use host_app_version.error_rate
        - populate host_app_version.turnaround
    - host punishment:
        - populate host_app_version.max_jobs_per_day
        - populate host_app_version.n_jobs_today
        - use app.max_jobs_per_day_init
    - job limits:
        - use app.max_jobs_in_progress, max_gpu_jobs_in_progress
        - use app.max_jobs_per_rpc
    - adjust wu.rsc_fpops_est
    - remove old credit stuff
        fpops_cumulative, credit_multiplier
        credit computation in scheduler

- AVERAGE class: use the Knuth algorithm (Wikipedia)


svn path=/trunk/boinc/; revision=21021
2010-03-29 22:28:20 +00:00
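The "Knuth algorithm" mentioned for the AVERAGE class is presumably the single-pass running mean and variance update (often credited to Welford and presented by Knuth). A minimal sketch under that assumption, in Python for illustration only; the class and method names are hypothetical and this is not the BOINC AVERAGE class.

```python
class RunningAverage:
    """Online mean and variance, updated one sample at a time."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0   # sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def variance(self) -> float:
        """Population variance; 0.0 if no samples have been seen."""
        return self.m2 / self.n if self.n else 0.0


if __name__ == "__main__":
    avg = RunningAverage()
    for sample in [2, 4, 4, 4, 5, 5, 7, 9]:
        avg.update(sample)
    print(avg.mean, avg.variance())
```

The update needs only three stored numbers per average and avoids the cancellation problems of accumulating a sum and a sum of squares.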
David Anderson 0c1a1421f8 - scheduler/feeder: if any client version number field
(min_core_version etc.) is < 10000,
    multiply it by 100 and print a warning.

svn path=/trunk/boinc/; revision=20187
2010-01-18 04:52:58 +00:00
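A minimal sketch of the version-field fix-up described in this commit, in Python for illustration only. The function name and warning text are hypothetical; the comment about the encodings is an assumption about why values below 10000 are rescaled.

```python
import sys


def normalize_version_field(name: str, value: int) -> int:
    """Rescale a client version number field that looks like an old-style value.

    Assumption: values below 10000 use an older major*100 + minor encoding,
    while current values also encode a release number in the last two digits.
    """
    if value < 10000:
        print(f"warning: {name}={value} is < 10000; using {value * 100}", file=sys.stderr)
        return value * 100
    return value


if __name__ == "__main__":
    print(normalize_version_field("min_core_version", 612))
    print(normalize_version_field("min_core_version", 61200))
```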