Commit Graph

52 Commits

Author SHA1 Message Date
David Anderson e00b080b5e - scheduler: fix crashing bug when using HR. From Kevin Reed.
svn path=/trunk/boinc/; revision=24355
2011-10-08 08:16:24 +00:00
David Anderson a4cccec2cc - scheduler: revise [21428] to include non-anonymous-platform,
and change the ratio limit from 2 to 10.


svn path=/trunk/boinc/; revision=24217
2011-09-15 06:53:01 +00:00
David Anderson 249435f0d8 - scheduler: fix crashing bug
svn path=/trunk/boinc/; revision=24139
2011-09-07 17:37:50 +00:00
David Anderson 7c81d72378 - web: fix warnings in forum pages
- scheduler: when using elapsed time stats to predict runtime,
    cap the estimated FLOPS at twice the peak FLOPS;
    otherwise, if a host has received a lot of very short jobs
    recently, it will get a too-high FLOPS estimate and
    will exceed the rsc_fpops_bound limit.


svn path=/trunk/boinc/; revision=24128
2011-09-05 17:29:53 +00:00
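
The cap described in revision 24128 can be sketched roughly as follows. This is an illustration only, not the scheduler's actual code; the function name and the `cap_ratio` parameter are hypothetical.

```python
def capped_flops_estimate(stats_flops: float, peak_flops: float,
                          cap_ratio: float = 2.0) -> float:
    """Limit a FLOPS estimate derived from elapsed-time stats.

    stats_flops: estimate derived from recent job runtimes (hypothetical name)
    peak_flops:  peak FLOPS of the host for this app version
    cap_ratio:   2.0 mirrors "twice the peak FLOPS" in the commit message
    """
    return min(stats_flops, cap_ratio * peak_flops)


# A host that just ran many very short jobs looks five times faster than peak;
# the cap brings the estimate back down to twice peak.
inflated = capped_flops_estimate(stats_flops=5e9, peak_flops=1e9)
```

An estimate already below the cap passes through unchanged.
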
David Anderson 1bf54d11ff - client: send all running jobs a "reread app info" message
when global prefs change
    (they might have to suspend or resume network activity)


svn path=/trunk/boinc/; revision=24084
2011-08-30 21:34:27 +00:00
David Anderson 0059d6bf78 - scheduler: don't send user a message when there is no
app version for their platform for a particular app.
    There may be versions for other apps which don't have jobs right now.
    TODO: send a message if there are no versions of ANY app
    for any platform.
- fix makefile indentation, caused manager to not be built


svn path=/trunk/boinc/; revision=24052
2011-08-27 02:54:39 +00:00
David Anderson 436415cfe1 - scheduler, back end: add "homogeneous app version" feature.
Lets you specify, on a per-app basis,
    that all instances should be done using the same app version.
    This is for validation in the presence of GPUs.
- scheduler: code cleanup
    - Instead of adding a bunch of non-DB fields to RESULT,
        used a derived class SCHED_DB_RESULT.
    - Instead of storing a pointer to BEST_APP_VERSION in RESULT,
        store the structure itself.
        This simplifies the memory allocation situation.
- client: condition "Got server request to delete file" messages
    on <file_xfer_debug>


svn path=/trunk/boinc/; revision=23636
2011-06-06 03:40:42 +00:00
David Anderson 86205059cd - scheduler: app version FLOPS estimates were wrong
in the case where we don't have enough elapsed-time stats
    for the host/app_version.
    The right formula is (peak FLOPS)/app_version.avg_pfc


svn path=/trunk/boinc/; revision=23634
2011-06-03 19:53:52 +00:00
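
The fallback formula from revision 23634 is small enough to write out. The sketch below assumes nothing beyond the commit message; the function name is hypothetical, and the guard against a non-positive `avg_pfc` is an added assumption.

```python
def flops_estimate_without_stats(peak_flops: float, avg_pfc: float) -> float:
    """Estimate app version FLOPS when the host/app_version lacks
    enough elapsed-time stats: (peak FLOPS) / app_version.avg_pfc."""
    if avg_pfc <= 0:
        # Assumption: treat a missing or invalid avg_pfc as an error
        raise ValueError("avg_pfc must be positive")
    return peak_flops / avg_pfc


estimate = flops_estimate_without_stats(peak_flops=4e9, avg_pfc=2.0)
```
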
David Anderson 8a4c3dccf3 - scheduler: if an in-progress limit is given in config_aux.xml,
and <per_proc> is not specified, default it to false.
- scheduler: add some log messages


svn path=/trunk/boinc/; revision=23555
2011-05-17 19:11:44 +00:00
David Anderson 597320db39 - scheduler: compile fixes
svn path=/trunk/boinc/; revision=23281
2011-03-25 22:47:49 +00:00
David Anderson 18f2e90929 - client: work fetch: if the chosen project is currently uploading a file,
and an upload started in the last 5 min, don't fetch work from it.
    The goal is to merge the 2 scheduler RPCs
    (fetch work, report completed tasks) into a single RPC.
    Note: this may result in idleness in some cases.
- scheduler: if client doesn't handle plan class (pre-5.10),
    check plan-class app versions anyway,
    but only use if it's a single-CPU app.
    This allows single-CPU app versions with specific requirements
    (like SSE) to be issued to old clients.
    From Bernd Machenschalk


svn path=/trunk/boinc/; revision=22841
2010-12-13 22:58:15 +00:00
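
The work-fetch rule in revision 22841 amounts to a simple time check. A minimal sketch, assuming timestamps in seconds; all names here are hypothetical and not taken from the client source.

```python
UPLOAD_DEFER_SECONDS = 5 * 60  # "an upload started in the last 5 min"


def should_defer_work_fetch(is_uploading: bool,
                            last_upload_start: float,
                            now: float) -> bool:
    """Return True if work fetch from this project should be skipped,
    so the pending report and the work request can share one RPC."""
    recently_started = (now - last_upload_start) < UPLOAD_DEFER_SECONDS
    return is_uploading and recently_started
```

The idleness the commit mentions is visible here: a host with no work still waits while an upload started less than five minutes ago is in progress.
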
David Anderson 864ee7e3a3 - scheduler: in some cases the system may have a too-low estimate
of the performance of an app version on a host.
    It will then stop using that app version,
    so the estimate never has a chance to converge to its correct value.
    Fix: multiply by a random factor (mean 1, stddev .1)
    when comparing the FLOPS estimates of app versions.

svn path=/trunk/boinc/; revision=22837
2010-12-09 00:32:50 +00:00
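
The randomized comparison in revision 22837 can be sketched as below. This assumes a normal distribution for the random factor (the commit gives only mean 1 and stddev .1); the function and version names are hypothetical.

```python
import random


def pick_fastest(flops_by_version: dict, rng: random.Random) -> str:
    """Return the version with the highest randomly perturbed FLOPS estimate."""
    best_name, best_score = None, float("-inf")
    for name, flops in flops_by_version.items():
        # Perturb each estimate so a slightly "slower" version is still
        # chosen sometimes and its statistics get a chance to converge.
        score = flops * rng.gauss(1.0, 0.1)
        if score > best_score:
            best_name, best_score = name, score
    return best_name


rng = random.Random(0)
estimates = {"cpu": 1.0e9, "gpu_underestimated": 0.95e9}
picks = {pick_fastest(estimates, rng) for _ in range(1000)}
```

Without the perturbation, `gpu_underestimated` would never be selected; with it, both versions are picked over many requests.
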
David Anderson f8e2d07cf9 - scheduler: add vbox32 and vbox64 plan classes for VirtualBox apps.
svn path=/trunk/boinc/; revision=22778
2010-11-30 19:36:07 +00:00
David Anderson 40c50852f5 - scheduler: fix logic that deals with jobs that need > 2GB RAM.
My change of 1 Oct ([22440]) required that such jobs
    be processed with 64-bit apps,
    on the assumption that 32-bit apps have a 2 GB user address space limit.
    However, it turns out this limit applies only to Windows
    (kernel and user mode share the 4GB address space; each gets half).
    On Linux, the split is 3GB user / 1 GB kernel.
    On Mac OS X, user mode and kernel mode have separate address spaces,
    each of them 4 GB.


svn path=/trunk/boinc/; revision=22599
2010-10-27 22:58:16 +00:00
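
The per-OS limits listed in revision 22599 suggest a lookup along these lines. The figures come from the commit message; the OS keys, the function name, and the conservative 2 GB default for unknown systems are assumptions made for illustration.

```python
# User address space available to a 32-bit app, in GB, per the commit message.
USER_ADDRESS_SPACE_GB_32BIT = {
    "windows": 2,  # 4 GB shared between kernel and user mode; each gets half
    "linux": 3,    # 3 GB user / 1 GB kernel
    "darwin": 4,   # Mac OS X: separate 4 GB user and kernel address spaces
}


def fits_32bit_address_space(os_name: str, job_ram_gb: float) -> bool:
    """Return True if a 32-bit app on this OS could address the job's RAM."""
    # Assumption: fall back to the most conservative limit for unknown OSs.
    limit = USER_ADDRESS_SPACE_GB_32BIT.get(os_name, 2)
    return job_ram_gb <= limit
```
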
David Anderson 7bd620e6b5 - scheduler: instead of "app is not available for your type of computer",
say "app is not available for Microsoft Windows (98 or later) running on an Intel x86-compatible CPU" (or whatever)


svn path=/trunk/boinc/; revision=22537
2010-10-15 20:25:51 +00:00
David Anderson be14996a1e - scheduler: deal correctly with jobs that need > 2GB RAM.
Such jobs fail on 32-bit machines, even if they have sufficient RAM,
    because 32-bit OSs don't support address spaces > 2GB.

    In general, we want to support the following scenario:
    - an app has a mixture of small (< 2GB) and big (> 2GB) jobs.
    - there are app versions for both 32b and 64b platforms
    - one of the 32b versions is faster than the 64b version
        (say, it's a 32b GPU app)

    Goals:
    If the client is 32b, send it only small jobs,
        using the fast 32b version if possible
    If the client is 64b and has sufficient RAM,
        send it large jobs using the 64b version;
        send it small jobs using the fast 32b version if possible,
        else the 64b version

    Solution: extend get_app_version() so that it detects big jobs,
        and uses only 64b versions for them.
        Add a "for_64b_jobs" field to BEST_APP_VERSION
        so that we maintain a separate memoized set of
        BEST_APP_VERSIONs for big jobs.

- client: don't set report_results_immediately inappropriately

svn path=/trunk/boinc/; revision=22440
2010-10-01 19:54:09 +00:00
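
The solution in revision 22440 keeps two memoized answers, one for small jobs and one for big jobs. The sketch below illustrates that idea under stated assumptions: the class and field names other than `for_64b_jobs` are hypothetical, the 2 GB threshold is taken from the message, and "best" is simplified to "highest FLOPS".

```python
from dataclasses import dataclass

BIG_JOB_BYTES = 2 * 1024**3  # jobs above this need a 64-bit app version


@dataclass(frozen=True)
class AppVersion:
    name: str
    is_64bit: bool
    flops: float


class BestAppVersionCache:
    """Per-request cache of the best app version, keyed by for_64b_jobs."""

    def __init__(self, versions, client_is_64bit):
        self._versions = versions
        self._client_is_64bit = client_is_64bit
        self._cache = {}

    def get(self, rsc_memory_bound):
        for_64b_jobs = rsc_memory_bound > BIG_JOB_BYTES
        if for_64b_jobs not in self._cache:
            usable = [
                v for v in self._versions
                # a 32-bit client cannot run 64-bit versions
                if (self._client_is_64bit or not v.is_64bit)
                # big jobs may use only 64-bit versions
                and (v.is_64bit or not for_64b_jobs)
            ]
            self._cache[for_64b_jobs] = max(
                usable, key=lambda v: v.flops, default=None)
        return self._cache[for_64b_jobs]


versions = [AppVersion("gpu32", False, 50e9), AppVersion("cpu64", True, 5e9)]
```

A 64-bit client gets the fast 32-bit version for small jobs and the 64-bit version for big ones; a 32-bit client gets no version at all for big jobs.
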
David Anderson 18b5b46aab - scheduler: fix "prefer_primary_platform" logic (I hope).
svn path=/trunk/boinc/; revision=22332
2010-09-09 20:01:28 +00:00
David Anderson 84679f482a - scheduler: change the "primary_platform_only" config option
to "prefer_primary_platform".
    If an app has only 32-bit versions, use them for 64-bit clients.


svn path=/trunk/boinc/; revision=22282
2010-08-22 19:13:25 +00:00
David Anderson d79ca6a9f2 - scheduler: add <primary_platform_only> config option:
send only 64-bit app versions to 64-bit hosts 
    (the default is to send whatever app version is fastest)

svn path=/trunk/boinc/; revision=22183
2010-08-10 22:17:59 +00:00
David Anderson 6b8a569d6d - client/scheduler: fix a group of bugs related to the new mechanism
where the client tells the scheduler which app versions
    its queued jobs use
    (this is needed, e.g., to enforce per-app or per-resource job limits).
    In this mechanism, the client sends an array of <app_version>s,
    and each <other_result> includes an index into this array.

    - The wrong index was being sent (client).
    - If an <app_version> had a non-existent app name
        (e.g. because that app had been deprecated)
        it wasn't getting put in the array, invalidating array indices
        Furthermore, an erroneous message was being sent to the user

        Fix: if parse error for <app_version>,
        put it in the array anyway, but with cav.app = NULL,
        meaning that it's a place-holder.
        Send a message to user only if anon platform.

- manager: increase notice buffers to 64K

svn path=/trunk/boinc/; revision=22052
2010-07-23 17:43:20 +00:00
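
The place-holder fix in revision 22052 matters because each <other_result> refers to its app version by array position. A minimal sketch of the idea, using plain dicts and hypothetical field names rather than the scheduler's real structures:

```python
def parse_client_app_versions(entries, known_apps):
    """Build the app version array, keeping one slot per client entry.

    An entry whose app name is unknown (e.g. a deprecated app) becomes a
    place-holder with app=None instead of being dropped, so the indices
    sent in <other_result> stay valid.
    """
    parsed = []
    for entry in entries:
        app = entry["app_name"] if entry["app_name"] in known_apps else None
        parsed.append({"app": app, "version_num": entry["version_num"]})
    return parsed


def app_for_result(other_result, parsed):
    """Look up the app a queued result uses; None means place-holder."""
    return parsed[other_result["app_version"]]["app"]


client_entries = [
    {"app_name": "alpha", "version_num": 101},
    {"app_name": "retired_app", "version_num": 7},
    {"app_name": "beta", "version_num": 203},
]
parsed = parse_client_app_versions(client_entries, known_apps={"alpha", "beta"})
```

Had the deprecated entry been skipped, index 2 would have pointed past the end of the array and index 1 at the wrong app.
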
David Anderson faab0991f7 - scheduler: fix and restore fpops scaling for anonymous platform jobs
svn path=/trunk/boinc/; revision=21962
2010-07-15 21:38:24 +00:00
David Anderson 55e0e86c90 - scheduler: make messages translatable
svn path=/trunk/boinc/; revision=21896
2010-07-13 02:49:35 +00:00
David Anderson e53e9710e8 - scheduler: make some "notice"-priority messages translatable
- scheduler: add a clause to wu_is_infeasible_custom() for SETI@home:
    don't process VLAR jobs using CUDA apps.
    Note: this is implemented in a slightly non-optimal way.
    If the request asks for both GPU and CPU jobs,
    the scheduler will first decide to use the GPU version.
    It will scan jobs, skipping over VLAR jobs.
    When the GPU request is satisfied, it will switch to the CPU version
    and continue scanning, accepting VLAR jobs.
    But the jobs that were skipped initially won't be rescanned.
    Also, it would be slightly nice to preferentially send
    VLAR jobs to hosts asking for CPU work.
    (This could be done in the scoring function).

svn path=/trunk/boinc/; revision=21895
2010-07-12 22:43:53 +00:00
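
The non-optimal behavior noted in revision 21895 is easier to see in a toy model of the scan. The code below is a simplified illustration, not the scheduler's logic; the job records and the function name are hypothetical.

```python
def scan_jobs(jobs, gpu_wanted, cpu_wanted):
    """Single pass over the job list: fill the GPU request first, then CPU.

    VLAR jobs are skipped while the GPU version is in use, and skipped
    jobs are not revisited once the scan switches to the CPU version.
    """
    assignments = []
    for job in jobs:
        if gpu_wanted > 0:
            if job["vlar"]:
                continue  # infeasible for the CUDA app; never rescanned
            assignments.append((job["name"], "gpu"))
            gpu_wanted -= 1
        elif cpu_wanted > 0:
            assignments.append((job["name"], "cpu"))
            cpu_wanted -= 1
    return assignments


jobs = [
    {"name": "v1", "vlar": True},
    {"name": "n1", "vlar": False},
    {"name": "v2", "vlar": True},
    {"name": "n2", "vlar": False},
]
sent = scan_jobs(jobs, gpu_wanted=1, cpu_wanted=2)
```

In this run `v1` is never sent, even though the CPU version could have processed it.
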
David Anderson b1851ce02c - user web: PHP 5.3 compatibility fix, from Nicolas. Fixes #787
svn path=/trunk/boinc/; revision=21878
2010-07-06 23:31:26 +00:00
David Anderson d756994bda - scheduler and back end: message tweaks and fixes
svn path=/trunk/boinc/; revision=21835
2010-06-29 03:20:19 +00:00
David Anderson 7c51512cbf - transitioner: the format string for a DB query had %.15d instead of %.15e.
That produced a messed-up query that assigned garbage values to:
        host_app_version.turnaround_var
        host_app_version.turnaround_q
        host_app_version.max_jobs_per_day
        host_app_version.consecutive_valid
    To repair these:
        - set turnaround_var and turnaround_q to zero
        - if max_jobs_per_day is outside of
            (0..config.daily_result_quota)
            set it to config.daily_result_quota
        - if consecutive_valid is outside (0..1000), set it to zero
    I added a script, html/ops/repair_21812.php, that does this;
    if you ran server code between [21181] and [21812], run this script.
- scheduler/transitioner: add <debug_quota> log flag
- changed the build system to always use -Wall
    (if we'd done this before, this bug wouldn't have happened)
- fixed a bunch of other compile warnings


svn path=/trunk/boinc/; revision=21812
2010-06-25 18:54:37 +00:00
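
The repair rules listed in revision 21812 can be expressed as a small function. This is a sketch of the rules as stated, not the contents of html/ops/repair_21812.php; it assumes the ranges are inclusive and represents a host_app_version row as a plain dict.

```python
def repair_host_app_version(row, daily_result_quota):
    """Return a repaired copy of a host_app_version row."""
    fixed = dict(row)
    # turnaround stats are always reset
    fixed["turnaround_var"] = 0.0
    fixed["turnaround_q"] = 0.0
    # Assumption: (0..N) is treated as an inclusive range
    if not (0 <= row["max_jobs_per_day"] <= daily_result_quota):
        fixed["max_jobs_per_day"] = daily_result_quota
    if not (0 <= row["consecutive_valid"] <= 1000):
        fixed["consecutive_valid"] = 0
    return fixed
```
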
David Anderson 587a4cde3f - scheduler: msg tweaks
svn path=/trunk/boinc/; revision=21805
2010-06-24 22:58:05 +00:00
David Anderson ae7866b251 - scheduler: restore scaling of daily quota by # processors
and/or config.gpu_multiplier
- client: msg tweak

svn path=/trunk/boinc/; revision=21753
2010-06-15 22:21:57 +00:00
David Anderson f849faea5e - scheduler: bug fixes for jobs-in-progress limits
- client: msg tweak

svn path=/trunk/boinc/; revision=21692
2010-06-04 16:57:33 +00:00
David Anderson e80e54fd4d - user web: add "Application info" link in host page,
linking to new page showing host_app_versions for this host
- scheduler: message tweaks

svn path=/trunk/boinc/; revision=21690
2010-06-03 20:26:02 +00:00
David Anderson cf7fb29227 - scheduler: add fine-grained "max jobs in progress" control.
You can now specify limits for specific apps,
    and/or for the project as a whole.
    Within each of these, you can specify limits on
    CPU jobs, GPU jobs, or total jobs.
    In the case of CPU and GPU limits, you can specify
    whether the limit should be scaled by the number of devices.

    Note: the enforcement of this is done in get_app_version(),
    since per-resource-type limits may dictate what app versions
    we can use for a particular job.

svn path=/trunk/boinc/; revision=21674
2010-06-01 23:41:07 +00:00
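
The limit scaling described in revision 21674 might look like the sketch below. The class, the function, and all parameter names except `per_proc` (which appears in a later commit on this page) are hypothetical, and only the CPU case is shown.

```python
from dataclasses import dataclass


@dataclass
class JobLimit:
    max_jobs: int
    per_proc: bool = False  # scale the limit by the number of devices

    def effective(self, ndevices: int) -> int:
        if self.per_proc:
            return self.max_jobs * max(ndevices, 1)
        return self.max_jobs


def can_send_cpu_job(cpu_in_progress, total_in_progress,
                     cpu_limit, total_limit, ncpus):
    """Return True if neither the CPU limit nor the total limit is reached."""
    return (cpu_in_progress < cpu_limit.effective(ncpus)
            and total_in_progress < total_limit.effective(1))


cpu_limit = JobLimit(max_jobs=2, per_proc=True)  # 2 jobs per CPU
total_limit = JobLimit(max_jobs=10)              # 10 jobs overall
```

Because the CPU and GPU limits are separate, a check like this has to run while choosing an app version, which is consistent with the note that enforcement happens in get_app_version().
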
David Anderson ca239d913a - scheduler: fix memory leak (free BEST_APP_VERSION objects)
svn path=/trunk/boinc/; revision=21597
2010-05-21 21:49:54 +00:00
David Anderson fa66519441 - scheduler: SETI@home's CUDA and CUDA 2.3 apps apparently don't
run on Fermi (compute capability 2) hardware.
    Temporary solution: change app_plan() accordingly
- scheduler: message tweaks

svn path=/trunk/boinc/; revision=21595
2010-05-20 22:49:00 +00:00
David Anderson 7a7cf4f5e7 - client, Unix: error checking in reading /proc entries.
Avoid garbage values e.g. of working_set_size
- scheduler: message tweaks

svn path=/trunk/boinc/; revision=21591
2010-05-20 17:50:00 +00:00
David Anderson 5470d7289a - scheduler: fix bug in daily job quota check
svn path=/trunk/boinc/; revision=21506
2010-05-13 16:45:27 +00:00
David Anderson 7688a6c5d6 - scheduler: fix for daily quota enforcement
svn path=/trunk/boinc/; revision=21495
2010-05-12 21:24:52 +00:00
David Anderson 63dcfabe0e - scheduler: changeset 21148 broke the scheduler.
We store pointers to BEST_APP_VERSION in both APP_VERSION and RESULT.
    We can't then fiddle with the vector that these point into.
    Switch back to using a vector of pointers.
    This restores the memory leak, which I'll deal with later.

svn path=/trunk/boinc/; revision=21494
2010-05-12 21:07:39 +00:00
David Anderson 021edb02c2 - back end programs: improve log msgs
svn path=/trunk/boinc/; revision=21193
2010-04-16 18:07:08 +00:00
David Anderson b2451544e1 - server: change the following from per-host to per-(host, app version):
- daily quota mechanism
    - reliable mechanism (accelerated retries)
    - "trusted" mechanism (adaptive replication)
- scheduler: enforce host scale probation only for apps with
    host_scale_check set.
- validator: do scale probation on invalid results
    (need this in addition to error and timeout cases)
- feeder: update app version scales every 10 min, not 10 sec
- back-end apps: support --foo as well as -foo for options

Notes:
- If you have, say, cuda, cuda23 and cuda_fermi plan classes,
    a host will have separate quotas for each one.
    That means it could error out on 100 jobs for cuda_fermi,
    and when its quota goes to zero,
    error out on 100 jobs for cuda23, etc.
    This is intentional; there may be cases where one version
    works but not the others.
- host.error_rate and host.max_results_day are deprecated

TODO:
    - the values in the app table for limits on jobs in progress etc.
        should override those in config.xml.

Implementation notes:
scheduler:
    process_request():
        read all host_app_versions for host at start;
        Compute "reliable" and "trusted" for each one.
        write modified records at end
    get_app_version():
        add "reliable_only" arg; if set, use only reliable versions
        skip over-quota versions
    Multi-pass scheduling: if have at least one reliable version,
        do a pass for jobs that need reliable,
        and use only reliable versions.
        Then clear best_app_versions cache.
    Score-based scheduling: for need-reliable jobs,
        it will pick the fastest version,
        then give a score bonus if that version happens to be reliable.
    When get back a successful result from client:
        increase daily quota
    When get back an error result from client:
        impose scale probation
        decrease daily quota if not aborted
Validator:
    when handling a WU, create a vector of HOST_APP_VERSION
        parallel to vector of RESULT.
        Pass it to assign_credit_set().
        Make copies of originals so we can update only modified ones
    update HOST_APP_VERSION error rates
Transitioner:
    decrease quota on timeout


svn path=/trunk/boinc/; revision=21181
2010-04-15 03:13:56 +00:00
David Anderson 2e41153d8b - scheduler: fix egregious bug which limited sending to 1 job per RPC
- scheduler: fix bug that broke anon platform

Note: Bruce Allen once advised me to take a few days and just
observe BOINC in action.
I should really do this more often; it always turns up bugs
and/or design flaws.


svn path=/trunk/boinc/; revision=21165
2010-04-11 04:42:52 +00:00
David Anderson e05a479f42 - scheduler and validator: distinguish between
1) peak FLOPS (based on benchmarks or GPU attributes).
        This does not change over time.
        It's not adjusted on the basis of statistics.
        It's not affected by wu.rsc_fpops_est.
        It can be compared across projects.
    versus
    2) projected FLOPS: the scheduler's best guess as to what will satisfy
        X * elapsed_time = wu.rsc_fpops_est;
        this is used to make server-side runtime estimates,
        and it's sent to the client and used for its runtime estimates.
        It may be based on the (host, app version) elapsed time average.
    My checkin [21153] mistakenly confounded these two.

    Notes:
    1) app_plan() now must return both peak and projected FLOPS.
    2) result.flops_estimate stores peak FLOPS
    3) the <flops> field in app_info.xml files should be
        projected FLOPS.  But its accuracy is not important;
        it's not used once the server has statistics
        for the (host, app version)

svn path=/trunk/boinc/; revision=21164
2010-04-10 05:49:51 +00:00
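
The distinction drawn in revision 21164 can be made concrete with a short calculation. The sketch assumes projected FLOPS is obtained by solving X * elapsed_time = wu.rsc_fpops_est for X using an average elapsed time; the function names are hypothetical.

```python
def projected_flops(rsc_fpops_est: float, avg_elapsed_time: float) -> float:
    """Solve X * elapsed_time = rsc_fpops_est for X."""
    return rsc_fpops_est / avg_elapsed_time


def runtime_estimate(rsc_fpops_est: float, flops: float) -> float:
    """Estimated runtime in seconds for a job of the given size."""
    return rsc_fpops_est / flops


peak = 10e9  # from benchmarks or GPU attributes; does not change over time
# Jobs estimated at 3.6e12 FLOPs have been averaging 1800 s on this host:
projected = projected_flops(rsc_fpops_est=3.6e12, avg_elapsed_time=1800)
# Runtime estimate for a new job twice that size, using projected FLOPS:
seconds = runtime_estimate(7.2e12, projected)
```

Using peak FLOPS for the same estimate would predict 720 s instead of 3600 s, which shows why the two quantities have to be kept apart.
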
David Anderson 1d765245ed - scheduler: sweeping changes to the way job runtimes are estimated:
see http://boinc.berkeley.edu/trac/wiki/RuntimeEstimation


svn path=/trunk/boinc/; revision=21153
2010-04-08 23:14:47 +00:00
David Anderson 85e06afe4b - scheduler: app_plan() no longer has to guess how efficiently
an app version will run on a particular host.
- scheduler: fix memory leak: BEST_APP_VERSIONs weren't being freed


svn path=/trunk/boinc/; revision=21148
2010-04-08 18:27:27 +00:00
David Anderson 71c7e7a74b - client/scheduler/web: add per-project preferences for whether
to accept CPU, NVIDIA and ATI jobs.
    These prefs are shown only where relevant:
    e.g., only for processor types for which the project has app versions,
    and if it has versions for only one type, no pref is shown.

    These prefs affect both client and scheduler.
    The client won't ask for work for a device blocked by prefs,
    and the scheduler won't send it.

    This replaces earlier optional project-specific prefs for
    "no CPU jobs" and "no GPU jobs".
    (However, these prefs continue to be honored on the server side).

- client: if NVIDIA driver is unknown, say that rather than 0


svn path=/trunk/boinc/; revision=19194
2009-09-28 04:24:18 +00:00
David Anderson eafb410cf8 - scheduler: simplify and fix the way that app_plan() conveys messages
to the user.  app_plan() now generates the messages directly
    rather than returning integer error codes.

svn path=/trunk/boinc/; revision=18899
2009-08-21 20:38:39 +00:00
David Anderson 9e9f2a9878 - scheduler: code cleanup
svn path=/trunk/boinc/; revision=18896
2009-08-21 19:14:15 +00:00
David Anderson 7278ab1787 - scheduler: add support for ATI GPUs
svn path=/trunk/boinc/; revision=18851
2009-08-17 17:07:38 +00:00
David Anderson b300519444
svn path=/trunk/boinc/; revision=18825
2009-08-10 04:49:02 +00:00
David Anderson f163897d8a - scheduler: add plan class for CUDA 2.3
svn path=/trunk/boinc/; revision=18804
2009-08-03 21:30:19 +00:00
David Anderson e3363c7eb8 - scheduler: on second thought, it would be better to add the above
feature without requiring use of score-based scheduling.
    So add a new customizable function, wu_is_infeasible_custom(),
    where projects can put job-specific checks.

    Also, move customizable functions (of which there are now 4)
    to a new file, sched_customize.cpp.

svn path=/trunk/boinc/; revision=18767
2009-07-29 18:55:50 +00:00