Commit Graph

33 Commits

Author SHA1 Message Date
David Anderson 58dadd91a8 - client, acct manager protocol:
allow <no_cpu>, <no_cuda> and <no_ati> bools
    within <account> in reply message.
    They suppress work fetch for that resource type from that project.
- scheduler:
    check max_granted_credit after wu.rsc_fpops_bound,
    so that max_granted_credit will be enforced
    even if wu.rsc_fpops_bound is absurdly high
    Fixes #1034.  From Diggory Hardy.


svn path=/trunk/boinc/; revision=22793
2010-12-02 04:53:12 +00:00
David Anderson b169e5ab0f - server programs: print error message instead of numeric retval
in log messages

svn path=/trunk/boinc/; revision=22647
2010-11-08 17:51:57 +00:00
David Anderson 8aa29bec33 - validator: fix another bug with --credit_from_wu
- make_project, update scripts: don't quit it user_profiles
    already exists


svn path=/trunk/boinc/; revision=22630
2010-11-05 17:15:27 +00:00
David Anderson 4edfe2ec28 - client: small initial checkin for new scheduling system.
Keep track of per-project recent estimated credit

svn path=/trunk/boinc/; revision=22608
2010-10-29 23:41:34 +00:00
David Anderson 5ef4dead7d - validator: need parens in boolean expression
svn path=/trunk/boinc/; revision=21814
2010-06-25 19:23:16 +00:00
David Anderson 7c51512cbf - transitioner: the format string for a DB query had %.15d instead of %.15e.
That produced a messed-up query that assigned garbage values to:
        host_app_version.turnaround_var
        host_app_version.turnaround_q
        host_app_version.max_jobs_per_day
        host_app_version.consecutive_valid
    To repair these:
        - set turnaround_var and turnaround_q to zero
        - if max_jobs_per_day is outside of
            (0..config.daily_result_quota)
            set it to config.daily_result_quota
        - if consecutive_valid is outside (0..1000), set it to zero
    I added a script, html/ops/repair_21812.php, that does this;
    if you ran server code between [21181] and [21812], run this script.
- scheduler/transitioner: add <debug_quota> log flag
- changed the build system to always use -Wall
    (if we'd done this before, this bug wouldn't have happened)
- fixed a bunch of other compile warnings


svn path=/trunk/boinc/; revision=21812
2010-06-25 18:54:37 +00:00
David Anderson 25f9b05bdb - validator: there were a couple of places where we needed to
scale wu.rsc_fpops_est by app.min_avg_pfc.
- validator: assume that app.min_avg_pfc is nonzero;
    it will be, since the DB default is now 1.

svn path=/trunk/boinc/; revision=21804
2010-06-24 22:17:33 +00:00
David Anderson 1250f41313 - validator: fix a divide by zero (happens w/ old clients
that don't report elapsed time)

svn path=/trunk/boinc/; revision=21788
2010-06-22 18:09:55 +00:00
David Anderson 9262cc8c1a - validator: fix possible divide-by-zero
- validator: when claimed credit is too high,
    assign standard credit rather than exiting.

svn path=/trunk/boinc/; revision=21783
2010-06-21 17:56:12 +00:00
David Anderson f2e8d4601b - validator: because of the above problem,
some results have flops_estimate == 0, which causes divide by zero.
    Check for this and use 1e10.

svn path=/trunk/boinc/; revision=21776
2010-06-18 22:27:09 +00:00
David Anderson 4147249de2 - server: delete old credit stuff
- user web: show host link in user result list.  Fixes #999


svn path=/trunk/boinc/; revision=21735
2010-06-12 22:08:15 +00:00
David Anderson ef0019d8c3 - validator: bug fixes: bad formula for low_average();
failure to reread app_versions because of 1e6/1e-6 typo


svn path=/trunk/boinc/; revision=21302
2010-04-26 23:12:40 +00:00
David Anderson 5035007b90 - back end: new way of deciding:
- whether host is "reliable" for an app version
    - whether host is eligible for single replication for an app version
    - whether to use host scaling
    In each case, the answer is yes if the number of
    consecutive valid results is above a threshold.
    This replaces existing "error rate" and "scale probation" mechanisms.

    TODO: the # of consecutive valid results should also determine
        a limit on jobs in progress for an app version.
        Namely, if N is the threshold for host scaling, the limit should be
            ndevices*(max(1, consecutive_valid - N))
        The client currently doesn't supply enough
        app version info to do this.
        It could be approximated; that would give some protection
        against cherry-picking.
- credit: more conservative formulas for combining claimed credit
    among replicas.
    If there are normal replicas, we use a "low average"
    that weights each sample by the sum of the other samples.
    Otherwise we use the min (not the average) of the approximate samples.

NOTE: a DB update is required


svn path=/trunk/boinc/; revision=21230
2010-04-21 19:33:20 +00:00
David Anderson 6893691ae2 - validator: message tweak
svn path=/trunk/boinc/; revision=21212
2010-04-19 22:57:49 +00:00
David Anderson 61195cb59d - validator: fix bug where host.total_credit not incremented
svn path=/trunk/boinc/; revision=21211
2010-04-19 21:46:45 +00:00
David Anderson b71d3e6cf4 - back end: typo and tweaks
svn path=/trunk/boinc/; revision=21196
2010-04-16 21:16:18 +00:00
David Anderson 021edb02c2 - back end programs: improve log msgs
svn path=/trunk/boinc/; revision=21193
2010-04-16 18:07:08 +00:00
David Anderson 02717af2f3 - bug fixes
svn path=/trunk/boinc/; revision=21187
2010-04-15 21:58:44 +00:00
David Anderson b2451544e1 - server: change the following from per-host to per-(host, app version):
- daily quota mechanism
    - reliable mechanism (accelerated retries)
    - "trusted" mechanism (adaptive replication)
- scheduler: enforce host scale probation only for apps with
    host_scale_check set.
- validator: do scale probation on invalid results
    (need this in addition to error and timeout cases)
- feeder: update app version scales every 10 min, not 10 sec
- back-end apps: support --foo as well as -foo for options

Notes:
- If you have, say, cuda, cuda23 and cuda_fermi plan classes,
    a host will have separate quotas for each one.
    That means it could error out on 100 jobs for cuda_fermi,
    and when its quota goes to zero,
    error out on 100 jobs for cuda23, etc.
    This is intentional; there may be cases where one version
    works but not the others.
- host.error_rate and host.max_results_day are deprecated

TODO:
    - the values in the app table for limits on jobs in progress etc.
        should override rather than config.xml.

Implementation notes:
scheduler:
    process_request():
        read all host_app_versions for host at start;
        Compute "reliable" and "trusted" for each one.
        write modified records at end
    get_app_version():
        add "reliable_only" arg; if set, use only reliable versions
        skip over-quota versions
    Multi-pass scheduling: if have at least one reliable version,
        do a pass for jobs that need reliable,
        and use only reliable versions.
        Then clear best_app_versions cache.
    Score-based scheduling: for need-reliable jobs,
        it will pick the fastest version,
        then give a score bonus if that version happens to be reliable.
    When get back a successful result from client:
        increase daily quota
    When get back an error result from client:
        impose scale probation
        decrease daily quota if not aborted
Validator:
    when handling a WU, create a vector of HOST_APP_VERSION
        parallel to vector of RESULT.
        Pass it to assign_credit_set().
        Make copies of originals so we can update only modified ones
    update HOST_APP_VERSION error rates
Transitioner:
    decrease quota on timeout


svn path=/trunk/boinc/; revision=21181
2010-04-15 03:13:56 +00:00
David Anderson e05a479f42 - scheduler and validator: distinguish between
1) peak FLOPS (based on benchmarks or GPU attributes).
        This does not change over time.
        It's not adjusted on the basis of statistics.
        It's not affected by wu.rsc_fpops_est.
        It can be compared across projects.
    versus
    2) projected FLOPS: the scheduler's best guess as to what will satisfy
        X * elapsed_time = wu.rsc_fpops_est;
        this is used to make server-side runtime estimates,
        and it's sent to the client and used for its runtime estimates.
        It may be based on the (host, app version) elapsed time average.
    My checkin [21153] mistakently confounded these two.

    Notes:
    1) app_plan() now must return both peak and projected FLOPS.
    2) result.flops_estimate stores peak FLOPS
    3) the <flops> field in app_info.xml files should be
        projected FLOPS.  But its accuracy is not important;
        it's not used once the server has statistics
        for the (host, app version)

svn path=/trunk/boinc/; revision=21164
2010-04-10 05:49:51 +00:00
David Anderson 1d765245ed - scheduler: sweeping changes to the way job runtimes are estimated:
see http://boinc.berkeley.edu/trac/wiki/RuntimeEstimation


svn path=/trunk/boinc/; revision=21153
2010-04-08 23:14:47 +00:00
David Anderson 212fb765e9 - validator: detect jobs that used GPU app but fell back to CPU
(SETI@home does this if GPU initialization fails).
    Treat these like CPU apps for credit purposes.

svn path=/trunk/boinc/; revision=21130
2010-04-06 23:48:35 +00:00
David Anderson e276aa5ed6 - server: make the -d 4 feature work with FCGI
svn path=/trunk/boinc/; revision=21109
2010-04-05 23:12:02 +00:00
David Anderson 2536797068 - validator: remove update_credit_per_cpu_sec(). Irrelevant.
TODO: remove related code
- validator: update wu.canonical_credit correctly.
    However, this field should be deprecated.
- validator: check for error return from assign_credit_set().

svn path=/trunk/boinc/; revision=21096
2010-04-05 20:03:54 +00:00
David Anderson a2a661993b - validator: -d 4 means -d 3 plus print all DB queries
(todo: do this for all daemons)
- validator: change cmdline args from -foo to --foo
    (todo: do this for all daemons)
- validator: pass max_granted_credit to assign_credit_set()

svn path=/trunk/boinc/; revision=21093
2010-04-05 18:59:16 +00:00
David Anderson 54dce55e98 - backend: fix scaling problem that was producing xe15 size credits.
This had messed up the beta DB, which I had to clean up.
    Added a cap (1e5) to prevent this in the future.

svn path=/trunk/boinc/; revision=21064
2010-04-02 23:18:47 +00:00
David Anderson 78d11a263b - backend: improved messages for app version credit updates
svn path=/trunk/boinc/; revision=21063
2010-04-02 21:45:43 +00:00
David Anderson 19f7d66b53 - backend programs: change the way PFC and elapsed-time statistics
are written to the DB.
    The incremental approach was bogus.
    New approach:
    host_app_version: write directly; R/W interval is tiny
    app_version: maintain an explicit list of update samples
        for both PFC and credit.
        When the validator flushes its app_version cache,
        do careful updates.
    Note: when using double fields in careful updates,
    you can't test for equality.  Use abs(new-old) < 1e-N

svn path=/trunk/boinc/; revision=21057
2010-04-02 19:10:37 +00:00
David Anderson 38bd1c8def - validator: improved log messages
- fix some compiler warnings


svn path=/trunk/boinc/; revision=21053
2010-04-01 22:51:19 +00:00
David Anderson fb851311e0 - server: various changes;
see http://boinc.berkeley.edu/trac/wiki/CreditNew

    Projects will need to update DB and recompile all back-end programs.

    Summary:
    - new way of computing credit
    - "reliable host" mechanism is per app version
    - "host punishment" mechanism is per app version
    - adjustment of wu.rsc_fpops_est provides the
        equivalent of per app version DCF
    - max jobs in progress is now per app
    - max jobs per RPC is now per app

    TODO:
    - reliable mechanism:
        - populate and use host_app_version.error_rate
        - populate host_app_version.turnaround
    - host punishment:
        - populate host_app_version.max_jobs_per_day
        - populate host_app_version.n_jobs_today
        - use app.max_jobs_per_day_init
    - job limits:
        - use app.max_jobs_in_progress, max_gpu_jobs_in_progress
        - use app.max_jobs_per_rpc
    - adjust wu.rsc_fpops_est
    - remove old credit stuff
        fpops_cumulative, credit_multiplier
        credit computation in scheduler

- AVERAGE class: use the Knuth algorithm (Wikipedia)


svn path=/trunk/boinc/; revision=21021
2010-03-29 22:28:20 +00:00
David Anderson 3fb7c8f13f - server code: moved everything related to credit-granting to credit.cpp,
where it can be used by trickle handlers as well as by validators.

svn path=/trunk/boinc/; revision=18831
2009-08-12 16:26:43 +00:00
David Anderson 3eeefc0048 - server code cleanup
svn path=/trunk/boinc/; revision=18830
2009-08-12 16:01:46 +00:00
David Anderson f6d3e8a477 svn path=/trunk/boinc/; revision=18829 2009-08-11 15:17:37 +00:00