Commit Graph

547 Commits

Author SHA1 Message Date
David Anderson 7c51512cbf - transitioner: the format string for a DB query had %.15d instead of %.15e.
That produced a messed-up query that assigned garbage values to:
        host_app_version.turnaround_var
        host_app_version.turnaround_q
        host_app_version.max_jobs_per_day
        host_app_version.consecutive_valid
    To repair these:
        - set turnaround_var and turnaround_q to zero
        - if max_jobs_per_day is outside of
            (0..config.daily_result_quota)
            set it to config.daily_result_quota
        - if consecutive_valid is outside (0..1000), set it to zero
    I added a script, html/ops/repair_21812.php, that does this;
    if you ran server code between [21181] and [21812], run this script.
- scheduler/transitioner: add <debug_quota> log flag
- changed the build system to always use -Wall
    (if we'd done this before, this bug wouldn't have happened)
- fixed a bunch of other compile warnings


svn path=/trunk/boinc/; revision=21812
2010-06-25 18:54:37 +00:00
David Anderson e260b47bd5 - database: app.min_avg_pfc should default to 1, not 0
svn path=/trunk/boinc/; revision=21763
2010-06-17 16:44:33 +00:00
David Anderson ae7866b251 - scheduler: restore scaling of daily quota by # processors
and/or config.gpu_multiplier
- client: msg tweak

svn path=/trunk/boinc/; revision=21753
2010-06-15 22:21:57 +00:00
David Anderson 4147249de2 - server: delete old credit stuff
- user web: show host link in user result list.  Fixes #999


svn path=/trunk/boinc/; revision=21735
2010-06-12 22:08:15 +00:00
David Anderson 8b836a391b - database: remove unused fields from app table
svn path=/trunk/boinc/; revision=21728
2010-06-11 03:50:47 +00:00
David Anderson 89fab4ece5 - back end: change "daily result quota" mechanism.
Old: config.xml specifies an initial daily quota (say, 100).
        Each host_app_version starts out with this quota.
        On the return of a SUCCESS result,
        the quota is doubled, up to the initial value.
        On the return of an error result, or a timeout,
        the quota is decremented down to 1.
    Problem:
        Doesn't accommodate hosts that can do more than 100 jobs/day.
    New: similar, but
        - on validation of a job, daily quota is incremented.
        - on invalidation of a job, daily quota is decremented.
        - on return of an error result, or a timeout,
            daily quota is min'd with initial quota, then decremented.
    Notes:
        - This allows a host to have an unboundedly large quota
            as long as it continues to return more valid
            than invalid results.
        - Even with this change, hosts that return SUCCESS but
            invalid results will continue to get the initial daily quota.
            It would be desirable to reduce their quota to 1.

svn path=/trunk/boinc/; revision=21675
2010-06-02 00:11:01 +00:00
David Anderson 7daae1d0c7 - client: when emerge from bandwidth quota network suspension,
add 0..1hr random delay to existing transfers,
    to avoid DDOS effect

svn path=/trunk/boinc/; revision=21415
2010-05-07 20:08:59 +00:00
David Anderson ef0019d8c3 - validator: bug fixes: bad formula for low_average();
failure to reread app_versions because of 1e6/1e-6 typo


svn path=/trunk/boinc/; revision=21302
2010-04-26 23:12:40 +00:00
David Anderson c4df1f3104 svn path=/trunk/boinc/; revision=21232 2010-04-21 20:11:41 +00:00
David Anderson 5035007b90 - back end: new way of deciding:
- whether host is "reliable" for an app version
    - whether host is eligible for single replication for an app version
    - whether to use host scaling
    In each case, the answer is yes if the number of
    consecutive valid results is above a threshold.
    This replaces existing "error rate" and "scale probation" mechanisms.

    TODO: the # of consecutive valid results should also determine
        a limit on jobs in progress for an app version.
        Namely, if N is the threshold for host scaling, the limit should be
            ndevices*(max(1, consecutive_valid - N))
        The client currently doesn't supply enough
        app version info to do this.
        It could be approximated; that would give some protection
        against cherry-picking.
- credit: more conservative formulas for combining claimed credit
    among replicas.
    If there are normal replicas, we use a "low average"
    that weights each sample by the sum of the other samples.
    Otherwise we use the min (not the average) of the approximate samples.

NOTE: a DB update is required


svn path=/trunk/boinc/; revision=21230
2010-04-21 19:33:20 +00:00
David Anderson 61195cb59d - validator: fix bug where host.total_credit not incremented
svn path=/trunk/boinc/; revision=21211
2010-04-19 21:46:45 +00:00
David Anderson 021edb02c2 - back end programs: improve log msgs
svn path=/trunk/boinc/; revision=21193
2010-04-16 18:07:08 +00:00
David Anderson 02717af2f3 - bug fixes
svn path=/trunk/boinc/; revision=21187
2010-04-15 21:58:44 +00:00
David Anderson b2451544e1 - server: change the following from per-host to per-(host, app version):
- daily quota mechanism
    - reliable mechanism (accelerated retries)
    - "trusted" mechanism (adaptive replication)
- scheduler: enforce host scale probation only for apps with
    host_scale_check set.
- validator: do scale probation on invalid results
    (need this in addition to error and timeout cases)
- feeder: update app version scales every 10 min, not 10 sec
- back-end apps: support --foo as well as -foo for options

Notes:
- If you have, say, cuda, cuda23 and cuda_fermi plan classes,
    a host will have separate quotas for each one.
    That means it could error out on 100 jobs for cuda_fermi,
    and when its quota goes to zero,
    error out on 100 jobs for cuda23, etc.
    This is intentional; there may be cases where one version
    works but not the others.
- host.error_rate and host.max_results_day are deprecated

TODO:
    - the values in the app table for limits on jobs in progress etc.
        should override rather than config.xml.

Implementation notes:
scheduler:
    process_request():
        read all host_app_versions for host at start;
        Compute "reliable" and "trusted" for each one.
        write modified records at end
    get_app_version():
        add "reliable_only" arg; if set, use only reliable versions
        skip over-quota versions
    Multi-pass scheduling: if have at least one reliable version,
        do a pass for jobs that need reliable,
        and use only reliable versions.
        Then clear best_app_versions cache.
    Score-based scheduling: for need-reliable jobs,
        it will pick the fastest version,
        then give a score bonus if that version happens to be reliable.
    When get back a successful result from client:
        increase daily quota
    When get back an error result from client:
        impose scale probation
        decrease daily quota if not aborted
Validator:
    when handling a WU, create a vector of HOST_APP_VERSION
        parallel to vector of RESULT.
        Pass it to assign_credit_set().
        Make copies of originals so we can update only modified ones
    update HOST_APP_VERSION error rates
Transitioner:
    decrease quota on timeout


svn path=/trunk/boinc/; revision=21181
2010-04-15 03:13:56 +00:00
David Anderson e276aa5ed6 - server: make the -d 4 feature work with FCGI
svn path=/trunk/boinc/; revision=21109
2010-04-05 23:12:02 +00:00
David Anderson 2536797068 - validator: remove update_credit_per_cpu_sec(). Irrelevant.
TODO: remove related code
- validator: update wu.canonical_credit correctly.
    However, this field should be deprecated.
- validator: check for error return from assign_credit_set().

svn path=/trunk/boinc/; revision=21096
2010-04-05 20:03:54 +00:00
David Anderson a5c2f98481 - backend: make "print queries" a runtime instead of compile-time
decision (bool g_print_queries)


svn path=/trunk/boinc/; revision=21065
2010-04-03 00:02:38 +00:00
David Anderson 19f7d66b53 - backend programs: change the way PFC and elapsed-time statistics
are written to the DB.
    The incremental approach was bogus.
    New approach:
    host_app_version: write directly; R/W interval is tiny
    app_version: maintain an explicit list of update samples
        for both PFC and credit.
        When the validator flushes its app_version cache,
        do careful updates.
    Note: when using double fields in careful updates,
    you can't test for equality.  Use abs(new-old) < 1e-N

svn path=/trunk/boinc/; revision=21057
2010-04-02 19:10:37 +00:00
David Anderson 2335e009bb svn path=/trunk/boinc/; revision=21022 2010-03-29 22:29:21 +00:00
David Anderson fb851311e0 - server: various changes;
see http://boinc.berkeley.edu/trac/wiki/CreditNew

    Projects will need to update DB and recompile all back-end programs.

    Summary:
    - new way of computing credit
    - "reliable host" mechanism is per app version
    - "host punishment" mechanism is per app version
    - adjustment of wu.rsc_fpops_est provides the
        equivalent of per app version DCF
    - max jobs in progress is now per app
    - max jobs per RPC is now per app

    TODO:
    - reliable mechanism:
        - populate and use host_app_version.error_rate
        - populate host_app_version.turnaround
    - host punishment:
        - populate host_app_version.max_jobs_per_day
        - populate host_app_version.n_jobs_today
        - use app.max_jobs_per_day_init
    - job limits:
        - use app.max_jobs_in_progress, max_gpu_jobs_in_progress
        - use app.max_jobs_per_rpc
    - adjust wu.rsc_fpops_est
    - remove old credit stuff
        fpops_cumulative, credit_multiplier
        credit computation in scheduler

- AVERAGE class: use the Knuth algorithm (Wikipedia)


svn path=/trunk/boinc/; revision=21021
2010-03-29 22:28:20 +00:00
David Anderson 836b935234 - fix bugs in strcasestr(), which apparently had never been tested
- fix typo in schema

svn path=/trunk/boinc/; revision=20923
2010-03-16 17:20:14 +00:00
David Anderson 3452bbbc87 - GUI RPC: more replacement of std::string
svn path=/trunk/boinc/; revision=20889
2010-03-13 04:40:37 +00:00
David Anderson 85d0027d3b - server: DB update queries check that the number of affected rows is 1.
However, MySQL's default is that "affected rows" is
    rows actually modified, which is not what we want.
    Use the CLIENT_FOUND_ROWS option in mysql_real_connect()
    to change the semantics to "rows matched".
    From Oliver Bock.

svn path=/trunk/boinc/; revision=20880
2010-03-12 19:36:20 +00:00
David Anderson 4f77556c74 - client: if a GPU job is blocked on available mem,
don't fetch more jobs for that resource type

svn path=/trunk/boinc/; revision=20817
2010-03-10 06:00:37 +00:00
David Anderson 0ad0886df3 - server credit stuff.
New policy: anon platform and old platform jobs
    get average credit, possibly scaled by elapsed time.
    We no longer attempt to guess what app version produced them.

svn path=/trunk/boinc/; revision=20816
2010-03-10 00:33:31 +00:00
David Anderson 8062f21d59 - server credit stuff (partial checkin)
svn path=/trunk/boinc/; revision=20810
2010-03-09 04:15:10 +00:00
David Anderson 295d4b54ea - server: major improvements to locality scheduling from Einstein@home.
Triggering the work generator is now done via the DB
    instead of flat files.

    Since only E@h uses locality scheduling,
    I kept the DB changes in a separate file (db/schema_locality.sql).
    There's a new field in the workunit table,
    and that's a required update (in db_update.php)
- manager: compile fix


svn path=/trunk/boinc/; revision=20807
2010-03-05 22:55:16 +00:00
David Anderson 9020d0b715 - server: if MySQL version is 5.0.19 <= v < 5.1,
set the reconnect option before real_connect() instead of after.
    From Oliver Bock.

svn path=/trunk/boinc/; revision=20763
2010-03-01 19:12:19 +00:00
David Anderson 09b0a9f93c - admin web: reorganize main page;
add "transition all" command

svn path=/trunk/boinc/; revision=20745
2010-02-26 21:34:20 +00:00
David Anderson 65b581b28d svn path=/trunk/boinc/; revision=20316 2010-01-29 16:28:46 +00:00
David Anderson 5661321de3 - feeder: fix crashing bug
svn path=/trunk/boinc/; revision=19509
2009-11-06 23:31:42 +00:00
David Anderson 53aa10570a - feeder: fix crashing bug
svn path=/branches/server_stable/; revision=19508
2009-11-06 23:31:26 +00:00
David Anderson 8caa2cf3d5 - test code for new credit system
svn path=/trunk/boinc/; revision=19462
2009-11-04 21:23:56 +00:00
David Anderson c76d166344 - user web: forum_preferences.{low_rating_threshold, high_rating_threshold}
were deprecated, but were still used in deciding whether to show a post.
    This broke the "ignore list" function.

svn path=/trunk/boinc/; revision=19218
2009-09-29 21:58:54 +00:00
David Anderson 381a15c724 - create_work function and script:
check for valid ordering among max_success_results,
    max_total_results, max_error_results, and target_nresults

svn path=/trunk/boinc/; revision=19054
2009-09-16 03:10:22 +00:00
David Anderson da7e82fe15 - scheduler and back end: add new fields to result table:
elapsed_time: the elapsed time (runtime) as reported by client
    flops_estimate: the app's estimated FLOPS as reported by app_plan()
    app_version_id: the DB ID of the app_version used
        (or -1 if anonymous platform)
    TODO: show these in the web interfaces,
    and use them where appropriate

svn path=/trunk/boinc/; revision=19002
2009-09-03 20:26:31 +00:00
David Anderson 8b701fc73f - scheduler: fix messed-up deadline check logic.
Old:
        1) check deadline based on wu.delay_bound
        2) in add_result_to_reply(), potentially modify wu.delay_bound,
            e.g. because of retry acceleration
        problem: reducing delay bound may cause deadline miss
    New:
        1) new function get_delay_bound_range()
            (called from wu_is_infeasible_fast())
            returns optimistic and pessimistic delay bounds.
            Retry acceleration logic is here.
        2) check deadline based on optimistic bound;
            if that fails, check based on pessimistic bound.
            Set wu.delay_bound to the one that worked.
    Notes:
    - get_delay_bound_range() needs result priority and report deadline,
        and it's called before we read the full result.
        So add these items to WORK_ITEM and WU_RESULT.
    - get_delay_bound_range() could be customized for
        project-specific deadline policy.
    - add_result_to_reply() was becoming a toxic waste dump.
        Deadline-related stuff should have been factored out in any case.

svn path=/trunk/boinc/; revision=18946
2009-08-31 19:35:46 +00:00
David Anderson 12d4b978be - scheduler: if client request uses a weak authenticator,
don't modify user preferences or CPID.
- client: fix bug that shows ATI version incorrectly
- database: host.posts has been repurposed as a salt (or seqno)
    for a new type of weak authenticator that won't depend on password
- web code:
    modify forum_preferences.posts instead of host.posts.
    (actually, the former isn't used either, we just do a select count(*);
    should fix this at some point).

svn path=/trunk/boinc/; revision=18865
2009-08-18 20:44:12 +00:00
David Anderson 8dc464f1e4 - fix typo in schema
svn path=/trunk/boinc/; revision=18506
2009-06-27 21:10:40 +00:00
Jeff Cobb 15ccf7b778 Added table state_counts.
svn path=/trunk/boinc/; revision=18490
2009-06-23 21:45:22 +00:00
David Anderson 10f9e11ee6 - lib: created a new file for declaring "replacements"
for functions like strlcpy() etc.
    config.h is included here rather than in str_util.h


svn path=/trunk/boinc/; revision=18437
2009-06-16 20:54:44 +00:00
David Anderson c86419a9ff - DB: for tables w/ fulltext indices, specify engine as MyISAM
from Nicolas; fixes #904

svn path=/trunk/boinc/; revision=18201
2009-05-26 16:56:00 +00:00
David Anderson 323fdc0e21 - DB code: fixed three places where we accessed a MYSQL_ROW
after freeing the MYSQL_RES it came from.
    (this didn't appear to cause any problems, but not good form).
    Fixes #883

svn path=/trunk/boinc/; revision=17904
2009-04-28 19:20:23 +00:00
David Anderson 04cdfe9cab - scheduler and web: add a project preference for whether to use the CPU.
This complements the "use GPU?" pref.
    Neither should be necessary, but what the heck.

svn path=/trunk/boinc/; revision=17628
2009-03-18 21:14:44 +00:00
Eric J. Korpela 5fe6e9a098 Added include of Makefile.incl and "if INSTALL_HEADERS" around
pkginclude_HEADERS=


svn path=/trunk/boinc/; revision=17563
2009-03-09 15:38:21 +00:00
David Anderson 65679139c5 - scheduler: make host.p_features available to app_plan()
svn path=/trunk/boinc/; revision=17307
2009-02-19 15:43:37 +00:00
David Anderson 85a8e6a772 - scheduler: remove the config flag <have_cuda_apps>,
and add <cuda_multiplier>.
    The latter is used in calculating max jobs/day for a host;
    namely, it's host.max_results_day * (NCPUS + NCUDA*cuda_multiplier).
    Set it to 10 or so if you have CUDA apps.
- scheduler: don't overload effective_ncpus();
    instead, add two new functions,
    max_results_day_multiplier() and max_wus_in_progress_multiplier()
- scheduler: don't reduce max_results_day if we get an aborted job
    (it might have been aborted by the project;
    not appopriate to punish host in this case)

svn path=/trunk/boinc/; revision=16959
2009-01-20 00:54:16 +00:00
David Anderson 4a65681176 - scheduler: if client has coprocs,
put a textual summary of them in host.serialnum (currently unused)
- web: show coprocs on host detail page
- db_dump: include coproc info in host XML

svn path=/trunk/boinc/; revision=16697
2008-12-16 18:46:28 +00:00
Jeff Cobb 8b18ab4a16 Added result userid and teamid to the column set in DB_VALIDATOR_ITEM_SET.
This is needed by application validator code for cuda logging.

svn path=/trunk/boinc/; revision=16673
2008-12-12 17:54:01 +00:00
David Anderson 79fb6e969e - Remove the notion of "CPU efficiency" from both client and server.
This wasn't being measured correctly for coproc/multithread apps,
    and its effect is now subsumed in DCF.

svn path=/trunk/boinc/; revision=16610
2008-12-03 19:50:06 +00:00