Commit Graph

181 Commits

Author SHA1 Message Date
David Anderson 3355b66241 - client and scheduler: a client host may have multiple VM systems installed.
TODO: check for VirtualBox on Mac, Linux

svn path=/trunk/boinc/; revision=22704
2010-11-17 23:19:07 +00:00
Rom Walton 1564a49816 - sched: Parse the detected virtual machine software from
the scheduler request so it can be used in plan classes.
        
    db/
        boinc_db.h
    sched/
        sched_types.cpp

svn path=/trunk/boinc/; revision=22703
2010-11-17 20:52:01 +00:00
David Anderson ae7866b251 - scheduler: restore scaling of daily quota by # processors
and/or config.gpu_multiplier
- client: msg tweak

svn path=/trunk/boinc/; revision=21753
2010-06-15 22:21:57 +00:00
David Anderson 4147249de2 - server: delete old credit stuff
- user web: show host link in user result list.  Fixes #999


svn path=/trunk/boinc/; revision=21735
2010-06-12 22:08:15 +00:00
David Anderson 8b836a391b - database: remove unused fields from app table
svn path=/trunk/boinc/; revision=21728
2010-06-11 03:50:47 +00:00
David Anderson c4df1f3104 svn path=/trunk/boinc/; revision=21232 2010-04-21 20:11:41 +00:00
David Anderson 5035007b90 - back end: new way of deciding:
- whether host is "reliable" for an app version
    - whether host is eligible for single replication for an app version
    - whether to use host scaling
    In each case, the answer is yes if the number of
    consecutive valid results is above a threshold.
    This replaces existing "error rate" and "scale probation" mechanisms.

    TODO: the # of consecutive valid results should also determine
        a limit on jobs in progress for an app version.
        Namely, if N is the threshold for host scaling, the limit should be
            ndevices*(max(1, consecutive_valid - N))
        The client currently doesn't supply enough
        app version info to do this.
        It could be approximated; that would give some protection
        against cherry-picking.
- credit: more conservative formulas for combining claimed credit
    among replicas.
    If there are normal replicas, we use a "low average"
    that weights each sample by the sum of the other samples.
    Otherwise we use the min (not the average) of the approximate samples.

NOTE: a DB update is required


svn path=/trunk/boinc/; revision=21230
2010-04-21 19:33:20 +00:00
David Anderson 021edb02c2 - back end programs: improve log msgs
svn path=/trunk/boinc/; revision=21193
2010-04-16 18:07:08 +00:00
David Anderson b2451544e1 - server: change the following from per-host to per-(host, app version):
- daily quota mechanism
    - reliable mechanism (accelerated retries)
    - "trusted" mechanism (adaptive replication)
- scheduler: enforce host scale probation only for apps with
    host_scale_check set.
- validator: do scale probation on invalid results
    (need this in addition to error and timeout cases)
- feeder: update app version scales every 10 min, not 10 sec
- back-end apps: support --foo as well as -foo for options

Notes:
- If you have, say, cuda, cuda23 and cuda_fermi plan classes,
    a host will have separate quotas for each one.
    That means it could error out on 100 jobs for cuda_fermi,
    and when its quota goes to zero,
    error out on 100 jobs for cuda23, etc.
    This is intentional; there may be cases where one version
    works but not the others.
- host.error_rate and host.max_results_day are deprecated

TODO:
    - the values in the app table for limits on jobs in progress etc.
        should override rather than config.xml.

Implementation notes:
scheduler:
    process_request():
        read all host_app_versions for host at start;
        Compute "reliable" and "trusted" for each one.
        write modified records at end
    get_app_version():
        add "reliable_only" arg; if set, use only reliable versions
        skip over-quota versions
    Multi-pass scheduling: if have at least one reliable version,
        do a pass for jobs that need reliable,
        and use only reliable versions.
        Then clear best_app_versions cache.
    Score-based scheduling: for need-reliable jobs,
        it will pick the fastest version,
        then give a score bonus if that version happens to be reliable.
    When get back a successful result from client:
        increase daily quota
    When get back an error result from client:
        impose scale probation
        decrease daily quota if not aborted
Validator:
    when handling a WU, create a vector of HOST_APP_VERSION
        parallel to vector of RESULT.
        Pass it to assign_credit_set().
        Make copies of originals so we can update only modified ones
    update HOST_APP_VERSION error rates
Transitioner:
    decrease quota on timeout


svn path=/trunk/boinc/; revision=21181
2010-04-15 03:13:56 +00:00
David Anderson 2536797068 - validator: remove update_credit_per_cpu_sec(). Irrelevant.
TODO: remove related code
- validator: update wu.canonical_credit correctly.
    However, this field should be deprecated.
- validator: check for error return from assign_credit_set().

svn path=/trunk/boinc/; revision=21096
2010-04-05 20:03:54 +00:00
David Anderson 19f7d66b53 - backend programs: change the way PFC and elapsed-time statistics
are written to the DB.
    The incremental approach was bogus.
    New approach:
    host_app_version: write directly; R/W interval is tiny
    app_version: maintain an explicit list of update samples
        for both PFC and credit.
        When the validator flushes its app_version cache,
        do careful updates.
    Note: when using double fields in careful updates,
    you can't test for equality.  Use abs(new-old) < 1e-N

svn path=/trunk/boinc/; revision=21057
2010-04-02 19:10:37 +00:00
David Anderson fb851311e0 - server: various changes;
see http://boinc.berkeley.edu/trac/wiki/CreditNew

    Projects will need to update DB and recompile all back-end programs.

    Summary:
    - new way of computing credit
    - "reliable host" mechanism is per app version
    - "host punishment" mechanism is per app version
    - adjustment of wu.rsc_fpops_est provides the
        equivalent of per app version DCF
    - max jobs in progress is now per app
    - max jobs per RPC is now per app

    TODO:
    - reliable mechanism:
        - populate and use host_app_version.error_rate
        - populate host_app_version.turnaround
    - host punishment:
        - populate host_app_version.max_jobs_per_day
        - populate host_app_version.n_jobs_today
        - use app.max_jobs_per_day_init
    - job limits:
        - use app.max_jobs_in_progress, max_gpu_jobs_in_progress
        - use app.max_jobs_per_rpc
    - adjust wu.rsc_fpops_est
    - remove old credit stuff
        fpops_cumulative, credit_multiplier
        credit computation in scheduler

- AVERAGE class: use the Knuth algorithm (Wikipedia)


svn path=/trunk/boinc/; revision=21021
2010-03-29 22:28:20 +00:00
David Anderson 4f77556c74 - client: if a GPU job is blocked on available mem,
don't fetch more jobs for that resource type

svn path=/trunk/boinc/; revision=20817
2010-03-10 06:00:37 +00:00
David Anderson 0ad0886df3 - server credit stuff.
New policy: anon platform and old platform jobs
    get average credit, possibly scaled by elapsed time.
    We no longer attempt to guess what app version produced them.

svn path=/trunk/boinc/; revision=20816
2010-03-10 00:33:31 +00:00
David Anderson 8062f21d59 - server credit stuff (partial checkin)
svn path=/trunk/boinc/; revision=20810
2010-03-09 04:15:10 +00:00
David Anderson 295d4b54ea - server: major improvements to locality scheduling from Einstein@home.
Triggering the work generator is now done via the DB
    instead of flat files.

    Since only E@h uses locality scheduling,
    I kept the DB changes in a separate file (db/schema_locality.sql).
    There's a new field in the workunit table,
    and that's a required update (in db_update.php)
- manager: compile fix


svn path=/trunk/boinc/; revision=20807
2010-03-05 22:55:16 +00:00
David Anderson 09b0a9f93c - admin web: reorganize main page;
add "transition all" command

svn path=/trunk/boinc/; revision=20745
2010-02-26 21:34:20 +00:00
David Anderson 8caa2cf3d5 - test code for new credit system
svn path=/trunk/boinc/; revision=19462
2009-11-04 21:23:56 +00:00
David Anderson 381a15c724 - create_work function and script:
check for valid ordering among max_success_results,
    max_total_results, max_error_results, and target_nresults

svn path=/trunk/boinc/; revision=19054
2009-09-16 03:10:22 +00:00
David Anderson da7e82fe15 - scheduler and back end: add new fields to result table:
elapsed_time: the elapsed time (runtime) as reported by client
    flops_estimate: the app's estimated FLOPS as reported by app_plan()
    app_version_id: the DB ID of the app_version used
        (or -1 if anonymous platform)
    TODO: show these in the web interfaces,
    and use them where appropriate

svn path=/trunk/boinc/; revision=19002
2009-09-03 20:26:31 +00:00
David Anderson 8b701fc73f - scheduler: fix messed-up deadline check logic.
Old:
        1) check deadline based on wu.delay_bound
        2) in add_result_to_reply(), potentially modify wu.delay_bound,
            e.g. because of retry acceleration
        problem: reducing delay bound may cause deadline miss
    New:
        1) new function get_delay_bound_range()
            (called from wu_is_infeasible_fast())
            returns optimistic and pessimistic delay bounds.
            Retry acceleration logic is here.
        2) check deadline based on optimistic bound;
            if that fails, check based on pessimistic bound.
            Set wu.delay_bound to the one that worked.
    Notes:
    - get_delay_bound_range() needs result priority and report deadline,
        and it's called before we read the full result.
        So add these items to WORK_ITEM and WU_RESULT.
    - get_delay_bound_range() could be customized for
        project-specific deadline policy.
    - add_result_to_reply() was becoming a toxic waste dump.
        Deadline-related stuff should have been factored out in any case.

svn path=/trunk/boinc/; revision=18946
2009-08-31 19:35:46 +00:00
David Anderson 12d4b978be - scheduler: if client request uses a weak authenticator,
don't modify user preferences or CPID.
- client: fix bug that shows ATI version incorrectly
- database: host.posts has been repurposed as a salt (or seqno)
    for a new type of weak authenticator that won't depend on password
- web code:
    modify forum_preferences.posts instead of host.posts.
    (actually, the former isn't used either, we just do a select count(*);
    should fix this at some point).

svn path=/trunk/boinc/; revision=18865
2009-08-18 20:44:12 +00:00
Jeff Cobb 15ccf7b778 Added table state_counts.
svn path=/trunk/boinc/; revision=18490
2009-06-23 21:45:22 +00:00
David Anderson 04cdfe9cab - scheduler and web: add a project preference for whether to use the CPU.
This complements the "use GPU?" pref.
    Neither should be necessary, but what the heck.

svn path=/trunk/boinc/; revision=17628
2009-03-18 21:14:44 +00:00
David Anderson 65679139c5 - scheduler: make host.p_features available to app_plan()
svn path=/trunk/boinc/; revision=17307
2009-02-19 15:43:37 +00:00
David Anderson 85a8e6a772 - scheduler: remove the config flag <have_cuda_apps>,
and add <cuda_multiplier>.
    The latter is used in calculating max jobs/day for a host;
    namely, it's host.max_results_day * (NCPUS + NCUDA*cuda_multiplier).
    Set it to 10 or so if you have CUDA apps.
- scheduler: don't overload effective_ncpus();
    instead, add two new functions,
    max_results_day_multiplier() and max_wus_in_progress_multiplier()
- scheduler: don't reduce max_results_day if we get an aborted job
    (it might have been aborted by the project;
    not appopriate to punish host in this case)

svn path=/trunk/boinc/; revision=16959
2009-01-20 00:54:16 +00:00
David Anderson 4a65681176 - scheduler: if client has coprocs,
put a textual summary of them in host.serialnum (currently unused)
- web: show coprocs on host detail page
- db_dump: include coproc info in host XML

svn path=/trunk/boinc/; revision=16697
2008-12-16 18:46:28 +00:00
David Anderson 79fb6e969e - Remove the notion of "CPU efficiency" from both client and server.
This wasn't being measured correctly for coproc/multithread apps,
    and its effect is now subsumed in DCF.

svn path=/trunk/boinc/; revision=16610
2008-12-03 19:50:06 +00:00
David Anderson f17a800807 - API: in boinc_exit(), release the lockfile only if
we're the main program (otherwise we didn't lock it in
    the first place, and a crash results).  From Artyom Sharov.
- scheduler: add support for the GCL simulator,
    which uses special versions of backend programs
    that use virtual time,
    and that wait for signals instead of sleep()ing.

    To compile:
        make clean
        configure CXXFLAGS="-DGCL_SIMULATOR"
        make

svn path=/trunk/boinc/; revision=16038
2008-09-22 20:33:59 +00:00
David Anderson 4f66bb4c95 - added copyright and license info to .C, .cpp, .h files
- scheduler: fix bug in adaptive replication:
    if send an unreplicated job to untrusted host,
    set both wu.target_nresults and wu.min_quorum to app.target_nresults.

svn path=/trunk/boinc/; revision=15762
2008-08-06 18:36:30 +00:00
Eric J. Korpela a5a6f693cd - Implementation of automatic credit leveling for cpu based projects that
wish to use it.
- The script calculate_credit_multiplier (expected to be run daily as
      a config.xml task) looks at the ratio of granted credit to CPU time 
      for recent results for each app.  Multiplier is calculated to cause 
      median hosts granted credit per cpu second to equal to equal that 
      expected from its benchmarks.  This is 30-day exponentially averaged 
      with the previous value of the multplier and stored in the table 
      credit_multplier.
- When a result is received the server adjusts claimed credit by the
      value the multiplier had when the result was sent.


svn path=/trunk/boinc/; revision=15661
2008-07-22 23:36:55 +00:00
David Anderson 0e03df254b - Back end: add adaptive validation feature
(DB update required)
- Fixed typo in Eric's 5/28 checkin

svn path=/trunk/boinc/; revision=15357
2008-06-04 23:04:12 +00:00
David Anderson 6af9f66b4e - DB/feeder/scheduler: change app_version.xml_doc from blob to mediumblob,
and change the correspending structure field from 64KB to 256KB
    (could increase this if needed).
    This is needed to handle app versions with lots (> 100) of files
- change LARGE_BLOB_SIZE to BLOB_SIZE a bunch of places
- Change COPROCS from vector<COPROC> to vector<COPROC*>.
    Otherwise the right virtual functions of COPROCs don't get called

svn path=/trunk/boinc/; revision=14986
2008-03-31 16:19:45 +00:00
David Anderson 4554fa5ce3 - server and client:
in server->client reply messages and in the client itself,
    move app-planning info from RESULT to APP_VERSION.
    This was necessary to allow anonymous platform info (app_info.xml)
    to specify avg_ncpus, etc.
    e.g., if someone wants to write a multithread version of SETI@home,
    or a GPU/CUDA version,
    they can run it using the anonymous platform mechanism
    and it will be scheduled correctly.

    If a server sends an existing APP_VERSION but with different
    app-planning info, the client will accept and use the new info.

svn path=/trunk/boinc/; revision=14978
2008-03-28 18:00:27 +00:00
David Anderson 13400c9516 Changes for multithread app support:
- update_versions: use __ (not :) as separator for plan class
- client: add plan_class to APP_VERSION;
    an app version is now identified by platform/version/plan_class
- client CPU scheduler: don't assume apps use 1 CPU
- client: add avg_ncpus, max_cpus, flops, cmdline to RESULT
- scheduler: implement app planning scheme

Other changes:

- client: if symlink() fails, make a XML soft link instead
    (for Unix running off a FAT32 FS)
- client: don't accept nonpositive resource share from AMS
- daemons and DB: check for error returns from enumerations,
    and exit if so.  Thus, if the MySQL server goes down,
    all the daemons will soon exit.
    The cron script will restart them every 5 min,
    so when the DB server comes back up so will the project.
- web: show empty max CPU % as ---
- API: get rid of all_threads_cpu_time option (always the case now)


svn path=/trunk/boinc/; revision=14966
2008-03-27 18:25:29 +00:00
David Anderson 815b8fc043 Various preparation for handling multithreaded apps
and apps that use coprocessors.
There now can be several app_versions for the same
(app, platform, version_num) combination.
This changes a number of things.

- Added app_version.plan_class field to DB
- update_versions now looks for a :plan-class in the
    file or directory name, and puts it in the app_version's DB record
- Change uniqueness constraint to include plan_class
- Feeder: the feeder was putting non-deprecated app_versions
    in shared mem, and leaving it to the scheduler to
    find the latest version for a given platform.
    This is dumb.
    Instead, for each app/platform pair the feeder now
    finds the highest version number of a non-deprecated app version,
    and enumerates all non-deprecated app_versions with that
    app/platform/version
- Scheduler: add a BEST_APP_VERSION data structure that keeps track,
    for each app, what the best app_version is for this host.
    This saves the work of recomputing it for each job.

svn path=/trunk/boinc/; revision=14906
2008-03-13 22:57:24 +00:00
David Anderson 95772cba77 - removed boinc_ncpus_available() and boinc_nthreads() calls.
The design has been changed to constant #threads per app version
    Various changes from Kevin Reed/WCG:
    - server: add workunit.rsc_bandwidth_bound: if nonzero,
        send this WU only to hosts with that much download bandwidth
    - assimilators: if a handler returns DEFER_ASSIMILATION,
        the WU remains in INIT state and will be handled when the
        next instance completes.
        Useful if you want the assimilator to see all instances.
    - scheduler: when setting result.outcome = DETACHED,
        set received_time to now
    - scheduler: removed the reliable_time and reliable_min_avg_credit
        options
    - scheduler/web: add optional <allow_non_preferred_projects>
        in project preferences.
        If present, user will accept work from non-selected apps
        if no work is available for selected apps
    - scheduler: improved messages for projects with multiple apps
    - scheduler: added config options
        <granted_credit_weight> and <granted_credit_ramp_up>.
        Used in calculating host.claimed_credit_per_cpu_sec,
        but I'm not sure how.
    - Added two new credit-granting formulas (validate_util.C):
        stddev_credit() and two_credit()
    - server DB: add rollback_transaction() and affected_rows() to DB_CONN

    NOTE: DB update required

svn path=/trunk/boinc/; revision=14870
2008-03-07 21:13:01 +00:00
David Anderson 54519a4ee1 - Server: add "job assignment" feature.
Lets you assign a WU to a particular host,
    to one or all hosts belonging to a user or team, or to all hosts.
    See http://boinc.berkeley.edu/trac/wiki/AssignedWork
    Disabled unless you include <enable_assignment> in config.xml
    Uses a new DB table.
    Tested but only a little.
- Server: code cleanup; moved result-handling to a new file,
    and removed the PLATFORM_LIST arg to everything
    (put it in SCHEDULER_REQUEST instead)

svn path=/trunk/boinc/; revision=14767
2008-02-21 00:47:50 +00:00
David Anderson 618a5c1651 - assimilator: there was a bug in the situation where:
1) a WU is marked as ready for assimilation and has no errors;
    2) it has no canonical result
    In this case, the assimilate handler gets called anyway,
    typically with the canonical result of the previous WU as arg.
    Note: this situation doesn't arise normally;
    it might happen if some results are deleted accidentally.
    The fix: 
    - identify this situation, and set the WU.error_mask to a new code
        (WU_ERROR_NO_CANONICAL_RESULT)
    - zero out the "canonical_result" variable passed to the handler,
        so even if the handler fails to check wu.error_mask,
        at least it won't assimilate the same result twice.
    Thanks to Hendrik Verhoek for finding this bug.
- DB schema: team table type is MyISAM, not InnoDB

svn path=/trunk/boinc/; revision=13938
2007-10-23 17:11:56 +00:00
Frank Thomas fbcfeaf456 - Removed the svn:executable property from files that should not be executable,
like source code and text files. I skipped to check most files in html/
  and mac_*/ though.
- Added svn:executable to tools/watch_tcp because it has a shebang.


svn path=/trunk/boinc/; revision=13819
2007-10-10 09:25:40 +00:00
Frank Thomas 3bfc78b511 Updated the postal address of the Free Software Foundation in all license headers. See http://lists.ssl.berkeley.edu/pipermail/boinc_dev/2007-October/008939.html for reference.
svn path=/trunk/boinc/; revision=13804
2007-10-09 11:35:47 +00:00
David Anderson bb5f54d31f - feeder/scheduler: fix bug where APP.homogeneous_redundancy
is defined as a bool instead of int
    (see its only nonzero value is 1, so the "coarse" HR type is ignored).

svn path=/trunk/boinc/; revision=13786
2007-10-05 22:32:47 +00:00
David Anderson bc5b979afb - Added new script "update_versions_v6"; use this instead of
update_versions to add version 6 apps.
    It looks for API_VERSION string in main executable,
    adds the API version to the app_version XML,
    and sets min_core_version to 6 for version 6+ apps
- API: include API_VERSION string
- convert tabs to spaces here and there
- scheduler: parse unused elements in <net_stats>
- ops/show_log.php: if no URL args, just show form (fixes #415)
- client: parse and store api_version (not used yet)

svn path=/trunk/boinc/; revision=13627
2007-09-21 18:10:54 +00:00
David Anderson 2af893b0f2 - user web: code cleanup related to team creation.
Make a single function that creates teams
    and cleanses arguments.
- API: don't include config.h in parse.h.
    This file is included from apps
    (indirectly, via graphics_api.h)
    so it shouldn't assume that config.h exists

svn path=/trunk/boinc/; revision=13212
2007-07-25 03:17:31 +00:00
David Anderson ef247eed41 - user web: moved functions to send specific messages out of email.inc
- user web: code cleanup involving team founder transfer,
    and improved the text of some messages

svn path=/trunk/boinc/; revision=13210
2007-07-23 20:30:30 +00:00
David Anderson 797c464b3a - Back end: add a feature for "blackballing" hosts.
To do this, set host.max_results_day to -1.
    If you do this, scheduler requests from that host
    will get an error message, and will otherwise be ignored
    (no jobs in or out, no trickles).
- Scheduler: send_message() should be called ONLY if you're
    not going to call handle_request();
    otherwise we'll write two separate replies.
    To fix this, I added a separate function (send_error_message())
    that can be called within handle_request()
    to deal with error situations.
- Scheduler: moved debug_sched() to main.C
- Scheduler: moved logic to send "delete file" commands
    out of handle_request() into a separate function,
    send_file_deletes() in sched_locality.C.
    Remove #ifdef EINSTEIN_AT_HOMEs; maybe someday another project
    will use locality scheduling!

svn path=/trunk/boinc/; revision=13108
2007-07-06 16:37:00 +00:00
David Anderson a97556bdfd - feeder: added a new enumerator of DB_WORK_ITEM that,
on successive calls, scans through ALL the sendable
    jobs satisfying the select clause
    (it does this by ID order, so there's no order clause)
    This is used for HR, so that if a job has been committed
    to an HR class, we eventually get it.

    With extremely minimal testing, the new HR stuff seems to work.

db/
    boinc_db.C,h
sched/
    feeder.C
    sample_work_generator.C
    server_types.C

svn path=/trunk/boinc/; revision=12988
2007-06-22 23:48:37 +00:00
David Anderson f5d94818dd - added "census", a program that counts up how much RAC
there is for each HR class, and writes it to a file.
    This will be used soon for HR support in the feeder.
- split the HR code into hr.C,h (stuff used by both census and scheduler)
    and sched_hr.C (stuff used only by the scheduler)
- database: change DB_CREDITED_JOB to treat workunitid
    as a double (which it is) rather than a long.
    BTW, long == int.
- fixed lots of compile warnings in the server code

db/
    boinc_db.C,h
lib/
    boinc_cmd.C
    miofile.C
    util.C
sched/
    Makefile.am
    census.C (new)
    feeder.C
    file_deleter.C
    file_upload_handler.C
    handle_request.C
    hr.C,h (new)
    main.C
    sample_assimilator.C
    sample_work_generator.C
    sched_array.C
    sched_hr.C,h
    sched_send.C
    server_types.C
    transitioner.C
    validator.C

svn path=/trunk/boinc/; revision=12970
2007-06-20 22:34:06 +00:00
David Anderson d673440faf - Scheduler: increased the resolution of homogeneous redundancy (HR)
e.g. distinguish between models of Intel and AMD
- Scheduler: add a quick HR check that doesn't access the DB
- Transitioner: if a workunit has >0 error results and no success results,
    set its HR class to zero
From M.F. Somers.

db/
    boinc_db.C,h
sched/
    sched_array.C
    sched_hr.C,h
    transitioner.C

svn path=/trunk/boinc/; revision=12773
2007-05-29 23:41:31 +00:00
Matt Lebofsky 136ce49b84 svn path=/trunk/boinc/; revision=12536 2007-05-02 23:17:52 +00:00