Commit Graph

582 Commits

Author SHA1 Message Date
David Anderson 22a911516c - server: more fixes to DB to handle unsigned result IDs
svn path=/trunk/boinc/; revision=24563
2011-11-09 17:27:50 +00:00
David Anderson 7c201eba3f - DB: use %u when writing result IDs in SQL queries;
this is to support SETI@home, which ran out of result IDs
    and changed the DB field type to int unsigned.
    Note: eventually I'll make this change official
    and change the .h types as well.
- web: put <apps_selected> tags around <app_id> elements
    in project-specific prefs.


svn path=/trunk/boinc/; revision=24555
2011-11-09 07:41:49 +00:00
David Anderson ae04b50a71 - client: don't crash if trickle up exceeds 64KB
(this bug was introduced Sept 20)
- scheduler: truncate long trickle-ups to 256KB; don't crash


svn path=/trunk/boinc/; revision=24535
2011-11-06 06:25:48 +00:00
David Anderson 4b826b52a0 - scheduler: fix bug in the "homogeneous app version" (HAV) feature
(reported by Kevin Reed).
    The problem: cache inconsistency.
    If there are 2 results for the same WU in shared mem,
    and 2 scheduler instances get them around the same time,
    they can send them with different app versions.
    We already fixed this problem for HR by
    1) rereading the relevant WU fields while deciding
        whether to send the result
    2) doing a "careful update" of the WU field using a where clause
        to make sure it wasn't modified in the (short) interval
        since rereading it.
    I fixed the HAV problem in the same way,
    and merged the two mechanisms to combine the DB queries.

    Also:
    - The rereads are done in slow_check() (see below).
    - The careful updates are done in update_wu_on_send(),
        and this is called *before* doing careful updates on result fields.
        That way, if the WU updates fail, we don't have orphaned results.
    - already_sent_to_different_platform_careful() (sic)
        no longer does DB stuff, so it's merged with
        already_send_to_different_hr_class() (better name)

    NOTE: slow_check() is used in array scheduling only.
        Score-based scheduling uses other code,
        in which this bug is not yet fixed.
        Locality scheduling doesn't support HR or HAV at all.
        This should be unified.


svn path=/trunk/boinc/; revision=24484
2011-10-26 07:15:22 +00:00
David Anderson 3410abef6f - backend API: added function cancel_jobs(minid, maxid)
for canceling jobs
- added program cancel_jobs for canceling jobs
- DB interface: it's not an error if update_fields_noid()
    affects != 1 rows


svn path=/trunk/boinc/; revision=24413
2011-10-18 07:15:04 +00:00
David Anderson e279b59913 - Updates Linux notifications to use current libnotify.
- Fix build problems on Mac OS X using autotools
- Consistently use #if HAVE_X for platform checks,
    rather than #ifdef HAVE_X or #if defined(HAVE_X)
- In Unix build, make lots of compiler checks standard
- Fix some compile warnings

From Matt Arsenault.

Note: there are now lots of compile warnings in clientgui/ on Unix,
    mostly in WxWidgets code


svn path=/trunk/boinc/; revision=24303
2011-09-27 19:45:27 +00:00
David Anderson 75c6032043 - fix typo in DB schema
- msg tweaks in make_project


svn path=/trunk/boinc/; revision=24253
2011-09-22 03:15:21 +00:00
David Anderson 9d2ba11e09 - back end: extend the access control system for remote job submission
and other operations.
    You can now designate a user as "manager" for a particular app.
    They can then:
    - control job-submit permissions for that app
    - deprecate/undeprecate versions of the app.
    - abort jobs for that app

    You can also designate a user a manage for the project.
    They can then edit permissions and quotas,
    as well as performing the app-specific functions for all apps.

    This is described here:
    http://boinc.berkeley.edu/trac/wiki/MultiUser#Accesscontrol

    This required some changes to the DB schema.


svn path=/trunk/boinc/; revision=24250
2011-09-21 21:26:00 +00:00
David Anderson a886e0be7c - transitioner: fix bug related to new runtime_outlier field
svn path=/trunk/boinc/; revision=24228
2011-09-16 20:42:45 +00:00
David Anderson e49f945908 - Validator: allow project-specific code to mark a result
is a "runtime outlier", i.e. its runtime does
    not correspond to the job's rsc_fpops_est.
    Runtime outliers are not counted in the statistics for
    elapsed time, turnaround time, and peak FLOPs count.

    The is intended for applications like SETI@home,
    some of whose jobs finish more or less instantly
    (this happens if the data contains a lot of interference).
    If a host happens to get a bunch of these short jobs,
    its statistics will get skewed: in essence, the server
    will think that the host is extremely fast,
    and will send it too many jobs.


svn path=/trunk/boinc/; revision=24225
2011-09-16 16:43:15 +00:00
David Anderson 176b0a4327 - validator: add a --credit_from_runtime option.
This assigns credit proportional to runtime*p_fpops.
    To prevent cheating, p_fpops is capped at the 95th percentile value
    among active hosts,
    and runtime is capped at a specified limit.
    This option supports apps, like LHC's CERNvm app,
    that run for a certain amount of time and then exit.
    The CreditNew system doesn't work for such apps.
- trickle_credit:
    To prevent cheating,
    cap p_fpops at the 95th percentile value among active hosts,
    and require a limit on runtime.
- require that trickle handlers supply an initialization function


svn path=/trunk/boinc/; revision=24182
2011-09-13 21:01:42 +00:00
David Anderson b80f1525f6 - feeder: change the DB query to skip jobs for deprecated apps.
Otherwise, if you have a deprecated app with >= 200 jobs
    (200 is the query's limit)
    it could always get jobs for that app,
    and never put anything into the cache.


svn path=/trunk/boinc/; revision=24142
2011-09-07 19:57:46 +00:00
David Anderson 8c49c174c3 - preliminary stuff for mechanism where privileged users
can create apps and app versions
- crontab commands should be preceded by cd to project root


svn path=/trunk/boinc/; revision=24137
2011-09-07 06:54:44 +00:00
David Anderson 7c81d72378 - web: fix warnings in forum pages
- scheduler: when using elapsed time stats to predict runtime,
    cap the estimated FLOPS at twice the peak FLOPS;
    otherwise, if a host has received a lot of very short jobs
    recently, it will get a too-high FLOPS estimate and
    will exceed the rsc_fpops_bound limit.


svn path=/trunk/boinc/; revision=24128
2011-09-05 17:29:53 +00:00
David Anderson c5c5975b44 - Improve interface of XML_PARSER.
Add parsed_tag and is_tag to the class,
    so that parsing functions don't need to declare them
    and pass them around.
- Complete the task of using XML_PARSER as the argument
    to all parsing functions.
    (Internally, many of these functions still use the old XML parser;
    that's the next step.)


svn path=/trunk/boinc/; revision=23978
2011-08-10 17:11:08 +00:00
David Anderson 271699ea0a - server: fix typo
svn path=/trunk/boinc/; revision=23904
2011-07-30 22:42:05 +00:00
David Anderson 6e5acbbe60 - web: remote job submission:
- add fields to batch table, extend APIs accordingly
    - require that example web interface run on BOINC server
        (this makes many things easier;
        an actual remote interface would require a bit more work)


svn path=/trunk/boinc/; revision=23881
2011-07-27 06:20:48 +00:00
David Anderson 83965db576 - web: more remote job submission code. Not finished.
svn path=/trunk/boinc/; revision=23871
2011-07-25 21:45:53 +00:00
David Anderson 2177a6bd95 - server: restore fpops/intops_cumulative to RESULT
(structure, not table) for AQUA
- client, Windows: when wake up from hibernation,
    get the time before printing log msg


svn path=/trunk/boinc/; revision=23784
2011-06-29 23:00:39 +00:00
David Anderson 4403df42d8 - client: add <type> element to <exclude_gpu> config option,
in case of multiple GPU types


svn path=/trunk/boinc/; revision=23777
2011-06-25 05:13:56 +00:00
David Anderson 8a9605e48c - web: add a web-service interface for remotely submitting, querying
and controlling batches of jobs
- web: add an administrative interface for controlling
    user permissions for submitting jobs
- web: add an interface where users can view and control
    their submitted jobs
See: http://boinc.berkeley.edu/trac/wiki/RemoteJobs
This is at a functional but rough stage.


svn path=/trunk/boinc/; revision=23762
2011-06-21 22:56:15 +00:00
David Anderson 61cbe110ef - web: when creating an item in News forum,
show "Export as Notice?" checkbox, and default to off.


svn path=/trunk/boinc/; revision=23721
2011-06-14 05:42:52 +00:00
David Anderson 93add14614 - backend: use new XML parser for input template files
(so that they don't have to be 1 element/line)
    and also allow optional <input_template> root element
- fix bug in WORKUNIT DB interface


svn path=/trunk/boinc/; revision=23648
2011-06-07 04:12:49 +00:00
David Anderson 436415cfe1 - scheduler, back end: add "homogeneous app version" feature.
Lets you specify, on a per-app basis,
    that all instances should be done using the same app version.
    This is for validation in the presence of GPUs.
- scheduler: code cleanup
    - Instead of adding a bunch of non-DB fields to RESULT,
        used a derived class SCHED_DB_RESULT.
    - Instead of storing a pointer to BEST_APP_VERSION in RESULT,
        store the structure itself.
        This simplifies the memory allocation situation.
- client: condition "Got server request to delete file" messages
    on <file_xfer_debug>


svn path=/trunk/boinc/; revision=23636
2011-06-06 03:40:42 +00:00
David Anderson b6140088e3 - update_versions: trim XML strings
svn path=/trunk/boinc/; revision=23569
2011-05-21 06:22:15 +00:00
Bernd Machenschalk 4fa5f4bd8c Einstein@home extensions:
- protect malloc.h
- validator: allow to update 'random' result field
- assimilator: add global variables results_prefix and transcripts_prefix

svn path=/trunk/boinc/; revision=23241
2011-03-18 08:20:11 +00:00
David Anderson 0bc1a36451 - user web: add a feature allowing project admins to control
whether news items are exported as notices.
    The creator of a news item is shown a "Don't export" or "Export"
    button on the thread page.
    By default, news items are exported.


svn path=/trunk/boinc/; revision=23119
2011-02-28 19:02:59 +00:00
David Anderson 53a7307305 - scheduler: fix nasty bug introduced in [23040]
that caused no jobs to be sent.


svn path=/trunk/boinc/; revision=23096
2011-02-23 21:22:45 +00:00
David Anderson 5421335dbb - transitioner: fix bug that could cause file deletion to not be done
for some WUs
- back end: fix the way "report grace period" is implemented
    old: result.report_deadline (i.e. what's in the DB) and
        the deadline sent to the client are the same.
        Some confusing and incorrect logic in the transitioner
        tries to provide the desired semantics.
    new: result.report_deadline is the deadline sent to the client,
        plus the grace period.
        No logic in the transitioner is needed.


svn path=/trunk/boinc/; revision=23040
2011-02-15 22:07:14 +00:00
David Anderson 3355b66241 - client and scheduler: a client host may have multiple VM systems installed.
TODO: check for VirtualBox on Mac, Linux

svn path=/trunk/boinc/; revision=22704
2010-11-17 23:19:07 +00:00
Rom Walton 1564a49816 - sched: Parse the detected virtual machine software from
the scheduler request so it can be used in plan classes.
        
    db/
        boinc_db.h
    sched/
        sched_types.cpp

svn path=/trunk/boinc/; revision=22703
2010-11-17 20:52:01 +00:00
David Anderson c0612ab77f - make_project: with --test_app, copy all the executables
(for many platforms) from samples/example_app/bin
- make_project: change name of example app from uppercase to example_app.
- update_versions: allow version numbers to not have decimal points
- sample work generator: make app name and template files
    command-line options;
    default to "example_app", "example_app_in.xml", "example_app_out.xml"

svn path=/trunk/boinc/; revision=22667
2010-11-10 00:10:32 +00:00
David Anderson 5911a059dd - compile fix
svn path=/trunk/boinc/; revision=22390
2010-09-20 17:16:44 +00:00
David Anderson 2e00bb3084 - scheduler: fix crashing bug when client reports a large # (1000+)
of results (256KB not enough for query in this case)

svn path=/trunk/boinc/; revision=22389
2010-09-19 03:42:51 +00:00
David Anderson bf56e80bce - tweaks
svn path=/trunk/boinc/; revision=22306
2010-08-29 08:43:40 +00:00
David Anderson 7c51512cbf - transitioner: the format string for a DB query had %.15d instead of %.15e.
That produced a messed-up query that assigned garbage values to:
        host_app_version.turnaround_var
        host_app_version.turnaround_q
        host_app_version.max_jobs_per_day
        host_app_version.consecutive_valid
    To repair these:
        - set turnaround_var and turnaround_q to zero
        - if max_jobs_per_day is outside of
            (0..config.daily_result_quota)
            set it to config.daily_result_quota
        - if consecutive_valid is outside (0..1000), set it to zero
    I added a script, html/ops/repair_21812.php, that does this;
    if you ran server code between [21181] and [21812], run this script.
- scheduler/transitioner: add <debug_quota> log flag
- changed the build system to always use -Wall
    (if we'd done this before, this bug wouldn't have happened)
- fixed a bunch of other compile warnings


svn path=/trunk/boinc/; revision=21812
2010-06-25 18:54:37 +00:00
David Anderson e260b47bd5 - database: app.min_avg_pfc should default to 1, not 0
svn path=/trunk/boinc/; revision=21763
2010-06-17 16:44:33 +00:00
David Anderson ae7866b251 - scheduler: restore scaling of daily quota by # processors
and/or config.gpu_multiplier
- client: msg tweak

svn path=/trunk/boinc/; revision=21753
2010-06-15 22:21:57 +00:00
David Anderson 4147249de2 - server: delete old credit stuff
- user web: show host link in user result list.  Fixes #999


svn path=/trunk/boinc/; revision=21735
2010-06-12 22:08:15 +00:00
David Anderson 8b836a391b - database: remove unused fields from app table
svn path=/trunk/boinc/; revision=21728
2010-06-11 03:50:47 +00:00
David Anderson 89fab4ece5 - back end: change "daily result quota" mechanism.
Old: config.xml specifies an initial daily quota (say, 100).
        Each host_app_version starts out with this quota.
        On the return of a SUCCESS result,
        the quota is doubled, up to the initial value.
        On the return of an error result, or a timeout,
        the quota is decremented down to 1.
    Problem:
        Doesn't accommodate hosts that can do more than 100 jobs/day.
    New: similar, but
        - on validation of a job, daily quota is incremented.
        - on invalidation of a job, daily quota is decremented.
        - on return of an error result, or a timeout,
            daily quota is min'd with initial quota, then decremented.
    Notes:
        - This allows a host to have an unboundedly large quota
            as long as it continues to return more valid
            than invalid results.
        - Even with this change, hosts that return SUCCESS but
            invalid results will continue to get the initial daily quota.
            It would be desirable to reduce their quota to 1.

svn path=/trunk/boinc/; revision=21675
2010-06-02 00:11:01 +00:00
David Anderson 7daae1d0c7 - client: when emerge from bandwidth quota network suspension,
add 0..1hr random delay to existing transfers,
    to avoid DDOS effect

svn path=/trunk/boinc/; revision=21415
2010-05-07 20:08:59 +00:00
David Anderson ef0019d8c3 - validator: bug fixes: bad formula for low_average();
failure to reread app_versions because of 1e6/1e-6 typo


svn path=/trunk/boinc/; revision=21302
2010-04-26 23:12:40 +00:00
David Anderson c4df1f3104 svn path=/trunk/boinc/; revision=21232 2010-04-21 20:11:41 +00:00
David Anderson 5035007b90 - back end: new way of deciding:
- whether host is "reliable" for an app version
    - whether host is eligible for single replication for an app version
    - whether to use host scaling
    In each case, the answer is yes if the number of
    consecutive valid results is above a threshold.
    This replaces existing "error rate" and "scale probation" mechanisms.

    TODO: the # of consecutive valid results should also determine
        a limit on jobs in progress for an app version.
        Namely, if N is the threshold for host scaling, the limit should be
            ndevices*(max(1, consecutive_valid - N))
        The client currently doesn't supply enough
        app version info to do this.
        It could be approximated; that would give some protection
        against cherry-picking.
- credit: more conservative formulas for combining claimed credit
    among replicas.
    If there are normal replicas, we use a "low average"
    that weights each sample by the sum of the other samples.
    Otherwise we use the min (not the average) of the approximate samples.

NOTE: a DB update is required


svn path=/trunk/boinc/; revision=21230
2010-04-21 19:33:20 +00:00
David Anderson 61195cb59d - validator: fix bug where host.total_credit not incremented
svn path=/trunk/boinc/; revision=21211
2010-04-19 21:46:45 +00:00
David Anderson 021edb02c2 - back end programs: improve log msgs
svn path=/trunk/boinc/; revision=21193
2010-04-16 18:07:08 +00:00
David Anderson 02717af2f3 - bug fixes
svn path=/trunk/boinc/; revision=21187
2010-04-15 21:58:44 +00:00
David Anderson b2451544e1 - server: change the following from per-host to per-(host, app version):
- daily quota mechanism
    - reliable mechanism (accelerated retries)
    - "trusted" mechanism (adaptive replication)
- scheduler: enforce host scale probation only for apps with
    host_scale_check set.
- validator: do scale probation on invalid results
    (need this in addition to error and timeout cases)
- feeder: update app version scales every 10 min, not 10 sec
- back-end apps: support --foo as well as -foo for options

Notes:
- If you have, say, cuda, cuda23 and cuda_fermi plan classes,
    a host will have separate quotas for each one.
    That means it could error out on 100 jobs for cuda_fermi,
    and when its quota goes to zero,
    error out on 100 jobs for cuda23, etc.
    This is intentional; there may be cases where one version
    works but not the others.
- host.error_rate and host.max_results_day are deprecated

TODO:
    - the values in the app table for limits on jobs in progress etc.
        should override rather than config.xml.

Implementation notes:
scheduler:
    process_request():
        read all host_app_versions for host at start;
        Compute "reliable" and "trusted" for each one.
        write modified records at end
    get_app_version():
        add "reliable_only" arg; if set, use only reliable versions
        skip over-quota versions
    Multi-pass scheduling: if have at least one reliable version,
        do a pass for jobs that need reliable,
        and use only reliable versions.
        Then clear best_app_versions cache.
    Score-based scheduling: for need-reliable jobs,
        it will pick the fastest version,
        then give a score bonus if that version happens to be reliable.
    When get back a successful result from client:
        increase daily quota
    When get back an error result from client:
        impose scale probation
        decrease daily quota if not aborted
Validator:
    when handling a WU, create a vector of HOST_APP_VERSION
        parallel to vector of RESULT.
        Pass it to assign_credit_set().
        Make copies of originals so we can update only modified ones
    update HOST_APP_VERSION error rates
Transitioner:
    decrease quota on timeout


svn path=/trunk/boinc/; revision=21181
2010-04-15 03:13:56 +00:00
David Anderson e276aa5ed6 - server: make the -d 4 feature work with FCGI
svn path=/trunk/boinc/; revision=21109
2010-04-05 23:12:02 +00:00