Commit Graph

60 Commits

Author SHA1 Message Date
Christian Beer 6c10091740 check return value of host.update_diff_validator()
fixes CID 27961 found by Coverity
2015-10-28 14:41:09 +01:00
David Anderson 1af264747f validator: fix 64-bit ID problem 2015-07-28 16:19:31 -07:00
David Anderson 8cd8c8e7ee server software: handle 64-bit database IDs
The SETI@home result table is about to run out of 32-bit IDs,
so we need to move to 64-bit result IDs.
This will happen to the workunit table at some point too.

I changed the server C++ code to use the "long" type for all DB IDs
(and to use appropriate conversion codes like %lu).
"long" is 64 bit on 64-bit machines.
For uniformity I did this for all tables,
even ones (like app) that will never get big.

I chose NOT to change the DB schema for now.
The new code will work with 32-bit ID fields in the DB.
As projects approach the 32-bit limit on a table they can change
its ID field, and fields that reference this table, to BIGINT.
This is likely to happen only on the result and workunit tables.
I put functions in html/ops/db_update.php
to change the IDs of these tables.
2015-07-23 10:11:08 -07:00
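A minimal sketch of the ID formatting described in commit 8cd8c8e7ee, assuming an LP64 platform where "long" is 64 bits wide. The helper name format_result_id and the "result.id=" prefix are hypothetical and do not come from the BOINC source.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper, not taken from the BOINC source: format a database ID
// held in a "long" using the %lu conversion mentioned in the commit message.
std::string format_result_id(long id) {
    assert(id >= 0);  // assumption: DB IDs are never negative
    char buf[64];
    std::snprintf(buf, sizeof(buf), "result.id=%lu",
                  static_cast<unsigned long>(id));
    return std::string(buf);
}
```

With a 32-bit "int" and %d, an ID above 2^31 - 1 could not be stored or printed correctly; with "long" and %lu on a 64-bit build the same call keeps working after the table's ID column is changed to BIGINT.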
David Anderson 5ad43a6509 validator: add --wu_id N option for debugging single WU 2015-04-03 20:00:13 -07:00
David Anderson dbd2d03a0d server/web: add support for per-application credit
See http://boinc.berkeley.edu/trac/wiki/PerAppCredit
If enabled (by the <credit_by_app> config flag)
validators will maintain credit on a per-(app, user, credit type) basis,
and same for teams,
in new DB tables credit_user and credit_team.
This info is displayed in the web site, on user and team pages,
using project-supplied functions to generate the HTML.

Note: update_stats doesn't decay the recent-average values
for per-app credit; I'll add this if needed.
2014-08-15 14:01:32 -07:00
David Anderson dfc99e225c scheduler: don't resend job if app is deprecated or user has de-selected it 2014-06-08 20:20:25 -07:00
Bernd Machenschalk 34c823a9ab Merge branch 'EinsteinAtHome' into 'master'
This is meant not to break anything, just add some
(optional) logging and features needed for Einstein@Home.
Please contact me before changing or removing any of this.

Conflicts:
	sched/db_dump.cpp
	sched/file_deleter.cpp
	sched/validator.cpp
2014-05-26 14:42:36 +02:00
Bernd Machenschalk d67776a93c validator:
fix one_pass: leave main loop even if we did_something
2014-05-23 12:06:00 +02:00
Bernd Machenschalk 2f6d140c56 validator:
added options -min_wu_id and -max_wu_id to validator
2014-05-23 12:06:00 +02:00
Bernd Machenschalk 93798a5732 validator:
add '--dry_run' to validator daemon (run w/o DB update)
2014-05-23 12:06:00 +02:00
David Anderson 6f29a50812 validator: fixes and features
- add --is_gzip option to sample_bitwise_validator.
  If set, all files are treated as gzip archives.
  Check their 10-byte header to verify that it's a gzip file,
  but ignore it when comparing files.
- validator.cpp: don't error out on unparsed cmdline args,
  since we're now using them in sample_bitwise_validator
  and sample_substr_validator.
- fix build error on Debian
2014-03-20 12:38:29 -07:00
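The --is_gzip comparison described in commit 6f29a50812 can be sketched roughly as follows. This is an illustration only, not the code in sample_bitwise_validator: it works on in-memory byte strings, the function names are hypothetical, and the header check is limited to the two gzip magic bytes (0x1f, 0x8b).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Length of the fixed gzip header that is checked and then skipped.
const std::size_t GZIP_HEADER_LEN = 10;

// True if the data is long enough and starts with the gzip magic bytes.
bool has_gzip_header(const std::string& data) {
    return data.size() >= GZIP_HEADER_LEN
        && static_cast<unsigned char>(data[0]) == 0x1f
        && static_cast<unsigned char>(data[1]) == 0x8b;
}

// True if both inputs look like gzip files and are identical after the
// 10-byte header. Header fields such as the timestamp may differ.
bool gzip_equal_ignoring_header(const std::string& a, const std::string& b) {
    if (!has_gzip_header(a) || !has_gzip_header(b)) return false;
    return a.compare(GZIP_HEADER_LEN, std::string::npos,
                     b, GZIP_HEADER_LEN, std::string::npos) == 0;
}
```

Skipping the header matters because it carries a modification time, so two hosts compressing identical output would otherwise fail a bitwise comparison.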
David Anderson cf0a0817c0 server: fix some compile warnings
Add a derived class DB_APP_VERSION_VAL for use by the validator,
containing the extra fields it uses,
so that we're not doing memset 0 on vectors
2014-03-19 14:55:16 -07:00
David Anderson 834ac11661 server: add sample validator that checks for string in stderr 2014-03-18 19:12:13 -07:00
Eric J Korpela 244ba5bc85 SCHED: modified scheduler log output to use unsigned format for WU and RESULT
IDs.  This allows IDs greater than 2^31 to be printed.
2013-06-19 10:15:08 -07:00
David Anderson 78f7610f6e remove dependency of boinc_api.h on str_replace.h (and hence config.h)
Any files that use strlcpy() or strlcat() must directly include str_replace.h
2013-06-06 17:31:46 -07:00
David Anderson b9f0733c06 server: replace strcpy() with strlcpy() various places 2013-06-03 22:42:53 -07:00
David Anderson 9049737d1f validator: retry if transient failure
check_set() wasn't returning "retry" properly in the case where
one of the calls to init_result() returns ERR_OPEN_DIR
(treated as a transient failure, since it can be caused by a failed NFS mount)
2013-05-20 13:01:10 -07:00
David Anderson 24e8133e4b - tabs -> spaces 2013-04-02 17:23:37 -07:00
Eric J Korpela f6ee54a602 Added a couple debugging statements. 2013-03-26 15:24:45 -07:00
David Anderson 980c9b66c9 - validator: fix confused logic.
A "viable" result is one that could potentially become the canonical result,
    i.e. the outcome is SUCCESS and the validate state is not INVALID.
    The existing code treated all results with outcome SUCCESS as viable,
    which is wrong.
    In particular, this could cause workunit.target_nresults
    to be incremented inappropriately.
2013-03-22 10:28:20 +01:00
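The "viable" test defined in commit 980c9b66c9 can be written as a small predicate. The enums and struct below are stand-ins for illustration; the real outcome and validate-state constants are defined in the BOINC headers and are not reproduced here.

```cpp
#include <cassert>

// Stand-in enums (hypothetical names and values).
enum Outcome { OUTCOME_SUCCESS, OUTCOME_CLIENT_ERROR, OUTCOME_NO_REPLY };
enum ValidateState {
    VALIDATE_INIT, VALIDATE_VALID, VALIDATE_INVALID, VALIDATE_INCONCLUSIVE
};

struct ResultSketch {
    Outcome outcome;
    ValidateState validate_state;
};

// A result is viable if it could still become the canonical result:
// the outcome is SUCCESS and it has not been marked INVALID.
bool is_viable(const ResultSketch& r) {
    return r.outcome == OUTCOME_SUCCESS
        && r.validate_state != VALIDATE_INVALID;
}
```

The earlier logic corresponds to checking only the first condition, which counts already-invalidated results as viable.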
David Anderson 3017ed943f - scheduler: debug the above 2013-02-26 16:44:26 +01:00
David Anderson 282af6effc - user web: show the right page/message after the following actions:
    - rate a post
    - moderate a post
    - edit a post
    - report a post


svn path=/trunk/boinc/; revision=26152
2012-10-15 18:47:55 +00:00
David Anderson fc2af21221 - client: add missing end tag for <pci_info>. Doh!
- validator: add some sanity-checking for credit,
    to prevent granting 1e38 credit.
    max_granted_credit now defaults to the equivalent of 1 TeraFLOP-year.
    Instances that exceed this are not counted in the credit
    calculation, and a critical-mode log message is written
- wrapper: remove wall_cpu_time; not used anymore


svn path=/trunk/boinc/; revision=25825
2012-06-29 22:24:07 +00:00
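A worked sketch of the credit sanity check from commit fc2af21221. It assumes the usual Cobblestone scale of 200 credits per day of computing at 1 GFLOPS; that constant, the function names, and the choice to average the remaining claims are assumptions for illustration, not taken from this commit.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption: 200 credits per day at 1 GFLOPS, expressed per FLOP.
const double CREDIT_PER_FLOP = 200.0 / (86400.0 * 1e9);

// Credit equivalent of one TeraFLOP-year under the assumption above.
double default_max_granted_credit() {
    const double flops_per_sec = 1e12;
    const double secs_per_year = 365.0 * 86400.0;
    return flops_per_sec * secs_per_year * CREDIT_PER_FLOP;
}

// Average the claims that pass the sanity check; claims above max_credit
// are skipped and counted in n_rejected. Returns 0 if none remain.
double average_sane_credit(const std::vector<double>& claims,
                           double max_credit, int& n_rejected) {
    double sum = 0.0;
    int n = 0;
    n_rejected = 0;
    for (std::size_t i = 0; i < claims.size(); i++) {
        if (claims[i] > max_credit) {
            n_rejected++;  // the real validator also logs a critical message
            continue;
        }
        sum += claims[i];
        n++;
    }
    return n ? sum / n : 0.0;
}
```

Under the stated scale, one TeraFLOP-year works out to 1e12 * 365 * 86400 * 200 / 86400e9 = 73,000,000 credits, so a claim of 1e38 is rejected by a wide margin.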
David Anderson d41f79588d - server daemons: add daemon_sleep(n), which sleeps for n secs
but checks for the "stop_daemons" trigger file every 1 sec.
    Use this instead of sleep() in daemons.
    This will speed up bin/stop.


svn path=/trunk/boinc/; revision=25708
2012-05-23 18:11:59 +00:00
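The idea behind daemon_sleep(n) in commit d41f79588d can be sketched as below, assuming a POSIX system. The name daemon_sleep_sketch, the explicit trigger-path parameter, and the boolean return value are hypothetical; the real function may react to the trigger differently, for example by exiting.

```cpp
#include <cassert>
#include <sys/stat.h>
#include <unistd.h>

// True if the trigger file (or any file system entry at that path) exists.
bool stop_requested(const char* trigger_path) {
    struct stat sb;
    return stat(trigger_path, &sb) == 0;
}

// Sleep for up to nsecs seconds, checking for the trigger once per second.
// Returns true if the sleep was cut short because the trigger appeared.
bool daemon_sleep_sketch(int nsecs, const char* trigger_path) {
    for (int i = 0; i < nsecs; i++) {
        if (stop_requested(trigger_path)) return true;
        sleep(1);
    }
    return false;
}
```

With a plain sleep(n), a daemon would notice the stop request only after the full interval; polling once per second bounds the delay to about one second, which is why bin/stop gets faster.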
David Anderson 759c23ed27 - server: create a harness for testing validator code.
If you link your functions (init_result(), compare_results(),
    cleanup_result()) with validate_test.cpp,
    you'll get a program that you can run as
        validate_test file1 file2
    and it will compare the two files
    (this works only for validators that expect 1 file per result).

    I added a makefile, sched/makefile_validator_test,
    that you can use for this.
- server: shuffle code so that the above doesn't need to
    link MySQL libraries
- client: if we fetch a master file and it contains no scheduler URLs,
    show a message of class INTERNAL_ERROR
- client/scheduler: make CUDA_DEVICE_PROP.totalGlobalMem a double,
    and remove dtotalGlobalMem.
    Although NVIDIA reports RAM size as a size_t,
    there's no reason to store it as an integer after that.


svn path=/trunk/boinc/; revision=25542
2012-04-10 00:32:35 +00:00
Bernd Machenschalk df439c128b validator: output the version string even when not in project directory
svn path=/trunk/boinc/; revision=25345
2012-02-27 11:54:02 +00:00
David Anderson e8657adfd2 - scheduler: change vbox_mt app plan function to use 1, 2 or 3 CPUs
depending on how many the host has,
    and whether CPU VM extensions are present
    (this reflects the requirements of CernVM).


svn path=/trunk/boinc/; revision=25009
2012-01-08 01:28:39 +00:00
David Anderson 5020e3af2f - validator: for credit_from_runtime,
use result.flops_estimate rather than host.p_fpops;
    otherwise it doesn't work for multicore apps.
    TODO: cheat-proofing


svn path=/trunk/boinc/; revision=25006
2012-01-06 22:22:02 +00:00
David Anderson e49f945908 - Validator: allow project-specific code to mark a result
as a "runtime outlier", i.e. its runtime does
    not correspond to the job's rsc_fpops_est.
    Runtime outliers are not counted in the statistics for
    elapsed time, turnaround time, and peak FLOPs count.

    This is intended for applications like SETI@home,
    some of whose jobs finish more or less instantly
    (this happens if the data contains a lot of interference).
    If a host happens to get a bunch of these short jobs,
    its statistics will get skewed: in essence, the server
    will think that the host is extremely fast,
    and will send it too many jobs.


svn path=/trunk/boinc/; revision=24225
2011-09-16 16:43:15 +00:00
David Anderson 176b0a4327 - validator: add a --credit_from_runtime option.
This assigns credit proportional to runtime*p_fpops.
    To prevent cheating, p_fpops is capped at the 95th percentile value
    among active hosts,
    and runtime is capped at a specified limit.
    This option supports apps, like LHC's CERNvm app,
    that run for a certain amount of time and then exit.
    The CreditNew system doesn't work for such apps.
- trickle_credit:
    To prevent cheating,
    cap p_fpops at the 95th percentile value among active hosts,
    and require a limit on runtime.
- require that trickle handlers supply an initialization function


svn path=/trunk/boinc/; revision=24182
2011-09-13 21:01:42 +00:00
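The capping scheme for --credit_from_runtime in commit 176b0a4327 can be illustrated as follows. This is a sketch under assumptions: the percentile uses the nearest-rank definition, the function names are hypothetical, and the result is the capped FLOP count to which credit would be proportional, not a credit value.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Nearest-rank percentile of a non-empty sample, with 0 < p <= 1.
// (One of several common percentile definitions, chosen for simplicity.)
double percentile(std::vector<double> values, double p) {
    assert(!values.empty() && p > 0.0 && p <= 1.0);
    std::sort(values.begin(), values.end());
    std::size_t rank = static_cast<std::size_t>(std::ceil(p * values.size()));
    if (rank < 1) rank = 1;
    return values[rank - 1];
}

// runtime * p_fpops with both factors capped, so that an inflated benchmark
// or an inflated runtime cannot raise the result without bound.
double capped_flops(double runtime, double p_fpops,
                    double max_runtime, double fpops_cap) {
    return std::min(runtime, max_runtime) * std::min(p_fpops, fpops_cap);
}
```

A host reporting a p_fpops far above the 95th percentile of active hosts is treated as if it ran at the cap, and a job reporting more than the runtime limit is treated as if it ran for exactly the limit.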
David Anderson 048c6a48a4 - validator: add --no_credit option;
maintains stats but doesn't grant credit


svn path=/trunk/boinc/; revision=24175
2011-09-13 05:23:10 +00:00
David Anderson 9b89168c49 - validator: in "credit_from_wu" case, record what the new credit
system would have assigned in result.claimed_credit.

svn path=/trunk/boinc/; revision=24088
2011-08-30 22:28:52 +00:00
David Anderson 4d45dda3d9 - validator: update credit statistics even if credit_from_wu
is being used.
- web: make almost everything translatable.  From Christian Beer.


svn path=/trunk/boinc/; revision=24048
2011-08-25 22:12:48 +00:00
David Anderson bffeeb0851 - web: don't error out on old-style notice URL
svn path=/trunk/boinc/; revision=23506
2011-05-05 14:56:32 +00:00
David Anderson fb04266eaf - validator: fix bug when check_pair() returns retry=true.
svn path=/trunk/boinc/; revision=23443
2011-04-25 18:27:03 +00:00
David Anderson 73dfafde79 - validator: if --credit_from_wu is set, and no credit specified in WU,
assign zero credit and keep going
- client simulator work


svn path=/trunk/boinc/; revision=23231
2011-03-14 06:27:51 +00:00
David Anderson 732866b8aa - back end: add two example trickle handlers:
    trickle_credit: grants credit based on CPU time reported in msg
    trickle_echo: echoes trickle-up as a trickle-down

svn path=/trunk/boinc/; revision=23118
2011-02-27 00:10:14 +00:00
David Anderson b169e5ab0f - server programs: print error message instead of numeric retval
in log messages

svn path=/trunk/boinc/; revision=22647
2010-11-08 17:51:57 +00:00
David Anderson 8aa29bec33 - validator: fix another bug with --credit_from_wu
- make_project, update scripts: don't quit if user_profiles
    already exists


svn path=/trunk/boinc/; revision=22630
2010-11-05 17:15:27 +00:00
David Anderson 805f73b66c - validator: fix bug with --credit_from_wu
HOWEVER: use of this option is discouraged.
    Use the default credit system.

svn path=/trunk/boinc/; revision=22621
2010-11-03 22:06:56 +00:00
David Anderson 794214208f - validator: if credit calculation returns an error,
wait 6 hours before retrying

svn path=/trunk/boinc/; revision=22418
2010-09-28 20:17:09 +00:00
Bernd Machenschalk 5cb98247a3 validator, assimilator: added --help and --version
svn path=/trunk/boinc/; revision=21966
2010-07-16 07:15:57 +00:00
David Anderson b677f0c25e - validator: remove app and app_versions arguments from check_set().
These weren't used, and I'm not sure why they were added.
- include sched_limit.h in "make install" list

svn path=/trunk/boinc/; revision=21894
2010-07-12 21:35:05 +00:00
David Anderson 7c51512cbf - transitioner: the format string for a DB query had %.15d instead of %.15e.
That produced a messed-up query that assigned garbage values to:
        host_app_version.turnaround_var
        host_app_version.turnaround_q
        host_app_version.max_jobs_per_day
        host_app_version.consecutive_valid
    To repair these:
        - set turnaround_var and turnaround_q to zero
        - if max_jobs_per_day is outside of
            (0..config.daily_result_quota)
            set it to config.daily_result_quota
        - if consecutive_valid is outside (0..1000), set it to zero
    I added a script, html/ops/repair_21812.php, that does this;
    if you ran server code between [21181] and [21812], run this script.
- scheduler/transitioner: add <debug_quota> log flag
- changed the build system to always use -Wall
    (if we'd done this before, this bug wouldn't have happened)
- fixed a bunch of other compile warnings


svn path=/trunk/boinc/; revision=21812
2010-06-25 18:54:37 +00:00
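The format-string bug fixed in commit 7c51512cbf comes down to passing a double where the conversion expects an int. A minimal sketch of the corrected form, with a hypothetical helper name and a simplified query fragment:

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: build one assignment of an UPDATE query.
// %.15e prints a double in scientific notation with 15 digits after the
// decimal point. Using %.15d here would pass a double to an int conversion,
// which is undefined behavior and yields garbage in practice.
std::string turnaround_clause(double turnaround_var) {
    char buf[128];
    std::snprintf(buf, sizeof(buf), "turnaround_var=%.15e", turnaround_var);
    return std::string(buf);
}
```

Compilers such as GCC report the mismatched form under -Wall (via -Wformat), which is the reasoning behind enabling -Wall in the build system.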
David Anderson 89fab4ece5 - back end: change "daily result quota" mechanism.
    Old: config.xml specifies an initial daily quota (say, 100).
        Each host_app_version starts out with this quota.
        On the return of a SUCCESS result,
        the quota is doubled, up to the initial value.
        On the return of an error result, or a timeout,
        the quota is decremented down to 1.
    Problem:
        Doesn't accommodate hosts that can do more than 100 jobs/day.
    New: similar, but
        - on validation of a job, daily quota is incremented.
        - on invalidation of a job, daily quota is decremented.
        - on return of an error result, or a timeout,
            daily quota is min'd with initial quota, then decremented.
    Notes:
        - This allows a host to have an unboundedly large quota
            as long as it continues to return more valid
            than invalid results.
        - Even with this change, hosts that return SUCCESS but
            invalid results will continue to get the initial daily quota.
            It would be desirable to reduce their quota to 1.

svn path=/trunk/boinc/; revision=21675
2010-06-02 00:11:01 +00:00
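The "New" quota rules from commit 89fab4ece5 can be sketched as three update functions. The struct and function names are hypothetical, and the floor of 1 on the quota is an assumption carried over from the description of the old mechanism.

```cpp
#include <algorithm>
#include <cassert>

struct QuotaSketch {
    int initial_quota;     // from config.xml, e.g. 100
    int max_jobs_per_day;  // current quota for one host_app_version
};

// On validation of a job: the quota grows without an upper bound.
void on_valid(QuotaSketch& q) {
    q.max_jobs_per_day++;
}

// On invalidation of a job: the quota shrinks (assumed floor of 1).
void on_invalid(QuotaSketch& q) {
    q.max_jobs_per_day = std::max(1, q.max_jobs_per_day - 1);
}

// On an error result or a timeout: clamp to the initial quota first,
// then decrement (assumed floor of 1).
void on_error_or_timeout(QuotaSketch& q) {
    q.max_jobs_per_day = std::min(q.max_jobs_per_day, q.initial_quota);
    q.max_jobs_per_day = std::max(1, q.max_jobs_per_day - 1);
}
```

The clamp in the error case means a host that has built up a large quota loses all of the surplus on its first error, while a host that only returns valid results keeps growing.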
David Anderson 5035007b90 - back end: new way of deciding:
    - whether host is "reliable" for an app version
    - whether host is eligible for single replication for an app version
    - whether to use host scaling
    In each case, the answer is yes if the number of
    consecutive valid results is above a threshold.
    This replaces existing "error rate" and "scale probation" mechanisms.

    TODO: the # of consecutive valid results should also determine
        a limit on jobs in progress for an app version.
        Namely, if N is the threshold for host scaling, the limit should be
            ndevices*(max(1, consecutive_valid - N))
        The client currently doesn't supply enough
        app version info to do this.
        It could be approximated; that would give some protection
        against cherry-picking.
- credit: more conservative formulas for combining claimed credit
    among replicas.
    If there are normal replicas, we use a "low average"
    that weights each sample by the sum of the other samples.
    Otherwise we use the min (not the average) of the approximate samples.

NOTE: a DB update is required


svn path=/trunk/boinc/; revision=21230
2010-04-21 19:33:20 +00:00
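The "low average" from commit 5035007b90 weights each sample by the sum of the other samples, so large outliers receive small weights. A sketch written from that description; the handling of empty input and of an all-zero sample set is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Weighted average where sample i has weight (sum - v[i]).
// The weights add up to (n - 1) * sum.
double low_average(const std::vector<double>& v) {
    std::size_t n = v.size();
    if (n == 0) return 0.0;   // assumption: no samples means no credit
    if (n == 1) return v[0];
    double sum = 0.0;
    for (std::size_t i = 0; i < n; i++) sum += v[i];
    if (sum == 0.0) return 0.0;  // assumption: avoid dividing by zero
    double total = 0.0;
    for (std::size_t i = 0; i < n; i++) total += v[i] * (sum - v[i]);
    return total / ((n - 1) * sum);
}
```

For the samples 1 and 3 the weights are 3 and 1, giving (1*3 + 3*1) / 4 = 1.5, which is below the plain mean of 2; identical samples are returned unchanged.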
David Anderson b2451544e1 - server: change the following from per-host to per-(host, app version):
    - daily quota mechanism
    - reliable mechanism (accelerated retries)
    - "trusted" mechanism (adaptive replication)
- scheduler: enforce host scale probation only for apps with
    host_scale_check set.
- validator: do scale probation on invalid results
    (need this in addition to error and timeout cases)
- feeder: update app version scales every 10 min, not 10 sec
- back-end apps: support --foo as well as -foo for options

Notes:
- If you have, say, cuda, cuda23 and cuda_fermi plan classes,
    a host will have separate quotas for each one.
    That means it could error out on 100 jobs for cuda_fermi,
    and when its quota goes to zero,
    error out on 100 jobs for cuda23, etc.
    This is intentional; there may be cases where one version
    works but not the others.
- host.error_rate and host.max_results_day are deprecated

TODO:
    - the values in the app table for limits on jobs in progress etc.
        should override those in config.xml.

Implementation notes:
scheduler:
    process_request():
        read all host_app_versions for host at start;
        Compute "reliable" and "trusted" for each one.
        write modified records at end
    get_app_version():
        add "reliable_only" arg; if set, use only reliable versions
        skip over-quota versions
    Multi-pass scheduling: if have at least one reliable version,
        do a pass for jobs that need reliable,
        and use only reliable versions.
        Then clear best_app_versions cache.
    Score-based scheduling: for need-reliable jobs,
        it will pick the fastest version,
        then give a score bonus if that version happens to be reliable.
    When get back a successful result from client:
        increase daily quota
    When get back an error result from client:
        impose scale probation
        decrease daily quota if not aborted
Validator:
    when handling a WU, create a vector of HOST_APP_VERSION
        parallel to vector of RESULT.
        Pass it to assign_credit_set().
        Make copies of originals so we can update only modified ones
    update HOST_APP_VERSION error rates
Transitioner:
    decrease quota on timeout


svn path=/trunk/boinc/; revision=21181
2010-04-15 03:13:56 +00:00
David Anderson 2536797068 - validator: remove update_credit_per_cpu_sec(). Irrelevant.
TODO: remove related code
- validator: update wu.canonical_credit correctly.
    However, this field should be deprecated.
- validator: check for error return from assign_credit_set().

svn path=/trunk/boinc/; revision=21096
2010-04-05 20:03:54 +00:00
David Anderson a2a661993b - validator: -d 4 means -d 3 plus print all DB queries
(todo: do this for all daemons)
- validator: change cmdline args from -foo to --foo
    (todo: do this for all daemons)
- validator: pass max_granted_credit to assign_credit_set()

svn path=/trunk/boinc/; revision=21093
2010-04-05 18:59:16 +00:00
David Anderson 19f7d66b53 - backend programs: change the way PFC and elapsed-time statistics
are written to the DB.
    The incremental approach was bogus.
    New approach:
    host_app_version: write directly; R/W interval is tiny
    app_version: maintain an explicit list of update samples
        for both PFC and credit.
        When the validator flushes its app_version cache,
        do careful updates.
    Note: when using double fields in careful updates,
    you can't test for equality.  Use abs(new-old) < 1e-N

svn path=/trunk/boinc/; revision=21057
2010-04-02 19:10:37 +00:00
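The note on double fields in commit 19f7d66b53 amounts to replacing an equality test with a tolerance test. A minimal sketch in C++; the function name is hypothetical, and in the careful-update case the same comparison would appear in the SQL WHERE clause instead.

```cpp
#include <cassert>
#include <cmath>

// Treat two doubles as matching if they differ by less than the tolerance,
// i.e. abs(new - old) < 1e-N for a suitable N.
bool nearly_equal(double new_val, double old_val, double tolerance) {
    return std::fabs(new_val - old_val) < tolerance;
}
```

A value written to the DB and read back, or recomputed in a different order, may differ in its last bits, so an exact comparison against the old value can fail even though nothing else has modified the row.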