14 Commits

Author SHA1 Message Date
David Anderson 94c8e53204 Server: add "punitive validation" mechanism
Say that a job has a "long-term failure" if it fails in a way
(as evidenced by its exit code and/or stderr)
suggesting that other jobs for that (host, app version) will fail too.
In this case we want to avoid sending more jobs to that (host, app version).

This commit implements that mechanism.
To use it, have your validator's init_result() return
VAL_RESULT_LONG_TERM_FAIL if it finds a long-term failure,
and run your validator with the --check_punitive option.
("Punitive" because we're "punishing" the host for its failure.)

The validator punishes the (host, app version) by
setting host_app_version.max_jobs_per_day to 1.
One job per day can still be sent.
That way if the underlying problem is fixed
(e.g. the user enables VM acceleration in the BIOS)
we'll eventually go back to normal.

Also: normally HAV.max_jobs_per_day is scaled by the number
of CPUs and GPUs.
Disable this scaling when the value is 1.
2019-02-18 21:29:04 -08:00
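
The punitive mechanism in commit 94c8e53204 can be sketched roughly as follows. This is an illustration under stated assumptions, not the BOINC implementation: `Result`, `HostAppVersion`, `punish()`, `effective_daily_limit()`, the stderr marker string, and the numeric value of the constant are all hypothetical, and the real `init_result()` takes different arguments. Only the names `init_result()`, `VAL_RESULT_LONG_TERM_FAIL`, and `max_jobs_per_day` come from the commit message.

```cpp
#include <cassert>
#include <string>

// Placeholder value for illustration; the real constant lives in BOINC's
// validator headers and may have a different value.
const int VAL_RESULT_LONG_TERM_FAIL = 2;

struct Result {              // hypothetical stand-in for BOINC's RESULT
    int exit_status;
    std::string stderr_out;
};

struct HostAppVersion {      // hypothetical stand-in for a host_app_version row
    int max_jobs_per_day;
};

// Hypothetical project-side check (simplified signature): treat a specific
// stderr marker as evidence that other jobs for this (host, app version)
// would fail the same way.
int init_result(const Result& r) {
    if (r.exit_status != 0 &&
        r.stderr_out.find("VM acceleration not available") != std::string::npos) {
        return VAL_RESULT_LONG_TERM_FAIL;
    }
    return 0;
}

// What the validator does on a long-term failure: clamp the quota to 1.
void punish(HostAppVersion& hav) {
    hav.max_jobs_per_day = 1;
}

// The quota is normally multiplied by a CPU/GPU-derived factor (passed in
// here as `scale`, an assumption); a punished quota of 1 is left unscaled.
int effective_daily_limit(const HostAppVersion& hav, int scale) {
    assert(scale >= 1);
    if (hav.max_jobs_per_day == 1) return 1;
    return hav.max_jobs_per_day * scale;
}
```

Because one job per day still goes out, a later success gives the server a chance to raise the quota again without manual intervention.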
David Anderson b857a37008 Add support for post-assigned credit
Add --post_assigned_credit option to validator.
If set, it gets claimed credit from result.claimed_credit
(put there by project's init_result() function).
The claimed credit of the canonical result is the job's granted credit.

Also changed --credit_from_runtime so that it averages
claimed credit across instances,
instead of just using the canonical instance.
2018-05-15 14:55:30 -07:00
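
The two credit policies mentioned in commit b857a37008 differ only in how the granted credit is derived from the per-instance claimed credit. A minimal sketch, assuming a hypothetical `Instance` struct and hypothetical helper names (the real validator works on BOINC's RESULT records, and which instances enter the average is not stated in the commit message):

```cpp
#include <cassert>
#include <vector>

struct Instance {            // hypothetical stand-in for one job instance
    double claimed_credit;   // mirrors result.claimed_credit
    bool is_canonical;
};

// --post_assigned_credit, as described: the canonical instance's claimed
// credit becomes the job's granted credit. Returns 0 if none is canonical.
double granted_from_canonical(const std::vector<Instance>& instances) {
    for (const Instance& i : instances) {
        if (i.is_canonical) return i.claimed_credit;
    }
    return 0.0;
}

// --credit_from_runtime after the change: average the claimed credit
// across the given instances instead of using only the canonical one.
double granted_from_average(const std::vector<Instance>& instances) {
    assert(!instances.empty());
    double sum = 0.0;
    for (const Instance& i : instances) sum += i.claimed_credit;
    return sum / static_cast<double>(instances.size());
}
```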
David Anderson 13a5b9bf3e change multiple-inclusion guard names to BOINC_FILENAME_H 2017-04-07 23:54:49 -07:00
Bernd Machenschalk e82d9d87a9 Validator: implement "suspicious" results
A validator now has the possibility to mark a single result as "suspicious" by making init_result() return VAL_RESULT_SUSPICIOUS. If this is the single quorum result of an adaptive replication, this will trigger another task to be generated for validation.
2016-09-01 10:47:45 +02:00
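
The rule in commit e82d9d87a9 reduces to a single condition. The sketch below is an assumption-laden illustration: `InitOutcome` and `needs_additional_task()` are hypothetical names, with `InitOutcome::Suspicious` standing in for `init_result()` returning `VAL_RESULT_SUSPICIOUS`.

```cpp
#include <cassert>

// Hypothetical stand-in for the value returned by init_result().
enum class InitOutcome { Ok, Suspicious };

// A suspicious result that is the only member of its quorum (adaptive
// replication sent a single instance) triggers one more task for validation.
bool needs_additional_task(InitOutcome outcome, int quorum_size) {
    assert(quorum_size >= 1);
    return outcome == InitOutcome::Suspicious && quorum_size == 1;
}
```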
David Anderson 1af264747f validator: fix 64-bit ID problem 2015-07-28 16:19:31 -07:00
David Anderson 759c23ed27 - server: create a harness for testing validator code.
If you link your functions (init_result(), compare_results(),
    cleanup_result()) with validate_test.cpp,
    you'll get a program that you can run as
        validate_test file1 file2
    and it will compare the two files
    (this works only for validators that expect 1 file per result).

    I added a makefile, sched/makefile_validator_test,
    that you can use for this.
- server: shuffle code so that the above doesn't need to
    link MySQL libraries
- client: if we fetch a master file and it contains no scheduler URLs,
    show a message of class INTERNAL_ERROR
- client/scheduler: make CUDA_DEVICE_PROP.totalGlobalMem a double,
    and remove dtotalGlobalMem.
    Although NVIDIA reports RAM size as a size_t,
    there's no reason to store it as an integer after that.


svn path=/trunk/boinc/; revision=25542
2012-04-10 00:32:35 +00:00
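
For the test harness in commit 759c23ed27, a project supplies the three functions named above. The following is a rough sketch for a validator that expects one numeric output file per result. The `RESULT` struct here is a hypothetical stand-in with an assumed `output_path` field, `Parsed` is a hypothetical per-result record, and the error code and tolerance are placeholders; the real functions receive BOINC's own RESULT type.

```cpp
#include <cassert>
#include <fstream>
#include <string>

struct RESULT {                // hypothetical stand-in, not BOINC's RESULT
    std::string output_path;   // assumed field, for illustration only
};

struct Parsed {                // hypothetical per-result data
    double value;
};

// Parse the result's single output file into per-result data.
int init_result(RESULT& r, void*& data) {
    std::ifstream in(r.output_path);
    double v;
    if (!(in >> v)) return -1;   // placeholder error code
    data = new Parsed{v};
    return 0;
}

// Two results match if their parsed values agree within a tolerance.
int compare_results(RESULT&, void* d1, RESULT const&, void* d2, bool& match) {
    assert(d1 != nullptr && d2 != nullptr);
    const Parsed* a = static_cast<const Parsed*>(d1);
    const Parsed* b = static_cast<const Parsed*>(d2);
    double diff = a->value - b->value;
    if (diff < 0) diff = -diff;
    match = diff < 1e-6;         // placeholder tolerance
    return 0;
}

// Free whatever init_result() allocated.
int cleanup_result(RESULT const&, void* data) {
    delete static_cast<Parsed*>(data);
    return 0;
}
```

Linked with validate_test.cpp, functions of this shape would let `validate_test file1 file2` report whether the two files compare as equal.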
David Anderson b677f0c25e - validator: remove app and app_versions arguments from check_set().
These weren't used, and I'm not sure why they were added.
- include sched_limit.h in "make install" list

svn path=/trunk/boinc/; revision=21894
2010-07-12 21:35:05 +00:00
David Anderson fb851311e0 - server: various changes;
see http://boinc.berkeley.edu/trac/wiki/CreditNew

    Projects will need to update DB and recompile all back-end programs.

    Summary:
    - new way of computing credit
    - "reliable host" mechanism is per app version
    - "host punishment" mechanism is per app version
    - adjustment of wu.rsc_fpops_est provides the
        equivalent of per app version DCF
    - max jobs in progress is now per app
    - max jobs per RPC is now per app

    TODO:
    - reliable mechanism:
        - populate and use host_app_version.error_rate
        - populate host_app_version.turnaround
    - host punishment:
        - populate host_app_version.max_jobs_per_day
        - populate host_app_version.n_jobs_today
        - use app.max_jobs_per_day_init
    - job limits:
        - use app.max_jobs_in_progress, max_gpu_jobs_in_progress
        - use app.max_jobs_per_rpc
    - adjust wu.rsc_fpops_est
    - remove old credit stuff
        fpops_cumulative, credit_multiplier
        credit computation in scheduler

- AVERAGE class: use the Knuth algorithm (Wikipedia)


svn path=/trunk/boinc/; revision=21021
2010-03-29 22:28:20 +00:00
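
The "Knuth algorithm (Wikipedia)" mentioned in commit fb851311e0 most likely refers to the single-pass running mean and variance update often attributed to Welford and Knuth. A minimal textbook sketch follows; `RunningStats` is a hypothetical name, and BOINC's AVERAGE class may differ (for example in weighting or sample limits).

```cpp
#include <cassert>
#include <cmath>

struct RunningStats {        // hypothetical; not BOINC's AVERAGE class
    double n = 0;            // number of samples seen so far
    double mean = 0;         // running mean
    double m2 = 0;           // running sum of squared deviations from the mean

    // Fold one sample into the running mean and sum of squares.
    void update(double x) {
        n += 1;
        double delta = x - mean;
        mean += delta / n;
        m2 += delta * (x - mean);   // uses the already-updated mean
    }

    // Sample variance; defined only for two or more samples.
    double variance() const {
        assert(n >= 2);
        return m2 / (n - 1);
    }
};
```

The update avoids storing samples and avoids the cancellation problems of the naive sum-of-squares formula.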
David Anderson 4f66bb4c95 - added copyright and license info to .C, .cpp, .h files
- scheduler: fix bug in adaptive replication:
    if we send an unreplicated job to an untrusted host,
    set both wu.target_nresults and wu.min_quorum to app.target_nresults.

svn path=/trunk/boinc/; revision=15762
2008-08-06 18:36:30 +00:00
David Anderson 05f703559f - scheduler: add preliminary support for "job size matching"
(attempt to send big jobs to fast hosts, small jobs to slow hosts).
    - have "census" compute mean/stdev of host speeds,
        write it to a file perf_info.txt
    - have feeder compute mean/stdev of sizes of jobs in shmem
    - have feeder read perf_info.txt into shmem
- scheduler: add some debugging messages for app version selection
- Add LGPL license to a few files
- upgrade/setup scripts: copy census to bin/


svn path=/trunk/boinc/; revision=15136
2008-05-06 19:53:49 +00:00
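
One way to picture the job size matching in commit 05f703559f is to compare a host's speed and a job's size on a common scale using the two mean/stdev pairs. The sketch below is an assumption, not the scheduler's actual logic: `Stats`, `mean_stdev()`, `z_score()`, and `pick_job()` are hypothetical, and matching by closest z-score is a swapped-in illustration of "big jobs to fast hosts, small jobs to slow hosts".

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Stats {
    double mean;
    double stdev;
};

// Population mean and standard deviation of a non-empty sample.
Stats mean_stdev(const std::vector<double>& xs) {
    assert(!xs.empty());
    double sum = 0.0;
    for (double x : xs) sum += x;
    double mean = sum / static_cast<double>(xs.size());
    double sq = 0.0;
    for (double x : xs) sq += (x - mean) * (x - mean);
    return {mean, std::sqrt(sq / static_cast<double>(xs.size()))};
}

// Distance from the mean in units of standard deviation (0 if no spread).
double z_score(double x, const Stats& s) {
    return s.stdev > 0 ? (x - s.mean) / s.stdev : 0.0;
}

// Pick the job whose relative size is closest to the host's relative speed.
std::size_t pick_job(double host_speed, const Stats& host_stats,
                     const std::vector<double>& job_sizes) {
    Stats job_stats = mean_stdev(job_sizes);
    double hz = z_score(host_speed, host_stats);
    std::size_t best = 0;
    double best_dist = std::fabs(z_score(job_sizes[0], job_stats) - hz);
    for (std::size_t i = 1; i < job_sizes.size(); ++i) {
        double dist = std::fabs(z_score(job_sizes[i], job_stats) - hz);
        if (dist < best_dist) {
            best_dist = dist;
            best = i;
        }
    }
    return best;
}
```

In the commit's design the host statistics would come from perf_info.txt (written by "census") and the job statistics from the feeder's view of shared memory.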
David Anderson 8098622210 - Validator framework: remove some consts, and other changes,
to allow validator to assign different credit
    to different instances of a job
- Scheduler: if it can't open the DB, return <project_is_down/>
    (fixes #578)
- clean up logic of modify_claimed_credit
- feeder: for -priority_order_create_time, use workunitid
    rather than create time (faster for the DB)
from Kevin Reed

svn path=/trunk/boinc/; revision=14908
2008-03-13 23:35:13 +00:00
David Anderson ff91c8450f *** empty log message ***
svn path=/trunk/boinc/; revision=12004
2007-01-30 18:19:30 +00:00
David Anderson b0ce2533c6 *** empty log message ***
svn path=/trunk/boinc/; revision=11945
2007-01-23 21:37:27 +00:00
David Anderson 7d144b3d4d *** empty log message ***
svn path=/trunk/boinc/; revision=10291
2006-06-09 23:17:05 +00:00