Commit Graph

145 Commits

Author SHA1 Message Date
David Anderson afd8783b60 server: add <hr_class_static> option
For projects (like Lattice) that assign a WU's HR class when it's created,
we don't want the mechanism that clears the HR class
if there are error results and no in-progress or completed results.
This option suppresses this.
2014-12-10 23:18:40 -08:00
David Anderson dbd2d03a0d server/web: add support for per-application credit
See http://boinc.berkeley.edu/trac/wiki/PerAppCredit
If enabled (by the <credit_by_app> config flag)
validators will maintain credit on a per-(app, user, credit type) basis,
and same for teams,
in new DB tables credit_user and credit_team.
This info is displayed in the web site, on user and team pages,
using project-supplied functions to generate the HTML.

Note: update_stats doesn't decay the recent-average values
for per-app credit; I'll add this if needed.
2014-08-15 14:01:32 -07:00
David Anderson 61fb82e99b scheduler: add <maintenance_delay> config option.
Tells clients minimum delay to next request when the project is down.
Previously this was hardwired at 3600 sec.
2014-07-14 14:13:33 -07:00
David Anderson 81faa12ff3 scheduler: make matchmaker scheduling the default
Remove <matchmaker> config option; add <sched_old> option.
2014-07-08 12:35:45 -07:00
David Anderson 56934f8fbe scheduler: clean up job dispatch logging
There are now 3 flags for job dispatch logging:
<debug_send/>: info about work request, jobs sent, other high-level stuff
<debug_send_scan/>: info about scans through job cache
<debug_send_job/>: info about individual jobs (e.g. reason for not sending)
2014-06-12 11:33:11 -07:00
Eric J Korpela fd5c8c6e82 - Added scheduler config boolean option <estimate_flops_from_hav_pfc> which
allows projected_flops to be calculated from host_app_version pfc rather
  than elapsed time.  This is valuable if result elapsed times are highly
  variable and dependent on input.
2013-04-30 16:30:27 -07:00
David Anderson b250f1bb98 - scheduler: add <debug_client_files> flag for showing actions
involving sticky files
2013-03-05 13:43:14 +01:00
David Anderson 01c0a9a4b0 - scheduler: when resending jobs:
- don't use devices for which work is not being requested
    - obey wu_is_infeasible_custom()
        (e.g. don't send SETI@home VLAR jobs to GPUs)
- scheduler: add <debug_array_detail> log flag for slot-level messages
- admin web: show and allow control of app.beta
2013-03-01 16:26:08 +01:00
David Anderson 9fb0328d9b - scheduler: add separate log flag for locality sched lite
2013-03-01 16:26:08 +01:00
David Anderson a9e78b6459 - volunteer storage: fix the way that hosts are classified as alive/dead
- add a config item vda_host_timeout.
        A host that hasn't done a scheduler RPC for this long
        is considered dead.
    - a host that's not running a version 7+ client is considered dead
    - host.cpu_efficiency (an otherwise unused field) is used
        as a flag for dead hosts
    - the scheduler clears the flag if the client is v7+
    - vdad sets the flag for hosts where last RPC is old
    - before choosing a host for chunk download,
        vdad checks its client version.


svn path=/trunk/boinc/; revision=26059
2012-08-24 19:06:41 +00:00
David Anderson 751fdd97ca - scheduler: add <min_cal_target>, <max_cal_target>
to plan class XML spec options;
    lets you specify a range of ATI GPU models to use


svn path=/trunk/boinc/; revision=25749
2012-06-07 21:08:47 +00:00
David Anderson f08ca99ff3 - scheduler: add max_results_accepted config option.
Limits mem usage by the scheduler, can prevent crashes.


svn path=/trunk/boinc/; revision=25748
2012-06-07 18:34:53 +00:00
David Anderson 0492e0c2b8 - scheduler: add <need_ati_libs> option
svn path=/trunk/boinc/; revision=25747
2012-06-07 03:39:37 +00:00
David Anderson 73474ac408 - scheduler: when HR is being used,
make per-HR slot allocation an option rather than the default.
    Kevin reported that slot allocation wasn't working for WCG.
    The default is now no slot allocation,
    and use the regular result enumeration function
    rather than the one that scans the entire table.
    The config flag for enabling slot allocation is <hr_allocate_slots/>.


svn path=/trunk/boinc/; revision=25432
2012-03-15 19:50:10 +00:00
David Anderson f18ffd6fe7 - VDA: add some log messages
- scheduler: add VDA
- client, web: change default prefs to min_buf=.1 days, max_buf=.5 days
- scheduler: app plan function for vbox requires 7.0+ client


svn path=/trunk/boinc/; revision=25351
2012-02-28 06:57:28 +00:00
Bernd Machenschalk 9cb28dd25c scheduler: Another feature for debugging the scheduler.
Previously (little known) the scheduler could be hacked to preserve
  the sched_request.xml and sched_reply.xml in own directories
  (you had to modify the initial value of use_files in sched_main.cpp).
  This feature can now be switched on and off on the fly just by
  changing the project config.
  When there is an (existing) directory configured as
  <debug_req_reply_dir>, each scheduler instance will write three
  files in there: PID_C_sched.log, PID_C_sched_request.xml and (if all
  goes well) PID_C_sched_reply.xml. PID is the process id of this
  scheduler instance, C is an internal counter within the process if
  FCGI is used. The sched.log will contain nothing else than the
  pid and the IP address of the client. This should allow for
  identifying the scheduler instance responsible for a given
  apache error log message ("premature end of script headers") when
  a scheduler crashed. sched_request.xml (obviously) is the scheduler
  request, and if the scheduler doesn't crash in between, there will
  also be the reply to the client kept in sched_reply.xml
  Remove the <debug_req_reply_dir> tag from the project config
  to turn this feature off.

svn path=/trunk/boinc/; revision=25349
2012-02-27 13:12:24 +00:00
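A minimal sketch of the file naming described in the commit above, assuming the PID and counter are joined to the base name with underscores exactly as written. The helper name `debug_file_names` is hypothetical and not part of BOINC; the real scheduler is C++.

```python
import os

def debug_file_names(debug_dir, pid, counter):
    """Return the three per-instance paths named in the commit message.

    Hypothetical helper: only the naming scheme
    PID_C_sched.log / PID_C_sched_request.xml / PID_C_sched_reply.xml
    is taken from the message; everything else is an assumption.
    """
    prefix = f"{pid}_{counter}_"
    names = ("sched.log", "sched_request.xml", "sched_reply.xml")
    return [os.path.join(debug_dir, prefix + name) for name in names]
```

Given an Apache error that mentions a PID, the matching `PID_C_sched.log` is the one to look up.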
Bernd Machenschalk 3fa88ac1e3 scheduler: the scheduler (stderr) log is buffered to keep the output of
one instance together in the scheduler.log when multiple instances are
  running. Currently the buffer has a fixed size of 32768 characters.
  On one hand with much debug output this buffer may turn out to be
  too small. OTOH the log of this instance is completely lost in case
  of a crash, which doesn't help with debugging. Thus make the
  scheduler log buffer size configurable using the tag
  <scheduler_log_buffer> in project config. The default value is
  still the old size (32768), set it to 0 to disable buffering
  completely, e.g. for debugging.

svn path=/trunk/boinc/; revision=25348
2012-02-27 12:40:43 +00:00
Bernd Machenschalk 5bb86f79b8 scheduler: allow to configure userids for which the scheduler should
not scan the host table. This was previously hardcoded for
  Einstein@home to prevent some users with many (identical) hosts
  from flooding the DB with slow queries. Now add
  <dont_search_host_for_userid>userid</dont_search_host_for_userid>
  to the project config (in config.xml) for each such userid.

svn path=/trunk/boinc/; revision=25346
2012-02-27 12:08:25 +00:00
David Anderson c4d1229830 - scheduler: in version selection, when deciding which version is fastest,
we multiply projected FLOPS by a normal random var
    with mean 1 and stddev 0.1.
    Make the stddev configurable; in particular it can be zero.


svn path=/trunk/boinc/; revision=25311
2012-02-22 19:51:09 +00:00
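A rough sketch of the version selection described in the commit above: each candidate's projected FLOPS is scaled by a normal random variable with mean 1 and a configurable stddev before picking the maximum. The function name `pick_fastest` and the dict-based input are illustrative assumptions, not the scheduler's actual code.

```python
import random

def pick_fastest(versions, stddev=0.1, rng=random):
    """Pick the app version with the highest randomized projected FLOPS.

    versions: dict mapping a version name to its projected FLOPS.
    stddev:   spread of the random factor; 0 makes the choice deterministic.
    Hypothetical sketch; names and data layout are assumptions.
    """
    best_name, best_score = None, float("-inf")
    for name, flops in versions.items():
        score = flops * rng.gauss(1.0, stddev)  # random factor with mean 1
        if score > best_score:
            best_name, best_score = name, score
    return best_name
```

With `stddev=0.0` the factor is always 1, so the version with the highest projected FLOPS always wins.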
David Anderson 61e169f270 - server: add volunteer data archival to the build system
svn path=/trunk/boinc/; revision=25285
2012-02-17 19:16:49 +00:00
David Anderson 130d6ed4f0 - server: revamp the "assigned job" mechanism.
This now supports two main use cases:
    1) there's a job that you want to run once on all hosts,
        present and future
        (or all hosts belonging to a user, or to a team).
        The job is never transitioned, validated, or assimilated.
    2) There's a normal job for which you want to use only
        hosts belonging to a specific user (e.g. cluster or cloud hosts).
        This restriction can be made either when the job is created,
        or on the fly,
        e.g. as part of a scheme for accelerating batch completion.
        For the latter purposes we now provide a function
            restrict_wu_to_user(DB_WORKUNIT&, int userid);

        The job goes through the standard
        transitioner/validator/assimilator path.

    These cases are enabled by config flags
        <enable_assignment_multi/>
        <enable_assignment/>
    respectively.

    Assignments of type 2) are no longer stored in shared mem,
    so there is no limit on their number.

    There is no longer a rule that assigned job names must contain "asgn".

    NOTE: this requires a database update.


svn path=/trunk/boinc/; revision=25169
2012-01-30 22:39:13 +00:00
David Anderson 6d1133fb1d - scheduler: add <user_filter> config option.
If set, and a WU has nonzero batch,
    it is interpreted as a user ID,
    and the job will be sent only to hosts with that user ID.

    Note: the use of workunit.batch is arbitrary;
    we could also use workunit.opaque or other deprecated field.


svn path=/trunk/boinc/; revision=23556
2011-05-17 21:11:39 +00:00
David Anderson 53a7307305 - scheduler: fix nasty bug introduced in [23040]
that caused no jobs to be sent.


svn path=/trunk/boinc/; revision=23096
2011-02-23 21:22:45 +00:00
David Anderson 5421335dbb - transitioner: fix bug that could cause file deletion to not be done
for some WUs
- back end: fix the way "report grace period" is implemented
    old: result.report_deadline (i.e. what's in the DB) and
        the deadline sent to the client are the same.
        Some confusing and incorrect logic in the transitioner
        tries to provide the desired semantics.
    new: result.report_deadline is the deadline sent to the client,
        plus the grace period.
        No logic in the transitioner is needed.


svn path=/trunk/boinc/; revision=23040
2011-02-15 22:07:14 +00:00
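A small sketch of the "new" grace-period semantics from the commit above: the stored deadline is the client deadline plus the grace period, so a timeout check needs no extra logic. The function names and the plain numeric timestamps are assumptions for illustration.

```python
def stored_report_deadline(client_deadline, grace_period):
    """Value kept in result.report_deadline under the new scheme:
    the deadline sent to the client plus the grace period (seconds)."""
    return client_deadline + grace_period

def is_timed_out(now, report_deadline):
    """Plain comparison against the stored deadline; the grace period
    is already included, so no special case is needed here."""
    return now > report_deadline
```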
David Anderson 43a3036101 - back end: allow the specification of a read-only DB replica
(in config.xml) to include DB name, user, and password.
- back end: add read-only replica info to SCHED_CONFIG,
    so that C++ programs can use the replica
    (currently only PHP code can use it)
- db_dump: use the read-only DB replica if it exists.


svn path=/trunk/boinc/; revision=22958
2011-01-28 22:03:46 +00:00
David Anderson b356552c9c - scheduler/feeder: add a project config option <dont_send_jobs>.
If set, the feeder doesn't read jobs into shmem,
    and the scheduler doesn't send jobs.
    Intended for use when a project wants to process
    a backlog of completed jobs and not issue more.

svn path=/trunk/boinc/; revision=22601
2010-10-28 19:02:19 +00:00
David Anderson 84679f482a - scheduler: change the "primary_platform_only" config option
to "prefer_primary_platform".
    If an app has only 32-bit versions, use them for 64-bit clients.


svn path=/trunk/boinc/; revision=22282
2010-08-22 19:13:25 +00:00
David Anderson d79ca6a9f2 - scheduler: add <primary_platform_only> config option:
send only 64-bit app versions to 64-bit hosts 
    (the default is to send whatever app version is fastest)

svn path=/trunk/boinc/; revision=22183
2010-08-10 22:17:59 +00:00
David Anderson e0cea31781 - API: add result name to APP_INFO_DATA structure (for Volpex)
- scheduler: add max_download_urls_per_file config option
    (to limit the length of workunit.xml_doc,
    which is currently capped at 64KB).
    From Bernd.

svn path=/trunk/boinc/; revision=22082
2010-07-30 21:43:23 +00:00
David Anderson c0776ea188 - user web: put RSS item titles in CDATA
- sched: get rid of unused config items
- manager: msg tweak

svn path=/trunk/boinc/; revision=22045
2010-07-22 22:57:15 +00:00
David Anderson 0f613d61d8 - scheduler and client: fix the "allow multiple clients" feature.
This feature lets you run the BOINC client as a job on grid systems
    that handle only 1-CPU jobs;
    it disables various mechanisms that prevent multiple clients per host
    (which is normally a bad thing).
    Old:
        - Run the client with a --allow_multiple_clients flag.
            This tells it not to use a mutex that prevents
            multiple clients per host.
        - Run the project with the <multiple_clients_per_host> config flag.
            This suppresses two mechanisms:
            - (avoid duplicate host records)
                on a scheduler request with no host ID,
                looks for a host with same domain name, OS type,
                and mem size, and assumes the request is from that host
            - (job retry)
                If we get a request that doesn't have a host ID
                but does have a host CPID,
                mark its in-progress results as over
                NOTE: I CAN'T REMEMBER WHY WE SUPPRESS THIS;
                MARK S, DO YOU REMEMBER?

    Problem:
        if the grid clients attach to a project that
        doesn't use <multiple_clients_per_host>, bad things happen.
        E.g., if there are several requests at about the same time,
        most of them will fail with
        "another RPC already in progress" errors.
        If a project does include this flag,
        it loses protection from duplicate host records.

    New:
        - If the client is run with --allow_multiple_clients flag,
            it passes a <allow_multiple_clients> element
            in scheduler requests.
        - The scheduler skips the duplicate-host check on
            requests that include this flag.
        - There is no more <multiple_clients_per_host> scheduler option.

    Note: if a project using the old mechanism upgrades to this change,
    it will need to use new clients for its grid deployment.


svn path=/trunk/boinc/; revision=21839
2010-06-29 16:37:28 +00:00
David Anderson 7c51512cbf - transitioner: the format string for a DB query had %.15d instead of %.15e.
That produced a messed-up query that assigned garbage values to:
        host_app_version.turnaround_var
        host_app_version.turnaround_q
        host_app_version.max_jobs_per_day
        host_app_version.consecutive_valid
    To repair these:
        - set turnaround_var and turnaround_q to zero
        - if max_jobs_per_day is outside of
            (0..config.daily_result_quota)
            set it to config.daily_result_quota
        - if consecutive_valid is outside (0..1000), set it to zero
    I added a script, html/ops/repair_21812.php, that does this;
    if you ran server code between [21181] and [21812], run this script.
- scheduler/transitioner: add <debug_quota> log flag
- changed the build system to always use -Wall
    (if we'd done this before, this bug wouldn't have happened)
- fixed a bunch of other compile warnings


svn path=/trunk/boinc/; revision=21812
2010-06-25 18:54:37 +00:00
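The repair rules listed in the commit above can be sketched as follows. This is not the contents of html/ops/repair_21812.php; it is an illustrative Python version that treats the stated ranges as inclusive (an assumption) and uses a plain dict for a host_app_version row.

```python
def repair_host_app_version(row, daily_result_quota):
    """Return a repaired copy of a host_app_version row (a dict).

    Hypothetical sketch of the rules in the commit message:
    - turnaround_var and turnaround_q are reset to zero
    - max_jobs_per_day outside 0..daily_result_quota is set to the quota
    - consecutive_valid outside 0..1000 is set to zero
    Range bounds are assumed to be inclusive.
    """
    fixed = dict(row)
    fixed["turnaround_var"] = 0.0
    fixed["turnaround_q"] = 0.0
    if not (0 <= fixed["max_jobs_per_day"] <= daily_result_quota):
        fixed["max_jobs_per_day"] = daily_result_quota
    if not (0 <= fixed["consecutive_valid"] <= 1000):
        fixed["consecutive_valid"] = 0
    return fixed
```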
David Anderson 4147249de2 - server: delete old credit stuff
- user web: show host link in user result list.  Fixes #999


svn path=/trunk/boinc/; revision=21735
2010-06-12 22:08:15 +00:00
David Anderson 356327d88c - scheduler: change backoff policy if a host has reached daily job quota.
Old: back off until random time in 1st hour of next day
    New: no server-dictated backoff; rely on client backoff
    This is needed to let hosts recover in a reasonable amount of time
    after a burst of errors.
- scheduler config: it turns out we can't put arbitrary XML in config.xml;
    The Python code is set up to parse only 1 level of tags (??),
    and I'm not up to the task of changing this.
    So the fine-grained job limit feature [21674] needs to use
    a different file, namely config_aux.xml

svn path=/trunk/boinc/; revision=21686
2010-06-03 04:59:27 +00:00
David Anderson cf7fb29227 - scheduler: add fine-grained "max jobs in progress" control.
You can now specify limits for specific apps,
    and/or for the project as a whole.
    Within each of these, you can specify limits on
    CPU jobs, GPU jobs, or total jobs.
    In the case of CPU and GPU limits, you can specify
    whether the limit should be scaled by the number of devices.

    Note: the enforcement of this is done in get_app_version(),
    since per-resource-type limits may dictate what app versions
    we can use for a particular job.

svn path=/trunk/boinc/; revision=21674
2010-06-01 23:41:07 +00:00
David Anderson d45d3b488f - server: code cleanup
svn path=/trunk/boinc/; revision=21664
2010-06-01 03:45:49 +00:00
David Anderson 9187cb52ba - client and scheduler RPC:
Add more info to "project in-progress job list".
    Old: entries included only job name and app plan class;
        this was used to resend lost jobs,
        and to count the # of CPU and GPU jobs.
        But it's not usable e.g. for per-app in-progress limits.
    New: send the client's app versions (including usage info)
        and for each in-progress job, which app version it uses.
        (This reduces request-message size compared with sending
        usage info and app name per job).
- client and scheduler RPC:
    Add more info to "all in-progress job list", and make it optional.
    This list is used by schedulers that do deadline checks
    using EDF workload simulation.
    Old: the list is always sent, and it contains no info
        about job resource usage
    New: the list is sent only if the scheduler asked for it
        in a previous reply,
        and each entry now contains resource usage (CPU, GPUs)
    Note: the scheduler's EDF simulator is outdated;
        it doesn't know about GPU jobs.
        But we may as well get the info in place.


svn path=/trunk/boinc/; revision=21513
2010-05-13 20:18:27 +00:00
David Anderson 5035007b90 - back end: new way of deciding:
- whether host is "reliable" for an app version
    - whether host is eligible for single replication for an app version
    - whether to use host scaling
    In each case, the answer is yes if the number of
    consecutive valid results is above a threshold.
    This replaces existing "error rate" and "scale probation" mechanisms.

    TODO: the # of consecutive valid results should also determine
        a limit on jobs in progress for an app version.
        Namely, if N is the threshold for host scaling, the limit should be
            ndevices*(max(1, consecutive_valid - N))
        The client currently doesn't supply enough
        app version info to do this.
        It could be approximated; that would give some protection
        against cherry-picking.
- credit: more conservative formulas for combining claimed credit
    among replicas.
    If there are normal replicas, we use a "low average"
    that weights each sample by the sum of the other samples.
    Otherwise we use the min (not the average) of the approximate samples.

NOTE: a DB update is required


svn path=/trunk/boinc/; revision=21230
2010-04-21 19:33:20 +00:00
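One way to read the "low average" in the commit above: each sample is weighted by the sum of the other samples, so large outliers get small weights and the result sits at or below the plain mean. The sketch below follows that reading and assumes non-negative samples; the function name is illustrative.

```python
def low_average(samples):
    """Weighted average where sample x gets weight (sum of the others).

    Sketch based on the commit description; assumes non-negative samples.
    """
    n = len(samples)
    if n == 0:
        raise ValueError("need at least one sample")
    if n == 1:
        return samples[0]
    total = sum(samples)
    if total == 0:
        return 0.0
    # The weights (total - x) add up to (n - 1) * total.
    weighted = sum(x * (total - x) for x in samples)
    return weighted / ((n - 1) * total)
```

For two samples 10 and 30 this gives 15, below the plain mean of 20, which is the conservative behavior the message describes.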
David Anderson 021edb02c2 - back end programs: improve log msgs
svn path=/trunk/boinc/; revision=21193
2010-04-16 18:07:08 +00:00
David Anderson fb851311e0 - server: various changes;
see http://boinc.berkeley.edu/trac/wiki/CreditNew

    Projects will need to update DB and recompile all back-end programs.

    Summary:
    - new way of computing credit
    - "reliable host" mechanism is per app version
    - "host punishment" mechanism is per app version
    - adjustment of wu.rsc_fpops_est provides the
        equivalent of per app version DCF
    - max jobs in progress is now per app
    - max jobs per RPC is now per app

    TODO:
    - reliable mechanism:
        - populate and use host_app_version.error_rate
        - populate host_app_version.turnaround
    - host punishment:
        - populate host_app_version.max_jobs_per_day
        - populate host_app_version.n_jobs_today
        - use app.max_jobs_per_day_init
    - job limits:
        - use app.max_jobs_in_progress, max_gpu_jobs_in_progress
        - use app.max_jobs_per_rpc
    - adjust wu.rsc_fpops_est
    - remove old credit stuff
        fpops_cumulative, credit_multiplier
        credit computation in scheduler

- AVERAGE class: use the Knuth algorithm (Wikipedia)


svn path=/trunk/boinc/; revision=21021
2010-03-29 22:28:20 +00:00
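The "Knuth algorithm (Wikipedia)" mentioned above is presumably the online mean/variance update usually credited to Welford and presented by Knuth. A minimal sketch under that assumption follows; it is not the BOINC AVERAGE class, and the class and method names are hypothetical.

```python
class RunningAverage:
    """Online mean and variance, updated one sample at a time.

    Assumed to be the algorithm the commit refers to; the actual
    AVERAGE class may differ (for example in weighting).
    """

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def variance(self):
        """Population variance of the samples seen so far."""
        return self.m2 / self.n if self.n else 0.0
```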
Rytis Slatkevičius f239587bdb Sched: config option not to store stderr_out if exit_status==0 (to save on DB size). With help from Nicolas Alvarez.
svn path=/trunk/boinc/; revision=18528
2009-06-30 18:00:58 +00:00
David Anderson 2e5d9bd778 - scheduler: add new config option <max_wus_in_progress_gpu>.
The limit on jobs in progress is now
        max_wus_in_progress * NCPUS
        + max_wus_in_progress_gpu * NGPUS
    where NCPUS and NGPUS reflect prefs and are capped.
    Furthermore: if the client reports plan class for in-progress jobs
    (see checkin of 31 May 2009)
    then these limits are enforced separately;
    i.e. the # of in-progress CPU jobs is <= max_wus_in_progress*NCPUS,
    and the # of in-progress GPU jobs is <= max_wus_in_progress_gpu*NGPUS
- scheduler config: rename <cuda_multiplier> to <gpu_multiplier>
- scheduler: <max_wus_to_send> is now scaled by
    (NCPUS + gpu_multiplier*NGPUS)
- scheduler: don't keep scanning array if !work_needed()
- scheduler: moved array-scan logic from sched_send.cpp to sched_array.cpp
- scheduler: don't say "no work available" if jobs are available
    but work_needed() is initially false


svn path=/trunk/boinc/; revision=18255
2009-06-01 22:15:14 +00:00
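A sketch of the in-progress limits from the commit above, reading the second term of the sum as the GPU limit times NGPUS, consistent with the per-type limits the message states. The function name is hypothetical, and the capping of NCPUS and NGPUS is left to the caller.

```python
def in_progress_limits(max_wus_in_progress, max_wus_in_progress_gpu,
                       ncpus, ngpus):
    """Return (cpu_limit, gpu_limit, total_limit) for one host.

    ncpus and ngpus are assumed to already reflect prefs and caps.
    """
    cpu_limit = max_wus_in_progress * ncpus
    gpu_limit = max_wus_in_progress_gpu * ngpus
    return cpu_limit, gpu_limit, cpu_limit + gpu_limit
```

The per-type limits apply only when the client reports plan classes for in-progress jobs; otherwise only the total is usable.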
David Anderson c2fda4db09 - scheduler: add <report_max> config parameter;
limits the # of completed results handled per scheduler RPC.
    This may be needed to avoid crashes due to memory allocation
    failure (each reported result uses about 128KB memory).
- web: In showing result lists,
    include "Validate error" results in the "Invalid" category.
    (Previously they didn't appear in any category)

svn path=/trunk/boinc/; revision=18104
2009-05-14 19:01:40 +00:00
David Anderson 12eb6057e5 - client, Mac: don't do res_init(). It causes a crash.
- client (Unix): if client crashes while benchmark processes are going,
    make sure they detect this and exit.
- back-end programs: remove hardwired assumptions about
    what directory they run in, and hence where config.xml is.
    E.g., daemons look for it in "..", others expect it in current dir.
    New approach: all the programs look for the project dir as follows:
    1) the environment var BOINC_PROJECT_DIR, if defined
    2) the current dir, if config.xml is there.
    3) else ".."
    This means you can run programs in either proj/bin/ or proj/,
    or (using BOINC_PROJECT_DIR) you can keep executables
    outside of the project dir.


svn path=/trunk/boinc/; revision=18042
2009-05-07 13:54:51 +00:00
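The three-step project directory lookup described in the commit above, as a minimal Python sketch. The back-end programs themselves are C++; the function name `find_project_dir` and its parameters are illustrative assumptions.

```python
import os

def find_project_dir(environ=None, cwd="."):
    """Locate the project dir using the order given in the commit message:
    1) the BOINC_PROJECT_DIR environment variable, if defined
    2) the current dir, if config.xml is there
    3) otherwise the parent dir
    """
    environ = os.environ if environ is None else environ
    if "BOINC_PROJECT_DIR" in environ:
        return environ["BOINC_PROJECT_DIR"]
    if os.path.exists(os.path.join(cwd, "config.xml")):
        return cwd
    return os.path.join(cwd, "..")
```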
David Anderson 41ed82f791 - scheduler: fix bugs that caused only 1 job to be sent
svn path=/trunk/boinc/; revision=17555
2009-03-07 01:00:05 +00:00
David Anderson 66ec889431 - scheduler: add <locality_scheduling_sticky_file>
and <locality_scheduling_workunit_file> options
    From Bernd M.

svn path=/trunk/boinc/; revision=17431
2009-03-03 00:25:41 +00:00
David Anderson aadf813336 - scheduler/feeder: add <locality_scheduler_fraction> option;
lets you intermix locality and job-cache scheduling
    From Bernd M.

svn path=/trunk/boinc/; revision=17429
2009-03-03 00:12:55 +00:00
David Anderson 2d707927ab - scheduler: replace choose_download_url_by_timezone with
replace_download_url_by_timezone.


svn path=/trunk/boinc/; revision=17427
2009-03-02 23:47:11 +00:00
Eric J. Korpela 8f3abcc835 - Added checks for net/*.h, arpa/*.h, netinet/*.h and code to figure out
which of those files to include
    - Modified MAC address check to work on some non-Linux unixes.
      (mac_address.cpp)
    - Added suggested change to "already attached to project" checking.
      (ProjectInfoPage.cpp)
    - changed includes of standard c header files to their c++ equivalents
      (i.e. replaced <stdio.h> with <cstdio>) for namespace protection.
    - replaced "using namespace std;" with more explicit "using std::function" in
      several files.
    - Fixed bug in checking whether the os is OS/2 and added conditional OS_OS2
      to the build environment. (boinc_platform.m4,configure.ac)
    - Changed build environment to not use -nostandardlibs unless we are using
      G++ and static linkage is specified. (configure.ac)
    - Added makefiles and package building files for solaris CSW package manager.
    - Fixed bug with attempting to find login name using logname. (configure.ac)
    - Added ifdef HAVE_* protection around some include files commonly found in
      sys.
    - Added support for unified binary for x86_64/i686-pc-solaris.
      (cs_platforms.cpp)
    - generate_host_cpid() now uses MAC address on non-linux unix.
      (hostinfo_network.cpp)
    - Macro BOINC_SET_COMPILE_FLAGS now doesn't check gcc only flags on non-gcc
      compilers. (boinc_set_compile_flags.m4)
    - Library compiles no longer depend upon the library extension or require
      the library to be prefixed with lib.
    - More fixes for fcgi builds.
    - Added declaration of "struct ether_addr" and ether_ntoa().  Have not yet
      implemented ether_ntoa() for machines that don't have it, or where it is
      buggy.  (unix_util.h)
    - Added FCGI::perror() which calls FCGI_perror(). (boinc_fcgi.{h,cpp})
    - Fixed library Makefiles so that all required headers get installed.


svn path=/trunk/boinc/; revision=17388
2009-02-26 00:23:23 +00:00
David Anderson 85a8e6a772 - scheduler: remove the config flag <have_cuda_apps>,
and add <cuda_multiplier>.
    The latter is used in calculating max jobs/day for a host;
    namely, it's host.max_results_day * (NCPUS + NCUDA*cuda_multiplier).
    Set it to 10 or so if you have CUDA apps.
- scheduler: don't overload effective_ncpus();
    instead, add two new functions,
    max_results_day_multiplier() and max_wus_in_progress_multiplier()
- scheduler: don't reduce max_results_day if we get an aborted job
    (it might have been aborted by the project;
    not appropriate to punish host in this case)

svn path=/trunk/boinc/; revision=16959
2009-01-20 00:54:16 +00:00
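The max jobs/day formula from the commit above, written out as a small sketch. The function name is hypothetical; the expression itself is taken from the message.

```python
def max_jobs_per_day(max_results_day, ncpus, ncuda, cuda_multiplier):
    """host.max_results_day * (NCPUS + NCUDA * cuda_multiplier)."""
    return max_results_day * (ncpus + ncuda * cuda_multiplier)
```

For example, a host with 4 CPUs, 1 CUDA device, a per-CPU quota of 100 and a multiplier of 10 may get up to 1400 jobs per day.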