Commit Graph

69 Commits

Author SHA1 Message Date
David Anderson afd8783b60 server: add <hr_class_static> option
For projects (like Lattice) that assign a WU's HR class when it's created,
we don't want the mechanism that clears the HR class
if there are error results and no in-progress of completed results.
This option suppresses this.
2014-12-10 23:18:40 -08:00
David Anderson dbd2d03a0d server/web: add support for per-application credit
See http://boinc.berkeley.edu/trac/wiki/PerAppCredit
If enabled (by the <credit_by_app> config flag)
validators will maintain on a per-(app, user, credit type) basis,
and same for teams,
in new DB tables credit_user and credit_team.
This info is displayed in the web site, on user and team pages,
using project-supplied functions to generate the HTML.

Note: update_stats doesn't decay the recent-average values
for per-app credit; I'll add this if needed.
2014-08-15 14:01:32 -07:00
David Anderson 61fb82e99b scheduler: add <maintenance_delay> config option.
Tells clients minimum delay to next request when the project is down.
Previously this was hardwired at 3600 sec.
2014-07-14 14:13:33 -07:00
David Anderson 81faa12ff3 scheduler: make matchmaker scheduling the default
Remove <matchmaker> config option; add <sched_old> option.
2014-07-08 12:35:45 -07:00
David Anderson 56934f8fbe scheduler: clean up job dispatch logging
There are now 3 flags for job dispatch logging:
<debug_send/>: info about work request, jobs sent, other high-level stuff
<debug_send_scan/>: info about scans through job cache
<debug_send_job/>: info about individual jobs (e.g. reason for not sending)
2014-06-12 11:33:11 -07:00
David Anderson 9889ee8fb6 scheduler: enforce GPU job limits separately for each GPU type
Previously, if a project specified a limit on GPU jobs in progress,
it would be enforced across GPU types.
This could lead to starvation for hosts with multiple GPU types.
E.g. the limit is 10, and a host has 10 NVIDIA jobs and no AMD jobs.

Fix this by enforcing limits separately for each GPU type.
2014-03-08 11:17:16 -08:00
David Anderson ef82d5d9fb server: fix compile error on systems that don't define MAXPATHLEN 2013-08-22 17:01:45 -07:00
David Anderson b9f0733c06 server: replace strcpy() with strlcpy() various places 2013-06-03 22:42:53 -07:00
Eric J Korpela 6c76ddd45c - SCHED: Fixed problem that prevented proper driver version checking in cuda and
nvidia plan classes in plan_class_spec.xml
- SCHED: Scheduler was not using properly estimated performance when assigning
  work.  It was using theoretical performance to choose version and actual
  preformance to determine how long it would take.  I've changed that to start
  with theoretical performance and converge to actual performance as
  host_app_version pfc_n increases.
- SCHED: Added some additional app version selection debugging output.
2013-05-19 11:08:36 -07:00
Eric J Korpela fd5c8c6e82 - Added scheduler config boolean option <estimate_flops_from_hav_pfc> which
allows projected_flops to be calculated from host_app_version pfc rather
  than elapsed time.  This is valuable if result elapsed times are highly
  variable and dependent on input.
2013-04-30 16:30:27 -07:00
David Anderson b250f1bb98 - scheduler: add <debug_client_files> flag for showing actions
involving sticky files
2013-03-05 13:43:14 +01:00
David Anderson 8ae9680134 - server programs: allow config.xml to be a symlink 2013-03-04 15:16:59 +01:00
David Anderson 01c0a9a4b0 - scheduler: when resend jobs:
- don't use devices for which work is not being requested
    - obey wu_is_infeasible_custom()
        (e.g. don't send SETI@home VLAR jobs to GPUs)
- scheduler: add <debug_array_detail> log flag for slot-level messages
- admin web: show and allow control of app.beta
2013-03-01 16:26:08 +01:00
David Anderson 9fb0328d9b - scheduler: add separate log flag for locality sched lite 2013-03-01 16:26:08 +01:00
David Anderson a9e78b6459 - volunteer storage: fix the way that hosts are classified as alive/dead
- add a config item vda_host_timeout.
        A host that hasn't done a scheduler RPC for this long
        is considered dead.
    - a host that's not running a version 7+ client is considered dead
    - host.cpu_efficiency (an otherwise unused field) is used
        as a flag for dead hosts
    - the scheduler clears the flag if the client is v7+
    - vdad sets the flag for hosts where last RPC is old
    - before choosing a host for chunk download,
        vdad checks its client version.


svn path=/trunk/boinc/; revision=26059
2012-08-24 19:06:41 +00:00
David Anderson 751fdd97ca - scheduler: add <min_cal_target>, <max_cal_target>
to plan class XML spec options;
    lets you specify a range of ATI GPU models to use


svn path=/trunk/boinc/; revision=25749
2012-06-07 21:08:47 +00:00
David Anderson f08ca99ff3 - scheduler: add max_results_accepted config option.
Limits mem usage by the scheduler, can prevent crashes.


svn path=/trunk/boinc/; revision=25748
2012-06-07 18:34:53 +00:00
David Anderson 32a08d27d9 - C++ code: use MAXPATHLEN for char arrays that hold paths
svn path=/trunk/boinc/; revision=25659
2012-05-09 16:11:50 +00:00
David Anderson 9caa637a4d - server: is_project_dir() was checking that cgi-bin is a directory.
This doesn't work if it's a symlink to a dir.
    Check for that too.


svn path=/trunk/boinc/; revision=25480
2012-03-23 17:45:04 +00:00
David Anderson 73474ac408 - scheduler: when HR is being used,
make per-HR slot allocation an option rather than the default.
    Kevin reported that slot allocation wasn't working for WCG.
    The default is now no slot allocation,
    and use the regular result enumeration function
    rather than the once that scans the entire table.
    The config flag for enabling slot allocation is <hr_allocate_slots/>.


svn path=/trunk/boinc/; revision=25432
2012-03-15 19:50:10 +00:00
David Anderson f18ffd6fe7 - VDA: add some log messages
- scheduler: add VDA
- client, web: change default prefs to min_buf=.1 days, max_buf=.5 days
- scheduler: app plan function for vbox requires 7.0+ client


svn path=/trunk/boinc/; revision=25351
2012-02-28 06:57:28 +00:00
Bernd Machenschalk 9cb28dd25c scheduler: Another feature for debugging the scheduler.
Previously (little known) the scheduler could be hacked to preserve
  the sched_request.xml and sched_reply.xml in own directories
  (you had to modify the initial value of use_files in sched_main.cpp).
  This feature could now be switched on and off on the fly just by
  changing the project config.
  When there is an (existing) directory configured as
  <debug_req_reply_dir>, each schduler instance will write three
  files in there: PID_C_sched.log, PID_C_sched_request.xml and (if all
  goes well) PID_C_sched_reply.xml. PID is the process id of this
  scheduler instance, C is an internal counter within the process if
  FCGI is used. The sched.log will contain nothing else than the
  pid and the IP address of the client. This should allow for
  identifying the scheduler instance responsible for a given
  apache error log message ("premature end of script headers") when
  a scheduler crashed. sched_request.xml (obviously) is the scheduler
  request, and if the scheduler doesn't crash in between, there will
  also be the reply to the client kept in sched_reply.xml
  Remove the <debug_req_reply_dir> tag from the project config
  to turn this feature off.

svn path=/trunk/boinc/; revision=25349
2012-02-27 13:12:24 +00:00
Bernd Machenschalk 3fa88ac1e3 scheduler: the scheduler (stderr) log is buffered to keep the output of
one instance together in the scheduler.log when multiple instances are
  running. Currently the buffer has a fixed size of 32768 charaters.
  On one hand with much debug output this buffer may turn out to be
  too small. OTOH the log of this instance is completely lost in case
  of a crash, which doesn't help with debugging. Thus make the
  scheduler log buffer size configurable using the tag
  <scheduler_log_buffer> in project config. The default value is
  still the old size (32768), set it to 0 to disable buffering
  completely, e.g. for debugging.

svn path=/trunk/boinc/; revision=25348
2012-02-27 12:40:43 +00:00
Bernd Machenschalk d6290ac541 fix typo in previous commit (patch was using old parser)
svn path=/trunk/boinc/; revision=25347
2012-02-27 12:26:46 +00:00
Bernd Machenschalk 5bb86f79b8 scheduler: allow to configure userids for which the scheduler should
not scan the host table. This was previously hardcoded for
  Einstein@home to prevent some users with many (identical) hosts
  from flooding the DB with slow queries. Now add
  <dont_search_host_for_userid>userid</dont_search_host_for_userid>
  to the project config (in config.xml) for each such userid.

svn path=/trunk/boinc/; revision=25346
2012-02-27 12:08:25 +00:00
David Anderson c4d1229830 - scheduler: in version selection, when deciding which version is fastest,
we multiple projected FLOPS by a normal random var
    with mean 1 and stddev 0.1.
    Make the stddev configurable; in particular it can be zero.


svn path=/trunk/boinc/; revision=25311
2012-02-22 19:51:09 +00:00
David Anderson 61e169f270 - server: add volunteer data archival to the build system
svn path=/trunk/boinc/; revision=25285
2012-02-17 19:16:49 +00:00
David Anderson 130d6ed4f0 - server: revamp the "assigned job" mechanism.
This now supports two main use cases:
    1) there's a job that you want to run once on all hosts,
        present and future
        (or all hosts belonging to a user, or to a team).
        The job is never transitioned, validated, or assimilated.
    2) There's a normal job for which you want to use only
        hosts belonging to a specific user (e.g. cluster or cloud hosts).
        This restriction can be made either when the job is created,
        or on the fly,
        e.g. as part of a scheme for accelerating batch completion.
        For the latter purposes we now provide a function
            restrict_wu_to_user(DB_WORKUNIT&, int userid);

        The job goes through the standard
        transitioner/validator/assimilator path.

    These cases are enabled by config flags
        <enable_assignment_multi/>
        <enable_assignment/>
    respectively.

    Assignment of type 2) are no longer stored in shared mem,
    so there is no limit on their number.

    There is no longer a rule that assigned job names must contain "asgn".

    NOTE: this requires a database update.


svn path=/trunk/boinc/; revision=25169
2012-01-30 22:39:13 +00:00
David Anderson c5c5975b44 - Improve interface of XML_PARSER.
Add parsed_tag and is_tag to the class,
    so that parsing functions don't need to declare them
    and pass them around.
- Complete the task of using XML_PARSER as the argument
    to all parsing functions.
    (Internally, many of these functions still use the old XML parser;
    that's the next step.)


svn path=/trunk/boinc/; revision=23978
2011-08-10 17:11:08 +00:00
David Anderson 6d1133fb1d - scheduler: add <user_filter> config option.
If set, and a WU has nonzero batch,
    it is interpreted as a user ID,
    and the job will be sent only to hosts with that user ID.

    Note: the use of workunit.batch is arbitrary;
    we could also use workunit.opaque or other deprecated field.


svn path=/trunk/boinc/; revision=23556
2011-05-17 21:11:39 +00:00
David Anderson 6b0eba4641 - create_work and other tools: verify that the current dir,
parent dir, or BOINC_PROJECT_DIR actually is a project dir.
- client simulator: improvements


svn path=/trunk/boinc/; revision=23415
2011-04-21 17:04:42 +00:00
David Anderson f508218e4c - scheduler: <max_wus_in_progress> and <max_wus_in_progress_gpu>
are per-processor, not per-host.


svn path=/trunk/boinc/; revision=23326
2011-04-04 22:15:04 +00:00
David Anderson 53a7307305 - scheduler: fix nasty bug introduced in [23040]
that caused no jobs to be sent.


svn path=/trunk/boinc/; revision=23096
2011-02-23 21:22:45 +00:00
David Anderson 5421335dbb - transitioner: fix bug that could cause file deletion to not be done
for some WUs
- back end: fix the way "report grace period" is implemented
    old: result.report_deadline (i.e. what's in the DB) and
        the deadline sent to the client are the same.
        Some confusing and incorrect logic in the transitioner
        tries to provide the desired semantics.
    new: result.report_deadline is the deadline sent to the client,
        plus the grace period.
        No logic in the transitioner is needed.


svn path=/trunk/boinc/; revision=23040
2011-02-15 22:07:14 +00:00
David Anderson 43a3036101 - back end: allow the specification of a read-only DB replica
(in config.xml) to include DB name, user, and password.
- back end: add read-only replica info to SCHED_CONFIG,
    so that C++ programs can use the replica
    (currently only PHP code can use it)
- db_dump: use the read-only DB replica if it exists.


svn path=/trunk/boinc/; revision=22958
2011-01-28 22:03:46 +00:00
David Anderson b356552c9c - scheduler/feeder: add a project config option <dont_send_jobs>.
If set, the feeder doesn't read jobs into shmem,
    and the scheduler doesn't send jobs.
    Intended for use when a project wants to process
    a backlog of completed jobs and not issue more.

svn path=/trunk/boinc/; revision=22601
2010-10-28 19:02:19 +00:00
David Anderson 84679f482a - scheduler: change the "primary_platform_only" config option
to "prefer_primary_platform".
    If an app has only only 32-bit versions, use the for 64-bit clients.


svn path=/trunk/boinc/; revision=22282
2010-08-22 19:13:25 +00:00
David Anderson d79ca6a9f2 - scheduler: add <primary_platform_only> config option:
send only 64-bit app versions to 64-bit hosts 
    (the default is to send whatever app version is fastest)

svn path=/trunk/boinc/; revision=22183
2010-08-10 22:17:59 +00:00
David Anderson e0cea31781 - API: add result name to APP_INFO_DATA structure (for Volpex)
- scheduler: add max_download_urls_per_file config option
    (to limit the length of workunit.xml_doc,
    which is currently capped at 64KB).
    From Bernd.

svn path=/trunk/boinc/; revision=22082
2010-07-30 21:43:23 +00:00
David Anderson c0776ea188 - user web: put RSS item titles in CDATA
- sched: get rid of unused config items
- manager: msg tweak

svn path=/trunk/boinc/; revision=22045
2010-07-22 22:57:15 +00:00
David Anderson 0f613d61d8 - scheduler and client: fix the "allow multiple clients" feature.
This feature lets you run the BOINC client as a job on grid systems
    that handle only 1-CPU jobs;
    it disables various mechanisms that prevent multiple clients per host
    (which is normally a bad thing).
    Old:
        - Run the client with a --allow_multiple_clients flag.
            This tells it not to use a mutex that prevents
            multiple clients per host.
        - Run the project with the <multiple_clients_per_host> config flag.
            This suppresses two mechanisms:
            - (avoid duplicate host records)
                on a scheduler request with no host ID,
                looks for a host with same domain name, OS type,
                and mem size, and assumes the request is from that host
            - (job retry)
                If we get a request that doesn't have a host ID
                but does have a host CPID,
                mark its in-progress results as over
                NOTE: I CAN'T REMEMBER WHY WE SUPPRESS THIS;
                MARK S, DO YOU REMEMBER?

    Problem:
        if the grid clients attach to a project that
        doesn't use <multiple_clients_per_host>, bad things happen.
        E.g., if there are several requests at about the same time,
        most of them will fail with
        "another RPC already in progress" errors.
        If a project does include this flag,
        it loses protection from duplicate host records.

    New:
        - If the client is run with --allow_multiple_clients flag,
            it passes a <allow_multiple_clients> element
            in scheduler requests.
        - The scheduler skips the duplicate-host check on
            requests that include this flag.
        - There is no more <multiple_clients_per_host> scheduler option.

    Note: if a project using the old mechanism upgrades to this change,
    it will need to use new clients for its grid deployment.


svn path=/trunk/boinc/; revision=21839
2010-06-29 16:37:28 +00:00
David Anderson 7c51512cbf - transitioner: the format string for a DB query had %.15d instead of %.15e.
That produced a messed-up query that assigned garbage values to:
        host_app_version.turnaround_var
        host_app_version.turnaround_q
        host_app_version.max_jobs_per_day
        host_app_version.consecutive_valid
    To repair these:
        - set turnaround_var and turnaround_q to zero
        - if max_jobs_per_day is outside of
            (0..config.daily_result_quota)
            set it to config.daily_result_quota
        - if consecutive_valid is outside (0..1000), set it to zero
    I added a script, html/ops/repair_21812.php, that does this;
    if you ran server code between [21181] and [21812], run this script.
- scheduler/transitioner: add <debug_quota> log flag
- changed the build system to always use -Wall
    (if we'd done this before, this bug wouldn't have happened)
- fixed a bunch of other compile warnings


svn path=/trunk/boinc/; revision=21812
2010-06-25 18:54:37 +00:00
David Anderson 392f02f8b8 - API: add BOINC copyright notice to graphics2_win.cpp.
This file originally used code from the following tutorial,
    which shows how to open a window using GLUT:
    http://nehe.gamedev.net/data/lessons/lesson.asp?lesson=01
    The code has now been completely rewritten;
    in particular, it doesn't use GLUT anymore.
- scheduler: change default limit on #CPUs from 16 to 64

svn path=/trunk/boinc/; revision=21784
2010-06-21 21:14:34 +00:00
David Anderson 4147249de2 - server: delete old credit stuff
- user web: show host link in user result list.  Fixes #999


svn path=/trunk/boinc/; revision=21735
2010-06-12 22:08:15 +00:00
David Anderson f849faea5e - scheduler: bug fixes for jobs-in-progress limits
- client: msg tweak

svn path=/trunk/boinc/; revision=21692
2010-06-04 16:57:33 +00:00
David Anderson 356327d88c - scheduler: change backoff policy if a host has reached daily job quota.
Old: back off until random time in 1st hour of next day
    New: no server-dictated backoff; rely on client backoff
    This is needed to let hosts recover in a reasonable amount of time
    after a burst of errors.
- scheduler config: it turns out we can't put arbitrary XML in config.xml;
    The Python code is set up to parse only 1 level of tags (??),
    and I'm not up to the task of changing this.
    So the fine-grained job limit feature [21674] needs to use
    a different file, namely config_aux.xml

svn path=/trunk/boinc/; revision=21686
2010-06-03 04:59:27 +00:00
David Anderson cf7fb29227 - scheduler: add fine-grained "max jobs in progress" control.
You can now specify limits for specific apps,
    and/or for the project as a whole.
    Within each of these, you can specify limits on
    CPU jobs, GPU jobs, or total jobs.
    In the case of CPU and GPU limits, you can specify
    whether the limit should be scaled by the number of devices.

    Note: the enforcement of this is done in get_app_version(),
    since per-resource-type limits may dictate what app versions
    we can use for a particular job.

svn path=/trunk/boinc/; revision=21674
2010-06-01 23:41:07 +00:00
David Anderson d45d3b488f - server: code cleanup
svn path=/trunk/boinc/; revision=21664
2010-06-01 03:45:49 +00:00
David Anderson 021edb02c2 - back end programs: improve log msgs
svn path=/trunk/boinc/; revision=21193
2010-04-16 18:07:08 +00:00
David Anderson fb851311e0 - server: various changes;
see http://boinc.berkeley.edu/trac/wiki/CreditNew

    Projects will need to update DB and recompile all back-end programs.

    Summary:
    - new way of computing credit
    - "reliable host" mechanism is per app version
    - "host punishment" mechanism is per app version
    - adjustment of wu.rsc_fpops_est provides the
        equivalent of per app version DCF
    - max jobs in progress is now per app
    - max jobs per RPC is now per app

    TODO:
    - reliable mechanism:
        - populate and use host_app_version.error_rate
        - populate host_app_version.turnaround
    - host punishment:
        - populate host_app_version.max_jobs_per_day
        - populate host_app_version.n_jobs_today
        - use app.max_jobs_per_day_init
    - job limits:
        - use app.max_jobs_in_progress, max_gpu_jobs_in_progress
        - use app.max_jobs_per_rpc
    - adjust wu.rsc_fpops_est
    - remove old credit stuff
        fpops_cumulative, credit_multiplier
        credit computation in scheduler

- AVERAGE class: use the Knuth algorithm (Wikipedia)


svn path=/trunk/boinc/; revision=21021
2010-03-29 22:28:20 +00:00