627 Commits

Author SHA1 Message Date
David Anderson 11a6e85632 - scheduler: support for projects with some non-CPU-intensive apps
(but not all) wasn't finished.
    New logic: if the project has an NCI app then:
    - make a list of NCI apps for which the client doesn't have
        a job in progress.
    - try to send one job for each of these apps
    - do this even if no work is being requested.
    - don't send jobs for NCI apps by other mechanisms

NOTE: the client logic isn't quite right for mixed NCI projects.
    If there's no job for a given NCI app,
    the client should do a scheduler RPC.
    This isn't critical so we won't do this now.
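
The selection step in the new logic can be sketched in Python as below. This is an illustration, not the scheduler's code: `nci_apps_to_send`, `project_apps`, and `jobs_in_progress` are hypothetical names, and the dict/list inputs are assumed stand-ins for whatever the scheduler really uses.

```python
def nci_apps_to_send(project_apps, jobs_in_progress):
    """Return names of NCI apps for which the client has no job in progress.

    project_apps: dict mapping app name -> True if the app is non-CPU-intensive.
    jobs_in_progress: iterable of app names the client already has jobs for.
    Both structures are illustrative assumptions, not BOINC data types.
    """
    busy = set(jobs_in_progress)
    return [name for name, is_nci in project_apps.items()
            if is_nci and name not in busy]


apps = {"sensor": True, "heartbeat": True, "cruncher": False}
# The client already runs a "sensor" job, so only "heartbeat" needs one.
# This list is built whether or not the client requested work.
print(nci_apps_to_send(apps, ["sensor", "cruncher"]))
```

The scheduler would then try to send one job per app in the returned list, and exclude NCI apps from its other dispatch paths.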


svn path=/trunk/boinc/; revision=26068
2012-09-01 04:58:12 +00:00
David Anderson d02ff6e1c5 - fix typo
svn path=/trunk/boinc/; revision=26063
2012-08-28 06:33:53 +00:00
David Anderson 9ccb8fa38d - scheduler: add support for limited locality scheduling
- API: remove support for PPM files


svn path=/trunk/boinc/; revision=26062
2012-08-27 17:00:43 +00:00
David Anderson 32da1a7e37 - server: add support for having a mixture of CPU-intensive
and non-CPU-intensive applications.
    An app can be specified as non-CPU-intensive in project.xml,
    and this attribute can be set or cleared using the admin web interface.
    Note: support for this was added to the client in 2011,
    but we didn't add server-side support at that time.
    This change is in 6.12 and later clients.


svn path=/trunk/boinc/; revision=26060
2012-08-25 04:09:24 +00:00
David Anderson a9e78b6459 - volunteer storage: fix the way that hosts are classified as alive/dead
- add a config item vda_host_timeout.
        A host that hasn't done a scheduler RPC for this long
        is considered dead.
    - a host that's not running a version 7+ client is considered dead
    - host.cpu_efficiency (an otherwise unused field) is used
        as a flag for dead hosts
    - the scheduler clears the flag if the client is v7+
    - vdad sets the flag for hosts where last RPC is old
    - before choosing a host for chunk download,
        vdad checks its client version.
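
A minimal sketch of the two classification rules above, assuming times are in seconds. Only `vda_host_timeout` comes from the commit; `host_is_dead` and its other arguments are hypothetical, and the real code records the result in host.cpu_efficiency rather than returning it.

```python
def host_is_dead(last_rpc_time, client_major_version, now, vda_host_timeout):
    """Classify a host as dead using the two rules described in this commit.

    A pre-version-7 client is always treated as dead; otherwise the host is
    dead if its last scheduler RPC is older than vda_host_timeout.
    """
    if client_major_version < 7:
        return True
    return now - last_rpc_time > vda_host_timeout


one_day = 86400
print(host_is_dead(last_rpc_time=0, client_major_version=7,
                   now=2 * one_day, vda_host_timeout=one_day))
```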


svn path=/trunk/boinc/; revision=26059
2012-08-24 19:06:41 +00:00
David Anderson e79d3ea4c8 - client: change the way project disk share is computed.
- Allow projects to report "desired disk usage" (DDU).
        If the client learns that a project wants disk space,
        it can shrink the allocation to other projects.
    - Base share computation on DDU rather than disk usage.
    - Introduce the notion of "disk resource share".
        This is defined (somewhat arbitrarily) as resource share
        plus 1/10 of the largest resource share.
        This is intended to ensure that even zero-share projects
        get enough disk space to store app versions and data files;
        otherwise they wouldn't be able to compute.
- server: use host.d_boinc_max (which wasn't being used)
    to store d_project_share reported by client.
- volunteer storage: change the way hosts are allocated to chunks.
    Allow hosts to store several chunks of the same file, if needed


svn path=/trunk/boinc/; revision=26052
2012-08-22 04:02:52 +00:00
David Anderson b029e352c9 - scheduler: if sending GPU description to pre-7.0 client,
call it CUDA instead of NVIDIA


svn path=/trunk/boinc/; revision=26042
2012-08-17 06:10:25 +00:00
David Anderson 0d42a4aa5c - file upload handler: add an #ifdef for disabling locking of files
while writing to them.
    It's not clear to me that this locking is beneficial,
    and it may be causing filesystem problems at WCG
- volunteer storage stuff


svn path=/trunk/boinc/; revision=26021
2012-08-15 21:27:38 +00:00
David Anderson 7335c036fc - server: volunteer storage bug fixes.
Note to self: jerasure's decoder program loops or crashes
        if there are no missing chunks.

svn path=/trunk/boinc/; revision=25995
2012-08-08 21:37:51 +00:00
David Anderson ab120dea9e - web: after post to a thread, show thread in user's chosen order
instead of newest first.


svn path=/trunk/boinc/; revision=25931
2012-08-01 17:57:56 +00:00
David Anderson 6e816094bd - volunteer data storage: intermediate checkin
svn path=/trunk/boinc/; revision=25890
2012-07-25 21:41:32 +00:00
David Anderson ac20215eb8 - volunteer storage: implement "vda status" command
svn path=/trunk/boinc/; revision=25887
2012-07-23 21:53:09 +00:00
David Anderson 9a84980792 - lib: treat MINGW32 like CYGWIN32 (in 1 place - should do everywhere?)
from Oliver


svn path=/trunk/boinc/; revision=25874
2012-07-17 03:59:12 +00:00
David Anderson 6a8075046b - Unix: include db/boinc_db_types.h in installed headers
- client: small code cleanup, no functional change


svn path=/trunk/boinc/; revision=25857
2012-07-10 17:28:04 +00:00
David Anderson 78f74661aa - distributed storage: move chunk_size to VDA_FILE.
Add some missing code.


svn path=/trunk/boinc/; revision=25854
2012-07-07 19:44:48 +00:00
David Anderson 68f9880615 - client: remove "device" entry from CUDA_DEVICE_PROP,
and change types of mem-size fields from int to double.
    These fields are size_t in NVIDIA's version of this;
    however, cuDeviceGetAttribute() returns them as int,
    so I don't see where this makes any difference.
- client: fix bug in handling of <no_rsc_apps> element.
- scheduler: message tweaks.
    Note: [foo] means that the message is enabled by <debug_foo>.



svn path=/trunk/boinc/; revision=25849
2012-07-05 20:24:17 +00:00
David Anderson 19458ba4de - Compile fixes for Fedora core 17. From Christian B. Fixes #1194.
- Fix various #include issues.

CODING STYLE LAW (minimal inclusion principle):
    If foo.cpp requires <blah.h>,
    #include <blah.h> in foo.cpp, NOT foo.h


svn path=/trunk/boinc/; revision=25837
2012-07-02 18:51:02 +00:00
David Anderson 1776a244ae - web: when showing a batch, recompute and update its fraction done
- feeder: don't enumerate results for WUs with nonzero error_mask
- scheduler: in slow_check(), make sure the WU error_mask is still zero


svn path=/trunk/boinc/; revision=25822
2012-06-29 06:53:48 +00:00
David Anderson fd0983b991 - web: server status page should show elapsed time, not CPU time
svn path=/trunk/boinc/; revision=25785
2012-06-22 07:35:54 +00:00
David Anderson 158aab8d5c - DB: add project_state and description fields to batch table.
Both are for use by project.
- job submission file sandbox: don't delete physical file
    when deleting a sandbox entry.
    We'll have to figure out how to garbage-collect physical files.
- LAMMPS job submission:
    use the 50th-percentile host, not the 0th


svn path=/trunk/boinc/; revision=25734
2012-06-05 05:57:55 +00:00
David Anderson 761fb3f4c1 - admin web: add a function for "revalidating" a given set of jobs.
This reruns validation for instances that are successful
    but marked as invalid or inconclusive.
    Use this if you changed your validator to be more permissive,
    and you want to grant credit for instances that were
    originally marked as invalid.


svn path=/trunk/boinc/; revision=25714
2012-05-25 23:49:17 +00:00
Bernd Machenschalk 8b5b765bb7 scheduler: get app_version info for validator items
svn path=/trunk/boinc/; revision=25658
2012-05-09 08:04:21 +00:00
David Anderson 759c23ed27 - server: create a harness for testing validator code.
If you link your functions (init_result(), compare_results(),
    cleanup_result()) with validate_test.cpp,
    you'll get a program that you can run as
        validate_test file1 file2
    and it will compare the two files
    (this works only for validators that expect 1 file per result).

    I added a makefile, sched/makefile_validator_test,
    that you can use for this.
- server: shuffle code so that the above doesn't need to
    link MySQL libraries
- client: if we fetch a master file and it contains no scheduler URLs,
    show a message of class INTERNAL_ERROR
- client/scheduler: make CUDA_DEVICE_PROP.totalGlobalMem a double,
    and remove dtotalGlobalMem.
    Although NVIDIA reports RAM size as a size_t,
    there's no reason to store it as an integer after that.


svn path=/trunk/boinc/; revision=25542
2012-04-10 00:32:35 +00:00
David Anderson 86f50ba080 - admin web: when resetting app statistics,
clear elapsed time stats as well as PFC stats


svn path=/trunk/boinc/; revision=25530
2012-04-05 11:01:38 +00:00
David Anderson c703b68090 - server: allow <db_host> to include a :port
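
As an illustration only, a `<db_host>` value with an optional port could be split like this. `split_db_host` is a hypothetical helper, not a BOINC function; returning `None` when no port is given is an assumption of this sketch, and IPv6 literals are not handled.

```python
def split_db_host(db_host):
    """Split "host" or "host:port" into (host, port-or-None)."""
    host, sep, port = db_host.partition(":")
    return host, (int(port) if sep else None)


print(split_db_host("db.example.org:3307"))
print(split_db_host("localhost"))
```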
svn path=/trunk/boinc/; revision=25405
2012-03-12 21:45:29 +00:00
David Anderson 4a50b2b2e2 - wrapper: compute final CPU time correctly for multi-process apps
- storage stuff


svn path=/trunk/boinc/; revision=25356
2012-02-29 20:58:45 +00:00
David Anderson 127e905e0d - storage stuff. Getting there.
svn path=/trunk/boinc/; revision=25355
2012-02-29 07:22:59 +00:00
David Anderson 516e5ad798 - storage stuff
svn path=/trunk/boinc/; revision=25354
2012-02-29 01:11:28 +00:00
David Anderson ce52c9cf3e - storage stuff
svn path=/trunk/boinc/; revision=25341
2012-02-24 22:55:11 +00:00
David Anderson a8f883d2fa - server: split out the "antique file deletion" feature of
file_deleter.cpp into a separate program,
    since it blocks normal file deletion while it's running.
    From Bernd.
- storage stuff


svn path=/trunk/boinc/; revision=25321
2012-02-24 03:09:56 +00:00
David Anderson 2ed1cfbbb2 - scheduler and create_work: fix bugs that caused targeted jobs
to be sent to non-targeted hosts.
    The feeder was erroneously putting targeted jobs
    in the shared mem cache.
    Changes:
    - The feeder only enumerates jobs for which
        workunit.transitioner_flags is zero.
        NOTE: this field is nonzero iff the job is assigned.
    - create_work: when creating an assigned job,
        set workunit.transitioner_flags appropriately


svn path=/trunk/boinc/; revision=25314
2012-02-22 22:13:08 +00:00
David Anderson c4d1229830 - scheduler: in version selection, when deciding which version is fastest,
we multiply projected FLOPS by a normal random var
    with mean 1 and stddev 0.1.
    Make the stddev configurable; in particular it can be zero.
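
A rough Python sketch of the perturbed comparison described above. `pick_fastest_version` and the version names are hypothetical, and the scheduler's real selection considers more than this one factor.

```python
import random


def pick_fastest_version(projected_flops, stddev=0.1, rng=random):
    """Pick the version whose randomly perturbed projected FLOPS is largest.

    projected_flops: dict mapping a version name to its projected FLOPS.
    Each value is multiplied by a normal random variable with mean 1 and
    the given standard deviation before the comparison.
    """
    perturbed = {version: flops * rng.gauss(1.0, stddev)
                 for version, flops in projected_flops.items()}
    return max(perturbed, key=perturbed.get)


versions = {"cpu": 1.0e9, "cuda": 5.0e9}
# With stddev=0 every multiplier is exactly 1, so the choice is deterministic.
print(pick_fastest_version(versions, stddev=0.0))
```

With a nonzero stddev, two versions whose projected FLOPS are close can each win on different requests; with zero the larger projection always wins.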


svn path=/trunk/boinc/; revision=25311
2012-02-22 19:51:09 +00:00
David Anderson d03f697456 - storage stuff
svn path=/trunk/boinc/; revision=25307
2012-02-21 20:55:09 +00:00
David Anderson 24d386e511 - db_purge: when deleting a workunit record,
delete any assignments that refer to it


svn path=/trunk/boinc/; revision=25284
2012-02-17 18:26:36 +00:00
David Anderson 1b8d6b098d - storage stuff (work in progress)
- small code shuffle


svn path=/trunk/boinc/; revision=25274
2012-02-16 23:59:26 +00:00
David Anderson caf56b8b6b - lib: change get_mac_address() to avoid sprintf(buf, "%s...", buf);
use strcat instead
- client: don't use get_mac_address() to create host CPIDs
    (we have plenty of other info to make them unique)
- storage stuff


svn path=/trunk/boinc/; revision=25269
2012-02-16 00:08:40 +00:00
David Anderson 540a16e2f0 - transitioner: fix bug that caused an invalid SQL query
svn path=/trunk/boinc/; revision=25197
2012-02-04 00:18:37 +00:00
David Anderson 480e28b54c - web: fix the user search feature
- scheduler: parse d_project_share
- scheduler: if vbox and vbox_mt are both available,
    use vbox for a 1-CPU machine


svn path=/trunk/boinc/; revision=25176
2012-02-01 03:30:14 +00:00
David Anderson 130d6ed4f0 - server: revamp the "assigned job" mechanism.
This now supports two main use cases:
    1) there's a job that you want to run once on all hosts,
        present and future
        (or all hosts belonging to a user, or to a team).
        The job is never transitioned, validated, or assimilated.
    2) There's a normal job for which you want to use only
        hosts belonging to a specific user (e.g. cluster or cloud hosts).
        This restriction can be made either when the job is created,
        or on the fly,
        e.g. as part of a scheme for accelerating batch completion.
        For the latter purposes we now provide a function
            restrict_wu_to_user(DB_WORKUNIT&, int userid);

        The job goes through the standard
        transitioner/validator/assimilator path.

    These cases are enabled by config flags
        <enable_assignment_multi/>
        <enable_assignment/>
    respectively.

    Assignments of type 2) are no longer stored in shared mem,
    so there is no limit on their number.

    There is no longer a rule that assigned job names must contain "asgn".

    NOTE: this requires a database update.


svn path=/trunk/boinc/; revision=25169
2012-01-30 22:39:13 +00:00
David Anderson 10c79a7166 - scheduler: initialize COPROC_ATI::version to zero;
avoid sending spurious "update driver" messages


svn path=/trunk/boinc/; revision=25131
2012-01-23 21:59:12 +00:00
David Anderson c05444ad1e - GUI RPC: switching to the new XML parser
(which won't parse a double as an int)
    revealed a type mismatch in FILE_TRANSFER::next_request_time
    between client and server.


svn path=/trunk/boinc/; revision=25125
2012-01-23 05:03:52 +00:00
David Anderson dd16170fc1 - scheduler: the p_fpops value reported by clients can't be trusted.
Some credit cheats (e.g. with credit_by_runtime) can be done
    by reporting a huge value.
    Fix this by capping the value at 1.1 times the 95th percentile
    of host.p_fpops, taken over active hosts.
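
The cap can be sketched as follows. The nearest-rank percentile used here is an assumption (the commit does not say how the percentile is computed), and `fpops_cap` and `capped_fpops` are hypothetical names.

```python
def fpops_cap(active_host_fpops, percentile=95, factor=1.1):
    """Return factor times the given percentile of the hosts' p_fpops.

    Uses the nearest-rank method: the value at position
    ceil(percentile * n / 100) in the sorted list (1-based).
    Expects a non-empty collection.
    """
    ordered = sorted(active_host_fpops)
    # Integer ceiling division, which avoids floating-point rounding.
    rank = max(1, -(-percentile * len(ordered) // 100))
    return factor * ordered[rank - 1]


def capped_fpops(reported, cap):
    """Clamp a client-reported p_fpops value to the cap."""
    return min(reported, cap)


hosts = [n * 1e9 for n in range(1, 101)]   # made-up values: 1 to 100 GFLOPS
cap = fpops_cap(hosts)
print(capped_fpops(1e15, cap) == cap)      # an implausibly huge report is clamped
```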


svn path=/trunk/boinc/; revision=25017
2012-01-09 17:35:48 +00:00
David Anderson e8657adfd2 - scheduler: change vbox_mt app plan function to use 1, 2 or 3 CPUs
depending on how many the host has,
    and whether CPU VM extensions are present
    (this reflects the requirements of CernVM).


svn path=/trunk/boinc/; revision=25009
2012-01-08 01:28:39 +00:00
David Anderson 95ebb112c2 - client: for VBox apps, check stderr for "ERR_CPU_VM_EXTENSIONS_DISABLED".
If found, set HOST_INFO::p_vm_extensions_disabled,
    and pass this to the scheduler.
- scheduler (VBox app plan function) if a host has p_vm_extensions_disabled
    set, don't send it multicore VBox jobs.

Note: if you have a host with VM extensions, and they're disabled
    in the BIOS, and you enable them, you can remove the
    <p_vm_extensions_disabled> line from client_state.xml
    and you'll be eligible to get multicore VM jobs again.


svn path=/trunk/boinc/; revision=24944
2011-12-30 09:43:58 +00:00
David Anderson 8877aa5183 - web: in GPU model list page,
look for plan classes containing "nvidia" as well as "cuda".


svn path=/trunk/boinc/; revision=24614
2011-11-16 19:47:40 +00:00
David Anderson 22a911516c - server: more fixes to DB to handle unsigned result IDs
svn path=/trunk/boinc/; revision=24563
2011-11-09 17:27:50 +00:00
David Anderson 7c201eba3f - DB: use %u when writing result IDs in SQL queries;
this is to support SETI@home, which ran out of result IDs
    and changed the DB field type to int unsigned.
    Note: eventually I'll make this change official
    and change the .h types as well.
- web: put <apps_selected> tags around <app_id> elements
    in project-specific prefs.


svn path=/trunk/boinc/; revision=24555
2011-11-09 07:41:49 +00:00
David Anderson ae04b50a71 - client: don't crash if trickle up exceeds 64KB
(this bug was introduced Sept 20)
- scheduler: truncate long trickle-ups to 256KB; don't crash


svn path=/trunk/boinc/; revision=24535
2011-11-06 06:25:48 +00:00
David Anderson 4b826b52a0 - scheduler: fix bug in the "homogeneous app version" (HAV) feature
(reported by Kevin Reed).
    The problem: cache inconsistency.
    If there are 2 results for the same WU in shared mem,
    and 2 scheduler instances get them around the same time,
    they can send them with different app versions.
    We already fixed this problem for HR by
    1) rereading the relevant WU fields while deciding
        whether to send the result
    2) doing a "careful update" of the WU field using a where clause
        to make sure it wasn't modified in the (short) interval
        since rereading it.
    I fixed the HAV problem in the same way,
    and merged the two mechanisms to combine the DB queries.

    Also:
    - The rereads are done in slow_check() (see below).
    - The careful updates are done in update_wu_on_send(),
        and this is called *before* doing careful updates on result fields.
        That way, if the WU updates fail, we don't have orphaned results.
    - already_sent_to_different_platform_careful() (sic)
        no longer does DB stuff, so it's merged with
        already_send_to_different_hr_class() (better name)

    NOTE: slow_check() is used in array scheduling only.
        Score-based scheduling uses other code,
        in which this bug is not yet fixed.
        Locality scheduling doesn't support HR or HAV at all.
        This should be unified.
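
The "careful update" idea (an UPDATE whose where clause checks that the field still holds the value just reread) can be sketched with an in-memory SQLite table. The table layout, the `app_version_id` column, and `careful_update` are assumptions for illustration; the real scheduler uses MySQL and its own schema.

```python
import sqlite3


def careful_update(db, wu_id, expected, new):
    """Set app_version_id only if it still equals the value we reread.

    Returns True if exactly one row changed, False if another writer
    modified the row in the meantime.
    """
    cur = db.execute(
        "UPDATE workunit SET app_version_id = ? "
        "WHERE id = ? AND app_version_id = ?",
        (new, wu_id, expected),
    )
    return cur.rowcount == 1


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE workunit (id INTEGER PRIMARY KEY, app_version_id INTEGER)")
db.execute("INSERT INTO workunit VALUES (1, 0)")

# Two scheduler instances both reread app_version_id = 0, then race.
first = careful_update(db, 1, expected=0, new=42)    # succeeds
second = careful_update(db, 1, expected=0, new=43)   # fails: value is now 42
print(first, second)
```

The instance whose update fails knows the WU changed under it and can skip sending the result, which is why doing the WU update before the result update avoids orphaned results.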


svn path=/trunk/boinc/; revision=24484
2011-10-26 07:15:22 +00:00
David Anderson 3410abef6f - backend API: added function cancel_jobs(minid, maxid)
for canceling jobs
- added program cancel_jobs for canceling jobs
- DB interface: it's not an error if update_fields_noid()
    affects != 1 rows


svn path=/trunk/boinc/; revision=24413
2011-10-18 07:15:04 +00:00