72 Commits

Author SHA1 Message Date
David Anderson 1d82913b80 Scheduler: fix bug in parsing project prefs. 2016-08-20 15:01:44 -07:00
David Anderson 09d67f794a scheduler (HR): detect "ARM" in p_model as well as p_vendor.
Some Android devices report p_vendor as blank, but p_model contains "ARM".
2016-07-30 00:07:33 -07:00
David Anderson 89dc25bc6c Condor interface: compile fix 2016-07-27 23:18:30 -07:00
David Anderson 671446ced1 code cleanup: in scheduler, factor project prefs into their own struct 2016-07-27 15:15:08 -07:00
David Anderson 995fdc38d5 plan class mechanism: allow mem usage to depend on #CPUs
Add optional <mem_usage_base> and <mem_usage_per_cpu> elements
for MT plan classes.
These let you make the estimated memory usage depend on the # of CPUs used,
for example, for VM apps that run multiple jobs within the VM.
The mem usage is base + NCPUS*per_cpu.
This is used
1) to limit the # of CPUs, based on the client's available memory
2) to determine workunit.rsc_memory_bound
    (the value in the input template is overridden)
2016-06-29 13:06:10 -07:00
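To illustrate the formula in commit 995fdc38d5 above (mem usage = base + NCPUS*per_cpu), here is a minimal C++ sketch. The struct and function names are hypothetical and are not taken from the BOINC source; only the formula and the idea of limiting the CPU count by available memory come from the commit message.

```cpp
// Hypothetical container for the two optional plan class elements.
struct MemUsageSpec {
    double base;     // bytes, as given in <mem_usage_base>
    double per_cpu;  // bytes, as given in <mem_usage_per_cpu>
};

// Estimated memory usage for a job that uses ncpus CPUs:
// base + NCPUS*per_cpu, as stated in the commit message.
inline double estimated_mem_usage(const MemUsageSpec& s, int ncpus) {
    return s.base + ncpus * s.per_cpu;
}

// Largest CPU count (at most host_ncpus) whose estimate fits in avail_mem.
// Returns 0 if even a single CPU does not fit.
inline int max_cpus_for_memory(const MemUsageSpec& s, int host_ncpus,
                               double avail_mem) {
    for (int n = host_ncpus; n >= 1; --n) {
        if (estimated_mem_usage(s, n) <= avail_mem) return n;
    }
    return 0;
}
```

With a base of 1e9 bytes and 5e8 bytes per CPU, a host with 8 CPUs and 2.6e9 bytes available would be limited to 3 CPUs under this sketch.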
David Anderson 7878d24f95 Add header comments to sched/*.cpp 2016-06-24 15:42:11 -07:00
Christian Beer c19cb4675f initialize fields in constructor
fixes CID 28041 found by Coverity
2015-10-23 13:46:59 +02:00
Christian Beer a58ffd2983 initialize fields in constructor
fixes CID 28038 found by Coverity
2015-10-23 13:44:20 +02:00
Christian Beer 9aea04fe22 initialize field in constructor
fixes CID 28002 found by Coverity
2015-10-21 16:57:18 +02:00
David Anderson a87d039f49 server: more 64-bit ID fixes
negative values are stored in app_version_id fields to represent
anonymous platform versions.
So need to use %ld rather than %lu for these fields.

Also there were a couple more changes of int to DB_ID_TYPE
2015-07-29 17:32:57 -07:00
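The point of commit a87d039f49 above, that fields which may hold negative values need a signed conversion code, can be sketched as follows. DB_ID_TYPE is named in the commit; defining it as long here is an assumption based on the commit below it, and format_id is a hypothetical helper.

```cpp
#include <cstdio>
#include <string>

// Assumption: DB_ID_TYPE is a typedef for long.
typedef long DB_ID_TYPE;

// Format an ID that may be negative (e.g. an app_version_id that
// represents an anonymous platform version).
// %ld is the signed conversion; %lu would be the wrong choice here
// because it interprets its argument as unsigned.
inline std::string format_id(DB_ID_TYPE id) {
    char buf[64];
    std::snprintf(buf, sizeof(buf), "%ld", id);
    return std::string(buf);
}
```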
David Anderson 8cd8c8e7ee server software: handle 64-bit database IDs
The SETI@home result table is about to run out of 32-bit IDs,
so we need to move to 64-bit result IDs.
This will happen to the workunit table at some point too.

I changed the server C++ code to use the "long" type for all DB IDs
(and to use appropriate conversion codes like %lu).
"long" is 64 bit on 64-bit machines.
For uniformity I did this for all tables,
even ones (like app) that will never get big.

I chose NOT to change the DB schema for now.
The new code will work with 32-bit ID fields in the DB.
As projects approach the 32-bit limit on a table they can change
its ID field, and fields that reference this table, to BIGINT.
This is likely to happen only on the result and workunit tables.
I put functions in html/ops/db_update.php
to change the IDs of these tables.
2015-07-23 10:11:08 -07:00
David Anderson b1ee6b00c4 scheduler, FCGI: flush log file after each request. From Nicolas. 2015-01-09 20:40:02 -08:00
David Anderson 6c5849d817 scheduler: fix bug that caused no jobs to be sent to pre-6.7 clients 2014-09-03 15:35:36 -07:00
David Anderson 89b51ea43d scheduler: preliminary support for generic coprocessors
A "generic" coprocessor is one that's reported by the client,
but is not of a type that the scheduler knows about (NVIDIA, AMD, Intel).

With this commit the following works:
- On the client, define a <coproc> in your cc_config.xml
  with a custom name, say 'miner_asic'.
- define a plan class such as
  <plan_class>
    <name>foobar</name>
    <gpu_type>miner_asic</gpu_type>
    <cpu_frac>0.5</cpu_frac>
  </plan_class>
- App versions of this plan class will be sent only to hosts
  that report a coproc of type "miner_asic".
  The <app_version>s in the scheduler reply will include
  a <coproc> element with the given name and count=1.
  This will cause the client (at least the current client)
  to run only one of these jobs at a time,
  and to schedule the CPU appropriately.

Note: there's a lot missing from this;
- app version FLOPS will be those of a CPU app;
- jobs will be sent only if CPU work is requested
... and many other things.
Fixing these issues requires a significant re-architecture of the scheduler,
in particular getting rid of the PROC_TYPE_* constants
and the associated arrays,
which hard-wire the 3 fixed GPU types.
2014-07-25 12:40:35 -07:00
David Anderson 9a9041cf7d server: fix support for client break; show it on web 2014-07-16 21:08:18 -07:00
David Anderson 572d7bf9e5 scheduler: handle <dont_send_work> tag from client
If a project has both NCI and non-NCI apps,
it needs to send NCI jobs even if the work request is zero;
otherwise a client might not get NCI jobs.

However, this policy will send NCI jobs even if the user
has set "no new tasks" for the project.
To handle this correctly, we recently added a <dont_send_work>
element to the scheduler request.
Add logic to the scheduler to parse and enforce this flag.
2014-07-03 00:39:33 -07:00
David Anderson 9889ee8fb6 scheduler: enforce GPU job limits separately for each GPU type
Previously, if a project specified a limit on GPU jobs in progress,
it would be enforced across GPU types.
This could lead to starvation for hosts with multiple GPU types.
E.g. the limit is 10, and a host has 10 NVIDIA jobs and no AMD jobs.

Fix this by enforcing limits separately for each GPU type.
2014-03-08 11:17:16 -08:00
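The starvation case described in commit 9889ee8fb6 above goes away once the in-progress count is kept per GPU type. A minimal sketch, with hypothetical names (GpuJobLimits, can_send, record_sent) that are not taken from the scheduler source:

```cpp
#include <map>
#include <string>

// Hypothetical bookkeeping: one in-progress counter per GPU type.
struct GpuJobLimits {
    int max_per_type;                        // project's limit on GPU jobs in progress
    std::map<std::string, int> in_progress;  // jobs in progress, keyed by GPU type

    // True if another job may be sent for this GPU type.
    bool can_send(const std::string& gpu_type) const {
        std::map<std::string, int>::const_iterator it = in_progress.find(gpu_type);
        int n = (it == in_progress.end()) ? 0 : it->second;
        return n < max_per_type;
    }

    // Record that a job for this GPU type was sent.
    void record_sent(const std::string& gpu_type) {
        ++in_progress[gpu_type];
    }
};
```

With a limit of 10, a host holding 10 NVIDIA jobs is refused more NVIDIA work but can still receive AMD jobs.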
David Anderson 5381def663 server: use gpu_active_frac in scheduling decisions
On some hosts, gpu_active_frac may be much less than active_frac
(i.e., GPUs may be available much less than CPUs).
Use gpu_active_frac in the following places:

- scheduler: in estimating the elapsed time of jobs,
    to decide whether they can meet deadline
- scheduler: in computing the effective speed of a (host, app version),
    when deciding what size class it belongs to
- size_census: in computing effective speed of (host, app versions)

(Previously, we were just using active_frac in all these cases)
2014-03-06 21:23:02 -08:00
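As a rough illustration of the first use listed in commit 5381def663 above, the sketch below picks the availability fraction by resource type and uses it in a deadline check. The function names and the exact form of the estimate (operation count divided by speed times availability) are assumptions for illustration; the commit does not give the scheduler's actual formula.

```cpp
// Choose the availability fraction that applies to an app version:
// gpu_active_frac for GPU versions, active_frac otherwise.
inline double availability_frac(bool uses_gpu, double active_frac,
                                double gpu_active_frac) {
    return uses_gpu ? gpu_active_frac : active_frac;
}

// Assumed estimate: seconds of wall-clock time to perform est_flop_count
// operations at flops operations/second, when the resource is available
// only a fraction frac of the time.
inline double estimated_elapsed_time(double est_flop_count, double flops,
                                     double frac) {
    return est_flop_count / (flops * frac);
}

// A job can meet its deadline if the estimate fits in the time remaining.
inline bool can_meet_deadline(double est_elapsed, double seconds_until_deadline) {
    return est_elapsed <= seconds_until_deadline;
}
```

If gpu_active_frac is much smaller than active_frac, the GPU estimate grows accordingly, so a job that appeared to fit under active_frac may no longer fit.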
David Anderson d861862ca1 server: fix compile warnings and file descriptor leaks
Also, we were using memset() to zero WORK_REQ,
which contains several std::vectors.
This apparently works on Linux, but not in general.
2014-01-08 22:00:13 -08:00
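The memset() problem mentioned in commit d861862ca1 above arises because std::vector is not a trivially copyable type, so overwriting its bytes with zeros is not a valid way to reset it. One common alternative is to reset each field explicitly. The struct below is a hypothetical stand-in and does not reproduce the fields of the real WORK_REQ.

```cpp
#include <vector>

// Hypothetical stand-in for a request struct that contains a std::vector.
struct WorkReqSketch {
    double seconds_to_fill;
    int njobs_sent;
    std::vector<int> candidate_ids;

    // Reset each field explicitly instead of calling
    // memset(this, 0, sizeof(*this)), which would overwrite the
    // vector's internal pointers and leak or corrupt its storage.
    void clear() {
        seconds_to_fill = 0;
        njobs_sent = 0;
        candidate_ids.clear();
    }
};
```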
David Anderson 7d54e6537e scheduler: add <vm_accel_required> flag to plan class XML spec 2013-12-03 15:54:56 -08:00
David Anderson b9f0733c06 server: replace strcpy() with strlcpy() various places 2013-06-03 22:42:53 -07:00
David Anderson 12319ca82b - scheduler: add code (commented out for now) for new implementation
of score-based scheduling.
2013-04-09 11:10:50 -07:00
David Anderson 36c304e7d3 - client: maintain current and previous uptime,
    and include them in scheduler RPC request.
- scheduler: parse them
    Note: this is to support a future feature where the scheduler
    will send non-checkpointing jobs only to clients
    likely to be able to complete them.
2013-02-26 16:53:20 +01:00
David Anderson 96b8bc39d0 - user web: fix bug when doing a forum search on Google
svn path=/trunk/boinc/; revision=26101
2012-09-12 22:31:23 +00:00
David Anderson 9ccb8fa38d - scheduler: add support for limited locality scheduling
- API: remove support for PPM files


svn path=/trunk/boinc/; revision=26062
2012-08-27 17:00:43 +00:00
David Anderson 78f74661aa - distributed storage: move chunk_size to VDA_FILE.
Add some missing code.


svn path=/trunk/boinc/; revision=25854
2012-07-07 19:44:48 +00:00
David Anderson 8c71f6d59a - scheduler: add support for Intel GPUs, and restructure things
to make it easier to add other GPU types in the future


svn path=/trunk/boinc/; revision=25792
2012-06-25 23:09:45 +00:00
David Anderson fd0983b991 - web: server status page should show elapsed time, not CPU time
svn path=/trunk/boinc/; revision=25785
2012-06-22 07:35:54 +00:00
David Anderson 8d284f2b17 - scheduler: if we truncate the # of results accepted
(like we're doing in SETI@home)
    don't resend lost results since we don't know what they are


svn path=/trunk/boinc/; revision=25733
2012-06-05 03:48:05 +00:00
David Anderson adab6254bc Update Translation
svn path=/trunk/boinc/; revision=25477
2012-03-23 16:25:19 +00:00
David Anderson 516e5ad798 - storage stuff
svn path=/trunk/boinc/; revision=25354
2012-02-29 01:11:28 +00:00
David Anderson 14199c7b97 - web: change wording of buffer-size prefs
svn path=/trunk/boinc/; revision=25272
2012-02-16 16:52:07 +00:00
David Anderson dd16170fc1 - scheduler: the p_fpops value reported by clients can't be trusted.
Some credit cheats (e.g. with credit_by_runtime) can be done
    by reporting a huge value.
    Fix this by capping the value at 1.1 times the 95th percentile
    of host.p_fpops, taken over active hosts.


svn path=/trunk/boinc/; revision=25017
2012-01-09 17:35:48 +00:00
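The cap described in commit dd16170fc1 above (1.1 times the 95th percentile of host.p_fpops over active hosts) might look like the following sketch. The function names are hypothetical, and the nearest-rank percentile method is an assumption; the commit does not say how the percentile is computed.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// 95th percentile by the nearest-rank method (an assumed choice).
// Takes the vector by value so the caller's data is not reordered.
inline double percentile_95(std::vector<double> values) {
    if (values.empty()) return 0.0;
    std::sort(values.begin(), values.end());
    // rank = ceil(0.95 * N), computed in integer arithmetic
    std::size_t rank = (values.size() * 95 + 99) / 100;
    return values[rank - 1];
}

// Cap a client-reported p_fpops at 1.1 times the 95th percentile.
inline double capped_p_fpops(double reported, double p95) {
    return std::min(reported, 1.1 * p95);
}
```

A host reporting an implausibly large p_fpops is then treated as only slightly faster than the fastest 5% of active hosts.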
David Anderson e8657adfd2 - scheduler: change vbox_mt app plan function to use 1, 2 or 3 CPUs
depending on how many the host has,
    and whether CPU VM extensions are present
    (this reflects the requirements of CernVM).


svn path=/trunk/boinc/; revision=25009
2012-01-08 01:28:39 +00:00
David Anderson 0777ab174a - scheduler: if using homogeneous app version and a WU is committed
to a superseded or deprecated app version, use it anyway.
    The current app version may not validate against the old one.


svn path=/trunk/boinc/; revision=24823
2011-12-17 22:11:26 +00:00
David Anderson 2ac9fe8566 - client/scheduler:
If the file "client_opaque.txt" exists on the client,
    include its contents in scheduler request messages.
    On the scheduler, parse this into SCHEDULER_REQUEST::client_opaque,
    where it can be used by the customizable scheduler functions.


svn path=/trunk/boinc/; revision=24586
2011-11-14 06:27:36 +00:00
David Anderson c2cdaacf89 - scheduler: fix bugs that broke work fetch for anonymous platform;
don't send irrelevant messages to anon platform clients


svn path=/trunk/boinc/; revision=24326
2011-10-03 23:43:53 +00:00
David Anderson 7c81d72378 - web: fix warnings in forum pages
- scheduler: when using elapsed time stats to predict runtime,
    cap the estimated FLOPS at twice the peak FLOPS;
    otherwise, if a host has received a lot of very short jobs
    recently, it will get a too-high FLOPS estimate and
    will exceed the rsc_fpops_bound limit.


svn path=/trunk/boinc/; revision=24128
2011-09-05 17:29:53 +00:00
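The cap in commit 7c81d72378 above is a one-line clamp. In this sketch the function and parameter names are hypothetical; only the "twice the peak FLOPS" bound comes from the commit message.

```cpp
#include <algorithm>

// Limit a FLOPS estimate derived from elapsed-time statistics
// to at most twice the peak FLOPS.
inline double capped_flops_estimate(double stats_based_flops, double peak_flops) {
    return std::min(stats_based_flops, 2.0 * peak_flops);
}
```

Because the projected runtime is inversely proportional to the FLOPS estimate, bounding the estimate from above bounds the predicted runtime from below.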
David Anderson c5c5975b44 - Improve interface of XML_PARSER.
Add parsed_tag and is_tag to the class,
    so that parsing functions don't need to declare them
    and pass them around.
- Complete the task of using XML_PARSER as the argument
    to all parsing functions.
    (Internally, many of these functions still use the old XML parser;
    that's the next step.)


svn path=/trunk/boinc/; revision=23978
2011-08-10 17:11:08 +00:00
David Anderson 27e05a3da9 - server: some stuff to prepare for distributed storage
- don't create result records for uploads and downloads.
        Just create a msg_to_client record.
    - the scheduler handles file-transfer results specially;
        it makes a vector of them, then calls a project-supplied function
        handle_file_xfer_results()
    - change the interface and implementation of put_file and get_file
- client: write project sched priority in GUI RPC replies,
    but not to the state file


svn path=/trunk/boinc/; revision=23857
2011-07-19 20:52:41 +00:00
David Anderson 436415cfe1 - scheduler, back end: add "homogeneous app version" feature.
Lets you specify, on a per-app basis,
    that all instances should be done using the same app version.
    This is for validation in the presence of GPUs.
- scheduler: code cleanup
    - Instead of adding a bunch of non-DB fields to RESULT,
        use a derived class SCHED_DB_RESULT.
    - Instead of storing a pointer to BEST_APP_VERSION in RESULT,
        store the structure itself.
        This simplifies the memory allocation situation.
- client: condition "Got server request to delete file" messages
    on <file_xfer_debug>


svn path=/trunk/boinc/; revision=23636
2011-06-06 03:40:42 +00:00
David Anderson a7828abdda - scheduler: removed unused destructors in COPROC that
caused scheduler to crash (not sure why)


svn path=/trunk/boinc/; revision=23312
2011-04-01 21:21:11 +00:00
David Anderson eeab2aee92 - simulator work
- fix some indentation

svn path=/trunk/boinc/; revision=22891
2011-01-07 20:23:22 +00:00
David Anderson 18f2e90929 - client: work fetch: if the chosen project is currently uploading a file,
and an upload started in the last 5 min, don't fetch work from it.
    The goal is to merge the 2 scheduler RPCs
    (fetch work, report completed tasks) into a single RPC.
    Note: this may result in idleness in some cases.
- scheduler: if client doesn't handle plan class (pre-5.10),
    check plan-class app versions anyway,
    but only use if it's a single-CPU app.
    This allows single-CPU app versions with specific requirements
    (like SSE) to be issued to old clients.
    From Bernd Machenschalk


svn path=/trunk/boinc/; revision=22841
2010-12-13 22:58:15 +00:00
David Anderson f8e2d07cf9 - scheduler: add vbox32 and vbox64 plan classes for VirtualBox apps.
svn path=/trunk/boinc/; revision=22778
2010-11-30 19:36:07 +00:00
David Anderson be14996a1e - scheduler: deal correctly with jobs that need > 2GB RAM.
Such jobs fail on 32-bit machines, even if they have sufficient RAM,
    because 32-bit OSs don't support address spaces > 2GB.

    In general, we want to support the following scenario:
    - an app has a mixture of small (< 2GB) and big (> 2GB) jobs.
    - there are app versions for both 32b and 64b platforms
    - one of the 32b versions is faster than the 64b version
        (say, it's a 32b GPU app)

    Goals:
    If the client is 32b, send it only small jobs,
        using the fast 32b version if possible
    If the client is 64b and has sufficient RAM,
        send it large jobs using the 64b version;
        send it small jobs using the fast 32b version if possible,
        else the 64b version

    Solution: extend get_app_version() so that it detects big jobs,
        and uses only 64b versions for them.
        Add a "for_64b_jobs" field to BEST_APP_VERSION
        so that we maintain a separate memoized set of
        BEST_APP_VERSIONs for big jobs.

- client: don't set report_results_immediately inappropriately

svn path=/trunk/boinc/; revision=22440
2010-10-01 19:54:09 +00:00
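The selection goals listed in commit be14996a1e above can be sketched as a filter followed by a pick of the fastest remaining version. This is not the real get_app_version(); the types and names are hypothetical, and the check for sufficient RAM on the client is omitted for brevity.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified description of an app version.
struct AppVersionSketch {
    bool is_64bit;
    double flops;  // projected speed on this host
};

// Jobs above 2GB are "big" in the sense of the commit message.
const double BIG_JOB_BYTES = 2.0 * 1024 * 1024 * 1024;

// Returns the index of the fastest usable version, or -1 if none is usable.
inline int pick_version(const std::vector<AppVersionSketch>& versions,
                        bool client_is_64bit, double job_mem_bytes) {
    bool big = job_mem_bytes > BIG_JOB_BYTES;
    int best = -1;
    for (std::size_t i = 0; i < versions.size(); ++i) {
        const AppVersionSketch& v = versions[i];
        if (v.is_64bit && !client_is_64bit) continue;  // 32b client can't run 64b code
        if (big && !v.is_64bit) continue;              // big jobs need a 64b version
        if (best < 0 || v.flops > versions[best].flops) best = static_cast<int>(i);
    }
    return best;
}
```

Since the answer depends on whether the job is big, a memoized result for small jobs cannot be reused for big ones, which is consistent with the separate "for_64b_jobs" set mentioned in the commit.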
David Anderson 3dffe0a8bc - API: remove deprecated stuff related to:
1) old-style apps with graphics in main program.
        No one should be using these anymore.
    2) writing init_data.xml in boinc_finish().
        This was used by deprecated "compound app" scheme
- scheduler: if request reports results that were previously reported,
    that's evidence that the previous reply was not received by client.
    It may have contained results.
    So set a "resend lost results" flag.

svn path=/trunk/boinc/; revision=22203
2010-08-11 22:02:41 +00:00
David Anderson 23de5a887f - client/scheduler: tweak translatable messages
svn path=/trunk/boinc/; revision=22129
2010-08-04 18:41:24 +00:00
David Anderson 6b8a569d6d - client/scheduler: fix a group of bugs related to the new mechanism
where the client tells the scheduler which app versions
    its queued jobs use
    (this is needed, e.g., to enforce per-app or per-resource job limits).
    In this mechanism, the client sends an array of <app_version>s,
    and each <other_result> includes an index into this array.

    - The wrong index was being sent (client).
    - If an <app_version> had a non-existent app name
        (e.g. because that app had been deprecated)
        it wasn't getting put in the array, invalidating array indices
        Furthermore, an erroneous message was being sent to the user

        Fix: if parse error for <app_version>,
        put it in the array anyway, but with cav.app = NULL,
        meaning that it's a place-holder.
        Send a message to user only if anon platform.

- manager: increase notice buffers to 64K

svn path=/trunk/boinc/; revision=22052
2010-07-23 17:43:20 +00:00
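The place-holder fix in commit 6b8a569d6d above keeps array indices stable: an entry that fails to resolve still occupies its slot. A minimal sketch, in which the struct, the known flag (standing in for cav.app = NULL), and build_app_version_array are all hypothetical:

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified client app version entry.
struct ClientAppVersionSketch {
    std::string app_name;
    bool known;  // false marks a place-holder entry
};

// Build the array from the app names the client sent. Entries whose app
// is unknown are kept as place-holders rather than dropped, so that an
// index sent by the client still refers to the entry the client meant.
inline std::vector<ClientAppVersionSketch> build_app_version_array(
        const std::vector<std::string>& sent,
        const std::vector<std::string>& known_apps) {
    std::vector<ClientAppVersionSketch> out;
    for (std::size_t i = 0; i < sent.size(); ++i) {
        ClientAppVersionSketch cav;
        cav.app_name = sent[i];
        cav.known = std::find(known_apps.begin(), known_apps.end(), sent[i])
                    != known_apps.end();
        out.push_back(cav);
    }
    return out;
}
```

If the unknown entry were skipped instead, every later index would be off by one, which is the failure the commit describes.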
David Anderson 0f613d61d8 - scheduler and client: fix the "allow multiple clients" feature.
This feature lets you run the BOINC client as a job on grid systems
    that handle only 1-CPU jobs;
    it disables various mechanisms that prevent multiple clients per host
    (which is normally a bad thing).
    Old:
        - Run the client with a --allow_multiple_clients flag.
            This tells it not to use a mutex that prevents
            multiple clients per host.
        - Run the project with the <multiple_clients_per_host> config flag.
            This suppresses two mechanisms:
            - (avoid duplicate host records)
                on a scheduler request with no host ID,
                looks for a host with same domain name, OS type,
                and mem size, and assumes the request is from that host
            - (job retry)
                If we get a request that doesn't have a host ID
                but does have a host CPID,
                mark its in-progress results as over
                NOTE: I CAN'T REMEMBER WHY WE SUPPRESS THIS;
                MARK S, DO YOU REMEMBER?

    Problem:
        if the grid clients attach to a project that
        doesn't use <multiple_clients_per_host>, bad things happen.
        E.g., if there are several requests at about the same time,
        most of them will fail with
        "another RPC already in progress" errors.
        If a project does include this flag,
        it loses protection from duplicate host records.

    New:
        - If the client is run with --allow_multiple_clients flag,
            it passes a <allow_multiple_clients> element
            in scheduler requests.
        - The scheduler skips the duplicate-host check on
            requests that include this flag.
        - There is no more <multiple_clients_per_host> scheduler option.

    Note: if a project using the old mechanism upgrades to this change,
    it will need to use new clients for its grid deployment.


svn path=/trunk/boinc/; revision=21839
2010-06-29 16:37:28 +00:00