269 Commits

Author SHA1 Message Date
David Anderson 52152a5a4c Client: skip exclusion logic for resources that have no exclusions.
This may fix, or at least shed light on, a bug where the client
repeatedly requests work for a resource that already has plenty.
2013-10-14 14:41:59 -07:00
David Anderson b52d98b640 client: change per-project runnable job limit to a flat 1000 2013-07-09 13:52:50 -07:00
David Anderson 782a11e22f client: don't fetch work if project has > max(2000, ncpus*100) runnable jobs 2013-07-09 11:17:56 -07:00
David Anderson 4d47e2f170 client: don't request work from a project w/ > 1000 runnable jobs
Because of O(N^2) algorithms, the client becomes CPU-intensive
when there are lots of jobs.
This limit could be somewhat lower.
2013-07-07 13:13:57 -07:00
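The runnable-job limits tried in the three entries above can be sketched as follows. This is a minimal illustration only: the function names are hypothetical and not taken from the BOINC client source, and the strict ">" comparison for the flat limit in b52d98b640 is an assumption carried over from 4d47e2f170.

```cpp
#include <algorithm>

// Hypothetical helper names; they only restate the limits given in the
// commit messages and are not taken from the BOINC client source.

// 4d47e2f170 (and, assuming the same comparison, b52d98b640):
// flat limit of 1000 runnable jobs per project.
inline bool too_many_runnable_flat(int n_runnable) {
    return n_runnable > 1000;
}

// 782a11e22f: the limit scales with the CPU count, max(2000, ncpus*100).
inline bool too_many_runnable_scaled(int n_runnable, int ncpus) {
    return n_runnable > std::max(2000, ncpus * 100);
}
```

With 64 CPUs the scaled variant allows up to 6400 runnable jobs, which shows why the later flat limit is the stricter of the two on large hosts.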
David Anderson 3614870952 client: don't request work from NCI project if "no new work" set 2013-06-26 20:36:44 -07:00
David Anderson 57a6d3d17a client (Android): make max battery temperature a preference
Note: internal change only; there's no GUI for this yet
2013-06-20 21:47:34 -07:00
David Anderson 73b990b4b0 client: fix bug that sometimes prevented work fetch when GPU exclusions used 2013-06-16 20:10:17 -07:00
David Anderson af8ccfe8b8 client: fix bug that delayed work fetch from non-CPU-intensive projects
We were waiting until there was no task for the project
before asking for another task.
We should have been waiting until there was no in-progress task.
2013-06-15 11:10:44 -07:00
David Anderson eee2879a57 client: fix bug that allowed work fetch request while file uploads active
A while back we added a mechanism intended to defer work-request RPCs
while file uploads are happening,
with the goal of reporting completed tasks sooner
and reducing the number of RPCs.
There were 2 bugs in this mechanism.
First, the decision of whether an upload is active was flawed;
if several uploads were active and 1 finished,
it would act like all had finished.
Second, when WORK_FETCH::choose_project() picks a project,
it sets p->sched_rpc_pending to RPC_REASON_NEED_WORK.
If we then decide not to request work because an upload
is active, we need to clear this field.
Otherwise scheduler_rpc_poll() will do an RPC to it,
piggybacking a work request and bypassing the upload check.
2013-06-14 22:40:43 -07:00
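The two fixes described in eee2879a57 can be sketched roughly as below. Only sched_rpc_pending and RPC_REASON_NEED_WORK come from the commit message; the struct, the helper names, the NO_RPC_PENDING constant, and the numeric values are assumptions made for illustration.

```cpp
#include <vector>

// Assumed values; only the name RPC_REASON_NEED_WORK appears in the commit.
const int NO_RPC_PENDING = 0;
const int RPC_REASON_NEED_WORK = 1;

// Hypothetical, simplified stand-in for the client's per-project state.
struct ProjectRpcState {
    int sched_rpc_pending = NO_RPC_PENDING;
    std::vector<bool> upload_done;  // one flag per file upload
};

// Fix 1: an upload is active as long as ANY upload is unfinished,
// not merely until the first one completes.
bool upload_active(const ProjectRpcState& p) {
    for (bool done : p.upload_done) {
        if (!done) return true;
    }
    return false;
}

// Fix 2: if we back out of the work request because of an active upload,
// clear the pending-RPC field so a later poll does not issue the RPC anyway.
// Returns true if the work request should go ahead.
bool confirm_work_request(ProjectRpcState& p) {
    if (upload_active(p)) {
        p.sched_rpc_pending = NO_RPC_PENDING;
        return false;
    }
    return true;
}
```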
David Anderson 02fcc45ec4 client: fix work fetch bugs that caused incorrect GPU fetches 2013-06-10 10:36:05 -07:00
David Anderson f44bdb323d client: don't make empty work-request RPCs
It's reported that the client can repeatedly make work request RPCs
that don't request work for any resource.
I'm not sure why this happens, but prevent it.
2013-06-07 14:12:02 -07:00
David Anderson 73bd46c3fa client: don't ask an NCI project for work if current job still uploading
Note: we currently assume NCI projects have only 1 app.
Removing this assumption would be a little work.
2013-05-22 14:33:37 -07:00
David Anderson 3488b286cf client: don't piggyback work request in several situations
namely:
- some download stalled
- some task suspended
- too many uploading tasks
2013-05-21 22:01:30 -07:00
David Anderson e033347ba9 client: don't piggyback work request if project is NNW or suspended 2013-05-21 21:49:26 -07:00
David Anderson 8a1569c384 client: fix work-fetch bug that could starve a GPU if exclusions used 2013-05-16 12:38:55 -07:00
David Anderson c00f27a5a5 client: message tweak (show "don't need" in work request msg) 2013-04-26 12:19:43 -07:00
David Anderson 6c4b23e7d0 client: fix compile warnings
From Gianfranco Costamagna
2013-04-25 01:39:03 -07:00
David Anderson 63611be7e8 - client: fix bug in work fetch that caused infinite RPCs
if all projects backed off
- client emulator: disable "fetch master URL" logic
2013-04-08 11:33:49 -07:00
David Anderson fde9ab70a1 - client: fix bug in work fetch that prevented resource backoff 2013-04-04 16:20:29 -07:00
David Anderson 330a25893f - client emulator: parse <max_concurrent> in <app> in client_state.xml.
This gives you a way to simulate the effects of app_config.xml
- client: piggyback requests for resources even if we're backed off from them
- client: change resource backoff logic
    Old: if we requested work and didn't get any,
        back off from resources for which we requested work
    New: for each resource type T:
        if we requested work for T and didn't get any, back off from T
        Also, don't back off if we're already backed off
            (i.e. if this is a piggyback request)
        Also, only back off if the RPC was due to an automatic
            and potentially rapid source
            (namely: work fetch, result report, trickle up)
- client: fix small work fetch bug
2013-04-04 10:25:56 -07:00
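The new resource backoff rule in 330a25893f can be read as a per-resource decision. The sketch below is one possible reading under stated assumptions: the enum, struct, and function names are hypothetical, and the real client tracks more state than three flags.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical RPC reasons; the commit names work fetch, result report,
// and trickle up as the automatic, potentially rapid sources.
enum class RpcReason { UserRequest, WorkFetch, ResultReport, TrickleUp };

// Hypothetical per-resource outcome of one scheduler RPC.
struct ResourceOutcome {
    bool requested;           // we asked for work for this resource
    bool got_work;            // the reply contained work for it
    bool already_backed_off;  // the request was a piggyback during backoff
};

bool is_automatic(RpcReason r) {
    return r == RpcReason::WorkFetch || r == RpcReason::ResultReport ||
           r == RpcReason::TrickleUp;
}

// For each resource type T: back off only if we requested work for T, got
// none, were not already backed off, and the RPC had an automatic source.
std::vector<bool> resources_to_back_off(const std::vector<ResourceOutcome>& rs,
                                        RpcReason reason) {
    std::vector<bool> back_off(rs.size(), false);
    if (!is_automatic(reason)) return back_off;
    for (std::size_t i = 0; i < rs.size(); i++) {
        back_off[i] = rs[i].requested && !rs[i].got_work &&
                      !rs[i].already_backed_off;
    }
    return back_off;
}
```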
David Anderson a5bcf6ab3b - client: work fetch message tweaks: show state before actions 2013-04-02 17:04:45 -07:00
David Anderson f6a61fe801 - client: major overhaul of work-fetch logic based on suggestions
by Jacob Klein.
    The new policy is roughly as follows:
    - find the highest-priority project P that is allowed
        to fetch work for a resource below buf_min
    - Ask P for work for all resources R below buf_max
        for which it's allowed to fetch work,
        unless there's a higher-priority project allowed
        to request work for R.
    If we're going to do an RPC to P for reasons other than work fetch,
    the policy is:
    - for each resource R for which P is the highest-priority project
        allowed to fetch work, and R is below buf_max,
        request work for R.
2013-04-02 12:32:28 -07:00
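The policy outlined in f6a61fe801 can be sketched as two small functions. This is a simplified illustration under assumptions, not the client's code: the FetchProject struct, both function names, and the use of a single buffered-seconds value per resource are hypothetical; buf_min and buf_max are the thresholds named in the commit message.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical project record: a scheduling priority plus, per resource,
// whether the project is currently allowed to fetch work for it.
struct FetchProject {
    double priority;
    std::vector<bool> can_fetch;
};

// Step 1: highest-priority project allowed to fetch work for some resource
// whose buffer is below buf_min. Returns its index, or -1 if there is none.
int choose_fetch_project(const std::vector<FetchProject>& projects,
                         const std::vector<double>& buffered, double buf_min) {
    int best = -1;
    for (std::size_t i = 0; i < projects.size(); i++) {
        bool eligible = false;
        for (std::size_t r = 0; r < buffered.size(); r++) {
            if (projects[i].can_fetch[r] && buffered[r] < buf_min) {
                eligible = true;
                break;
            }
        }
        if (!eligible) continue;
        if (best < 0 || projects[i].priority > projects[best].priority) {
            best = static_cast<int>(i);
        }
    }
    return best;
}

// Step 2 (also the piggyback rule): request work from project p for every
// resource below buf_max that p may fetch, unless a higher-priority project
// is also allowed to fetch work for that resource.
std::vector<bool> resources_to_request(const std::vector<FetchProject>& projects,
                                       std::size_t p,
                                       const std::vector<double>& buffered,
                                       double buf_max) {
    std::vector<bool> request(buffered.size(), false);
    for (std::size_t r = 0; r < buffered.size(); r++) {
        if (buffered[r] >= buf_max || !projects[p].can_fetch[r]) continue;
        bool outranked = false;
        for (const FetchProject& q : projects) {
            if (q.priority > projects[p].priority && q.can_fetch[r]) {
                outranked = true;
                break;
            }
        }
        request[r] = !outranked;
    }
    return request;
}
```

For an RPC made for reasons other than work fetch, only the second function applies, with p being the project that is being contacted anyway.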
David Anderson 515deea4fb - client, work fetch: request # instances so that we have enough jobs
to use project's share of instances.
- client emulator: if client_state.xml doesn't have <no_rsc_apps>
    for a project, and the project doesn't have apps for that resource,
    the project can be asked for work for that resource.
2013-03-25 09:25:25 -07:00
David Anderson b93e80c6f5 - client: code cleanup. Some variable/function/constant names
contained "debt" when they actually refer to REC.
    Change these names to use "rec".
2013-03-24 11:22:01 -07:00
David Anderson 128da198b6 - client: rename two different functions named backoff()
to make it easier to see what's going on.
- fix code formatting in manager
2013-03-22 10:43:05 +01:00
David Anderson 1ef582aad6 - client: improve work fetch messages
- web: include user ID in email to moderators about banishment
2013-03-22 10:29:48 +01:00
David Anderson 546ea233a0 - client: fix small work fetch bug that caused the client to
not add a piggyback work request when it should have.
2013-03-15 13:38:45 +01:00
David Anderson fc6b050883 - client: removed unused code for old work-fetch logic 2013-03-15 13:38:45 +01:00
David Anderson 71b6508313 - client: add <fetch_on_update> config option;
requests work when you update a project
    even if it's not highest priority
2013-03-07 11:28:43 +01:00
David Anderson 2e23bfedaa - client, work fetch policy. Change policy for projects w/ GPU exclusions 2013-03-07 11:28:43 +01:00
David Anderson a63ebbc13e - client: change work fetch policy to work better with GPU exclusions
    - scale amount of work request by
        (# non-excluded instances)/#instances
    - change policy:
        old: don't fetch work if #jobs > #non-excluded instances
        new: don't fetch work if # of instance-seconds used in RR sim
            > work_buf_min * (#non-excluded instances)/#instances
2013-03-07 11:28:42 +01:00
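The scaling and the old and new fetch conditions from a63ebbc13e can be written out as below. The function names and parameters are hypothetical; only work_buf_min and the ratio (#non-excluded instances)/#instances come from the commit message.

```cpp
// Fraction of a resource's instances that the project may actually use.
// Assumes ninstances > 0 and 0 <= nexcluded <= ninstances.
double non_excluded_fraction(int ninstances, int nexcluded) {
    return static_cast<double>(ninstances - nexcluded) / ninstances;
}

// Work request (in seconds) scaled down by the usable fraction.
double scaled_request(double req_secs, int ninstances, int nexcluded) {
    return req_secs * non_excluded_fraction(ninstances, nexcluded);
}

// Old policy: don't fetch if #jobs > #non-excluded instances.
bool old_policy_blocks_fetch(int njobs, int ninstances, int nexcluded) {
    return njobs > ninstances - nexcluded;
}

// New policy: don't fetch if the instance-seconds used in the round-robin
// simulation exceed work_buf_min scaled by the usable fraction.
bool new_policy_blocks_fetch(double sim_instance_secs, double work_buf_min,
                             int ninstances, int nexcluded) {
    return sim_instance_secs >
           work_buf_min * non_excluded_fraction(ninstances, nexcluded);
}
```

For example, with 4 GPUs of which 1 is excluded, a 1000-second request becomes 750 seconds, and a work_buf_min of 8640 seconds gives a fetch threshold of 6480 instance-seconds.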
David Anderson c7a5156573 - client: work fetch: if there are idle devices, we need to ask
the highest-prio project for work for all of them
    (don't scale by the fetchable resource share!).
    This should fix some device starvation problems.
2013-03-05 16:00:35 +01:00
David Anderson 6afa644fed - client: backoff message tweaks 2013-03-05 13:38:06 +01:00
David Anderson f0254407ea - client: improved log messages for work fetch 2013-03-04 17:24:18 +01:00
David Anderson 7768f6da60 - client: fix bug where, when updating a project, we fail to request work even though higher-priority projects are marked as no-new-tasks or are otherwise ineligible for work fetch. 2013-03-04 14:09:43 +01:00
David Anderson 5457b6b77f - client: in checking reasons for not requesting work,
look at backoff last.
    Otherwise the user can get a misleading message if they
    update a project that's backed off
2013-03-01 16:17:19 +01:00
David Anderson 777f1f11e8 - client: change work fetch policy to avoid starving GPUs in situations where GPU exclusions are used.
- client: fix bug in round-robin simulation when GPU exclusions are used.
Note: this fixes a major problem (starvation)
    with project-level GPU exclusion.
    However, project-level GPU exclusion interferes with most of
    the client's scheduling policies.
    E.g., round-robin simulation doesn't take GPU exclusion into account,
    and the resulting completion estimates and device shortfalls
    can be wrong by an order of magnitude.

    The only way I can see to fix this would be to model each
    GPU instance as a separate resource,
    and to associate each job with a particular GPU instance.
    This would be a sweeping change in both client and server.
2013-03-01 15:31:41 +01:00
David Anderson 446bc4ca28 - client: take GPU exclusions into account when making
initial work request to a project
- client: put some casts to double in NVIDIA detect code.
    Shouldn't make any difference.
- volunteer storage: truncate file to right size after retrieval


svn path=/trunk/boinc/; revision=26051
2012-08-20 23:41:27 +00:00
David Anderson 4fea52c6f2 - client: if a project has excluded GPUs of a given type,
allow it to fetch work of that type if the # of runnable
    jobs is <= the # of non-excluded instances (rather than 0).


svn path=/trunk/boinc/; revision=26045
2012-08-18 23:26:10 +00:00
David Anderson 9fa75d5044 - client: tweak to the above: never ask for work if buffer > max.
This is needed to prevent projects that use next_rpc_delay
    from queuing unbounded work.


svn path=/trunk/boinc/; revision=26003
2012-08-10 18:49:22 +00:00
David Anderson ff1a391ced - client: when we're making a scheduler RPC
for a reason other than work fetch,
    and we're deciding whether to piggyback a work request,
    skip the checks for hysteresis (buffer < min)
    and for per-resource backoff time.
    These checks are there only to limit the rate of RPCs,
    which is not relevant since we're doing one anyway.

    This fixes a bug where a project w/ sporadic jobs specifies
    a next_rpc_delay to ensure regular polling from clients.
    When these polls occur they should request work regardless of backoff.


svn path=/trunk/boinc/; revision=26002
2012-08-10 18:29:00 +00:00
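The piggyback rule from ff1a391ced, combined with the buffer-max cap added in 9fa75d5044, might look like the sketch below. The struct, field, and function names are hypothetical, and times are plain seconds for simplicity.

```cpp
// Hypothetical per-resource state.
struct ResourceFetchState {
    double buffered_secs;  // work currently buffered for this resource
    double backoff_until;  // time before which we normally don't ask again
};

// Decide whether to request work for one resource.
// piggyback is true when the scheduler RPC is happening for another reason.
bool should_request_work(const ResourceFetchState& r, double now,
                         double buf_min, double buf_max, bool piggyback) {
    // 9fa75d5044: never ask for work if the buffer is above max.
    if (r.buffered_secs > buf_max) return false;
    // ff1a391ced: the remaining checks only limit the RPC rate, so they are
    // skipped when an RPC is being made anyway.
    if (piggyback) return true;
    // Hysteresis: a dedicated work-fetch RPC only when the buffer is below min.
    if (r.buffered_secs >= buf_min) return false;
    // Per-resource backoff.
    if (now < r.backoff_until) return false;
    return true;
}
```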
David Anderson 26d702789c - client: fix error in runtime estimation for active tasks
svn path=/trunk/boinc/; revision=25987
2012-08-06 23:25:31 +00:00
David Anderson 555cecbcae - client: don't request work for backup project for a processor type
unless there are idle instances of that type
        

svn path=/trunk/boinc/; revision=25886
2012-07-22 06:18:24 +00:00
David Anderson 68f9880615 - client: remove "device" entry from CUDA_DEVICE_PROP,
and change types of mem-size fields from int to double.
    These fields are size_t in NVIDIA's version of this;
    however, cuDeviceGetAttribute() returns them as int,
    so I don't see where this makes any difference.
- client: fix bug in handling of <no_rsc_apps> element.
- scheduler: message tweaks.
    Note: [foo] means that the message is enabled by <debug_foo>.



svn path=/trunk/boinc/; revision=25849
2012-07-05 20:24:17 +00:00
David Anderson f6bd141b30 - client: further msg tweaks
svn path=/trunk/boinc/; revision=25830
2012-07-02 05:10:58 +00:00
David Anderson 7dcf119854 - client: msg tweak
svn path=/trunk/boinc/; revision=25828
2012-07-02 04:06:11 +00:00
David Anderson 89578050f7 - When the client makes a scheduler RPC without requesting work,
and there's a simple reason
    (e.g. the project is suspended, no-new-tasks, downloads stalled, etc.)
    show it in the event log.
    If the reason is more complex, don't try to explain.


svn path=/trunk/boinc/; revision=25827
2012-07-02 03:43:05 +00:00
David Anderson f8c1665722 - client: keep track of the fraction of time that
1) a network connection is available and
    2) network communication is allowed and
    3) CPU computation is allowed
- If an app version is marked as needs_network,
    use the above fraction in estimating its rate of progress
- replace "core client" with "client" in comments.
- scheduler: message tweaks


svn path=/trunk/boinc/; revision=25803
2012-06-26 20:30:56 +00:00
David Anderson bbfbef0fe8 - client: code cleanup. Move RESULT and PROJECT to separate files
svn path=/trunk/boinc/; revision=25621
2012-04-30 21:00:28 +00:00
David Anderson 9d25481174 - scheduler: fix bug that tried to open plan class spec file
on each request.
- client: when showing how much work a scheduler request returned,
    scale by availability (as is done to show the amount of the request)
- client: in account manager request, <not_started_dur> and
    <in_progress_dur> are in wall time, not run time
    (i.e. scale them by availability)

Note: there's some confusion in the code between runtime and wall time,
    where in general wall time = runtime / availability.
    New convention: let's use "runtime" for the former,
    and "duration" for the latter.

svn path=/trunk/boinc/; revision=25597
2012-04-25 04:10:29 +00:00
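The runtime/duration convention stated in 9d25481174 reduces to one conversion. The function name is hypothetical, and availability is assumed to be a fraction in (0, 1].

```cpp
// "runtime" is time spent computing; "duration" is the wall time it takes,
// given the fraction of time the host is available for computation.
// Assumes 0 < availability <= 1.
double duration_from_runtime(double runtime_secs, double availability) {
    return runtime_secs / availability;
}
```

A job with one hour of runtime on a host that is available half the time therefore has a duration of two hours.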