Commit Graph

293 Commits

Author SHA1 Message Date
David Anderson 5226d620d0 client: allow initial scheduler request to request N instances.
I made a change on 27 Feb 2009 that set the initial request to 0 instances.
I'm not sure what the rationale was - the checkin note didn't say.
2015-06-21 00:40:01 -07:00
David Anderson ef22b2bd4b client: show projects in alphabetical order of project name
A while back I changed the job sched and work fetch policies to use
REC-based project priority.
The work fetch logic sorts the project list (in CLIENT_STATE::projects)
by descending priority.
This causes two problems:

- If you have a lot of projects, it's hard to find a particular one
  in the event log, e.g. in work_fetch_debug output.
- In the manager's Statistics tab, the selected project can change
  unexpectedly since we identify it by array index,
  and the array order may change.

Solution: sort CLIENT_STATE::projects alphabetically (case insensitive).
In WORK_FETCH, copy this array to a separate array,
that is then sorted by decreasing priority.
2014-12-17 09:56:01 -08:00
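A minimal sketch of the approach described in commit ef22b2bd4b above: keep the master project list in case-insensitive alphabetical order, and let work fetch sort its own copy by descending priority. The `Project` struct and both function names are hypothetical stand-ins for illustration, not the client's actual types.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical stand-in for the client's PROJECT type; only the
// fields needed for this illustration are present.
struct Project {
    std::string name;
    double priority;   // assumed: larger value means higher priority
};

// Case-insensitive "less than" on project names.
static bool name_less(const Project* a, const Project* b) {
    return std::lexicographical_compare(
        a->name.begin(), a->name.end(),
        b->name.begin(), b->name.end(),
        [](unsigned char x, unsigned char y) {
            return std::tolower(x) < std::tolower(y);
        }
    );
}

// Keep the master list in a stable, user-visible order.
void sort_projects_by_name(std::vector<Project*>& projects) {
    std::sort(projects.begin(), projects.end(), name_less);
}

// Work fetch sorts its own copy, so the master list order never changes.
std::vector<Project*> projects_by_priority(const std::vector<Project*>& projects) {
    std::vector<Project*> sorted(projects);
    std::stable_sort(sorted.begin(), sorted.end(),
        [](const Project* a, const Project* b) {
            return a->priority > b->priority;
        }
    );
    return sorted;
}
```

Because the priority-ordered list is a copy, anything that identifies a project by its index in the master list keeps pointing at the same project.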
David Anderson eafd70ecc6 client: request work from backed-off resources if doing RPC anyway 2014-11-18 00:05:17 -08:00
David Anderson fbc6e40dca Client: fix bug that prevented work fetch for zero-share projects
In work fetch setup, we were computing rsc_project_reason
before doing the round-robin simulation.
It needs to be done after, because it uses the # of idle devices,
which is computed by the simulation.
2014-11-17 13:56:06 -08:00
David Anderson 4c9d1d6659 client: code cleanup and possible debugging in work fetch
- Remove code that tries to keep track of available GPU RAM
  and defer jobs that don't fit.
  This never worked; it relied on project estimates of RAM usage,
  and it's been replaced by having the app do temporary exit
  if alloc fails.
- Move logic for checking for deferred jobs from CPU
  to work fetch.
- Rename rsc_defer_sched to has_deferred_job,
  and move it from PROJECT to RSC_PROJECT_WORK_FETCH
- tweak work_fetch_debug output
2014-10-10 14:35:00 -07:00
David Anderson 9c96108c67 client: work fetch code cleanup
The logic for deciding whether to fetch work for a project
or a (project, resource type) pair
was scattered among several functions, with confusing names.
Consolidate this logic, and use consistent names.
2014-10-10 10:37:07 -07:00
David Anderson f63f259ce5 client: code cleanup 2014-10-10 07:15:10 -07:00
David Anderson 31541e166d client: set work requests for coprocs specified in cc_config.xml
We weren't copying the request fields from RSC_WORK_FETCH to COPROC.
Do this, and clean up the code a bit.

Note: the arrays that parallel the COPROCS::coprocs array
are a bit of a kludge; that stuff logically belongs in COPROC.
But it's specific to the client, so I can't put it there.
Maybe I could do something fancy with derived classes, not sure.
2014-08-09 21:44:39 -07:00
David Anderson c84e3f2607 client: fix build break 2014-07-13 00:42:34 -07:00
David Anderson 57bdeec5ec Client: improve task duration estimates for apps that don't report fraction done
The "static estimate" is wu.rsc_fpops_est/app_version.flops.
The problem is: what if the elapsed time exceeds this?
In this case we were returning elapsed time,
resulting in a "time remaining" of zero, which is bad.

Instead, use the same exponential model that we use to
estimate fraction done when it's not reported.
This has the advantages that:
- time remaining monotonically decreases
  (though potentially at a very slow rate)
- the combo of fraction done, elapsed time, and time remaining
  is consistent for apps that don't report fraction done
2014-07-12 14:31:57 -07:00
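The commit message above (57bdeec5ec) does not spell out the exponential model. The sketch below assumes one plausible form, fraction done = 1 - exp(-elapsed/static_est), purely to show why such a model gives a "time remaining" that keeps decreasing but never reaches zero abruptly. The function names are hypothetical.

```cpp
#include <cmath>

// Assumed model: with no reported fraction done, estimate it as
// 1 - exp(-elapsed / static_est). expm1 keeps precision for small ratios.
double est_fraction_done(double elapsed, double static_est) {
    return -std::expm1(-elapsed / static_est);
}

// Remaining time implied by that fraction: total = elapsed / fraction,
// so remaining = total - elapsed.
double est_time_remaining(double elapsed, double static_est) {
    if (elapsed <= 0) return static_est;
    double fd = est_fraction_done(elapsed, static_est);
    return elapsed / fd - elapsed;
}
```

Under this assumption the remaining time equals static_est * x / (exp(x) - 1) with x = elapsed/static_est, which starts at static_est and decreases monotonically toward zero, so fraction done, elapsed time, and time remaining stay mutually consistent.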
David Anderson 8d009ce3b3 client: scheduling and work fetch tweaks for GPU exclusion cases
Scheduling: if a resource has exclusions, put all jobs in the run list;
otherwise we might fail to have a job for a GPU instance, and starve it.

Work fetch: allow work fetch from zero-share projects if the resource
has instances that are idle because of GPU exclusion
2014-05-24 15:18:41 -07:00
David Anderson ac9e2b088d client emulator: make it work again 2014-05-21 10:41:55 -07:00
David Anderson 1e2fcb4b68 client/lib: change CONFIG to CC_CONFIG, config to cc_config.
Eliminates ambiguity of "config" global var, which is used in server code.
This confuses IDEs that are looking at all the code at once.
2014-05-08 00:51:18 -07:00
David Anderson e5810f3061 client/server: change implementation of "exact fraction done".
My last commit did this using a new API call.
But this would require rebuilding apps any time you want to change it;
too much work.
So instead make it an attribute of apps,
which you can set via the admin web interface.

Corresponding changes to client.
2014-05-04 00:02:32 -07:00
David Anderson 77c4dd7b32 API/client: let apps say that fraction done is precise
Currently the duration estimate for a task is a combination of
- a static estimate, based on wu.rsc_fpops_est and the estimated FLOPS
- a dynamic estimate, based on fraction done (FD) and elapsed time
The weighting of the dynamic estimate is FD^2;
the assumption is that fraction done is imprecise and improves
toward the end of a task.

This isn't ideal for apps that can supply accurate FD.

Solution: add a new API function
boinc_fraction_done_exact().
This notifies the client that the FD is accurate,
and that it should use only the dynamic estimate.
(New clients will do this; old clients will use the FD as they currently do).
2014-05-02 23:11:34 -07:00
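The weighting described in commit 77c4dd7b32 above can be sketched as follows. The message states only that the dynamic estimate is weighted by FD^2; giving the static estimate the complementary weight 1 - FD^2 is an assumption made here, and `est_duration` is a hypothetical name.

```cpp
// static_est: wu.rsc_fpops_est divided by the estimated FLOPS.
// elapsed:    elapsed time so far.
// fd:         reported fraction done, in [0, 1].
// fd_exact:   true if the app declared its fraction done to be precise.
double est_duration(double static_est, double elapsed, double fd, bool fd_exact) {
    if (fd <= 0) return static_est;        // nothing reported yet
    double dynamic_est = elapsed / fd;     // extrapolate from progress so far
    if (fd_exact) return dynamic_est;      // trust the app's fraction done
    double w = fd * fd;                    // weight of the dynamic estimate
    // Assumption: the static estimate gets the remaining weight.
    return w * dynamic_est + (1 - w) * static_est;
}
```

With static_est = 200, elapsed = 50 and fd = 0.5, the blended estimate is 0.25 * 100 + 0.75 * 200 = 175, while the exact variant returns the dynamic estimate of 100.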
David Anderson 6a8eab73cd replace tab characters with spaces 2014-05-01 21:03:49 -07:00
David Anderson 2acb991048 client: message tweaks 2014-03-15 20:10:49 -07:00
David Anderson 888c1a1e39 Merge branch 'master' of ssh://boinc.berkeley.edu/boinc-v2 2014-03-11 13:12:29 -07:00
David Anderson 994cbb5695 client (Android): fix bug that caused host venue change to be ignored 2014-03-11 13:12:15 -07:00
David Anderson b076a947fc client: work fetch tweak to avoid starvation in a particular case
My commit of Feb 7 caused work fetch to project P
to be deferred for up to 5 min if an upload to P is active,
even if some instances are idle.
This was to deal with a case where the idleness was caused
by a jobs-in-progress limit by P,
and work requests led to a long backoff.

However, this can cause instances to be idle unnecessarily.
I changed things so that, if instances are idle,
a work fetch can happen even during upload.
But only one such fetch will be done.
2014-03-09 17:09:21 -07:00
David Anderson df1d8e2bde server: store and display gpu_active_frac
- gpu_active_frac is the fraction of time GPU use is allowed
  while the client is running.
  Previously the client reported it but we weren't storing it in the DB.
  We may need it in the future for batch scheduling logic.
- fix a crashing bug in scheduler
- client: minor message tweak
2014-03-06 13:23:52 -08:00
David Anderson 5188d65bff client: use user-friendly GPU names in log msgs 2014-02-24 20:54:42 -08:00
David Anderson 1fb6d713dc client: message tweak 2014-02-24 23:29:37 -05:00
David Anderson fe8b26ac73 client: when not piggybacking work request, explain why in log msg 2014-02-24 18:45:25 -08:00
David Anderson 52152a5a4c Client: skip exclusion logic for resources that have no exclusions.
This may fix, or at least shed light on, a bug where the client
repeatedly requests work for a resource that already has plenty.
2013-10-14 14:41:59 -07:00
David Anderson b52d98b640 client: change per-project runnable job limit to a flat 1000 2013-07-09 13:52:50 -07:00
David Anderson 782a11e22f client: don't fetch work if project has > max(2000, ncpus*100) runnable jobs 2013-07-09 11:17:56 -07:00
David Anderson 4d47e2f170 client: don't request work from a project w/ > 1000 runnable jobs
Because of O(N^2) algorithms, the client becomes CPU-intensive
when there are lots of jobs.
This limit could be somewhat lower.
2013-07-07 13:13:57 -07:00
David Anderson 3614870952 client: don't request work from NCI project if "no new work" set 2013-06-26 20:36:44 -07:00
David Anderson 57a6d3d17a client (Android): make max battery temperature a preference
Note: internal change only; there's no GUI for this yet
2013-06-20 21:47:34 -07:00
David Anderson 73b990b4b0 client: fix bug that sometimes prevented work fetch when GPU exclusions used 2013-06-16 20:10:17 -07:00
David Anderson af8ccfe8b8 client: fix bug that delayed work fetch from non-CPU-intensive projects
We were waiting until there was no task for the project
before asking for another task.
We should have been waiting until there was no in-progress task.
2013-06-15 11:10:44 -07:00
David Anderson eee2879a57 client: fix bug that allowed work fetch request while file uploads active
A while back we added a mechanism intended to defer work-request RPCs
while file uploads are happening,
with the goal of reporting completed tasks sooner
and reducing the number of RPCs.
There were 2 bugs in this mechanism.
First, the decision of whether an upload is active was flawed;
if several uploads were active and 1 finished,
it would act like all had finished.
Second, when WORK_FETCH::choose_project() picks a project,
it sets p->sched_rpc_pending to RPC_REASON_NEED_WORK.
If we then decide not to request work because an upload
is active, we need to clear this field.
Otherwise scheduler_rpc_poll() will do an RPC to it,
piggybacking a work request and bypassing the upload check.
2013-06-14 22:40:43 -07:00
David Anderson 02fcc45ec4 client: fix work fetch bugs that caused incorrect GPU fetches 2013-06-10 10:36:05 -07:00
David Anderson f44bdb323d client: don't make empty work-request RPCs
It's reported that the client can repeatedly make work request RPCs
that don't request work for any resource.
I'm not sure why this happens, but prevent it.
2013-06-07 14:12:02 -07:00
David Anderson 73bd46c3fa client: don't ask an NCI project for work if current job still uploading
Note: we currently assume NCI projects have only 1 app.
Removing this assumption would be a little work.
2013-05-22 14:33:37 -07:00
David Anderson 3488b286cf client: don't piggyback work request in several situations
namely:
- some download stalled
- some task suspended
- too many uploading tasks
2013-05-21 22:01:30 -07:00
David Anderson e033347ba9 client: don't piggyback work request if project is NNW or suspended 2013-05-21 21:49:26 -07:00
David Anderson 8a1569c384 client: fix work-fetch bug that could starve a GPU if exclusions used 2013-05-16 12:38:55 -07:00
David Anderson c00f27a5a5 client: message tweak (show "don't need" in work request msg) 2013-04-26 12:19:43 -07:00
David Anderson 6c4b23e7d0 client: fix compile warnings
From Gianfranco Costamagna
2013-04-25 01:39:03 -07:00
David Anderson 63611be7e8 - client: fix bug in work fetch that caused infinite RPCs
if all projects backed off
- client emulator: disable "fetch master URL" logic
2013-04-08 11:33:49 -07:00
David Anderson fde9ab70a1 - client: fix bug in work fetch that prevented resource backoff 2013-04-04 16:20:29 -07:00
David Anderson 330a25893f - client emulator: parse <max_concurrent> in <app> in client_state.xml.
This gives you a way to simulate the effects of app_config.xml
- client: piggyback requests for resources even if we're backed off from them
- client: change resource backoff logic
    Old: if we requested work and didn't get any,
        back off from resources for which we requested work
    New: for each resource type T:
        if we requested work for T and didn't get any, back off from T
        Also, don't back off if we're already backed off
            (i.e. if this is a piggyback request)
        Also, only back off if the RPC was due to an automatic
            and potentially rapid source
            (namely: work fetch, result report, trickle up)
- client: fix small work fetch bug
2013-04-04 10:25:56 -07:00
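The new per-resource backoff rule in commit 330a25893f above can be read as a single predicate evaluated for each resource type T after a scheduler reply. This is a sketch only: the enum, the struct, and the function names are hypothetical and do not come from the client source.

```cpp
// Hypothetical reasons for a scheduler RPC. Per the commit message, only
// work fetch, result report and trickle-up count as automatic sources.
enum class RpcReason { UserRequest, WorkFetch, ResultReport, TrickleUp };

// Hypothetical per-resource outcome of one scheduler RPC.
struct ResourceOutcome {
    bool requested_work;       // we asked for work for this resource
    bool got_work;             // the reply contained work for it
    bool already_backed_off;   // backed off before the RPC (piggyback request)
};

bool is_automatic(RpcReason reason) {
    return reason == RpcReason::WorkFetch
        || reason == RpcReason::ResultReport
        || reason == RpcReason::TrickleUp;
}

// Decide whether to back off from one resource type.
bool should_back_off(const ResourceOutcome& rsc, RpcReason reason) {
    if (!rsc.requested_work || rsc.got_work) return false;
    if (rsc.already_backed_off) return false;   // don't extend an existing backoff
    return is_automatic(reason);                // user-initiated RPCs never back off
}
```

Evaluating the predicate per resource means a reply that delivers CPU work but no GPU work backs off only the GPU.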
David Anderson a5bcf6ab3b - client: work fetch message tweaks: show state before actions 2013-04-02 17:04:45 -07:00
David Anderson f6a61fe801 - client: major overhaul of work-fetch logic based on suggestions
by Jacob Klein.
    The new policy is roughly as follows:
    - find the highest-priority project P that is allowed
        to fetch work for a resource below buf_min
    - Ask P for work for all resources R below buf_max
        for which it's allowed to fetch work,
        unless there's a higher-priority project allowed
        to request work for R.
    If we're going to do an RPC to P for reasons other than work fetch,
    the policy is:
    - for each resource R for which P is the highest-priority project
        allowed to fetch work, and R is below buf_max,
        request work for R.
2013-04-02 12:32:28 -07:00
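A rough sketch of the first half of the policy in commit f6a61fe801 above (choosing a project to contact for work). It assumes projects are already ordered by descending priority and that eligibility is given as a simple table; `FetchChoice`, `choose_fetch`, and the `allowed` and `buffered` inputs are hypothetical names, not the client's data structures.

```cpp
#include <cstddef>
#include <vector>

struct FetchChoice {
    int project = -1;            // index into the priority-ordered list; -1 if none
    std::vector<bool> request;   // request[r] is true if we ask for resource r
};

// allowed[p][r]: project p may fetch work for resource r.
// buffered[r]:   amount of work currently queued for resource r.
FetchChoice choose_fetch(
    const std::vector<std::vector<bool>>& allowed,
    const std::vector<double>& buffered,
    double buf_min, double buf_max
) {
    FetchChoice c;
    const std::size_t nrsc = buffered.size();
    c.request.assign(nrsc, false);

    // Step 1: highest-priority project allowed to fetch for a resource below buf_min.
    for (std::size_t p = 0; p < allowed.size() && c.project < 0; p++) {
        for (std::size_t r = 0; r < nrsc; r++) {
            if (allowed[p][r] && buffered[r] < buf_min) {
                c.project = static_cast<int>(p);
                break;
            }
        }
    }
    if (c.project < 0) return c;

    // Step 2: request every resource below buf_max that this project may fetch,
    // unless a higher-priority project is also allowed to fetch it.
    const std::size_t chosen = static_cast<std::size_t>(c.project);
    for (std::size_t r = 0; r < nrsc; r++) {
        if (!allowed[chosen][r] || buffered[r] >= buf_max) continue;
        bool higher_allowed = false;
        for (std::size_t q = 0; q < chosen; q++) {
            if (allowed[q][r]) higher_allowed = true;
        }
        c.request[r] = !higher_allowed;
    }
    return c;
}
```

The piggyback case described in the commit message corresponds to running only step 2 for a project that is being contacted anyway.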
David Anderson 515deea4fb - client, work fetch: request # instances so that we have enough jobs
to use project's share of instances.
- client emulator: if client_state.xml doesn't have <no_rsc_apps>
    for a project, and the project doesn't have apps for that resource,
    the project can be asked for work for that resource.
2013-03-25 09:25:25 -07:00
David Anderson b93e80c6f5 - client: code cleanup. Some variable/function/constant names
contained "debt" when they actually refer to REC.
    Change these names to use "rec".
2013-03-24 11:22:01 -07:00
David Anderson 128da198b6 - client: rename two different functions named backoff()
to make it easier to see what's going on.
- fix code formatting in manager
2013-03-22 10:43:05 +01:00
David Anderson 1ef582aad6 - client: improve work fetch messages
- web: include user ID in email to moderators about banishment
2013-03-22 10:29:48 +01:00