120 Commits

Author SHA1 Message Date
David Anderson 9c240e6e40 Many comments in the source code (C++ and PHP) referred to Trac wiki pages.
Change these to the Github wiki.

Web: change a couple of links from Trac to Github wiki.

text_transform.inc: the [github]wiki:xxx[/github] tag linked
to a non-existent boinc-dev-doc repo.
Link to the Github wiki instead.
2023-05-25 12:59:56 -07:00
David Anderson 4bdef6e2a4 client: fix work-fetch logic when max concurrent limits are used
The round-robin simulation would stop simulating jobs for a project
once a max concurrent limit (app or project) was reached.
As a result it would decide there was a shortfall,
and keep requesting work up to the limit of 1000 jobs.

To fix this:
1) keep simulating a project after an MCL is reached
2) for each (project, resource) pair, keep track of the latest
    simulation time T when an MCL was reached.
3) for such a project, don't request work for a resource if
    T > now + work buf size

This allows us, e.g., to request GPU jobs from a project
even if its CPU jobs (taking MCL into account) fill the buffer.

This works in the simulation case that showed the problem (#192).

Also: add a bit more logging, and improve names
2021-12-01 13:54:22 -08:00
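The per-(project, resource) check described in commit 4bdef6e2a4 can be sketched in C++ as follows. This is a minimal illustration, not BOINC's code. `RscProjectSimState`, `note_mcl_reached()` and `may_request_work()` are hypothetical names, and all times are assumed to be in seconds.

```cpp
#include <cassert>
#include <optional>

// Hypothetical per-(project, resource) state kept during the
// round-robin simulation; not BOINC's actual identifiers.
struct RscProjectSimState {
    // Latest simulated time T at which a max-concurrent limit (app or
    // project) was reached; empty if no limit was hit.
    std::optional<double> latest_mcl_time;

    // Called whenever the simulation hits an MCL at simulated time t.
    void note_mcl_reached(double t) {
        if (!latest_mcl_time || t > *latest_mcl_time) {
            latest_mcl_time = t;
        }
    }
};

// Rule from the commit message: don't request work for this resource
// if T > now + work buf size. Projects that never hit an MCL are not
// restricted by this check.
bool may_request_work(const RscProjectSimState& s, double now, double work_buf_secs) {
    if (!s.latest_mcl_time) return true;
    return !(*s.latest_mcl_time > now + work_buf_secs);
}
```

Keeping only the latest T means one comparison per (project, resource) pair at work-fetch time, regardless of how many times the simulation hit a limit.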
David Anderson 219a540550 client: get rid of the use of memset() to initialize structs to zero.
Instead: declare a static const instance (whose data members are zero)
and copy that.
This avoids the error-prone need to assign each member,
and it works even if there are virtual function tables.
2019-11-05 00:16:02 -08:00
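A minimal sketch of the static-const-instance idiom from commit 219a540550, using a hypothetical `Counters` struct rather than an actual BOINC class. The virtual function gives the struct a vtable, which is the case where `memset()` is unsafe.

```cpp
#include <cassert>

// Hypothetical struct with a vtable. memset() on such an object would
// overwrite its vtable pointer (and is undefined behavior for
// non-trivially-copyable types).
struct Counters {
    int njobs;
    double flops;

    virtual const char* kind() const { return "counters"; }

    void clear() {
        // Value-initialized once: there is no user-provided default
        // constructor, so "{}" zero-initializes the data members.
        static const Counters zero{};
        // Copy-assignment copies the data members only; the object's
        // vtable pointer is left alone.
        *this = zero;
    }
};
```

New members added later are reset automatically, which is the maintenance benefit the commit describes.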
Christian Beer 008b1deb1e Client: initialize struct members
found by Coverity (CIDs 28037 27943)
2019-05-29 20:47:07 +02:00
David Anderson 0b5bae4cc9 client: fix work fetch bug when max_concurrent used
For projects P with MC restrictions, during RR simulation,
we keep track of the max # of instances used by P,
subject to the restrictions, and use that to calculate its "MC shortfall".

Problem: if P doesn't have any jobs, the max # instances is zero,
so MC shortfall is zero, so we erroneously don't request work for P.

Solution: initialize max # of instances to the min of the restrictions;
we'll always be able to use at least that many instances.
2019-04-20 13:46:55 -07:00
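The seeding described in commit 0b5bae4cc9 might look like this. `initial_max_instances()` and `update_max_instances()` are hypothetical helpers, and the sketch assumes the limits that apply to project P are passed in as a plain list of integers.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Initial value of P's "max # instances used" before RR simulation:
// the smallest applicable max-concurrent limit, since P can always use
// at least that many instances. With no limits, there is nothing to seed.
int initial_max_instances(const std::vector<int>& mc_limits) {
    if (mc_limits.empty()) return 0;
    return *std::min_element(mc_limits.begin(), mc_limits.end());
}

// During simulation the tracked maximum only grows.
int update_max_instances(int current_max, int instances_in_use) {
    return std::max(current_max, instances_in_use);
}
```

A project with no queued jobs therefore starts with a nonzero maximum and a nonzero MC shortfall, so work is still requested for it.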
David Anderson 81a880c74d client: improve work fetch in presence of max concurrent
Re-enable work buffering in the presence of max concurrent constraints.
See https://boinc.berkeley.edu/trac/wiki/WorkFetchMaxConcurrent
2019-03-25 21:43:30 -07:00
David Anderson 40f0cb44f4 Avoid starvation when max_concurrent is used, and related fixes.
Synopsis: max concurrent was being enforced in the last stage of CPU sched,
but not in earlier stages, or in work fetch.
This caused starvation in some cases.
Fix this by modeling max concurrent in RR sim and make_run_list().

- CPU sched: model and enforce max concurrent limits in building run list
    for CPU jobs; otherwise the list has jobs we can't actually run

- RR simulation: model and enforce max concurrent limits

- RR sim: fix bug in calculation of # idle instances

- RR sim: model unavailability of GPUs
    e.g. if we can't run GPU jobs we can potentially run more CPU jobs

- work fetch: if a project is at a max concurrent limit,
    don't fetch work from it.
    The jobs we get (possibly) wouldn't be runnable.
    NOTE: we currently provide max concurrent limits
    at both project and app level.
    The problem with app level is that apps can have versions that
    use different resources.
    It would be better to have limits at the resource level instead.

- In many cases (e.g. job completion) CPU sched and work fetch are both done
    back to back.  Each of them does RR simulation.
    Only need to do this once (efficiency).

- Show max concurrent settings in startup messages

- Make max runnable jobs (1000) into a #define

- Fix removal of "can't fetch work" notices

- Make "can't fetch work" notices resource-specific;
    the reasons may differ between resources

- Get rid of WF_DEBUG macro;
    just print everything if log_flags.work_fetch_debug is set.

- Change project- and resource-level work-fetch reason codes
    (DONT_FETCH_PREFS etc.) from #defines to enums,
    and give them prefixes RSC_REASON and PROJECT_REASON

- Fix bug where the return of compute_project_reason() wasn't
    actually being stored in project.work_fetch.

- Add work-fetch reason MAX_CONCURRENT (project is at max concurrent limit)
2018-12-28 12:55:05 -08:00
David Anderson 13a5b9bf3e change multiple-inclusion guard names to BOINC_FILENAME_H 2017-04-07 23:54:49 -07:00
Rom Walton 59b5bf2f71 client: Clean up low-hanging fruit with regard to strcpy and strcat use.
Use safe_strcpy and safe_strcat when dealing with non-pointer data types.
2016-02-15 23:34:18 -05:00
David Anderson 86109b0815 client: work fetch backup-project tweak
The logic for backup projects (fetch for a resource only if idle instance)
was skipped in the case of GPU exceptions in my checkin of 10/10/2014.
I'm not sure why I did this, and it allows incorrect work fetch
in some cases, so I'm taking it out.
2015-11-07 20:48:02 -08:00
Christian Beer fb715ed47b initialize fields in constructors
fixes CIDs 27934 and 27943 found by Coverity Scan
2015-10-18 22:23:05 +02:00
David Anderson ef22b2bd4b client: show projects in alphabetical order of project name
A while back I changed the job sched and work fetch policies to use
REC-based project priority.
The work fetch logic sorts the project list (in CLIENT_STATE::projects)
by descending priority.
This causes two problems:

- If you have a lot of projects, it's hard to find a particular one
  in the event log, e.g. in work_fetch_debug output.
- In the manager's Statistics tab, the selected project can change
  unexpectedly since we identify it by array index,
  and the array order may change.

Solution: sort CLIENT_STATE::projects alphabetically (case insensitive).
In WORK_FETCH, copy this array to a separate array,
that is then sorted by decreasing priority.
2014-12-17 09:56:01 -08:00
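The two orderings from commit ef22b2bd4b can be sketched as below, assuming a hypothetical, stripped-down `Project` record in place of BOINC's `PROJECT` class. The master list stays alphabetical (case-insensitive), and work fetch sorts its own copy by decreasing priority.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical minimal project record.
struct Project {
    std::string name;
    double sched_priority;
};

// Case-insensitive "less than" on project names.
bool name_less_ci(const Project* a, const Project* b) {
    return std::lexicographical_compare(
        a->name.begin(), a->name.end(), b->name.begin(), b->name.end(),
        [](unsigned char x, unsigned char y) {
            return std::tolower(x) < std::tolower(y);
        });
}

// Master list (standing in for CLIENT_STATE::projects): alphabetical.
void sort_projects_alphabetically(std::vector<Project*>& projects) {
    std::sort(projects.begin(), projects.end(), name_less_ci);
}

// Work fetch: copy the list, then sort the copy by decreasing priority.
// The master list's order is not changed.
std::vector<Project*> projects_by_priority(const std::vector<Project*>& projects) {
    std::vector<Project*> copy = projects;
    std::stable_sort(copy.begin(), copy.end(), [](const Project* a, const Project* b) {
        return a->sched_priority > b->sched_priority;
    });
    return copy;
}
```

`std::stable_sort` keeps projects with equal priority in alphabetical order; the commit does not say how ties are broken, so this is one reasonable choice.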
David Anderson 7a4672e7d6 client: increase limit on coproc instances from 31 to 64
We were using an int bitmap to store flags for the instances of a coproc.
Furthermore, because of the use of 2^n-1 to generate a bitmap of 1s,
the limit on instances was 31.

Use a long long for the bitmap instead, and don't use 2^n-1.
This increases the limit to 64.
2014-11-24 00:14:23 -08:00
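A sketch of the bitmap change in commit 7a4672e7d6. The commit used `long long`; this sketch uses `std::uint64_t` instead, so that setting bit 63 is well defined. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// One bit per coproc instance, up to 64 instances.
using InstanceMask = std::uint64_t;

// Build a mask with the low n bits set by setting bits one at a time.
// Computing 2^n - 1 as (1 << n) - 1 instead overflows (or is undefined)
// when n reaches the width of the type, which capped the old int bitmap.
InstanceMask all_instances_mask(int n) {
    InstanceMask m = 0;
    for (int i = 0; i < n; i++) {
        m |= InstanceMask{1} << i;
    }
    return m;
}

bool instance_is_set(InstanceMask m, int i) {
    return (m >> i) & 1;
}
```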
David Anderson eafd70ecc6 client: request work from backed-off resources if doing RPC anyway 2014-11-18 00:05:17 -08:00
David Anderson fbc6e40dca Client: fix bug that prevented work fetch for zero-share projects
In work fetch setup, we were computing rsc_project_reason
before doing the round-robin simulation.
It needs to be done after, because it uses the # of idle devices,
which is computed by the simulation.
2014-11-17 13:56:06 -08:00
David Anderson 4c9d1d6659 client: code cleanup and possible debugging in work fetch
- Remove code that tries to keep track of available GPU RAM
  and defer jobs that don't fit.
  This never worked, it relied on project estimates of RAM usage,
  and it's been replaced by having the app do temporary exit
  if alloc fails.
- Move logic for checking for deferred jobs from CPU
  to work fetch.
- Rename rsc_defer_sched to has_deferred_job,
  and move it from PROJECT to RSC_PROJECT_WORK_FETCH
- tweak work_fetch_debug output
2014-10-10 14:35:00 -07:00
David Anderson 9c96108c67 client: work fetch code cleanup
The logic for deciding whether to fetch work for a project
or a (project, resource type) pair
was scattered among several functions, with confusing names.
Consolidate this logic, and use consistent names.
2014-10-10 10:37:07 -07:00
David Anderson f63f259ce5 client: code cleanup 2014-10-10 07:15:10 -07:00
David Anderson 31541e166d client: set work requests for coprocs specified in cc_config.xml
We weren't copying the request fields from RSC_WORK_FETCH to COPROC.
Do this, and clean up the code a bit.

Note: the arrays that parallel the COPROCS::coprocs array
are a bit of a kludge; that stuff logically belongs in COPROC.
But it's specific to the client, so I can't put it there.
Maybe I could do something fancy with derived classes, not sure.
2014-08-09 21:44:39 -07:00
David Anderson b076a947fc client: work fetch tweak to avoid starvation in a particular case
My commit of Feb 7 caused work fetch to project P
to be deferred for up to 5 min if an upload to P is active,
even if some instances are idle.
This was to deal with a case where the idleness was caused
by a jobs-in-progress limit by P,
and work requests lead to long backoff.

However, this can cause instances to be idle unnecessarily.
I changed things so that, if instances are idle,
a work fetch can happen even during upload.
But only one such fetch will be done.
2014-03-09 17:09:21 -07:00
David Anderson fe8b26ac73 client: when not piggybacking work request, explain why in log msg 2014-02-24 18:45:25 -08:00
David Anderson 4d47e2f170 client: don't request work from a project w/ > 1000 runnable jobs
Because of O(N^2) algorithms, the client becomes CPU-intensive
when there are lots of jobs.
This limit could be somewhat lower.
2013-07-07 13:13:57 -07:00
David Anderson 57a6d3d17a client (Android): make max battery temperature a preference
Note: internal change only; there's no GUI for this yet
2013-06-20 21:47:34 -07:00
David Anderson 8a1569c384 client: fix work-fetch bug that could starve a GPU if exclusions used 2013-05-16 12:38:55 -07:00
David Anderson c00f27a5a5 client: message tweak (show "don't need" in work request msg) 2013-04-26 12:19:43 -07:00
David Anderson 6b6c2ac519 - client: fix bug that could cause idle GPUs when exclusions are present.
    The basic problem: the way we assign GPU instances when creating
        the "run list" is slightly different from the way we assign them
        when we actually run the jobs;
        the latter assigns a running job to the instance it's using,
        but the former doesn't.
    Solution (kludge): when building the run list,
        don't reserve instances for currently running jobs.
        This will result in more jobs in the run list, and avoid starvation.
        For efficiency, do this only if there are exclusions for this type.
    Comment: this is yet another complexity that would be eliminated
        if GPU instances were modeled separately.
        I wish I had time to do that.
- client emulator: change default latency bound from 1 day to 10 days
2013-04-07 13:00:15 -07:00
David Anderson 330a25893f - client emulator: parse <max_concurrent> in <app> in client_state.xml.
    This gives you a way to simulate the effects of app_config.xml
- client: piggyback requests for resources even if we're backed off from them
- client: change resource backoff logic
    Old: if we requested work and didn't get any,
        back off from resources for which we requested work
    New: for each resource type T:
        if we requested work for T and didn't get any, back off from T
        Also, don't back off if we're already backed off
            (i.e. if this is a piggyback request)
        Also, only back off if the RPC was due to an automatic
            and potentially rapid source
            (namely: work fetch, result report, trickle up)
- client: fix small work fetch bug
2013-04-04 10:25:56 -07:00
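The new per-resource backoff rule in commit 330a25893f reduces to a small predicate. The `RpcReason` enum and `should_back_off()` below are hypothetical; BOINC's actual reason codes have different names and more values.

```cpp
#include <cassert>

// Hypothetical RPC reasons.
enum class RpcReason { WorkFetch, ResultsDue, TrickleUp, UserRequest, ProjectRequested };

// "Automatic and potentially rapid" sources, per the commit message.
bool reason_is_automatic(RpcReason r) {
    return r == RpcReason::WorkFetch || r == RpcReason::ResultsDue ||
           r == RpcReason::TrickleUp;
}

// Decision for one resource type T after a scheduler reply.
bool should_back_off(bool requested_work_for_t, bool got_jobs_for_t,
                     bool already_backed_off, RpcReason reason) {
    if (!requested_work_for_t || got_jobs_for_t) return false;
    if (already_backed_off) return false;  // e.g. a piggyback request
    return reason_is_automatic(reason);
}
```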
David Anderson f6a61fe801 - client: major overhaul of work-fetch logic based on suggestions
    by Jacob Klein.
    The new policy is roughly as follows:
    - find the highest-priority project P that is allowed
        to fetch work for a resource below buf_min
    - Ask P for work for all resources R below buf_max
        for which it's allowed to fetch work,
        unless there's a higher-priority project allowed
        to request work for R.
    If we're going to do an RPC to P for reasons other than work fetch,
    the policy is:
    - for each resource R for which P is the highest-priority project
        allowed to fetch work, and R is below buf_max,
        request work for R.
2013-04-02 12:32:28 -07:00
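The two-step selection in commit f6a61fe801 can be sketched as follows, assuming projects are already sorted by decreasing priority and that eligibility and buffer state are supplied as plain boolean tables. All names are hypothetical, and the sketch omits the separate rule for RPCs made for other reasons.

```cpp
#include <cassert>
#include <vector>

// Result: the chosen project (-1 if none) and, per resource, whether to ask for work.
struct WorkFetchChoice {
    int project = -1;
    std::vector<bool> request;
};

// allowed[p][r]: project p (index 0 = highest priority) may fetch work for resource r.
// below_min[r] / below_max[r]: resource r's buffer is below buf_min / buf_max.
WorkFetchChoice choose_work_fetch(const std::vector<std::vector<bool>>& allowed,
                                  const std::vector<bool>& below_min,
                                  const std::vector<bool>& below_max) {
    const int nrsc = static_cast<int>(below_min.size());
    WorkFetchChoice choice;
    choice.request.assign(nrsc, false);

    // Step 1: highest-priority project allowed to fetch for a resource below buf_min.
    for (int p = 0; p < static_cast<int>(allowed.size()) && choice.project < 0; p++) {
        for (int r = 0; r < nrsc; r++) {
            if (below_min[r] && allowed[p][r]) {
                choice.project = p;
                break;
            }
        }
    }
    if (choice.project < 0) return choice;

    // Step 2: ask P for each resource below buf_max that it may fetch for,
    // unless a higher-priority project is also allowed to fetch for it.
    const int P = choice.project;
    for (int r = 0; r < nrsc; r++) {
        if (!below_max[r] || !allowed[P][r]) continue;
        bool higher_priority_eligible = false;
        for (int q = 0; q < P; q++) {
            if (allowed[q][r]) {
                higher_priority_eligible = true;
                break;
            }
        }
        choice.request[r] = !higher_priority_eligible;
    }
    return choice;
}
```

For example, if project 0 may fetch only GPU work and project 1 may fetch both CPU and GPU work, and only the CPU buffer is below buf_min, project 1 is chosen and asked for CPU work only. Project 0 outranks it for the GPU.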
David Anderson b93e80c6f5 - client: code cleanup. Some variable/function/constant names
    contained "debt" when they actually refer to REC.
    Change these names to use "rec".
2013-03-24 11:22:01 -07:00
David Anderson 128da198b6 - client: rename two different functions named backoff()
    to make it easier to see what's going on.
- fix code formatting in manager
2013-03-22 10:43:05 +01:00
David Anderson 546ea233a0 - client: fix small work fetch bug that caused the client to
    not add a piggyback work request when it should have.
2013-03-15 13:38:45 +01:00
David Anderson fc6b050883 - client: removed unused code for old work-fetch logic 2013-03-15 13:38:45 +01:00
David Anderson 2e23bfedaa - client, work fetch policy. Change policy for projects w/ GPU exclusions 2013-03-07 11:28:43 +01:00
David Anderson a63ebbc13e - client: change work fetch policy to work better with GPU exclusions
    - scale amount of work request by
        (# non-excluded instances)/#instances
    - change policy:
        old: don't fetch work if #jobs > #non-excluded instances
        new: don't fetch work if # of instance-seconds used in RR sim
            > work_buf_min * (#non-excluded instances)/#instances
2013-03-07 11:28:42 +01:00
David Anderson 7768f6da60 - client: fix bug where, when updating a project, we fail to request work even though higher-priority projects are marked as no-new-tasks or are otherwise ineligible for work fetch. 2013-03-04 14:09:43 +01:00
David Anderson 777f1f11e8 - client: change work fetch policy to avoid starving GPUs in situations where GPU exclusions are used.
- client: fix bug in round-robin simulation when GPU exclusions are used.
Note: this fixes a major problem (starvation)
    with project-level GPU exclusion.
    However, project-level GPU exclusion interferes with most of
    the client's scheduling policies.
    E.g., round-robin simulation doesn't take GPU exclusion into account,
    and the resulting completion estimates and device shortfalls
    can be wrong by an order of magnitude.

    The only way I can see to fix this would be to model each
    GPU instance as a separate resource,
    and to associate each job with a particular GPU instance.
    This would be a sweeping change in both client and server.
2013-03-01 15:31:41 +01:00
David Anderson 446bc4ca28 - client: take GPU exclusions into account when making
    initial work request to a project
- client: put some casts to double in NVIDIA detect code.
    Shouldn't make any difference.
- volunteer storage: truncate file to right size after retrieval


svn path=/trunk/boinc/; revision=26051
2012-08-20 23:41:27 +00:00
David Anderson 4fea52c6f2 - client: if a project has excluded GPUs of a given type,
    allow it to fetch work of that type if the # of runnable
    jobs it has is <= the # of non-excluded instances (rather than 0).


svn path=/trunk/boinc/; revision=26045
2012-08-18 23:26:10 +00:00
David Anderson ff1a391ced - client: when we're making a scheduler RPC
    for a reason other than work fetch,
    and we're deciding whether to piggyback a work request,
    skip the checks for hysteresis (buffer < min)
    and for per-resource backoff time.
    These checks are there only to limit the rate of RPCs,
    which is not relevant since we're doing one anyway.

    This fixes a bug where a project w/ sporadic jobs specifies
    a next_rpc_delay to ensure regular polling from clients.
    When these polls occur they should request work regardless of backoff.


svn path=/trunk/boinc/; revision=26002
2012-08-10 18:29:00 +00:00
David Anderson f6bd141b30 - client: further msg tweaks
svn path=/trunk/boinc/; revision=25830
2012-07-02 05:10:58 +00:00
David Anderson 1d717c6fcc - client: msg tweak
svn path=/trunk/boinc/; revision=25829
2012-07-02 04:45:19 +00:00
David Anderson 7dcf119854 - client: msg tweak
svn path=/trunk/boinc/; revision=25828
2012-07-02 04:06:11 +00:00
David Anderson 89578050f7 - When the client makes a scheduler RPC without requesting work,
    and there's a simple reason
    (e.g. the project is suspended, no-new-tasks, downloads stalled, etc.),
    show it in the event log.
    If the reason is more complex, don't try to explain.


svn path=/trunk/boinc/; revision=25827
2012-07-02 03:43:05 +00:00
David Anderson 82d64e9403 - msg tweak and fix compile warnings
svn path=/trunk/boinc/; revision=25408
2012-03-12 23:34:41 +00:00
David Anderson 64a371173b - client: fix crashing bug when there is 1 instance of a resource.
    I'm not sure how this ever worked.


svn path=/trunk/boinc/; revision=25362
2012-03-02 03:56:26 +00:00
David Anderson a6bf5aecf3 - client: tweak to work-fetch policy:
    if we're making a scheduler RPC to a project for reasons
    other than work fetch,
    and we're deciding whether to ask for work, ignore hysteresis;
    i.e. ask for work even if we're above the min buffer
    (idea from John McLeod).


svn path=/trunk/boinc/; revision=25291
2012-02-18 23:19:06 +00:00
David Anderson 69834e0c01 - client: compile fix; remove redundant total_peak_flops()
svn path=/trunk/boinc/; revision=24738
2011-12-06 09:20:30 +00:00
David Anderson bc35060726 - client: when contacting a project for reasons other than
    work fetch (e.g. to report completed jobs)
    only request work if it's the project we would have chosen
    if we were fetching work.
- client: the way in which project priorities were adjusted
    in work fetch to reflect currently queued work was wrong.
- client: fix bug in the way project priorities are adjusted
    in RR simulator
- client emulator: if there are results in the state file
    with states DOWNLOADING or UPLOADING,
    change them to DOWNLOADED or UPLOADED.
    Otherwise they're stuck.


svn path=/trunk/boinc/; revision=24737
2011-12-06 04:21:27 +00:00
David Anderson 0d37f69a6a - client emulator fixes
svn path=/trunk/boinc/; revision=24644
2011-11-22 07:47:45 +00:00
David Anderson 7b28215032 - client: reimplement the round-robin simulator to
    reduce its runtime from O(N^2) to O(N),
    where N is the number of runnable jobs
    (which can be in the thousands).
    This will make the client emulator run a lot faster,
    and will reduce the client CPU overhead a bit.
- API: change boinc_get_opencl_ids() so that it returns
    a BOINC error code (< -100) if the app_init.xml is
    missing or bad (i.e. we're running standalone),
    and an OpenCL error code (> -100) if an OpenCL call failed.


svn path=/trunk/boinc/; revision=24469
2011-10-24 17:53:09 +00:00