14 Commits

Author SHA1 Message Date
David Anderson 40f0cb44f4 Avoid starvation when max_concurrent is used, and related fixes.
Synopsis: max concurrent was being enforced in the last stage of CPU sched,
but not in earlier stages, or in work fetch.
This caused starvation in some cases.
Fix this by modeling max concurrent in RR sim and make_run_list().

- CPU sched: model and enforce max concurrent limits in building run list
    for CPU jobs; otherwise the list has jobs we can't actually run

- RR simulation: model and enforce max concurrent limits

- RR sim: fix bug in calculation of # idle instances

- RR sim: model unavailability of GPUs
    e.g. if we can't run GPU jobs we can potentially run more CPU jobs

- work fetch: if a project is at a max concurrent limit,
    don't fetch work from it.
    The jobs we get (possibly) wouldn't be runnable.
    NOTE: we currently provide max concurrent limits
    at both project and app level.
    The problem with app level is that apps can have versions that
    use different resources.
    It would be better to have limits at the resource level instead.

- In many cases (e.g. job completion) CPU sched and work fetch are done
    back to back, and each of them does its own RR simulation.
    The simulation only needs to run once in that case (efficiency).

- Show max concurrent settings in startup messages

- Make max runnable jobs (1000) into a #define

- Fix removal of "can't fetch work" notices

- Make "can't fetch work" notices resource-specific;
    the reasons may differ between resources

- Get rid of WF_DEBUG macro;
    just print everything if log_flags.work_fetch_debug is set.

- Change project- and resource-level work-fetch reason codes
    (DONT_FETCH_PREFS etc.) from #defines to enums,
    and give them prefixes RSC_REASON and PROJECT_REASON

- Fix bug where the return of compute_project_reason() wasn't
    actually being stored in project.work_fetch.

- Add work-fetch reason MAX_CONCURRENT (project is at max concurrent limit)
2018-12-28 12:55:05 -08:00
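
The work-fetch part of the commit above (skip a project that is at its max concurrent limit, record the reason as an enum value, and actually store that reason on the project) could look roughly like the following. This is a minimal sketch, not the client's code: the type names, the field names, and the function sketch_project_reason() are all hypothetical, and treating "at the limit" as "runnable jobs >= limit" is an assumption.

```cpp
#include <string>

// Hypothetical reason codes. The commit says the real ones carry a
// PROJECT_REASON prefix; the exact identifiers are not shown in the log.
enum class ProjectReason {
    None,            // work fetch is allowed
    DontFetchPrefs,  // blocked by preferences
    MaxConcurrent    // project is at its max concurrent limit
};

// Hypothetical, heavily reduced stand-in for per-project state.
struct Project {
    std::string name;
    int max_concurrent = 0;       // 0 means "no limit configured"
    int n_runnable_jobs = 0;      // assumption: jobs counted against the limit
    bool dont_fetch_prefs = false;
    ProjectReason work_fetch_reason = ProjectReason::None;
};

// Decide why (if at all) work should not be fetched from this project.
ProjectReason sketch_project_reason(const Project& p) {
    if (p.dont_fetch_prefs) {
        return ProjectReason::DontFetchPrefs;
    }
    // Apply the limit only if one was given.
    if (p.max_concurrent > 0 && p.n_runnable_jobs >= p.max_concurrent) {
        return ProjectReason::MaxConcurrent;
    }
    return ProjectReason::None;
}

// Store the computed reason on the project before using it, so that
// later stages (e.g. notices) see the same value.
bool can_fetch_work(Project& p) {
    p.work_fetch_reason = sketch_project_reason(p);
    return p.work_fetch_reason == ProjectReason::None;
}
```

Using a scoped enum rather than #defines means a resource-level reason cannot be mixed up with a project-level one without a compile error, which is one plausible motivation for the change described above.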
David Anderson 4a7bb390af Client: fix job scheduling bug
- There was a scenario (#164 in fact) where CPUs were starved
because CPU jobs weren't being added to the run list.
The basic problem was that the max_concurrent logic was being
called in make_run_list().
It doesn't belong there - only in enforce_run_list().

- add the ability to handle app_config.xml files in the client emulator.

- fix a performance bug that caused extremely long run lists;
in make_run_list(), check for exclusion at the project level, not global.

- do max_concurrent logic only if a max_concurrent rule was given.

- fix bug where the emulator would assign the wrong
version number to results, then fail to find their app version.
2018-12-21 00:54:00 -08:00
David Anderson 4a9cc3e725 client/lib: code shuffle preparatory to adding app_config GUI RPC 2017-05-11 01:53:50 -07:00
David Anderson 13a5b9bf3e change multiple-inclusion guard names to BOINC_FILENAME_H 2017-04-07 23:54:49 -07:00
David Anderson c3eb84db1e client: add report_results_immediately config on project and app levels
see http://boinc.berkeley.edu/wiki/Client_configuration#Application_configuration
2017-03-06 16:04:44 -08:00
David Anderson b97e1c86d1 client: report parse errors in app_config.xml correctly 2014-11-24 00:41:26 -08:00
David Anderson bfa0a81a7b client: if a project's app_config.xml has no errors, remove old notices 2014-08-31 13:32:12 -07:00
David Anderson 34e97a5048 client: add <project_max_concurrent> option for app_config.xml
Lets you limit the number of running jobs over the whole project.
Note: this is not taken into account in work fetch.
2014-07-25 15:49:12 -07:00
David Anderson c8bde8cfd5 client: fix bug that caused app_config settings to persist incorrectly
We needed to clear the app_configs and app_version_configs vectors in PROJECT
if app_config.xml isn't there
2014-06-05 17:56:03 -07:00
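
The fix described in the commit above amounts to clearing previously loaded settings before deciding whether there is anything new to load. A minimal sketch follows, assuming hypothetical struct and function names. The commit only states that PROJECT holds app_configs and app_version_configs vectors; the "file present" flag and the already-parsed inputs are stand-ins for real file handling.

```cpp
#include <string>
#include <vector>

// Hypothetical, reduced config records.
struct AppConfig {
    std::string name;
    int max_concurrent;
};

struct AppVersionConfig {
    std::string app_name;
    std::string plan_class;
};

// Stand-in for the part of PROJECT that holds app_config.xml settings.
struct ProjectConfigState {
    std::vector<AppConfig> app_configs;
    std::vector<AppVersionConfig> app_version_configs;
};

// Reload settings. Old entries are always cleared first, so a missing
// file leaves the project with no overrides instead of stale ones.
void reload_app_config(ProjectConfigState& state,
                       bool file_present,
                       const std::vector<AppConfig>& parsed_apps,
                       const std::vector<AppVersionConfig>& parsed_versions) {
    state.app_configs.clear();
    state.app_version_configs.clear();
    if (!file_present) {
        return;  // nothing to load; the cleared state is the result
    }
    state.app_configs = parsed_apps;
    state.app_version_configs = parsed_versions;
}
```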
David Anderson d877983771 client: let app_config.xml specify fraction_done_exact for apps 2014-05-04 10:39:29 -07:00
David Anderson c1bddf4252 client: allow <app_version> elements in app_config.xml, allowing users to override the parameters of particular app versions 2013-09-06 15:41:43 -07:00
David Anderson 5452d3998f client: show app_config warnings only on startup and reread config 2013-05-19 10:02:00 -07:00
David Anderson a64cb793f1 - scheduler: attempted performance enhancement.
Old: each scheduler process holds a semaphore
        while scanning the shared-mem job array.
        On machines with many CPUs
        there seems to be contention for this semaphore,
        causing slow scheduler response and possibly connection failures.
    New: Don't hold the semaphore while scanning array.
        Instead, if we find a job that passes quick_check(),
        acquire the semaphore and recheck that the job is present in array
        and passes quick_check().
- client: show messages if app_config.xml has unrecognized tags
2013-03-04 17:16:56 +01:00
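
The "New" scheme in the scheduler item above is a check, lock, recheck pattern. The sketch below illustrates it under stated assumptions: a std::mutex stands in for the inter-process semaphore, a std::vector stands in for the shared-memory job array, and the body of quick_check() (a memory test here) is invented, since the log does not say what the real check does. The presence flag is atomic so that the unlocked read is well defined in C++.

```cpp
#include <atomic>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical slot in the job array.
struct JobSlot {
    std::atomic<bool> present{false};  // may be cleared by another claimer
    int min_memory_mb = 0;             // assumed fixed once the slot is filled
};

// Invented stand-in for the scheduler's quick_check().
bool quick_check(const JobSlot& slot, int host_memory_mb) {
    return slot.present.load() && slot.min_memory_mb <= host_memory_mb;
}

// Scan without holding the lock. Only when a candidate passes the cheap
// check is the lock taken, and the check is repeated under the lock
// because another claimer may have taken the job in the meantime.
// Returns the claimed index, or -1 if no job qualifies.
int claim_job(std::vector<JobSlot>& slots, std::mutex& sem, int host_memory_mb) {
    for (std::size_t i = 0; i < slots.size(); ++i) {
        if (!quick_check(slots[i], host_memory_mb)) {
            continue;  // unlocked rejection: no contention for most slots
        }
        std::lock_guard<std::mutex> lock(sem);
        if (quick_check(slots[i], host_memory_mb)) {
            slots[i].present.store(false);  // claim the job
            return static_cast<int>(i);
        }
        // Lost the race for this slot; the lock is released, keep scanning.
    }
    return -1;
}
```

The lock is held only for the short recheck-and-claim step, so claimers that are scanning different parts of the array do not serialize on it.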
David Anderson 952a495fb7 - client: add "client app configuration" feature; see
http://boinc.berkeley.edu/trac/wiki/ClientAppConfig
    This lets users do the following:
    1) limit the number of concurrent jobs of a given app
        (e.g. for WCG apps that are I/O-intensive)
    2) Specify the CPU and GPU usage parameters of GPU versions
        of a given app.
    Implementation notes:
    - max app concurrency is enforced in 2 places:
        1) when building the initial job run list
        2) when enforcing the final job run list
        Both are needed to avoid possible starvation.
    - however, we don't enforce it during RR simulation.
        Doing so could cause erroneous shortfall and work fetch.
        This means, however, that work buffering will not work
        as expected if you're using max concurrency.
2013-03-04 15:20:32 +01:00
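
The implementation notes above say the per-app limit is applied in two places: when building the initial run list and when enforcing the final one. A minimal sketch of that idea follows. The function names echo the make_run_list() and enforce_run_list() mentioned elsewhere in this log, but the signatures, the Job struct, and the Limits map are hypothetical simplifications, not the client's data structures.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical, reduced job record.
struct Job {
    std::string name;
    std::string app;
};

// App name -> max concurrent jobs. Apps without an entry are unlimited.
using Limits = std::map<std::string, int>;

// True if one more job of this app may be counted.
static bool under_limit(const Limits& limits,
                        const std::map<std::string, int>& counts,
                        const std::string& app) {
    auto lim = limits.find(app);
    if (lim == limits.end()) return true;
    auto c = counts.find(app);
    int n = (c == counts.end()) ? 0 : c->second;
    return n < lim->second;
}

// Stage 1: build the candidate list, skipping jobs that could never run
// because their app is already at its limit. Without this, the list can
// fill up with jobs of one limited app and starve the CPUs.
std::vector<Job> build_run_list(const std::vector<Job>& queue, const Limits& limits) {
    std::vector<Job> run_list;
    std::map<std::string, int> counts;
    for (const Job& j : queue) {
        if (!under_limit(limits, counts, j.app)) continue;
        ++counts[j.app];
        run_list.push_back(j);
    }
    return run_list;
}

// Stage 2: pick the jobs that actually run, rechecking the limit and
// stopping once all CPUs are in use.
std::vector<Job> enforce_run_list(const std::vector<Job>& run_list,
                                  std::size_t ncpus,
                                  const Limits& limits) {
    std::vector<Job> running;
    std::map<std::string, int> counts;
    for (const Job& j : run_list) {
        if (running.size() >= ncpus) break;
        if (!under_limit(limits, counts, j.app)) continue;
        ++counts[j.app];
        running.push_back(j);
    }
    return running;
}
```

With a queue of three jobs from an I/O-intensive app limited to 1, followed by one job from an unlimited app, and 2 CPUs, stage 1 keeps the first limited job plus the unlimited one, and stage 2 runs both.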