32 Commits

Author SHA1 Message Date
David Anderson 40f0cb44f4 Avoid starvation when max_concurrent is used, and related fixes.
Synopsis: max concurrent was being enforced in the last stage of CPU sched,
but not in earlier stages, or in work fetch.
This caused starvation in some cases.
Fix this by modeling max concurrent in RR sim and make_run_list().

- CPU sched: model and enforce max concurrent limits in building run list
    for CPU jobs; otherwise the list has jobs we can't actually run

- RR simulation: model and enforce max concurrent limits

- RR sim: fix bug in calculation of # idle instances

- RR sim: model unavailability of GPUs
    e.g. if we can't run GPU jobs we can potentially run more CPU jobs

- work fetch: if a project is at a max concurrent limit,
    don't fetch work from it.
    The jobs we get (possibly) wouldn't be runnable.
    NOTE: we currently provide max concurrent limits
    at both project and app level.
    The problem with app level is that apps can have versions that
    use different resources.
    It would be better to have limits at the resource level instead.

- In many cases (e.g. job completion) CPU sched and work fetch are both done
    back to back.  Each of them does RR simulation.
    Only need to do this once (efficiency).

- Show max concurrent settings in startup messages

- Make max runnable jobs (1000) into a #define

- Fix removal of "can't fetch work" notices

- Make "can't fetch work" notices resource-specific;
    the reasons may differ between resources

- Get rid of WF_DEBUG macro;
    just print everything if log_flags.work_fetch_debug is set.

- Change project- and resource-level work-fetch reason codes
    (DONT_FETCH_PREFS etc.) from #defines to enums,
    and give them prefixes RSC_REASON and PROJECT_REASON

- Fix bug where the return of compute_project_reason() wasn't
    actually being stored in project.work_fetch.

- Add work-fetch reason MAX_CONCURRENT (project is at max concurrent limit)
2018-12-28 12:55:05 -08:00
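
To illustrate the run-list change described in the commit above, here is a minimal sketch of enforcing max concurrent limits while the list is built, so the list never holds jobs that cannot run. The `Job` and `Limits` types and the `build_run_list()` function are hypothetical simplifications for illustration, not the client's actual `make_run_list()` or its data structures. A limit of 0 is assumed to mean "no limit".

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified job record; not the client's real job type.
struct Job {
    std::string project;
    std::string app;
};

// Hypothetical limits table; a missing entry or 0 means "no limit".
struct Limits {
    std::map<std::string, int> project_max;  // cap per project
    std::map<std::string, int> app_max;      // cap per app, keyed "project/app"
};

static int limit_for(const std::map<std::string, int>& m, const std::string& key) {
    auto it = m.find(key);
    return it == m.end() ? 0 : it->second;
}

// Walk candidates in priority order. Skip any job that would exceed a
// project-level or app-level cap, and stop once max_jobs are selected.
std::vector<Job> build_run_list(const std::vector<Job>& candidates,
                                const Limits& limits, std::size_t max_jobs) {
    assert(max_jobs > 0);  // an empty run list would be meaningless
    std::map<std::string, int> project_count, app_count;
    std::vector<Job> run_list;
    for (const Job& j : candidates) {
        if (run_list.size() >= max_jobs) break;
        const std::string app_key = j.project + "/" + j.app;
        const int pmax = limit_for(limits.project_max, j.project);
        const int amax = limit_for(limits.app_max, app_key);
        if (pmax > 0 && project_count[j.project] >= pmax) continue;
        if (amax > 0 && app_count[app_key] >= amax) continue;
        ++project_count[j.project];
        ++app_count[app_key];
        run_list.push_back(j);
    }
    return run_list;
}
```

Because capped jobs are skipped rather than ending the scan, lower-priority jobs from other projects or apps can still fill the remaining slots, which is the starvation case the commit message describes.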
David Anderson 4a7bb390af Client: fix job scheduling bug
- There was a scenario (#164 in fact) where CPUs were starved
because CPU jobs weren't being added to the run list.
The basic problem was that the max_concurrent stuff was being
called in make_run_list().
It doesn't belong there - only in enforce_run_list().

- add the ability to handle app_config.xml files in the client emulator.

- fix a performance bug that caused extremely long run lists;
in make_run_list(), check for exclusion at the project level, not global.

- do max_concurrent logic only if a max_concurrent rule was given.

- fix bug where the emulator would assign the wrong
version number to results, then fail to find their app version.
2018-12-21 00:54:00 -08:00
Jonathan Armstrong 892d1d97e4 security updates for potential buffer overflows 2018-08-29 08:49:54 -05:00
David Anderson da64baf29d Merge pull request #1895 from AenBleidd/PVS_V814_for_pr
Move 'strlen' function outside of the loop
2017-08-14 17:37:12 -07:00
David Anderson b272d1de81 client: fix typo 2017-05-12 16:09:31 -07:00
David Anderson 4a9cc3e725 client/lib: code shuffle preparatory to adding app_config GUI RPC 2017-05-11 01:53:50 -07:00
Vitalii Koshura ac291dd0d1 client: Move 'strlen' function outside of the loop
From PVS Studio:
V814
Decreased performance. The 'strlen' function was called multiple times inside the body of a loop.
https://www.viva64.com/en/w/V814/print

Signed-off-by: Vitalii Koshura <lestat.de.lionkur@gmail.com>
2017-05-02 16:10:08 +03:00
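
The V814 pattern in the commit above can be sketched as follows. This is a generic illustration, not the code the commit changed; `count_char()` is a hypothetical function chosen only to show the length being computed once before the loop instead of in the loop condition.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Count occurrences of c in s. strlen() is evaluated once, before the loop,
// rather than on every iteration of the loop condition.
std::size_t count_char(const char* s, char c) {
    assert(s != nullptr);
    const std::size_t len = std::strlen(s);  // hoisted out of the loop
    std::size_t n = 0;
    for (std::size_t i = 0; i < len; ++i) {
        if (s[i] == c) ++n;
    }
    return n;
}
```

Writing the condition as `i < strlen(s)` may rescan the string on every iteration when the compiler cannot prove the string is unchanged, which turns a linear loop into a quadratic one.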
David Anderson c3eb84db1e client: add report_results_immediately config on project and app levels
see http://boinc.berkeley.edu/wiki/Client_configuration#Application_configuration
2017-03-06 16:04:44 -08:00
Rom Walton dbf5a9b253 client: Clean up low-hanging fruit in strcpy and strcat use.
Use safe_strcpy and safe_strcat when dealing with non-pointer data types.
2016-02-16 00:18:05 -05:00
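
The restriction to non-pointer destinations presumably exists because a bounded copy needs the destination size, which is only known at compile time for arrays. The sketch below shows that idea with a hypothetical `bounded_copy()` helper; it is not BOINC's `safe_strcpy`, whose definition is not shown in this log.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper (not BOINC's safe_strcpy): copy src into a fixed-size
// char array, truncating if needed and always NUL-terminating.
// The template only accepts arrays, so the size N is known at compile time;
// passing a plain char* destination fails to compile.
template <std::size_t N>
void bounded_copy(char (&dst)[N], const char* src) {
    static_assert(N > 0, "destination must have room for the terminator");
    assert(src != nullptr);
    std::size_t n = std::strlen(src);
    if (n >= N) n = N - 1;  // leave one byte for the terminator
    std::memcpy(dst, src, n);
    dst[n] = '\0';
}
```

A `sizeof`-based macro would silently use the pointer size if given a `char*`, so rejecting pointers at compile time is one way to avoid that mistake.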
David Anderson eddebcc209 client: report error if no start tag in app_config.xml 2015-01-29 13:21:00 -08:00
David Anderson 2b035629bd client: always show unparsed tags in config files 2015-01-15 09:03:44 -08:00
David Anderson fd48fea054 client: message tweaks 2014-12-15 15:38:12 -08:00
David Anderson b97e1c86d1 client: report parse errors in app_config.xml correctly 2014-11-24 00:41:26 -08:00
David Anderson 0ae4b4ecff client: when reading app_config.xml, clear app versions vector
Otherwise old error notices persist
2014-11-12 00:47:16 -08:00
David Anderson a10fea0281 Client: improve error message for non-tag text in app_config.xml 2014-11-10 01:01:11 -08:00
David Anderson 1c9233a46f client: display XML in app_config notices correctly 2014-08-31 19:19:07 -07:00
David Anderson bfa0a81a7b client: if a project's app_config.xml has no errors, remove old notices 2014-08-31 13:32:12 -07:00
David Anderson 0eb346167a client: fix bug in last commit 2014-07-25 16:04:29 -07:00
David Anderson 34e97a5048 client: add <project_max_concurrent> option for app_config.xml
Lets you limit the number of running jobs over the whole project.
Note: this is not taken into account in work fetch.
2014-07-25 15:49:12 -07:00
David Anderson 6496da31a3 client: check for negative usage values in app_config.xml 2014-07-05 00:15:39 -07:00
David Anderson c8bde8cfd5 client: fix bug that caused app_config settings to persist incorrectly
We needed to clear the app_configs and app_version_configs vectors in PROJECT
if app_config.xml isn't there
2014-06-05 17:56:03 -07:00
David Anderson d877983771 client: let app_config.xml specify fraction_done_exact for apps 2014-05-04 10:39:29 -07:00
David Anderson c2a34cb938 client: parse <plan_class> in app_config.xml; fix error messages; show error message if <app_version> doesn't match any app versions 2013-11-22 00:04:00 -08:00
David Anderson c1bddf4252 client: allow <app_version> elements in app_config.xml, allowing users to override the parameters of particular app versions 2013-09-06 15:41:43 -07:00
David Anderson e401380d5c admin web: fix PHP errors in failure page 2013-08-16 22:21:12 -07:00
David Anderson 03e3b3b15b client: clear max_concurrent if app_config.xml no longer exists
If you had an app_config.xml that limited the # of concurrent tasks for an app,
and you delete it and do "reread config", then remove the limit.
2013-06-17 12:48:14 -07:00
David Anderson 5452d3998f client: show app_config warnings only on startup and reread config 2013-05-19 10:02:00 -07:00
David Anderson 6c2631ec6f client: make "missing app" messages more consistent 2013-05-16 12:40:43 -07:00
David Anderson 64d7fa3474 - client: more fixes to GUI RPC addition.
Also, replace get_project_dir() with a memoized member function of PROJECT
2013-04-18 13:57:33 -07:00
David Anderson 81d64892b6 - client: msg tweak 2013-04-18 00:36:03 -07:00
David Anderson a64cb793f1 - scheduler: attempted performance enhancement.
Old: each scheduler process holds a semaphore
        while scanning the shared-mem job array.
        On machines with many CPUs
        there seems to be contention for this semaphore,
        causing slow scheduler response and possibly connection failures.
    New: Don't hold the semaphore while scanning array.
        Instead, if we find a job that passes quick_check(),
        acquire the semaphore and recheck that the job is present in array
        and passes quick_check().
- client: show messages if app_config.xml has unrecognized tags
2013-03-04 17:16:56 +01:00
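
The "scan unlocked, then lock and recheck" approach described in the scheduler change above can be sketched as below. Several things here are assumptions for illustration: the `Slot` layout, the body of `quick_check()`, and the use of `std::mutex` in place of the scheduler's semaphore over a shared-memory array. The `present` flag is atomic so that the unlocked read is well-defined in a multithreaded C++ program.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical slot in a shared job array; not the scheduler's real layout.
struct Slot {
    std::atomic<bool> present{false};
    int job_id = 0;
};

// Hypothetical stand-in for quick_check(): a cheap test of whether the slot
// currently holds a usable job.
static bool quick_check(const Slot& s) { return s.present.load(); }

// Scan without holding the lock. Only when a slot looks usable do we take
// the lock and recheck, since another worker may have claimed it meanwhile.
// Returns the claimed job id, or -1 if nothing was available.
int claim_job(std::vector<Slot>& slots, std::mutex& m) {
    for (std::size_t i = 0; i < slots.size(); ++i) {
        if (!quick_check(slots[i])) continue;  // unlocked check, may be stale
        std::lock_guard<std::mutex> lock(m);
        if (!quick_check(slots[i])) continue;  // lost the race; keep scanning
        assert(slots[i].job_id >= 0);          // -1 is reserved for "none"
        slots[i].present = false;              // claim while holding the lock
        return slots[i].job_id;
    }
    return -1;
}
```

The lock is held only for the recheck and the claim, so workers that find nothing suitable never contend for it.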
David Anderson 952a495fb7 - client: add "client app configuration" feature; see
http://boinc.berkeley.edu/trac/wiki/ClientAppConfig
    This lets users do the following:
    1) limit the number of concurrent jobs of a given app
        (e.g. for WCG apps that are I/O-intensive)
    2) Specify the CPU and GPU usage parameters of GPU versions
        of a given app.
    Implementation notes:
    - max app concurrency is enforced in 2 places:
        1) when building the initial job run list
        2) when enforcing the final job run list
        Both are needed to avoid possible starvation.
    - however, we don't enforce it during RR simulation.
        Doing so could cause erroneous shortfall and work fetch.
        This means, however, that work buffering will not work
        as expected if you're using max concurrency.
2013-03-04 15:20:32 +01:00