Commit Graph

282 Commits

Author SHA1 Message Date
David Anderson b355a13f8d client: use snprintf() instead of sprintf() in a few places
... none of which was a possible overrun, but it doesn't hurt to check.
2017-08-15 17:06:29 -07:00
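A minimal illustration of the pattern behind the commit above: replacing sprintf() with snprintf() so the write is bounded by the buffer size. The call sites aren't shown in this log, so the names here (buf, msg) are illustrative only.

    #include <cstdio>

    void example(const char* msg) {
        char buf[256];
        // Before: sprintf(buf, "error: %s", msg);   // no bound on the write
        // After: output is truncated instead of overrunning buf.
        snprintf(buf, sizeof(buf), "error: %s", msg);
        puts(buf);
    }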
David Anderson 8e7857623e client: eliminate possible buffer overflow in reporting result errors
A result with a lot of failed uploads could overflow a 4K buffer.
Change report_result_error() so you just pass it the error message,
rather than va_args nonsense.
2017-08-15 16:31:33 -07:00
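A hedged sketch of the interface change described above: the caller passes a finished message instead of a printf-style format plus varargs, so no fixed 4K formatting buffer is needed. The struct and field names are assumptions, not the client's actual code.

    #include <cstdarg>
    #include <cstdio>
    #include <string>

    struct RESULT { std::string stderr_out; };   // assumed shape

    // Before (varargs): formatting into a fixed buffer inside the function;
    // a long error list could exceed it.
    void report_result_error_old(RESULT& res, const char* format, ...) {
        char buf[4096];
        va_list ap;
        va_start(ap, format);
        vsnprintf(buf, sizeof(buf), format, ap);
        va_end(ap);
        res.stderr_out += buf;
    }

    // After: the caller builds the message; the function just records it.
    void report_result_error(RESULT& res, const char* err_msg) {
        res.stderr_out += err_msg;
    }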
Vitalii Koshura e33b885e92 client: Remove unused variables
From PVS Studio:
V808
'preemptable_tasks' object of 'vector' type was created but was not utilized.
https://www.viva64.com/en/w/V808/print

Signed-off-by: Vitalii Koshura <lestat.de.lionkur@gmail.com>
2017-04-30 08:57:00 +03:00
David Anderson bde961e8bb client: fix estimate of job RAM usage
In estimating the WSS of a job, we were using
- the observed WSS of the job itself
- if not available, the max WSS of current jobs of the same app version

However, if neither is available we need a backup, namely WU.rsc_memory_bound.
Otherwise we can schedule jobs that exceed available RAM.
2017-03-06 16:12:25 -08:00
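A minimal sketch of the fallback order described above. Apart from rsc_memory_bound, the names are assumptions rather than the client's actual fields.

    struct WORKUNIT { double rsc_memory_bound; };

    double estimated_wss(
        double observed_wss,         // measured WSS of this job, 0 if unknown
        double app_version_max_wss,  // max WSS of current jobs of the same app version, 0 if unknown
        const WORKUNIT& wu
    ) {
        if (observed_wss > 0) return observed_wss;
        if (app_version_max_wss > 0) return app_version_max_wss;
        return wu.rsc_memory_bound;  // backup, so we don't schedule jobs past available RAM
    }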
David Anderson 958c89c1e7 client: account per-project CPU and GPU usage; report to account managers
Also report per-project #jobs success/failure
2016-12-27 23:48:37 -08:00
David Anderson 8c44b2f165 client: fix bug that could cause idle CPUs/GPUs.
Review: job scheduling has 2 phases:
1) Make a list of jobs to run.  Add enough jobs to use all resources.
2) Actually run the jobs and preempt existing jobs.

The problem: checking for RAM usage limits
(i.e. making sure the sum of working sets is < RAM usage prefs)
is done in 2) but not 1).
So on a 1 CPU machine we might make a run list consisting of a single job,
which turns out not to fit in available RAM,
and we end up running nothing.

Solution: when we add a job to the run list that previously
exceeded RAM limits, don't count its resource usage.
That way we'll add more jobs to the run list,
and we'll have something to run in the end.
2016-08-15 11:48:39 -07:00
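A sketch of the run-list fix above, assuming a flag (exceeded_ram_last_time) recorded during the previous enforcement pass; all names are illustrative, not the client's actual structures.

    #include <vector>

    struct JOB {
        double avg_ncpus;
        bool exceeded_ram_last_time;
    };

    void make_run_list(std::vector<JOB*>& runnable, std::vector<JOB*>& run_list, int ncpus) {
        double cpus_used = 0;
        for (JOB* j : runnable) {
            if (cpus_used >= ncpus) break;
            run_list.push_back(j);
            // Don't count the resources of a job that failed the RAM check
            // last time; otherwise it could be the only entry in the list
            // and nothing would end up running.
            if (!j->exceeded_ram_last_time) {
                cpus_used += j->avg_ncpus;
            }
        }
    }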
David Anderson e658092255 Add ops script for creating account and team
This is for my own use in BOINC-wide teams.
It must work even if account/team creation are disabled
(as they are in the BOINC-wide teams site).
To do this, I moved the <disable_team_creation> check out of make_team()
and into the existing places that call make_team().
The logic now matches that of make_user().
2016-02-10 14:51:34 -08:00
David Anderson 1431a21d99 client: if a GPU exclusion refers to non-existent device num, ignore it 2015-08-16 01:06:28 -07:00
David Anderson 0f2adb5ab7 client: change cpu_sched_debug log messages to show job's GPU type
Also fix some compile warnings
2015-08-15 10:32:02 -07:00
David Anderson fb3dd9b36f client: fix job scheduling bug that starves CPU instances 2015-08-03 13:19:02 -07:00
David Anderson 8c7aef5b99 client: fix bug when app version uses > 1 GPU instance
Note: the code wasn't written with multi-GPU apps in mind.
There may be other bugs with multi-GPU apps.
2015-06-08 18:55:17 -07:00
David Anderson ee4afb02ea git seems to think cpu_sched.cpp was modified 2014-12-12 14:05:32 -08:00
David Anderson b80ea2aa04 client: indicate "high-priority" tasks in event log (if cpu_sched_debug set) 2014-11-19 23:49:51 -08:00
David Anderson 2b2b04188a client: "suspend GPUs" shouldn't suspend non-GPU coprocessors
The following should apply to GPUs but not other coprocs (e.g. miner ASICs):
- "suspend GPUs" command in GUI
- prefs for suspending GPUs
- always removing app from memory when suspended
2014-11-07 00:57:39 -08:00
David Anderson 7ed946cc37 client: message tweaks 2014-10-13 09:08:54 -07:00
David Anderson 1092fd1b31 client: let a MT job run even if it uses more than max # CPUs
Suppose the user fetches an 8-CPU job, then changes their prefs to use 6 CPUs.
Let the job run anyway.
2014-10-10 23:51:57 -07:00
David Anderson 4c9d1d6659 client: code cleanup and possible debugging in work fetch
- Remove code that tries to keep track of available GPU RAM
  and defer jobs that don't fit.
  This never worked: it relied on project estimates of RAM usage,
  and it's been replaced by having the app do a temporary exit
  if allocation fails.
- Move the logic for checking for deferred jobs from CPU scheduling
  to work fetch.
- Rename rsc_defer_sched to has_deferred_job,
  and move it from PROJECT to RSC_PROJECT_WORK_FETCH
- tweak work_fetch_debug output
2014-10-10 14:35:00 -07:00
David Anderson 119962bc0f client: minor code shuffle 2014-07-29 11:14:10 -07:00
David Anderson a177ef0068 client: fix job scheduling bug. Sort by avg_ncpus doesn't apply to GPU jobs 2014-07-03 21:53:38 -07:00
David Anderson f15f6d2ba0 API/client/vboxwrapper: show notice if need Vbox upgrade
Vboxwrapper detects known buggy versions of Vbox and calls
boinc_temporary_exit().
The "Incompatible version" message appears in the task status
in the BOINC Manager, where some users may never see it.
It needs to appear as a notice, telling the user to upgrade VBox.

To do this, I added an optional argument to boinc_temporary_exit()
saying that the message should be delivered as a notice.
This is conveyed to the client by adding
a line containing "notice" to the temp exit file.
I changed the client and vboxwrapper to use this.
2014-05-28 11:05:56 -07:00
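A hedged sketch of the temp-exit-file mechanism described above. The real change lives in boinc_temporary_exit() in the BOINC API; the function, file name, and layout below are assumptions based on the commit message.

    #include <cstdio>

    // Hypothetical helper: write the temp exit file, optionally adding the
    // "notice" line that tells the client to surface the message as a notice
    // (e.g. "please upgrade VirtualBox") rather than only in the task status.
    void write_temp_exit_file(int delay_secs, const char* reason, bool is_notice) {
        FILE* f = fopen("boinc_temporary_exit", "w");   // assumed file name
        if (!f) return;
        fprintf(f, "%d\n%s\n", delay_secs, reason);
        if (is_notice) {
            fprintf(f, "notice\n");
        }
        fclose(f);
    }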
David Anderson 8d009ce3b3 client: scheduling and work fetch tweaks for GPU exclusion cases
Scheduling: if a resource has exclusions, put all jobs in the run list;
otherwise we might fail to have a job for a GPU instance, and starve it.

Work fetch: allow work fetch from zero-share projects if the resource
has instances that are idle because of GPU exclusion
2014-05-24 15:18:41 -07:00
David Anderson 1e2fcb4b68 client/lib: change CONFIG to CC_CONFIG, config to cc_config.
Eliminates the ambiguity of the "config" global var, a name that is also used in server code;
this confuses IDEs that are looking at all the code at once.
2014-05-08 00:51:18 -07:00
David Anderson 72d1369342 client: code shuffle; move GPU scheduling code to new file 2014-05-01 23:53:55 -07:00
David Anderson 6a8eab73cd replace tab characters with spaces 2014-05-01 21:03:49 -07:00
Rom Walton afb6dcc6f3 MGR & Client: Massive code clean-up. Remove as much of the LoadLibrary/GetProcAddress stuff as we can under VS 2012. 2014-03-06 18:27:54 -05:00
David Anderson 17e44af601 Client: fix job scheduling bug that could starve CPUs
Job scheduling has 2 phases:
    make_run_list(): build a sorted list of runnable jobs
    enforce_run_list(): go through the list and run jobs
The run list in general contains more jobs than can actually be run.
This is intentional.
There are lots of reasons why enforce_run_list() might not be able
to run a particular job, and we don't know these during make_run_list().
So we need to give enforce_run_list() a surplus of choices.

The problem: make_run_list() was accounting RAM usage of jobs in the list,
and stopping when this exceeded physical RAM.
This led to a situation where we added a bunch of GPU jobs to the list -
more than could actually be run -
and this caused too few CPU jobs to be put in the list.

Oddly, the comment at the start of cpu_sched.cpp said that RAM usage
was ignored by make_run_list(); this was not the case.

Anyway, I removed RAM accounting from make_run_list().
2014-02-11 12:33:13 -08:00
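Illustrative only: the shape of the change in make_run_list() described above, with assumed names. RAM limits are now applied only in enforce_run_list().

    #include <vector>

    struct JOB { double wss; };

    void make_run_list(std::vector<JOB*>& runnable, std::vector<JOB*>& run_list) {
        // double ram_used = 0;                              // removed
        for (JOB* j : runnable) {
            // if (ram_used + j->wss > available_ram) break; // removed: stopping
            // here let GPU jobs crowd CPU jobs out of the list entirely
            run_list.push_back(j);
            // ram_used += j->wss;                           // removed
        }
        // enforce_run_list() still applies the actual RAM limit when it
        // decides which of these jobs to run.
    }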
David Anderson 38e83a3cd7 Client: don't use sub-second CPU throttling
I forgot that the wrapper has a 1-second poll for suspend and resume,
so sub-second throttling won't work properly for wrapper apps.
Revert to a variant of the old scheme,
in which the min of the suspended and resumed periods is 1 sec.

Also, fix task start/suspend/resume log messages.
2014-01-22 17:26:26 -08:00
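A sketch of the "min period is 1 second" throttling scheme described above, assuming the CPU usage limit is given as a percentage; this is not the client's actual code.

    #include <algorithm>

    // Compute run/suspend durations (seconds) whose ratio matches the limit,
    // with the shorter of the two pinned at 1 second, so wrapper apps that
    // poll suspend/resume once per second still see the transitions.
    void throttle_periods(double cpu_limit_pct, double& run_sec, double& suspend_sec) {
        double p = std::min(std::max(cpu_limit_pct, 1.0), 99.0);
        if (p >= 50) {
            suspend_sec = 1;
            run_sec = p / (100 - p);       // e.g. 75% -> run 3s, suspend 1s
        } else {
            run_sec = 1;
            suspend_sec = (100 - p) / p;   // e.g. 25% -> run 1s, suspend 3s
        }
    }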
Charlie Fenton dea8c1cee4 Client: Fix compiler warning (unused static function) 2014-01-13 17:10:57 -08:00
David Anderson 24c6cf99c3 Merge branch 'master' of ssh://boinc.berkeley.edu/boinc-v2 2014-01-10 10:38:55 -08:00
David Anderson 04c81c9d5d Client: fix problems related to CPU throttling
- Don't throttle GPU apps.  GPU apps spend all their time in a
  critical section, during which they can't be suspended.
  The length of these critical sections (i.e. of GPU kernels)
  may be a significant part of a second, or more,
  so sub-second throttling isn't possible.
- Account elapsed time correctly when throttling is used
- Also (not related to throttling)
  don't schedule a job in QUIT_PENDING or ABORT_PENDING state.
  Doing so results in 2 processes in the slot dir,
  and can cause the job to fail.
2014-01-10 10:38:31 -08:00
David Anderson 20ff585a94 client: job scheduler tweaks to avoid idle CPUs
- allow overcommitment by > 1 CPU.
  E.g. if there are two 6-CPU jobs on an 8-CPU machine, run them both.
- Prefer MT jobs to ST jobs in general.
  When reordering the run list (i.e. converting the "preliminary" list to the "final" list),
  prefer job J1 to J2 if:
  1) J1 is EDF and J2 isn't
  2) J1 uses GPUs and J2 doesn't
  3) J1 is in the middle of a timeslice and J2 isn't
  4) J1 uses more CPUs than J2
  5) J1's project has higher scheduling priority than J2's
  ... in that order.

  4) is new; it replaces the function promote_multi_thread_jobs(),
  which did something similar but didn't work in some cases.
2014-01-09 12:07:55 -08:00
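A sketch of the ordering described above, written as a strict-weak-ordering comparator; the field names are assumptions, not the client's actual structures.

    struct JOB {
        bool edf;                    // in deadline trouble (earliest-deadline-first)
        bool uses_gpu;
        bool unfinished_timeslice;   // in the middle of a time slice
        double avg_ncpus;
        double project_priority;
    };

    // Returns true if a should run before b.
    bool run_before(const JOB& a, const JOB& b) {
        if (a.edf != b.edf) return a.edf;                                  // 1)
        if (a.uses_gpu != b.uses_gpu) return a.uses_gpu;                   // 2)
        if (a.unfinished_timeslice != b.unfinished_timeslice)
            return a.unfinished_timeslice;                                 // 3)
        if (a.avg_ncpus != b.avg_ncpus) return a.avg_ncpus > b.avg_ncpus;  // 4) new
        return a.project_priority > b.project_priority;                    // 5)
    }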
David Anderson 6394a37dc6 Merge branch 'master' of ssh://boinc.berkeley.edu/boinc-v2 2013-12-11 00:35:49 -08:00
David Anderson 86a0dc0850 client: message tweak 2013-12-11 00:35:30 -08:00
David Anderson e929b58ac8 client: fix bug that caused idle GPUs when CPU throttling used 2013-12-11 00:14:47 -08:00
David Anderson d6da81b862 client: fix bugs with CPU throttling and GPU apps
Various bad things could happen when CPU throttling was used together w/ GPU apps.
Examples:
- on a multi-GPU system, several GPU tasks are assigned to the same GPU
- a suspended GPU task remains in memory (tying up its GPU resources)
while other tasks try to use the GPU.

The problem was that parts of the code assumed that suspended
GPU processes don't exist - i.e. that when a GPU task is suspended
it's always removed from memory.
This isn't true in the presence of CPU throttling.

So I made the following changes:
- When assigning GPUs to tasks, treat suspended tasks like running tasks
  (i.e. reserve their GPUs)
- At the end of the CPU-scheduling logic, if there are any GPU tasks
  that are suspended and not scheduled, remove them from memory,
  and trigger a reschedule so we can reallocate their GPUs.

Also, a cosmetic change: in the resource usage string shown in the GUI,
include "(device X)" even if the task is suspended (i.e. because of throttling).

Also: zero out COPROC::opencl_device_indexes[] so we don't write
a garbage number to init_data.xml for non-OpenCL jobs
2013-11-29 11:44:09 -08:00
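A sketch of the two scheduling changes described above, with assumed names and a simplified stand-in for "remove from memory".

    #include <vector>

    struct TASK {
        bool uses_gpu;
        bool suspended;
        bool scheduled;    // chosen to run in this scheduling pass
        int  device_num;   // GPU instance assigned, -1 if none
    };

    // 1) When assigning GPUs, treat suspended tasks like running ones:
    //    their device stays reserved.
    bool gpu_free(const std::vector<TASK>& tasks, int device) {
        for (const TASK& t : tasks) {
            if (t.uses_gpu && t.device_num == device && (t.scheduled || t.suspended)) {
                return false;
            }
        }
        return true;
    }

    // 2) At the end of scheduling, evict suspended-but-unscheduled GPU tasks
    //    and request a reschedule so their GPUs can be reassigned.
    bool evict_idle_gpu_tasks(std::vector<TASK>& tasks) {
        bool need_reschedule = false;
        for (TASK& t : tasks) {
            if (t.uses_gpu && t.suspended && !t.scheduled) {
                t.device_num = -1;   // stands in for removing the task from memory
                need_reschedule = true;
            }
        }
        return need_reschedule;
    }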
David Anderson 3d910a0190 client: message tweak 2013-11-13 21:24:16 -08:00
David Anderson 45dfb684a6 Client: don't allow more than 1000 slot dirs.
There was a report of a situation where the client created unbounded slot dirs.
Not sure why this happened, but may as well impose a limit.
2013-10-23 21:37:24 -07:00
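An illustrative sketch of a hard cap on slot directories, as described above; the names and structure are assumptions.

    #include <cstdio>
    #include <vector>

    const int MAX_SLOT_DIRS = 1000;

    // Return a slot number to use, or -1 if the cap is reached.
    int assign_slot(const std::vector<bool>& slot_in_use) {
        for (size_t i = 0; i < slot_in_use.size(); i++) {
            if (!slot_in_use[i]) return (int)i;         // reuse an existing slot dir
        }
        if ((int)slot_in_use.size() >= MAX_SLOT_DIRS) {
            fprintf(stderr, "refusing to create more than %d slot dirs\n", MAX_SLOT_DIRS);
            return -1;                                   // cap reached
        }
        return (int)slot_in_use.size();                  // create a new slot dir
    }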
David Anderson 39af029598 client: mostly revert dddf586, which could lead to way overcommitted CPU 2013-07-03 00:56:01 -07:00
David Anderson dddf586532 client: remove code that avoids overcommitting CPUs if MT jobs present.
This can lead to starving the CPUs if there are both GPU and MT jobs.
The basic problem is that a host with GPUs will never have all its CPUs
available for MT jobs.
It should probably advertise fewer CPUs, or something.
2013-06-17 08:48:05 -07:00
David Anderson 4323afee1f client: task schedule tweak to avoid starvation case
In enforce_run_list(), don't count the RAM usage of NCI tasks.
NCI tasks run sporadically, so it doesn't make sense to count their RAM usage;
doing so can starve regular jobs in some cases.
2013-05-09 15:24:44 -07:00
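A sketch of the RAM-accounting exemption for NCI (non-CPU-intensive) tasks described above; field names are assumptions.

    #include <vector>

    struct TASK { bool non_cpu_intensive; double wss; };

    double ram_committed(const std::vector<TASK>& scheduled) {
        double sum = 0;
        for (const TASK& t : scheduled) {
            if (t.non_cpu_intensive) continue;   // NCI tasks run sporadically; don't count them
            sum += t.wss;
        }
        return sum;
    }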
David Anderson 6b6c2ac519 - client: fix bug that could cause idle GPUs when exclusions are present.
The basic problem: the way we assign GPU instances when creating
        the "run list" is slightly different from the way we assign them
        when we actually run the jobs;
        the latter assigns a running job to the instance it's using,
        but the former doesn't.
    Solution (kludge): when building the run list,
        don't reserve instances for currently running jobs.
        This will result in more jobs in the run list, and avoid starvation.
        For efficiency, do this only if there are exclusions for this type.
    Comment: this is yet another complexity that would be eliminated
        if GPU instances were modeled separately.
        I wish I had time to do that.
- client emulator: change default latency bound from 1 day to 10 days
2013-04-07 13:00:15 -07:00
David Anderson 1b9ad86694 - client: don't prefix <task> messages with [task] 2013-04-02 12:31:32 -07:00
David Anderson b93e80c6f5 - client: code cleanup. Some variable/function/constant names
contained "debt" when they actually refer to REC.
    Change these names to use "rec".
2013-03-24 11:22:01 -07:00
David Anderson 702798b84b - client: a couple more clock-change fixes 2013-03-22 10:28:20 +01:00
David Anderson 3c029c7613 - client: job scheduler tweak to avoid CPU idleness in situation
where GPU jobs use different CPU fractions
- single-job submission: default platform is that of server
2013-03-05 15:57:34 +01:00
Rom Walton 2dd82881de - client/server: fix build breaks I introduced last night with a variable
rename.
2013-03-04 15:30:03 +01:00
Charlie Fenton ce87ec9848 OpenCL: First pass at adding support for Intel Ivy Bridge GPUs 2013-03-04 15:23:39 +01:00
David Anderson 952a495fb7 - client: add "client app configuration" feature; see
http://boinc.berkeley.edu/trac/wiki/ClientAppConfig
    This lets users do the following:
    1) limit the number of concurrent jobs of a given app
        (e.g. for WCG apps that are I/O-intensive)
    2) Specify the CPU and GPU usage parameters of GPU versions
        of a given app.
    Implementation notes:
    - max app concurrency is enforced in 2 places:
        1) when building the initial job run list
        2) when enforcing the final job run list
        Both are needed to avoid possible starvation.
    - however, we don't enforce it during RR simulation.
        Doing so could cause erroneous shortfall and work fetch.
        This means, however, that work buffering will not work
        as expected if you're using max concurrency.
2013-03-04 15:20:32 +01:00
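A sketch of the max-concurrency check described above, applied both when building and when enforcing the run list; names are assumptions, and the feature itself is configured via app_config.xml (see the wiki link).

    #include <map>
    #include <string>

    struct APP_CONFIG_ENTRY { int max_concurrent; };   // 0 means no limit

    bool max_concurrent_exceeded(
        const std::map<std::string, APP_CONFIG_ENTRY>& app_configs,
        const std::map<std::string, int>& running_per_app,
        const std::string& app_name
    ) {
        auto cfg = app_configs.find(app_name);
        if (cfg == app_configs.end() || cfg->second.max_concurrent == 0) return false;
        auto run = running_per_app.find(app_name);
        int nrunning = (run == running_per_app.end()) ? 0 : run->second;
        return nrunning >= cfg->second.max_concurrent;
    }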
Charlie Fenton 687c8e1a5d Mac: fix build break.
svn path=/trunk/boinc/; revision=25842
2012-07-03 07:31:06 +00:00
David Anderson 430f6a0813 - client: in the job scheduler, there's a check to prevent
overcommitting the CPUs if an MT job is scheduled.
    Skip this check for GPU jobs.


svn path=/trunk/boinc/; revision=25835
2012-07-02 17:58:33 +00:00