A job is assigned a max runtime as:
max_elapsed_time = rp->wup->rsc_fpops_bound / rp->avp->flops
The purpose is to eventually abort jobs that are in an infinite loop.
Various problems (e.g. bad GPU peak FLOPS calculations)
can cause this limit to be too small, e.g. one second,
in which case the job is aborted almost immediately.
In this change, if the calculated limit is < 2 minutes,
it's assumed to be in error; a limit of 30 minutes is used instead,
and an event log message is written.
Of course the underlying problem still must be addressed.
But this change will, in some cases, prevent a situation where
thousands of jobs are dispatched and immediately aborted.
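A minimal sketch of the check, assuming it runs where the limit is computed (the helper and constant names are illustrative, not the actual client code):

    // Hypothetical helper; in the client this logic sits where
    // max_elapsed_time is derived from the workunit and app version.
    double sane_elapsed_time_limit(double rsc_fpops_bound, double flops) {
        const double MIN_LIMIT = 120;       // below 2 minutes: assume error
        const double FALLBACK_LIMIT = 1800; // use 30 minutes instead
        double limit = rsc_fpops_bound / flops;
        if (limit < MIN_LIMIT) {
            // also write an event log message here, e.g. via msg_printf()
            return FALLBACK_LIMIT;
        }
        return limit;
    }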
From PVS Studio:
V814
Decreased performance. The 'strlen' function was called multiple times inside the body of a loop.
https://www.viva64.com/en/w/V814/print
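The pattern this warning flags, in illustrative form (s and process() are placeholders, not the actual BOINC code):

    #include <cstring>

    // Before: strlen(s) is re-evaluated on every iteration, O(n^2) overall.
    for (size_t i = 0; i < strlen(s); i++) {
        process(s[i]);
    }

    // After: hoist the length out of the loop.
    size_t n = strlen(s);
    for (size_t i = 0; i < n; i++) {
        process(s[i]);
    }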
Signed-off-by: Vitalii Koshura <lestat.de.lionkur@gmail.com>
From PVS Studio:
V767
Suspicious access to element of 'msgs' array by a constant index inside a loop.
https://www.viva64.com/en/w/V767/print/
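An illustrative form of the V767 pattern (not the actual code; msgs and total are placeholders):

    // Bug: the loop index is never used, so every iteration
    // reads the same element.
    for (int i = 0; i < n; i++) {
        total += msgs[0].size;   // should be msgs[i].size
    }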
Signed-off-by: Vitalii Koshura <lestat.de.lionkur@gmail.com>
At some point we decided that OS reporting of mem usage for VM apps
was wrong, and we use wu.rsc_memory_bound instead.
Fix: use this only for running VM apps; for non-running, use zero.
Also, in the mem usage printout (mem_usage_debug), show whether the job is running.
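A sketch of the fixed rule (only rsc_memory_bound is from this message; the accessor names are assumptions):

    // Mem usage attributed to a VM app: trust the workunit's declared
    // bound only while the VM is actually running; a non-running VM
    // holds essentially no RAM.
    double vm_app_mem_usage(ACTIVE_TASK& at) {
        return at.is_running() ? at.wup->rsc_memory_bound : 0;
    }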
There was a 20-30 second delay between exclusive app exit
and resuming tasks. This was excessive.
Reduce it to 5-15 seconds (the range is because we
check for exclusive apps every 10 seconds).
Previously, the dir_scan() function didn't distinguish between
- reaching the end of the directory
- errors
It just returned nonzero in either case.
This meant that the function that cleans out a slot dir
(client_clean_out_dir())
could return success even though the directory was nonempty.
That may explain the recently reported problem
where a slot dir contains a VM image from a previous job.
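One standard way to make the distinction, shown as a sketch (the real dir_scan() signature and return conventions may differ):

    #include <cerrno>
    #include <dirent.h>
    #include <string>

    // readdir() returns NULL both at end-of-directory and on error;
    // errno tells the two cases apart.
    int scan_next(DIR* dirp, std::string& name) {
        errno = 0;
        struct dirent* ent = readdir(dirp);
        if (!ent) {
            return errno ? -1 : 1;  // -1: real error; 1: end of directory
        }
        name = ent->d_name;
        return 0;                   // got an entry
    }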
The following should apply to GPUs but not other coprocs (e.g. miner ASICs):
- "suspend GPUs" command in GUI
- prefs for suspending GPUs
- always removing app from memory when suspended
There was at least one case where we weren't cleaning up
subsidiary processes (e.g. VMs) when a task's main process exited.
Fix this by consolidating task cleanup (shared mem and subsidiary processes)
in ACTIVE_TASK::cleanup_task().
This gets called when a task's main process exits.
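Rough shape of the consolidation (cleanup_task() is from this message; the calls inside it are hypothetical placeholders):

    // Called whenever the task's main process is seen to have exited,
    // regardless of how it exited.
    void ACTIVE_TASK::cleanup_task() {
        detach_shared_mem();          // release the shared-memory segment
        kill_subsidiary_processes();  // e.g. VMs launched by the wrapper
    }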
This addresses a problem w/ Bitcoin Utopia,
whose coprocessor app (run via the wrapper) doesn't expect a --device arg,
and fails if it gets one.
The --device mechanism has been superseded by APP_INIT_DATA.gpu_device_num.
GPU apps built with the current API version or later should not expect a --device arg.
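How a GPU app gets its device number now (boinc_get_init_data() and APP_INIT_DATA.gpu_device_num are part of the BOINC API; the wrapper function is illustrative):

    #include "boinc_api.h"

    int choose_gpu_device() {
        APP_INIT_DATA aid;
        boinc_get_init_data(aid);
        return aid.gpu_device_num;  // set by the client; no --device parsing
    }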
If a job reports its network usage (via boinc_network_usage()),
keep track of this across episodes of the job, and report it to the server
(some projects may want to give credit for network usage).
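For reference, the app-side reporting call (boinc_network_usage() is the actual API function; the numbers are placeholders):

    #include "boinc_api.h"

    // Report cumulative network traffic in bytes; the client now
    // accumulates these totals across episodes of the job.
    double bytes_sent = 1.2e7, bytes_received = 3.4e8;  // placeholder values
    boinc_network_usage(bytes_sent, bytes_received);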
Also fixed a bug where, if a job was aborted while not running,
its final CPU time and elapsed time weren't copied from ACTIVE_TASK to RESULT,
and hence weren't sent to the scheduler.
I forgot that the wrapper has a 1-second poll for suspend and resume,
so sub-second throttling won't work properly for wrapper apps.
Revert to a variant of the old scheme,
in which the shorter of the suspend and run periods is 1 second.
Also, fix task start/suspend/resume log messages.
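A sketch of the variant scheme, assuming cpu_usage_limit is a percentage (all names here are illustrative):

    // Compute run/suspend periods whose duty cycle matches the limit,
    // pinning the shorter period to 1 second so wrapper apps, which
    // poll once per second, see every transition.
    void throttle_periods(double cpu_usage_limit, double& on_sec, double& off_sec) {
        double frac = cpu_usage_limit / 100;
        if (frac >= 1) { on_sec = 1; off_sec = 0; return; }  // no throttling
        if (frac <= 0) { on_sec = 0; off_sec = 1; return; }  // fully suspended
        if (frac >= 0.5) {
            off_sec = 1;
            on_sec = frac / (1 - frac);   // e.g. 75% -> run 3 s, suspend 1 s
        } else {
            on_sec = 1;
            off_sec = (1 - frac) / frac;  // e.g. 25% -> run 1 s, suspend 3 s
        }
    }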
Suspended tasks can be either left in memory (LIM) or removed
from memory (RFM).
CPU throttling always uses LIM.
Other types of suspension (e.g. user request) use LIM or RFM
depending on user prefs, except that RFM is always used for GPU tasks.
There was a bug: if tasks were suspended because of CPU throttling,
and then the user suspended activity,
GPU apps would remain LIM.
They need to be RFM.
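The intended rule, as a sketch (leave_apps_in_memory is the actual pref name; the other names are placeholders):

    // Decide whether a suspended task should be removed from memory.
    bool remove_from_memory(bool is_cpu_throttle, bool uses_gpu,
                            bool leave_apps_in_memory) {
        if (is_cpu_throttle) return false;  // throttling: always LIM
        if (uses_gpu) return true;          // GPU tasks: always RFM
        return !leave_apps_in_memory;       // otherwise follow user pref
    }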
On Windows, the working-set size reported by the OS for VM apps is too low.
Apparently the RAM usage is in fact roughly the VM size.
This can lead to the client running multiple VM apps
that together use more RAM than is available, causing performance problems.
Solution: use workunit.rsc_memory_bound as the working set size for VM apps.
(Note: for now, a VM app is one where the plan class includes "vbox").
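A sketch of the override, including the stated plan-class test (the surrounding names are assumptions):

    #include <cstring>

    // Treat a job as a VM app if its plan class contains "vbox",
    // and report its working set as the workunit's memory bound.
    bool is_vm_app(const char* plan_class) {
        return strstr(plan_class, "vbox") != NULL;
    }

    double working_set_size(ACTIVE_TASK& at) {
        if (is_vm_app(at.result->avp->plan_class)) {
            return at.wup->rsc_memory_bound;
        }
        return at.procinfo.working_set_size;  // OS-reported, non-VM apps
    }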