A result with a lot of failed uploads could overflow a 4K buffer.
Change report_result_error() so the caller passes it the complete error message
as a single string, rather than a printf-style format and varargs.
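A minimal sketch of the kind of change described, assuming a printf-style
varargs interface replaced by one that takes a prebuilt string; the _old/_new
names and the bodies are illustrative, not the actual client code:

    #include <cstdarg>
    #include <cstdio>
    #include <string>

    // Old shape: printf-style args formatted into a fixed 4K buffer; a result
    // with many failed uploads could produce a message longer than that.
    void report_result_error_old(const char* format, ...) {
        char buf[4096];
        va_list ap;
        va_start(ap, format);
        vsnprintf(buf, sizeof(buf), format, ap);  // truncates here (or overflows with vsprintf)
        va_end(ap);
        // ... append buf to the result's error info ...
    }

    // New shape: the caller builds the full message (e.g. in a std::string),
    // so there is no fixed-size intermediate buffer.
    void report_result_error_new(const std::string& err_msg) {
        // ... append err_msg to the result's error info ...
        (void) err_msg;
    }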
A BOINC RasPi forum entry led me to revisit chrt and the scheduler policies and
priorities. The tasks are executed (if !highpriority) with the SCHED_BATCH policy.
This also reminded me that SCHED_IDLE is what, in the very early UNIX days, was
approximated by renicing to the highest value; today, though, SCHED_IDLE gets even
less priority than SCHED_BATCH at nice level 19. I ran "chrt -i ..." manually and
it seems just fine: graphics still work, and communication with the BOINC servers
is performed at a higher priority, so I don't think we should expect any problems.
So I suggest making SCHED_IDLE the default for compute jobs.
Best,
Steffen
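(For reference, a minimal sketch, illustration only and not proposed client code,
of putting a process into SCHED_IDLE on Linux, i.e. what "chrt -i 0 <command>" does:)

    #define _GNU_SOURCE          // SCHED_IDLE / SCHED_BATCH need this on glibc
    #include <sched.h>
    #include <cstdio>

    // Switch the calling process to the SCHED_IDLE policy.
    static int set_idle_policy() {
        struct sched_param sp;
        sp.sched_priority = 0;   // must be 0 for SCHED_IDLE and SCHED_BATCH
        if (sched_setscheduler(0, SCHED_IDLE, &sp)) {   // pid 0 = calling process
            perror("sched_setscheduler(SCHED_IDLE)");
            return -1;
        }
        return 0;
    }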
If a job has an output file with <copy_file> and <optional>,
and it doesn't create the file,
then the call to boinc_rename() (to move it to the project dir) fails,
and we back off and retry.
Solution: in boinc_rename(), if the rename fails,
check if the file exists, and if it doesn't then don't retry.
Also:
- when writing client messages, use the actual current time
(dtime()) rather than client_state.now.
- write log msgs when output file renames fail
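A sketch of the behavior described above (simplified; not the literal
boinc_rename() code): retry a failed rename a few times, but if the source file
doesn't exist at all, e.g. an <optional> output file the app never created,
fail immediately instead of backing off.

    #include <cstdio>
    #include <sys/stat.h>
    #ifdef _WIN32
    #include <windows.h>
    #define sleep_sec(s) Sleep((s)*1000)
    #else
    #include <unistd.h>
    #define sleep_sec(s) sleep(s)
    #endif

    int rename_with_retry(const char* oldf, const char* newf) {
        for (int i = 0; i < 5; i++) {
            if (!rename(oldf, newf)) return 0;
            struct stat sbuf;
            if (stat(oldf, &sbuf)) {
                return -2;      // source file missing: nothing to retry
            }
            sleep_sec(1);       // transient failure: wait and try again
        }
        return -1;              // gave up after retries
    }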
This addresses a problem w/ Bitcoin Utopia,
whose coprocessor app (run via the wrapper) doesn't expect a --device arg,
and fails if it gets one.
The --device mechanism has been superseded by APP_INIT_DATA.gpu_device_num.
GPU apps built with the current API and later should not expect a --device arg.
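For example, roughly how a GPU app gets its device under the new mechanism
(a minimal sketch; the "-1 means no device assigned" convention is an assumption here):

    #include "boinc_api.h"

    // Read the assigned device number from the app init data instead of
    // parsing a --device command-line argument.
    int get_assigned_gpu_device() {
        APP_INIT_DATA aid;
        boinc_get_init_data(aid);
        return aid.gpu_device_num;
    }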
For now, handle AMD/ATI, NVIDIA, or Intel GPUs as before. But for GPUs from other
("new") vendors, treat each device as a separate resource: create an entry for each
instance in the COPROCS::coprocs[] array, and copy the device name
COPROC::opencl_prop.name (instead of the vendor name) into the COPROC::type field.
For devices from "new" vendors, set <gpu_type> field in init_data.xml file to the vendor string supplied by OpenCL. This should allow boinc_get_opencl_ids() to work correctly with these "new" devices without modification.
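A sketch of how an app might use this, assuming APP_INIT_DATA exposes the
<gpu_type> field as gpu_type (the vendor string passed in is whatever OpenCL
reports for the device; the helper name is made up):

    #include <cstring>
    #include "boinc_api.h"

    // For a GPU from a "new" vendor, <gpu_type> carries the OpenCL vendor
    // string, so an app can test for it the same way it would test for an
    // established vendor name.
    bool assigned_gpu_vendor_is(const char* vendor) {
        APP_INIT_DATA aid;
        boinc_get_init_data(aid);
        return strcmp(aid.gpu_type, vendor) == 0;
    }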
Apparently on some Win 7/8 systems with SSD drives, CreateProcess() sometimes
returns ERROR_NOT_ENOUGH_MEMORY; presumably Windows then allocates more swap space.
Treat this case using the "temporary exit" logic:
delay for 10 min, then try again.
After 100 such failures, abort the task.
Note: not tested. This may be a bad idea.
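A simplified sketch of the described handling (names, return codes, and the
retry bookkeeping are illustrative, not the client's actual code):

    #include <windows.h>

    enum START_RESULT { START_OK, START_RETRY_LATER, START_ABORT };

    START_RESULT start_task(char* cmdline, int& n_mem_failures) {
        STARTUPINFOA si = { sizeof(si) };
        PROCESS_INFORMATION pi;
        if (CreateProcessA(NULL, cmdline, NULL, NULL, FALSE, 0, NULL, NULL, &si, &pi)) {
            return START_OK;
        }
        if (GetLastError() == ERROR_NOT_ENOUGH_MEMORY) {
            if (++n_mem_failures >= 100) return START_ABORT;   // give up after 100 failures
            return START_RETRY_LATER;   // caller delays ~10 min ("temporary exit"), then retries
        }
        return START_ABORT;   // other errors are handled elsewhere in the real client
    }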
If a job reports its network usage (via boinc_network_usage()),
keep track of this across episodes of the job, and report it to the server
(some projects may want to give credit for network usage).
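A minimal usage sketch, assuming the two-argument (sent, received) byte-count
form of boinc_network_usage(); the wrapper function is illustrative:

    #include "boinc_api.h"

    // An app that does its own network I/O reports cumulative bytes
    // sent/received; the client accumulates these across episodes of the
    // job and reports the totals to the server.
    void report_network_totals(double bytes_sent, double bytes_received) {
        boinc_network_usage(bytes_sent, bytes_received);
    }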
I forgot that the wrapper has a 1-second poll for suspend and resume,
so sub-second throttling won't work properly for wrapper apps.
Revert to a variant of the old scheme,
in which the shorter of the suspended and running periods is 1 second.
Also, fix task start/suspend/resume log messages.
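A minimal sketch of the idea (not the actual throttling code), assuming a
throttle fraction strictly between 0 and 1: the shorter of the run and suspend
intervals is pinned to 1 second, so a wrapper that polls suspend/resume once
per second still sees every transition.

    void throttle_intervals(double cpu_usage_frac, double& run_sec, double& suspend_sec) {
        if (cpu_usage_frac >= 0.5) {
            suspend_sec = 1;
            run_sec = cpu_usage_frac / (1 - cpu_usage_frac);       // e.g. 0.75 -> run 3 s, suspend 1 s
        } else {
            run_sec = 1;
            suspend_sec = (1 - cpu_usage_frac) / cpu_usage_frac;   // e.g. 0.25 -> run 1 s, suspend 3 s
        }
    }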
Various bad things could happen when CPU throttling was used together w/ GPU apps.
Examples:
- on a multi-GPU system, several GPU tasks are assigned to the same GPU
- a suspended GPU task remains in memory (tying up its GPU resources)
while other tasks try to use the GPU.
The problem was that parts of the code assumed that suspended
GPU processes don't exist - i.e. that when a GPU task is suspended
it's always removed from memory.
This isn't true in the presence of CPU throttling.
So I made the following changes:
- When assigning GPUs to tasks, treat suspended tasks like running tasks
(i.e. reserve their GPUs)
- At the end of the CPU-scheduling logic, if there are any GPU tasks
that are suspended and not scheduled, remove them from memory,
and trigger a reschedule so we can reallocate their GPUs.
Also, a cosmetic change: in the resource usage string shown in the GUI,
include "(device X)" even if the task is suspended (i.e. because of throttling).
Also: zero out COPROC::opencl_device_indexes[] so we don't write
a garbage number to init_data.xml for non-OpenCL jobs
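A simplified sketch of the second change above (all types and field names are
hypothetical stand-ins for the client's active-task bookkeeping):

    #include <vector>

    struct GpuTask {        // hypothetical stand-in for an active task
        bool uses_gpu;
        bool suspended;
        bool scheduled;
        bool in_memory;
    };

    // After CPU scheduling: evict any GPU task that is suspended but not
    // scheduled. Returns true if something was evicted, in which case the
    // caller triggers a reschedule so the freed GPUs can be reassigned.
    bool evict_suspended_gpu_tasks(std::vector<GpuTask>& tasks) {
        bool evicted = false;
        for (GpuTask& t : tasks) {
            if (t.uses_gpu && t.suspended && !t.scheduled && t.in_memory) {
                t.in_memory = false;    // stand-in for removing the process from memory
                evicted = true;
            }
        }
        return evicted;
    }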
- If you run the client with --run_test_app,
it runs "test_app" in the current directory and interacts with it
(and does nothing else).
The client can suspend/resume the app with arbitrary timing;
this is controlled in run_test_app() (app_start.cpp).
- example app: add a --critical_section option.
This lets you test the runtime system for apps that do
most of their work in a critical section (like GPU apps);
see the sketch after this list.
- Add some logging messages (conditioned by DEBUG_BOINC_API)
to the runtime system.
- boinc_finish() waits for the timer thread to write final messages;
make sure the timer thread doesn't do anything else
(like suspend the worker thread) during this period
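A sketch of the kind of worker the --critical_section option is meant to
exercise (assumed from the description above, not the example app's actual
code): nearly all of the work happens between boinc_begin_critical_section()
and boinc_end_critical_section(), as a GPU app's kernel-launch loop typically
would.

    #include "boinc_api.h"

    void worker(int n_iterations) {
        boinc_begin_critical_section();
        for (int i = 0; i < n_iterations; i++) {
            // ... one chunk of work (e.g. a kernel launch) ...
            boinc_fraction_done((double)(i + 1) / n_iterations);
        }
        boinc_end_critical_section();
    }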