Create a file "file_delete_regex" in your project dir.
Each line is a regular expression.
Any sticky file whose name matches one of the expressions is deleted.
Old: each scheduler process holds a semaphore
while scanning the shared-mem job array.
On machines with many CPUs
there seems to be contention for this semaphore,
causing slow scheduler response and possibly connection failures.
New: Don't hold the semaphore while scanning array.
Instead, if find a job that passes quick_check(),
acquire the semaphore and recheck that the job is present in array
and passes quick_check().
- client: show messages if app_config.xml has unrecognized tags
to account for systematic errors in FLOP count
- adjust_user_priority: get total project RAC by summing RAC
of app versions where RAC has been updated in past week
- feeder: add --priority_asc option
(for when wu.priority is a logical time)
- don't use devices for which work is not being requested
- obey wu_is_infeasible_custom()
(e.g. don't send SETI@home VLAR jobs to GPUs)
- scheduler: add <debug_array_detail> log flag for slot-level messages
- admin web: show and allow control of app.beta
- It was possible if all results for a workunit were PFC_MODE_INVALID
that NaN pfc would be used causing database update errors. Solved
by using wu_estimated_pfc() as pfc in that case.
- Sanity check was comparing raw_pfc directly to rsc_fpops_bound. That
was causing problems GPUs with high performance estimates. Fixed by
including the app_version scale factor in the check. I thought I had
already committed this...
- Removed a few lines of commented out experimental code accidentally
comitted earlier.
- Committed to git repository on 8/24
svn path=/trunk/boinc/; revision=26144
In LLS array pass, skip file-on-host check if host
doesn't have any sticky files.
TODO: it should actually be "any sticky files for this app".
But we currently don't have any way to know that.
svn path=/trunk/boinc/; revision=26108
We were failing to mark the cache entries as free.
- API: initialize GPU device # to -1;
If client doesn't give us a device number, something is wrong
and it's better to not start computing.
svn path=/trunk/boinc/; revision=26079
We were using a static BEST_APP_VERSION in
check_homogeneous_app_version(),
and it wasn't being initialized on each call
(e.g. its HOST_USAGE was not being cleared).
svn path=/trunk/boinc/; revision=26076
(but not all) wasn't finished.
New logic: if the project has an NCI app then:
- make a list of NCI apps for which the client doesn't have
a job in progress.
- try to send one job for each of these apps
- do this even if no work is being requested.
- don't send jobs for NCI apps by other mechanisms
NOTE: the client logic isn't quite right for mixed NCI projects.
If there's no job for a given NCI app,
the client should do a scheduler RPC.
This isn't critical so we won't do this now.
svn path=/trunk/boinc/; revision=26068
cmdline tool for remote job submission (not done)
- remote job submission: support the 4 file modes described
in the documentation (not done)
svn path=/trunk/boinc/; revision=26067
and non-CPU-intensive applications.
An app can be specified as non-CPU-intensive in project.xml,
and this attribute can be set or cleared using the admin web interface.
Note: support for this was added to the client in 2011,
but we didn't add server-side support at that time.
This change is in 6.12 and later clients.
svn path=/trunk/boinc/; revision=26060
- add a config item vda_host_timeout.
A host that hasn't done a scheduler RPC for this long
is considered dead.
- a host that's not running a version 7+ client is considered dead
- host.cpu_efficiency (an otherwise unused field) is used
as a flag for dead hosts
- the scheduler clears the flag if the client is v7+
- vdad sets the flag for hosts where last RPC is old
- before choosing a host for chunk download,
vdad checks its client version.
svn path=/trunk/boinc/; revision=26059
- Allow projects to report "desired disk usage" (DDU).
If the client learns that a project wants disk space,
it can shrink the allocation to other projects.
- Base share computation on DDU rather than disk usage.
- Introduce the notion of "disk resource share".
This is defined (somewhat arbitrarily) as resource share
plus 1/10 of the largest resource share.
This is intended to ensure that even zero-share projects
get enough disk space to store app versions and data files;
otherwise they wouldn't be able to compute.
- server: use host.d_boinc_max (which wasn't being used)
to start d_project_share reported by client.
- volunteer storage: change the way hosts are allocated to chunks.
Allow hosts to store several chunks of the same file, if needed
svn path=/trunk/boinc/; revision=26052
Do first read from socket before opening the disk file
(an attempt to fix filesystem lockups on WCG).
Increase buffer size from 16KB to 256KB.
svn path=/trunk/boinc/; revision=26046
while writing to them.
It's not clear to me that this locking is beneficial,
and it may be causing filesystem problems at WCG
- volunteer storage stuff
svn path=/trunk/boinc/; revision=26021
- When a normal and an approx result are compared the normal result now gets
double weight in a pegged_avg() with any approx results. "Normal mode" GPU
results are frequently resulting in bad credit values for yet undetermined
reasons. Since GPUs are so fast, there can be a lot of bad values in a
short time. Including the prior average and another result even seems to
prevent this in many case.
- Replaced many of the if { msg_log.printf } with msg_log.cond_printf()
- Accidentally changed some of the formatting when trying a new editor that
apparently autoformats. Sorry for the extra diff lines.
- There's a problem with pfc calculation for hosts whose credit calculation
fails the sanity check. This has been a problem for a long time. Because the
result fails the sanity check, the host_app_version pfc is never updated.
Because hav.pfc is never updated, the credit calculation continues to be
wrong. To quote Quinn, it's like one of those viscious things. I hope to
fix that soon.
svn path=/trunk/boinc/; revision=25999
"cpu" in XML, and other code was looking for "CPU".
To fix this and prevent similar problems,
processor type names are now encapsulated in proc_type_name_xml().
Code should use this rather than having hard-wired names.
Redefine: GPU_TYPE_* as macros that call proc_type_name_xml().
svn path=/trunk/boinc/; revision=25996
proximity to the average estimate. This reduces the number of results that
are granted extremely low credit when a new app_version is released and (I
hope) improves work/duration estimates by speeding up the convergence of app
versions stats. I may try using this in lieu of low_average for normal
result, but haven't yet.
svn path=/trunk/boinc/; revision=25953
and change types of mem-size fields from int to double.
These fields are size_t in NVIDIA's version of this;
however, cuDeviceGetAttribute() returns them as int,
so I don't see where this makes any difference.
- client: fix bug in handling of <no_rsc_apps> element.
- scheduler: message tweaks.
Note: [foo] means that the message is enabled by <debug_foo>.
svn path=/trunk/boinc/; revision=25849
- Fix various #include issues.
CODING STYLE LAW (minimal inclusion principle):
If foo.cpp requires <blah.h>,
#include <blah.h> in foo.cpp, NOT foo.h
svn path=/trunk/boinc/; revision=25837
- validator: add some sanity-checking for credit,
to prevent granting 1e38 credit.
max_granted_credit now defaults to the equivalent of 1 TeraFLOP-year.
Instances that exceed this are not counted in the credit
calculation, and a critical-mode log message is written
- wrapper: remove wall_cpu_time; not used anymore
svn path=/trunk/boinc/; revision=25825
- feeder: don't enumerate results for WUs with nonzero error_mask
- scheduler: in slow_check(), make sure the WU error_mask is still zero
svn path=/trunk/boinc/; revision=25822
1) a network connection is available and
2) network communication is allowed and
3) CPU computation is allowed
- If an app version is marked as needs_network,
use the above fraction in estimating its rate of progress
- replace "core client" with "client" in comments.
- scheduler: message tweaks
svn path=/trunk/boinc/; revision=25803
- consistently accept both 'ati' and 'amd' for AMD/ATI plan classes
- in OpenCL plan classes always use device memory reported via OpenCL
(might be different e.g. from what's available/reported via CUDA)
- comment formatting
svn path=/trunk/boinc/; revision=25744
performed for a particular app version. It is not necessary
to tell the user to upgrade the client just to suite the needs of
a particular app version if this app version requires resources
that the host dosn't have or didn't request work for.
Actually I don't think it's good to tell the user he needs to
upgrade the client if there is only one particular app version
that requires a more recent one than he has. I think that the
purpose of the g_wreq->outdated_client flag was checking the
min_core_version in the project configuration. For this the
flag and the notice/message that it triggers is still ok. But
in the app version checks setting this flag leads to misleading
messages in most cases, so I commented that out for now.
I'm not sure, though, that both of these measurements are needed.
svn path=/trunk/boinc/; revision=25742
but checks for the "stop_daemons" trigger file every 1 sec.
Use this instead of sleep() in daemons.
This will speed up bin/stop.
svn path=/trunk/boinc/; revision=25708
in send_work_setup(), so that they run before lost jobs are resent.
Otherwise lost jobs could get sent using an app version
that's prohibited by prefs
svn path=/trunk/boinc/; revision=25604
if you have
*pbuf = HeapAlloc(...)
then you need
if (*pbuf == NULL)
not
if (pbuf == NULL)
- various code cleanups from
- Makefile.am: don't include clientgui/res; nothing to make there
from Steffen Moeller
svn path=/trunk/boinc/; revision=25599
on each request.
- client: when showing how much work a scheduler request returned,
scale by availability (as is done to show the amount of the request)
- client in account manager request, <not_started_dur> and
<in_progress_dur> are in wall time, not run time
(i.e. scale them by availability)
Note: there's some confusion in the code between runtime and wall time,
where in general wall time = runtime / availability.
New convention: let's use "runtime" for the former,
and "duration" for the latter.
svn path=/trunk/boinc/; revision=25597
(or at least the last double).
This accommodates a particular application (LAMMPS)
that can only append to this file.
- CAS@home stuff
svn path=/trunk/boinc/; revision=25557
If you link your functions (init_result(), compare_results(),
cleanup_result()) with validate_test.cpp,
you'll get a program that you can run as
validate_test file1 file2
and it will compare the two files
(this works only for validators that expect 1 file per result).
I added a makefile, sched/makefile_validator_test,
that you can use for this.
- server: shuffle code so that the above doesn't need to
link MySQL libraries
- client: if we fetch a master file and it contains no scheduler URLs,
show a message of class INTERNAL_ERROR
- client/scheduler: make CUDA_DEVICE_PROP.totalGlobalMem a double,
and remove dtotalGlobalMem.
Although NVIDIA reports RAM size as a size_t,
there's no reason to store it as an integer after that.
svn path=/trunk/boinc/; revision=25542
the first few jobs of a new application
(in wu_estimated_pfc(), only multiply by app.min_avg_pfc
if it's nonzero).
svn path=/trunk/boinc/; revision=25484
in plan_class_spec by using coproc_pref() and capped_host_fpops()
(moved coproc_perf() to sched_customize.h to make it available
in plan_class_spec.cpp, and cleaned up includes)
svn path=/trunk/boinc/; revision=25467
for which the anonymous-platform client doesn't have a version)
mark it as sent so the transitioner can do its thing
svn path=/trunk/boinc/; revision=25461
make per-HR slot allocation an option rather than the default.
Kevin reported that slot allocation wasn't working for WCG.
The default is now no slot allocation,
and use the regular result enumeration function
rather than the once that scans the entire table.
The config flag for enabling slot allocation is <hr_allocate_slots/>.
svn path=/trunk/boinc/; revision=25432
Previously (little known) the scheduler could be hacked to preserve
the sched_request.xml and sched_reply.xml in own directories
(you had to modify the initial value of use_files in sched_main.cpp).
This feature could now be switched on and off on the fly just by
changing the project config.
When there is an (existing) directory configured as
<debug_req_reply_dir>, each schduler instance will write three
files in there: PID_C_sched.log, PID_C_sched_request.xml and (if all
goes well) PID_C_sched_reply.xml. PID is the process id of this
scheduler instance, C is an internal counter within the process if
FCGI is used. The sched.log will contain nothing else than the
pid and the IP address of the client. This should allow for
identifying the scheduler instance responsible for a given
apache error log message ("premature end of script headers") when
a scheduler crashed. sched_request.xml (obviously) is the scheduler
request, and if the scheduler doesn't crash in between, there will
also be the reply to the client kept in sched_reply.xml
Remove the <debug_req_reply_dir> tag from the project config
to turn this feature off.
svn path=/trunk/boinc/; revision=25349
one instance together in the scheduler.log when multiple instances are
running. Currently the buffer has a fixed size of 32768 charaters.
On one hand with much debug output this buffer may turn out to be
too small. OTOH the log of this instance is completely lost in case
of a crash, which doesn't help with debugging. Thus make the
scheduler log buffer size configurable using the tag
<scheduler_log_buffer> in project config. The default value is
still the old size (32768), set it to 0 to disable buffering
completely, e.g. for debugging.
svn path=/trunk/boinc/; revision=25348