Note: this fixes a major problem (starvation)
with project-level GPU exclusion.
However, project-level GPU exclusion interferes with most of
the client's scheduling policies.
E.g., round-robin simulation doesn't take GPU exclusion into account,
and the resulting completion estimates and device shortfalls
can be wrong by an order of magnitude.
The only way I can see to fix this would be to model each
GPU instance as a separate resource,
and to associate each job with a particular GPU instance.
This would be a sweeping change in both client and server.
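A rough sketch of that alternative, assuming a hypothetical model (GpuInstance, ExclusionMap and assign_gpu_instance() are illustrative names, not BOINC code) in which each GPU instance is its own resource and each job is bound to one specific instance:

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical model: each GPU instance is a separate resource.
struct GpuInstance {
    int index;
    bool busy = false;  // true once a job is bound to this instance
};

// Hypothetical per-project exclusions: project URL -> excluded instance indices.
using ExclusionMap = std::map<std::string, std::set<int>>;

// Bind a job from `project` to the first free instance the project may use.
// Returns the instance index, or -1 if no instance is usable right now.
int assign_gpu_instance(std::vector<GpuInstance>& gpus,
                        const ExclusionMap& exclusions,
                        const std::string& project) {
    auto it = exclusions.find(project);
    for (auto& g : gpus) {
        bool excluded = it != exclusions.end() && it->second.count(g.index) > 0;
        if (!excluded && !g.busy) {
            g.busy = true;  // the job is now tied to this particular instance
            return g.index;
        }
    }
    return -1;
}
```

With jobs tied to instances, a simulator could track each instance's queue separately instead of treating the GPUs as one pooled resource.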
Old: heartbeat mechanism
Problem: if the client is blocked for > 30 secs
(e.g. because it takes a long time to write the state file,
or because it's stopped in a debugger)
then apps exit.
This is bad if the app doesn't checkpoint and has been
running for a long time.
New: the client passes its PID to the app.
The app periodically (10 sec) checks that the process still exists.
Notes:
- For backward compatibility (e.g. new API w/ old client,
or vice versa) the client still sends heartbeats,
and the API checks heartbeats if the client doesn't pass a PID.
- The new mechanism works only if the client's PID isn't assigned
to a new process within 10 secs of the client exiting.
Windows 2000 reuses PIDs immediately, so check for Win2K
and don't use this mechanism if so.
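A minimal POSIX-only sketch of the check, assuming hypothetical helpers process_exists() and client_alive(); kill() with signal 0 only tests whether the PID exists. The 10-second polling loop and the Win2K special case are not shown, and Windows would need a different test.

```cpp
#include <cassert>
#include <cerrno>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

// Returns true if a process with this PID exists.
// kill() with signal 0 performs only the existence/permission check.
bool process_exists(pid_t pid) {
    if (pid <= 0) return false;          // 0 and negatives address process groups
    if (kill(pid, 0) == 0) return true;
    return errno == EPERM;               // exists, but we may not signal it
}

// Hypothetical policy: use the PID check if the client passed a PID,
// otherwise fall back to the old heartbeat timeout (backward compatibility).
bool client_alive(pid_t client_pid, double secs_since_heartbeat,
                  double heartbeat_timeout) {
    if (client_pid > 0) return process_exists(client_pid);
    return secs_since_heartbeat < heartbeat_timeout;
}
```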
TODO: For Unix multithread apps,
critical sections aren't currently being enforced.
Need to fix this by masking signals.
svn path=/trunk/boinc/; revision=26147
- If all results for a workunit were PFC_MODE_INVALID,
a NaN pfc could be used, causing database update errors. Solved
by using wu_estimated_pfc() as pfc in that case.
- Sanity check was comparing raw_pfc directly to rsc_fpops_bound. That
was causing problems for GPUs with high performance estimates. Fixed by
including the app_version scale factor in the check. I thought I had
already committed this...
- Removed a few lines of commented out experimental code accidentally
committed earlier.
- Committed to git repository on 8/24
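A sketch of the NaN fallback and the scaled sanity check, assuming hypothetical helper names; wu_est_pfc stands in for the value returned by wu_estimated_pfc(), and applying the app_version scale factor by multiplication is an assumption:

```cpp
#include <cassert>
#include <cmath>

// Fall back to the workunit estimate when the computed PFC is not finite
// (e.g. NaN because every result was PFC_MODE_INVALID).
double pfc_or_estimate(double pfc, double wu_est_pfc) {
    return std::isfinite(pfc) ? pfc : wu_est_pfc;
}

// Compare the scaled PFC, not the raw one, against rsc_fpops_bound.
// Assumption: the app_version scale factor multiplies raw_pfc.
bool pfc_within_bound(double raw_pfc, double av_scale, double rsc_fpops_bound) {
    return raw_pfc * av_scale <= rsc_fpops_bound;
}
```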
svn path=/trunk/boinc/; revision=26144
clear the suspend_request flag.
Otherwise we'll end up doing two suspends,
and on Win the app will be suspended forever.
svn path=/trunk/boinc/; revision=26143
- change default disk prefs to:
- no absolute limit on disk usage (we need to work with future disks)
- keep 100 MB min free space
- use up to 90% of total space
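A sketch of how the new defaults could combine, assuming the allowed usage is the minimum of the three limits (parameter names are hypothetical; a max_used_gb of 0 means no absolute limit, and GB is taken as 1e9 bytes):

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical combination of the disk prefs: the tightest limit wins.
double allowed_disk_usage(double total_bytes, double free_bytes,
                          double boinc_used_bytes, double max_used_gb,
                          double min_free_gb, double max_used_pct) {
    const double GB = 1e9;
    double limit = total_bytes * max_used_pct / 100.0;      // "use up to N% of total"
    limit = std::min(limit,
                     free_bytes + boinc_used_bytes - min_free_gb * GB);  // keep min free
    if (max_used_gb > 0) {
        limit = std::min(limit, max_used_gb * GB);           // absolute limit, if set
    }
    return std::max(limit, 0.0);
}
```

With the new defaults (0, 0.1, 90), a 1 TB disk with 500 GB free and no BOINC usage yields 500 GB minus 100 MB.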
svn path=/trunk/boinc/; revision=26141
In LLS array pass, skip file-on-host check if host
doesn't have any sticky files.
TODO: it should actually be "any sticky files for this app".
But we currently don't have any way to know that.
svn path=/trunk/boinc/; revision=26108
cards. It appears that the Nvidia API was only setting 32-bits
of the 64-bit value. The remaining 32-bits were whatever
was on the stack.
client/
gpu_nvidia.cpp
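A self-contained illustration of the bug and one defensive pattern (zero-initializing the output before the call); fake_query_mem_size() is a hypothetical stand-in for the driver call, and the actual change in gpu_nvidia.cpp may differ:

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for a driver call that sets only the low 32 bits of a 64-bit
// output and leaves the high 32 bits as they were.
void fake_query_mem_size(uint64_t* out, uint32_t value) {
    *out = (*out & 0xFFFFFFFF00000000ULL) | value;
}

// Zero the output first so bits the API doesn't set are 0,
// not leftover stack contents.
uint64_t query_mem_size_safely(uint32_t value) {
    uint64_t mem = 0;
    fake_query_mem_size(&mem, value);
    return mem;
}
```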
svn path=/trunk/boinc/; revision=26084
We were failing to mark the cache entries as free.
- API: initialize GPU device # to -1;
If client doesn't give us a device number, something is wrong
and it's better to not start computing.
svn path=/trunk/boinc/; revision=26079
We were using a static BEST_APP_VERSION in
check_homogeneous_app_version(),
and it wasn't being initialized on each call
(e.g. its HOST_USAGE was not being cleared).
svn path=/trunk/boinc/; revision=26076
(but not all) wasn't finished.
New logic: if the project has an NCI app then:
- make a list of NCI apps for which the client doesn't have
a job in progress.
- try to send one job for each of these apps
- do this even if no work is being requested.
- don't send jobs for NCI apps by other mechanisms
NOTE: the client logic isn't quite right for mixed NCI projects.
If there's no job for a given NCI app,
the client should do a scheduler RPC.
This isn't critical so we won't do this now.
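The first step of the new scheduler logic can be sketched as follows, assuming a hypothetical simplified App record; the function only builds the list of NCI apps lacking an in-progress job, and the actual job dispatch is not shown:

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

struct App {                  // hypothetical, simplified app record
    std::string name;
    bool non_cpu_intensive;
};

// NCI apps for which the client has no job in progress; the scheduler
// would then try to send one job for each, even if no work was requested.
std::vector<std::string> nci_apps_needing_job(
        const std::vector<App>& apps,
        const std::set<std::string>& apps_with_jobs_in_progress) {
    std::vector<std::string> out;
    for (const auto& a : apps) {
        if (a.non_cpu_intensive && !apps_with_jobs_in_progress.count(a.name)) {
            out.push_back(a.name);
        }
    }
    return out;
}
```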
svn path=/trunk/boinc/; revision=26068
cmdline tool for remote job submission (not done)
- remote job submission: support the 4 file modes described
in the documentation (not done)
svn path=/trunk/boinc/; revision=26067
and non-CPU-intensive applications.
An app can be specified as non-CPU-intensive in project.xml,
and this attribute can be set or cleared using the admin web interface.
Note: support for this was added to the client in 2011,
but we didn't add server-side support at that time.
This change is in 6.12 and later clients.
svn path=/trunk/boinc/; revision=26060
- add a config item vda_host_timeout.
A host that hasn't done a scheduler RPC for this long
is considered dead.
- a host that's not running a version 7+ client is considered dead
- host.cpu_efficiency (an otherwise unused field) is used
as a flag for dead hosts
- the scheduler clears the flag if the client is v7+
- vdad sets the flag for hosts where last RPC is old
- before choosing a host for chunk download,
vdad checks its client version.
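The dead-host rule above, as a minimal sketch; Host is a hypothetical subset of the host record, and the flag stored in host.cpu_efficiency is not modeled:

```cpp
#include <cassert>

struct Host {                    // hypothetical subset of the host record
    int client_major_version;
    double last_rpc_time;        // seconds since epoch
};

// Dead if the client is older than v7, or if the last scheduler RPC
// is older than vda_host_timeout seconds.
bool host_is_dead(const Host& h, double now, double vda_host_timeout) {
    if (h.client_major_version < 7) return true;
    return now - h.last_rpc_time > vda_host_timeout;
}
```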
svn path=/trunk/boinc/; revision=26059
- Allow projects to report "desired disk usage" (DDU).
If the client learns that a project wants disk space,
it can shrink the allocation to other projects.
- Base share computation on DDU rather than disk usage.
- Introduce the notion of "disk resource share".
This is defined (somewhat arbitrarily) as resource share
plus 1/10 of the largest resource share.
This is intended to ensure that even zero-share projects
get enough disk space to store app versions and data files;
otherwise they wouldn't be able to compute.
- server: use host.d_boinc_max (which wasn't being used)
to store d_project_share reported by client.
- volunteer storage: change the way hosts are allocated to chunks.
Allow hosts to store several chunks of the same file, if needed
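A sketch of the disk resource share definition (resource share plus 1/10 of the largest resource share), normalized to fractions; the function name is hypothetical:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Each project's disk share = its resource share + 0.1 * max resource share,
// returned as a fraction of the total so zero-share projects still get some.
std::vector<double> disk_share_fractions(const std::vector<double>& resource_shares) {
    double max_rs = 0;
    for (double rs : resource_shares) max_rs = std::max(max_rs, rs);
    std::vector<double> disk_rs;
    double total = 0;
    for (double rs : resource_shares) {
        double d = rs + 0.1 * max_rs;
        disk_rs.push_back(d);
        total += d;
    }
    for (double& d : disk_rs) d = total > 0 ? d / total : 0;
    return disk_rs;
}
```

For shares of 100 and 0, the disk shares are 110 and 10, so the zero-share project gets 1/12 of the space.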
svn path=/trunk/boinc/; revision=26052
initial work request to a project
- client: put some casts to double in NVIDIA detect code.
Shouldn't make any difference.
- volunteer storage: truncate file to right size after retrieval
svn path=/trunk/boinc/; revision=26051
Do first read from socket before opening the disk file
(an attempt to fix filesystem lockups on WCG).
Increase buffer size from 16KB to 256KB.
svn path=/trunk/boinc/; revision=26046
allow it to fetch work of that type if the # of runnable
jobs is <= the # of non-excluded instances (rather than 0).
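A minimal sketch of the new rule, assuming a hypothetical can_fetch_work() helper; returning false when every instance is excluded is an assumption not stated above:

```cpp
#include <cassert>

// Allow fetching work of a type while runnable jobs of that type
// don't exceed the number of non-excluded instances.
bool can_fetch_work(int runnable_jobs, int total_instances, int excluded_instances) {
    int usable = total_instances - excluded_instances;
    if (usable <= 0) return false;  // assumption: all instances excluded
    return runnable_jobs <= usable;
}
```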
svn path=/trunk/boinc/; revision=26045
while writing to them.
It's not clear to me that this locking is beneficial,
and it may be causing filesystem problems at WCG
- volunteer storage stuff
svn path=/trunk/boinc/; revision=26021
Lets application specify a min checkpoint interval.
The actual min checkpoint interval is the max of this
and the user-specified pref for min disk interval.
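The rule amounts to taking the larger of the two values; a trivial sketch with hypothetical names (how the app passes its minimum is not shown):

```cpp
#include <algorithm>
#include <cassert>

// Effective min checkpoint interval (seconds): the larger of the
// app-specified minimum and the user's min disk interval pref.
double effective_checkpoint_interval(double app_min_interval, double user_disk_interval) {
    return std::max(app_min_interval, user_disk_interval);
}
```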
svn path=/trunk/boinc/; revision=26005
for a reason other than work fetch,
and we're deciding whether to piggyback a work request,
skip the checks for hysteresis (buffer < min)
and for per-resource backoff time.
These checks are there only to limit the rate of RPCs,
which is not relevant since we're doing one anyway.
This fixes a bug where a project w/ sporadic jobs specifies
a next_rpc_delay to ensure regular polling from clients.
When these polls occur they should request work regardless of backoff.
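A sketch of the piggyback decision for one resource type, assuming a hypothetical WorkFetchState; the has_shortfall field (only request if the queue is below target) is an assumption:

```cpp
#include <cassert>

struct WorkFetchState {          // hypothetical per-resource inputs
    double buffered_secs;        // work currently queued
    double min_buffer_secs;      // hysteresis low-water mark
    bool backed_off;             // per-resource backoff in effect
    bool has_shortfall;          // queue below target (assumption)
};

// When an RPC is happening anyway, skip the checks that only limit
// RPC rate: hysteresis (buffer < min) and per-resource backoff.
bool should_request_work(const WorkFetchState& s, bool piggyback) {
    if (!s.has_shortfall) return false;
    if (piggyback) return true;
    if (s.backed_off) return false;
    return s.buffered_secs < s.min_buffer_secs;
}
```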
svn path=/trunk/boinc/; revision=26002
"cpu" in XML, and other code was looking for "CPU".
To fix this and prevent similar problems,
processor type names are now encapsulated in proc_type_name_xml().
Code should use this rather than having hard-wired names.
Redefine GPU_TYPE_* as macros that call proc_type_name_xml().
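A sketch of the single-source-of-truth pattern; the enum values and returned strings are assumptions for illustration, not copied from BOINC:

```cpp
#include <cassert>
#include <cstring>

// Hypothetical processor-type enum.
enum ProcType { PROC_TYPE_CPU, PROC_TYPE_NVIDIA_GPU, PROC_TYPE_AMD_GPU };

// The one place that defines the names used in XML (strings assumed).
const char* proc_type_name_xml(int pt) {
    switch (pt) {
    case PROC_TYPE_CPU:        return "CPU";
    case PROC_TYPE_NVIDIA_GPU: return "NVIDIA";
    case PROC_TYPE_AMD_GPU:    return "ATI";
    default:                   return "unknown";
    }
}

// GPU type names as macros that call the helper, so no code hard-wires them.
#define GPU_TYPE_NVIDIA proc_type_name_xml(PROC_TYPE_NVIDIA_GPU)
#define GPU_TYPE_ATI    proc_type_name_xml(PROC_TYPE_AMD_GPU)
```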
svn path=/trunk/boinc/; revision=25996
guest VM is 64-bit. 64-bit guest VMs require hardware virtualization
and should fail without it.
- VBOX: Implement the <copy_to_shared/> directive in the vbox_job.xml file.
If <copy_to_shared>init_data.xml</copy_to_shared> is set, the wrapper will
copy the init_data.xml file to the shared directory before the VM is launched.
svn path=/trunk/boinc/; revision=25973
the power management window proc, it was removed during one of the Win9x
code scrubs. When we see it, inform the client it is time to shutdown.
client/
sysmon_win.cpp
svn path=/trunk/boinc/; revision=25882
- allow CFLAGS and CXXFLAGS to be augmented
- allow at least the DEBUG flags to be set externally, so that
backtrace flags can be used
- minor textual fixes (whitespace error, typo in comment)
svn path=/trunk/boinc/; revision=25881
keep the RESULT record so that we can report it to the scheduler.
Otherwise we'll keep getting the same job if the project has
<resend_lost_results> set.
svn path=/trunk/boinc/; revision=25879
and change types of mem-size fields from int to double.
These fields are size_t in NVIDIA's version of this;
however, cuDeviceGetAttribute() returns them as int,
so I don't see where this makes any difference.
- client: fix bug in handling of <no_rsc_apps> element.
- scheduler: message tweaks.
Note: [foo] means that the message is enabled by <debug_foo>.
svn path=/trunk/boinc/; revision=25849
Otherwise it doesn't work for files >= 2GB
- Client: TIME_STATS::trim_stats_log() wasn't working because
it's called in the constructor of TIME_STATS,
which is called before we've done a chdir() to the data dir.
Note: for this reason, no disk access should be done in constructors
of global objects. A quick scan found no instances of this.
svn path=/trunk/boinc/; revision=25846
- Fix various #include issues.
CODING STYLE LAW (minimal inclusion principle):
If foo.cpp requires <blah.h>,
#include <blah.h> in foo.cpp, NOT foo.h
svn path=/trunk/boinc/; revision=25837
and there's a simple reason
(e.g. the project is suspended, no-new-tasks, downloads stalled, etc.)
show it in the event log.
If the reason is more complex, don't try to explain.
svn path=/trunk/boinc/; revision=25827
- validator: add some sanity-checking for credit,
to prevent granting 1e38 credit.
max_granted_credit now defaults to the equivalent of 1 TeraFLOP-year.
Instances that exceed this are not counted in the credit
calculation, and a critical-mode log message is written
- wrapper: remove wall_cpu_time; not used anymore
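A sketch of the credit cap from the validator item above. Assumptions: 1 credit is 1/200 of a day on a 1 GFLOPS machine, so 1 TeraFLOP-year is 1000 * 365 * 200 = 7.3e7 credits; averaging the remaining instances is illustrative only, and the critical-mode log message is not shown:

```cpp
#include <cassert>
#include <vector>

// Assumption: 1000 GFLOPS * 365 days * 200 credits per GFLOPS-day.
const double MAX_GRANTED_CREDIT_DEFAULT = 1000.0 * 365.0 * 200.0;

// Average the credit of instances, skipping any above the cap
// (e.g. a bogus 1e38).
double granted_credit(const std::vector<double>& instance_credits,
                      double max_granted_credit = MAX_GRANTED_CREDIT_DEFAULT) {
    double sum = 0;
    int n = 0;
    for (double c : instance_credits) {
        if (c > max_granted_credit) continue;
        sum += c;
        ++n;
    }
    return n ? sum / n : 0;
}
```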
svn path=/trunk/boinc/; revision=25825
- feeder: don't enumerate results for WUs with nonzero error_mask
- scheduler: in slow_check(), make sure the WU error_mask is still zero
svn path=/trunk/boinc/; revision=25822
1) a network connection is available and
2) network communication is allowed and
3) CPU computation is allowed
- If an app version is marked as needs_network,
use the above fraction in estimating its rate of progress
- replace "core client" with "client" in comments.
- scheduler: message tweaks
svn path=/trunk/boinc/; revision=25803
a given app that have different platforms and different version #s.
The client was erroneously deleting the one w/ the lower version
when it was no longer in use.
Fix: in garbage collection, consider one version to supersede another
only if they have the same platform
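A sketch of the corrected comparison, assuming a hypothetical simplified AppVersion record (the app and platform strings in the test are illustrative):

```cpp
#include <cassert>
#include <string>

struct AppVersion {              // hypothetical, simplified
    std::string app_name;
    std::string platform;
    int version_num;
};

// `a` supersedes `b` only for the same app AND the same platform,
// with a higher version number.
bool supersedes(const AppVersion& a, const AppVersion& b) {
    return a.app_name == b.app_name
        && a.platform == b.platform
        && a.version_num > b.version_num;
}
```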
svn path=/trunk/boinc/; revision=25770
- added the definitions for the new Windows 7/2008r2 preSP1
and Windows 8/2012 SKUs based on the winnt.h
from the Windows 8 RC SDK (also added as proof)
- added detection for some more Windows SKUs
- Updates provided by Teamwork of Planet3Dnow.de to coproc_detect.cpp
- added CAL_TARGET_ID 21 as : AMD Radeon HD 78x0 series (Pitcairn)
(from [P3D] Crashtest)
svn path=/trunk/boinc/; revision=25760
moved app_ipc.h inclusion outside __cplusplus
since it contains important C mode prototypes
(boinc_resolve_filename() etc.)
svn path=/trunk/boinc/; revision=25752
- consistently accept both 'ati' and 'amd' for AMD/ATI plan classes
- in OpenCL plan classes always use device memory reported via OpenCL
(might be different e.g. from what's available/reported via CUDA)
- comment formatting
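A sketch of accepting both spellings, assuming the check is a name-prefix match (an assumption; the example plan class names are hypothetical):

```cpp
#include <cassert>
#include <string>

// Treat a plan class as AMD/ATI if its name starts with "ati" or "amd".
bool is_amd_plan_class(const std::string& plan_class) {
    return plan_class.rfind("ati", 0) == 0 || plan_class.rfind("amd", 0) == 0;
}
```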
svn path=/trunk/boinc/; revision=25744
performed for a particular app version. It is not necessary
to tell the user to upgrade the client just to suit the needs of
a particular app version if this app version requires resources
that the host doesn't have or didn't request work for.
Actually I don't think it's good to tell the user he needs to
upgrade the client if there is only one particular app version
that requires a more recent one than he has. I think that the
purpose of the g_wreq->outdated_client flag was checking the
min_core_version in the project configuration. For this the
flag and the notice/message that it triggers is still ok. But
in the app version checks setting this flag leads to misleading
messages in most cases, so I commented that out for now.
I'm not sure, though, that both of these measurements are needed.
svn path=/trunk/boinc/; revision=25742
Both are for use by project.
- job submission file sandbox: don't delete physical file
when delete sandbox entry.
We'll have to figure out how to garbage-collect physical files.
- LAMMPS job submission:
use the 50th-percentile host, not 0th
svn path=/trunk/boinc/; revision=25734
This reruns validation for instances that are successful
but marked as invalid or inconclusive.
Use this if you changed your validator to be more permissive,
and you want to grant credit for instances that were
originally marked as invalid.
svn path=/trunk/boinc/; revision=25714
but checks for the "stop_daemons" trigger file every 1 sec.
Use this instead of sleep() in daemons.
This will speed up bin/stop.
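A POSIX sketch of such a sleep, assuming a hypothetical daemon_sleep() helper and a trigger file named "stop_daemons" in the current directory (the real location is not given above):

```cpp
#include <cassert>
#include <cstdio>
#include <sys/stat.h>
#include <unistd.h>

const char* STOP_DAEMONS_TRIGGER = "stop_daemons";  // assumed path

static bool file_exists(const char* path) {
    struct stat sb;
    return stat(path, &sb) == 0;
}

// Sleep up to `seconds`, checking for the trigger file every 1 sec.
// Returns true if the trigger was seen (the caller should exit).
bool daemon_sleep(int seconds, const char* trigger = STOP_DAEMONS_TRIGGER) {
    for (int i = 0; i < seconds; ++i) {
        if (file_exists(trigger)) return true;
        sleep(1);
    }
    return file_exists(trigger);
}
```

A daemon's main loop would call daemon_sleep(n) where it used to call sleep(n) and exit when it returns true.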
svn path=/trunk/boinc/; revision=25708
lets an application report its network usage to BOINC,
and hence take it into account with monthly limits etc.
- API: get rid of deprecated boinc_ops_per_cpu_sec(),
boinc_ops_cumulative(), and
boinc_set_credit_claim();
- admin web: update manage_apps.php;
add the ability to set homogeneous app version
svn path=/trunk/boinc/; revision=25700
will get a bunch of work in a future commit.
clientgui/
ProjectInfoPage.cpp, .h
clientgui/res/
openclicon.xpm (deleted)
multicore.xpm (deleted)
svn path=/trunk/boinc/; revision=25696
Set it to the timezone specified by the constant TIMEZONE
(in project.inc) or "UTC" if none specified.
- web: fix bugs in submit.php
svn path=/trunk/boinc/; revision=25693
- client: Update the stock all_projects_list.xml file we send out
with new client software.
clientgui/res/
openclicon.xpm
win_build/installerv2/redist/
all_projects_list.xml
svn path=/trunk/boinc/; revision=25679
new port number to work with, convert it from network byte order
(big-endian) to host byte order (little-endian on x86/x64 processors).
samples/vboxwrapper/
vbox.cpp
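A minimal illustration of the conversion using the standard ntohs()/htons() calls; port_from_network() is a hypothetical wrapper name:

```cpp
#include <arpa/inet.h>
#include <cassert>
#include <cstdint>
#include <netinet/in.h>

// sockaddr_in::sin_port is stored in network byte order (big-endian);
// ntohs() converts it to host order before it is used as a number.
uint16_t port_from_network(uint16_t net_port) {
    return ntohs(net_port);
}
```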
svn path=/trunk/boinc/; revision=25671