- Remove code that tries to keep track of available GPU RAM
and defer jobs that don't fit.
This never worked: it relied on project-supplied estimates of RAM usage.
It has been replaced by having the app do a temporary exit
if a GPU memory allocation fails (see the sketch after this list).
- Move the logic for checking for deferred jobs from CPU scheduling
to work fetch.
- Rename rsc_defer_sched to has_deferred_job,
and move it from PROJECT to RSC_PROJECT_WORK_FETCH
- Tweak work_fetch_debug output.
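A minimal sketch of the replacement mechanism, assuming a CUDA app;
the helper name is made up, and boinc_temporary_exit() is the BOINC
API call for "exit now, retry this task later" (its exact arguments
may differ between API versions):

    #include <cuda_runtime.h>
    #include "boinc_api.h"

    // If a GPU allocation fails, exit temporarily; the client will
    // reschedule the task later instead of trying to predict how much
    // GPU RAM is available.
    void* alloc_gpu_buffer(size_t nbytes) {
        void* p = NULL;
        if (cudaMalloc(&p, nbytes) != cudaSuccess) {
            boinc_temporary_exit(300);   // back off for ~5 minutes
        }
        return p;
    }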
A month or two ago I added code to put user CPID in the project info
exported via GUI RPC, so that GUIs (like BoincTasks) could link
to user pages on stats sites.
However, I completely forgot that the CPID known to the client
(PROJECT::cross_project_id) is the "internal CPID",
while what gets exported to stats is the "external CPID",
which is the MD5 of the internal CPID concatenated with the user's email address.
Solution: include the external CPID in scheduler replies,
store it in the client state file,
and export it in GUI RPCs as PROJECT::external_cpid.
This will eventually work for BoincTasks,
but only after projects update their server software,
and volunteers update their client software.
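A sketch of the derivation described above (not the actual server
code); it assumes the email address is simply appended to the
internal CPID, and uses OpenSSL's MD5() for illustration:

    #include <openssl/md5.h>
    #include <cstdio>
    #include <string>

    // external CPID = MD5(internal CPID + email address), as hex.
    std::string external_cpid(
        const std::string& internal_cpid, const std::string& email_addr
    ) {
        std::string s = internal_cpid + email_addr;
        unsigned char digest[MD5_DIGEST_LENGTH];
        MD5((const unsigned char*)s.data(), s.size(), digest);
        char hex[2*MD5_DIGEST_LENGTH + 1];
        for (int i = 0; i < MD5_DIGEST_LENGTH; i++) {
            sprintf(hex + 2*i, "%02x", digest[i]);
        }
        return std::string(hex);
    }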
- If the AMS supplies a resource share, don't override it with the
project setting (my last fix didn't quite do this).
- When detaching from the AMS, set the resource share to the
project-supplied value.
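A minimal sketch of the intended precedence; the struct and field
names (e.g. ams_resource_share) are hypothetical stand-ins for the
client's own data structures:

    struct Project {
        double resource_share;       // value currently in effect
        double ams_resource_share;   // AMS-supplied value, or -1 if none
    };

    // When a project's own setting arrives, use it only if the
    // account manager didn't supply a resource share.
    void apply_project_resource_share(Project& p, double project_rs) {
        if (p.ams_resource_share < 0) {
            p.resource_share = project_rs;
        }
    }

    // On detach from the account manager, revert to the
    // project-supplied value.
    void detach_from_ams(Project& p, double project_rs) {
        p.ams_resource_share = -1;
        p.resource_share = project_rs;
    }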
A while back we added a mechanism intended to defer work-request RPCs
while file uploads are happening,
with the goal of reporting completed tasks sooner
and reducing the number of RPCs.
There were two bugs in this mechanism.
First, the test for whether an upload is active was flawed:
if several uploads were active and one finished,
the client acted as if all of them had finished.
Second, when WORK_FETCH::choose_project() picks a project,
it sets p->sched_rpc_pending to RPC_REASON_NEED_WORK.
If we then decide not to request work because an upload
is active, we need to clear this field.
Otherwise scheduler_rpc_poll() will do an RPC to it,
piggybacking a work request and bypassing the upload check.
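A sketch of the two fixes with hypothetical structures (the real
client tracks uploads and pending RPC reasons on its own PROJECT and
file-transfer objects); RPC_REASON_NEED_WORK here is a stand-in
constant:

    #include <vector>

    struct Upload { bool active; };

    struct Proj {
        std::vector<Upload> uploads;
        int sched_rpc_pending;          // an RPC_REASON_* value, or 0
    };

    const int RPC_REASON_NEED_WORK = 1; // stand-in for the real value

    // Fix 1: a project has an active upload only if some upload is
    // still active, not merely because one of several just finished.
    bool has_active_upload(const Proj& p) {
        for (size_t i = 0; i < p.uploads.size(); i++) {
            if (p.uploads[i].active) return true;
        }
        return false;
    }

    // Fix 2: if the project was picked for work fetch but we then
    // decide not to ask for work because an upload is active, clear
    // the pending-RPC reason so scheduler_rpc_poll() doesn't contact
    // the project anyway.
    void maybe_defer_work_request(Proj& p) {
        if (has_active_upload(p) && p.sched_rpc_pending == RPC_REASON_NEED_WORK) {
            p.sched_rpc_pending = 0;
        }
    }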
If a project sends us <no_rsc_apps> flags for all processor types,
then by default the client will never do a scheduler RPC to that project again.
This could happen because of a transient condition in the project,
e.g. it deprecates all its app versions for a while.
To avoid this situation, the client now checks whether the no_rsc_apps flags
are set for all processor types.
If they are, it clears them all.
This will cause work fetch to use backoff,
and the client will occasionally contact the project.
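A sketch of the new check; the array and function names are
hypothetical (in the client the flags are per-PROJECT, indexed by
processor type):

    // If <no_rsc_apps> is set for every processor type, treat it as
    // a transient condition and clear all the flags, so normal
    // work-fetch backoff applies and the project is retried
    // occasionally.
    void clear_no_rsc_apps_if_all_set(bool no_rsc_apps[], int n_rsc) {
        for (int i = 0; i < n_rsc; i++) {
            if (!no_rsc_apps[i]) return;   // some resource still usable
        }
        for (int i = 0; i < n_rsc; i++) {
            no_rsc_apps[i] = false;
        }
    }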
- Add support for per-project app_config.xml files, described at
http://boinc.berkeley.edu/trac/wiki/ClientAppConfig
This lets users do the following:
1) Limit the number of concurrent jobs of a given app
(e.g. for WCG apps that are I/O-intensive).
2) Specify the CPU and GPU usage parameters of GPU versions
of a given app.
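For example, an app_config.xml file in a project's directory might
look roughly like this (element names follow the wiki page linked
above; the app name and values are invented):

    <app_config>
        <app>
            <name>example_app</name>
            <max_concurrent>2</max_concurrent>
            <gpu_versions>
                <gpu_usage>0.5</gpu_usage>
                <cpu_usage>0.4</cpu_usage>
            </gpu_versions>
        </app>
    </app_config>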
Implementation notes:
- Max app concurrency is enforced in two places:
1) when building the initial job run list
2) when enforcing the final job run list
Both are needed to avoid possible starvation.
- However, we don't enforce it during round-robin (RR) simulation.
Doing so could cause erroneous shortfall and work fetch.
It does mean, though, that work buffering will not work
as expected if you're using max concurrency.
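A sketch of the per-app concurrency check applied in both places,
with hypothetical structures rather than the client's own job lists:

    #include <map>
    #include <string>
    #include <vector>

    struct Job {
        std::string app_name;
        bool scheduled;      // already placed in the run list
    };

    // max_concurrent[app] comes from app_config.xml; 0 means no limit.
    bool can_schedule(
        const Job& job,
        const std::vector<Job>& run_list,
        const std::map<std::string, int>& max_concurrent
    ) {
        std::map<std::string, int>::const_iterator it =
            max_concurrent.find(job.app_name);
        if (it == max_concurrent.end() || it->second == 0) return true;
        int n = 0;
        for (size_t i = 0; i < run_list.size(); i++) {
            if (run_list[i].scheduled && run_list[i].app_name == job.app_name) {
                n++;
            }
        }
        return n < it->second;   // still below the app's limit?
    }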
- If the number of tasks ready to report reaches 64, report them.
64 is chosen a bit arbitrarily, but the idea is to
limit the number of tasks reported per RPC,
and to accelerate the reporting of small tasks.
Note: this fixes a major problem (starvation)
with project-level GPU exclusion.
However, project-level GPU exclusion interferes with most of
the client's scheduling policies.
E.g., round-robin simulation doesn't take GPU exclusion into account,
and the resulting completion estimates and device shortfalls
can be wrong by an order of magnitude.
The only way I can see to fix this would be to model each
GPU instance as a separate resource,
and to associate each job with a particular GPU instance.
This would be a sweeping change in both client and server.
- Allow projects to report "desired disk usage" (DDU).
If the client learns that a project wants disk space,
it can shrink the allocation to other projects.
- Base share computation on DDU rather than disk usage.
- Introduce the notion of "disk resource share".
This is defined (somewhat arbitrarily) as a project's resource share
plus 1/10 of the largest resource share across projects
(see the sketch after this list).
This is intended to ensure that even zero-share projects
get enough disk space to store app versions and data files;
otherwise they wouldn't be able to compute.
- server: use host.d_boinc_max (which wasn't being used)
to store the d_project_share reported by the client.
- volunteer storage: change the way hosts are allocated to chunks.
Allow hosts to store several chunks of the same file, if needed
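A sketch of the disk-resource-share computation described above,
with hypothetical names; it assumes "largest resource share" means
the largest share among the attached projects:

    #include <algorithm>
    #include <vector>

    // disk resource share = resource share + 1/10 of the largest
    // resource share, so even zero-share projects get some disk space
    // for app versions and data files.
    double disk_resource_share(
        double resource_share, const std::vector<double>& all_shares
    ) {
        double largest = 0;
        for (size_t i = 0; i < all_shares.size(); i++) {
            largest = std::max(largest, all_shares[i]);
        }
        return resource_share + largest / 10;
    }

For example, with two projects whose resource shares are 100 and 0,
the zero-share project still gets a disk resource share of 10.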
svn path=/trunk/boinc/; revision=26052