(in sched_customize.cpp)
the flops_scale argument is intended to express the
GPU efficiency (actual/peak).
Pass appropriate values.
svn path=/trunk/boinc/; revision=24405
in the presence of GPU exclusions.
The problem was in the job-selection phase,
which picks enough jobs to use all devices.
It was ignoring GPU exclusions, so for example on
a 2 GPU system it could pick 2 jobs from a project
for which 1 GPU is excluded,
and as a result 1 GPU would be idle.
Solution: during job selection,
keep track of GPU usage on a per-instance basis.
Select a job only if it can run on a non-excluded GPU.
- client: in computing ncprocs_excluded (which is used in
work fetch policy) don't count exclusions of non-existent devices
svn path=/trunk/boinc/; revision=24316
and other operations.
You can now designate a user as "manager" for a particular app.
They can then:
- control job-submit permissions for that app
- deprecate/undeprecate versions of the app.
- abort jobs for that app
You can also designate a user a manage for the project.
They can then edit permissions and quotas,
as well as performing the app-specific functions for all apps.
This is described here:
http://boinc.berkeley.edu/trac/wiki/MultiUser#Accesscontrol
This required some changes to the DB schema.
svn path=/trunk/boinc/; revision=24250
is a "runtime outlier", i.e. its runtime does
not correspond to the job's rsc_fpops_est.
Runtime outliers are not counted in the statistics for
elapsed time, turnaround time, and peak FLOPs count.
The is intended for applications like SETI@home,
some of whose jobs finish more or less instantly
(this happens if the data contains a lot of interference).
If a host happens to get a bunch of these short jobs,
its statistics will get skewed: in essence, the server
will think that the host is extremely fast,
and will send it too many jobs.
svn path=/trunk/boinc/; revision=24225
- scheduler: when using elapsed time stats to predict runtime,
cap the estimated FLOPS at twice the peak FLOPS;
otherwise, if a host has received a lot of very short jobs
recently, it will get a too-high FLOPS estimate and
will exceed the rsc_fpops_bound limit.
svn path=/trunk/boinc/; revision=24128
which prevented the client from cleaning up
subprocesses of misbehaving multiprocess apps.
- remote job submission system:
assign physical names to input files (based on their MD5)
rather than having the user provide physical names
- VM apps: eliminate vbox64 plan class. Only vbox.
svn path=/trunk/boinc/; revision=23923
- client: cc_config.xml: if <devnum> is omitted from a <exclude_gpu>,
it means exclude all instances of that GPU type
- client: if all instances of a GPU type are excluded for a project,
don't ask the project for jobs of that type
svn path=/trunk/boinc/; revision=23898
- add fields to batch table, extend APIs accordingly
- require that example web interface run on BOINC server
(this makes many things easier;
an actual remote interface would require a bit more work)
svn path=/trunk/boinc/; revision=23881
and controlling batches of jobs
- web: add an administrative interface for controlling
user permissions for submitting jobs
- web: add an interface where users can view and control
their submitted jobs
See: http://boinc.berkeley.edu/trac/wiki/RemoteJobs
This is at a functional but rough stage.
svn path=/trunk/boinc/; revision=23762
Lets you specify, on a per-app basis,
that all instances should be done using the same app version.
This is for validation in the presence of GPUs.
- scheduler: code cleanup
- Instead of adding a bunch of non-DB fields to RESULT,
used a derived class SCHED_DB_RESULT.
- Instead of storing a pointer to BEST_APP_VERSION in RESULT,
store the structure itself.
This simplifies the memory allocation situation.
- client: condition "Got server request to delete file" messages
on <file_xfer_debug>
svn path=/trunk/boinc/; revision=23636
Additions to request message:
<not_started_dur>X</not_started_dur>
<in_progress_dur>X</in_progress_dur>
The estimated remaining duration of unstarted
and in-progress tasks
Additions to reply message, within <project>, optional:
<suspend>0|1</suspend>
suspend or resume project (overrides local state)
<abort_not_started>0|1</abort_not_started>
if set, abort unstarted jobs
svn path=/trunk/boinc/; revision=22698
Remove funky file-writing stuff - just use caching.
fixes#913
- web: include link to server status page on sample front page
svn path=/trunk/boinc/; revision=22361
It wasn't copying html/ops/db_update.php,
so it wasn't doing necessary DB updates.
Fixed this by always copying html/ops/*.
Even with this fix, there is a problem when using
the --server_only or --web_only options of upgrade:
if any DB updates are done, they may affect the
server code that's not being updated, resulting in crashes.
I added a warning message in this case,
recommending that a full upgrade be done.
svn path=/trunk/boinc/; revision=22200
That produced a messed-up query that assigned garbage values to:
host_app_version.turnaround_var
host_app_version.turnaround_q
host_app_version.max_jobs_per_day
host_app_version.consecutive_valid
To repair these:
- set turnaround_var and turnaround_q to zero
- if max_jobs_per_day is outside of
(0..config.daily_result_quota)
set it to config.daily_result_quota
- if consecutive_valid is outside (0..1000), set it to zero
I added a script, html/ops/repair_21812.php, that does this;
if you ran server code between [21181] and [21812], run this script.
- scheduler/transitioner: add <debug_quota> log flag
- changed the build system to always use -Wall
(if we'd done this before, this bug wouldn't have happened)
- fixed a bunch of other compile warnings
svn path=/trunk/boinc/; revision=21812
- whether host is "reliable" for an app version
- whether host is eligible for single replication for an app version
- whether to use host scaling
In each case, the answer is yes if the number of
consecutive valid results is above a threshold.
This replaces existing "error rate" and "scale probation" mechanisms.
TODO: the # of consecutive valid results should also determine
a limit on jobs in progress for an app version.
Namely, if N is the threshold for host scaling, the limit should be
ndevices*(max(1, consecutive_valid - N))
The client currently doesn't supply enough
app version info to do this.
It could be approximated; that would give some protection
against cherry-picking.
- credit: more conservative formulas for combining claimed credit
among replicas.
If there are normal replicas, we use a "low average"
that weights each sample by the sum of the other samples.
Otherwise we use the min (not the average) of the approximate samples.
NOTE: a DB update is required
svn path=/trunk/boinc/; revision=21230