and other operations.
You can now designate a user as "manager" for a particular app.
They can then:
- control job-submit permissions for that app
- deprecate/undeprecate versions of the app.
- abort jobs for that app
You can also designate a user a manage for the project.
They can then edit permissions and quotas,
as well as performing the app-specific functions for all apps.
This is described here:
http://boinc.berkeley.edu/trac/wiki/MultiUser#Accesscontrol
This required some changes to the DB schema.
svn path=/trunk/boinc/; revision=24250
is a "runtime outlier", i.e. its runtime does
not correspond to the job's rsc_fpops_est.
Runtime outliers are not counted in the statistics for
elapsed time, turnaround time, and peak FLOPs count.
The is intended for applications like SETI@home,
some of whose jobs finish more or less instantly
(this happens if the data contains a lot of interference).
If a host happens to get a bunch of these short jobs,
its statistics will get skewed: in essence, the server
will think that the host is extremely fast,
and will send it too many jobs.
svn path=/trunk/boinc/; revision=24225
This assigns credit proportional to runtime*p_fpops.
To prevent cheating, p_fpops is capped at the 95th percentile value
among active hosts,
and runtime is capped at a specified limit.
This option supports apps, like LHC's CERNvm app,
that run for a certain amount of time and then exit.
The CreditNew system doesn't work for such apps.
- trickle_credit:
To prevent cheating,
cap p_fpops at the 95th percentile value among active hosts,
and require a limit on runtime.
- require that trickle handlers supply an initialization function
svn path=/trunk/boinc/; revision=24182
Otherwise, if you have a deprecated app with >= 200 jobs
(200 is the query's limit)
it could always get jobs for that app,
and never put anything into the cache.
svn path=/trunk/boinc/; revision=24142
- scheduler: when using elapsed time stats to predict runtime,
cap the estimated FLOPS at twice the peak FLOPS;
otherwise, if a host has received a lot of very short jobs
recently, it will get a too-high FLOPS estimate and
will exceed the rsc_fpops_bound limit.
svn path=/trunk/boinc/; revision=24128
Add parsed_tag and is_tag to the class,
so that parsing functions don't need to declare them
and pass them around.
- Complete the task of using XML_PARSER as the argument
to all parsing functions.
(Internally, many of these functions still use the old XML parser;
that's the next step.)
svn path=/trunk/boinc/; revision=23978
- add fields to batch table, extend APIs accordingly
- require that example web interface run on BOINC server
(this makes many things easier;
an actual remote interface would require a bit more work)
svn path=/trunk/boinc/; revision=23881
(structure, not table) for AQUA
- client, Windows: when wake up from hibernation,
get the time before printing log msg
svn path=/trunk/boinc/; revision=23784
and controlling batches of jobs
- web: add an administrative interface for controlling
user permissions for submitting jobs
- web: add an interface where users can view and control
their submitted jobs
See: http://boinc.berkeley.edu/trac/wiki/RemoteJobs
This is at a functional but rough stage.
svn path=/trunk/boinc/; revision=23762
(so that they don't have to be 1 element/line)
and also allow optional <input_template> root element
- fix bug in WORKUNIT DB interface
svn path=/trunk/boinc/; revision=23648
Lets you specify, on a per-app basis,
that all instances should be done using the same app version.
This is for validation in the presence of GPUs.
- scheduler: code cleanup
- Instead of adding a bunch of non-DB fields to RESULT,
used a derived class SCHED_DB_RESULT.
- Instead of storing a pointer to BEST_APP_VERSION in RESULT,
store the structure itself.
This simplifies the memory allocation situation.
- client: condition "Got server request to delete file" messages
on <file_xfer_debug>
svn path=/trunk/boinc/; revision=23636
- protect malloc.h
- validator: allow to update 'random' result field
- assimilator: add global variables results_prefix and transcripts_prefix
svn path=/trunk/boinc/; revision=23241
whether news items are exported as notices.
The creator of a news item is shown a "Don't export" or "Export"
button on the thread page.
By default, news items are exported.
svn path=/trunk/boinc/; revision=23119
for some WUs
- back end: fix the way "report grace period" is implemented
old: result.report_deadline (i.e. what's in the DB) and
the deadline sent to the client are the same.
Some confusing and incorrect logic in the transitioner
tries to provide the desired semantics.
new: result.report_deadline is the deadline sent to the client,
plus the grace period.
No logic in the transitioner is needed.
svn path=/trunk/boinc/; revision=23040
(for many platforms) from samples/example_app/bin
- make_project: change name of example app from uppercase to example_app.
- update_versions: allow version numbers to not have decimal points
- sample work generator: make app name and template files
command-line options;
default to "example_app", "example_app_in.xml", "example_app_out.xml"
svn path=/trunk/boinc/; revision=22667
That produced a messed-up query that assigned garbage values to:
host_app_version.turnaround_var
host_app_version.turnaround_q
host_app_version.max_jobs_per_day
host_app_version.consecutive_valid
To repair these:
- set turnaround_var and turnaround_q to zero
- if max_jobs_per_day is outside of
(0..config.daily_result_quota)
set it to config.daily_result_quota
- if consecutive_valid is outside (0..1000), set it to zero
I added a script, html/ops/repair_21812.php, that does this;
if you ran server code between [21181] and [21812], run this script.
- scheduler/transitioner: add <debug_quota> log flag
- changed the build system to always use -Wall
(if we'd done this before, this bug wouldn't have happened)
- fixed a bunch of other compile warnings
svn path=/trunk/boinc/; revision=21812
Old: config.xml specifies an initial daily quota (say, 100).
Each host_app_version starts out with this quota.
On the return of a SUCCESS result,
the quota is doubled, up to the initial value.
On the return of an error result, or a timeout,
the quota is decremented down to 1.
Problem:
Doesn't accommodate hosts that can do more than 100 jobs/day.
New: similar, but
- on validation of a job, daily quota is incremented.
- on invalidation of a job, daily quota is decremented.
- on return of an error result, or a timeout,
daily quota is min'd with initial quota, then decremented.
Notes:
- This allows a host to have an unboundedly large quota
as long as it continues to return more valid
than invalid results.
- Even with this change, hosts that return SUCCESS but
invalid results will continue to get the initial daily quota.
It would be desirable to reduce their quota to 1.
svn path=/trunk/boinc/; revision=21675
- whether host is "reliable" for an app version
- whether host is eligible for single replication for an app version
- whether to use host scaling
In each case, the answer is yes if the number of
consecutive valid results is above a threshold.
This replaces existing "error rate" and "scale probation" mechanisms.
TODO: the # of consecutive valid results should also determine
a limit on jobs in progress for an app version.
Namely, if N is the threshold for host scaling, the limit should be
ndevices*(max(1, consecutive_valid - N))
The client currently doesn't supply enough
app version info to do this.
It could be approximated; that would give some protection
against cherry-picking.
- credit: more conservative formulas for combining claimed credit
among replicas.
If there are normal replicas, we use a "low average"
that weights each sample by the sum of the other samples.
Otherwise we use the min (not the average) of the approximate samples.
NOTE: a DB update is required
svn path=/trunk/boinc/; revision=21230
- daily quota mechanism
- reliable mechanism (accelerated retries)
- "trusted" mechanism (adaptive replication)
- scheduler: enforce host scale probation only for apps with
host_scale_check set.
- validator: do scale probation on invalid results
(need this in addition to error and timeout cases)
- feeder: update app version scales every 10 min, not 10 sec
- back-end apps: support --foo as well as -foo for options
Notes:
- If you have, say, cuda, cuda23 and cuda_fermi plan classes,
a host will have separate quotas for each one.
That means it could error out on 100 jobs for cuda_fermi,
and when its quota goes to zero,
error out on 100 jobs for cuda23, etc.
This is intentional; there may be cases where one version
works but not the others.
- host.error_rate and host.max_results_day are deprecated
TODO:
- the values in the app table for limits on jobs in progress etc.
should override rather than config.xml.
Implementation notes:
scheduler:
process_request():
read all host_app_versions for host at start;
Compute "reliable" and "trusted" for each one.
write modified records at end
get_app_version():
add "reliable_only" arg; if set, use only reliable versions
skip over-quota versions
Multi-pass scheduling: if have at least one reliable version,
do a pass for jobs that need reliable,
and use only reliable versions.
Then clear best_app_versions cache.
Score-based scheduling: for need-reliable jobs,
it will pick the fastest version,
then give a score bonus if that version happens to be reliable.
When get back a successful result from client:
increase daily quota
When get back an error result from client:
impose scale probation
decrease daily quota if not aborted
Validator:
when handling a WU, create a vector of HOST_APP_VERSION
parallel to vector of RESULT.
Pass it to assign_credit_set().
Make copies of originals so we can update only modified ones
update HOST_APP_VERSION error rates
Transitioner:
decrease quota on timeout
svn path=/trunk/boinc/; revision=21181
TODO: remove related code
- validator: update wu.canonical_credit correctly.
However, this field should be deprecated.
- validator: check for error return from assign_credit_set().
svn path=/trunk/boinc/; revision=21096
are written to the DB.
The incremental approach was bogus.
New approach:
host_app_version: write directly; R/W interval is tiny
app_version: maintain an explicit list of update samples
for both PFC and credit.
When the validator flushes its app_version cache,
do careful updates.
Note: when using double fields in careful updates,
you can't test for equality. Use abs(new-old) < 1e-N
svn path=/trunk/boinc/; revision=21057
see http://boinc.berkeley.edu/trac/wiki/CreditNew
Projects will need to update DB and recompile all back-end programs.
Summary:
- new way of computing credit
- "reliable host" mechanism is per app version
- "host punishment" mechanism is per app version
- adjustment of wu.rsc_fpops_est provides the
equivalent of per app version DCF
- max jobs in progress is now per app
- max jobs per RPC is now per app
TODO:
- reliable mechanism:
- populate and use host_app_version.error_rate
- populate host_app_version.turnaround
- host punishment:
- populate host_app_version.max_jobs_per_day
- populate host_app_version.n_jobs_today
- use app.max_jobs_per_day_init
- job limits:
- use app.max_jobs_in_progress, max_gpu_jobs_in_progress
- use app.max_jobs_per_rpc
- adjust wu.rsc_fpops_est
- remove old credit stuff
fpops_cumulative, credit_multiplier
credit computation in scheduler
- AVERAGE class: use the Knuth algorithm (Wikipedia)
svn path=/trunk/boinc/; revision=21021
However, MySQL's default is that "affected rows" is
rows actually modified, which is not what we want.
Use the CLIENT_FOUND_ROWS option in mysql_real_connect()
to change the semantics to "rows matched".
From Oliver Bock.
svn path=/trunk/boinc/; revision=20880
New policy: anon platform and old platform jobs
get average credit, possibly scaled by elapsed time.
We no longer attempt to guess what app version produced them.
svn path=/trunk/boinc/; revision=20816
Triggering the work generator is now done via the DB
instead of flat files.
Since only E@h uses locality scheduling,
I kept the DB changes in a separate file (db/schema_locality.sql).
There's a new field in the workunit table,
and that's a required update (in db_update.php)
- manager: compile fix
svn path=/trunk/boinc/; revision=20807
elapsed_time: the elapsed time (runtime) as reported by client
flops_estimate: the app's estimated FLOPS as reported by app_plan()
app_version_id: the DB ID of the app_version used
(or -1 if anonymous platform)
TODO: show these in the web interfaces,
and use them where appropriate
svn path=/trunk/boinc/; revision=19002
Old:
1) check deadline based on wu.delay_bound
2) in add_result_to_reply(), potentially modify wu.delay_bound,
e.g. because of retry acceleration
problem: reducing delay bound may cause deadline miss
New:
1) new function get_delay_bound_range()
(called from wu_is_infeasible_fast())
returns optimistic and pessimistic delay bounds.
Retry acceleration logic is here.
2) check deadline based on optimistic bound;
if that fails, check based on pessimistic bound.
Set wu.delay_bound to the one that worked.
Notes:
- get_delay_bound_range() needs result priority and report deadline,
and it's called before we read the full result.
So add these items to WORK_ITEM and WU_RESULT.
- get_delay_bound_range() could be customized for
project-specific deadline policy.
- add_result_to_reply() was becoming a toxic waste dump.
Deadline-related stuff should have been factored out in any case.
svn path=/trunk/boinc/; revision=18946
don't modify user preferences or CPID.
- client: fix bug that shows ATI version incorrectly
- database: host.posts has been repurposed as a salt (or seqno)
for a new type of weak authenticator that won't depend on password
- web code:
modify forum_preferences.posts instead of host.posts.
(actually, the former isn't used either, we just do a select count(*);
should fix this at some point).
svn path=/trunk/boinc/; revision=18865
after freeing the MYSQL_RES it came from.
(this didn't appear to cause any problems, but not good form).
Fixes#883
svn path=/trunk/boinc/; revision=17904
and add <cuda_multiplier>.
The latter is used in calculating max jobs/day for a host;
namely, it's host.max_results_day * (NCPUS + NCUDA*cuda_multiplier).
Set it to 10 or so if you have CUDA apps.
- scheduler: don't overload effective_ncpus();
instead, add two new functions,
max_results_day_multiplier() and max_wus_in_progress_multiplier()
- scheduler: don't reduce max_results_day if we get an aborted job
(it might have been aborted by the project;
not appopriate to punish host in this case)
svn path=/trunk/boinc/; revision=16959
put a textual summary of them in host.serialnum (currently unused)
- web: show coprocs on host detail page
- db_dump: include coproc info in host XML
svn path=/trunk/boinc/; revision=16697
is still alive before handling a request. If not, try to reconnect.
This will hopefully make things work better if MySQL goes down and up
when using FCGI.
svn path=/trunk/boinc/; revision=16112
we're the main program (otherwise we didn't lock it in
the first place, and a crash results). From Artyom Sharov.
- scheduler: add support for the GCL simulator,
which uses special versions of backend programs
that use virtual time,
and that wait for signals instead of sleep()ing.
To compile:
make clean
configure CXXFLAGS="-DGCL_SIMULATOR"
make
svn path=/trunk/boinc/; revision=16038
- API: in APP_INIT_DATA, enclose project preferences in tags
so that it's legal XML
- scheduler: add <multiple_clients_per_host> option.
Use this if your project runs on Condor or grids
and (to use multicore machines) you're running
multiple clients per host.
This will skip the host lookup based on IP address.
svn path=/trunk/boinc/; revision=15954
added "count" field to DB table to keep track of how many times
we've refreshed.
- show refresh schedule on main courses page
- default for random structure is all units, not 1
svn path=/trunk/boinc/; revision=15846
- scheduler: fix bug in adaptive replication:
if send an unreplicated job to untrusted host,
set both wu.target_nresults and wu.min_quorum to app.target_nresults.
svn path=/trunk/boinc/; revision=15762
wish to use it.
- The script calculate_credit_multiplier (expected to be run daily as
a config.xml task) looks at the ratio of granted credit to CPU time
for recent results for each app. Multiplier is calculated to cause
median hosts granted credit per cpu second to equal to equal that
expected from its benchmarks. This is 30-day exponentially averaged
with the previous value of the multplier and stored in the table
credit_multplier.
- When a result is received the server adjusts claimed credit by the
value the multiplier had when the result was sent.
svn path=/trunk/boinc/; revision=15661
If an app is hard, the scheduler always does the deadline check,
even if the client has no other jobs for this project.
And the estimated wallclock duration is multiplied by 1.3,
to avoid sending jobs to hosts that will barely make the deadline.
Hard apps are marked by setting weight = -1.
This is a total kludge, to avoid adding another field to app.
svn path=/trunk/boinc/; revision=15607