The punitive mechanism was scanning for results with validate state INIT.
This is wrong because the scheduler immediately flags results
with client error as INVALID.
Fix: remove validate state check.
Also, don't update validate state; not needed any more.
Say that a job has a "long-term failure" if it fails in a way
(as evidenced by its exit code and/or stderr)
suggesting that other jobs for that (host, app version) will fail too.
In this case we want to avoid sending more jobs to that (host, app version).
This commit implements that feature.
To use it, have your validator's init_result() return
VAL_RESULT_LONG_TERM_FAIL if it finds a long-term failure,
and run your validator with the --check_punitive option.
("Punitive" because we're "punishing" the host for its failure).
The validator punishes the (host, app version) by
setting host_app_version.max_jobs_per_day to 1.
One job per day can still be sent.
That way if the underlying problem is fixed
(e.g. the user enables VM acceleration in the BIOS)
we'll eventually go back to normal.
Also: normally HAV.max_jobs_per_day is scaled by the number
of CPUs and GPUs.
This scaling is disabled when the value is 1.
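For illustration, a minimal init_result() along these lines;
the stderr test is hypothetical, not from this change
(framework headers and the RESULT type assumed):

    int init_result(RESULT& r, void*& data) {
        // ... parse and check the result's output files as usual ...

        // hypothetical long-term failure: without VM acceleration,
        // every job for this (host, app version) will fail the same way
        if (strstr(r.stderr_out, "VT-x/AMD-V not available")) {
            return VAL_RESULT_LONG_TERM_FAIL;
        }
        return 0;
    }

Remember to run the validator with --check_punitive,
or the return value has no effect.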
Add --post_assigned_credit option to validator.
If set, the validator takes claimed credit from result.claimed_credit
(put there by the project's init_result() function).
The claimed credit of the canonical result is the job's granted credit.
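A minimal sketch of the init_result() side under this option;
compute_claim() is a made-up stand-in for the project's logic:

    int init_result(RESULT& r, void*& data) {
        // ... open and parse the result's output files as usual ...

        // store the project-computed claim; with --post_assigned_credit
        // the validator reads it back from result.claimed_credit
        r.claimed_credit = compute_claim(r);
        return 0;
    }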
Also changed --credit_from_runtime so that it averages
claimed credit across instances,
instead of just using the canonical instance.
You can now pre-assign a job's credit, as described here:
https://boinc.berkeley.edu/trac/wiki/CreditOptions
Note: this feature was originally available via an
--additional_xml "<credit>xx</credit>" arg to create_work.
This is an ugly kludge; I removed it.
In fact, the --additional_xml arg should be removed at some point.
Also: change stage_file so that it cd's to html/bin when including files;
this is needed since util_basic.inc now includes additional files.
The validator handler can now pass unknown arguments
to the project-specific handler.
Projects that have their own validator need to implement the
validate_handler_init() function and handle project-specific
arguments there.
They also need to supply a validate_handler_usage() function
that printf()'s a description of the custom options.
For examples see sample_substr_validator.cpp or script_validator.cpp.
The validator test harness was also adapted to use these new functions.
This brings the validator framework to the same level as the
assimilator framework, where similar changes were made in
0038d275c and dd004404a.
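A minimal sketch of the two hooks; see the sample validators for
the exact signatures, and the --ignore_blanks option here is made up:

    #include <cstdio>
    #include <cstring>

    bool ignore_blanks = false;

    int validate_handler_init(int argc, char** argv) {
        // handle the arguments the framework didn't recognize
        for (int i=1; i<argc; i++) {
            if (!strcmp(argv[i], "--ignore_blanks")) {
                ignore_blanks = true;
            } else {
                fprintf(stderr, "unknown option: %s\n", argv[i]);
                return 1;
            }
        }
        return 0;
    }

    void validate_handler_usage() {
        // printed as part of the validator's usage message
        printf("    --ignore_blanks  ignore whitespace when comparing files\n");
    }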
The SETI@home result table is about to run out of 32-bit IDs,
so we need to move to 64-bit result IDs.
This will happen to the workunit table at some point too.
I changed the server C++ code to use the "long" type for all DB IDs
(and to use appropriate conversion codes like %lu).
"long" is 64 bit on 64-bit machines.
For uniformity I did this for all tables,
even ones (like app) that will never get big.
I chose NOT to change the DB schema for now.
The new code will work with 32-bit ID fields in the DB.
As projects approach the 32-bit limit on a table they can change
its ID field, and fields that reference this table, to BIGINT.
This is likely to happen only on the result and workunit tables.
I put functions in html/ops/db_update.php
to change the IDs of these tables.
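On the C++ side the change is mostly a type and conversion-code
substitution; a sketch of the pattern (names illustrative):

    long id = 0;    // DB ID; formerly int. "long" is 64 bit on 64-bit machines
    char buf[256];
    snprintf(buf, sizeof(buf), "select * from result where id=%ld", id);

The schema side (switching id fields, and fields that reference
them, to BIGINT) is what the db_update.php functions handle.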
See http://boinc.berkeley.edu/trac/wiki/PerAppCredit
If enabled (by the <credit_by_app> config flag)
validators will maintain credit totals and recent averages
on a per-(app, user, credit type) basis,
and likewise for teams,
in new DB tables credit_user and credit_team.
This info is displayed on the web site, on user and team pages,
using project-supplied functions to generate the HTML.
Note: update_stats doesn't decay the recent-average values
for per-app credit; I'll add this if needed.
This is meant not to break anything, just add some
(optional) logging and features needed for Einstein@Home.
Please contact me before changing or removing any of this.
- add --is_gzip option to sample_bitwise_validator.
If set, all files are treated as gzip archives.
The validator checks each file's 10-byte gzip header to verify
that it's a gzip file, but ignores the header when comparing
files (see the sketch after this list).
- validator.cpp: don't error out on unparsed cmdline args,
since we're now using them in sample_bitwise_validator
and sample_substr_validator.
- fix build error on Debian
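A minimal sketch of the gzip case, assuming the standard 10-byte
member header with magic bytes 0x1f 0x8b (the header contains an
mtime field, which can legitimately differ between otherwise
identical archives, hence skipping it):

    #include <cstdio>

    // 0 on success; 'match' is set by comparing everything after
    // the 10-byte headers (illustrative, not the actual
    // sample_bitwise_validator code)
    int compare_gzip(FILE* f1, FILE* f2, bool& match) {
        unsigned char h1[10], h2[10];
        if (fread(h1, 1, 10, f1) != 10) return 1;
        if (fread(h2, 1, 10, f2) != 10) return 1;
        if (h1[0] != 0x1f || h1[1] != 0x8b) return 1;   // not gzip
        if (h2[0] != 0x1f || h2[1] != 0x8b) return 1;
        int c1, c2;
        do {
            c1 = fgetc(f1);
            c2 = fgetc(f2);
            if (c1 != c2) { match = false; return 0; }
        } while (c1 != EOF);
        match = true;
        return 0;
    }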
check_set() wasn't returning "retry" properly in the case where
one of the calls to init_result() returns ERR_OPEN_DIR
(treated as a transient failure, since it can be caused
by a failed NFS mount).
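A sketch of the intended semantics (not the actual check_set()
code; names are illustrative):

    #include <vector>

    int init_all(std::vector<RESULT>& results, std::vector<void*>& data,
                 bool& retry) {
        retry = false;
        for (size_t i = 0; i < results.size(); i++) {
            int retval = init_result(results[i], data[i]);
            if (retval == ERR_OPEN_DIR) {
                // transient, e.g. a failed NFS mount:
                // tell the caller to retry later, don't judge the WU now
                retry = true;
                return 0;
            }
            if (retval) return retval;   // hard failure
        }
        return 0;
    }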
A "viable" result is one that could potentially become the canonical result,
i.e. the outcome is SUCCESS and the validate state is not INVALID.
The existing code treated all results with outcome SUCCESS as viable,
which is wrong.
In particular, this could cause workunit.target_nresults
to be incremented inappropriately.
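Stated as code, using the standard outcome/validate-state
constants (the function name is illustrative):

    // a result is viable iff it could still become canonical
    inline bool is_viable(RESULT& r) {
        return r.outcome == RESULT_OUTCOME_SUCCESS
            && r.validate_state != VALIDATE_STATE_INVALID;
    }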
- validator: add some sanity-checking for credit,
to prevent granting 1e38 credit.
max_granted_credit now defaults to the equivalent of
1 TeraFLOP-year (see the sketch below).
Instances that exceed this are not counted in the credit
calculation, and a critical-mode log message is written.
- wrapper: remove wall_cpu_time; not used anymore
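For reference, what that default works out to, assuming the usual
Cobblestone scale of 200 credits per GFLOPS-day (the constant
actually used in the code may differ):

    // 1 TFLOPS for 1 year = 1e12 * 86400 * 365 FLOPs = ~3.15e19
    const double COBBLESTONE_SCALE = 200/86400e9;   // credit per FLOP
    const double MAX_GRANTED_CREDIT_DEFAULT =
        1e12 * 86400 * 365 * COBBLESTONE_SCALE;     // ~7.3e7 credit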
svn path=/trunk/boinc/; revision=25825
but checks for the "stop_daemons" trigger file every 1 sec.
Use this instead of sleep() in daemons.
This will speed up bin/stop.
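A minimal sketch of such a sleep replacement, assuming a
check_stop_daemons() helper that exits when the trigger file
is present:

    #include <unistd.h>

    void daemon_sleep(int nsecs) {
        // sleep nsecs seconds total, but notice bin/stop within ~1 sec
        for (int i = 0; i < nsecs; i++) {
            check_stop_daemons();   // exits if "stop_daemons" exists
            sleep(1);
        }
    }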
svn path=/trunk/boinc/; revision=25708
If you link your functions (init_result(), compare_results(),
cleanup_result()) with validate_test.cpp,
you'll get a program that you can run as
validate_test file1 file2
and it will compare the two files
(this works only for validators that expect 1 file per result).
I added a makefile, sched/makefile_validator_test,
that you can use for this.
- server: shuffle code so that the above doesn't need to
link MySQL libraries
- client: if we fetch a master file and it contains no scheduler URLs,
show a message of class INTERNAL_ERROR
- client/scheduler: make CUDA_DEVICE_PROP.totalGlobalMem a double,
and remove dtotalGlobalMem.
Although NVIDIA reports RAM size as a size_t,
there's no reason to store it as an integer after that.
svn path=/trunk/boinc/; revision=25542
depending on how many the host has,
and whether CPU VM extensions are present
(this reflects the requirements of CernVM).
svn path=/trunk/boinc/; revision=25009
use result.flops_estimate rather than host.p_fpops;
otherwise it doesn't work for multicore apps.
TODO: cheat-proofing
svn path=/trunk/boinc/; revision=25006
is a "runtime outlier", i.e. its runtime does
not correspond to the job's rsc_fpops_est.
Runtime outliers are not counted in the statistics for
elapsed time, turnaround time, and peak FLOPs count.
This is intended for applications like SETI@home,
some of whose jobs finish more or less instantly
(this happens if the data contains a lot of interference).
If a host happens to get a bunch of these short jobs,
its statistics will get skewed: in essence, the server
will think that the host is extremely fast,
and will send it too many jobs.
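One plausible use from a project's init_result(); the threshold
and the flagging field are assumptions here, not from this change:

    int init_result(RESULT& r, void*& data) {
        // ... normal validation setup ...

        // jobs that finish almost instantly (e.g. data dominated by
        // interference) don't reflect the host's true speed
        const double MIN_PLAUSIBLE_RUNTIME = 10;   // seconds; made up
        if (r.elapsed_time < MIN_PLAUSIBLE_RUNTIME) {
            r.runtime_outlier = true;
        }
        return 0;
    }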
svn path=/trunk/boinc/; revision=24225
This assigns credit proportional to runtime*p_fpops.
To prevent cheating, p_fpops is capped at the 95th percentile value
among active hosts,
and runtime is capped at a specified limit
(see the sketch after this list).
This option supports apps, like LHC's CERNvm app,
that run for a certain amount of time and then exit.
The CreditNew system doesn't work for such apps.
- trickle_credit:
To prevent cheating,
cap p_fpops at the 95th percentile value among active hosts,
and require a limit on runtime.
- require that trickle handlers supply an initialization function
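A sketch of the capped computation both options describe, again
assuming the 200-credits-per-GFLOPS-day Cobblestone scale
(names illustrative):

    // credit proportional to runtime * p_fpops, with anti-cheat caps
    double runtime_credit(double runtime, double p_fpops,
                          double p_fpops_95th, double runtime_cap) {
        if (p_fpops > p_fpops_95th) p_fpops = p_fpops_95th;
        if (runtime > runtime_cap) runtime = runtime_cap;
        return runtime * p_fpops * 200/86400e9;   // FLOPs -> credit
    }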
svn path=/trunk/boinc/; revision=24182