For projects (like Lattice) that assign a WU's HR class when it's created,
we don't want the mechanism that clears the HR class
if there are error results and no in-progress or completed results.
This option suppresses that mechanism.
See http://boinc.berkeley.edu/trac/wiki/PerAppCredit
If enabled (by the <credit_by_app> config flag),
validators will maintain credit on a per-(app, user, credit type) basis,
and likewise for teams,
in the new DB tables credit_user and credit_team.
This info is displayed in the web site, on user and team pages,
using project-supplied functions to generate the HTML.
Note: update_stats doesn't decay the recent-average values
for per-app credit; I'll add this if needed.
There are now 3 flags for job dispatch logging:
<debug_send/>: info about work request, jobs sent, other high-level stuff
<debug_send_scan/>: info about scans through job cache
<debug_send_job/>: info about individual jobs (e.g. reason for not sending)
Previously, if a project specified a limit on GPU jobs in progress,
it would be enforced across GPU types.
This could lead to starvation for hosts with multiple GPU types.
E.g. the limit is 10, and a host has 10 NVIDIA jobs and no AMD jobs.
Fix this by enforcing limits separately for each GPU type.
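A minimal Python sketch of the old and new rules, using the 10-job example above. The function names and the "nvidia"/"amd" labels are hypothetical. The real check lives in the C++ scheduler.

```python
from collections import Counter

def can_send_gpu_job_old(jobs_in_progress, limit):
    """Old rule (hypothetical helper): one limit shared across all GPU types."""
    return len(list(jobs_in_progress)) < limit

def can_send_gpu_job(gpu_type, jobs_in_progress, limit):
    """New rule (hypothetical helper): the limit applies to each GPU type separately.

    jobs_in_progress holds one GPU type name per in-progress job.
    """
    counts = Counter(jobs_in_progress)
    return counts[gpu_type] < limit
```

With 10 NVIDIA jobs and a limit of 10, the old rule refuses an AMD job. The new rule allows it.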
nvidia plan classes in plan_class_spec.xml
- SCHED: Scheduler was not using properly estimated performance when assigning
work. It was using theoretical performance to choose version and actual
performance to determine how long it would take. I've changed that to start
with theoretical performance and converge to actual performance as
host_app_version pfc_n increases.
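The note doesn't give the convergence formula. The Python sketch below assumes a simple linear ramp over a hypothetical `ramp` number of samples. It only illustrates the idea of moving from theoretical to measured performance as pfc_n grows.

```python
def estimated_flops(theoretical, actual, pfc_n, ramp=10):
    """Blend theoretical and measured performance (illustrative only).

    With no samples the theoretical value is used. The weight on the
    measured value grows linearly with pfc_n and reaches 1 at `ramp`
    samples. `ramp` is a hypothetical tuning parameter, not a BOINC setting.
    """
    if pfc_n <= 0 or actual is None:
        return theoretical
    w = min(pfc_n / ramp, 1.0)
    return (1.0 - w) * theoretical + w * actual
```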
- SCHED: Added some additional app version selection debugging output.
allows projected_flops to be calculated from host_app_version pfc rather
than elapsed time. This is valuable if result elapsed times are highly
variable and dependent on input.
- don't use devices for which work is not being requested
- obey wu_is_infeasible_custom()
(e.g. don't send SETI@home VLAR jobs to GPUs)
- scheduler: add <debug_array_detail> log flag for slot-level messages
- admin web: show and allow control of app.beta
- add a config item vda_host_timeout.
A host that hasn't done a scheduler RPC for this long
is considered dead.
- a host that's not running a version 7+ client is considered dead
- host.cpu_efficiency (an otherwise unused field) is used
as a flag for dead hosts
- the scheduler clears the flag if the client is v7+
- vdad sets the flag for hosts where last RPC is old
- before choosing a host for chunk download,
vdad checks its client version.
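The two "dead host" rules can be restated as a small predicate. In this Python sketch the Host fields are hypothetical stand-ins, and timestamps are in seconds. The real code records the result in host.cpu_efficiency, which the sketch omits.

```python
from dataclasses import dataclass

@dataclass
class Host:
    rpc_time: float            # hypothetical: time of last scheduler RPC
    client_major_version: int  # hypothetical: major version of the BOINC client

def host_is_dead(host, now, vda_host_timeout):
    """Dead if the last RPC is older than vda_host_timeout,
    or if the host isn't running a version 7+ client."""
    if now - host.rpc_time > vda_host_timeout:
        return True
    return host.client_major_version < 7
```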
svn path=/trunk/boinc/; revision=26059
make per-HR slot allocation an option rather than the default.
Kevin reported that slot allocation wasn't working for WCG.
The default is now no slot allocation,
using the regular result enumeration function
rather than the one that scans the entire table.
The config flag for enabling slot allocation is <hr_allocate_slots/>.
svn path=/trunk/boinc/; revision=25432
Previously (this was little known) the scheduler could be hacked to preserve
sched_request.xml and sched_reply.xml in their own directories
(you had to modify the initial value of use_files in sched_main.cpp).
This feature can now be switched on and off on the fly just by
changing the project config.
When an existing directory is configured as
<debug_req_reply_dir>, each scheduler instance will write three
files there: PID_C_sched.log, PID_C_sched_request.xml and (if all
goes well) PID_C_sched_reply.xml. PID is the process ID of the
scheduler instance; C is an internal counter within the process if
FCGI is used. The sched.log contains nothing but the
PID and the IP address of the client. This should make it possible to
identify the scheduler instance responsible for a given
Apache error log message ("premature end of script headers") when
a scheduler crashes. sched_request.xml is the scheduler
request, and if the scheduler doesn't crash in between, the
reply to the client is also kept in sched_reply.xml.
Remove the <debug_req_reply_dir> tag from the project config
to turn this feature off.
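A Python sketch of the on/off check and the three file names described above. It assumes the names are formed exactly as "PID_C_" followed by the suffix. Both helper functions are hypothetical.

```python
import os

def debug_enabled(debug_dir):
    """The feature is active only if <debug_req_reply_dir> names an existing directory."""
    return debug_dir is not None and os.path.isdir(debug_dir)

def debug_file_paths(debug_dir, pid, counter):
    """Paths for PID_C_sched.log, PID_C_sched_request.xml, PID_C_sched_reply.xml."""
    prefix = f"{pid}_{counter}_"
    return [os.path.join(debug_dir, prefix + name)
            for name in ("sched.log", "sched_request.xml", "sched_reply.xml")]
```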
svn path=/trunk/boinc/; revision=25349
one instance together in the scheduler.log when multiple instances are
running. Currently the buffer has a fixed size of 32768 characters.
On one hand, with much debug output this buffer may turn out to be
too small. On the other hand, the log of this instance is completely lost in case
of a crash, which doesn't help with debugging. Thus make the
scheduler log buffer size configurable using the tag
<scheduler_log_buffer> in the project config. The default value is
still the old size (32768); set it to 0 to disable buffering
completely, e.g. for debugging.
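A Python sketch of a size-configurable log buffer where 0 means "write through". The overflow behavior (flush what's buffered, then keep buffering) is an assumption. The note doesn't say what the scheduler does when the buffer fills.

```python
import io

class BufferedLog:
    """Per-instance log buffer, loosely modeled on <scheduler_log_buffer>.

    buffer_size == 0 disables buffering: each message is written at once.
    Otherwise messages accumulate and flush() writes them in one piece.
    If a new message would overflow the buffer, the buffered text is
    flushed first (assumed behavior).
    """

    def __init__(self, out, buffer_size=32768):
        self.out = out
        self.buffer_size = buffer_size
        self.parts = []
        self.used = 0

    def write(self, msg):
        if self.buffer_size == 0:
            self.out.write(msg)
            return
        if self.parts and self.used + len(msg) > self.buffer_size:
            self.flush()
        self.parts.append(msg)
        self.used += len(msg)

    def flush(self):
        if self.parts:
            self.out.write("".join(self.parts))
        self.parts.clear()
        self.used = 0


# Usage: one instance's messages stay together until the request is done.
sink = io.StringIO()
log = BufferedLog(sink, buffer_size=64)
log.write("[PID 1234] request from 1.2.3.4\n")
log.write("[PID 1234] sent 2 jobs\n")
log.flush()
```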
svn path=/trunk/boinc/; revision=25348
not scan the host table. This was previously hardcoded for
Einstein@home to prevent some users with many (identical) hosts
from flooding the DB with slow queries. Now add
<dont_search_host_for_userid>userid</dont_search_host_for_userid>
to the project config (in config.xml) for each such userid.
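A Python sketch of reading the repeated <dont_search_host_for_userid> elements and consulting them. It assumes the usual <boinc><config>...</config></boinc> layout of config.xml. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

def parse_dont_search_userids(config_xml):
    """Collect every <dont_search_host_for_userid> value from config.xml text."""
    root = ET.fromstring(config_xml)
    return {int(e.text.strip()) for e in root.iter("dont_search_host_for_userid")}

def should_search_host_table(userid, excluded):
    """Skip the (slow) host-table scan for the listed user IDs."""
    return userid not in excluded
```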
svn path=/trunk/boinc/; revision=25346
we multiply projected FLOPS by a normal random variable
with mean 1 and stddev 0.1.
Make the stddev configurable; in particular it can be zero.
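A Python sketch of the perturbation with a configurable standard deviation. The function name is hypothetical. With stddev 0 the projected value passes through unchanged.

```python
import random

def perturbed_flops(projected_flops, stddev, rng=random):
    """Multiply projected FLOPS by a normal random variable with mean 1.

    stddev was previously fixed at 0.1; with stddev == 0 the
    result is exactly projected_flops.
    """
    if stddev == 0:
        return projected_flops
    return projected_flops * rng.gauss(1.0, stddev)
```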
svn path=/trunk/boinc/; revision=25311
This now supports two main use cases:
1) there's a job that you want to run once on all hosts,
present and future
(or all hosts belonging to a user, or to a team).
The job is never transitioned, validated, or assimilated.
2) There's a normal job for which you want to use only
hosts belonging to a specific user (e.g. cluster or cloud hosts).
This restriction can be made either when the job is created,
or on the fly,
e.g. as part of a scheme for accelerating batch completion.
For the latter purpose we now provide a function
restrict_wu_to_user(DB_WORKUNIT&, int userid);
The job goes through the standard
transitioner/validator/assimilator path.
These cases are enabled by config flags
<enable_assignment_multi/>
<enable_assignment/>
respectively.
Assignments of type 2) are no longer stored in shared mem,
so there is no limit on their number.
There is no longer a rule that assigned job names must contain "asgn".
NOTE: this requires a database update.
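A Python sketch of the two eligibility checks. Workunit and MultiAssignment are hypothetical in-memory stand-ins, and team targeting is omitted. restrict_wu_to_user here only mirrors the name of the C++ function. The real one records the assignment in the database.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Workunit:
    id: int
    assigned_userid: Optional[int] = None  # None: not restricted

def restrict_wu_to_user(wu, userid):
    """Analogue of restrict_wu_to_user(DB_WORKUNIT&, int userid) (case 2)."""
    wu.assigned_userid = userid

def can_send(wu, host_userid):
    """Case 2: a restricted job goes only to that user's hosts."""
    return wu.assigned_userid is None or wu.assigned_userid == host_userid

@dataclass
class MultiAssignment:
    """Case 1: run once on every host (optionally only one user's hosts)."""
    wu_id: int
    target_userid: Optional[int] = None  # None: all hosts
    sent_hosts: set = field(default_factory=set)

    def should_send(self, hostid, host_userid):
        if self.target_userid is not None and host_userid != self.target_userid:
            return False
        return hostid not in self.sent_hosts
```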
svn path=/trunk/boinc/; revision=25169
Add parsed_tag and is_tag to the class,
so that parsing functions don't need to declare them
and pass them around.
- Complete the task of using XML_PARSER as the argument
to all parsing functions.
(Internally, many of these functions still use the old XML parser;
that's the next step.)
svn path=/trunk/boinc/; revision=23978
If set, and a WU has nonzero batch,
it is interpreted as a user ID,
and the job will be sent only to hosts with that user ID.
Note: the use of workunit.batch is arbitrary;
we could also use workunit.opaque or another deprecated field.
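The dispatch rule as a Python predicate. The option's config name isn't given above, so it appears here only as a hypothetical boolean `option_enabled`.

```python
def host_allowed(option_enabled, wu_batch, host_userid):
    """With the option set, a nonzero workunit.batch is read as a user ID
    and the job may go only to that user's hosts."""
    if not option_enabled or wu_batch == 0:
        return True
    return wu_batch == host_userid
```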
svn path=/trunk/boinc/; revision=23556
for some WUs
- back end: fix the way "report grace period" is implemented
old: result.report_deadline (i.e. what's in the DB) and
the deadline sent to the client are the same.
Some confusing and incorrect logic in the transitioner
tries to provide the desired semantics.
new: result.report_deadline is the deadline sent to the client,
plus the grace period.
No logic in the transitioner is needed.
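A Python sketch of the new scheme, using wu.delay_bound as the time allowed after sending. The helper names are hypothetical. Because the DB deadline already includes the grace period, the timeout check becomes a plain comparison.

```python
def deadlines(sent_time, delay_bound, grace_period):
    """Return (deadline sent to client, result.report_deadline stored in DB)."""
    client_deadline = sent_time + delay_bound
    return client_deadline, client_deadline + grace_period

def timed_out(report_deadline, now):
    """No grace-period logic needed: compare with the stored deadline."""
    return now > report_deadline
```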
svn path=/trunk/boinc/; revision=23040
(in config.xml) to include DB name, user, and password.
- back end: add read-only replica info to SCHED_CONFIG,
so that C++ programs can use the replica
(currently only PHP code can use it)
- db_dump: use the read-only DB replica if it exists.
svn path=/trunk/boinc/; revision=22958
If set, the feeder doesn't read jobs into shmem,
and the scheduler doesn't send jobs.
Intended for use when a project wants to process
a backlog of completed jobs and not issue more.
svn path=/trunk/boinc/; revision=22601
- scheduler: add max_download_urls_per_file config option
(to limit the length of workunit.xml_doc,
which is currently capped at 64KB).
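A Python sketch of the limit. Two details are assumptions: that the first N URLs are kept, and that a value of 0 means "no limit". The note specifies neither.

```python
def urls_for_file(urls, max_download_urls_per_file):
    """Keep at most max_download_urls_per_file URLs for one input file,
    so the generated XML stays within workunit.xml_doc's size cap."""
    urls = list(urls)
    if max_download_urls_per_file <= 0:
        return urls  # assumption: 0 disables the limit
    return urls[:max_download_urls_per_file]
```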
From Bernd.
svn path=/trunk/boinc/; revision=22082
This feature lets you run the BOINC client as a job on grid systems
that handle only 1-CPU jobs;
it disables various mechanisms that prevent multiple clients per host
(which is normally a bad thing).
Old:
- Run the client with a --allow_multiple_clients flag.
This tells it not to use a mutex that prevents
multiple clients per host.
- Run the project with the <multiple_clients_per_host> config flag.
This suppresses two mechanisms:
- (avoid duplicate host records)
on a scheduler request with no host ID,
looks for a host with same domain name, OS type,
and mem size, and assumes the request is from that host
- (job retry)
If we get a request that doesn't have a host ID
but does have a host CPID,
mark its in-progress results as over
NOTE: I CAN'T REMEMBER WHY WE SUPPRESS THIS;
MARK S, DO YOU REMEMBER?
Problem:
if the grid clients attach to a project that
doesn't use <multiple_clients_per_host>, bad things happen.
E.g., if there are several requests at about the same time,
most of them will fail with
"another RPC already in progress" errors.
If a project does include this flag,
it loses protection from duplicate host records.
New:
- If the client is run with --allow_multiple_clients flag,
it passes a <allow_multiple_clients> element
in scheduler requests.
- The scheduler skips the duplicate-host check on
requests that include this flag.
- There is no more <multiple_clients_per_host> scheduler option.
Note: if a project using the old mechanism upgrades to this change,
it will need to use new clients for its grid deployment.
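A Python sketch of the new scheduler-side behavior. Requests and hosts are plain dicts. The field names (domain_name, os_name, m_nbytes) follow the description above but are assumptions here, and the real matching code is C++.

```python
def find_existing_host(request, hosts):
    """Return the host record for a request, or None to create a new one.

    A request without a host ID normally matches an existing host with the
    same domain name, OS type and memory size (duplicate-host check).
    Requests carrying <allow_multiple_clients> skip that check.
    """
    if request.get("hostid"):
        return hosts.get(request["hostid"])
    if request.get("allow_multiple_clients"):
        return None
    key = (request["domain_name"], request["os_name"], request["m_nbytes"])
    for h in hosts.values():
        if (h["domain_name"], h["os_name"], h["m_nbytes"]) == key:
            return h
    return None
```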
svn path=/trunk/boinc/; revision=21839
That produced a messed-up query that assigned garbage values to:
host_app_version.turnaround_var
host_app_version.turnaround_q
host_app_version.max_jobs_per_day
host_app_version.consecutive_valid
To repair these:
- set turnaround_var and turnaround_q to zero
- if max_jobs_per_day is outside of
(0..config.daily_result_quota)
set it to config.daily_result_quota
- if consecutive_valid is outside (0..1000), set it to zero
I added a script, html/ops/repair_21812.php, that does this;
if you ran server code between [21181] and [21812], run this script.
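The repair rules restated in Python for a single host_app_version row held as a dict. The actual fix is the PHP script above, run against the DB. This sketch assumes the "(0..N)" ranges are inclusive.

```python
def repair_hav(hav, daily_result_quota):
    """Apply the repair rules to one host_app_version row (a dict)."""
    hav["turnaround_var"] = 0
    hav["turnaround_q"] = 0
    if not (0 <= hav["max_jobs_per_day"] <= daily_result_quota):
        hav["max_jobs_per_day"] = daily_result_quota
    if not (0 <= hav["consecutive_valid"] <= 1000):
        hav["consecutive_valid"] = 0
    return hav
```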
- scheduler/transitioner: add <debug_quota> log flag
- changed the build system to always use -Wall
(if we'd done this before, this bug wouldn't have happened)
- fixed a bunch of other compile warnings
svn path=/trunk/boinc/; revision=21812
This file originally used code from the following tutorial,
which shows how to open a window using GLUT:
http://nehe.gamedev.net/data/lessons/lesson.asp?lesson=01
The code has now been completely rewritten;
in particular, it doesn't use GLUT anymore.
- scheduler: change default limit on #CPUs from 16 to 64
svn path=/trunk/boinc/; revision=21784
Old: back off until random time in 1st hour of next day
New: no server-dictated backoff; rely on client backoff
This is needed to let hosts recover in a reasonable amount of time
after a burst of errors.
- scheduler config: it turns out we can't put arbitrary XML in config.xml;
The Python code is set up to parse only 1 level of tags (??),
and I'm not up to the task of changing this.
So the fine-grained job limit feature [21674] needs to use
a different file, namely config_aux.xml
svn path=/trunk/boinc/; revision=21686
You can now specify limits for specific apps,
and/or for the project as a whole.
Within each of these, you can specify limits on
CPU jobs, GPU jobs, or total jobs.
In the case of CPU and GPU limits, you can specify
whether the limit should be scaled by the number of devices.
Note: the enforcement of this is done in get_app_version(),
since per-resource-type limits may dictate what app versions
we can use for a particular job.
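A Python sketch of per-resource limits, with optional scaling by device count. It also shows why the check sits where the app version is chosen: hitting the GPU limit can still leave a CPU version usable. The classes and functions are hypothetical. A caller would apply both the per-app and the project-wide JobLimits, each with its own job counts.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Limit:
    n: int
    per_device: bool = False  # if True, n is multiplied by the device count

    def effective(self, ndevices):
        return self.n * ndevices if self.per_device else self.n

@dataclass
class JobLimits:
    cpu: Optional[Limit] = None
    gpu: Optional[Limit] = None
    total: Optional[Limit] = None  # total limit is not scaled by devices

def under_limits(lim, ncpus, ngpus, cpu_jobs, gpu_jobs, use_gpu):
    """True if one more CPU job (use_gpu=False) or GPU job fits under lim."""
    if lim.total and cpu_jobs + gpu_jobs >= lim.total.n:
        return False
    if use_gpu:
        return not (lim.gpu and gpu_jobs >= lim.gpu.effective(ngpus))
    return not (lim.cpu and cpu_jobs >= lim.cpu.effective(ncpus))

def choose_resource(lim, ncpus, ngpus, cpu_jobs, gpu_jobs):
    """Prefer a GPU app version; fall back to CPU if the GPU limit is hit."""
    if ngpus and under_limits(lim, ncpus, ngpus, cpu_jobs, gpu_jobs, True):
        return "gpu"
    if under_limits(lim, ncpus, ngpus, cpu_jobs, gpu_jobs, False):
        return "cpu"
    return None
```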
svn path=/trunk/boinc/; revision=21674
see http://boinc.berkeley.edu/trac/wiki/CreditNew
Projects will need to update DB and recompile all back-end programs.
Summary:
- new way of computing credit
- "reliable host" mechanism is per app version
- "host punishment" mechanism is per app version
- adjustment of wu.rsc_fpops_est provides the
equivalent of per app version DCF
- max jobs in progress is now per app
- max jobs per RPC is now per app
TODO:
- reliable mechanism:
- populate and use host_app_version.error_rate
- populate host_app_version.turnaround
- host punishment:
- populate host_app_version.max_jobs_per_day
- populate host_app_version.n_jobs_today
- use app.max_jobs_per_day_init
- job limits:
- use app.max_jobs_in_progress, max_gpu_jobs_in_progress
- use app.max_jobs_per_rpc
- adjust wu.rsc_fpops_est
- remove old credit stuff
fpops_cumulative, credit_multiplier
credit computation in scheduler
- AVERAGE class: use the Knuth algorithm (Wikipedia)
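The Knuth algorithm referred to here is the single-pass running mean/variance update (often credited to Welford). A minimal Python sketch follows. Whether BOINC's AVERAGE class reports sample or population variance isn't stated, so the choice of sample variance is an assumption.

```python
class Average:
    """Running mean and variance, updated one sample at a time
    (Welford/Knuth method)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def variance(self):
        """Sample variance (assumption); 0 until there are two samples."""
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0
```

Unlike summing x and x^2 and subtracting at the end, this update avoids catastrophic cancellation when the variance is small relative to the mean.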
svn path=/trunk/boinc/; revision=21021