the first few jobs of a new application
(in wu_estimated_pfc(), only multiply by app.min_avg_pfc
if it's nonzero).
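A minimal sketch of the guard (plain arguments stand in for the WORKUNIT/DB_APP fields; an approximation of the new logic, not the exact code):

    // Only scale by the app's min_avg_pfc once it is nonzero, i.e. once
    // the app has accumulated some credit statistics.
    double estimated_pfc(double rsc_fpops_est, double app_min_avg_pfc) {
        double x = rsc_fpops_est;
        if (app_min_avg_pfc) {
            x *= app_min_avg_pfc;
        }
        return x;
    }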
svn path=/trunk/boinc/; revision=25484
in plan_class_spec by using coproc_perf() and capped_host_fpops()
(moved coproc_perf() to sched_customize.h to make it available
in plan_class_spec.cpp, and cleaned up includes)
svn path=/trunk/boinc/; revision=25467
for which the anonymous-platform client doesn't have a version)
mark it as sent so the transitioner can do its thing
svn path=/trunk/boinc/; revision=25461
make per-HR slot allocation an option rather than the default.
Kevin reported that slot allocation wasn't working for WCG.
The default is now no slot allocation,
and use the regular result enumeration function
rather than the one that scans the entire table.
The config flag for enabling slot allocation is <hr_allocate_slots/>.
svn path=/trunk/boinc/; revision=25432
Previously (and little known), the scheduler could be hacked to preserve
the sched_request.xml and sched_reply.xml in their own directories
(you had to modify the initial value of use_files in sched_main.cpp).
This feature can now be switched on and off on the fly just by
changing the project config.
When there is an (existing) directory configured as
<debug_req_reply_dir>, each scheduler instance will write three
files in there: PID_C_sched.log, PID_C_sched_request.xml and (if all
goes well) PID_C_sched_reply.xml. PID is the process id of this
scheduler instance, C is an internal counter within the process if
FCGI is used. The sched.log will contain nothing other than the
PID and the IP address of the client. This should allow
identifying the scheduler instance responsible for a given
Apache error log message ("premature end of script headers") when
a scheduler instance crashes. sched_request.xml (obviously) is the scheduler
request, and if the scheduler doesn't crash in between, there will
also be the reply to the client, kept in sched_reply.xml.
Remove the <debug_req_reply_dir> tag from the project config
to turn this feature off.
svn path=/trunk/boinc/; revision=25349
one instance together in the scheduler.log when multiple instances are
running. Currently the buffer has a fixed size of 32768 characters.
On one hand, with a lot of debug output this buffer may turn out to be
too small. On the other hand, the buffered log of an instance is completely
lost in case of a crash, which doesn't help with debugging. Thus make the
scheduler log buffer size configurable using the tag
<scheduler_log_buffer> in the project config. The default value is
still the old size (32768); set it to 0 to disable buffering
completely, e.g. for debugging.
svn path=/trunk/boinc/; revision=25348
not scan the host table. This was previously hardcoded for
Einstein@home to prevent some users with many (identical) hosts
from flooding the DB with slow queries. Now add
<dont_search_host_for_userid>userid</dont_search_host_for_userid>
to the project config (in config.xml) for each such userid.
svn path=/trunk/boinc/; revision=25346
by default we skip app versions that use a resource
for which work has not been requested.
This is determined by the "check_req" arg to get_app_version().
This flag is cleared whenever we want to send a job
regardless of whether a requested resource can be used:
namely, when resending lost jobs, and when sending assigned jobs.
Fix a bug that could skip unrequested versions even
when check_req is false.
NOTES:
1) The current semantics aren't right.
When check_req is false, we select the fastest of all app versions,
including those for which no work is requested.
Instead, we should select the fastest of the versions
for which work is requested, if there are any;
otherwise, select the fastest version (see the sketch below).
2) The mechanism isn't implemented for anonymous platform.
It should be.
3) If we've cached an answer (including NULL) for a given
value of check_req, that answer may be wrong for a different value.
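A sketch of the rule proposed in note 1), with a made-up struct standing in for the scheduler's app-version bookkeeping:

    #include <vector>

    struct AV_CHOICE {
        double projected_flops;
        bool resource_requested;  // client asked for work for this version's resource
    };

    // Prefer the fastest version whose resource was requested;
    // fall back to the fastest version overall only if none was.
    AV_CHOICE* choose_version(std::vector<AV_CHOICE>& avs) {
        AV_CHOICE* best = 0;
        AV_CHOICE* best_req = 0;
        for (size_t i = 0; i < avs.size(); i++) {
            AV_CHOICE& av = avs[i];
            if (!best || av.projected_flops > best->projected_flops) {
                best = &av;
            }
            if (av.resource_requested &&
                (!best_req || av.projected_flops > best_req->projected_flops)
            ) {
                best_req = &av;
            }
        }
        return best_req ? best_req : best;
    }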
svn path=/trunk/boinc/; revision=25342
file_deleter.cpp into a separate program,
since it blocks normal file deletion while it's running.
From Bernd.
- storage stuff
svn path=/trunk/boinc/; revision=25321
to be sent to non-targeted hosts.
The feeder was erroneously putting targeted jobs
in the shared mem cache.
Changes:
- The feeder only enumerates jobs for which
workunit.transitioner_flags is zero.
NOTE: this field is nonzero iff the job is assigned.
- create_work: when creating an assigned job,
set workunit.transitioner_flags appropriately
svn path=/trunk/boinc/; revision=25314
we multiply projected FLOPS by a normal random variable
with mean 1 and stddev 0.1.
Make the stddev configurable; in particular it can be zero.
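For example (a sketch; the scheduler's actual RNG helper may differ):

    #include <random>

    // Multiply projected FLOPS by a normal variate with mean 1 and the
    // configured stddev; stddev == 0 disables the randomization.
    double randomize_flops(double flops, double stddev) {
        if (stddev == 0) return flops;
        static std::mt19937 rng{std::random_device{}()};
        std::normal_distribution<double> dist(1.0, stddev);
        return flops * dist(rng);
    }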
svn path=/trunk/boinc/; revision=25311
if we're making a scheduler RPC to a project for reasons
other than work fetch,
and we're deciding whether to ask for work, ignore hysteresis;
i.e. ask for work even if we're above the min buffer
(idea from John McLeod).
svn path=/trunk/boinc/; revision=25291
- scheduler: parse d_project_share
- scheduler: if vbox and vbox_mt are both available,
use vbox for a 1-CPU machine
svn path=/trunk/boinc/; revision=25176
This now supports two main use cases:
1) there's a job that you want to run once on all hosts,
present and future
(or all hosts belonging to a user, or to a team).
The job is never transitioned, validated, or assimilated.
2) There's a normal job for which you want to use only
hosts belonging to a specific user (e.g. cluster or cloud hosts).
This restriction can be made either when the job is created,
or on the fly,
e.g. as part of a scheme for accelerating batch completion.
For the latter purpose we now provide a function
restrict_wu_to_user(DB_WORKUNIT&, int userid);
The job goes through the standard
transitioner/validator/assimilator path.
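For example (a sketch; assumes the usual boinc_db.h context, and wu_id/userid are placeholders):

    #include "boinc_db.h"

    // Restrict an existing job, on the fly, to hosts belonging to one user,
    // e.g. to finish a batch on cluster or cloud hosts.
    int restrict_job(int wu_id, int userid) {
        DB_WORKUNIT wu;
        int retval = wu.lookup_id(wu_id);
        if (retval) return retval;
        return restrict_wu_to_user(wu, userid);
    }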
These cases are enabled by config flags
<enable_assignment_multi/>
<enable_assignment/>
respectively.
Assignments of type 2) are no longer stored in shared mem,
so there is no limit on their number.
There is no longer a rule that assigned job names must contain "asgn".
NOTE: this requires a database update.
svn path=/trunk/boinc/; revision=25169
of the full 2 CPUs. Vboxwrapper uses ceil() to allocate enough
whole CPUs for VirtualBox. Ideally this will cause the BOINC
client-side scheduler to use the remaining fraction of the CPU
for GPU data transfer, which will then free up one whole CPU for
another job. All without over-committing anything.
sched/
sched_customize.cpp
svn path=/trunk/boinc/; revision=25120
Some credit cheats (e.g. with credit_by_runtime) can be done
by reporting a huge value.
Fix this by capping the value at 1.1 times the 95th percentile
of host.p_fpops, taken over active hosts.
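In effect (a sketch; computing the percentile over the host table isn't shown):

    #include <algorithm>

    // Cap a host's claimed p_fpops at 1.1x the 95th-percentile value
    // among active hosts before using it in credit calculations.
    double capped_fpops(double claimed_p_fpops, double p_fpops_95th_percentile) {
        return std::min(claimed_p_fpops, 1.1 * p_fpops_95th_percentile);
    }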
svn path=/trunk/boinc/; revision=25017
depending on how many the host has,
and whether CPU VM extensions are present
(this reflects the requirements of CernVM).
svn path=/trunk/boinc/; revision=25009
use result.flops_estimate rather than host.p_fpops;
otherwise it doesn't work for multicore apps.
TODO: cheat-proofing
svn path=/trunk/boinc/; revision=25006
If found, set HOST_INFO::p_vm_extensions_disabled,
and pass this to the scheduler.
- scheduler (VBox app plan function): if a host has p_vm_extensions_disabled
set, don't send it multicore VBox jobs.
Note: if you have a host with VM extensions, and they're disabled
in the BIOS, and you enable them, you can remove the
<p_vm_extensions_disabled> line from client_state.xml
and you'll be eligible to get multicore VM jobs again.
svn path=/trunk/boinc/; revision=24944
scale their PFC by 0.1 in credit calculations.
This reflects the fact that GPU apps are typically less efficient
(relative to device peak FLOPS) than are CPU apps.
The actual values from SETI@home and Milkyway are 0.05 and 0.08.
svn path=/trunk/boinc/; revision=24842
to a superseded or deprecated app version, use it anyway.
The current app version may not validate against the old one.
svn path=/trunk/boinc/; revision=24823
- client: msg tweak
- client: minimum work buffer lower bound is 180 sec
- scheduler: in computing HOST_USAGE::project_flops for a job,
if we don't have sufficient elapsed_time statistics
for either the (host, app_version) or the app_version,
use a conservative estimate (p_fpops*(#cpus+#ngpus))
rather than the number returned by app_plan().
This avoids "time limit exceeded" errors when the latter is way off.
svn path=/trunk/boinc/; revision=24820
reduce backoff intervals somewhat
- vboxwrapper: fix buffer size typo (from Attila)
- scheduler: fix crash if using homogeneous app version,
and a WU is committed to an old or deprecated app version.
From Kevin Reed.
svn path=/trunk/boinc/; revision=24775
Tells multicore apps how many cores to use.
The --nthreads command line arg to the app is now deprecated,
though we'll keep it around for the time being.
svn path=/trunk/boinc/; revision=24708
If the file "client_opaque.txt" exists on the client,
include its contents in scheduler request messages.
On the scheduler, parse this into SCHEDULER_REQUEST::client_opaque,
where it can be used by the customizable scheduler functions.
svn path=/trunk/boinc/; revision=24586
this is to support SETI@home, which ran out of result IDs
and changed the DB field type to int unsigned.
Note: eventually I'll make this change official
and change the .h types as well.
- web: put <apps_selected> tags around <app_id> elements
in project-specific prefs.
svn path=/trunk/boinc/; revision=24555
the hr_class and app_version_id fields,
with the where clause that they be either zero or the target value.
This handles the cases where
1) because of the failure of a result, the transitioner set
the field back to zero;
2) another scheduler set the field to the target value
svn path=/trunk/boinc/; revision=24513
are assumed to be for NVIDIA GPU apps;
plan class names containing 'ati' are assumed to be for AMD GPU apps.
Clauses for 'nvidia' were missing in a couple of places.
svn path=/trunk/boinc/; revision=24512
In the inner loop of scan_work_array() there are two WORKUNITs:
- the one that's part of wu_result (in the shared-mem array)
- a temp copy.
quick_check() may modify the temp copy in host-specific ways
(e.g., adjusting rsc_fpops_est or delay_bound).
The temp copy is the one we pass to add_result_to_reply().
When we reread hr_class and app_version_id from the DB,
update both structs.
svn path=/trunk/boinc/; revision=24493
(reported by Kevin Reed).
The problem: cache inconsistency.
If there are 2 results for the same WU in shared mem,
and 2 scheduler instances get them around the same time,
they can send them with different app versions.
We already fixed this problem for HR by
1) rereading the relevant WU fields while deciding
whether to send the result
2) doing a "careful update" of the WU field using a where clause
to make sure it wasn't modified in the (short) interval
since rereading it.
I fixed the HAV problem in the same way,
and merged the two mechanisms to combine the DB queries.
Also:
- The rereads are done in slow_check() (see below).
- The careful updates are done in update_wu_on_send(),
and this is called *before* doing careful updates on result fields.
That way, if the WU updates fail, we don't have orphaned results.
- already_sent_to_different_platform_careful() (sic)
no longer does DB stuff, so it's merged with
already_send_to_different_hr_class() (better name)
NOTE: slow_check() is used in array scheduling only.
Score-based scheduling uses other code,
in which this bug is not yet fixed.
Locality scheduling doesn't support HR or HAV at all.
This should be unified.
svn path=/trunk/boinc/; revision=24484
(in sched_customize.cpp)
the flops_scale argument is intended to express the
GPU efficiency (actual/peak).
Pass appropriate values.
svn path=/trunk/boinc/; revision=24405
This will show pending uploads in the Transfers tab.
- file_upload_handler: fix message to client when can't acquire lock
- client: parse <alt_platform> in state file correctly
svn path=/trunk/boinc/; revision=24391
The problem: the choice of app version was based on
the "projected FLOPS" return by estimate_flops(av).
If usage stats exist for the host / app version,
this returns a number X such that
WU.rsc_fpops_est/X approximates the runtime of a job
using the given app version.
(If WU.rsc_fpops_est is way off, this will be correspondingly way off
from the actual FLOPS the app version will get.)
However, if there are no usage stats,
it returns an estimate based on host hardware speed,
which might be 100X less.
Hence, in some cases a new app version would never get used.
Solution: choose app versions based on the values
returned by the app plan functions.
Use estimate_flops() AFTER choosing the version.
- scheduler: improve the accuracy of FLOPS estimation for GPU apps.
The "flops_scale" argument to coproc_perf
(which expresses the difference between peak GPU FLOPS
and actual FLOPS) should be used to scale GPU FLOPS
prior to calling coproc_perf(),
rather than scaling the estimate returned by coproc_perf().
- show_shmem: show have_X_apps flags
svn path=/trunk/boinc/; revision=24385
for when the job completed successfully but
one or more output files had permanent upload failures.
Show this state in web interfaces.
- sample_work_generator: check return value of count_unsent_results(),
so that we don't generate infinite work if there's a DB problem
- web: RSS feed shows news items from last 90 days, rather than 14
svn path=/trunk/boinc/; revision=24377
the boundary between days is 00:00 in server local time.
This creates a spike of jobs being dispatched
(and files being downloaded) after that time.
Solution: distribute the boundary uniformly,
using a random number determined by the host ID.
(Make sure to save/restore the seed around this,
so we don't destroy the randomness of other things)
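One simple way to get the per-host spread (a sketch using a deterministic hash rather than the seeded-RNG approach described above, so the global RNG is untouched):

    // Map a host ID to an offset in [0, 86400) seconds, uniform across hosts,
    // used to shift that host's day boundary away from server-local midnight.
    int host_day_boundary_offset(int hostid) {
        unsigned int h = (unsigned int)hostid * 2654435761u;  // multiplicative hash
        return (int)(h % 86400u);
    }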
svn path=/trunk/boinc/; revision=24353
to match those in the clGetDeviceInfo() calls.
Principles:
- if there's already a name for something, use it.
- follow case conventions
svn path=/trunk/boinc/; revision=24344
don't use name as a tiebreaker.
That would typically group jobs of the same application,
and (it is believed that) things run faster when
applications are mixed.
- scheduler: bug: if a client gets host-specific prefs
(e.g. from an account manager)
it will send only the working prefs to the scheduler.
The scheduler then always sends back the DB prefs,
overwriting the host-specific prefs.
Fix: note the mod time in the working prefs,
and only send the DB prefs if they're more recent.
svn path=/trunk/boinc/; revision=24332
- Fix build problems on Mac OS X using autotools
- Consistently use #if HAVE_X for platform checks,
rather than #ifdef HAVE_X or #if defined(HAVE_X)
- In Unix build, make lots of compiler checks standard
- Fix some compile warnings
From Matt Arsenault.
Note: there are now lots of compile warnings in clientgui/ on Unix,
mostly in WxWidgets code
svn path=/trunk/boinc/; revision=24303
is a "runtime outlier", i.e. its runtime does
not correspond to the job's rsc_fpops_est.
Runtime outliers are not counted in the statistics for
elapsed time, turnaround time, and peak FLOPs count.
This is intended for applications like SETI@home,
some of whose jobs finish more or less instantly
(this happens if the data contains a lot of interference).
If a host happens to get a bunch of these short jobs,
its statistics will get skewed: in essence, the server
will think that the host is extremely fast,
and will send it too many jobs.
svn path=/trunk/boinc/; revision=24225
- measure the available RAM of each GPU when BOINC starts up.
If this fails, set available = physical.
Show available RAM in startup messages.
- use available RAM rather than physical RAM in selecting
the "best" GPU instance
- report available RAM to the scheduler
TODO: change the scheduler to use available rather than physical
if it's reported
svn path=/trunk/boinc/; revision=24210
This assigns credit proportional to runtime*p_fpops.
To prevent cheating, p_fpops is capped at the 95th percentile value
among active hosts,
and runtime is capped at a specified limit (see the sketch below).
This option supports apps, like LHC's CERNvm app,
that run for a certain amount of time and then exit.
The CreditNew system doesn't work for such apps.
- trickle_credit:
To prevent cheating,
cap p_fpops at the 95th percentile value among active hosts,
and require a limit on runtime.
- require that trickle handlers supply an initialization function
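A sketch of the capped runtime-based credit from the first item above (argument names and the per-FLOP scale are placeholders, not the project's exact constants):

    #include <algorithm>

    // Credit proportional to runtime * p_fpops, with both inputs capped:
    // p_fpops at the 95th-percentile value among active hosts,
    // runtime at a project-specified limit.
    double runtime_based_credit(
        double runtime, double p_fpops,
        double p_fpops_95th_percentile, double runtime_limit,
        double credit_per_flop  // placeholder scale factor
    ) {
        double secs = std::min(runtime, runtime_limit);
        double flops = std::min(p_fpops, p_fpops_95th_percentile);
        return secs * flops * credit_per_flop;
    }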
svn path=/trunk/boinc/; revision=24182
types for which we have no app versions
- client: if too many <coproc> elements in cc_config.xml,
detect it and inform user
svn path=/trunk/boinc/; revision=24144
- scheduler: when using elapsed time stats to predict runtime,
cap the estimated FLOPS at twice the peak FLOPS;
otherwise, if a host has received a lot of very short jobs
recently, it will get a too-high FLOPS estimate and
will exceed the rsc_fpops_bound limit.
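Roughly (a sketch; the names are placeholders for the host/app_version stats involved):

    #include <algorithm>

    // FLOPS estimate implied by the host's recent elapsed-time stats,
    // clamped to twice peak speed so a run of very short jobs can't
    // inflate it and push later jobs over rsc_fpops_bound.
    double capped_flops_estimate(
        double rsc_fpops_est, double avg_elapsed_time, double peak_flops
    ) {
        double est = rsc_fpops_est / avg_elapsed_time;
        return std::min(est, 2 * peak_flops);
    }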
svn path=/trunk/boinc/; revision=24128
app version for their platform for a particular app.
There may be versions for other apps which don't have jobs right now.
TODO: send a message if there are no versions of ANY app
for any platform.
- fix makefile indentation, which caused the manager to not be built
svn path=/trunk/boinc/; revision=24052
reporting incremental runtime every x seconds of runtime.
- client: more XML parsing cleanup
- credit trickle handler: do sanity checks on CPU speed
svn path=/trunk/boinc/; revision=24017
Add parsed_tag and is_tag to the class,
so that parsing functions don't need to declare them
and pass them around.
- Complete the task of using XML_PARSER as the argument
to all parsing functions.
(Internally, many of these functions still use the old XML parser;
that's the next step.)
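The resulting parse-function style looks roughly like this (a sketch; the struct and tag names are made up, and it assumes the usual parse.h/error_numbers.h includes):

    #include "parse.h"          // XML_PARSER
    #include "error_numbers.h"  // ERR_XML_PARSE

    struct MY_THING {
        double speed;
        int count;
        int parse(XML_PARSER& xp);
    };

    // With parsed_tag and is_tag held inside XML_PARSER, a parse function
    // no longer declares a tag buffer or passes it around.
    int MY_THING::parse(XML_PARSER& xp) {
        while (!xp.get_tag()) {
            if (xp.match_tag("/my_thing")) return 0;
            if (xp.parse_double("speed", speed)) continue;
            if (xp.parse_int("count", count)) continue;
        }
        return ERR_XML_PARSE;
    }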
svn path=/trunk/boinc/; revision=23978
was doing memset(this, 0, sizeof(RESULT)),
i.e. it wasn't zeroing out the whole structure.
The elapsed_time field (which isn't reported by old clients)
is near the end of the struct,
and it was getting garbage, e.g. 1e-304, in some cases,
which led to zero credit (and maybe other problems)
- validator: treat 1e-304 like zero in case of other problems
like the above.
- remote job submission: tweaks
svn path=/trunk/boinc/; revision=23947
plan classes after all.
Otherwise (since app_plan() is not passed an app version)
there's no way to enforce that 64 bit hosts are sent
only the 64 bit version (which is necessary because
of the split-registry scheme).
svn path=/trunk/boinc/; revision=23935
which prevented the client from cleaning up
subprocesses of misbehaving multiprocess apps.
- remote job submission system:
assign physical names to input files (based on their MD5)
rather than having the user provide physical names
- VM apps: eliminate vbox64 plan class. Only vbox.
svn path=/trunk/boinc/; revision=23923
as described here: http://boinc.berkeley.edu/trac/wiki/ClientDataModel
Compatibility: if your project is using upload signatures:
- set ignore_upload_certificates
- disable job creation
- let your job queue drain
- upgrade to new server software
- clear ignore_upload_certificates
- enable job creation
svn path=/trunk/boinc/; revision=23863
- don't create result records for uploads and downloads.
Just create a msg_to_client record.
- the scheduler handles file-transfer results specially;
it makes a vector of them, then calls a project-supplied function
handle_file_xfer_results()
- change the interface and implementation of put_file and get_file
- client: write project sched priority in GUI RPC replies,
but not to the state file
svn path=/trunk/boinc/; revision=23857
adjust project REC by the amount of work queued, to increase variety
NOTE: at some point I think I had a reason to not do this,
but I can't remember what it is.
- client, job scheduling policy: fix how project REC is adjusted
svn path=/trunk/boinc/; revision=23838
PFC values should be around 1.
If they differ from 1 by a factor of > 1e4, ignore them,
and put an error message into the validator log (see the sketch below)
- validator: if get_pfc() fails because an app version is
missing from the DB (i.e. the project deleted it)
keep going so we don't reprocess the WU forever
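A sketch of the factor-of-1e4 sanity check from the first item (standalone; the validator's real logging goes through its own log, not stderr):

    #include <cstdio>

    // Raw PFC values should be near 1; anything off by more than a factor
    // of 1e4 is logged and left out of the credit averages.
    bool pfc_is_sane(double pfc) {
        if (pfc > 1e4 || pfc < 1e-4) {
            fprintf(stderr, "ignoring implausible PFC %e\n", pfc);
            return false;
        }
        return true;
    }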
svn path=/trunk/boinc/; revision=23837
of the simulation, not the scenario.
If you want to run a simulation w/ different log flags,
you shouldn't have to create a new scenario.
- client emulator: add --config_prefix cmdline arg
- validator: prevent infinite loop when app_version.pfc_avg
is wonky (like 1e-300).
Next step: figure out how it got that way.
svn path=/trunk/boinc/; revision=23828
Lets you specify, on a per-app basis,
that all instances should be done using the same app version.
This is for validation in the presence of GPUs.
- scheduler: code cleanup
- Instead of adding a bunch of non-DB fields to RESULT,
use a derived class SCHED_DB_RESULT.
- Instead of storing a pointer to BEST_APP_VERSION in RESULT,
store the structure itself.
This simplifies the memory allocation situation.
- client: condition "Got server request to delete file" messages
on <file_xfer_debug>
svn path=/trunk/boinc/; revision=23636
in the case where we don't have enough elapsed-time stats
for the host/app_version.
The right formula is (peak FLOPS)/app_version.avg_pfc
svn path=/trunk/boinc/; revision=23634
If set, and a WU has nonzero batch,
it is interpreted as a user ID,
and the job will be sent only to hosts with that user ID.
Note: the use of workunit.batch is arbitrary;
we could also use workunit.opaque or another deprecated field.
svn path=/trunk/boinc/; revision=23556