- scheduler: parse d_project_share
- scheduler: if vbox and vbox_mt are both available,
use vbox for a 1-CPU machine
svn path=/trunk/boinc/; revision=25176
This now supports two main use cases:
1) there's a job that you want to run once on all hosts,
present and future
(or all hosts belonging to a user, or to a team).
The job is never transitioned, validated, or assimilated.
2) There's a normal job for which you want to use only
hosts belonging to a specific user (e.g. cluster or cloud hosts).
This restriction can be made either when the job is created,
or on the fly,
e.g. as part of a scheme for accelerating batch completion.
For the latter purpose we now provide a function
restrict_wu_to_user(DB_WORKUNIT&, int userid);
see the sketch at the end of this entry.
The job goes through the standard
transitioner/validator/assimilator path.
These cases are enabled by config flags
<enable_assignment_multi/>
<enable_assignment/>
respectively.
Assignments of type 2) are no longer stored in shared mem,
so there is no limit on their number.
There is no longer a rule that assigned job names must contain "asgn".
NOTE: this requires a database update.
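For the on-the-fly case, a minimal sketch (assuming a backend program that
already has an open database connection and includes the header declaring
restrict_wu_to_user(); the job name and user ID are purely illustrative):

    #include <cstdio>
    #include "boinc_db.h"   // DB_WORKUNIT

    // Restrict an existing job to hosts belonging to one user.
    // The job name and the user ID (42) are examples only.
    void restrict_example() {
        DB_WORKUNIT wu;
        if (!wu.lookup("where name='example_batch_job_7'")) {
            int retval = restrict_wu_to_user(wu, 42);
            if (retval) {
                fprintf(stderr, "restrict_wu_to_user() failed: %d\n", retval);
            }
        }
    }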
svn path=/trunk/boinc/; revision=25169
of the full 2 CPUs. Vboxwrapper uses ceil() to allocate enough
whole CPUs for VirtualBox. Ideally this will cause the BOINC
client-side scheduler to use the remaining fraction of the CPU
for GPU data transfer which will then free up one whole CPU for
another job. All without over-committing anything.
sched/
sched_customize.cpp
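On the vboxwrapper side the allocation amounts to something like this
sketch (the function name is illustrative; the fractional count is
whatever the client assigns):

    #include <cmath>

    // The client may assign a fractional CPU count (e.g. 1.9 when 0.1 CPU
    // is reserved for GPU data transfer).  VirtualBox only takes whole
    // CPUs, so round up; the leftover fraction stays available to the
    // client's scheduler for another job.
    int vm_cpu_count(double assigned_ncpus) {
        return (int)std::ceil(assigned_ncpus);   // 1.9 -> 2
    }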
svn path=/trunk/boinc/; revision=25120
Some credit cheats (e.g. with credit_by_runtime) can be done
by reporting a huge host.p_fpops value.
Fix this by capping the value at 1.1 times the 95th percentile
of host.p_fpops, taken over active hosts.
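Conceptually the cap is just the following (a sketch; the 95th-percentile
value is assumed to be precomputed over active hosts, and the names are
illustrative):

    // Cap a reported host.p_fpops to limit credit cheating via huge values.
    // p95 is the 95th percentile of p_fpops over active hosts; the 1.1
    // factor leaves headroom for legitimately fast hosts.
    double capped_p_fpops(double reported, double p95) {
        double cap = 1.1 * p95;
        return reported > cap ? cap : reported;
    }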
svn path=/trunk/boinc/; revision=25017
depending on how many the host has,
and whether CPU VM extensions are present
(this reflects the requirements of CernVM).
svn path=/trunk/boinc/; revision=25009
use result.flops_estimate rather than host.p_fpops;
otherwise it doesn't work for multicore apps.
TODO: cheat-proofing
svn path=/trunk/boinc/; revision=25006
If found, set HOST_INFO::p_vm_extensions_disabled,
and pass this to the scheduler.
- scheduler (VBox app plan function): if a host has p_vm_extensions_disabled
set, don't send it multicore VBox jobs.
Note: if you have a host with VM extensions, and they're disabled
in the BIOS, and you enable them, you can remove the
<p_vm_extensions_disabled> line from client_state.xml
and you'll be eligible to get multicore VM jobs again.
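In the plan function the check is roughly this sketch (only the flag test
is shown; the rest of the VBox plan-class logic is omitted):

    #include "hostinfo.h"   // HOST_INFO, which carries the new flag

    // Refuse multicore VBox jobs when the host reported that its VM
    // extensions exist but are disabled in the BIOS.
    bool multicore_vbox_ok(const HOST_INFO& hi) {
        return !hi.p_vm_extensions_disabled;
    }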
svn path=/trunk/boinc/; revision=24944
scale their PFC by 0.1 in credit calculations.
This reflects the fact that GPU apps are typically less efficient
(relative to device peak FLOPS) than are CPU apps.
The actual values from SETI@home and Milkyway are 0.05 and 0.08.
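In the credit code this is just a constant factor on a GPU version's peak
FLOP count; as a sketch (the constant name is illustrative):

    // GPU versions get their peak FLOP count (PFC) scaled down to reflect
    // typical GPU efficiency relative to device peak.
    const double GPU_PFC_SCALE = 0.1;

    double scaled_pfc(double raw_pfc, bool is_gpu_version) {
        return is_gpu_version ? raw_pfc * GPU_PFC_SCALE : raw_pfc;
    }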
svn path=/trunk/boinc/; revision=24842
to a superseded or deprecated app version, use it anyway.
The current app version may not validate against the old one.
svn path=/trunk/boinc/; revision=24823
- client: msg tweak
- client: minimum work buffer lower bound is 180 sec
- scheduler: in computing HOST_USAGE::project_flops for a job,
if we don't have sufficient elapsed_time statistics
for either the (host, app_version) or the app_version,
use a conservative estimate (p_fpops*(#cpus+#ngpus))
rather than the number returned by app_plan().
This avoids "time limit exceeded" errors when the latter is way off.
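That is, roughly this fallback (a sketch; the real code takes the CPU and
GPU counts from the request's host and app version descriptions):

    // Conservative fallback when there are no elapsed-time statistics for
    // the (host, app version) or the app version: assume the whole host
    // rather than trusting app_plan()'s projection.
    double conservative_flops(double p_fpops, int ncpus, int ngpus) {
        return p_fpops * (ncpus + ngpus);
    }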
svn path=/trunk/boinc/; revision=24820
reduce backoff intervals somewhat
- vboxwrapper: fix buffer size typo (from Attila)
- scheduler: fix crash if using homogeneous app version,
and a WU is committed to an old or deprecated app version.
From Kevin Reed.
svn path=/trunk/boinc/; revision=24775
Tells multicore apps how many cores to use.
The --nthreads command line arg to the app is now deprecated
though we'll keep it around for the time being.
svn path=/trunk/boinc/; revision=24708
If the file "client_opaque.txt" exists on the client,
include its contents in scheduler request messages.
On the scheduler, parse this into SCHEDULER_REQUEST::client_opaque,
where it can be used by the customizable scheduler functions.
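For example, a project-customized scheduler function might do something
like this (the "datacenter=west" token is made up; whatever the project
writes into client_opaque.txt is what arrives here):

    #include <cstring>

    // Example use of the client_opaque string in customized scheduler code;
    // the token being searched for is illustrative only.
    bool host_tagged_west(const char* client_opaque) {
        return client_opaque && strstr(client_opaque, "datacenter=west") != NULL;
    }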
svn path=/trunk/boinc/; revision=24586
this is to support SETI@home, which ran out of result IDs
and changed the DB field type to int unsigned.
Note: eventually I'll make this change official
and change the .h types as well.
- web: put <apps_selected> tags around <app_id> elements
in project-specific prefs.
svn path=/trunk/boinc/; revision=24555
the hr_class and app_version_id fields,
with a WHERE clause requiring that each be either zero or the target value.
This handles the cases where
1) because of the failure of a result, the transitioner set
the field back to zero;
2) another scheduler set the field to the target value.
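Conceptually the update is the following sketch, assuming
DB_WORKUNIT::update_field() accepts a SET clause plus an extra WHERE
condition:

    #include <cstdio>
    #include "boinc_db.h"   // DB_WORKUNIT

    // "Careful update": set hr_class and app_version_id only if each field
    // is still 0 (e.g. reset by the transitioner after a result failure) or
    // already equal to our target value.  If the update matches no row,
    // another scheduler instance committed the WU to something else first.
    int careful_update_wu(DB_WORKUNIT& wu, int hr_class, int app_version_id) {
        char set_clause[256], where_clause[256];
        snprintf(set_clause, sizeof(set_clause),
            "hr_class=%d, app_version_id=%d", hr_class, app_version_id
        );
        snprintf(where_clause, sizeof(where_clause),
            "(hr_class=0 or hr_class=%d) and (app_version_id=0 or app_version_id=%d)",
            hr_class, app_version_id
        );
        return wu.update_field(set_clause, where_clause);
    }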
svn path=/trunk/boinc/; revision=24513
are assumed to be for NVIDIA GPU apps;
plan class names containing 'ati' are assumed to be for AMD GPU apps.
Clauses for 'nvidia' were missing in a couple of places.
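The convention boils down to a substring test on the plan class name,
e.g. (sketch):

    #include <cstring>

    // Plan-class naming convention: a name containing "nvidia" is treated
    // as an NVIDIA GPU app, one containing "ati" as an AMD GPU app.
    bool is_nvidia_plan_class(const char* plan_class) {
        return strstr(plan_class, "nvidia") != NULL;
    }
    bool is_ati_plan_class(const char* plan_class) {
        return strstr(plan_class, "ati") != NULL;
    }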
svn path=/trunk/boinc/; revision=24512
In the inner loop of scan_work_array() there are two WORKUNITs:
- the one that's part of wu_result (in the shared-mem array)
- a temp copy.
quick_check() may modify the temp copy in host-specific ways
(e.g., adjusting rsc_fpops_est or delay_bound).
This is the one we pass to add_result_to_reply().
When we reread hr_class and app_version_id from the DB,
update both structs.
svn path=/trunk/boinc/; revision=24493
(reported by Kevin Reed).
The problem: cache inconsistency.
If there are 2 results for the same WU in shared mem,
and 2 scheduler instances get them around the same time,
they can send them with different app versions.
We already fixed this problem for HR by
1) rereading the relevant WU fields while deciding
whether to send the result
2) doing a "careful update" of the WU field using a where clause
to make sure it wasn't modified in the (short) interval
since rereading it.
I fixed the HAV problem in the same way,
and merged the two mechanisms to combine the DB queries.
Also:
- The rereads are done in slow_check() (see below).
- The careful updates are done in update_wu_on_send(),
and this is called *before* doing careful updates on result fields.
That way, if the WU updates fail, we don't have orphaned results.
- already_sent_to_different_platform_careful() (sic)
no longer does DB stuff, so it's merged with
already_send_to_different_hr_class() (better name)
NOTE: slow_check() is used in array scheduling only.
Score-based scheduling uses other code,
in which this bug is not yet fixed.
Locality scheduling doesn't support HR or HAV at all.
This should be unified.
svn path=/trunk/boinc/; revision=24484
(in sched_customize.cpp)
the flops_scale argument is intended to express the
GPU efficiency (actual/peak).
Pass appropriate values.
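In other words, a plan function should apply the efficiency before the
value feeds the projected-FLOPS calculation, roughly as in this sketch
(the 0.2 efficiency below is an example value only):

    // flops_scale expresses GPU efficiency (actual/peak).  Apply it to the
    // device's peak FLOPS before passing the result to coproc_perf().
    double usable_gpu_flops(double gpu_peak_flops, double flops_scale) {
        return gpu_peak_flops * flops_scale;   // e.g. peak * 0.2
    }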
svn path=/trunk/boinc/; revision=24405
This will show pending uploads in the Transfers tab.
- file_upload_handler: fix message to client when can't acquire lock
- client: parse <alt_platform> in state file correctly
svn path=/trunk/boinc/; revision=24391
The problem: the choice of app version was based on
the "projected FLOPS" return by estimate_flops(av).
If usage stats exist for the host / app version,
this returns a number X such that
WU.rsc_fpops_est/X approximates the runtime of a job
using the given app version.
(If WU.rsc_fpops_est is way off, this will be correspondingly way off
from the actual FLOPS the app version will get.)
However, if there are no usage stats,
it returns an estimate based on host hardware speed,
which might be 100X less.
Hence, in some cases a new app version would never get used.
Solution: choose app versions based on the values
returned by the app plan functions.
Use estimate_flops() AFTER choosing the version
(see the sketch at the end of this entry).
- scheduler: improve the accuracy of FLOPS estimation for GPU apps.
The "flops_scale" argument to coproc_perf
(which expresses the difference between peak GPU FLOPS
and actual FLOPS) should be used to scale GPU FLOPS
prior to calling coproc_perf(),
rather than scaling the estimate returned by coproc_perf().
- show_shmem: show have_X_apps flags
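As a stand-in for the new selection order (these are not the scheduler's
real types, just an illustration of "rank by plan-function FLOPS first,
estimate runtime second"):

    #include <cstddef>
    #include <vector>

    // Each candidate version carries the FLOPS its plan function projects
    // and a stats-based estimate that may be missing or far too low.
    struct Candidate {
        double plan_fn_flops;   // from app_plan(); used to pick the version
        double stats_flops;     // estimate_flops()-style; used only afterwards
    };

    const Candidate* choose_version(const std::vector<Candidate>& cands) {
        const Candidate* best = NULL;
        for (size_t i = 0; i < cands.size(); i++) {
            if (!best || cands[i].plan_fn_flops > best->plan_fn_flops) {
                best = &cands[i];
            }
        }
        return best;   // runtime/deadline estimates then use best->stats_flops
    }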
svn path=/trunk/boinc/; revision=24385
for when the job completed successfully but
one or more output files had permanent upload failures.
Show this state in web interfaces.
- sample_work_generator: check return value of count_unsent_results(),
so that we don't generate infinite work if there's a DB problem
- web: RSS feed shows news items from last 90 days, rather than 14
svn path=/trunk/boinc/; revision=24377
the boundary between days is 00:00 in server local time.
This creates a spike of jobs being dispatched
(and files being downloaded) after that time.
Solution: distribute the boundary uniformly,
using a random number determined by the host ID.
(Make sure to save/restore the seed around this,
so we don't destroy the randomness of other things.)
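A sketch of the idea (the real code saves and restores the random seed;
here we simply reseed from the clock afterwards, which serves the same
goal of keeping later rand() calls unpredictable):

    #include <cstdlib>
    #include <ctime>

    // Give each host its own "day boundary", uniformly spread over 24
    // hours, instead of resetting every host at server-local midnight.
    int host_day_boundary_offset(int hostid) {
        srand((unsigned int)hostid);        // stable, host-specific offset
        int offset = rand() % 86400;        // seconds past local midnight
        srand((unsigned int)time(NULL));    // don't leave rand() predictable
        return offset;
    }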
svn path=/trunk/boinc/; revision=24353