That produced a messed-up query that assigned garbage values to:
host_app_version.turnaround_var
host_app_version.turnaround_q
host_app_version.max_jobs_per_day
host_app_version.consecutive_valid
To repair these:
- set turnaround_var and turnaround_q to zero
- if max_jobs_per_day is outside of
(0..config.daily_result_quota)
set it to config.daily_result_quota
- if consecutive_valid is outside (0..1000), set it to zero
I added a script, html/ops/repair_21812.php, that does this;
if you ran server code between [21181] and [21812], run this script.
- scheduler/transitioner: add <debug_quota> log flag
- changed the build system to always use -Wall
(if we'd done this before, this bug wouldn't have happened)
- fixed a bunch of other compile warnings
svn path=/trunk/boinc/; revision=21812
Old: various redundant and/or misleading messages were sent.
New:
- if host w/ no GPU contacts a GPU-only project,
send high-pri message saying they need a GPU
- if host w/ GPU has driver too old for all versions,
send high-pri message saying to update driver
- if host w/ GPU has driver too old for some versions,
send low-pri message saying to update driver
- if host has GPU but too little RAM for any app,
send low-pri message saying so
- scheduler: revamp GPU plan class functions
svn path=/trunk/boinc/; revision=21760
pointers to dynamically allocated COPROC-derived objects,
just have the objects themselves.
Dynamic allocation should be avoided at all costs.
svn path=/trunk/boinc/; revision=21564
- daily quota mechanism
- reliable mechanism (accelerated retries)
- "trusted" mechanism (adaptive replication)
- scheduler: enforce host scale probation only for apps with
host_scale_check set.
- validator: do scale probation on invalid results
(need this in addition to error and timeout cases)
- feeder: update app version scales every 10 min, not 10 sec
- back-end apps: support --foo as well as -foo for options
Notes:
- If you have, say, cuda, cuda23 and cuda_fermi plan classes,
a host will have separate quotas for each one.
That means it could error out on 100 jobs for cuda_fermi,
and when its quota goes to zero,
error out on 100 jobs for cuda23, etc.
This is intentional; there may be cases where one version
works but not the others.
- host.error_rate and host.max_results_day are deprecated
TODO:
- the values in the app table for limits on jobs in progress etc.
should override rather than config.xml.
Implementation notes:
scheduler:
process_request():
read all host_app_versions for host at start;
Compute "reliable" and "trusted" for each one.
write modified records at end
get_app_version():
add "reliable_only" arg; if set, use only reliable versions
skip over-quota versions
Multi-pass scheduling: if have at least one reliable version,
do a pass for jobs that need reliable,
and use only reliable versions.
Then clear best_app_versions cache.
Score-based scheduling: for need-reliable jobs,
it will pick the fastest version,
then give a score bonus if that version happens to be reliable.
When get back a successful result from client:
increase daily quota
When get back an error result from client:
impose scale probation
decrease daily quota if not aborted
Validator:
when handling a WU, create a vector of HOST_APP_VERSION
parallel to vector of RESULT.
Pass it to assign_credit_set().
Make copies of originals so we can update only modified ones
update HOST_APP_VERSION error rates
Transitioner:
decrease quota on timeout
svn path=/trunk/boinc/; revision=21181
see http://boinc.berkeley.edu/trac/wiki/CreditNew
Projects will need to update DB and recompile all back-end programs.
Summary:
- new way of computing credit
- "reliable host" mechanism is per app version
- "host punishment" mechanism is per app version
- adjustment of wu.rsc_fpops_est provides the
equivalent of per app version DCF
- max jobs in progress is now per app
- max jobs per RPC is now per app
TODO:
- reliable mechanism:
- populate and use host_app_version.error_rate
- populate host_app_version.turnaround
- host punishment:
- populate host_app_version.max_jobs_per_day
- populate host_app_version.n_jobs_today
- use app.max_jobs_per_day_init
- job limits:
- use app.max_jobs_in_progress, max_gpu_jobs_in_progress
- use app.max_jobs_per_rpc
- adjust wu.rsc_fpops_est
- remove old credit stuff
fpops_cumulative, credit_multiplier
credit computation in scheduler
- AVERAGE class: use the Knuth algorithm (Wikipedia)
svn path=/trunk/boinc/; revision=21021
- scheduler: when calculate scheduler runtime,
don't include the part reading request msg from client.
That can be misleadingly long
svn path=/trunk/boinc/; revision=20781
10000*major + 100*minor + release,
rather than 100*major + minor.
Sometimes you need release-level resolution.
This affects:
- app_version.min_core_version
- config: min_core_client_version_announced
- config: min_core_client_version
Projects using these must multiply them by 100.
svn path=/trunk/boinc/; revision=20149
if project has crazy DCF, don't automatically request 1 sec;
only request work if there's a shortfall.
- intermediate checkin for notices stuff
svn path=/trunk/boinc/; revision=20145
will have enough jobs to use its share of resource instances.
This avoids situations where e.g. on a 2-CPU system
a project has 75% resource share and 1 CPU job,
and its STD increases without bound.
Did a general cleanup of the logic for computing
work request sizes (seconds and instances).
svn path=/trunk/boinc/; revision=20036
don't modify user preferences or CPID.
- client: fix bug that shows ATI version incorrectly
- database: host.posts has been repurposed as a salt (or seqno)
for a new type of weak authenticator that won't depend on password
- web code:
modify forum_preferences.posts instead of host.posts.
(actually, the former isn't used either, we just do a select count(*);
should fix this at some point).
svn path=/trunk/boinc/; revision=18865
feature without requiring use of score-based scheduling.
So add a new customizable function, wu_is_infeasible_custom(),
where projects can put job-specific checks.
Also, move customizable functions (of which there are now 4)
to a new file, sched_customize.cpp.
svn path=/trunk/boinc/; revision=18767
to the executable (../../projects/x/y) into argv[0],
not just the executable filename.
Apparently the new NVIDIA drivers have a bug that cause
CUDA apps to crash unless this is done.
- Scheduler: in no-host-ID case, don't mark results as "detached"
if request contains any in-progress results
svn path=/trunk/boinc/; revision=18754
The limit on jobs in progress is now
max_wus_in_progress * NCPUS
+ max_wus_in_progress * NGPUS
where NCPUS and NGPUS reflect prefs and are capped.
Furthermore: if the client reports plan class for in-progress jobs
(see checkin of 31 May 2009)
then these limits are enforced separately;
i.e. the # of in-progress CPU jobs is <= max_wus_in_progress*NCPUS,
and the # of in-progress GPU jobs is <= max_wus_in_progress_gpu*NGPUS
- scheduler config: rename <cuda_multiplier> to <gpu_multiplier>
- scheduler: <max_wus_to_send> is now scaled by
(NCPUS + gpu_multiplier*NGPUS)
- scheduler: don't keep scanning array if !work_needed()
- scheduler: moved array-scan logic from sched_send.cpp to sched_array.cpp
- scheduler: don't say "no work available" if jobs are available
but work_needed() is initially false
svn path=/trunk/boinc/; revision=18255
get jobs for (default it all).
Useful if you're mixing locality and regular scheduling.
- a little E@h-specific stuff
From Bernd Machenschalk.
svn path=/trunk/boinc/; revision=18039
Old: although the request message contained all info
about the app version (flops, coproc usage etc.)
the server ignored this info,
and assumed that all anonymous platform apps where CPU.
With 6.6 client, this could produce infinite work fetch:
- client uses anon platform, has coproc app
- client has idle CPU, requests CPU work
- scheduler sends it jobs, thinking they will be done by CPU app
- client asks for more work etc.
New: scheduler parses full info on anon platform app versions:
plan class, FLOPS, coprocs.
It uses this info to make scheduling decisions;
in particular, if the request is for CUDA work,
if will only send jobs that use a CUDA app version.
The <result> records it returns contain info
(plan_class) that tells the client which app_version to use.
This will work correctly even if the client has multiple app versions
for the same app (e.g., a CPU version and a GPU version)
svn path=/trunk/boinc/; revision=17506
which of those files to include
- Modified MAC address check to work on some non-Linux unixes.
(mac_address.cpp)
- Added suggested change to "already attached to project" checking.
(ProjectInfoPage.cpp)
- changed includes of standard c header files to their c++ equivalents
(i.e. replaced <stdio.h> with <cstdio>) for namespace protection.
- replaced "using namespace std;" with more explicit "using std::function" in
several files.
- Fixed bug in checking whether the os is OS/2 and added conditional OS_OS2
to the build environment. (boinc_platform.m4,configure.ac)
- Changed build environment to not use -nostandardlibs unless we are using
G++ and static linkage is specified. (configure.ac)
- Added makefiles and package building files for solaris CSW package manager.
- Fixed bug with attempting to find login name using logname. (configure.ac)
- Added ifdef HAVE_* protection around some include files commonly found in
sys.
- Added support for unified binary for x86_64/i686-pc-solaris.
(cs_platforms.cpp)
- generate_host_cpid() now uses MAC address on non-linux unix.
(hostinfo_network.cpp)
- Macro BOINC_SET_COMPILE_FLAGS now doesn't check gcc only flags on non-gcc
compilers. (boinc_set_compile_flags.m4)
- Library compiles no longer depend upon the library extension or require
the library to be prefixed with lib.
- More fixes for fcgi builds.
- Added declaration of "struct ether_addr" and ether_ntoa(). Have not yet
implemented ether_ntoa() for machines that don't have it, or where it is
buggy. (unix_util.h)
- Added FCGI::perror() which calls FCGI_perror(). (boinc_fcgi.{h,cpp})
- Fixed library Makefiles so that all required headers get installed.
svn path=/trunk/boinc/; revision=17388
- Update to libtool 1.5.24
- build environment: Major automake changes that I've been warning about
for some time.
- Now uses libtool to build libraries.
- Builds separate boinc_fcgi and sched_fcgi libraries for use with
FCGI server components.
- New macro "BOINC_CHECK_LIB_WITH" that executes a "AC_CHECK_LIB" on
a library only if --with-libname[=DIR] is specified on the configure
command line. This is to allow inclusion of libraries when the
ssl, gtk, wxWidgets, or other configuration is incorrect for static
libraries.
- Added a lot of "--with-*" for some libraries that might be required for
static builds.
- The sea directory has been moved to packages/generic. Changes to sea
and the associated scripts might be required to better make use of the
staging mechanism and shared libraries.
- Fixed includes of boinc_fcgi.h in many files.
- Fixed places where FCGI_FILE needs to be used implicitly.
- Fixed missing define of _SC_PAGESIZE on hosts that define only
_SC_PAGE_SIZE.
- Moved build of boinc_cmd (and source file) from lib to client
svn path=/trunk/boinc/; revision=16904
- parse new request message elements
(CPU and coproc requested seconds and instances)
- decide how many jobs to send based on these params
- select app version based on these params
(may send both CPU and CUDA app versions for the same app!)
svn path=/trunk/boinc/; revision=16861
- web: check whether to show profile in separate function
from displaying profile; eliminate double headers
- scheduler: finish purge of redundant arguments
svn path=/trunk/boinc/; revision=16726
put a textual summary of them in host.serialnum (currently unused)
- web: show coprocs on host detail page
- db_dump: include coproc info in host XML
svn path=/trunk/boinc/; revision=16697
for the selected APP_VERSION, rather than on the CPU benchmarks.
Otherwise estimates are wrong for GPU or multi-thread apps.
- scheduler: start switching from having SCHED_REQUEST and
SCHED_REPLY as globals instead of passing them around as args;
to be continued.
svn path=/trunk/boinc/; revision=16691
- scheduler: fix egregious bug where wu_is_infeasible_fast() result
is ignored, and we send jobs to hosts that can't handle them.
- scheduler: don't check for disk space in work_needed();
do it in check_disk(), which generates a message to user.
- scheduler: add -debug_log flag, which sends stderr to
"debug_log" rather than scheduler_log.txt (for debugging)
svn path=/trunk/boinc/; revision=16578
- API: use non-verbose option to zip
- scheduler: if multiple_client_per_host is set,
don't mark results as over if get repeat CPID
svn path=/trunk/boinc/; revision=16445