where app has both GPU and CPU versions,
but for certain jobs (VLAR WUs in this case)
the GPU version performs poorly and shouldn't be used.
The fix is a kludge - it will result in these jobs
not being sent to the host at all,
rather than being sent with the CPU app.
The current architecture makes it difficult to do otherwise.
One possible fix would be to create a separate app
for VLAR jobs, with only CPU app versions.
svn path=/trunk/boinc/; revision=20419
transitioning every 10 days, hence never become eligible for purging.
The problem: the transitioner has a "safety net" where,
if the WU doesn't have a canonical result,
it arranges for another transition in 10 days.
Skip this if error_mask != 0.
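A rough sketch of the changed rule (identifiers are illustrative;
the real code is in the transitioner):

    #include <ctime>
    // Under the new rule, the 10-day safety-net retransition applies only
    // when the WU has neither a canonical result nor an error.
    time_t next_safety_net_transition(time_t now, int canonical_resultid, int error_mask) {
        const time_t TEN_DAYS = 10*86400;
        if (canonical_resultid == 0 && error_mask == 0) return now + TEN_DAYS;
        return 0;   // no safety-net transition; WU can proceed to purging
    }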
svn path=/trunk/boinc/; revision=20265
anonymous-platform app versions.
Otherwise fractional GPU requirements get truncated to zero.
Thanks to Crunch3r for identifying the problem.
svn path=/trunk/boinc/; revision=20189
10000*major + 100*minor + release,
rather than 100*major + minor.
Sometimes you need release-level resolution.
This affects:
- app_version.min_core_version
- config: min_core_client_version_announced
- config: min_core_client_version
Projects using these must multiply them by 100.
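For illustration, the new encoding (a minimal sketch):

    // pack a version as 10000*major + 100*minor + release
    inline int version_int(int major, int minor, int release) {
        return 10000*major + 100*minor + release;
    }
    // e.g. major 6, minor 10, release 3 -> 61003;
    // the old 100*major + minor scheme gave 610 for any 6.10.x client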
svn path=/trunk/boinc/; revision=20149
if project has crazy DCF, don't automatically request 1 sec;
only request work if there's a shortfall.
- intermediate checkin for notices stuff
svn path=/trunk/boinc/; revision=20145
deletion' pass to 50000 (can be changed with -delete_antiques_limit).
Previously a large number of antiques could result in none being deleted at all.
- Allow changing the interval between passes with -delete_antiques_interval.
svn path=/trunk/boinc/; revision=20138
- remove obsolete and buggy code from transitioner (create_result() in backend_lib)
- account for 'mixed' scheduling in explain_to_user() in sched_send.cpp
- finish transition to configurable patterns for distinguishing files reported by the client
in the Einstein@home-specific part of send_work_locality in sched_locality
(removed previous hardcoded strcmps)
svn path=/trunk/boinc/; revision=20074
will have enough jobs to use its share of resource instances.
This avoids situations where e.g. on a 2-CPU system
a project has 75% resource share and 1 CPU job,
and its STD increases without bound.
Did a general cleanup of the logic for computing
work request sizes (seconds and instances).
svn path=/trunk/boinc/; revision=20036
and avg_ncpus for GPU apps.
App versions are now characterized by two parameters
(we assume that the app uses either the CPU or the GPU
at a given time, but not both):
- the fraction of FLOPs performed on the CPU
- when the app is using the GPU, the fraction of peak FLOPS
that it gets.
We then run the numbers to get the total FLOPS and avg_ncpus.
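A sketch of that arithmetic, assuming the two parameters above
(identifiers are illustrative, not the actual BOINC names):

    // cpu_frac: fraction of the app's FLOPs done on the CPU
    // gpu_eff:  fraction of GPU peak FLOPS achieved while on the GPU
    void characterize_app_version(
        double cpu_fpops, double gpu_peak_fpops,
        double cpu_frac, double gpu_eff,
        double& flops, double& avg_ncpus
    ) {
        double gpu_fpops = gpu_eff*gpu_peak_fpops;
        // time per FLOP, split between the CPU and GPU phases of the job
        double t = cpu_frac/cpu_fpops + (1 - cpu_frac)/gpu_fpops;
        flops = 1/t;                          // overall rate
        avg_ncpus = (cpu_frac/cpu_fpops)/t;   // fraction of run time a CPU is busy
    }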
svn path=/trunk/boinc/; revision=19977
RAM to run job, but when we actually run the job
not enough GPU RAM is free, so the application fails.
This can cause a large number of jobs to fail.
Solution:
- app_plan() can specify the GPU RAM requirements of an app version.
This is passed to the client in a new field
<gpu_ram> of the <app_version> element.
- prior to starting or restarting a GPU app, the client
checks the amount of free RAM on the particular GPU.
If it's not enough for the app version,
the client doesn't start it,
and arranges for the scheduler to ignore it for 5 minutes
(by which point there might be more free GPU RAM).
Notes:
1) this change will have effect only when
both client and scheduler are updated.
2) the check is done in enforce_schedule(),
rather than schedule_cpus(),
because only at that point
have we assigned a specific GPU to the job.
3) there's another case to deal with:
a GPU app's malloc of GPU RAM fails in the middle of the job.
Currently the job fails.
I plan to add an API call boinc_temporary_exit(x) so
that the job can exit and potentially restart in x seconds.
(In principle this mechanism is sufficient for all cases,
but it could lead to a lot of starting/exiting,
so the current change is worthwhile).
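A minimal sketch of the client-side decision described in the Solution
above (the 5-minute deferral is per the description; everything else is
illustrative):

    const double GPU_RAM_DEFER_SECS = 5*60;
    // decide whether a GPU job can be (re)started on the GPU it was assigned
    bool can_start_gpu_job(double gpu_ram_needed, double gpu_ram_free, double& defer_secs) {
        if (gpu_ram_needed <= gpu_ram_free) return true;
        defer_secs = GPU_RAM_DEFER_SECS;   // have the scheduler skip it for a while
        return false;
    }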
svn path=/trunk/boinc/; revision=19864
Make them both peak FLOPS,
according to the formula supplied by the manufacturer.
The impact on the client is minor:
- the startup message describing the GPU
- the weight of the resource type in computing long-term debt
On the server, I changed the example app_plan() function
to assume that app FLOPS is 20% of peak FLOPS
(that's about what it is for SETI@home)
svn path=/trunk/boinc/; revision=19310
for certain periods (e.g. when Remote Desktop is used on Win).
- add is_usable() member function to COPROC.
Currently this just calls the respective (CUDA or CAL)
initialization function.
We need to check whether this works and/or causes problems.
- in enforce_schedule(), check whether usability has changed
for each GPU type.
If we've gone from usable to unusable,
flag all jobs for that GPU as coproc_missing
(so they won't get run, and will quit if they're running).
If we've gone from unusable to usable, clear the flag.
This should deal with all cases except where
the client is started up with GPUs unusable.
- scheduler: more query optimizations for locality scheduling
(from Oliver Bock)
svn path=/trunk/boinc/; revision=19301
to accept CPU, NVIDIA and ATI jobs.
These prefs are shown only where relevant:
e.g., only for processor types for which the project has app versions,
and if it has versions for only one type, no pref is shown.
These prefs affect both client and scheduler.
The client won't ask for work for a device blocked by prefs,
and the scheduler won't send it.
This replaces earlier optional project-specific prefs for
"no CPU jobs" and "no GPU jobs".
(However, these prefs continue to be honored on the server side).
- client: if NVIDIA driver is unknown, say that rather than 0
svn path=/trunk/boinc/; revision=19194
(amdcalrt.dll is old version w/ funky DLL names)
- client: make GPU enumeration warnings more consistent
(e.g., "NVIDIA" instead of "CUDA").
- scheduler: get rid of ati13 plan class.
Require 1.4+ driver for plan class ati.
svn path=/trunk/boinc/; revision=19158
attribute is missing. That's the app's problem, not BOINC's.
- sample assimilator: if a canonical instance has no output files,
rather than quitting, create a file named WU_NAME_no_output_files
svn path=/trunk/boinc/; revision=19082
elapsed_time: the elapsed time (runtime) as reported by client
flops_estimate: the app's estimated FLOPS as reported by app_plan()
app_version_id: the DB ID of the app_version used
(or -1 if anonymous platform)
TODO: show these in the web interfaces,
and use them where appropriate
svn path=/trunk/boinc/; revision=19002
Old:
1) check deadline based on wu.delay_bound
2) in add_result_to_reply(), potentially modify wu.delay_bound,
e.g. because of retry acceleration
Problem: reducing the delay bound may cause a deadline miss.
New:
1) new function get_delay_bound_range()
(called from wu_is_infeasible_fast())
returns optimistic and pessimistic delay bounds.
Retry acceleration logic is here.
2) check deadline based on optimistic bound;
if that fails, check based on pessimistic bound.
Set wu.delay_bound to the one that worked.
Notes:
- get_delay_bound_range() needs result priority and report deadline,
and it's called before we read the full result.
So add these items to WORK_ITEM and WU_RESULT.
- get_delay_bound_range() could be customized for
project-specific deadline policy.
- add_result_to_reply() was becoming a toxic waste dump.
Deadline-related stuff should have been factored out in any case.
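A simplified sketch of the two-stage check (the real feasibility test is
wu_is_infeasible_fast() against the host's workload):

    // feasible if the job can finish within the bound, given the host's
    // estimated delay before it can start the job
    static bool fits(double est_delay, double est_duration, double bound) {
        return est_delay + est_duration <= bound;
    }
    // try the optimistic bound, then the pessimistic one;
    // delay_bound is set to whichever bound worked
    bool check_deadline(double est_delay, double est_duration,
                        double opt, double pess, double& delay_bound) {
        if (fits(est_delay, est_duration, opt))  { delay_bound = opt;  return true; }
        if (fits(est_delay, est_duration, pess)) { delay_bound = pess; return true; }
        return false;   // infeasible with either bound
    }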
svn path=/trunk/boinc/; revision=18946
is running a graphics application.
Change the semantics of the "don't use GPU while computer in use" pref
to "don't use a GPU that's running a graphics app while
computer is in use".
This will increase GPU utilization on multi-GPU systems.
svn path=/trunk/boinc/; revision=18942
e.g. the Milkyway@home ATI app, of which we can typically run
2 or 3 instances at once on a GPU.
Changes include:
- In APP_VERSION, don't use a COPROCS to represent the GPU
requirements; just use doubles ncudas and natis.
- sufficient_coprocs() etc. are no longer members of COPROCS
- in HOST_USAGE, ncudas and natis are doubles
- in scheduler request, req_instances is now a double
This checkin doesn't include the job scheduling logic,
i.e. assigning jobs to GPUs. That will follow.
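For illustration, an app_plan() fragment of the kind this enables
(the plan class and numbers are made up; hu is the HOST_USAGE being
filled in):

    // two instances of this app share one ATI GPU
    hu.natis = 0.5;        // each job reserves half a GPU
    hu.avg_ncpus = 0.05;   // plus a sliver of CPU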
svn path=/trunk/boinc/; revision=18868
don't modify user preferences or CPID.
- client: fix bug that shows ATI version incorrectly
- database: host.posts has been repurposed as a salt (or seqno)
for a new type of weak authenticator that won't depend on password
- web code:
modify forum_preferences.posts instead of host.posts.
(actually, the former isn't used either, we just do a select count(*);
should fix this at some point).
svn path=/trunk/boinc/; revision=18865
(This bug has been there since 28 Oct 2004)
- GUI RPC and manager: include project backoff in FILE_TRANSFER,
so that manager gets up-to-date value
svn path=/trunk/boinc/; revision=18786
feature without requiring use of score-based scheduling.
So add a new customizable function, wu_is_infeasible_custom(),
where projects can put job-specific checks.
Also, move customizable functions (of which there are now 4)
to a new file, sched_customize.cpp.
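For example, the VLAR problem mentioned earlier in this log could now be
handled by a check along these lines (simplified and illustrative; the
real function takes the scheduler's WORKUNIT and BEST_APP_VERSION types,
and the "vlar" name test is made up):

    #include <cstring>
    // keep VLAR workunits off GPU app versions
    bool wu_is_infeasible_custom_sketch(const char* wu_name, bool version_uses_gpu) {
        return version_uses_gpu && strstr(wu_name, "vlar") != 0;
    }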
svn path=/trunk/boinc/; revision=18767
"min # of GPU processors" attribute (stored in batch)
and are sent only to hosts whose GPUs have at least this #.
The logical place for this is in the scoring function, JOB::get_score().
I added a clause (#ifdef'd out) that does this.
It rejects the WU if #procs is too small,
otherwise it adds min/actual to the score.
This favors sending jobs that need lots of procs to GPUs that have them.
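A sketch of that clause (illustrative; in the real code it contributes
to JOB::get_score()):

    // < 0 means "don't send this job to this host"
    double gpu_proc_score(int min_procs_needed, int host_gpu_procs) {
        if (host_gpu_procs < min_procs_needed) return -1;   // too few processors
        return (double)min_procs_needed/host_gpu_procs;     // favor big jobs on big GPUs
    }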
svn path=/trunk/boinc/; revision=18764
to the executable (../../projects/x/y) into argv[0],
not just the executable filename.
Apparently the new NVIDIA drivers have a bug that causes
CUDA apps to crash unless this is done.
- Scheduler: in no-host-ID case, don't mark results as "detached"
if request contains any in-progress results
svn path=/trunk/boinc/; revision=18754
for showing how GPU instances are being reserved
- scheduler: add "sse3" plan class example
- web: add option (NO_TEAMS constant) for suppressing teams
svn path=/trunk/boinc/; revision=18658
- client: don't write file_infos with no URLs to client_state.xml
for anon platform project; they must be from app_info.xml
svn path=/trunk/boinc/; revision=18592
Daemons that are managed by 'start' go to $(libexecdir)/sched.
The (F)CGI stuff goes to $(libexecdir)/cgi-bin.
Finally, example applications go under $(libexecdir)/examples.
svn path=/trunk/boinc/; revision=18378
so the same set of binaries can be used to handle multiple projects.
<projectroot>/bin is always prepended to $PATH
to ensure that project-specific binaries always take precedence.
From Gabor Gombas.
svn path=/trunk/boinc/; revision=18377
It is now possible to use the same set of tools for multiple
projects by setting BOINC_PROJECT_PATH.
From Gabor Gombas.
svn path=/trunk/boinc/; revision=18376
make_project now generates PROJECT/bin/boinc_path_config.py
to ensure that the interpreter will find the modules.
From Gabor Gombas.
svn path=/trunk/boinc/; revision=18374
and treat it as a recoverable error (i.e., retry).
The file deleter may run on a host that NFS-mounts
the upload/download dirs, and NFS mounts can fail.
- scheduler: include WU#ID in log msgs for handled results
svn path=/trunk/boinc/; revision=18351
this is needed to handle stale entries and slots
reserved by now-dead PIDs
- client: unify code for writing soft link files
svn path=/trunk/boinc/; revision=18256
The limit on jobs in progress is now
max_wus_in_progress * NCPUS
+ max_wus_in_progress_gpu * NGPUS
where NCPUS and NGPUS reflect prefs and are capped.
Furthermore: if the client reports plan class for in-progress jobs
(see checkin of 31 May 2009)
then these limits are enforced separately;
i.e. the # of in-progress CPU jobs is <= max_wus_in_progress*NCPUS,
and the # of in-progress GPU jobs is <= max_wus_in_progress_gpu*NGPUS
- scheduler config: rename <cuda_multiplier> to <gpu_multiplier>
- scheduler: <max_wus_to_send> is now scaled by
(NCPUS + gpu_multiplier*NGPUS)
- scheduler: don't keep scanning array if !work_needed()
- scheduler: moved array-scan logic from sched_send.cpp to sched_array.cpp
- scheduler: don't say "no work available" if jobs are available
but work_needed() is initially false
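A sketch of the per-resource limit check from the first item of this
entry (names illustrative):

    // true if the host may be sent another job of the given kind
    bool under_in_progress_limit(
        int n_cpu_jobs, int n_gpu_jobs, int ncpus, int ngpus,
        int max_per_cpu, int max_per_gpu, bool is_gpu_job
    ) {
        if (is_gpu_job) return n_gpu_jobs < max_per_gpu*ngpus;
        return n_cpu_jobs < max_per_cpu*ncpus;
    }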
svn path=/trunk/boinc/; revision=18255
limits the # of completed results handled per scheduler RPC.
This may be needed to avoid crashes due to memory allocation
failure (each reported result uses about 128KB memory).
- web: In showing result lists,
include "Validate error" results in the "Invalid" category.
(Previously they didn't appear in any category)
svn path=/trunk/boinc/; revision=18104
- client (Unix): if client crashes while benchmark processes are going,
make sure they detect this and exit.
- back-end programs: remove hardwired assumptions about
what directory they run in, and hence where config.xml is.
E.g., daemons look for it in "..", others expect it in current dir.
New approach: all the programs look for the project dir as follows:
1) the environment var BOINC_PROJECT_DIR, if defined;
2) the current dir, if config.xml is there;
3) else "..".
This means you can run programs in either proj/bin/ or proj/,
or (using BOINC_PROJECT_DIR) you can keep executables
outside of the project dir.
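A sketch of that lookup order (illustrative; the real helper is shared
by the back-end programs):

    #include <cstdlib>
    #include <sys/stat.h>
    // directory containing config.xml, per rules 1-3 above
    const char* project_dir() {
        if (const char* p = getenv("BOINC_PROJECT_DIR")) return p;
        struct stat sb;
        if (stat("config.xml", &sb) == 0) return ".";
        return "..";
    }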
svn path=/trunk/boinc/; revision=18042
get jobs for (default is all).
Useful if you're mixing locality and regular scheduling.
- a little E@h-specific stuff
From Bernd Machenschalk.
svn path=/trunk/boinc/; revision=18039
- web: divide the stylesheet into "main.css"
(which has formatting stuff, rounded corners etc.)
and "white.css" (which has colors).
The above two from Simek.
- scheduler: change default min NVIDIA driver version
from 17500 to 17700
svn path=/trunk/boinc/; revision=17819
(app versions don't have a <coprocs> element around coproc elements;
maybe an oversight, but let's stick with it).
Anyway, I think it's working now.
- lib: remove "owner" array from COPROC.
This was used in client to keep track of assignment of
coprocessors to tasks, but we got rid of the reserve/free scheme.
NOTE: this breaks the mechanism for passing --device N to apps;
I'll have to do this another way. Stay tuned.
svn path=/trunk/boinc/; revision=17543
Otherwise, if an app version has a platform different from
the client's primary platform, the client won't find it.
svn path=/trunk/boinc/; revision=17508
Old: although the request message contained all info
about the app version (flops, coproc usage etc.)
the server ignored this info,
and assumed that all anonymous platform apps were CPU apps.
With 6.6 client, this could produce infinite work fetch:
- client uses anon platform, has coproc app
- client has idle CPU, requests CPU work
- scheduler sends it jobs, thinking they will be done by CPU app
- client asks for more work etc.
New: scheduler parses full info on anon platform app versions:
plan class, FLOPS, coprocs.
It uses this info to make scheduling decisions;
in particular, if the request is for CUDA work,
it will only send jobs that use a CUDA app version.
The <result> records it returns contain info
(plan_class) that tells the client which app_version to use.
This will work correctly even if the client has multiple app versions
for the same app (e.g., a CPU version and a GPU version).
svn path=/trunk/boinc/; revision=17506
when to do a scheduler RPC:
if it's a user request or an acct mgr request, ignore backoff and GUI suspension;
in all other cases honor both of these.
svn path=/trunk/boinc/; revision=17504
Otherwise we'll get stuck in a loop where the client asks for CPU work,
and the scheduler sends jobs for what it thinks is a CPU app
but is actually a coproc app.
Eventually we should add coproc info to the app descriptions
sent in the scheduler request,
so that you can use anonymous platform for coproc apps.
But let's wait on this.
- scheduler: compile fix for gcc 4.4. Fixes #854
svn path=/trunk/boinc/; revision=17502
which of those files to include
- Modified MAC address check to work on some non-Linux unixes.
(mac_address.cpp)
- Added suggested change to "already attached to project" checking.
(ProjectInfoPage.cpp)
- changed includes of standard C header files to their C++ equivalents
(e.g. replaced <stdio.h> with <cstdio>) for namespace protection.
- replaced "using namespace std;" with more explicit "using std::function" in
several files.
- Fixed bug in checking whether the os is OS/2 and added conditional OS_OS2
to the build environment. (boinc_platform.m4,configure.ac)
- Changed build environment to not use -nostandardlibs unless we are using
G++ and static linkage is specified. (configure.ac)
- Added makefiles and package building files for solaris CSW package manager.
- Fixed bug with attempting to find login name using logname. (configure.ac)
- Added ifdef HAVE_* protection around some include files commonly found in
sys.
- Added support for unified binary for x86_64/i686-pc-solaris.
(cs_platforms.cpp)
- generate_host_cpid() now uses MAC address on non-linux unix.
(hostinfo_network.cpp)
- Macro BOINC_SET_COMPILE_FLAGS now doesn't check gcc only flags on non-gcc
compilers. (boinc_set_compile_flags.m4)
- Library compiles no longer depend upon the library extension or require
the library to be prefixed with lib.
- More fixes for fcgi builds.
- Added declaration of "struct ether_addr" and ether_ntoa(). Have not yet
implemented ether_ntoa() for machines that don't have it, or where it is
buggy. (unix_util.h)
- Added FCGI::perror() which calls FCGI_perror(). (boinc_fcgi.{h,cpp})
- Fixed library Makefiles so that all required headers get installed.
svn path=/trunk/boinc/; revision=17388
There are two mechanisms to prevent the scheduler from
sending jobs that won't finish by their deadline.
Simple mechanism:
The client sends the interval x for which CPUs are projected
to be saturated.
Given a job with estimated duration y,
the scheduler doesn't send it if x + y exceeds the delay bound.
If it does send it, x is incremented by y.
Complex mechanism:
Client sends workload description.
Scheduler does EDF simulation, sees if deadlines are missed.
The only project using this AFAIK is BOINC alpha test.
Neither of these mechanisms takes coprocessors into account,
and as a result jobs could be sent that are doomed to
miss their deadline.
This checkin adds coprocessor awareness to the Simple mechanism.
Changes:
Client:
compute estimated delay (i.e. time until non-saturation)
for coprocessors as well as CPU.
Send them in scheduler request as part of coproc descriptor.
Scheduler:
Keep track of estimated delays separately for different resources
- client: fixed bug that computed CPU estimated delay incorrectly
- client: the work request (req_secs) for a resource is the min
of the project's share and the shortfall.
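A sketch of the Simple mechanism with the per-resource delays added
(illustrative; "rsc" indexes CPU, CUDA, etc.):

    // est_delay[rsc]: time until instances of that resource stop being
    // saturated, as reported by the client; est_duration: predicted runtime
    bool try_to_send(double est_delay[], int rsc,
                     double est_duration, double delay_bound) {
        if (est_delay[rsc] + est_duration > delay_bound) return false;  // would miss deadline
        est_delay[rsc] += est_duration;   // account for the job just sent
        return true;
    }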
svn path=/trunk/boinc/; revision=17086
- client: restore notion of overworked;
if a project is overworked for a resource R,
don't fetch work for R unless there are idle instances
svn path=/trunk/boinc/; revision=17057
but we don't need to send any more CUDA jobs,
delete the BEST_APP_VERSION record and look for another app version.
This lets the scheduler send both CUDA and CPU app versions
for a given app in a single RPC.
svn path=/trunk/boinc/; revision=17051
1) net adjustment for eligible projects is zero;
2) max LTD is zero
- scheduler: fix msgs so disk size is shown in GB
svn path=/trunk/boinc/; revision=17031
1) it uses a coprocessor
2) it has checkpointed since the client started
3) it's being preempted because of a user action
(suspend job, project, or all processing)
or user preference (time of day, computer in use)
- scheduler: if shared mem seg doesn't exist,
report it and don't crash
svn path=/trunk/boinc/; revision=16992
even if it doesn't use a coprocessor.
- scheduler: added an "nci" (non CPU intensive) plan class
to sched_plan.cpp. It declares the use of 1% of a CPU.
The above two changes are intended to allow the QCN app to
run at above_idle priority, which it needs in order to do 500Hz polling.
- API: the std::string version of boinc_resolve_filename()
acts the same as the char[] version.
svn path=/trunk/boinc/; revision=16985
and add <cuda_multiplier>.
The latter is used in calculating max jobs/day for a host;
namely, it's host.max_results_day * (NCPUS + NCUDA*cuda_multiplier).
Set it to 10 or so if you have CUDA apps.
- scheduler: don't overload effective_ncpus();
instead, add two new functions,
max_results_day_multiplier() and max_wus_in_progress_multiplier()
- scheduler: don't reduce max_results_day if we get an aborted job
(it might have been aborted by the project;
not appropriate to punish the host in this case)
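The jobs/day cap from the first item above, written out (sketch):

    // e.g. max_results_day=100, 4 CPUs, 1 CUDA GPU, cuda_multiplier=10
    // -> 100*(4 + 1*10) = 1400 jobs/day
    int jobs_per_day(int max_results_day, int ncpus, int ncuda, int cuda_multiplier) {
        return max_results_day*(ncpus + ncuda*cuda_multiplier);
    }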
svn path=/trunk/boinc/; revision=16959
- Update to libtool 1.5.24
- build environment: Major automake changes that I've been warning about
for some time.
- Now uses libtool to build libraries.
- Builds separate boinc_fcgi and sched_fcgi libraries for use with
FCGI server components.
- New macro "BOINC_CHECK_LIB_WITH" that executes an "AC_CHECK_LIB" on
a library only if --with-libname[=DIR] is specified on the configure
command line. This is to allow inclusion of libraries when the
ssl, gtk, wxWidgets, or other configuration is incorrect for static
libraries.
- Added a lot of "--with-*" for some libraries that might be required for
static builds.
- The sea directory has been moved to packages/generic. Changes to sea
and the associated scripts might be required to better make use of the
staging mechanism and shared libraries.
- Fixed includes of boinc_fcgi.h in many files.
- Fixed places where FCGI_FILE needs to be used implicitly.
- Fixed missing define of _SC_PAGESIZE on hosts that define only
_SC_PAGE_SIZE.
- Moved build of boinc_cmd (and source file) from lib to client
svn path=/trunk/boinc/; revision=16904
- parse new request message elements
(CPU and coproc requested seconds and instances)
- decide how many jobs to send based on these params
- select app version based on these params
(may send both CPU and CUDA app versions for the same app!)
svn path=/trunk/boinc/; revision=16861
exceptional cases (e.g., send at least one job to a host with no work)
apply whether using EDF or basic check
- client: don't accept 0 for active/on/connected frac; set to 1
svn path=/trunk/boinc/; revision=16744
- web: check whether to show profile in separate function
from displaying profile; eliminate double headers
- scheduler: finish purge of redundant arguments
svn path=/trunk/boinc/; revision=16726