and passing them the corresponding --device N cmdline args.
This fixes a bug introduced in 17402 (Feb 26)
that broke the --device feature,
presumably causing problems on systems with multiple GPUs.
svn path=/trunk/boinc/; revision=17549
(app versions don't have a <coprocs> around coproc elements,
maybe an oversight, but let's stick with it).
Anyway, I think it's working now.
- lib: remove "owner" array from COPROC.
This was used in client to keep track of assignment of
coprocessors to tasks, but we got rid of the reserve/free scheme.
NOTE: this breaks the mechanism for passing --device N to apps;
I'll have to do this another way. Stay tuned.
svn path=/trunk/boinc/; revision=17543
get_state() reply which are not included in our client interface.
However, it turns out that BoincView uses these items; put them back.
- GUI RPC: set_debt() can set CUDA LTD as well as CPU
svn path=/trunk/boinc/; revision=17542
app versions in scheduler reply
- client: when reporting anonymous platform apps in sched request,
don't include <file_info>s (not relevant to server)
svn path=/trunk/boinc/; revision=17507
when to do a scheduler RPC:
if user request or acct mgr request, ignore backoff and suspend via GUI;
in all other cases honor both of these.
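Sketch of the rule above. RpcReason, ProjectFlags and may_do_scheduler_rpc() are made-up names for illustration, not the client's actual code:

```cpp
// Hypothetical reason codes; the real client has its own set.
enum class RpcReason { UserRequest, AcctMgrRequest, WorkFetch, ProjectRequest };

// Hypothetical per-project state relevant to the decision.
struct ProjectFlags {
    bool backed_off;         // minimum RPC time not yet reached
    bool suspended_via_gui;  // user suspended the project in the Manager
};

// User and account-manager requests bypass both checks;
// every other reason honors both.
bool may_do_scheduler_rpc(RpcReason reason, const ProjectFlags& p) {
    if (reason == RpcReason::UserRequest || reason == RpcReason::AcctMgrRequest) {
        return true;
    }
    return !p.backed_off && !p.suspended_via_gui;
}
```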
svn path=/trunk/boinc/; revision=17503
Otherwise we'll get stuck in a loop where the client asks for CPU work,
and the scheduler sends jobs for what it thinks is a CPU app
but is actually a coproc app.
Eventually we should add coproc info to the app descriptions
sent in scheduler request,
so that you can use anonymous platform for coproc apps.
But let's wait on this.
- scheduler: compile fix for gcc 4.4. Fixes #854
svn path=/trunk/boinc/; revision=17502
old: reference-count files involved in a PERS_FILE_XFER
new: if a PERS_FILE_XFER refers to an unreferenced file,
delete it (and the associated FILE_XFER and HTTP_OP if present)
May fix #366
svn path=/trunk/boinc/; revision=17486
other than work fetch (e.g., user request, project request)
temporarily clear resource backoffs while deciding
whether to request work.
The backoffs are there only to delay RPCs,
and we're doing an RPC anyway.
svn path=/trunk/boinc/; revision=17416
to ask for work inappropriately,
and tell user that it wasn't asking for work.
Here's what was going on:
There are two different structures with work request fields
(req_secs, req_instances, estimated_delay):
COPROC_CUDA *coproc_cuda
and
RSC_WORK_FETCH cuda_work_fetch.
WORK_FETCH::choose_project() copied from cuda_work_fetch to coproc_cuda,
but only if a project was selected.
WORK_FETCH::clear_request() clears cuda_work_fetch but not coproc_cuda.
Scenario:
- a scheduler op is made to project A requesting X>0 secs of CUDA
- later, a scheduler op is made to project B for reason
other than work fetch (e.g., user request)
- choose_project() doesn't choose anything,
so the value of coproc_cuda->req_secs remains X
- clear_request() is called but that doesn't change *coproc_cuda
Solution: work-fetch code no longer knows about internals of
COPROC_CUDA and is not responsible for setting its request fields.
The copying of request fields from RSC_WORK_FETCH to COPROC
is done at a higher level,
in CLIENT_STATE::make_scheduler_request()
Additional bug fix: estimated_delay wasn't being cleared in some cases.
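Sketch of the fix, using simplified stand-ins for the two structures (only the request fields are modeled; RscWorkFetch, CoprocCuda and copy_request_fields() are illustrative names, not the real classes). The point is that the copy happens on every request, so a stale req_secs can't survive:

```cpp
// Stand-in for RSC_WORK_FETCH: owned by the work-fetch code.
struct RscWorkFetch {
    double req_secs = 0;
    int req_instances = 0;
    double estimated_delay = 0;
    void clear_request() {
        req_secs = 0;
        req_instances = 0;
        estimated_delay = 0;  // also cleared, per the additional bug fix
    }
};

// Stand-in for COPROC_CUDA: what actually gets written to the request.
struct CoprocCuda {
    double req_secs = 0;
    int req_instances = 0;
    double estimated_delay = 0;
};

// Done at the higher level, for every scheduler request,
// whether or not work fetch selected a project.
void copy_request_fields(const RscWorkFetch& wf, CoprocCuda& cp) {
    cp.req_secs = wf.req_secs;
    cp.req_instances = wf.req_instances;
    cp.estimated_delay = wf.estimated_delay;
}
```

With this, the scenario above ends with req_secs == 0 in the request to project B: clear_request() zeroes the work-fetch fields, and the unconditional copy carries the zeros across.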
svn path=/trunk/boinc/; revision=17411
on first-time startup.
- client: don't do an RPC until we've done CPU benchmarks.
We need the benchmark values to fill in app_version.flops
svn path=/trunk/boinc/; revision=17404
and a 2nd GPU job with an earlier deadline arrives,
neither job is executed ever.
Reorganized things so that scheduling of GPU jobs is
done independently of CPU jobs.
The policy for GPU jobs:
- always EDF
- jobs are always removed from memory, regardless of checkpoint
(GPU memory is not paged, so it's bad to leave an idle app in memory)
svn path=/trunk/boinc/; revision=17402
- client: abort runaway jobs based on elapsed time instead of CPU time.
Specifically, abort jobs for which
elapsed time > WU.rsc_fpops_bound / app_version.flops
This policy works for
1) GPU jobs (which may use little CPU time)
2) jobs that run but because of bugs use little CPU time
(e.g., because they're sleeping)
whereas the old policy didn't.
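Sketch of the check. The function name and the handling of a missing flops estimate are my assumptions; the comparison itself is the one stated above:

```cpp
// Returns true if a job has run longer than its FLOP bound allows.
//   elapsed_secs:    wall-clock run time of the job so far
//   rsc_fpops_bound: upper bound on the job's floating-point ops (from the WU)
//   flops:           estimated speed of the app version, in FLOPS
bool exceeds_elapsed_time_bound(double elapsed_secs,
                                double rsc_fpops_bound,
                                double flops) {
    // Assumption: with no usable speed estimate we can't compute a limit,
    // so don't abort.
    if (flops <= 0) return false;
    return elapsed_secs > rsc_fpops_bound / flops;
}
```

E.g. with rsc_fpops_bound = 1e12 and flops = 1e9, the limit is 1000 seconds of elapsed time, no matter how little CPU time the job has used.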
svn path=/trunk/boinc/; revision=17399
which of those files to include
- Modified MAC address check to work on some non-Linux unixes.
(mac_address.cpp)
- Added suggested change to "already attached to project" checking.
(ProjectInfoPage.cpp)
- changed includes of standard c header files to their c++ equivalents
(i.e. replaced <stdio.h> with <cstdio>) for namespace protection.
- replaced "using namespace std;" with more explicit "using std::function" in
several files.
- Fixed bug in checking whether the os is OS/2 and added conditional OS_OS2
to the build environment. (boinc_platform.m4,configure.ac)
- Changed build environment to not use -nostandardlibs unless we are using
G++ and static linkage is specified. (configure.ac)
- Added makefiles and package building files for solaris CSW package manager.
- Fixed bug with attempting to find login name using logname. (configure.ac)
- Added ifdef HAVE_* protection around some include files commonly found in
sys.
- Added support for unified binary for x86_64/i686-pc-solaris.
(cs_platforms.cpp)
- generate_host_cpid() now uses MAC address on non-linux unix.
(hostinfo_network.cpp)
- Macro BOINC_SET_COMPILE_FLAGS now doesn't check gcc only flags on non-gcc
compilers. (boinc_set_compile_flags.m4)
- Library compiles no longer depend upon the library extension or require
the library to be prefixed with lib.
- More fixes for fcgi builds.
- Added declaration of "struct ether_addr" and ether_ntoa(). Have not yet
implemented ether_ntoa() for machines that don't have it, or where it is
buggy. (unix_util.h)
- Added FCGI::perror() which calls FCGI_perror(). (boinc_fcgi.{h,cpp})
- Fixed library Makefiles so that all required headers get installed.
svn path=/trunk/boinc/; revision=17388
(it wasn't setting the "use_XXX" flags). Fixes #776
- client: you can now include a <proxy_info> element
in your cc_config.xml options.
TODO: the whole proxy info thing needs an overhaul:
- no separate "use_XXX" flags;
non-empty http_server_name implies using HTTP proxy, etc.
- merge PROXY_INFO and GR_PROXY_INFO classes
- use XML_PARSER for parsing
- no PROXY_INFO element in HTTP_OP; just use gstate.proxy_info
svn path=/trunk/boinc/; revision=17379
project, it must have no runnable jobs for ANY resource.
- client: work-fetch bug fix: when setting requests in the
shortfall case, don't request anything if project is backed off
or overworked for the resource.
svn path=/trunk/boinc/; revision=17338
There are situations where multiple projects can legitimately
have large negative LTD on a uniprocessor.
Instead...
- client: add <zero_debts> option to cc_config.xml
svn path=/trunk/boinc/; revision=17328
1) if an instance is idle, get work from highest-debt project,
even if it's overworked.
2) if resource has a shortfall, get work from highest-debt
non-overworked project
3) if there's a fetchable non-overworked project with no runnable jobs,
get work from the highest-debt one.
(each step is done first for GPU, then CPU)
Clause 3) is new.
It will cause the client to get jobs for as many projects as possible,
even if there is no shortfall.
This is necessary to make the notion of "overworked" meaningful
(otherwise, any project with long jobs can become overworked).
It also maintains as much variety as possible (like pre-6.6 clients).
Also (small bug fix) if a project is overworked for resource R,
request work for R only in case 1).
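Sketch of the three clauses for a single resource (the real code runs them for GPU first, then CPU). Project, highest_debt() and choose_project() here are illustrative, and two things are assumptions on my part: clause 1 still requires the project to be fetchable, and a clause that finds nothing falls through to the next one:

```cpp
#include <vector>

// Illustrative per-project state for one resource.
struct Project {
    double debt;             // long-term debt for this resource
    bool overworked;
    bool fetchable;          // e.g. not backed off or suspended
    bool has_runnable_jobs;
};

// Index of the highest-debt project satisfying pred, or -1 if none does.
template <typename Pred>
int highest_debt(const std::vector<Project>& ps, Pred pred) {
    int best = -1;
    for (int i = 0; i < static_cast<int>(ps.size()); i++) {
        if (!pred(ps[i])) continue;
        if (best < 0 || ps[i].debt > ps[best].debt) best = i;
    }
    return best;
}

// Returns the index of the project to ask for work, or -1 for none.
int choose_project(const std::vector<Project>& ps,
                   bool instance_idle, bool shortfall) {
    if (instance_idle) {
        // Clause 1: overworked projects are allowed.
        int i = highest_debt(ps, [](const Project& p) { return p.fetchable; });
        if (i >= 0) return i;
    }
    if (shortfall) {
        // Clause 2: non-overworked only.
        int i = highest_debt(ps, [](const Project& p) {
            return p.fetchable && !p.overworked;
        });
        if (i >= 0) return i;
    }
    // Clause 3: non-overworked projects with nothing runnable.
    return highest_debt(ps, [](const Project& p) {
        return p.fetchable && !p.overworked && !p.has_runnable_jobs;
    });
}
```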
svn path=/trunk/boinc/; revision=17327
stop accumulating debt if it's at or around zero.
This prevents other projects from being driven unboundedly negative.
- client: if the number of overworked projects exceeds the number
of device instances, clear debts; this indicates that an earlier
client was buggy and produced bad debt values.
svn path=/trunk/boinc/; revision=17325
This fixes a bug that can cause debts to NEVER get updated.
- client: added "abort_jobs_on_exit" feature
(available by --abort_jobs_on_exit cmdline
or <abort_jobs_on_exit> in cc_config.xml).
If set, when the client is exited by user request
(this includes signals on Unix)
it marks all pending jobs as aborted,
and does a scheduler RPC to all projects with jobs.
When these are completed the client exits.
This is useful when BOINC is being used on grids
where it is wiped clean after each run.
svn path=/trunk/boinc/; revision=17300
so that largest debt among eligible projects tends towards zero
- client: change definition of "overworked"; debt must be < 1 day
svn path=/trunk/boinc/; revision=17206
this gets called when the op fails, either at initialization or later on;
it clears the project's sched_rpc_pending flag if needed.
This fixes a bug that caused user-requested RPCs to retry every 10 seconds
when the network is down.
- client: if debt-adjust period is too long, reset accounting.
Otherwise we'll get this infinitely.
- API: add optional alpha argument to TEXTURE_DESC::draw()
svn path=/trunk/boinc/; revision=17195
- client: if a project-requested RPC doesn't return work,
don't do resource backoff.
- client: if a user-requested scheduler RPC errors out, clear the request
svn path=/trunk/boinc/; revision=17191
using a coprocessor we don't know about, ignore it
(and all results using that app_version will be flushed).
This deals with the situation where we have some GPU jobs,
but the GPU card is removed (previously this resulted in a crash).
This requires some code shuffling so that we check for coprocessors
before reading state file.
svn path=/trunk/boinc/; revision=17161
ignore intervals longer than 10 secs;
that could only happen if the client or host was suspended/hibernated.
- client: in adjust_debts(), ignore intervals longer than
2*work fetch period, not 2*CPU sched period.
adjust_debts() is called from work fetch.
svn path=/trunk/boinc/; revision=17154
worked in the presence of coprocessors.
The simulator maintained per-project queues of pending jobs.
When a job finished (in the simulation) it would get
one or more jobs from that project's pending queue.
The problem: this could cause "holes" in the scheduling of GPUs,
and produce an erroneous nonzero shortfall for GPUs,
leading to infinite work fetch.
The solution: maintain a separate (per-resource, not per-project)
queue of pending coprocessor jobs.
When a coprocessor job finishes,
start pending jobs from the queue for that resource.
Another change: the simulator did strict reservation of coprocessors.
If there are 2 instances of CUDA,
and a 1-instance job is running in the simulation,
it wouldn't start an additional 2-instance job.
This also can cause erroneous nonzero shortfalls.
So instead, schedule coprocessors like CPUs, i.e. saturate them.
This can cause distorted completion time estimates,
but it's better than infinite work fetch.
svn path=/trunk/boinc/; revision=17093
There are two mechanisms to prevent the scheduler from
sending jobs that won't finish by their deadline.
Simple mechanism:
The client sends the interval x for which CPUs are projected
to be saturated.
Given a job with estimated duration y,
the scheduler doesn't send it if x + y exceeds the delay bound.
If it does send it, x is incremented by y.
Complex mechanism:
Client sends workload description.
Scheduler does EDF simulation, sees if deadlines are missed.
The only project using this AFAIK is BOINC alpha test.
Neither of these mechanisms takes coprocessors into account,
and as a result jobs could be sent that are doomed to
miss their deadline.
This checkin adds coprocessor awareness to the Simple mechanism.
Changes:
Client:
compute estimated delay (i.e. time until non-saturation)
for coprocessors as well as CPU.
Send them in scheduler request as part of coproc descriptor.
Scheduler:
Keep track of estimated delays separately for different resources
- client: fixed bug that computed CPU estimated delay incorrectly
- client: the work request (req_secs) for a resource is the min
of the project's share and the shortfall.
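Sketch of the Simple mechanism with per-resource delays, as described above. EstimatedDelays, Resource and try_send_job() are illustrative names, not the scheduler's actual code:

```cpp
// Per-resource estimated delays reported by the client, in seconds
// (time until the resource is no longer saturated).
struct EstimatedDelays {
    double cpu = 0;
    double cuda = 0;
};

enum class Resource { Cpu, Cuda };

// Returns true if the job may be sent. x is the delay for the job's
// resource and y its estimated duration: refuse if x + y exceeds the
// delay bound, otherwise charge y to x.
bool try_send_job(EstimatedDelays& d, Resource r,
                  double est_duration, double delay_bound) {
    double& x = (r == Resource::Cuda) ? d.cuda : d.cpu;
    if (x + est_duration > delay_bound) return false;
    x += est_duration;
    return true;
}
```

So a host whose CPUs are saturated for a long time can still be sent a CUDA job if its CUDA delay is short, and vice versa.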
svn path=/trunk/boinc/; revision=17086
- client: restore notion of overworked;
if a project is overworked for a resource R,
don't fetch work for R unless there are idle instances
svn path=/trunk/boinc/; revision=17057
but we don't need to send any more CUDA jobs,
delete the BEST_APP_VERSION record and look for another app version.
This lets the scheduler send both CUDA and CPU app versions
for a given app in a single RPC.
svn path=/trunk/boinc/; revision=17051
1) net adjustment for eligible projects is zero;
2) max LTD is zero
- scheduler: fix msgs so disk size is shown in GB
svn path=/trunk/boinc/; revision=17031
- client: respect work-fetch backoff for non-CPU-intensive projects
- client: for non-CPU-intensive project, fetch new job
if no currently running jobs
- client: skip non-CPU-intensive projects in debt calculations
- manager: show resource backoff times correctly
svn path=/trunk/boinc/; revision=16998
1) it uses a coprocessor
2) it has checkpointed since the client started
3) it's being preempted because of a user action
(suspend job, project, or all processing)
or user preference (time of day, computer in use)
- scheduler: if shared mem seg doesn't exist,
report it and don't crash
svn path=/trunk/boinc/; revision=16992
even if it doesn't use a coprocessor.
- scheduler: added an "nci" (non CPU intensive) plan class
to sched_plan.cpp. It declares the use of 1% of a CPU.
The above two changes are intended to allow the QCN app to
run at above_idle priority, which it needs in order to do 500Hz polling.
- API: the std::string version of boinc_resolve_filename()
acts the same as the char[] version.
svn path=/trunk/boinc/; revision=16985
- manager: display CUDA info in project properties page
- manager: use struct assignment instead of copy() function
svn path=/trunk/boinc/; revision=16925
- Update to libtool 1.5.24
- build environment: Major automake changes that I've been warning about
for some time.
- Now uses libtool to build libraries.
- Builds separate boinc_fcgi and sched_fcgi libraries for use with
FCGI server components.
- New macro "BOINC_CHECK_LIB_WITH" that executes a "AC_CHECK_LIB" on
a library only if --with-libname[=DIR] is specified on the configure
command line. This is to allow inclusion of libraries when the
ssl, gtk, wxWidgets, or other configuration is incorrect for static
libraries.
- Added a lot of "--with-*" for some libraries that might be required for
static builds.
- The sea directory has been moved to packages/generic. Changes to sea
and the associated scripts might be required to better make use of the
staging mechanism and shared libraries.
- Fixed includes of boinc_fcgi.h in many files.
- Fixed places where FCGI_FILE needs to be used implicitly.
- Fixed missing define of _SC_PAGESIZE on hosts that define only
_SC_PAGE_SIZE.
- Moved build of boinc_cmd (and source file) from lib to client
svn path=/trunk/boinc/; revision=16904
update the way that app versions are identified.
Old: WORKUNIT contains version_num
RESULT contains app_version_num (but only if running)
New: Keep old fields so new client works with old manager.
RESULT contains version_num, plan_class
Manager: if RESULT doesn't have version/plan_class
(because talking to old client)
look up app version based on WU version num.
svn path=/trunk/boinc/; revision=16903
(corresponding to the get_project_config.php web RPC):
- platforms: list of platforms supported by the project
- sched_stopped: scheduler disabled
- web_stopped: DB-driven web features disabled
- min_client_version
- GUI RPC: add the following items to CC_STATE:
- platforms: list of platforms supported by the client
(this replaces the unused <platform_name>)
- GUI RPC: add the following items to PROJECT_LIST_ENTRY
(entry in the "all projects" list):
- platforms: list of platforms supported by the project
- GUI RPC: move APP_VERSION pointer from WORKUNIT to RESULT;
include plan class in APP_VERSION lookup.
This completes the change of March 2008,
and allows the Manager to work correctly when a project
has two different app versions of the same (app, platform, version)
running on a client at once (e.g., a CPU and a GPU app)
- get_project_config.php: remove logic that checks client version.
This page is accessed by PHP, not just by client
- web: add link to forum page to get forum as RSS
svn path=/trunk/boinc/; revision=16900
exceptional cases (e.g., send at least one job to a host with no work)
apply whether using EDF or basic check
- client: don't accept 0 for active/on/connected frac; set to 1
svn path=/trunk/boinc/; revision=16744
for projects with no active results.
This is now wrong because coproc apps might have pending results.
Also remove nidle_cpus > 0 conditional that increments CPU shortfall;
I think this is vestigial code.
svn path=/trunk/boinc/; revision=16646
if missing, use checkpoint CPU time.
- client: enforce CPU schedule: if we're running a coproc job,
keep CPU utilization strictly less than NCPUS.
svn path=/trunk/boinc/; revision=16616
(otherwise it doesn't work for coproc or multi-proc apps)
- client: in estimate of job completion time,
weight the estimate based on fraction done more heavily
(quadratic rather than linear)
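One way to read "quadratic rather than linear", as a sketch only. The formula below is my assumption, not necessarily what the client does: blend a static estimate with a fraction-done estimate, and let the static estimate's weight fall off as the square of the fraction left:

```cpp
// Estimated remaining run time, in seconds.
//   fraction_done:    progress reported by the app, in [0, 1]
//   elapsed:          run time so far
//   static_est_total: a priori estimate of total run time
double est_remaining(double fraction_done, double elapsed,
                     double static_est_total) {
    if (fraction_done <= 0) return static_est_total;
    if (fraction_done >= 1) return 0;
    double frac_left = 1 - fraction_done;
    // Estimate extrapolated from progress so far.
    double from_fraction = elapsed * frac_left / fraction_done;
    // Estimate from the a priori total.
    double from_static = static_est_total * frac_left;
    // Quadratic weight: the static estimate loses influence faster
    // than it would with a linear weight of frac_left.
    double w_static = frac_left * frac_left;
    return w_static * from_static + (1 - w_static) * from_fraction;
}
```

At fraction_done = 0.5 the static estimate gets weight 0.25 (it would be 0.5 with linear weighting), so the fraction-done estimate dominates from the halfway point on.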
svn path=/trunk/boinc/; revision=16603
as the basis for estimating job completion times.
This should improve estimates for GPU apps,
and prevent the DCF from getting messed up.
svn path=/trunk/boinc/; revision=16598
(per-project or overall) if there are no pending tasks.
This is needed when there are coproc (i.e. CUDA) jobs;
CPUs may be idle because pending jobs are waiting for active jobs
to release coprocs.
In this situation the CPU idleness should not be counted as shortfall;
otherwise (if there are only coproc jobs) there will always be a shortfall,
and the client will fetch infinite work.
svn path=/trunk/boinc/; revision=16545
- client: make host CPID a function of:
MAC addresses + hostname + IP addr
This means that a given host will generally always get the same CPID.
Helpful e.g. on grids where the client gets installed repeatedly.
From Artyom Sharov.
client/
hostinfo_network.cpp
lib/
hostinfo.cpp
mac_address.cpp,h
win_build/
boinc_cli_curl.vcproj
libboinc.vcproj
svn path=/trunk/boinc/; revision=16432
If you store input and output files on different servers,
you can run 2 file_deleters, each one on the same machine
as the files it's going to be deleting.
- file_deleter: add -help option and usage()
client/
cpu_sched.cpp
sched/
file_deleter.cpp
svn path=/trunk/boinc/; revision=16397
for the preemptable_task_list.
The problem was that the ordering predicate (more_preemptable())
could change on the fly, making the heap inconsistent.
Instead, we create a vector, sort it by increasing preemptability,
then pop off the end.
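Sketch of the sort-then-pop approach. Task and its preemptability score are stand-ins (the real ordering is more_preemptable()). The sort uses a snapshot of the scores, so the order can't become inconsistent while we're consuming it:

```cpp
#include <algorithm>
#include <vector>

// Stand-in for a running task.
struct Task {
    int id;
    double preemptability;  // hypothetical score; higher = preempt first
};

// Returns task ids in the order they should be preempted.
// Takes the vector by value: we sort and consume a private copy.
std::vector<int> preemption_order(std::vector<Task> tasks) {
    // Sort by increasing preemptability...
    std::sort(tasks.begin(), tasks.end(),
              [](const Task& a, const Task& b) {
                  return a.preemptability < b.preemptability;
              });
    // ...then pop off the end, most preemptable first.
    std::vector<int> order;
    while (!tasks.empty()) {
        order.push_back(tasks.back().id);
        tasks.pop_back();
    }
    return order;
}
```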
svn path=/trunk/boinc/; revision=16395
leak detection will work.
- MGR: Have the BaseFrame call a function to determine if the
selection list should be saved instead of traversing
the application pointer. Each view just overrides the function
returning a true/false value. We don't have to worry about null
pointers and the like.
- MGR: BOINCGUIApp should never need to know how either the views
work or the document. Move the code that determines which
RPCs should be fired into each of the views. Have the document
look for it there.
- MGR: Reduce duplicate code for hiding and showing an application
- MGR: Move some Windows and Mac specific code into functions
and streamline the application startup and shutdown routines.
- MGR: Move the event processing that was in BOINCGUIApp into the
BaseFrame.
- MGR: General cleanup.
- MGR: Doxygen comments.
- MGR: Cleanup some warnings.
client/
rr_sim.cpp
clientgui/
AdvancedFrame.cpp, .h
AsyncRPC.cpp, .h
BOINCBaseFrame.cpp, .h
BOINCBaseView.cpp, .h
BOINCClientManager.cpp
BOINCGUIApp.cpp, .h
BOINCTaskBar.cpp
MainDocument.cpp, .h
sg_BoincSimpleGUI.cpp, .h
ViewProjects.cpp, .h
ViewTransfers.cpp, .h
ViewWork.cpp, .h
WelcomePage.cpp
win_build/installerv2/
BOINC.ism
BOINCx64.ism
win_build/
sim.vcproj
svn path=/trunk/boinc/; revision=16357
are non-CPU-intensive or that use < 1 CPU (e.g., CUDA)
- client: get rid of spurious "internal error,
expected process to be executing" msg
- diag: don't check heap on every alloc
- fix a few compile warnings
svn path=/trunk/boinc/; revision=16323
preference not the total number of CPUs if we are
actually calculating the min of both of them.
client/
cpu_sched.cpp
svn path=/trunk/boinc/; revision=16289
<network_test_url>: where to go to see if network is up
<client_version_check_url>: where to get list of client versions
<client_download_url>: where to direct user to get new version
- manager: some different text for WCG version
svn path=/trunk/boinc/; revision=16208
Here are the new semantics: a scheduler reply can include
<next_rpc_delay>
Make another RPC ASAP after this amount of time elapses.
This is specified by the <next_rpc_delay> element in config.xml.
<request_delay>
Don't make another RPC until this amount of time elapses.
This is sent automatically (and sometimes with large delays)
by various parts of the scheduler.
next_rpc_delay now "overrides" request_delay in the sense that
request_delay is ignored if it's greater than next_rpc_delay.
In addition: the client maintains a min_rpc_time which is set based
on request_delay and also by various exponential backoff schemes.
next_rpc_delay now overrides this as well, in the same sense.
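Sketch of how I read the override rule. earliest_next_rpc() is an illustrative helper, not the client's code, and treating 0 as "not specified" is an assumption:

```cpp
#include <algorithm>

// Returns the earliest absolute time (seconds) at which the client may
// contact the project again.
//   now, min_rpc_time: absolute times
//   request_delay, next_rpc_delay: intervals; 0 means "not specified"
double earliest_next_rpc(double now, double min_rpc_time,
                         double request_delay, double next_rpc_delay) {
    double t = min_rpc_time;
    // request_delay pushes the earliest time out.
    if (request_delay > 0) t = std::max(t, now + request_delay);
    // next_rpc_delay caps it: anything later than now + next_rpc_delay
    // (request_delay or backoff) is ignored.
    if (next_rpc_delay > 0) t = std::min(t, now + next_rpc_delay);
    return t;
}
```

E.g. at now = 1000 with request_delay = 3600 and next_rpc_delay = 600, the result is 1600: the larger request_delay is ignored.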
svn path=/trunk/boinc/; revision=16206
- web: remove file_get_contents() workaround for PHP4
- web: If Akismet or ReCaptcha failure,
display the form again with a warning message at the top.
That way the user doesn't lose the text they just typed.
svn path=/trunk/boinc/; revision=16175
- client: change logic in a pathological file xfer case
(we asked for tail of file, proxy returned whole file)
to report fopen() errors correctly, and to close all open files
svn path=/trunk/boinc/; revision=16150
<exclusive_app>foo.exe</exclusive_app>
in your cc_config.xml, BOINC will suspend computing
whenever foo.exe is running (e.g., a game).
Eventually we might want to put the interface in preferences
instead of cc_config.xml
svn path=/trunk/boinc/; revision=16087
E.g. if you're running a project locally,
while attached to outside projects via a proxy.
Currently accessible only via the Manager's Options dialog.
From Frank Weiler.
svn path=/trunk/boinc/; revision=16061
Some projects (GPUgrid, QCN) don't work on some platforms
if sandboxing is used.
Better to send an error message than send jobs.
- get rid of a few compiler warnings
svn path=/trunk/boinc/; revision=16042
is already authenticated.
This is needed to make BOINCView work;
it authenticates before every operation for some reason.
svn path=/trunk/boinc/; revision=15920
Old: when checking whether an app can be run,
check for sufficient coprocessors relative to
the current coprocessor usage.
Bug: if there are 2 CUDA jobs,
the scheduler will decide to run both.
enforce_scheduler() will only be able to run one,
and the other CPU will be idle.
New: include coprocessor usage (along with RAM and CPUs)
in the check, and do a simulated reservation.
In the above scenario, the scheduler will select
one CUDA app and one non-CUDA app.
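Sketch of the simulated reservation. Job and select_jobs() are illustrative, and RAM is left out to keep it short. Jobs are assumed to arrive already in priority order:

```cpp
#include <vector>

// Stand-in for a runnable job and its resource needs.
struct Job {
    int id;
    double ncpus;  // CPUs the job uses
    int ncudas;    // CUDA instances the job uses
};

// Walks the jobs in priority order, reserving CPUs and CUDA instances
// as each job is chosen; a job that no longer fits is skipped.
std::vector<int> select_jobs(const std::vector<Job>& ordered,
                             double ncpus, int ncudas) {
    std::vector<int> chosen;
    double cpus_left = ncpus;
    int cudas_left = ncudas;
    for (const Job& j : ordered) {
        if (j.ncpus > cpus_left || j.ncudas > cudas_left) continue;
        cpus_left -= j.ncpus;    // simulated reservation
        cudas_left -= j.ncudas;
        chosen.push_back(j.id);
    }
    return chosen;
}
```

With 2 CPUs, 1 CUDA instance, and jobs (CUDA, CUDA, CPU) in that order, this picks the first CUDA job and the CPU job, which is the outcome described above.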
svn path=/trunk/boinc/; revision=15904
- client: don't leak process handles when aborting jobs
- client: if an app exits or we kill it, always destroy the shmem segment.
- web: more HTML 4.01 Transitional conformity changes
svn path=/trunk/boinc/; revision=15865
to be used by anybody, and was only meant as a stop-gap until
we had some formal way to deal with co-processors.
client/
hostinfo_win.C
lib/
hostinfo.C, .h
svn path=/trunk/boinc/; revision=15849