This gives you a way to simulate the effects of app_config.xml
- client: piggyback requests for resources even if we're backed off from them
- client: change resource backoff logic
Old: if we requested work and didn't get any,
back off from resources for which we requested work
New: for each resource type T:
if we requested work for T and didn't get any, back off from T
Also, don't back off if we're already backed off
(i.e. if this is a piggyback request)
Also, only back off if the RPC was due to an automatic
and potentially rapid source
(namely: work fetch, result report, trickle up)
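A minimal sketch of the new rule follows; RSC_STATE, req_secs, and got_secs are illustrative stand-ins, not the client's actual fields.

    // Sketch: per-resource-type backoff decision after a scheduler RPC.
    #include <vector>

    enum RPC_REASON {
        RPC_REASON_WORK_FETCH, RPC_REASON_RESULT_REPORT,
        RPC_REASON_TRICKLE_UP, RPC_REASON_USER_REQ
    };

    // automatic, potentially rapid RPC sources
    bool automatic_reason(RPC_REASON r) {
        return r == RPC_REASON_WORK_FETCH
            || r == RPC_REASON_RESULT_REPORT
            || r == RPC_REASON_TRICKLE_UP;
    }

    struct RSC_STATE {
        double req_secs;    // work requested for this resource type
        double got_secs;    // work actually received
        bool backed_off;    // already backed off (piggyback request)?
        void start_backoff() { backed_off = true; }  // stand-in
    };

    void handle_scheduler_reply(std::vector<RSC_STATE>& rsc, RPC_REASON reason) {
        for (RSC_STATE& r : rsc) {
            // back off from T only if we asked for work for T, got none,
            // weren't already backed off, and the RPC was automatic
            if (r.req_secs > 0 && r.got_secs == 0
                && !r.backed_off && automatic_reason(reason)
            ) {
                r.start_backoff();
            }
        }
    }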
- client: fix small work fetch bug
- client: when parsing an MD5, use a 64-char buffer instead of a 33-char one.
When the XML parser reads a string,
it enforces the buffer size limit BEFORE it strips whitespace.
So if a project put whitespace before or after the MD5,
it would fail to parse.
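For illustration, a sketch of the failure mode; parse_md5() and strip_whitespace() here are simplified stand-ins for the client's XML parser.

    #include <cctype>
    #include <cstring>

    static void strip_whitespace(char* s) {
        char* p = s;
        while (*p && isspace((unsigned char)*p)) p++;            // leading
        memmove(s, p, strlen(p) + 1);
        size_t n = strlen(s);
        while (n && isspace((unsigned char)s[n-1])) s[--n] = 0;  // trailing
    }

    // The size limit is checked on the raw element text, BEFORE the
    // whitespace is stripped. A 32-char MD5 padded with whitespace
    // overflows a 33-byte buffer but fits easily in a 64-byte one.
    static bool parse_md5(const char* elem_text, char* buf, size_t bufsize) {
        if (strlen(elem_text) + 1 > bufsize) return false;  // fails here
        strcpy(buf, elem_text);
        strip_whitespace(buf);
        return strlen(buf) == 32;
    }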
(especially per-app exclusions) was incomplete and buggy.
Changes:
- make bitmaps of included instances per (app, resource type)
- in round-robin simulation, we keep track of used instances
(so that we know if there are instances that are idle
because of exclusions).
Do this based on app-level exclusions
(previously it was done based on project-wide exclusions,
which didn't include app-level exclusions).
- compute RSC_PROJECT_WORK_FETCH::non_excluded_instances
as the logical OR of the per-app masks.
I.e. if you exclude an instance for all apps separately,
it's the same as excluding it for the project as a whole.
(Note: this bitmap is used for only 1 purpose:
if we have idle instances, don't request work from a project
for which those instances are excluded.)
- define RSC_PROJECT_WORK_FETCH::ncoprocs_excluded as the # of
instances excluded for *any* app, not the # excluded for all apps.
This quantity is used in work fetch to make sure we don't
fetch an unbounded number of jobs that turn out to have
no GPU to run on because of exclusions.
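A sketch of both computations; the bitmask representation (one bit per instance) is illustrative.

    #include <cstdint>
    #include <vector>

    // One included-instance bitmap per (app, resource type).
    struct ProjectGpuSummary {
        uint64_t non_excluded_instances;  // usable by at least one app
        int ncoprocs_excluded;            // excluded for *any* app
    };

    ProjectGpuSummary summarize(
        const std::vector<uint64_t>& per_app_masks, int ninstances
    ) {
        ProjectGpuSummary s = {0, 0};
        for (uint64_t m : per_app_masks) {
            s.non_excluded_instances |= m;   // logical OR of per-app masks
        }
        for (int i = 0; i < ninstances; i++) {
            for (uint64_t m : per_app_masks) {
                if (!(m & (1ull << i))) {
                    s.ncoprocs_excluded++;   // excluded for some app
                    break;
                }
            }
        }
        return s;
    }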
http://boinc.berkeley.edu/trac/wiki/ClientAppConfig
This lets users do the following:
1) limit the number of concurrent jobs of a given app
(e.g. for WCG apps that are I/O-intensive)
2) specify the CPU and GPU usage parameters of GPU versions
of a given app.
Implementation notes:
- max app concurrency is enforced in 2 places:
1) when building the initial job run list
2) when enforcing the final job run list
Both are needed to avoid possible starvation.
- however, we don't enforce it during RR simulation,
since doing so could cause erroneous shortfall and work fetch.
This means that work buffering will not work
as expected if you're using max concurrency.
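A sketch of the enforcement step used in both places; JOB and the field names are hypothetical.

    #include <map>
    #include <string>
    #include <vector>

    struct JOB { std::string app_name; };

    // max_concurrent comes from app_config.xml; 0 means no limit
    std::vector<JOB> enforce_max_concurrent(
        const std::vector<JOB>& candidates,
        const std::map<std::string, int>& max_concurrent
    ) {
        std::map<std::string, int> nrunning;
        std::vector<JOB> run_list;
        for (const JOB& j : candidates) {
            auto it = max_concurrent.find(j.app_name);
            if (it != max_concurrent.end() && it->second
                && nrunning[j.app_name] >= it->second
            ) {
                continue;   // app already at its concurrency cap
            }
            nrunning[j.app_name]++;
            run_list.push_back(j);
        }
        return run_list;
    }

    // Called both when building the initial run list and when
    // enforcing the final one, per the notes above.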
had a parse error, and it included project files.
While parsing the scheduler reply we'd add FILE_REFs to
PROJECT::project_files,
but wouldn't link them to FILE_INFOs since this is done
only if the reply parses correctly.
The next garbage_collect() would dereference these NULL pointers.
Solution: parse the FILE_REFs into SCHEDULER_REPLY::project_files.
Copy this to PROJECT::project_files only if the reply parses.
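A sketch of the fix; the types are pared down to the relevant fields, and parse_scheduler_reply() is a stand-in.

    #include <string>
    #include <vector>

    struct FILE_REF { std::string file_name; };  // linked to a FILE_INFO later
    struct SCHEDULER_REPLY { std::vector<FILE_REF> project_files; };
    struct PROJECT { std::vector<FILE_REF> project_files; };

    // stand-in: fills reply.project_files, returns false on parse error
    bool parse_scheduler_reply(SCHEDULER_REPLY& reply);

    void handle_scheduler_reply(PROJECT& p, SCHEDULER_REPLY& reply) {
        if (!parse_scheduler_reply(reply)) {
            // bad reply: PROJECT::project_files is untouched, so there
            // are no unlinked FILE_REFs for garbage_collect() to hit
            return;
        }
        p.project_files = reply.project_files;  // copy only on success
        // ... FILE_REFs are linked to FILE_INFOs after this point ...
    }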
svn path=/trunk/boinc/; revision=25598
on each request.
- client: when showing how much work a scheduler request returned,
scale by availability (as is done to show the amount of the request)
- client: in account manager requests, <not_started_dur> and
<in_progress_dur> are in wall time, not run time
(i.e. scale them by availability)
Note: there's some confusion in the code between runtime and wall time,
where in general wall time = runtime / availability.
New convention: let's use "runtime" for the former,
and "duration" for the latter.
svn path=/trunk/boinc/; revision=25597
This was presumably the cause of the recent Einstein@home problem.
- client: set file ownership and permissions after an async copy.
- client: set file ownership and permissions after a
regular (non-async) copy.
The latter 2 bugs would affect a VM app that copies
its executable to slot/x/shared
svn path=/trunk/boinc/; revision=25468
boinc_temporary_exit(),
explaining why the app is exiting.
Convey this to the client, and then to the Manager,
and display it there and in the log.
clientgui/
MainDocument.cpp
lib/
gui_rpc_client_ops.cpp
gui_rpc_client.h
api/
boinc_api.cpp,h
client/
client_types.cpp,h
app.h
app_control.cpp
svn path=/trunk/boinc/; revision=25315
in which the tiebreaker is MD5 of name.
That way the order is stable
(it doesn't change from one run of the client to the next)
and it doesn't group results with similar names
(and hence for the same app).
This ordering is used for
1) the order of display in the manager
2) the job scheduler's notion of FIFO
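A sketch of the comparator; the primary sort key shown (arrival time) is an assumption, and md5_string() stands in for an MD5-of-string helper.

    #include <string>

    std::string md5_string(const std::string&);  // assumed MD5 helper

    struct RESULT {
        std::string name;
        double received_time;   // assumed primary key
    };

    // Stable across client runs (MD5 of a name doesn't change), and
    // similarly named results from the same app don't cluster together.
    bool result_order(const RESULT& a, const RESULT& b) {
        if (a.received_time != b.received_time) {
            return a.received_time < b.received_time;
        }
        return md5_string(a.name) < md5_string(b.name);  // tiebreaker
    }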
svn path=/trunk/boinc/; revision=25300
File verify is done in 4 places:
- after a download finishes
- when transitioning a result to DOWNLOADED
- if project->verify_files_on_app_start, on app start
Use asynchrony only in the first 2 cases,
since the async logic is set up to mark the file as PRESENT
when done, not to restart a task
svn path=/trunk/boinc/; revision=25219
When a large file is copied from a project dir to a slot dir,
it's copied in chunks,
interleaved with other polling activities such as GUI RPCs.
That way the manager doesn't freeze while large copies
(e.g. VM images) are happening
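A sketch of the chunked copy; the chunk size and poll structure are illustrative.

    #include <cstdio>
    #include <vector>

    struct ASYNC_COPY {
        FILE* in;
        FILE* out;
        // Copy one chunk; returns true when the copy is done.
        // Called once per pass through the client's poll loop,
        // interleaved with GUI RPC handling etc.
        bool poll() {
            std::vector<char> buf(256 * 1024);   // one chunk per poll
            size_t n = fread(buf.data(), 1, buf.size(), in);
            if (n) fwrite(buf.data(), 1, n, out);
            return n < buf.size();               // EOF or error: finished
        }
    };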
svn path=/trunk/boinc/; revision=25192
we uncompress it and then verify it.
The latter involves computing its MD5, which reads the entire file.
Combine these 2 steps so that the MD5 is computed
as the file is uncompressed,
eliminating the need to read the file again.
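A sketch of the combined pass, assuming zlib and an incremental (begin/update/finish) MD5 API; MD5_CTX_S and its functions are stand-ins.

    #include <zlib.h>
    #include <cstdio>

    struct MD5_CTX_S { unsigned char opaque[128]; };   // stand-in context
    void md5_begin(MD5_CTX_S*);                        // assumed incremental
    void md5_update(MD5_CTX_S*, const unsigned char*, unsigned);
    void md5_finish(MD5_CTX_S*, unsigned char digest[16]);

    // Decompress gz_path to out_path, hashing each chunk as it's
    // produced, so the output file is never re-read for the MD5.
    int gunzip_and_md5(
        const char* gz_path, const char* out_path, unsigned char digest[16]
    ) {
        gzFile in = gzopen(gz_path, "rb");
        if (!in) return -1;
        FILE* out = fopen(out_path, "wb");
        if (!out) { gzclose(in); return -1; }
        MD5_CTX_S ctx;
        md5_begin(&ctx);
        unsigned char buf[65536];
        int n;
        while ((n = gzread(in, buf, sizeof(buf))) > 0) {
            fwrite(buf, 1, (size_t)n, out);
            md5_update(&ctx, buf, (unsigned)n);
        }
        md5_finish(&ctx, digest);
        fclose(out);
        gzclose(in);
        return n < 0 ? -1 : 0;
    }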
svn path=/trunk/boinc/; revision=25157
send the size of the compressed file as well.
- client: parse and write the compressed size (FILE_INFO::gzipped_nbytes).
For get_transfer GUI RPCs, if it's a compressed download,
send the compressed size.
That way the manager will show the fraction done correctly.
svn path=/trunk/boinc/; revision=25152
(It turns out that the compression schemes supported by
Apache and libcurl, surprisingly, aren't restartable.)
if a <file_info> from the server contains <gzipped_url> tags,
use those instead of the <url> tags,
and flag the file as "download_gzipped".
If this is the case, download NAME.gz and save it as NAME.gzt.
When the download is complete, rename NAME.gzt to NAME.gz,
and uncompress it to NAME.
(this ensures that if NAME.gz is present, it's complete).
Also do the uncompression, if needed, in verify_file().
This ensures that the uncompression will eventually get done
even if the client quits or crashes in the middle.
- update_versions: if <gzip> is present in a <file_info>,
add a gzipped copy in the download directory
and add <gzipped_url> elements to the app version's xml_doc.
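A sketch of the completion step for a download_gzipped file; gunzip_file() is an assumed helper.

    #include <cstdio>
    #include <string>

    int gunzip_file(const char* gz_path, const char* out_path);  // assumed

    // NAME.gzt holds the in-progress download; renaming it to NAME.gz
    // only on completion is what makes "NAME.gz exists" imply
    // "NAME.gz is complete".
    int finish_gzipped_download(const std::string& name) {
        std::string gzt = name + ".gzt";
        std::string gz  = name + ".gz";
        if (rename(gzt.c_str(), gz.c_str())) return -1;
        return gunzip_file(gz.c_str(), name.c_str());   // produce NAME
    }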
svn path=/trunk/boinc/; revision=25112
Report it (along with disk usage) in scheduler request messages.
This will allow the scheduler to send file-delete commands
if the project is using more than its share.
- client: add <disk_usage_debug> log flag
- create_work: add --help, show --command_line option
svn path=/trunk/boinc/; revision=24968
If set, don't run jobs for that app while network is suspended.
- client: parse this flag and maintain in state file;
do a job reschedule when network suspend state changes
- GUI RPC: add RESULT::network_wait flag;
if set, this job is waiting for network access to be allowed
- Manager: display the above in task info
- add support for "web graphics URL" (see above)
- client: parse message containing URL on graphics_reply channel
and store in ACTIVE_TASK::web_graphics_url
- GUI RPC: add RESULT::web_graphics_url
- Manager: if web graphics URL is present, Show Graphics opens a browser
- remove some vestigial code for pre-V6 graphics
svn path=/trunk/boinc/; revision=24899
clear project-level upload and download backoffs,
as well as RPC and individual xfer backoffs.
This was an oversight.
svn path=/trunk/boinc/; revision=24621
If we're contacting a project to report results,
only piggyback work requests for resources for which
that project is the highest-priority project that may have work.
- client: compute result.not_started more efficiently
TODO: continue efficiency work. There's still some quadratic stuff
svn path=/trunk/boinc/; revision=24523
reduce its runtime from O(N^2) to O(N),
where N is the number of runnable jobs
(which can be in the thousands).
This will make the client emulator run a lot faster,
and will reduce the client CPU overhead a bit.
- API: change boinc_get_opencl_ids() so that it returns
a BOINC error code (< -100) if the app_init.xml is
missing or bad (i.e. we're running standalone),
and an OpenCL error code (> -100) if an OpenCL call failed.
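For callers, the split looks like the following sketch; the two-argument form of boinc_get_opencl_ids() is assumed from boinc_opencl.h.

    #include "boinc_opencl.h"   // declares boinc_get_opencl_ids()
    #include <cstdio>

    int setup_opencl(cl_device_id& device, cl_platform_id& platform) {
        int retval = boinc_get_opencl_ids(&device, &platform);
        if (retval < -100) {
            // BOINC error code: app_init.xml missing or bad
            // (probably running standalone)
            fprintf(stderr, "BOINC error %d\n", retval);
        } else if (retval) {
            // OpenCL error code from a failed OpenCL call
            fprintf(stderr, "OpenCL error %d\n", retval);
        }
        return retval;
    }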
svn path=/trunk/boinc/; revision=24469
so that they do what they're supposed to
(i.e. enforce resource shares)
- client: change log flag <debt_debug> to <priority_debug>
- client simulator: update REC even with large delta-t.
- client simulator: handle "no new work" apps correctly
svn path=/trunk/boinc/; revision=24429
for when the job completed successfully but
one or more output files had permanent upload failures.
Show this state in web interfaces.
- sample_work_generator: check return value of count_unsent_results(),
so that we don't generate infinite work if there's a DB problem
- web: RSS feed shows news items from last 90 days, rather than 14
svn path=/trunk/boinc/; revision=24377
- client: if an app version can't be used because the GPUs it needs
are all excluded, mark it and all its results as "coproc missing"
so that they won't be looked at in scheduling logic.
svn path=/trunk/boinc/; revision=24317
where work fetch didn't work right in the presence of
multiple GPUs and <exclude_gpu> config options.
For example, suppose:
- you have 2 GPUs and 2 projects
- Project A is excluded from GPU 1
- you have lots of jobs for project A
Then the client won't try to fetch jobs from project B.
The problem had 2 parts:
a) round-robin simulation wasn't taking GPU exclusions into account.
In the above example, it would think that both GPUs had jobs.
I fixed this by computing the # of GPUs each project
is excluded from, and using this in the RR simulation.
b) Once this was done, I needed to make the client
request GPU jobs from project B rather than project A.
I did this with the following policy:
If a project has excluded GPUs of a given type,
and has a runnable job of that type,
don't ask it for more work of that type.
Notes:
- the policy in b) is crude, and it means that work-buffer
preferences are ignored in some cases.
- neither a) nor b) takes into account app-level exclusions.
I could fix both of these with a lot of work,
but I'd rather move to a model in which dissimilar GPUs
are modeled as different resources,
which would remove the need for the <exclude_gpu> mechanism
in the first place.
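A sketch of policy b); the struct and fields are illustrative.

    // Per (project, GPU type) summary; illustrative fields only.
    struct PROJECT_GPU_STATE {
        int nexcluded;          // # instances of this type excluded
        bool has_runnable_job;  // runnable job of this type?
    };

    // Policy b): if the project excludes some instances of this GPU
    // type and already has a runnable job of the type, don't ask it
    // for more work of that type.
    bool dont_request_gpu_work(const PROJECT_GPU_STATE& p) {
        return p.nexcluded > 0 && p.has_runnable_job;
    }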
- web: remove extraneous ) at end of button tooltips
svn path=/trunk/boinc/; revision=24312
by simulating time-slicing explicitly.
Also simulate changes in project REC
and hence in scheduling priority.
- client: add a log flag "rrsim_detail" that prints
time-slice-level info.
svn path=/trunk/boinc/; revision=24161
If present, "file_prefix/" is prepended to the logical names
of input and output files of jobs using that app version.
I.e. for Vbox-wrapper-based app versions, file_prefix is "share",
so that I/O files are put in a "share" subdirectory of the slot dir.
- update_versions: add support for
<dont_throttle>
<file_prefix>x</file_prefix>
in version.xml
svn path=/trunk/boinc/; revision=23924
- client: cc_config.xml: if <devnum> is omitted from an <exclude_gpu>,
it means exclude all instances of that GPU type
- client: if all instances of a GPU type are excluded for a project,
don't ask the project for jobs of that type
svn path=/trunk/boinc/; revision=23898