A user had entered a string containing ^Z in one of the project prefs
image URL fields. The scheduler reply file was opened in text mode,
where Windows treats ^Z as EOF, so the parse of the scheduler reply fails.
Solution: open the scheduler reply file in "rb" mode rather than "r".
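A minimal sketch of the fix, assuming a hypothetical wrapper function
open_sched_reply():

    #include <cstdio>

    // Open the scheduler reply in binary mode. In text mode ("r"),
    // the Windows C runtime treats a ^Z byte (0x1A) as end-of-file,
    // which truncates the read and breaks the XML parse.
    FILE* open_sched_reply(const char* path) {
        return fopen(path, "rb");
    }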
On Android, the way we were getting os_version
(Linux version + Android version)
didn't work because get_host_info() was getting called before every
scheduler RPC, and it overwrote the Android version part.
Solution: divide host info into dynamic (disk usage, network info)
and static (everything else).
Compute the static part only at startup.
Also factor the Unix HOST_INFO code into multiple functions.
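A rough sketch of the split, with hypothetical type and function names
(the real HOST_INFO has many more fields):

    #include <string>

    struct HostInfo {
        // static part: computed once at startup, so the Android suffix
        // in os_version is never overwritten
        std::string os_name;
        std::string os_version;   // e.g. Linux version + Android version

        // dynamic part: recomputed before every scheduler RPC
        double d_free = 0;        // free disk space
        double d_total = 0;       // total disk space

        void get_static() {
            // query OS name/version, CPU, memory, etc.; called once
        }
        void get_dynamic() {
            // refresh disk usage and network info only
        }
    };

    void before_scheduler_rpc(HostInfo& hi) {
        hi.get_dynamic();   // static fields stay intact
    }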
If a scheduler reply contains a result we already have,
and the report deadline is different, update it.
Notes:
- report deadlines are based on server time.
If there's clock skew between client and server, things are off.
This is a design flaw, but too late now.
- server-side support for adjusting deadlines isn't there yet
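A sketch of the deadline update, with hypothetical types and a
hypothetical lookup helper:

    #include <string>

    struct Result {
        std::string name;
        double report_deadline;   // server time; see the note above
    };

    Result* lookup_result(const std::string& name);   // hypothetical

    void handle_reply_result(const Result& from_reply) {
        Result* rp = lookup_result(from_reply.name);
        if (rp && rp->report_deadline != from_reply.report_deadline) {
            // we already have this result; adopt the new deadline
            rp->report_deadline = from_reply.report_deadline;
        }
    }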
If a <file_info> in a scheduler RPC reply specifies <sticky_lifetime>,
the client will calculate a "sticky_expire_time",
store it in the client state file,
and make the file unsticky when that time is reached.
Note: if a later RPC reply includes the same file,
the client will update sticky_expire_time.
So if a file is used repeatedly it won't get expired.
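Roughly, with hypothetical field names:

    struct FileInfo {
        bool sticky = false;
        double sticky_lifetime = 0;      // from <sticky_lifetime>; 0 if absent
        double sticky_expire_time = 0;   // saved in the client state file
    };

    // On parsing a <file_info> from a scheduler reply:
    void on_file_info(FileInfo& fi, double now) {
        if (fi.sticky_lifetime) {
            // refreshed by every reply that mentions the file, so a
            // file in repeated use keeps its expiration pushed back
            fi.sticky_expire_time = now + fi.sticky_lifetime;
        }
    }

    // Checked periodically:
    void check_sticky(FileInfo& fi, double now) {
        if (fi.sticky_expire_time && now > fi.sticky_expire_time) {
            fi.sticky = false;   // file becomes unsticky
            fi.sticky_expire_time = 0;
        }
    }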
We weren't copying the request fields from RSC_WORK_FETCH to COPROC.
Do this, and clean up the code a bit.
Note: the arrays that parallel the COPROCS::coprocs array
are a bit of a kludge; that stuff logically belongs in COPROC.
But it's specific to the client, so I can't put it there.
Maybe I could do something fancy with derived classes, not sure.
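The copy itself is small; a sketch, with field names that may not
exactly match the source:

    struct Coproc {
        double req_secs = 0;        // seconds of work requested
        double req_instances = 0;   // idle instances to fill
    };
    struct RscWorkFetch {
        double req_secs = 0;
        double req_instances = 0;
    };

    // Before writing the scheduler request, copy the per-resource
    // request amounts into the parallel COPROC entries (index 0 is
    // assumed to be the CPU here).
    void copy_requests(Coproc* coprocs, const RscWorkFetch* rwf, int n_rsc) {
        for (int i = 1; i < n_rsc; i++) {
            coprocs[i].req_secs = rwf[i].req_secs;
            coprocs[i].req_instances = rwf[i].req_instances;
        }
    }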
Currently the server doesn't know about different client "brands",
e.g. HTC Power to Give, Charity Engine, GridRepublic, etc.,
so there's no way to collect statistics about them.
Changes:
- client: at startup, read a "client brand" string from client_brand.txt
(i.e. branded clients will have to include this file in their installer)
Report this string in scheduler requests.
- scheduler: parse this request element,
and store it in host.serialnum as [BOINC|7.4.2|brand]
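A sketch of the client side; the XML element name and helper functions
are assumptions, not necessarily what the source uses:

    #include <cstdio>
    #include <cstring>

    char client_brand[256];

    // At startup: read the brand string, if the installer provided one.
    void read_client_brand() {
        client_brand[0] = 0;
        FILE* f = fopen("client_brand.txt", "r");
        if (!f) return;   // not a branded client
        if (fgets(client_brand, sizeof(client_brand), f)) {
            client_brand[strcspn(client_brand, "\r\n")] = 0;   // trim newline
        }
        fclose(f);
    }

    // When writing a scheduler request (element name assumed):
    void write_brand(FILE* req) {
        if (strlen(client_brand)) {
            fprintf(req, "   <client_brand>%s</client_brand>\n", client_brand);
        }
    }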
My last commit did this using a new API call.
But this would require rebuilding apps any time you want to change it;
too much work.
So instead make it an attribute of apps,
which you can set via the admin web interface.
Corresponding changes to client.
- If the AMS supplies a resource share, don't override it with the
  project setting (my last fix didn't quite do this)
- When detaching from an AMS, set the resource share to the
  project-supplied value
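A sketch of the intended precedence, assuming a "negative means not
supplied" convention and hypothetical field names:

    struct Project {
        double resource_share = 100;     // value in effect
        double ams_resource_share = -1;  // AMS-supplied; <0 if none
    };

    // Scheduler reply carries the project's own setting:
    void on_project_rs(Project& p, double project_rs) {
        if (p.ams_resource_share >= 0) return;   // AMS value wins
        p.resource_share = project_rs;
    }

    // Detaching from the account manager:
    void on_ams_detach(Project& p, double project_rs) {
        p.ams_resource_share = -1;
        p.resource_share = project_rs;   // revert to project's value
    }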
My commit of Feb 7 caused work fetch to project P
to be deferred for up to 5 min if an upload to P is active,
even if some instances are idle.
This was to deal with a case where the idleness was caused
by a jobs-in-progress limit at P,
and work requests led to long backoffs.
However, this can cause instances to be idle unnecessarily.
I changed things so that, if instances are idle,
a work fetch can happen even during upload.
But only one such fetch will be done.
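A sketch of the adjusted check, with hypothetical names:

    struct Project {
        bool idle_fetch_done = false;
        bool uploading();        // any active file uploads to this project?
    };
    bool devices_idle();         // is any processing instance idle?

    // Whether to consider project p for work fetch right now:
    bool ok_to_fetch(Project& p) {
        if (p.uploading()) {
            if (!devices_idle()) return false;    // defer up to 5 min
            if (p.idle_fetch_done) return false;  // only one such fetch
            p.idle_fetch_done = true;             // allow this one
        }
        return true;
    }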
If a project has active uploads, defer work fetch from it for 5 minutes
even if there are idle devices (that's the change).
This addresses a situation (reported by Rytis) where
- a project P has a jobs-in-progress limit less than NCPUS
- P's jobs finish and are uploading
- the client asks P for work and doesn't get any because of the limit
- the client does exponential backoff from P
Over the long term, P can get much less than its fair share of work
Many Android users had existing preferences with settings like
"don't compute when idle" that make sense for PCs but not mobile devices.
When this pref is enforced on Android, no computing happens
and user confusion results.
We're addressing this by using only local prefs on Android.
We considered other approaches - e.g. having a "mobile" venue -
but they're too complex.
A while back we added a mechanism intended to defer work-request RPCs
while file uploads are happening,
with the goal of reporting completed tasks sooner
and reducing the number of RPCs.
There were 2 bugs in this mechanism.
First, the decision of whether an upload is active was flawed;
if several uploads were active and 1 finished,
it would act like all had finished.
Second, when WORK_FETCH::choose_project() picks a project,
it sets p->sched_rpc_pending to RPC_REASON_NEED_WORK.
If we then decide not to request work because an upload
is active, we need to clear this field.
Otherwise scheduler_rpc_poll() will do an RPC to it,
piggybacking a work request and bypassing the upload check.
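Sketches of both fixes, with hypothetical names and a placeholder
constant:

    // Bug 1: count active uploads instead of keeping a single flag,
    // so one finishing upload doesn't look like all of them finished.
    struct Project {
        int num_active_uploads = 0;
        int sched_rpc_pending = 0;
        bool uploading() { return num_active_uploads > 0; }
    };

    const int RPC_REASON_NEED_WORK = 1;   // placeholder value

    // Bug 2: if we picked the project for work but then decide not to
    // request work because of an active upload, clear the pending-RPC
    // reason so scheduler_rpc_poll() doesn't do the RPC anyway.
    void maybe_cancel_work_rpc(Project* p) {
        if (p->sched_rpc_pending == RPC_REASON_NEED_WORK && p->uploading()) {
            p->sched_rpc_pending = 0;
        }
    }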
If the user typed an extremely long URL into the
Attach to Account Manager wizard, a buffer overrun could result.
There were several places in the code that assumed user-entered
URLs are short (e.g. at most 256 chars):
- canonicalize_master_url()
- several GUI RPC interfaces, when generating XML request messages
- URL escaping (not relevant here, but fix anyway)
Change all these to stay within buffers regardless of URL size.
Note: do this by truncation.
This will cause error messages like "can't connect to project"
rather than saying the URL is too long. That's OK.
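The pattern is a bounded copy that always null-terminates; a minimal
sketch (BOINC has its own string utilities for this):

    #include <cstring>

    // Copy with truncation and guaranteed null termination, so an
    // oversized user-entered URL can't overrun a fixed buffer.
    // The truncated URL will simply fail to connect, per the note above.
    void safe_copy(char* dst, const char* src, size_t dst_size) {
        if (!dst_size) return;
        strncpy(dst, src, dst_size - 1);
        dst[dst_size - 1] = '\0';
    }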
If a project sends us <no_rsc_apps> flags for all processor types,
then by default the client will never do a scheduler RPC to that project again.
This could happen because of a transient condition in the project,
e.g. it deprecates all its app versions for a while.
To avoid this situation, the client now checks whether the no_rsc_apps flags
are set for all processor types.
If they are, it clears them all.
This will cause work fetch to use backoff,
and the client will occasionally contact the project.
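The check is simple; a sketch assuming one flag per processor type:

    // If every no_rsc_apps flag is set, treat it as a transient
    // project-side condition: clear them all so normal work-fetch
    // backoff applies and the project is contacted occasionally.
    void check_no_rsc_apps(bool* no_rsc_apps, int n_rsc) {
        for (int i = 0; i < n_rsc; i++) {
            if (!no_rsc_apps[i]) return;   // some type is still usable
        }
        for (int i = 0; i < n_rsc; i++) {
            no_rsc_apps[i] = false;
        }
    }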
The new work-fetch policy, contributed by Jacob Klein, is roughly as follows:
- find the highest-priority project P that is allowed
to fetch work for a resource below buf_min
- Ask P for work for all resources R below buf_max
for which it's allowed to fetch work,
unless there's a higher-priority project allowed
to request work for R.
If we're going to do an RPC to P for reasons other than work fetch,
the policy is:
- for each resource R for which P is the highest-priority project
allowed to fetch work, and R is below buf_max,
request work for R.
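A sketch of the two steps; the predicates are hypothetical stand-ins
for the client's real checks:

    #include <vector>

    struct Project;
    bool allowed_to_fetch(Project* p, int rsc);   // hypothetical
    bool below_buf_min(int rsc);                  // hypothetical
    bool below_buf_max(int rsc);                  // hypothetical
    void request_work(Project* p, int rsc);       // hypothetical
    extern int n_rsc;

    // Step 1: the highest-priority project allowed to fetch work for
    // some resource below buf_min ("by_priority" is highest-first).
    Project* pick_project(const std::vector<Project*>& by_priority) {
        for (Project* p : by_priority) {
            for (int rsc = 0; rsc < n_rsc; rsc++) {
                if (below_buf_min(rsc) && allowed_to_fetch(p, rsc)) {
                    return p;
                }
            }
        }
        return nullptr;
    }

    // Step 2: ask P for each resource below buf_max that it may fetch
    // for, unless a higher-priority project could fetch for it.
    void build_request(Project* P, const std::vector<Project*>& by_priority) {
        for (int rsc = 0; rsc < n_rsc; rsc++) {
            if (!below_buf_max(rsc) || !allowed_to_fetch(P, rsc)) continue;
            bool preempted = false;
            for (Project* q : by_priority) {
                if (q == P) break;   // only projects above P in priority
                if (allowed_to_fetch(q, rsc)) { preempted = true; break; }
            }
            if (!preempted) request_work(P, rsc);
        }
    }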
There are several places where the client keeps track
(usually in a static variable called "last_time")
of the last time we did something,
and we only do it again when now - last_time exceeds some interval.
Example: sending heartbeat messages to apps.
Problem: if the system clock is decreased by X,
we won't do any of these actions for time X,
making it appear that the client is frozen.
Solution: when we detect that the system clock has decreased,
set a global var "clock_change" for 1 iteration of the polling loop,
and disable these time checks if clock_change is set.
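The pattern and the fix, with hypothetical names:

    extern bool clock_change;   // true for one iteration of the polling
                                // loop after a backward clock jump

    const double HEARTBEAT_INTERVAL = 1.0;   // example interval

    void maybe_send_heartbeats(double now) {
        static double last_time = 0;
        // After a backward jump, now - last_time would stay below the
        // interval for a long time; clock_change overrides the check
        // and re-bases last_time on the new clock.
        if (!clock_change && now - last_time < HEARTBEAT_INTERVAL) return;
        last_time = now;
        // ... send heartbeat messages to apps ...
    }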
If 64 or more tasks are ready to report, report them.
64 is chosen a bit arbitrarily, but the idea is to
limit the number of tasks reported per RPC,
and to accelerate the reporting of small tasks.
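A sketch of the trigger, with hypothetical names:

    struct Project;
    int num_ready_to_report(Project* p);      // hypothetical
    void request_scheduler_rpc(Project* p);   // hypothetical

    const int REPORT_TASKS_THRESHOLD = 64;

    void maybe_report(Project* p) {
        if (num_ready_to_report(p) >= REPORT_TASKS_THRESHOLD) {
            // report now rather than waiting for another RPC trigger
            request_scheduler_rpc(p);
        }
    }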
Include TIME_STATS in the binding of the get_state() RPC:
- client: move client_start_time and previous_uptime
from CLIENT_STATE to TIME_STATS,
so that these are also visible in GUI RPC
- scheduler RPC: move uptime and previous_uptime
into <time_stats>
- client: condition an RR simulation message on <rrsim_detail>
- boinccmd: show TIME_STATS info in --get_state