There was a 20-30 second delay between exclusive app exit
and resuming tasks. This was excessive.
Reduce it to 5-15 sec (uncertainty is because we
check exclusive apps every 10 sec)
This is the interval between when the client sends a "quit" message
to when it kills the app via TerminateProcess or SIGKILL.
Apparently VM apps that do their own state-saving
(i.e. that don't use snapshot-based checkpointing)
can take more than 15 seconds to do so.
Hopefully 60 is enough.
Rom suggested making this interval shorter in the case where
the OS is shutting down.
I don't think this is necessary since the OS kill everything anyway
after some period (5 sec in the case of Win 10).
There were two problems:
1) we were sorting before parsing the client state file
(which is where we get project names from)
2) the Win implementation of strcasecmp() wasn't right;
it returned difference but not order.
A while back I changed the job sched and work fetch policies to use
REC-based project priority.
The work fetch logic sorts the project list (in CLIENT_STATE::projects)
by descending priority.
This causes two problems:
- If you have a lot of projects, it's hard to find a particular one
in the event log, e.g. in work_fetch_debug output.
- In the manager's Statistics tab, the selected project can change
unexpectedly since we identify it by array index,
and the array order may change.
Solution: sort CLIENT_STATE::projects alphabetically (case insensitive).
In WORK_FETCH, copy this array to a separate array,
that is then sorted by decreasing priority.
Currently the server doesn't know about different client "brands",
e.g. HTC Power to Give, Charity Engine, GridRepublic, etc.,
so there's no way to collect statistics about them.
Changes:
- client: at startup, read a "client brand" string from client_brand.txt
(i.e. branded clients will have to include this file in their installer)
Report this string in scheduler requests.
- scheduler: parse this request element,
and store it in host.serialnum as [BOINC|7.4.2|brand]
On client startup, decide whether we need to do CPU benchmarks
(cmdline option was set, or we haven't done them for 30 days).
If so, do them when possible.
If a project has active uploads, defer work fetch from it for 5 minutes
even if there are idle devices (that's the change).
This addresses a situation (reported by Rytis) where
- a project P has a jobs-in-progress limit less than NCPUS
- P's jobs finish and are uploading
- the client asks P for work and doesn't get any because of the limit
- the client does exponential backoff from P
Over the long term, P can get much less than its fair share of work
- If run with --gui_rpc_unix_domain, the client will listen on
a Unix-domain socket (named "boinc_socket") rather than on a TCP port.
- Add RPC_CLIENT::init_unix_domain() function to C++ GUI RPC interface
(Note: we'll need to add a corresponding function to the Java interface)
- boinccmd: add --unix_domain option
I forgot that the wrapper has a 1-second poll for suspend and resume,
so sub-second throttling won't work properly for wrapper apps.
Revert to a variant of the old scheme,
in which the min of the suspended and resumed periods is 1 sec.
Also, fix task start/suspend/resume log messages.
- Add a GUI RPC ("set_language") that lets the Manager communicate
the user's selected language code to the client at startup.
- The client stores the language code in the client state file
- The client appends a "lang=X" GET argument to the URLs from
which notices are fetched.
- The next steps (not done) are 1) to change the get_notices.php
script to parse the argument and do translation, and
2) extend our Pootle system to allow volunteer translation
of notices by all projects.
Many Android users had existing preferences with settings like
"don't compute when idle" that make sense for PCs but not mobile devices.
When this pref is enforced on Android, no computing happens
and user confusion results.
We're addressing this by using only local prefs on Android.
We considered other approaches - e.g. having a "mobile" venue -
but they're too complex.
- If you run the client with --run_test_app,
runs "test_app" in the current directory and interacts with it
(and does nothing else).
It can suspend/resume it with arbitrary timing;
this is controlled in run_test_app() (app_start.cpp).
- example app: add --critical_section option.
This lets you test the runtime system for apps that do
most of their work in a critical section (like GPU apps).
- Add some logging messages (conditioned by DEBUG_BOINC_API)
to the runtime system.
- boinc_finish() waits for the timer thread to write final messages;
make sure it doesn't do anything else
(like suspend the worker thread) during this period
We want to track the product name (e.g. "HTC One X") of Android devices.
On Android, the API to get this is Java,
so we need to do it in the GUI rather than the client.
- Add product_name field to HOST_INFO
- Add a GUI RPC for passing this info from the GUI to the client.
- Store it in client_state.xml, so that the client knows it initially.
The product name is included in scheduler RPC requests, as part of <host_info>.
TODO: add server-side support for parsing it and storing in DB.
Also: move DEVICE_STATUS out of HOST_INFO; it didn't belong there.
Previously the client had (C++) code to
- check whether on AC or USB power
- get battery status and temperature
- check whether on wifi
These functions looked in various places under /sys.
Problem: the paths are system-dependent,
so whatever we do won't work on all devices.
The Android APIs for getting this info are in Java,
so we can't call them from the client.
Solution: have the GUI periodically get this info
and report it to the client via a GUI RPC.
The GUI must make this RPC periodically:
if the client doesn't get one within some period of time
(currently 30 sec) it suspends computing and network.
Also: if suspending jobs because of battery charge level
or temperature, leave them in memory.
(usually in a static variable called "last_time")
of the last time we did something,
and we only do it again when now - last_time exceeds some interval.
Example: sending heartbeat messages to apps.
Problem: if the system clock is decreased by X,
we won't do any of these actions are time X,
making it appear that the client is frozen.
Solution: when we detect that the system clock has decreased,
set a global var "clock_change" for 1 iteration of the polling loop,
and disable these time checks if clock_change is set.
This was supposed to be in my 507cd79 commit, but it got botched somehow.
- client: the <task> debug flag enables suspend/resume messages
for both CPU and GPU.
Previously CPU messages were always shown,
and GPU messages were shown if <cpu_sched_debug> was set.
- client: fix bug where reschedule wasn't being done on GPU suspend or resume.
report them.
64 is chosen a bit arbitrarily, but the idea is to
limit the number of tasks reported per RPC,
and to accelerate the reporting of small tasks.
the binding of the get_state() RPC
- client: move client_start_time and previous_uptime
from CLIENT_STATE to TIME_STATS,
so that these are also visible in GUI RPC
- scheduler RPC: move uptime and previous_uptime
into <time_stats>
- client: condition an RR simulation message on <rrsim_detail>
- boinccmd: show TIME_STATS info in --get_state
- Allow projects to report "desired disk usage" (DDU).
If the client learns that a project wants disk space,
it can shrink the allocation to other projects.
- Base share computation on DDU rather than disk usage.
- Introduce the notion of "disk resource share".
This is defined (somewhat arbitrarily) as resource share
plus 1/10 of the largest resource share.
This is intended to ensure that even zero-share projects
get enough disk space to store app versions and data files;
otherwise they wouldn't be able to compute.
- server: use host.d_boinc_max (which wasn't being used)
to start d_project_share reported by client.
- volunteer storage: change the way hosts are allocated to chunks.
Allow hosts to store several chunks of the same file, if needed
svn path=/trunk/boinc/; revision=26052
1) a network connection is available and
2) network communication is allowed and
3) CPU computation is allowed
- If an app version is marked as needs_network,
use the above fraction in estimating its rate of progress
- replace "core client" with "client" in comments.
- scheduler: message tweaks
svn path=/trunk/boinc/; revision=25803
Report it (along with disk usage) in scheduler request messages.
This will allow the scheduler to send file-delete commands
if the project is using more than its share.
- client: add <disk_usage_debug> log flag
- create_work: add --help, show --command_line option
svn path=/trunk/boinc/; revision=24968
- client: msg tweak
- client: minimum work buffer lower bound is 180 sec
- scheduler: in computing HOST_USAGE::project_flops for a job,
if we don't have sufficient elapsed_time statistics
for either the (host, app_version) or the app_version,
use a conservative estimate (p_fpops*(#cpus+#ngpus))
rather than the number returned by app_plan().
This avoids "time limit exceeded" errors when the latter is way off.
svn path=/trunk/boinc/; revision=24820
where the client crashes after giving up (90 day timeout) on an upload.
I'm guessing this was caused by [24391],
which changed the order in the poll loop from
garbage_collect
file_xfers->poll
pers_file_xfers->poll
handle_pers_file_xfers
to
garbage_collect
handle_pers_file_xfers
file_xfers->poll
pers_file_xfers->poll
I don't understand why this would have caused a crash,
but so be it.
I restored the original order, but with handle_pers_file_xfers
not inside the if (!network_suspended).
- client renamed handle_pers_file_xfers() to
create_and_delete_pers_file_xfers()
- client simulator: show simulator CPU time
svn path=/trunk/boinc/; revision=24531
by simulating time-slicing explicitly.
Also simulate changes in project REC
and hence in scheduling priority.
- client: add a log flag "rrsim_detail" that prints
time-slice-level info.
svn path=/trunk/boinc/; revision=24161