Also modify db_dump to exclude user records whose authenticator starts
with 'deleted', and host records whose domain name equals 'deleted'.
Those values are set by the obfuscating delete method.
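A minimal sketch of the checks this implies while enumerating records
(function names are mine, not db_dump's):

    #include <cstring>

    // Records zeroed out by the obfuscating delete:
    // the authenticator gets a 'deleted' prefix,
    // and the host's domain name becomes exactly 'deleted'.
    bool user_is_deleted(const char* authenticator) {
        return strncmp(authenticator, "deleted", 7) == 0;
    }
    bool host_is_deleted(const char* domain_name) {
        return strcmp(domain_name, "deleted") == 0;
    }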
- add DB field for storing job keywords: workunit.keywords
add this to various DB parse/write functions
- add --keywords option to create_work for specifying job keywords
- add <keyword_sched> option in config.xml for enabling keyword score
(it's disabled by default).
If set, increment the score for "yes" keyword matches,
and disallow jobs with "no" matches (sketched after this list)
- in scheduler, add array job_keywords_array for parsed versions
of job keywords (vector<int>)
also:
- use symbols instead of numbers for slow_check() return values
- parse unused fields in req message to remove unparsed-XML warnings
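A sketch of the keyword scoring rule, assuming job and user keywords have
already been parsed into integer ID lists (names here are hypothetical):

    #include <algorithm>
    #include <vector>
    using std::vector;

    // Returns false if the job must not be sent (a "no" keyword match);
    // otherwise adds 1 to the score for each "yes" match.
    bool apply_keyword_score(
        const vector<int>& job_keywords,
        const vector<int>& user_yes,
        const vector<int>& user_no,
        double& score
    ) {
        for (int kw: job_keywords) {
            if (std::count(user_no.begin(), user_no.end(), kw)) return false;
            if (std::count(user_yes.begin(), user_yes.end(), kw)) score += 1;
        }
        return true;
    }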
negative values are stored in app_version_id fields to represent
anonymous platform versions.
So need to use %ld rather than %lu for these fields.
Also there were a couple more changes of int to DB_ID_TYPE.
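A minimal illustration of why the conversion code matters:

    #include <cstdio>

    int main() {
        long app_version_id = -2;   // negative: an anonymous platform version
        printf("%ld\n", app_version_id);                  // prints -2
        printf("%lu\n", (unsigned long)app_version_id);   // wrong: prints 18446744073709551614
        return 0;
    }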
The SETI@home result table is about to run out of 32-bit IDs,
so we need to move to 64-bit result IDs.
This will happen to the workunit table at some point too.
I changed the server C++ code to use the "long" type for all DB IDs
(and to use appropriate conversion codes like %lu).
"long" is 64 bit on 64-bit machines.
For uniformity I did this for all tables,
even ones (like app) that will never get big.
I chose NOT to change the DB schema for now.
The new code will work with 32-bit ID fields in the DB.
As projects approach the 32-bit limit on a table they can change
its ID field, and fields that reference this table, to BIGINT.
This is likely to happen only on the result and workunit tables.
I put functions in html/ops/db_update.php
to change the IDs of these tables.
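A sketch of both halves of this, with illustrative names (the actual
migration helpers are the db_update.php functions just mentioned):

    // Server code: a single typedef for all DB IDs.
    // "long" is 64-bit on 64-bit machines, and still works
    // against 32-bit INT ID columns in the DB.
    typedef long DB_ID_TYPE;

    // The kind of schema change to apply as a table nears 2^31 rows,
    // for the table's ID and the fields that reference it, e.g.:
    //   alter table result change column id id bigint not null auto_increment;
    //   alter table workunit change column canonical_resultid canonical_resultid bigint not null;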
The query to get counts of unsent results for various size classes
did a sequential scan, which isn't practical for large projects.
All we care about is the count up to a certain (low) limit,
so I replaced it with an enumeration with a limit, counting the results.
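A sketch of that pattern against the raw MySQL C API (query details and
the limit are illustrative; server_state 2 means UNSENT):

    #include <cstdio>
    #include <mysql/mysql.h>

    // Count unsent results in a size class, but only up to `limit`.
    // MySQL can stop scanning once `limit` rows are found,
    // instead of touching every row as count(*) would.
    long count_unsent(MYSQL* db, int size_class, int limit) {
        char q[256];
        snprintf(q, sizeof(q),
            "select id from result where server_state=2 "
            "and size_class=%d limit %d",
            size_class, limit);
        if (mysql_query(db, q)) return -1;
        MYSQL_RES* r = mysql_store_result(db);
        if (!r) return -1;
        long n = (long)mysql_num_rows(r);   // saturates at `limit`
        mysql_free_result(r);
        return n;
    }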
See http://boinc.berkeley.edu/trac/wiki/PerAppCredit
If enabled (by the <credit_by_app> config flag)
validators will maintain credit on a per-(app, user, credit type) basis,
and likewise for teams,
in new DB tables credit_user and credit_team.
This info is displayed in the web site, on user and team pages,
using project-supplied functions to generate the HTML.
Note: update_stats doesn't decay the recent-average values
for per-app credit; I'll add this if needed.
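For reference, a rough sketch of what a credit_user row might hold; the
field names here are my guesses from the description (see the wiki page
above for the real schema):

    // One row per (user, app, credit type); the team table is analogous.
    struct CREDIT_USER {
        long userid;
        long appid;
        int njobs;           // number of credited jobs
        double total;        // total credit granted
        double expavg;       // recent-average credit (no decay yet; see above)
        double expavg_time;  // when expavg was last updated
        int credit_type;
    };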
This is meant not to break anything, just add some
(optional) logging and features needed for Einstein@Home.
Please contact me before changing or removing any of this.
I did this by including the list of badges in the tables.xml file,
and writing the list of badge assignments to 2 new files,
badge_user.gz (for users) and badge_team.gz (for teams).
I considered including the badges within the <user> and <team> elements.
However, this would require enumerating the badges for a particular user
within the enumeration of users, which doesn't work;
only one enumeration can be active at a time.
Plus it would be less efficient, and db_dump already takes
a half hour on a big project.
My last commit did this using a new API call.
But this would require rebuilding apps any time you want to change it;
too much work.
So instead make it an attribute of apps,
which you can set via the admin web interface.
Corresponding changes to client.
The job submission RPC handler (PHP) originally ran the
create_work program once per job.
This took about 1.5 minutes to create 1000 jobs.
Recently I changed this so that create_work is run only once;
it does one SQL insert per job.
Disappointingly, this was only slightly faster: 1 min per 1000 jobs.
This commit changes create_work to create multiple jobs per SQL insert
(as many as will fit in a 1 MB query, which is the default limit).
This speeds things up by a factor of 100: 1000 jobs in 0.5 sec.
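A sketch of the batching logic; the column list is abbreviated and
issue_query() is a stand-in for the project's DB layer:

    #include <string>
    #include <vector>

    void issue_query(const std::string& q);   // stand-in for the DB layer

    struct JOB { std::string name, xml_doc; };

    // Pack as many rows as fit under the default 1 MB max_allowed_packet
    // into each INSERT statement.
    void insert_jobs(const std::vector<JOB>& jobs, int appid) {
        const size_t MAX_QUERY = 1000000;
        std::string q;
        for (const JOB& j: jobs) {
            std::string row = "('" + j.name + "'," + std::to_string(appid)
                + ",'" + j.xml_doc + "')";
            if (!q.empty() && q.size() + 1 + row.size() > MAX_QUERY) {
                issue_query(q);
                q.clear();
            }
            if (q.empty()) {
                q = "insert into workunit (name, appid, xml_doc) values " + row;
            } else {
                q += "," + row;
            }
        }
        if (!q.empty()) issue_query(q);
    }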
The latest client reports the peak working set size, swap size,
and disk usage for completed jobs.
Add fields to the results table to store these.
Parse them in scheduler request messages, and write to the DB.
Display them in the result web page.
This data can be used to improve (or even automate)
the job estimates for memory and disk usage.
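A toy, line-based illustration of the parsing side (the scheduler's real
parser is BOINC's XML parser; tag names are assumed from the description):

    #include <cstdio>

    struct PEAK_USAGE {
        double working_set_size, swap_size, disk_usage;
    };

    // Match one line of a result element in a scheduler request.
    // Returns true if the line held one of the new fields.
    bool parse_peak_usage_line(const char* line, PEAK_USAGE& p) {
        if (sscanf(line, " <peak_working_set_size>%lf", &p.working_set_size) == 1) return true;
        if (sscanf(line, " <peak_swap_size>%lf", &p.swap_size) == 1) return true;
        if (sscanf(line, " <peak_disk_usage>%lf", &p.disk_usage) == 1) return true;
        return false;
    }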
- gpu_active_frac is the fraction of time GPU use is allowed
while the client is running.
Previously the client reported it but we weren't storing it in the DB.
We may need it in the future for batch scheduling logic.
- fix a crashing bug in scheduler
- client: minor message tweak
Work generators create jobs (workunits);
the transitioner creates instances (results).
If a work generator tries to maintain a certain number of unsent results
(as the sample work generator does)
it must wait for a bit, after creating jobs,
to let the transitioner create instances of those jobs.
The example work generator waited 5 seconds.
Problem: on a heavily loaded project, the transitioner can fall behind -
minutes or hours behind.
So the above policy can create way too many jobs.
Solution: after creating jobs, the sample work generator
notes the current time X,
then waits until the transitioner catches up to time X
(i.e., until the min workunit.transition_time exceeds X).
This ensures that instances have been created for all the new jobs.
Other work generators that limit the number of unsent jobs
should use the same technique;
use min_transition_time(x) to get the min transition time.
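A sketch of that wait loop, assuming min_transition_time() follows the
usual zero-on-success convention:

    #include <ctime>
    #include <unistd.h>

    int min_transition_time(double& x);   // per the note above

    // Block until the transitioner has processed everything that was
    // pending when job creation finished.
    void wait_for_transitioner() {
        double x = (double)time(0);
        while (true) {
            double min_tt = 0;
            if (!min_transition_time(min_tt) && min_tt > x) break;
            sleep(5);   // still behind; poll again
        }
    }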
Code cleanup: get_double should be a member of DB_CONN, not DB_BASE.
Problem: a workunit could error out with unsent results.
The feeder skips such results, but the size_regulator counts them
and so doesn't promote any new results.
Solution: the feeder scans for results even when their workunit has errors.
It marks these results as state OVER, outcome DIDNT_NEED.
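The marking itself is essentially two assignments (a sketch, assuming the
RESULT struct and state/outcome constants from boinc_db.h):

    // Mark an unsent result of an errored-out workunit as finished.
    void mark_didnt_need(RESULT& result) {
        result.server_state = RESULT_SERVER_STATE_OVER;
        result.outcome = RESULT_OUTCOME_DIDNT_NEED;
    }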
See http://boinc.berkeley.edu/trac/wiki/MultiSize
The components of this include:
- DB changes:
add size_class to workunit and result
n_size_classes to app; >1 means multi-size
- size_regulator daemon program: change result states
from INACTIVE to UNSENT carefully
- size_census program: writes quantile info to flat files
- transitioner: when creating results for multi-size apps,
set server state to INACTIVE
- sched shmem (feeder): read quantile info from flat files,
store in shared memory
- scheduler (score-based scheduling): for multi-size apps,
add a component to the score function for size class
(see the sketch after this list)
- show_shmem: show result size class
- make_work (and other callers of count_unsent_results()):
count both INACTIVE and UNSENT
- create_work: add --size_class cmdline option
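A sketch of the added score component for multi-size apps; the linear
penalty is illustrative, not the actual weighting:

    #include <cstdlib>

    // Favor jobs whose size class matches the host's size class
    // (derived from the quantile info in shared memory).
    double size_class_term(int job_size_class, int host_size_class) {
        return -std::abs(job_size_class - host_size_class);
    }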
Also:
- if we get MySQL errors during upgrade, don't rewrite db_version