- add DB field for storing job keywords: workunit.keywords
add this to various DB parse/write functions
- add --keywords option to create_work for specifying job keywords
- add <keyword_sched> option in config.xml for enabling keyword score
(it's disabled by default).
If set, increment score for "yes" keyword matches,
and disallow jobs with "no" matches
- in scheduler, add array job_keywords_array for parsed versions
of job keywords (vector<int>)
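A minimal sketch of the scoring rule, assuming the user's "yes" and "no"
keywords are available as integer sets (all names here are illustrative,
not BOINC's actual identifiers):

    #include <set>
    #include <vector>

    // Returns false if the job has any "no" keyword (job disallowed);
    // otherwise bumps the score once per "yes" keyword match.
    bool apply_keyword_score(
        const std::vector<int>& job_keywords,
        const std::set<int>& user_yes,
        const std::set<int>& user_no,
        double& score
    ) {
        for (size_t i = 0; i < job_keywords.size(); i++) {
            if (user_no.count(job_keywords[i])) return false;
            if (user_yes.count(job_keywords[i])) score += 1.0;
        }
        return true;
    }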
also:
- use symbols instead of numbers for slow_check() return values
- parse unused fields in req message to remove unparsed-XML warnings
This supports the TACC use case,
in which the jobs in a batch can use different Docker images
and different input and output file signatures,
none of which are known in advance.
Python API binding:
- A JOB_DESC object can optionally contain wu_template and result_template
elements, which are the templates (the actual XML) to use for that job.
Add these to the XML request message if present.
- Added the same capability to the PHP binding, but not C++.
- Added and debugged test cases for both languages.
Also, submit_batch() can take either a batch name (in which case
the batch is created) or a batch ID
(in which case the batch was created earlier, e.g. prior to remotely staging files).
RPC handler:
- in submit_batch(), check for jobs with templates specified
and store them in files.
For input templates (which are deleted after creating jobs)
we put them in /tmp,
and use a map so that identical templates share a single file.
For output templates (which have to last until all jobs are done)
we put them in templates/tmp, with content-based filenames
to economize.
- When creating jobs, or generating SQL strings for multiple jobs,
use these names as --wu_template_filename
and --result_template_filename args to create_work
(either cmdline args or stdin args)
- Delete WU templates when done
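The dedup idea, sketched in C++ for concreteness (the handler itself is
PHP; the hash-based naming and helper below are assumptions, not the
actual code):

    #include <fstream>
    #include <functional>
    #include <map>
    #include <sstream>
    #include <string>

    // contents -> filename; identical templates share one file
    static std::map<std::string, std::string> stored;

    std::string store_template(const std::string& contents, const std::string& dir) {
        std::map<std::string, std::string>::iterator it = stored.find(contents);
        if (it != stored.end()) return it->second;   // already written
        std::ostringstream path;                     // content-based name
        path << dir << "/templ_" << std::hash<std::string>()(contents);
        std::ofstream out(path.str().c_str());
        out << contents;
        stored[contents] = path.str();
        return path.str();
    }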
create_work.cpp:
handle per-job --wu_template and --result_template args in stdin job lines
(the names of per-job WU and result templates).
Maintain a map mapping WU template name to contents,
to avoid repeatedly reading them.
For jobs that don't specify templates, use the ones specified
at the batch level, or the defaults.
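A minimal sketch of that cache, keyed by template filename (helper and
names are mine, not the actual create_work.cpp code):

    #include <fstream>
    #include <map>
    #include <sstream>
    #include <string>

    static std::map<std::string, std::string> template_cache;

    // Return template contents, reading the file only on first use.
    int get_template(const std::string& path, std::string& contents) {
        std::map<std::string, std::string>::iterator it = template_cache.find(path);
        if (it != template_cache.end()) {
            contents = it->second;
            return 0;
        }
        std::ifstream in(path.c_str());
        if (!in) return -1;          // file not found / unreadable
        std::ostringstream buf;
        buf << in.rdbuf();
        contents = buf.str();
        template_cache[path] = contents;
        return 0;
    }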
The SETI@home result table is about to run out of 32-bit IDs,
so we need to move to 64-bit result IDs.
This will happen to the workunit table at some point too.
I changed the server C++ code to use the "long" type for all DB IDs
(and to use appropriate conversion codes like %lu).
"long" is 64 bit on 64-bit machines.
For uniformity I did this for all tables,
even ones (like app) that will never get big.
I chose NOT to change the DB schema for now.
The new code will work with 32-bit ID fields in the DB.
As projects approach the 32-bit limit on a table they can change
its ID field, and fields that reference this table, to BIGINT.
This is likely to happen only on the result and workunit tables.
I put functions in html/ops/db_update.php
to change the IDs of these tables.
We were creating the workunit, then updating its transitioner_flags field.
If the transitioner ran in between,
it would (incorrectly) create results for the workunit.
Solution: set transitioner_flags during insert.
A "remote input file" is located on a data server other than the project server.
Previously these could be specified only in the input template,
which was of limited utility.
Add new ways of specifying remote input files:
1) in the create_work program, a remote input file can be specified
with command-line args
--remote_file name URL nbytes MD5
or by the same syntax in stdin when creating multiple jobs
2) add a variant of create_work() called create_work2(),
which takes a vector of INFILE_DESC structures that can specify
either local or remote files
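A sketch of what such a descriptor might look like (field names are
guesses based on the --remote_file syntax above, not the actual struct):

    #include <string>
    #include <vector>

    struct INFILE_DESC {
        bool is_remote;
        // local files: logical name resolved on the project server
        std::string name;
        // remote files: where the client can fetch the file, plus
        // the size and MD5 needed to verify it
        std::string url;
        double nbytes;
        std::string md5;
    };

    // create_work2() takes a vector of these instead of local names only:
    // int create_work2(..., const std::vector<INFILE_DESC>& infiles);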
The job submission RPC handler (PHP) originally ran the
create_work program once per job.
This took about 1.5 minutes to create 1000 jobs.
Recently I changed this so that create_work is run only once;
it does one SQL insert per job.
Disappointingly, this was only slightly faster: 1 min per 1000 jobs.
This commit changes create_work to create multiple jobs per SQL insert
(as many as will fit in a 1 MB query, which is the default limit).
This speeds things up by a factor of 100: 1000 jobs in 0.5 sec.
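A sketch of the batching logic, assuming a 1 MB cap and pre-rendered
per-job VALUES tuples (all names are illustrative):

    #include <string>
    #include <vector>

    const size_t MAX_QUERY = 1000000;   // MySQL's default query size limit

    // Group per-job VALUES clauses into as few INSERTs as possible.
    std::vector<std::string> build_inserts(const std::vector<std::string>& values) {
        std::vector<std::string> queries;
        std::string q;
        for (size_t i = 0; i < values.size(); i++) {
            if (!q.empty() && q.size() + values[i].size() + 1 > MAX_QUERY) {
                queries.push_back(q);   // flush before exceeding the cap
                q.clear();
            }
            if (q.empty()) {
                q = "insert into workunit (name, ...) values " + values[i];
            } else {
                q += "," + values[i];
            }
        }
        if (!q.empty()) queries.push_back(q);
        return queries;
    }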
Previously if you wanted to create lots of jobs from a script (e.g. PHP)
you had to run create_work once per job.
With the --stdin option you run it once,
passing it a file (via stdin) with one line per job.
Each line can specify a command line and/or a set of input files.
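For example, a line might look like this (a hypothetical job; the
per-line syntax mirrors create_work's command-line options):

    --wu_name job_42 --command_line "--fft 512" infile_1 infile_2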
On my server this gives a performance of about 1000 jobs per minute,
which is less than I would have expected,
but all the time is spent in doing MySQL inserts
so that's as good as we can do for now.
Also fix a bug in stage_file.
See http://boinc.berkeley.edu/trac/wiki/MultiSize
The components of this include:
- DB changes:
add size_class to workunit and result
n_size_classes to app; >1 means multi-size
- size_regulator daemon program: change result states
from INACTIVE to UNSENT carefully
- size_census program: writes quantile info to flat files
- transitioner: when creating results for multi-size apps,
set server state to INACTIVE
- sched shmem (feeder): read quantile info from flat files,
store in shared memory
- scheduler (score-based scheduling): for multi-size apps,
add a component to the score function for size class
(see the sketch after this list).
- show_shmem: show result size class
- make_work (and other callers of count_unsent_results()):
count both INACTIVE and UNSENT
- create_work: add --size_class cmdline option
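A sketch of that extra score term (the weighting and names are
assumptions, not the actual scheduler code; see the wiki page above):

    // Favor jobs whose size class matches the host's; penalize
    // mismatches in proportion to how far apart the classes are.
    double size_class_score_term(int host_size_class, int job_size_class) {
        int diff = host_size_class - job_size_class;
        if (diff < 0) diff = -diff;
        return diff == 0 ? 1.0 : -1.0 * diff;
    }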
Also:
- if we get MySQL errors during upgrade, don't rewrite db_version
- Fix various #include issues.
CODING STYLE LAW (minimal inclusion principle):
If foo.cpp requires <blah.h>,
#include <blah.h> in foo.cpp, NOT foo.h
svn path=/trunk/boinc/; revision=25837
Fix a bug that allowed targeted jobs
to be sent to non-targeted hosts:
the feeder was erroneously putting targeted jobs
in the shared mem cache.
Changes:
- The feeder only enumerates jobs for which
workunit.transitioner_flags is zero.
NOTE: this field is nonzero iff the job is assigned.
- create_work: when creating an assigned job,
set workunit.transitioner_flags appropriately
svn path=/trunk/boinc/; revision=25314
This now supports two main use cases:
1) there's a job that you want to run once on all hosts,
present and future
(or all hosts belonging to a user, or to a team).
The job is never transitioned, validated, or assimilated.
2) there's a normal job for which you want to use only
hosts belonging to a specific user (e.g. cluster or cloud hosts).
This restriction can be made either when the job is created,
or on the fly,
e.g. as part of a scheme for accelerating batch completion.
For the latter purpose we now provide a function
restrict_wu_to_user(DB_WORKUNIT&, int userid)
(usage sketched below).
The job goes through the standard
transitioner/validator/assimilator path.
These cases are enabled by config flags
<enable_assignment_multi/>
<enable_assignment/>
respectively.
Assignments of type 2) are no longer stored in shared mem,
so there is no limit on their number.
There is no longer a rule that assigned job names must contain "asgn".
NOTE: this requires a database update.
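A sketch of on-the-fly restriction using that function (DB_WORKUNIT is
stubbed here for illustration; the real type comes from the scheduler's
DB layer):

    struct DB_WORKUNIT { long id; int batch; };   // stub, not the real struct

    // provided by the sched library, per this note:
    int restrict_wu_to_user(DB_WORKUNIT& wu, int userid);

    // e.g. to accelerate batch completion, retarget a batch's
    // remaining unsent jobs at one user's cluster/cloud hosts
    int retarget_batch(DB_WORKUNIT* wus, int n, int userid) {
        for (int i = 0; i < n; i++) {
            int retval = restrict_wu_to_user(wus[i], userid);
            if (retval) return retval;
        }
        return 0;
    }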
svn path=/trunk/boinc/; revision=25169
Report it (along with disk usage) in scheduler request messages.
This will allow the scheduler to send file-delete commands
if the project is using more than its share.
- client: add <disk_usage_debug> log flag
- create_work: add --help, show --command_line option
svn path=/trunk/boinc/; revision=24968
- web: add interfaces for submitting and controlling batches of jobs
- web: add an administrative interface for controlling
user permissions for submitting jobs
- web: add an interface where users can view and control
their submitted jobs
See: http://boinc.berkeley.edu/trac/wiki/RemoteJobs
This is at a functional but rough stage.
svn path=/trunk/boinc/; revision=23762
A bug introduced in [21181] produced a messed-up query that assigned garbage values to:
host_app_version.turnaround_var
host_app_version.turnaround_q
host_app_version.max_jobs_per_day
host_app_version.consecutive_valid
To repair these:
- set turnaround_var and turnaround_q to zero
- if max_jobs_per_day is outside of
(0..config.daily_result_quota)
set it to config.daily_result_quota
- if consecutive_valid is outside (0..1000), set it to zero
I added a script, html/ops/repair_21812.php, that does this;
if you ran server code between [21181] and [21812], run this script.
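Restated as code, the repair rules look like this (a C++ paraphrase of
what the PHP script does; the struct and names are illustrative):

    struct HOST_APP_VERSION_FIX {
        double turnaround_var, turnaround_q;
        double max_jobs_per_day;
        int consecutive_valid;
    };

    // Apply the repair rules to one host_app_version row.
    void repair(HOST_APP_VERSION_FIX& hav, double daily_result_quota) {
        hav.turnaround_var = 0;
        hav.turnaround_q = 0;
        if (hav.max_jobs_per_day < 0 || hav.max_jobs_per_day > daily_result_quota) {
            hav.max_jobs_per_day = daily_result_quota;
        }
        if (hav.consecutive_valid < 0 || hav.consecutive_valid > 1000) {
            hav.consecutive_valid = 0;
        }
    }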
- scheduler/transitioner: add <debug_quota> log flag
- changed the build system to always use -Wall
(if we'd done this before, this bug wouldn't have happened)
- fixed a bunch of other compile warnings
svn path=/trunk/boinc/; revision=21812
- client (Unix): if the client crashes while benchmark processes are running,
make sure they detect this and exit.
- back-end programs: remove hardwired assumptions about
what directory they run in, and hence where config.xml is.
E.g., daemons looked for it in ".."; other programs expected it in the current dir.
New approach: all the programs look for the project dir as follows:
1) the environment var BOINC_PROJECT_DIR, if defined
2) the current dir, if config.xml is there.
3) else ".."
This means you can run programs in either proj/bin/ or proj/,
or (using BOINC_PROJECT_DIR) you can keep executables
outside of the project dir.
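The lookup order, sketched (helper names are mine, not BOINC's):

    #include <cstdlib>
    #include <string>
    #include <sys/stat.h>

    static bool file_present(const char* path) {
        struct stat s;
        return stat(path, &s) == 0;
    }

    // 1) $BOINC_PROJECT_DIR; 2) ".", if config.xml is here; 3) ".."
    std::string project_dir() {
        const char* p = getenv("BOINC_PROJECT_DIR");
        if (p) return p;
        if (file_present("config.xml")) return ".";
        return "..";
    }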
svn path=/trunk/boinc/; revision=18042
set the job params to reasonable values (see below),
and make it easy to change these values in the script
- create_work (function and script): change default job params:
FLOPs est: 1e9 => 3600e9
FLOPs bound: 1e10 => 86400e9
mem bound: 100MB => 500MB
disk bound: 100MB => 1GB
delay bound: 100000s => 1 week
svn path=/trunk/boinc/; revision=17703