Commit Graph

656 Commits

Author SHA1 Message Date
David Anderson fe1db8060a Remote job submission: allow a limit on the # of in-progress jobs per user 2014-01-13 21:52:55 -08:00
David Anderson 2d0a6cc10f web: add badge stuff to db_update script 2013-12-22 20:53:10 -08:00
David Anderson 2e4d561647 sample work generator: wait until transitioner has processed jobs before creating any more
Work generators create jobs (workunits);
the transitioner creates instances (results).
If a work generator tries to maintain a certain number of unsent results
(as the sample work generator does)
it must wait for a bit, after creating jobs,
to let the transitioner create instances of those jobs.
The example work generator waited 5 seconds.

Problem: on a heavily loaded project, the transitioner can fall behind -
minutes or hours behind.
So the above policy can create way too many jobs.

Solution: after creating jobs, the sample work generator
notes the current time X,
then waits until the transitioner catches up to time X
(i.e., until the min workunit.transition_time exceeds X).
This ensures that instances have been created for all the new jobs.

Other work generators that limit the number of unsent jobs
should use the same technique;
use min_transition_time(x) to get the min transition time.
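
The wait described above could be sketched roughly as follows. This is only an illustration, not the code from this commit: wait_for_transitioner and both callbacks are hypothetical names, and the first callback stands in for whatever query returns the minimum workunit.transition_time (the signature of the real min_transition_time() is not shown here).

```cpp
#include <cassert>
#include <functional>

// Hypothetical helper (not the BOINC API): block until the transitioner
// has caught up to time x, the moment the new jobs were created.
// min_transition_time stands in for a DB query returning the minimum
// workunit.transition_time; sleep_fn pauses between polls.
// Returns the number of polls that were needed.
int wait_for_transitioner(
    double x,
    const std::function<double()>& min_transition_time,
    const std::function<void()>& sleep_fn
) {
    assert(min_transition_time && sleep_fn);  // both callbacks must be set
    int polls = 0;
    while (true) {
        polls++;
        // Caught up once the oldest pending transition is later than x.
        if (min_transition_time() > x) break;
        sleep_fn();
    }
    return polls;
}
```

A work generator would note the current time right after creating jobs, call this helper with that time, and only then count unsent results again.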

Code cleanup: get_double should be a member of DB_CONN, not DB_BASE.
2013-12-14 16:36:18 -08:00
David Anderson 65b5ab5184 server/web: preliminary support for badges
- DB: add tables for badges and badge/user and badge/team associations
- add script that defines 3 RAC-based badges and assigns them
- add images for these badges
- add admin page for creating/editing badges
- show badges on user page
not done:
- figure out how to send badges to client
- display badges somewhere in the GUIs
- export badges in db_dump
- enable badges by default for new projects
2013-12-05 10:14:26 -08:00
David Anderson 99332624f3 scheduler: parse <opencl_cpu_prop> in scheduler requests correctly
The OPENCL_CPU_PROP structure was being referred to as both
"opencl_cpu_prop" and "cpu_opencl_prop", roughly 50/50,
in variable names and XML tags.
Let's standardize on "opencl_cpu_prop",
which is what current clients are sending in scheduler requests.
2013-11-28 14:11:42 -08:00
David Anderson 438cd78b13 Remote job submission: add C++ APIs for query_batches() and query_batch()
- Add program (tools/remote_submit_test.cpp) for testing C++ API for remote job submission.
- Rename Condor-specific API to query_batch_set().
2013-10-22 15:27:34 -07:00
David Anderson 3e21e8b7c4 Condor: debug set_expire_time RPC 2013-09-17 23:14:57 -07:00
David Anderson 34933c8cd6 make_project: revert change that doesn't work with Apache 2.2 2013-09-17 23:13:49 -07:00
David Anderson 2a2c9c4ad8 remote job submission: add notion of "expire time" for batches (for Condor)
- Batches now have optional "expire time".
  If this time passes and the batch is not retired, abort and retire it.
- Add script "expire_batches" which enforces the above.
  Run it as a periodic task.
- Add a web RPC for setting the expire time of a batch
  (it can be changed multiple times)
- Add a C++ interface for this RPC
- Add a BOINC_SET_LEASE command to the BOINC GAHP
  ("lease" is Condor term for expire time)
2013-09-17 13:35:55 -07:00
David Anderson 73d7e0cb81 Server: change declaration of mod_time fields to work with MySQL 5.6 2013-09-10 19:12:47 -07:00
David Anderson 5b76909f04 scheduler: parse OpenCL/CPU descriptors, and add plan class for OpenCL/CPU/Intel 2013-08-26 23:32:32 -07:00
David Anderson b2e06e0704 Server: various fixes for "make install" 2013-08-24 20:36:49 -07:00
David Anderson 93d6f5ef16 transitioner: don't set result.mod_time to null; this fails if the DB field has accidentally been marked as not null. 2013-07-18 17:10:54 -07:00
David Anderson 846b8c7757 all components: change strcpy() to strlcpy() when possible.
This commit should cover the client and manager code.
2013-06-03 20:24:48 -07:00
David Anderson f25cf0836a Include <cmath> instead of <math.h> in various places 2013-05-27 16:44:22 -07:00
David Anderson ba68f452a0 server: fix bug related to job-size matching
Problem: a workunit could error out with unsent results.
The feeder skips such results, but the size_regulator counts them
and so doesn't promote any new results.
Solution: the feeder scans for results even with workunit errors.
It marks these results as state OVER, outcome DIDNT_NEED.
2013-05-24 20:11:14 -07:00
David Anderson cde42fcbcc server: parse product_name in scheduler request, store in DB
This will let projects see what kind of device each Android host is,
possibly helping with app debugging.
2013-05-23 23:30:42 -07:00
David Anderson 8e2524f55f Unix build: Makefile changes for "make install", from Steffen Moeller
"make install" followed by make_project should now work
2013-05-20 15:19:13 -07:00
David Anderson 1a1a01c103 - server: initialize result.size_class and workunit.size_class to -1 2013-05-03 15:09:45 -07:00
David Anderson 0c430ce1fa Add support for multi-size apps
See http://boinc.berkeley.edu/trac/wiki/MultiSize
The components of this include:
- DB changes:
    add size_class to workunit and result
    n_size_classes to app; >1 means multi-size
- size_regulator daemon program: change result states
    from INACTIVE to UNSENT carefully
- size_census program; writes quantile info in flat files
- transitioner: when creating results for multi-size apps,
    set server state to INACTIVE
- sched shmem (feeder): read quantile info from flat files,
    store in shared memory
- scheduler (score-based scheduling): for multi-size apps,
    add component to score function for size class.
- show_shmem: show result size class
- make_work (and other callers of count_unsent_results()):
    count both INACTIVE and UNSENT
- create_work: add --size_class cmdline option

Also:
- if we get MySQL errors in upgrade, don't rewrite db_version
2013-04-25 00:27:35 -07:00
David Anderson 9481e04e7b - client: there were many places in the code where we keep track
(usually in a static variable called "last_time")
    of the last time we did something,
    and we only do it again when now - last_time exceeds some interval.
    Example: sending heartbeat messages to apps.
    Problem: if the system clock is decreased by X,
    we won't do any of these actions for time X,
    making it appear that the client is frozen.
    Solution: when we detect that the system clock has decreased,
    set a global var "clock_change" for 1 iteration of the polling loop,
    and disable these time checks if clock_change is set.
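
    A minimal sketch of this scheme, under assumptions: PeriodicAction and ClockMonitor are hypothetical names made up for illustration, and the real client uses static "last_time" variables and a global flag rather than these structs.

```cpp
#include <cassert>

// Hypothetical stand-in for one "last_time"-guarded action,
// e.g. sending heartbeat messages to apps.
struct PeriodicAction {
    double interval;   // seconds between runs
    double last_time;  // when the action last ran

    explicit PeriodicAction(double i) : interval(i), last_time(0) {
        assert(i > 0);  // an interval must be positive
    }

    // Returns true if the action should run at time now.
    // If clock_change is set, the interval check is skipped.
    bool should_run(double now, bool clock_change) {
        if (!clock_change && now - last_time < interval) return false;
        last_time = now;
        return true;
    }
};

// Detects a decrease of the system clock between polling iterations.
struct ClockMonitor {
    double prev_now;
    bool clock_change;  // set for exactly one iteration after a decrease

    ClockMonitor() : prev_now(0), clock_change(false) {}

    // Call once at the top of each polling-loop iteration.
    void poll(double now) {
        clock_change = (now < prev_now);
        prev_now = now;
    }
};
```

    Because the action records the new (smaller) time when it runs during the clock_change iteration, later interval checks work normally again.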
2013-03-22 10:28:20 +01:00
David Anderson 980c9b66c9 - validator: fix confused logic.
A "viable" result is one that could potentially become the canonical result,
    i.e. the outcome is SUCCESS and the validate state is not INVALID.
    The existing code treated all results with outcome SUCCESS as viable,
    which is wrong.
    In particular, this could cause workunit.target_nresults
    to be incremented inappropriately.
2013-03-22 10:28:20 +01:00
David Anderson 033a47691b - client: write log flags in alpha order 2013-03-15 13:38:44 +01:00
David Anderson 3ced18ddaa - client: don't show cache size in startup messages. 2013-03-15 13:38:44 +01:00
David Anderson 2a73dc0e01 - remote file management stuff for Condor 2013-03-05 14:05:04 +01:00
David Anderson 6f962d5b61 - file upload handler: in FCGI version, check for trigger file
each time through loop (from Bernd).
- validator: fix bug that zeroed result.random
2013-03-04 17:24:18 +01:00
David Anderson 2ded3ff67d - fix typo in GUI RPC
- check in some code for multi-user job prioritization
2013-03-04 15:23:39 +01:00
David Anderson 6205ffed08 - scheduler: add extra check for not sending homogeneous app version
jobs to anonymous platform clients
- remote job submission: add DB table for keeping track of files
2013-03-04 15:16:58 +01:00
David Anderson e538c8c303 - client: TIME_STATS fields go in <time_stats> part of state file
- scheduler: parse TIME_STATS fields (e.g., uptime)
- admin web: small fix for manage_apps.php
2013-03-04 14:14:05 +01:00
David Anderson 11a6e85632 - scheduler: support for projects with some non-CPU-intensive apps
(but not all) wasn't finished.
    New logic: if the project has an NCI app then:
    - make a list of NCI apps for which the client doesn't have
        a job in progress.
    - try to send one job for each of these apps
    - do this even if no work is being requested.
    - don't send jobs for NCI apps by other mechanisms

NOTE: the client logic isn't quite right for mixed NCI projects.
    If there's no job for a given NCI app,
    the client should do a scheduler RPC.
    This isn't critical so we won't do this now.


svn path=/trunk/boinc/; revision=26068
2012-09-01 04:58:12 +00:00
David Anderson d02ff6e1c5 - fix typo
svn path=/trunk/boinc/; revision=26063
2012-08-28 06:33:53 +00:00
David Anderson 9ccb8fa38d - scheduler: add support for limited locality scheduling
- API: remove support for PPM files


svn path=/trunk/boinc/; revision=26062
2012-08-27 17:00:43 +00:00
David Anderson 32da1a7e37 - server: add support for having a mixture of CPU-intensive
and non-CPU-intensive applications.
    An app can be specified as non-CPU-intensive in project.xml,
    and this attribute can be set or cleared using the admin web interface.
    Note: support for this was added to the client in 2011,
    but we didn't add server-side support at that time.
    This change is in 6.12 and later clients.


svn path=/trunk/boinc/; revision=26060
2012-08-25 04:09:24 +00:00
David Anderson a9e78b6459 - volunteer storage: fix the way that hosts are classified as alive/dead
    - add a config item vda_host_timeout.
        A host that hasn't done a scheduler RPC for this long
        is considered dead.
    - a host that's not running a version 7+ client is considered dead
    - host.cpu_efficiency (an otherwise unused field) is used
        as a flag for dead hosts
    - the scheduler clears the flag if the client is v7+
    - vdad sets the flag for hosts where last RPC is old
    - before choosing a host for chunk download,
        vdad checks its client version.


svn path=/trunk/boinc/; revision=26059
2012-08-24 19:06:41 +00:00
David Anderson e79d3ea4c8 - client: change the way project disk share is computed.
    - Allow projects to report "desired disk usage" (DDU).
        If the client learns that a project wants disk space,
        it can shrink the allocation to other projects.
    - Base share computation on DDU rather than disk usage.
    - Introduce the notion of "disk resource share".
        This is defined (somewhat arbitrarily) as resource share
        plus 1/10 of the largest resource share.
        This is intended to ensure that even zero-share projects
        get enough disk space to store app versions and data files;
        otherwise they wouldn't be able to compute.
- server: use host.d_boinc_max (which wasn't being used)
    to store d_project_share reported by client.
- volunteer storage: change the way hosts are allocated to chunks.
    Allow hosts to store several chunks of the same file, if needed
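
As a worked example of the "disk resource share" definition above (resource share plus 1/10 of the largest resource share), here is a small sketch. disk_resource_shares is a hypothetical helper, not a function from the client, and it covers only this one definition, not the DDU-based allocation.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: given each project's resource share, return each
// project's "disk resource share", i.e. its own share plus 1/10 of the
// largest share among all projects.
std::vector<double> disk_resource_shares(const std::vector<double>& shares) {
    std::vector<double> out;
    if (shares.empty()) return out;
    double largest = *std::max_element(shares.begin(), shares.end());
    for (double s : shares) {
        assert(s >= 0);  // resource shares are assumed non-negative
        out.push_back(s + largest / 10);
    }
    return out;
}
```

With resource shares of 100, 50 and 0, the disk resource shares come out as 110, 60 and 10, so the zero-share project still gets a nonzero claim on disk space.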


svn path=/trunk/boinc/; revision=26052
2012-08-22 04:02:52 +00:00
David Anderson b029e352c9 - scheduler: if sending GPU description to pre-7.0 client,
call it CUDA instead of NVIDIA


svn path=/trunk/boinc/; revision=26042
2012-08-17 06:10:25 +00:00
David Anderson 0d42a4aa5c - file upload handler: add an #ifdef for disabling locking of files
while writing to them.
    It's not clear to me that this locking is beneficial,
    and it may be causing filesystem problems at WCG
- volunteer storage stuff


svn path=/trunk/boinc/; revision=26021
2012-08-15 21:27:38 +00:00
David Anderson 7335c036fc - server: volunteer storage bug fixes.
Note to self: jerasure's decoder program loops or crashes
        if there are no missing chunks.

svn path=/trunk/boinc/; revision=25995
2012-08-08 21:37:51 +00:00
David Anderson ab120dea9e - web: after posting to a thread, show thread in user's chosen order
instead of newest first.


svn path=/trunk/boinc/; revision=25931
2012-08-01 17:57:56 +00:00
David Anderson 6e816094bd - volunteer data storage: intermediate checkin
svn path=/trunk/boinc/; revision=25890
2012-07-25 21:41:32 +00:00
David Anderson ac20215eb8 - volunteer storage: implement "vda status" command
svn path=/trunk/boinc/; revision=25887
2012-07-23 21:53:09 +00:00
David Anderson 9a84980792 - lib: treat MINGW32 like CYGWIN32 (in 1 place - should do everywhere?)
from Oliver


svn path=/trunk/boinc/; revision=25874
2012-07-17 03:59:12 +00:00
David Anderson 6a8075046b - Unix: include db/boinc_db_types.h in installed headers
- client: small code cleanup, no functional change


svn path=/trunk/boinc/; revision=25857
2012-07-10 17:28:04 +00:00
David Anderson 78f74661aa - distributed storage: move chunk_size to VDA_FILE.
Add some missing code.


svn path=/trunk/boinc/; revision=25854
2012-07-07 19:44:48 +00:00
David Anderson 68f9880615 - client: remove "device" entry from CUDA_DEVICE_PROP,
and change types of mem-size fields from int to double.
    These fields are size_t in NVIDIA's version of this;
    however, cuDeviceGetAttribute() returns them as int,
    so I don't see where this makes any difference.
- client: fix bug in handling of <no_rsc_apps> element.
- scheduler: message tweaks.
    Note: [foo] means that the message is enabled by <debug_foo>.



svn path=/trunk/boinc/; revision=25849
2012-07-05 20:24:17 +00:00
David Anderson 19458ba4de - Compile fixes for Fedora core 17. From Christian B. Fixes #1194.
- Fix various #include issues.

CODING STYLE LAW (minimal inclusion principle):
    If foo.cpp requires <blah.h>,
    #include <blah.h> in foo.cpp, NOT foo.h


svn path=/trunk/boinc/; revision=25837
2012-07-02 18:51:02 +00:00
David Anderson 1776a244ae - web: when showing a batch, recompute and update its fraction done
- feeder: don't enumerate results for WUs with nonzero error_mask
- scheduler: in slow_check(), make sure the WU error_mask is still zero


svn path=/trunk/boinc/; revision=25822
2012-06-29 06:53:48 +00:00
David Anderson fd0983b991 - web: server status page should show elapsed time, not CPU time
svn path=/trunk/boinc/; revision=25785
2012-06-22 07:35:54 +00:00
David Anderson 158aab8d5c - DB: add project_state and description fields to batch table.
Both are for use by project.
- job submission file sandbox: don't delete physical file
    when deleting a sandbox entry.
    We'll have to figure out how to garbage-collect physical files.
- LAMMPS job submission:
    use the 50th-percentile host, not 0th


svn path=/trunk/boinc/; revision=25734
2012-06-05 05:57:55 +00:00
David Anderson 761fb3f4c1 - admin web: add a function for "revalidating" a given set of jobs.
This reruns validation for instances that are successful
    but marked as invalid or inconclusive.
    Use this if you changed your validator to be more permissive,
    and you want to grant credit for instances that were
    originally marked as invalid.


svn path=/trunk/boinc/; revision=25714
2012-05-25 23:49:17 +00:00