This is not meant to break anything, just to add some
(optional) logging and features needed for Einstein@Home.
Please contact me before changing or removing any of this.
Conflicts:
sched/db_dump.cpp
sched/file_deleter.cpp
sched/validator.cpp
I did this by including a list of badges in the tables.xml file,
and writing the list of badge assignments to 2 new files,
badge_user.gz (for users) and badge_team.gz (for teams).
I considered including the badges within the <user> and <team> elements.
However, this would require enumerating the badges for a particular user
within the enumeration of users, which doesn't work;
only one enumeration can be active at a time.
Plus it would be less efficient, and db_dump already takes
a half hour on a big project.
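For reference, the badge_user.gz writer is essentially one enumeration
(a sketch only; it assumes a DB_BADGE_USER enumerator in the style of
the other DB_* wrappers, and the field names and the gzip step are
illustrative):

    DB_BADGE_USER bu;
    FILE* f = fopen("badge_user", "w");  // gzipped to badge_user.gz afterwards
    fprintf(f, "<badge_users>\n");
    // DB_* enumerators return 0 while rows remain
    while (!bu.enumerate("")) {
        fprintf(f,
            "  <badge_user>\n"
            "    <user_id>%d</user_id>\n"
            "    <badge_id>%d</badge_id>\n"
            "    <create_time>%f</create_time>\n"
            "  </badge_user>\n",
            bu.user_id, bu.badge_id, bu.create_time
        );
    }
    fprintf(f, "</badge_users>\n");
    fclose(f);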
My last commit did this using a new API call.
But this would require rebuilding apps any time you want to change it;
too much work.
So instead make it an attribute of apps,
which you can set via the admin web interface.
Corresponding changes to client.
The job submission RPC handler (PHP) originally ran the
create_work program once per job.
This took about 1.5 minutes to create 1000 jobs.
Recently I changed this so that create_work is run only once;
it does one SQL insert per job.
Disappointingly, this was only slightly faster: 1 min per 1000 jobs.
This commit changes create_work to create multiple jobs per SQL insert
(as many as will fit in a 1 MB query, which is the default limit).
This speeds things up by a factor of 100: 1000 jobs in 0.5 sec.
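The batching idea, in sketch form (names are illustrative, not the
actual create_work code; DB_CONN::do_query() is the usual query entry
point):

    #include <string>
    #include <vector>

    const size_t MAX_QUERY_LEN = 1000000;  // MySQL default max_allowed_packet

    // tuples[i] is the "(...)" VALUES clause for one job
    void insert_jobs(std::vector<std::string>& tuples, DB_CONN& db) {
        std::string prefix = "insert into workunit (name, xml_doc) values ";
        std::string query;
        for (auto& t: tuples) {
            if (!query.empty() && query.size() + t.size() + 1 > MAX_QUERY_LEN) {
                db.do_query(query.c_str());   // flush a full ~1 MB batch
                query.clear();
            }
            query += query.empty() ? prefix + t : "," + t;
        }
        if (!query.empty()) db.do_query(query.c_str());
    }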
The latest client reports the peak working set size, swap size,
and disk usage for completed jobs.
Add fields to the results table to store these.
Parse them in scheduler request messages, and write to the DB.
Display them in the result web page.
This data can be used to improve (or even automate)
the job estimates for memory and disk usage.
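In the scheduler's request parser this amounts to a few extra cases
(a sketch using BOINC's XML_PARSER idiom; the tag and field names are
assumed to match what the client reports and the new DB columns):

    // parsing the new fields from the <result> element of a request
    void parse_final_sizes(XML_PARSER& xp, RESULT& result) {
        while (!xp.get_tag()) {
            if (xp.parse_double("peak_working_set_size",
                result.peak_working_set_size)) continue;
            if (xp.parse_double("peak_swap_size",
                result.peak_swap_size)) continue;
            if (xp.parse_double("peak_disk_usage",
                result.peak_disk_usage)) continue;
            // ... existing fields parsed here ...
        }
    }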
- add --is_gzip option to sample_bitwise_validator.
If set, all files are treated as gzip archives.
Check each file's 10-byte gzip header to verify that it's a gzip file,
but ignore the header when comparing file contents
(see the sketch after this list).
- validator.cpp: don't error out on unparsed cmdline args,
since we're now using them in sample_bitwise_validator
and sample_substr_validator.
- fix build error on Debian
- gpu_active_frac is the fraction of time GPU use is allowed
while the client is running.
Previously the client reported it but we weren't storing it in the DB.
We may need it in the future for batch scheduling logic.
- fix a crashing bug in scheduler
- client: minor message tweak
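For the --is_gzip check above, the comparison reduces to something
like this (a sketch; error handling is elided and the helper names
are mine):

    #include <cstdio>

    #define GZIP_HEADER_LEN 10

    // gzip files start with magic bytes 0x1f 0x8b
    static bool is_gzip(FILE* f) {
        int b0 = fgetc(f), b1 = fgetc(f);
        return (b0 == 0x1f) && (b1 == 0x8b);
    }

    // compare everything after the 10-byte headers; the header
    // contains a timestamp, so it can differ between otherwise
    // identical files
    int compare_gzip(FILE* f1, FILE* f2, bool& match) {
        if (!is_gzip(f1) || !is_gzip(f2)) return -1;   // not gzip: error out
        fseek(f1, GZIP_HEADER_LEN, SEEK_SET);
        fseek(f2, GZIP_HEADER_LEN, SEEK_SET);
        int c1, c2;
        do {
            c1 = fgetc(f1);
            c2 = fgetc(f2);
            if (c1 != c2) { match = false; return 0; }
        } while (c1 != EOF);
        match = true;
        return 0;
    }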
Work generators create jobs (workunits);
the transitioner creates instances (results).
If a work generator tries to maintain a certain number of unsent results
(as the sample work generator does)
it must wait for a bit, after creating jobs,
to let the transitioner create instances of those jobs.
The sample work generator waited 5 seconds.
Problem: on a heavily loaded project, the transitioner can fall behind -
minutes or hours behind.
So the above policy can create way too many jobs.
Solution: after creating jobs, the sample work generator
notes the current time X,
then waits until the transitioner catches up to time X
(i.e., until the min workunit.transition_time exceeds X).
This ensures that instances have been created for all the new jobs.
Other work generators that limit the number of unsent jobs
should use the same technique;
use min_transition_time(x) to get the min transition time.
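The wait, in sketch form (dtime() is BOINC's wall-clock helper;
daemon_sleep() stands in for the daemon's usual sleep call; and
min_transition_time() is assumed to return nonzero on error):

    // After committing a batch of jobs at time X, wait until the
    // transitioner has processed everything older than X.
    double x = dtime();
    while (1) {
        double min_tt;
        int retval = min_transition_time(min_tt);
        if (!retval && min_tt > x) break;   // transitioner has caught up
        daemon_sleep(2);
    }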
Code cleanup: get_double should be a member of DB_CONN, not DB_BASE.
- DB: add tables for badges and badge/user and badge/team associations
- add script that defines 3 RAC-based badges and assigns them
- add images for these badges
- add admin page for creating/editing badges
- show badges on user page
not done:
- figure out how to send badges to client
- display badges somewhere in the GUIs
- export badges in db_dump
- enable badges by default for new projects
The OPENCL_CPU_PROP structure was being referred to as both
"opencl_cpu_prop" and "cpu_opencl_prop", roughly 50/50,
in variable names and XML tags.
Let's standardize on "opencl_cpu_prop",
which is what current clients are sending in scheduler requests.
- Batches now have optional "expire time".
If this time passes and the batch is not retired, abort and retire it.
- Add script "expire_batches" which enforces the above.
Run it as a periodic task.
- Add a web RPC for setting the expire time of a batch
(it can be changed multiple times)
- Add a C++ interface for this RPC
- Add a BOINC_SET_LEASE command to the BOINC GAHP
("lease" is Condor term for expire time)
Problem: a workunit could error out with unsent results.
The feeder skips such results, but the size_regulator counts them
and so doesn't promote any new results.
Solution: the feeder scans for results even with workunit errors.
It marks these results as server state OVER, outcome DIDNT_NEED.
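The shape of the fix in the feeder scan (a sketch; the constants are
the existing server-state and outcome enums):

    // a result whose workunit errored out can't be sent,
    // so retire it instead of skipping it silently
    if (wu.error_mask) {
        result.server_state = RESULT_SERVER_STATE_OVER;
        result.outcome = RESULT_OUTCOME_DIDNT_NEED;
        result.update();   // DB_RESULT::update()
        continue;
    }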
See http://boinc.berkeley.edu/trac/wiki/MultiSize
The components of this include:
- DB changes:
add size_class to workunit and result
n_size_classes to app; >1 means multi-size
- size_regulator daemon program: change result states
from INACTIVE to UNSENT carefully
- size_census program; writes quantile info in flat files
- transitioner: when creating results for multi-size apps,
set server state to INACTIVE
- sched shmem (feeder): read quantile info from flat files,
store in shared memory
- scheduler (score-based scheduling): for multi-size apps,
add a component to the score function for size class
(sketched after this list).
- show_shmem: show result size class
- make_work (and other callers of count_unsent_results()):
count both INACTIVE and UNSENT
- create_work: add --size_class cmdline option
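For the scheduler change above, the size-class component can be as
simple as the following (an illustrative sketch; how the host's own
size class is derived from the quantile info is omitted):

    // score-based scheduling: prefer right-sized jobs for multi-size apps
    if (app->n_size_classes > 1) {
        if (wu.size_class == host_size_class) {
            score += 1;    // job size matches this host's speed quantile
        } else {
            score -= 1;    // penalize mismatches rather than forbid them
        }
    }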
Also:
- if we get MySQL errors during upgrade, don't rewrite db_version
The client does many things periodically: it keeps track
(usually in a static variable called "last_time")
of the last time we did something,
and we only do it again when now - last_time exceeds some interval.
Example: sending heartbeat messages to apps.
Problem: if the system clock is decreased by X,
we won't do any of these actions for time X,
making it appear that the client is frozen.
Solution: when we detect that the system clock has decreased,
set a global var "clock_change" for 1 iteration of the polling loop,
and disable these time checks if clock_change is set.
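The pattern, with the fix (a sketch; names like HEARTBEAT_INTERVAL
and send_heartbeat() are illustrative):

    #define HEARTBEAT_INTERVAL 1.0   // illustrative interval (seconds)
    void send_heartbeat();           // hypothetical periodic action

    static bool clock_change = false;  // true for 1 iteration after a decrease
    static double last_loop_time = 0;

    void poll_once() {
        double now = dtime();
        clock_change = last_loop_time && now < last_loop_time;
        last_loop_time = now;

        static double last_time = 0;
        // skip the elapsed-time check when the clock went backwards
        if (clock_change || now - last_time > HEARTBEAT_INTERVAL) {
            last_time = now;
            send_heartbeat();
        }
    }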
A "viable" result is one that could potentially become the canonical result,
i.e. the outcome is SUCCESS and the validate state is not INVALID.
The existing code treated all results with outcome SUCCESS as viable,
which is wrong.
In particular, this could cause workunit.target_nresults
to be incremented inappropriately.
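As a predicate (the constants are the existing enums):

    inline bool is_viable(RESULT& r) {
        return r.outcome == RESULT_OUTCOME_SUCCESS
            && r.validate_state != VALIDATE_STATE_INVALID;
    }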
The old handling of projects where some apps are NCI
(but not all) wasn't finished.
New logic: if the project has an NCI app then:
- make a list of NCI apps for which the client doesn't have
a job in progress.
- try to send one job for each of these apps
- do this even if no work is being requested.
- don't send jobs for NCI apps by other mechanisms
NOTE: the client logic isn't quite right for mixed NCI projects.
If there's no job for a given NCI app,
the client should do a scheduler RPC.
This isn't critical so we won't do this now.
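The scheduler-side pass, roughly (a sketch; the helpers are
hypothetical):

    // try to send one job for each NCI app the host has no job for,
    // regardless of how much work was requested
    for (unsigned int i = 0; i < nci_apps.size(); i++) {
        APP& app = nci_apps[i];
        if (host_has_job_in_progress(app)) continue;  // hypothetical helper
        send_one_job_for_app(app);                    // hypothetical helper
    }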
svn path=/trunk/boinc/; revision=26068
Add support for projects with a mix of CPU-intensive
and non-CPU-intensive applications.
An app can be specified as non-CPU-intensive in project.xml,
and this attribute can be set or cleared using the admin web interface.
Note: support for this was added to the client in 2011,
but we didn't add server-side support at that time.
This change is in 6.12 and later clients.
svn path=/trunk/boinc/; revision=26060
- add a config item vda_host_timeout.
A host that hasn't done a scheduler RPC for this long
is considered dead.
- a host that's not running a version 7+ client is considered dead
- host.cpu_efficiency (an otherwise unused field) is used
as a flag for dead hosts
- the scheduler clears the flag if the client is v7+
- vdad sets the flag for hosts whose last RPC is old
- before choosing a host for chunk download,
vdad checks its client version.
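vdad's dead-host test then reads roughly as follows (a sketch;
vda_host_timeout is the new config item, cpu_efficiency is the
repurposed flag described above, and the flag polarity shown here,
nonzero meaning dead, is an assumption):

    bool host_is_dead(HOST& h, double now) {
        if (h.cpu_efficiency) return true;   // flagged by vdad/scheduler
        // no scheduler RPC within the timeout: vdad will flag it
        return now - h.rpc_time > config.vda_host_timeout;
    }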
svn path=/trunk/boinc/; revision=26059
- Allow projects to report "desired disk usage" (DDU).
If the client learns that a project wants disk space,
it can shrink the allocation to other projects.
- Base share computation on DDU rather than disk usage.
- Introduce the notion of "disk resource share".
This is defined (somewhat arbitrarily) as resource share
plus 1/10 of the largest resource share.
This is intended to ensure that even zero-share projects
get enough disk space to store app versions and data files;
otherwise they wouldn't be able to compute.
- server: use host.d_boinc_max (which wasn't being used)
to store the d_project_share reported by the client.
- volunteer storage: change the way hosts are allocated to chunks.
Allow hosts to store several chunks of the same file, if needed
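Numerically: with two projects at resource shares 100 and 0, the disk
resource shares are 100 + 10 = 110 and 0 + 10 = 10, so the zero-share
project still gets about 8% of the available disk. As a formula
(sketch; the function name is mine):

    // disk resource share = share + largest_share/10
    double disk_resource_share(PROJECT& p, double largest_share) {
        return p.resource_share + largest_share/10;
    }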
svn path=/trunk/boinc/; revision=26052
- don't lock files while writing to them.
It's not clear to me that this locking is beneficial,
and it may be causing filesystem problems at WCG
- volunteer storage stuff
svn path=/trunk/boinc/; revision=26021