2266 Commits

Author SHA1 Message Date
David Anderson c25ce3177c file_deleter: delete gzipped versions of files also 2014-05-06 12:58:13 -07:00
David Anderson e5810f3061 client/server: change implementation of "exact fraction done".
My last commit did this using a new API call.
But this would require rebuilding apps any time you want to change it;
too much work.
So instead make it an attribute of apps,
which you can set via the admin web interface.

Corresponding changes to client.
2014-05-04 00:02:32 -07:00
David Anderson 425f67f4c6 scheduler: don't show error msg if no plan class spec file 2014-05-02 12:03:07 -07:00
David Anderson b0516e635c make_work: fix bug that prevented --max_wus from working 2014-04-30 15:35:04 -07:00
David Anderson bb4f4194d0 scheduler: cap CPU time of reported results at elapsed time * ncpus
This affects only result display,
since CPU time is no longer used for anything.
2014-04-10 23:52:13 -07:00
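The cap described in the commit above can be sketched in a few lines. This is an illustrative Python sketch, not the scheduler's C++ code; the function name `cap_cpu_time` and its parameters are hypothetical.

```python
def cap_cpu_time(cpu_time: float, elapsed_time: float, ncpus: float) -> float:
    """Limit a reported CPU time to elapsed_time * ncpus.

    Hypothetical helper: a job cannot plausibly have used more CPU time
    than wall-clock time multiplied by the number of CPUs it ran on.
    """
    return min(cpu_time, elapsed_time * ncpus)
```

For example, a result reporting 500 s of CPU time over 100 s of elapsed time on 4 CPUs would be displayed as 400 s.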
David Anderson fc7c75b200 server: parse peak memory/disk info from client, store in DB, display in web
The latest client reports the peak working set size, swap size,
and disk usage for completed jobs.
Add fields to the results table to store these.
Parse them in scheduler request messages, and write to the DB.
Display them in the result web page.

This data can be used to improve (or even automate)
the job estimates for memory and disk usage.
2014-04-02 19:35:59 -07:00
David Anderson e91eee67da trickle handler daemon: mark message as handled even if handler returns error.
This is because errors in general are non-recoverable,
and we'll end up retrying infinitely.
If an error actually is recoverable, exit().
2014-03-29 09:25:01 -07:00
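The policy in the commit above might look roughly like the following. This is a Python sketch under assumptions, not the daemon's actual code: `handle_trickles`, the `handled` key, and the integer return convention (nonzero means error) are all hypothetical.

```python
from typing import Callable, Dict, List


def handle_trickles(messages: List[Dict], handler: Callable[[Dict], int]) -> int:
    """Run handler on each unhandled message; return the number of errors.

    Every message is marked handled whether or not the handler fails,
    so a permanently failing message is never retried. A handler that
    hits a recoverable error is expected to exit the process itself.
    """
    errors = 0
    for msg in messages:
        if msg["handled"]:
            continue
        if handler(msg) != 0:  # nonzero return value is treated as an error
            errors += 1
        msg["handled"] = True  # marked even on error, to avoid endless retries
    return errors
```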
David Anderson 6216673eca web: fix missing mysqli change 2014-03-22 09:04:58 -07:00
David Anderson 6f29a50812 validator: fixes and features
- add --is_gzip option to sample_bitwise_validator.
  If set, all files are treated as gzip archives.
  Check their 10-byte header to verify that it's a gzip file,
  but ignore it when comparing files.
- validator.cpp: don't error out on unparsed cmdline args,
  since we're now using them in sample_bitwise_validator
  and sample_substr_validator.
- fix build error on Debian
2014-03-20 12:38:29 -07:00
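The --is_gzip comparison described above can be illustrated as follows. This is a Python sketch rather than the validator's C++ code; the function names are hypothetical, and the two-byte magic number check is an assumption about what "verify that it's a gzip file" involves (the magic bytes themselves come from the gzip format, RFC 1952).

```python
GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of a gzip member (RFC 1952)
GZIP_HEADER_LEN = 10      # fixed-size part of the gzip header


def looks_like_gzip(data: bytes) -> bool:
    """Check that data is long enough and starts with the gzip magic bytes."""
    return len(data) >= GZIP_HEADER_LEN and data[:2] == GZIP_MAGIC


def gzip_files_match(a: bytes, b: bytes) -> bool:
    """Compare two gzip files bitwise, ignoring their 10-byte headers.

    The header holds fields such as the modification time, which can
    differ between hosts even when the compressed payload is identical.
    """
    if not (looks_like_gzip(a) and looks_like_gzip(b)):
        return False
    return a[GZIP_HEADER_LEN:] == b[GZIP_HEADER_LEN:]
```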
David Anderson cf0a0817c0 server: fix some compile warnings
Add a derived class DB_APP_VERSION_VAL for use by the validator,
containing the extra fields it uses,
so that we're not doing memset 0 on vectors
2014-03-19 14:55:16 -07:00
David Anderson 8aa10ee5a9 scheduler: check if cpu_time and elapsed_time are infinite, set to zero if so
Some (old? buggy?) clients report these as infinity.
This causes the result update queries to fail.
2014-03-18 20:19:04 -07:00
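A minimal sketch of the check described above, in Python rather than the scheduler's C++; `sanitize_time` is a hypothetical name.

```python
import math


def sanitize_time(value: float) -> float:
    """Replace an infinite cpu_time or elapsed_time with zero.

    An infinite value cannot be written into a numeric DB column,
    which is why the result update query would fail.
    """
    return 0.0 if math.isinf(value) else value
```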
David Anderson 834ac11661 server: add sample validator that checks for string in stderr 2014-03-18 19:12:13 -07:00
David Anderson c2fd2b33e0 scheduler: fix bug that caused no jobs to be sent 2014-03-12 15:31:12 -07:00
David Anderson 2f91cd6b5e scheduler: add support for jobs targeted at hosts and teams
Also: add code to db_purge to delete assignment records for completed WUs
2014-03-12 00:03:17 -07:00
David Anderson 9889ee8fb6 scheduler: enforce GPU job limits separately for each GPU type
Previously, if a project specified a limit on GPU jobs in progress,
it would be enforced across GPU types.
This could lead to starvation for hosts with multiple GPU types.
E.g. the limit is 10, and a host has 10 NVIDIA jobs and no AMD jobs.

Fix this by enforcing limits separately for each GPU type.
2014-03-08 11:17:16 -08:00
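The starvation case in the commit above can be reproduced with a small sketch. This is illustrative Python, not the scheduler's code; both function names and the dictionary of per-type job counts are hypothetical.

```python
from typing import Mapping


def under_shared_limit(in_progress: Mapping[str, int], limit: int) -> bool:
    """Old behavior: one limit applied to the total across all GPU types."""
    return sum(in_progress.values()) < limit


def under_per_type_limit(in_progress: Mapping[str, int], gpu_type: str, limit: int) -> bool:
    """New behavior: the limit is applied to each GPU type separately."""
    return in_progress.get(gpu_type, 0) < limit
```

With a limit of 10 and a host holding 10 NVIDIA jobs and no AMD jobs, the shared check refuses all further GPU work, while the per-type check still allows AMD jobs to be sent.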
David Anderson 5381def663 server: use gpu_active_frac in scheduling decisions
On some hosts, gpu_active_frac may be much less than active_frac
(i.e., GPUs may be available much less than CPUs).
Use gpu_active_frac in the following places:

- scheduler: in estimating the elapsed time of jobs,
    to decide whether they can meet deadline
- scheduler: in computing the effective speed of a (host, app version),
    when deciding what size class it belongs to
- size_census: in computing effective speed of (host, app versions)

(Previously, we were just using active_frac in all these cases)
2014-03-06 21:23:02 -08:00
David Anderson df1d8e2bde server: store and display gpu_active_frac
- gpu_active_frac is the fraction of time GPU use is allowed
  while the client is running.
  Previously the client reported it but we weren't storing it in the DB.
  We may need it in the future for batch scheduling logic.
- fix a crashing bug in scheduler
- client: minor message tweak
2014-03-06 13:23:52 -08:00
David Anderson 593181e196 scheduler: if gui_urls.xml or project_files.xml don't end with \n, add one
Otherwise the scheduler reply has two tags on one line,
which messes up old clients that don't use the new XML parser
2014-02-26 16:16:51 -08:00
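The fix above amounts to a one-line normalization. The following Python sketch uses a hypothetical helper name, and leaving an empty file untouched is an assumption rather than something the commit states.

```python
def ensure_trailing_newline(text: str) -> str:
    """Append a newline to non-empty text that does not already end with one."""
    if text and not text.endswith("\n"):
        return text + "\n"
    return text
```

Applied to the contents of gui_urls.xml or project_files.xml before they are copied into the scheduler reply, this keeps the next tag from landing on the same line.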
David Anderson 0d8a22e75c Server: add optional size_class parameter to count_unsent_results().
This lets you write work generators that maintain min levels of
unsent jobs for each size class.
2014-02-20 13:44:56 -08:00
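A work generator that keeps a minimum number of unsent jobs per size class might compute its shortfall as sketched below. This is Python for illustration only: `count_unsent` is a hypothetical callable standing in for a per-size-class call to count_unsent_results(), whose real C++ signature is not reproduced here.

```python
from typing import Callable, Dict


def jobs_to_create(count_unsent: Callable[[int], int],
                   n_size_classes: int,
                   min_unsent: int) -> Dict[int, int]:
    """Return, for each size class, how many jobs are needed to reach min_unsent."""
    return {
        size_class: max(0, min_unsent - count_unsent(size_class))
        for size_class in range(n_size_classes)
    }
```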
David Anderson 4b5a099f81 scheduler: create host_app_version records in NCI case 2014-02-04 15:58:01 -08:00
David Anderson c7db808abd Scheduler: message tweak 2014-02-04 10:07:46 -08:00
David Anderson d861862ca1 server: fix compile warnings and file descriptor leaks
Also, we were using memset() to zero WORK_REQ,
which contains several std::vectors.
This apparently works on Linux, but not in general.
2014-01-08 22:00:13 -08:00
David Anderson cbc419ccab scheduler: fix bug that caused sticky files to always get deleted when file_delete_regexp mechanism used 2013-12-18 16:33:14 -08:00
David Anderson 2e4d561647 sample work generator: wait until transitioner has processed jobs before creating any more
Work generators create jobs (workunits);
the transitioner creates instances (results).
If a work generator tries to maintain a certain number of unsent results
(as the sample work generator does)
it must wait for a bit, after creating jobs,
to let the transitioner create instances of those jobs.
The example work generator waited 5 seconds.

Problem: on a heavily loaded project, the transitioner can fall behind -
minutes or hours behind.
So the above policy can create way too many jobs.

Solution: after creating jobs, the sample work generator
notes the current time X,
then waits until the transitioner catches up to time X
(i.e., until the min workunit.transition_time exceeds X).
This ensures that instances have been created for all the new jobs.

Other work generators that limit the number of unsent jobs
should use the same technique;
use min_transition_time(x) to get the min transition time.

Code cleanup: get_double should be a member of DB_CONN, not DB_BASE.
2013-12-14 16:36:18 -08:00
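The wait-for-transitioner technique described above can be sketched as a polling loop. This is a Python illustration, not the sample work generator's C++ code: the `min_transition_time` parameter is a hypothetical callable returning the minimum workunit.transition_time, and the injectable `now` and `sleep` parameters exist only to make the sketch testable.

```python
import time
from typing import Callable


def wait_for_transitioner(min_transition_time: Callable[[], float],
                          now: Callable[[], float] = time.time,
                          sleep: Callable[[float], None] = time.sleep,
                          poll_interval: float = 1.0) -> float:
    """Block until the transitioner has caught up to the time of the call.

    Notes the current time X, then polls until the minimum
    workunit.transition_time exceeds X. At that point the transitioner
    has processed every job created before X, so instances exist for them.
    """
    x = now()
    while min_transition_time() <= x:
        sleep(poll_interval)
    return x
```

A generator would call this right after creating a batch of jobs and before counting unsent results again, instead of sleeping for a fixed 5 seconds.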
David Anderson 6d4999767f example app: print "starting" message after boinc_init, so that it appears in stderr file
Also remove old score-based sched code
2013-12-10 14:00:31 -08:00
David Anderson 7d54e6537e scheduler: add <vm_accel_required> flag to plan class XML spec 2013-12-03 15:54:56 -08:00
David Anderson 99332624f3 scheduler: parse <opencl_cpu_prop> in scheduler requests correctly
The OPENCL_CPU_PROP structure was being referred to as both
"opencl_cpu_prop" and "cpu_opencl_prop", roughly 50/50,
in variable names and XML tags.
Let's standardize on "opencl_cpu_prop",
which is what current clients are sending in scheduler requests.
2013-11-28 14:11:42 -08:00
David Anderson feb2f1971d scheduler: fix bug that prevented Intel GPU work from being sent to anonymous platform clients 2013-11-21 22:31:15 -08:00
Rom Walton bec26d2447 VBOX: Add support for vbox32_hwaccel and vbox64_hwaccel plan classes in the stock server scheduler. 2013-11-18 14:43:44 -05:00
David Anderson 863c9496b0 deadline-extension trickle handler: message tweaks 2013-11-11 13:24:09 -08:00
David Anderson 5192fe2545 scheduler: assigned jobs should respect user app preferences 2013-10-06 21:23:28 -07:00
David Anderson 5b76909f04 scheduler: parse OpenCL/CPU descriptors, and add plan class for OpenCL/CPU/Intel 2013-08-26 23:32:32 -07:00
David Anderson b2e06e0704 Server: various fixes for "make install" 2013-08-24 20:36:49 -07:00
David Anderson f13c3d58ea fix bug in trickle handler framework; from Christian 2013-08-23 13:01:53 -07:00
David Anderson 628ba8f0ef Tweaks to deadline-extension trickle handler, from Christian 2013-08-23 09:45:45 -07:00
David Anderson 95d12b76e7 server: add code for extending deadlines via trickle-ups; from Christian 2013-08-23 00:34:37 -07:00
David Anderson ef82d5d9fb server: fix compile error on systems that don't define MAXPATHLEN 2013-08-22 17:01:45 -07:00
David Anderson 1c31f6feaa Condor: fix bug when 2 input files have same contents; fix error messages 2013-08-09 16:06:36 -07:00
Eric J Korpela 48d995061f Merge branch 'master' of ssh://boinc.berkeley.edu/boinc-v2 2013-08-08 11:31:27 -07:00
Eric J Korpela 60c7814250 SCHED: Removed claimed credit sanity check because GPU machines often have host
scales that cause it to fail. That prevents host_app_version PFCs from being
updated for perfectly reasonable credit claims.  Since there is a max credit
granted, this mechanism is unnecessary anyway.
2013-08-08 11:23:30 -07:00
David Anderson b156e88208 scheduler: sample code for the SSE3 plan class must check for "pni" rather than "sse3"; clients report "pni" 2013-08-08 11:00:29 -07:00
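A plan class check along the lines of the commit above might look like this. It is a Python sketch with hypothetical function names, and it assumes the client's CPU feature list is a space-separated string; matching whole tokens is a choice made for this sketch, not necessarily how the sample code does it.

```python
def has_cpu_feature(features: str, name: str) -> bool:
    """True if name appears as a whole token in a space-separated feature list."""
    return name in features.split()


def host_supports_sse3(features: str) -> bool:
    # Clients report SSE3 as "pni", so that is the token to look for.
    return has_cpu_feature(features, "pni")
```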
Eric J Korpela 03e64f720b SCHED: Added "intel_gpu" to app_plan_uses_gpu() 2013-06-25 19:31:23 -07:00
Eric J Korpela 4e338e946e - SCHED: Added plan class spec option "<need_amd_libs>" (similar to
  "<need_ati_libs>").  Before this the default was to require AMD libraries unless
  need_ati_libs was set.  Now the default is to require neither.  This is
  necessary for MacOS compatibility (where there is no distinction).
- SCHED: Changed intel gpu type search to match any string in the gpu_type
  beginning with "intel".  This was done because there have been
  inconsistencies in the code where "intel" vs "intel_gpu" is used.
2013-06-25 19:17:46 -07:00
Eric J Korpela 2c226d6ab2 SETI@home: made sending VLAR results to GPUs a run time sah_config.xml option 2013-06-19 14:40:53 -07:00
Eric J Korpela 244ba5bc85 SCHED: modified scheduler log output to use unsigned format for WU and RESULT
ids.  This allows IDs greater than 2^31 to be printed.
2013-06-19 10:15:08 -07:00
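The effect of the format change above can be emulated as follows. This is a Python sketch that assumes the IDs are held in 32-bit fields (the commit only says that IDs above 2^31 could not be printed); the function names are hypothetical, and ctypes is used here only to mimic C integer wraparound.

```python
import ctypes


def format_id_signed(raw_id: int) -> str:
    """Emulate printing a 32-bit ID with a signed format such as %d."""
    return str(ctypes.c_int32(raw_id).value)


def format_id_unsigned(raw_id: int) -> str:
    """Emulate printing a 32-bit ID with an unsigned format such as %u."""
    return str(ctypes.c_uint32(raw_id).value)
```

Under these assumptions, an ID of 2^31 + 5 would be logged as a negative number by the signed version and as 2147483653 by the unsigned one.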
Eric J Korpela bd5658a833 Merge branch 'master' of ssh://boinc.berkeley.edu/boinc-v2 2013-06-09 19:18:33 -07:00
David Anderson 78f7610f6e remove dependency of boinc_api.h on str_replace.h (and hence config.h)
Any files that use strlcpy() or strlcat() must directly include str_replace.h
2013-06-06 17:31:46 -07:00
Eric J Korpela 679cca5f1f Merge branch 'master' of ssh://boinc.berkeley.edu/boinc-v2 2013-06-04 14:01:18 -07:00
Eric J Korpela b718037e79 - SCHED: Added code to reduce the number of times an app_version is sent
to a host that has never successfully completed a result with that app_version.
2013-06-04 14:00:09 -07:00
Eric J Korpela d9d5b4b3b5 - SCHED: changes SETI@home feasibility to allow VLAR to go to ATI GPUs
with cal_target>15 and NVIDIA GPUs with compute capability>=3.0
2013-06-04 13:57:30 -07:00