My last commit did this using a new API call.
But this would require rebuilding apps any time you want to change it;
too much work.
So instead make it an attribute of apps,
which you can set via the admin web interface.
Corresponding changes to client.
The latest client reports the peak working set size, swap size,
and disk usage for completed jobs.
Add fields to the results table to store these.
Parse them from scheduler request messages and write them to the DB.
Display them on the result web page.
This data can be used to improve (or even automate)
the job estimates for memory and disk usage.
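As a rough illustration of how the stored peaks could feed those estimates, here is a minimal sketch. It assumes a hypothetical `ResultUsage` record whose field names mirror the reported quantities, and it picks one possible policy (a nearest-rank quantile of observed peaks times a safety margin). It is not the project's actual estimation logic.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical per-result record. Field names mirror the quantities the
// client now reports; they are assumptions, not the actual DB schema.
struct ResultUsage {
    double peak_working_set_size;  // bytes
    double peak_swap_size;         // bytes
    double peak_disk_usage;        // bytes
};

// Nearest-rank quantile q (0 < q <= 1) of v; returns 0 for empty input.
// v is taken by value because it is sorted in place.
double quantile(std::vector<double> v, double q) {
    if (v.empty()) return 0;
    std::sort(v.begin(), v.end());
    std::size_t rank = static_cast<std::size_t>(std::ceil(q * v.size()));
    rank = std::min(std::max<std::size_t>(rank, 1), v.size());
    return v[rank - 1];
}

// Suggested bound for one resource (selected by pointer-to-member):
// the q-quantile of observed peaks, times a safety margin
// (e.g. 1.2 for 20% headroom).
double suggest_bound(const std::vector<ResultUsage>& results,
                     double ResultUsage::* field, double q, double margin) {
    std::vector<double> peaks;
    peaks.reserve(results.size());
    for (const ResultUsage& r : results) peaks.push_back(r.*field);
    return quantile(peaks, q) * margin;
}
```

For example, `suggest_bound(results, &ResultUsage::peak_working_set_size, 0.95, 1.2)` would suggest a memory bound covering 95% of observed jobs with 20% headroom. A quantile is used rather than the maximum so that a single outlier job doesn't inflate the estimate.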
- add --is_gzip option to sample_bitwise_validator.
If set, all files are treated as gzip archives.
Check each file's 10-byte header to verify that it's a gzip file,
but ignore the header when comparing files.
- validator.cpp: don't error out on unparsed cmdline args,
since we're now using them in sample_bitwise_validator
and sample_substr_validator.
- fix build error on Debian
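The `--is_gzip` comparison described above can be sketched on in-memory buffers as follows. The function names are hypothetical, and the check uses the fixed header layout from RFC 1952 (ID1 = 0x1f, ID2 = 0x8b, CM = 8 for deflate). This sketch assumes no optional header fields (such as FNAME) follow the 10 fixed bytes; a real validator would first read each output file's bytes.

```cpp
#include <cstddef>
#include <string>

// Length of the fixed gzip header (RFC 1952).
const std::size_t GZIP_HEADER_LEN = 10;

// True if data starts with the gzip magic bytes and the deflate method byte.
bool looks_like_gzip(const std::string& data) {
    return data.size() >= GZIP_HEADER_LEN
        && static_cast<unsigned char>(data[0]) == 0x1f
        && static_cast<unsigned char>(data[1]) == 0x8b
        && static_cast<unsigned char>(data[2]) == 0x08;
}

// Compare two gzip buffers byte-for-byte, skipping the 10-byte header.
// The header holds MTIME (bytes 4-7) and OS (byte 9), which can differ
// between hosts even when the compressed payload is identical.
// Returns false if either buffer is not gzip.
bool gzip_bodies_equal(const std::string& a, const std::string& b) {
    if (!looks_like_gzip(a) || !looks_like_gzip(b)) return false;
    return a.compare(GZIP_HEADER_LEN, std::string::npos,
                     b, GZIP_HEADER_LEN, std::string::npos) == 0;
}
```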
Previously, if a project specified a limit on GPU jobs in progress,
it would be enforced across GPU types.
This could lead to starvation for hosts with multiple GPU types.
For example, if the limit is 10 and a host has 10 NVIDIA jobs in progress,
it can't get any AMD jobs, even though it has none.
Fix this by enforcing limits separately for each GPU type.
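A minimal sketch of per-type enforcement, assuming a hypothetical `GpuJobLimiter` class (not the scheduler's actual data structure): in-progress counts are kept per GPU type, so a full NVIDIA quota no longer blocks AMD jobs.

```cpp
#include <map>
#include <string>

// Hypothetical sketch: the project's limit applies to each GPU type
// separately rather than to the total across all types.
class GpuJobLimiter {
public:
    explicit GpuJobLimiter(int max_in_progress_per_type)
        : limit_(max_in_progress_per_type) {}

    // Record n jobs in progress for the given GPU type.
    void add_in_progress(const std::string& gpu_type, int n = 1) {
        in_progress_[gpu_type] += n;
    }

    // A job for this GPU type may be sent only if that type is under the limit.
    bool can_send(const std::string& gpu_type) const {
        auto it = in_progress_.find(gpu_type);
        int n = (it == in_progress_.end()) ? 0 : it->second;
        return n < limit_;
    }

private:
    int limit_;
    std::map<std::string, int> in_progress_;  // GPU type -> jobs in progress
};
```

With a limit of 10 and 10 NVIDIA jobs in progress, `can_send("nvidia")` is false but `can_send("amd")` is still true, which is the case from the example above.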
On some hosts, gpu_active_frac may be much less than active_frac
(i.e., GPUs may be available for much less of the time than CPUs).
Use gpu_active_frac in the following places:
- scheduler: in estimating the elapsed time of jobs,
to decide whether they can meet their deadlines
- scheduler: in computing the effective speed of a (host, app version),
when deciding what size class it belongs to
- size_census: in computing the effective speed of (host, app version)
(Previously, we were just using active_frac in all these cases.)
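The change amounts to choosing which availability fraction scales a device's speed. The sketch below assumes hypothetical names (`HostAvail`, `effective_speed`, `est_elapsed`, `can_meet_deadline`) and a simple model where elapsed time is job FLOPs divided by effective speed; the scheduler's real estimates include more factors.

```cpp
// Hypothetical availability fractions for a host.
struct HostAvail {
    double active_frac;      // fraction of time the client may compute
    double gpu_active_frac;  // fraction of time GPU use is allowed
};

// GPU app versions use gpu_active_frac; CPU-only ones use active_frac.
double avail_frac(const HostAvail& h, bool uses_gpu) {
    return uses_gpu ? h.gpu_active_frac : h.active_frac;
}

// Effective speed of a (host, app version): raw FLOPS scaled by availability.
// This is the quantity used for deciding size classes.
double effective_speed(double flops, const HostAvail& h, bool uses_gpu) {
    return flops * avail_frac(h, uses_gpu);
}

// Estimated elapsed (wall-clock) time of a job. If the availability
// fraction is 0, this is +infinity, so the deadline check below fails.
double est_elapsed(double job_flops, double flops, const HostAvail& h,
                   bool uses_gpu) {
    return job_flops / effective_speed(flops, h, uses_gpu);
}

// Deadline check: can the job finish within its delay bound?
bool can_meet_deadline(double job_flops, double flops, const HostAvail& h,
                       bool uses_gpu, double delay_bound) {
    return est_elapsed(job_flops, flops, h, uses_gpu) < delay_bound;
}
```

For a host with `active_frac = 0.5` and `gpu_active_frac = 0.25`, a GPU job is estimated to take twice as long as it would if `active_frac` were used, so it may no longer pass the deadline check.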
- gpu_active_frac is the fraction of time GPU use is allowed
while the client is running.
Previously the client reported it, but we weren't storing it in the DB.
We may need it in the future for batch scheduling logic.
- fix a crashing bug in scheduler
- client: minor message tweak
Work generators create jobs (workunits);
the transitioner creates instances (results).
If a work generator tries to maintain a certain number of unsent results
(as the sample work generator does)
it must wait a bit after creating jobs
to let the transitioner create instances of those jobs.
The sample work generator used to wait 5 seconds.
Problem: on a heavily loaded project, the transitioner can fall behind
by minutes or hours.
Until it catches up, the new jobs have no unsent instances,
so the work generator sees too few unsent results and creates far too many jobs.
Solution: after creating jobs, the sample work generator
notes the current time X,
then waits until the transitioner catches up to time X
(i.e., until the min workunit.transition_time exceeds X).
This ensures that instances have been created for all the new jobs.
Other work generators that limit the number of unsent jobs
should use the same technique;
use min_transition_time(x) to get the min transition time.
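A minimal sketch of the waiting technique described above. Because the real signature of `min_transition_time()` may differ from what is assumed here, it is passed in as a callback returning the minimum `workunit.transition_time`. The names `wait_for_transitioner` and `create_and_wait`, and the polling/timeout parameters, are hypothetical.

```cpp
#include <chrono>
#include <ctime>
#include <functional>
#include <thread>

// Poll until the transitioner has caught up to time x, i.e. until the
// minimum workunit transition time exceeds x. Returns false on timeout.
bool wait_for_transitioner(double x,
                           const std::function<double()>& min_transition_time,
                           std::chrono::milliseconds poll, int max_polls) {
    for (int i = 0; i < max_polls; i++) {
        if (min_transition_time() > x) return true;
        std::this_thread::sleep_for(poll);
    }
    return false;
}

// One cycle of a work generator that maintains a target number of unsent
// results: note the current time X, create jobs, then wait for the
// transitioner so the next count of unsent results includes the new jobs.
bool create_and_wait(const std::function<void()>& create_jobs,
                     const std::function<double()>& min_transition_time,
                     std::chrono::milliseconds poll, int max_polls) {
    double x = static_cast<double>(std::time(nullptr));
    create_jobs();
    return wait_for_transitioner(x, min_transition_time, poll, max_polls);
}
```

Unlike a fixed sleep, the wait adapts to transitioner lag: it returns quickly on a lightly loaded project and waits as long as needed (up to the timeout) on a heavily loaded one.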
Code cleanup: get_double should be a member of DB_CONN, not DB_BASE.
The OPENCL_CPU_PROP structure was being referred to as both
"opencl_cpu_prop" and "cpu_opencl_prop", roughly 50/50,
in variable names and XML tags.
Let's standardize on "opencl_cpu_prop",
which is what current clients are sending in scheduler requests.
scales that cause it to fail. That prevents host_app_version PFCs from being
updated for perfectly reasonable credit claims. Since granted credit is capped,
this mechanism is unnecessary anyway.
"<need_ati_libs>". Previously the default was to require AMD libraries unless
need_ati_libs was set. Now the default is to require neither. This is
necessary for MacOS compatibility (where there is no distinction).
-SCHED: Changed the Intel GPU type search to match any gpu_type string
beginning with "intel". This was done because the code is inconsistent,
using "intel" in some places and "intel_gpu" in others.
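The prefix match can be expressed as a single `strncmp` call. The function name `is_intel_gpu_type` is hypothetical, and the match is case-sensitive since only the lowercase spelling is mentioned above.

```cpp
#include <cstring>

// True for any GPU type string beginning with "intel",
// e.g. both "intel" and "intel_gpu". Null input is rejected.
bool is_intel_gpu_type(const char* gpu_type) {
    return gpu_type != nullptr && std::strncmp(gpu_type, "intel", 5) == 0;
}
```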