- This adds overhead to the get_state() call,
but this happens only once per minute with the Manager.
- rename things so that "keyword_ids" refers to lists of keyword IDs
and "keywords" refers to full KEYWORD objects
- have boinccmd include keywords in workunit properties
Add optional <mem_usage_base> and <mem_usage_per_cpu> elements
for MT plan classes.
These let you make the estimated memory usage depend on the # of CPUs used,
for example, for VM apps that run multiple jobs within the VM.
The mem usage is base + NCPUS*per_cpu.
This is used:
1) to limit the # of CPUs, based on the client's available memory;
2) to determine workunit.rsc_memory_bound
(the value in the input template is overridden)
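For example, a minimal sketch of the arithmetic (variable names and
structure are assumptions, not the actual plan-class code):

    #include <algorithm>
    #include <cstdio>

    int main() {
        double mem_usage_base = 2e9;        // from <mem_usage_base>, bytes
        double mem_usage_per_cpu = 5e8;     // from <mem_usage_per_cpu>, bytes
        double client_available_mem = 4e9;  // reported by the client, bytes
        int max_ncpus = 8;                  // upper bound from plan class / host

        // 1) limit the # of CPUs so that base + ncpus*per_cpu fits in memory
        int ncpus = std::min(max_ncpus,
            (int)((client_available_mem - mem_usage_base) / mem_usage_per_cpu));
        if (ncpus < 1) ncpus = 1;

        // 2) the resulting estimate overrides rsc_memory_bound from the template
        double rsc_memory_bound = mem_usage_base + ncpus*mem_usage_per_cpu;
        printf("ncpus=%d rsc_memory_bound=%.0f bytes\n", ncpus, rsc_memory_bound);
    }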
negative values are stored in app_version_id fields to represent
anonymous platform versions.
So we need to use %ld rather than %lu for these fields.
Also, there were a couple more changes of int to DB_ID_TYPE.
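A small illustration of the format-specifier issue (a sketch, not BOINC code):

    #include <cstdio>

    int main() {
        long app_version_id = -3;   // anonymous-platform versions are stored as negative IDs
        printf("%ld\n", app_version_id);                 // correct: prints -3
        printf("%lu\n", (unsigned long)app_version_id);  // what %lu would show: a huge unsigned value
    }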
The SETI@home result table is about to run out of 32-bit IDs,
so we need to move to 64-bit result IDs.
This will happen to the workunit table at some point too.
I changed the server C++ code to use the "long" type for all DB IDs
(and to use appropriate conversion codes like %lu).
"long" is 64 bit on 64-bit machines.
For uniformity I did this for all tables,
even ones (like app) that will never get big.
I chose NOT to change the DB schema for now.
The new code will work with 32-bit ID fields in the DB.
As projects approach the 32-bit limit on a table they can change
its ID field, and fields that reference this table, to BIGINT.
This is likely to happen only on the result and workunit tables.
I put functions in html/ops/db_update.php
to change the IDs of these tables.
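A sketch of the idea (the typedef name matches the description above,
but treat the details as assumptions rather than the actual headers):

    #include <cstdio>

    typedef long DB_ID_TYPE;     // assumed sketch; "long" is 64 bits on LP64 (64-bit Linux) servers

    struct RESULT {
        DB_ID_TYPE id;           // INT in the DB today; BIGINT as the table nears 2^31 rows
        DB_ID_TYPE workunitid;   // references workunit.id, so it gets widened at the same time
    };

    int main() {
        RESULT r;
        r.id = 5000000000L;      // fits once the column is BIGINT
        printf("sizeof(DB_ID_TYPE)=%zu id=%ld\n", sizeof(DB_ID_TYPE), r.id);
    }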
There was a bug where, when you suspend GPU activity,
GPU jobs show as suspended but are not actually suspended.
This was because of recent changes to distinguish GPU and non-GPU coprocs.
Change things so that coprocs are by default GPUs.
If you want to declare a non-GPU coproc in your cc_config.xml,
you must put <non_gpu/> in its <coproc> element.
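A sketch of the client-side default (hypothetical struct, not the real COPROC class):

    #include <cstdio>
    #include <string>

    // A coproc declared in cc_config.xml is treated as a GPU
    // unless <non_gpu/> is present.
    struct COPROC {
        std::string type;
        int count = 1;
        bool non_gpu = false;
        bool is_gpu() const { return !non_gpu; }
    };

    int main() {
        COPROC c;
        c.type = "miner_asic";
        c.non_gpu = true;                       // declared with <non_gpu/>
        printf("is_gpu: %d\n", c.is_gpu());     // 0: not scheduled as a GPU
    }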
Treat miner ASICs as a distinct processor type;
send miner_asic jobs only if the client requests them.
Note: I was planning to do this in a more general way,
in which the scheduler wouldn't have a hard-wired list of processor types.
However, that would be a large code change,
so for now I just added miner_asic to the list of processor types
(nvidia, ati, intel_gpu),
and made various changes to get things to work.
Also: in the job dispatch logic, try to send coproc jobs
before CPU jobs.
That way if e.g. there's a limit on jobs in progress,
we'll preferentially send coproc jobs.
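A simplified sketch of the ordering (the real dispatch loop is more
involved; names are assumptions):

    #include <cstdio>
    #include <string>
    #include <vector>

    // Scan processor types in a fixed order with the CPU last, so coproc
    // jobs are matched first when an in-progress limit applies.
    int main() {
        std::vector<std::string> proc_types = {
            "nvidia", "ati", "intel_gpu", "miner_asic",   // coprocs first
            "cpu"                                         // CPU last
        };
        for (const auto& pt: proc_types) {
            // real code: if the client asked for work for pt, try to send pt jobs here
            printf("try to send %s jobs\n", pt.c_str());
        }
    }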
A "generic" coprocessor is one that's reported by the client,
but's not of a type that the scheduler knows about (NVIDIA, AMD, Intel).
With this commit the following works:
- On the client, define a <coproc> in your cc_config.xml
with a custom name, say 'miner_asic'.
- define a plan class such as
<plan_class>
    <name>foobar</name>
    <gpu_type>miner_asic</gpu_type>
    <cpu_frac>0.5</cpu_frac>
</plan_class>
- App versions of this plan class will be sent only to hosts
that report a coproc of type "miner_asic".
The <app_version>s in the scheduler reply will include
a <coproc> element with the given name and count=1.
This will cause the client (at least the current client)
to run only one of these jobs at a time,
and to schedule the CPU appropriately.
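A sketch of the matching rule (hypothetical types and function; the real
plan-class logic is more involved):

    #include <cstdio>
    #include <string>
    #include <vector>

    struct COPROC { std::string type; int count; };
    struct HOST   { std::vector<COPROC> coprocs; };

    // Send app versions of a generic-coproc plan class only to hosts that
    // report a coproc whose name matches the plan class's <gpu_type>.
    bool plan_class_matches(const std::string& gpu_type, const HOST& host) {
        for (const auto& c: host.coprocs) {
            if (c.type == gpu_type && c.count > 0) return true;
        }
        return false;
    }

    int main() {
        HOST h;
        h.coprocs.push_back({"miner_asic", 1});
        printf("%d\n", plan_class_matches("miner_asic", h));   // 1: send
        printf("%d\n", plan_class_matches("nvidia", h));       // 0: don't send
    }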
Note: there's a lot missing from this:
- app version FLOPS will be those of a CPU app;
- jobs will be sent only if CPU work is requested
... and many other things.
Fixing these issues requires a significant re-architecture of the scheduler,
in particular getting rid of the PROC_TYPE_* constants
and the associated arrays,
which hard-wire the 3 fixed GPU types.
This is meant not to break anything, just add some
(optional) logging and features needed for Einstein@Home.
Please contact me before changing or removing any of this.
Conflicts:
sched/db_dump.cpp
sched/file_deleter.cpp
sched/validator.cpp
In 'mixed scheduling', save and reset the 'insufficient' (resources) flags
after calling the first scheduler; after calling the second,
recombine the resulting flags.
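A sketch of the save/reset/recombine idea (flag and function names are assumptions):

    #include <cstdio>

    // Per-resource "insufficient" flags set by a scheduling pass.
    struct INSUFFICIENT_FLAGS {
        bool mem = false, disk = false, speed = false;
    };

    INSUFFICIENT_FLAGS g_insufficient;

    void run_scheduler_pass(bool sets_mem) {         // stand-in for a real pass
        if (sets_mem) g_insufficient.mem = true;
    }

    int main() {
        run_scheduler_pass(true);                    // first scheduler
        INSUFFICIENT_FLAGS saved = g_insufficient;   // save ...
        g_insufficient = INSUFFICIENT_FLAGS();       // ... and reset
        run_scheduler_pass(false);                   // second scheduler
        // recombine: a resource is insufficient if either pass said so
        g_insufficient.mem   = g_insufficient.mem   || saved.mem;
        g_insufficient.disk  = g_insufficient.disk  || saved.disk;
        g_insufficient.speed = g_insufficient.speed || saved.speed;
        printf("mem insufficient: %d\n", g_insufficient.mem);   // 1
    }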
Previously, if a project specified a limit on GPU jobs in progress,
it would be enforced across GPU types.
This could lead to starvation for hosts with multiple GPU types.
E.g. the limit is 10, and a host has 10 NVIDIA jobs and no AMD jobs.
Fix this by enforcing limits separately for each GPU type.
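A sketch of per-type enforcement (a simplification; the real limit config is richer):

    #include <cstdio>
    #include <map>
    #include <string>

    // Count jobs in progress per GPU type and apply the limit to each type
    // separately, instead of to the combined total.
    int main() {
        int max_gpu_jobs_in_progress = 10;
        std::map<std::string, int> in_progress = {
            {"nvidia", 10}, {"ati", 0}
        };
        for (const auto& [gpu_type, n]: in_progress) {
            bool can_send = n < max_gpu_jobs_in_progress;
            printf("%s: %s\n", gpu_type.c_str(),
                   can_send ? "can send more" : "at limit");
            // old behavior: nvidia+ati = 10 >= limit, so AMD jobs were never sent
        }
    }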
On some hosts, gpu_active_frac may be much less than active_frac
(i.e., GPUs may be available much less than CPUs).
Use gpu_active_frac in the following places:
- scheduler: in estimating the elapsed time of jobs,
to decide whether they can meet deadline
- scheduler: in computing the effective speed of a (host, app version),
when deciding what size class it belongs to
- size_census: in computing the effective speed of (host, app version)
(Previously, we were just using active_frac in all these cases)
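Roughly, as a simplified sketch (not the scheduler's exact formula):

    #include <cstdio>

    // Estimated elapsed (wall-clock) time scales with how much of the time
    // the resource is actually available; for GPU app versions use
    // gpu_active_frac instead of active_frac.
    int main() {
        double est_runtime = 10000;       // seconds of computing, from FLOP estimate / speed
        double active_frac = 0.9;         // fraction of time CPU computing is allowed
        double gpu_active_frac = 0.3;     // fraction of time GPU computing is allowed

        double est_elapsed_cpu = est_runtime / active_frac;       // ~11111 s
        double est_elapsed_gpu = est_runtime / gpu_active_frac;   // ~33333 s; compare to deadline
        printf("cpu: %.0f s, gpu: %.0f s\n", est_elapsed_cpu, est_elapsed_gpu);
    }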
We were failing to mark the cache entries as free.
- API: initialize the GPU device # to -1;
if the client doesn't give us a device number, something is wrong
and it's better not to start computing.
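A sketch of the idea from the app side (hypothetical code, not the exact API change):

    #include <cstdio>
    #include <cstdlib>

    // Default the device number to an invalid value so a missing assignment
    // from the client is detected instead of silently computing on device 0.
    int main() {
        int gpu_device_num = -1;          // initialized to "not assigned"
        // ... parse init data / command line from the client here ...
        if (gpu_device_num < 0) {
            fprintf(stderr, "no GPU device number from client; not starting\n");
            exit(1);                      // better to fail than to use the wrong device
        }
    }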
svn path=/trunk/boinc/; revision=26079
(but not all) wasn't finished.
New logic: if the project has an NCI app then:
- make a list of NCI apps for which the client doesn't have
a job in progress.
- try to send one job for each of these apps
- do this even if no work is being requested.
- don't send jobs for NCI apps by other mechanisms
NOTE: the client logic isn't quite right for mixed NCI projects.
If there's no job for a given NCI app,
the client should do a scheduler RPC.
This isn't critical so we won't do this now.
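A sketch of the server-side rule (names are assumptions):

    #include <cstdio>
    #include <set>
    #include <string>
    #include <vector>

    // For each NCI app that the client has no job in progress for, try to
    // send exactly one job, even if no work is requested; NCI apps are
    // excluded from the normal dispatch mechanisms.
    int main() {
        std::vector<std::string> nci_apps = {"nci_app_a", "nci_app_b"};
        std::set<std::string> in_progress = {"nci_app_a"};   // from the request's in-progress results

        for (const auto& app: nci_apps) {
            if (in_progress.count(app)) continue;   // already has one; don't send another
            printf("send one job for %s\n", app.c_str());
        }
    }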
svn path=/trunk/boinc/; revision=26068
- cmdline tool for remote job submission (not done)
- remote job submission: support the 4 file modes described
in the documentation (not done)
svn path=/trunk/boinc/; revision=26067