Add optional <mem_usage_base> and <mem_usage_per_cpu> elements
for MT plan classes.
These let you make the estimated memory usage depend on the # of CPUs used,
for example, for VM apps that run multiple jobs within the VM.
The mem usage is base + NCPUS*per_cpu.
This is used:
1) to limit the # of CPUs, based on the client's available memory
2) to determine workunit.rsc_memory_bound
(the value in the input template is overridden)
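For illustration, a minimal sketch of this computation in C++ (the struct and function names below are assumptions, not the actual scheduler code):

    // Estimated memory usage of an MT job: base + NCPUS*per_cpu, and the
    // largest CPU count that fits in the client's available memory.
    struct MT_MEM_SPEC_SKETCH {
        double mem_usage_base;      // bytes
        double mem_usage_per_cpu;   // bytes per CPU
    };

    double estimated_mem_usage(const MT_MEM_SPEC_SKETCH& spec, int ncpus) {
        return spec.mem_usage_base + ncpus*spec.mem_usage_per_cpu;
    }

    // Used both to limit the # of CPUs and as the basis for rsc_memory_bound.
    int max_cpus_for_mem(const MT_MEM_SPEC_SKETCH& spec, double avail_mem, int max_ncpus) {
        int n = max_ncpus;
        while (n > 1 && estimated_mem_usage(spec, n) > avail_mem) n--;
        return n;   // a real implementation would also reject the version
                    // if even a single CPU doesn't fit
    }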
negative values are stored in app_version_id fields to represent
anonymous platform versions.
So we need to use %ld rather than %lu for these fields.
Also there were a couple more changes of int to DB_ID_TYPE.
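As a standalone illustration (not BOINC code) of why the format code matters: a negative app_version_id printed with %lu shows up as a huge unsigned number, while %ld keeps the sign.

    #include <cstdio>

    int main() {
        long app_version_id = -2;   // e.g. an anonymous-platform version
        char wrong[64], right[64];
        snprintf(wrong, sizeof(wrong), "%lu", (unsigned long)app_version_id);
        snprintf(right, sizeof(right), "%ld", app_version_id);
        // on a 64-bit machine: wrong = "18446744073709551614", right = "-2"
        printf("%%lu gives %s, %%ld gives %s\n", wrong, right);
        return 0;
    }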
The SETI@home result table is about to run out of 32-bit IDs,
so we need to move to 64-bit result IDs.
This will happen to the workunit table at some point too.
I changed the server C++ code to use the "long" type for all DB IDs
(and to use appropriate conversion codes like %lu).
"long" is 64 bit on 64-bit machines.
For uniformity I did this for all tables,
even ones (like app) that will never get big.
I chose NOT to change the DB schema for now.
The new code will work with 32-bit ID fields in the DB.
As projects approach the 32-bit limit on a table they can change
its ID field, and fields that reference this table, to BIGINT.
This is likely to happen only on the result and workunit tables.
I put functions in html/ops/db_update.php
to change the IDs of these tables.
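A hedged sketch of the two pieces involved; the typedef name is illustrative, and the column names in the statements are assumptions (the actual functions in html/ops/db_update.php may differ):

    // Server side: a signed 64-bit ID type ("long" is 64 bits on 64-bit machines).
    typedef long DB_ID_TYPE_SKETCH;

    // DB side: the kind of statements a migration would run to widen the
    // result table's ID, and a field that references it, to BIGINT.
    const char* const ID_WIDENING_SQL_SKETCH[] = {
        "alter table result change column id id bigint not null auto_increment",
        "alter table workunit change column canonical_resultid canonical_resultid bigint not null",
    };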
A "generic" coprocessor is one that's reported by the client,
but isn't of a type that the scheduler knows about (NVIDIA, AMD, Intel).
With this commit the following works:
- On the client, define a <coproc> in your cc_config.xml
with a custom name, say 'miner_asic'.
- define a plan class such as
<plan_class>
<name>foobar</name>
<gpu_type>miner_asic</gpu_type>
<cpu_frac>0.5</cpu_frac>
</plan_class>
- App versions of this plan class will be sent only to hosts
that report a coproc of type "miner_asic".
The <app_version>s in the scheduler reply will include
a <coproc> element with the given name and count=1.
This will cause the client (at least the current client)
to run only one of these jobs at a time,
and to schedule the CPU appropriately.
Note: there's a lot missing from this:
- app version FLOPS will be those of a CPU app;
- jobs will be sent only if CPU work is requested
... and many other things.
Fixing these issues requires a significant re-architecture of the scheduler,
in particular getting rid of the PROC_TYPE_* constants
and the associated arrays,
which hard-wire the 3 fixed GPU types.
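For illustration, a minimal sketch of the matching rule described above (the types and helper below are assumptions, not the scheduler's actual data structures):

    #include <string>
    #include <vector>

    struct HOST_COPROC_SKETCH {
        std::string name;   // e.g. "miner_asic", as reported by the client
        int count;
    };

    // An app version whose plan class has <gpu_type>miner_asic</gpu_type> is
    // usable only if the host reported a coproc with that name; the reply's
    // <coproc> element would then carry that name with count=1.
    bool generic_coproc_usable(
        const std::string& gpu_type,
        const std::vector<HOST_COPROC_SKETCH>& host_coprocs
    ) {
        for (const auto& c: host_coprocs) {
            if (c.name == gpu_type && c.count > 0) return true;
        }
        return false;
    }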
If a project has both NCI and non-NCI apps,
it needs to send NCI jobs even if the work request is zero;
otherwise a client might not get NCI jobs.
However, this policy will send NCI jobs even if the user
has set "no new tasks" for the project.
To handle this correctly, we recently added a <dont_send_work>
element to the scheduler request.
Add logic to the scheduler to parse and enforce this flag.
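A hedged sketch of the resulting decision (struct and function names are illustrative):

    // NCI jobs are considered even on a zero work request, but nothing is sent
    // if the request carries <dont_send_work> ("no new tasks").
    struct NCI_REQUEST_SKETCH {
        bool dont_send_work;       // parsed from <dont_send_work>
        double work_req_seconds;   // amount of work requested
    };

    bool should_consider_nci_jobs(const NCI_REQUEST_SKETCH& req, bool project_has_nci_apps) {
        if (req.dont_send_work) return false;   // user set "no new tasks"
        return project_has_nci_apps;            // even if work_req_seconds == 0
    }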
Previously, if a project specified a limit on GPU jobs in progress,
it would be enforced across GPU types.
This could lead to starvation for hosts with multiple GPU types.
E.g. the limit is 10, and a host has 10 NVIDIA jobs and no AMD jobs.
Fix this by enforcing limits separately for each GPU type.
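A minimal sketch of the per-type accounting (illustrative types, not the scheduler's actual limit code):

    #include <map>
    #include <string>

    // In-progress GPU jobs counted separately per GPU type (e.g. "nvidia", "amd").
    struct GPU_JOB_LIMIT_SKETCH {
        int max_jobs_in_progress;                // the project's per-type limit
        std::map<std::string, int> in_progress;  // per-GPU-type counts

        bool can_send(const std::string& gpu_type) {
            return in_progress[gpu_type] < max_jobs_in_progress;
        }
        void register_send(const std::string& gpu_type) {
            in_progress[gpu_type]++;
        }
    };

    // With a limit of 10, a host with 10 NVIDIA jobs in progress can still get
    // AMD jobs, since the AMD count is 0.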
On some hosts, gpu_active_frac may be much less than active_frac
(i.e., GPUs may be available much less than CPUs).
Use gpu_active_frac in the following places:
- scheduler: in estimating the elapsed time of jobs,
to decide whether they can meet deadline
- scheduler: in computing the effective speed of a (host, app version),
when deciding what size class it belongs to
- size_census: in computing effective speed of (host, app versions)
(Previously, we were just using active_frac in all these cases.)
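A sketch of the availability factor, hedged (the real estimate has more inputs than shown here):

    // Expected elapsed time of a job on a (host, app version), scaled by how
    // often the relevant resource is actually available.
    double expected_elapsed_time_sketch(
        double rsc_fpops_est,    // estimated FLOPs of the job
        double projected_flops,  // speed of this (host, app version)
        double active_frac,      // fraction of time CPUs are available
        double gpu_active_frac,  // fraction of time GPUs are available
        bool uses_gpu
    ) {
        double avail = uses_gpu ? gpu_active_frac : active_frac;
        return rsc_fpops_est / (projected_flops * avail);
    }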
Some credit cheats (e.g. with credit_by_runtime) can be done
by reporting a huge host.p_fpops value.
Fix this by capping the value at 1.1 times the 95th percentile
of host.p_fpops, taken over active hosts.
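A minimal sketch of the cap, assuming the percentile is computed over a list of active hosts' p_fpops values (not the project's actual query or code):

    #include <algorithm>
    #include <vector>

    // 1.1 times the 95th percentile of host.p_fpops over active hosts.
    double p_fpops_cap(std::vector<double> active_host_fpops) {
        if (active_host_fpops.empty()) return 0;
        std::sort(active_host_fpops.begin(), active_host_fpops.end());
        size_t i = (size_t)(0.95 * (active_host_fpops.size() - 1));
        return 1.1 * active_host_fpops[i];
    }

    // Reported values above the cap are clamped before credit is computed.
    double capped_p_fpops(double reported, double cap) {
        return std::min(reported, cap);
    }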
svn path=/trunk/boinc/; revision=25017
depending on how many the host has,
and whether CPU VM extensions are present
(this reflects the requirements of CernVM).
svn path=/trunk/boinc/; revision=25009
to a superseded or deprecated app version, use it anyway.
The current app version may not validate against the old one.
svn path=/trunk/boinc/; revision=24823
If the file "client_opaque.txt" exists on the client,
include its contents in scheduler request messages.
On the scheduler, parse this into SCHEDULER_REQUEST::client_opaque,
where it can be used by the customizable scheduler functions.
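A hedged sketch of the client side (the file name comes from the text above; the element name shown in the comment is an assumption):

    #include <fstream>
    #include <sstream>
    #include <string>

    // Returns the contents of client_opaque.txt, or "" if the file doesn't exist.
    std::string read_client_opaque_sketch() {
        std::ifstream f("client_opaque.txt");
        if (!f) return "";
        std::ostringstream ss;
        ss << f.rdbuf();
        return ss.str();
    }

    // The request would then carry something like
    //     <client_opaque>...file contents...</client_opaque>
    // which the scheduler parses into SCHEDULER_REQUEST::client_opaque.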
svn path=/trunk/boinc/; revision=24586
- scheduler: when using elapsed time stats to predict runtime,
cap the estimated FLOPS at twice the peak FLOPS;
otherwise, if a host has received a lot of very short jobs
recently, it will get a too-high FLOPS estimate and
will exceed the rsc_fpops_bound limit.
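A small sketch of the cap (names illustrative):

    #include <algorithm>

    // Clamp a statistics-based FLOPS estimate at twice the peak FLOPS of the
    // (host, app version), so a run of very short jobs can't inflate the
    // estimate and push jobs past rsc_fpops_bound.
    double capped_projected_flops(double stats_based_flops, double peak_flops) {
        return std::min(stats_based_flops, 2*peak_flops);
    }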
svn path=/trunk/boinc/; revision=24128
Add parsed_tag and is_tag to the class,
so that parsing functions don't need to declare them
and pass them around.
- Complete the task of using XML_PARSER as the argument
to all parsing functions.
(Internally, many of these functions still use the old XML parser;
that's the next step.)
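For illustration, a stand-in sketch of the design change (this is not BOINC's actual XML_PARSER; member and method names are illustrative):

    #include <string>

    class XML_PARSER_SKETCH {
    public:
        std::string parsed_tag;  // the tag most recently read
        bool is_tag;             // false if the last token was character data
        // bool get_tag();       // advance to the next token, filling the fields above
        // ... parse_int(), parse_str(), etc. ...
    };

    // Old style: every parsing function declared the state and passed it around,
    //     char tag[256]; bool is_tag;
    //     while (!get_tag(f, tag, sizeof(tag), is_tag)) { ... }
    // New style: the parser object owns it,
    //     while (!xp.get_tag()) {
    //         if (xp.parsed_tag == "app_version") { ... }
    //     }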
svn path=/trunk/boinc/; revision=23978
- don't create result records for uploads and downloads.
Just create a msg_to_client record.
- the scheduler handles file-transfer results specially;
it makes a vector of them, then calls a project-supplied function
handle_file_xfer_results()
- change the interface and implementation of put_file and get_file
- client: write project sched priority in GUI RPC replies,
but not to the state file
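A hedged sketch of the file-transfer path (the types are illustrative; only the function name handle_file_xfer_results() comes from the text above):

    #include <vector>

    struct FILE_XFER_RESULT_SKETCH {
        // info about one completed upload/download
    };

    // Project-supplied hook; declared here, defined by the project.
    void handle_file_xfer_results(std::vector<FILE_XFER_RESULT_SKETCH>& results);

    void process_request_sketch(std::vector<FILE_XFER_RESULT_SKETCH>& file_xfers) {
        // file-transfer results are batched into a vector (no result records
        // are created for them) and handed to the project's function
        handle_file_xfer_results(file_xfers);
    }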
svn path=/trunk/boinc/; revision=23857
Lets you specify, on a per-app basis,
that all instances should be done using the same app version.
This is for validation in the presence of GPUs.
- scheduler: code cleanup
- Instead of adding a bunch of non-DB fields to RESULT,
use a derived class SCHED_DB_RESULT.
- Instead of storing a pointer to BEST_APP_VERSION in RESULT,
store the structure itself.
This simplifies the memory allocation situation.
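A minimal sketch of the two structural changes (member names illustrative):

    // Stand-ins for the DB-backed RESULT and for BEST_APP_VERSION.
    struct RESULT_SKETCH {
        long id;
        // ... DB fields only ...
    };

    struct BEST_APP_VERSION_SKETCH {
        long app_version_id;
        // ... version-selection info ...
    };

    // Scheduler-only state lives in a derived class instead of bloating RESULT,
    // and BEST_APP_VERSION is embedded by value, which simplifies memory management.
    struct SCHED_DB_RESULT_SKETCH : RESULT_SKETCH {
        BEST_APP_VERSION_SKETCH bav;   // stored by value, not by pointer
    };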
- client: condition "Got server request to delete file" messages
on <file_xfer_debug>
svn path=/trunk/boinc/; revision=23636
and an upload started in the last 5 min, don't fetch work from it.
The goal is to merge the 2 scheduler RPCs
(fetch work, report completed tasks) into a single RPC.
Note: this may result in idleness in some cases.
- scheduler: if client doesn't handle plan class (pre-5.10),
check plan-class app versions anyway,
but only use one if it's a single-CPU app.
This allows single-CPU app versions with specific requirements
(like SSE) to be issued to old clients.
From Bernd Machenschalk
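A hedged sketch of the pre-5.10 rule above (fields are illustrative):

    struct APP_VERSION_SKETCH {
        bool has_plan_class;
        double avg_ncpus;   // 1 for a single-CPU app
    };

    // Old clients don't understand plan classes, but can still be sent a
    // plan-class version (e.g. an SSE build) as long as it's single-CPU.
    bool usable_by_client(const APP_VERSION_SKETCH& av, bool client_handles_plan_class) {
        if (!av.has_plan_class) return true;
        if (client_handles_plan_class) return true;
        return av.avg_ncpus <= 1;
    }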
svn path=/trunk/boinc/; revision=22841
Such jobs fail on 32-bit machines, even if they have sufficient RAM,
because 32-bit OSs don't support address spaces > 2GB.
In general, we want to support the following scenario:
- an app has a mixture of small (< 2GB) and big (> 2GB) jobs.
- there are app versions for both 32b and 64b platforms
- one of the 32b versions is faster than the 64b version
(say, it's a 32b GPU app)
Goals:
If the client is 32b, send it only small jobs,
using the fast 32b version if possible.
If the client is 64b and has sufficient RAM,
send it large jobs using the 64b version;
send it small jobs using the fast 32b version if possible,
else the 64b version.
Solution: extend get_app_version() so that it detects big jobs,
and uses only 64b versions for them.
Add a "for_64b_jobs" field to BEST_APP_VERSION
so that we maintain a separate memoized set of
BEST_APP_VERSIONs for big jobs.
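For illustration, a minimal sketch (the 2 GB threshold and the for_64b_jobs field come from the text above; everything else is an assumption):

    const double ADDR_SPACE_32B = 2e9;   // approx. limit of a 32-bit address space

    struct BEST_APP_VERSION_SKETCH {
        bool for_64b_jobs;   // memoized separately for big jobs
        // ... chosen app version, projected FLOPS, etc. ...
    };

    // Big jobs won't fit a 32-bit address space, so only 64-bit versions are
    // considered for them; small jobs can still use a faster 32-bit version.
    bool is_big_job(double rsc_memory_bound) {
        return rsc_memory_bound > ADDR_SPACE_32B;
    }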
- client: don't set report_results_immediately inappropriately
svn path=/trunk/boinc/; revision=22440
1) old-style apps with graphics in main program.
No one should be using these anymore.
2) writing init_data.xml in boinc_finish().
This was used by the deprecated "compound app" scheme.
- scheduler: if request reports results that were previously reported,
that's evidence that the previous reply was not received by client.
It may have contained results.
So set a "resend lost results" flag.
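A small sketch of the heuristic (types illustrative):

    #include <vector>

    struct REPORTED_RESULT_SKETCH {
        bool previously_reported;   // already reported in an earlier request
    };

    // Re-reported results imply the previous reply never reached the client,
    // so flag the request for lost-result resends.
    bool should_resend_lost_results(const std::vector<REPORTED_RESULT_SKETCH>& reported) {
        for (const auto& r: reported) {
            if (r.previously_reported) return true;
        }
        return false;
    }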
svn path=/trunk/boinc/; revision=22203
where the client tells the scheduler which app versions
its queued jobs use
(this is needed, e.g., to enforce per-app or per-resource job limits).
In this mechanism, the client sends an array of <app_version>s,
and each <other_result> includes an index into this array.
- The wrong index was being sent (client).
- If an <app_version> had a non-existent app name
(e.g. because that app had been deprecated)
it wasn't getting put in the array, invalidating array indices.
Furthermore, an erroneous message was being sent to the user.
Fix: if parse error for <app_version>,
put it in the array anyway, but with cav.app = NULL,
meaning that it's a place-holder.
Send a message to user only if anon platform.
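A hedged sketch of the place-holder fix (only cav.app comes from the text above; the rest is illustrative):

    #include <cstddef>
    #include <vector>

    struct APP_SKETCH { /* ... */ };

    struct CLIENT_APP_VERSION_SKETCH {
        APP_SKETCH* app;   // NULL means "place-holder": the app name didn't resolve
        // ... other parsed fields ...
    };

    // Push an entry even on parse error (e.g. a deprecated app name), so the
    // indices sent in <other_result> elements keep pointing at the right slots.
    void add_client_app_version_sketch(
        std::vector<CLIENT_APP_VERSION_SKETCH>& versions, bool parse_ok, APP_SKETCH* app
    ) {
        CLIENT_APP_VERSION_SKETCH cav;
        cav.app = parse_ok ? app : NULL;
        versions.push_back(cav);
    }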
- manager: increase notice buffers to 64K
svn path=/trunk/boinc/; revision=22052
This feature lets you run the BOINC client as a job on grid systems
that handle only 1-CPU jobs;
it disables various mechanisms that prevent multiple clients per host
(normally, running multiple clients per host is a bad thing).
Old:
- Run the client with a --allow_multiple_clients flag.
This tells it not to use a mutex that prevents
multiple clients per host.
- Run the project with the <multiple_clients_per_host> config flag.
This suppresses two mechanisms:
- (avoid duplicate host records)
on a scheduler request with no host ID,
the scheduler looks for a host with the same domain name, OS type,
and mem size, and assumes the request is from that host
- (job retry)
If we get a request that doesn't have a host ID
but does have a host CPID,
mark its in-progress results as over
NOTE: I CAN'T REMEMBER WHY WE SUPPRESS THIS;
MARK S, DO YOU REMEMBER?
Problem:
if the grid clients attach to a project that
doesn't use <multiple_clients_per_host>, bad things happen.
E.g., if there are several requests at about the same time,
most of them will fail with
"another RPC already in progress" errors.
If a project does include this flag,
it loses protection from duplicate host records.
New:
- If the client is run with --allow_multiple_clients flag,
it passes an <allow_multiple_clients> element
in scheduler requests.
- The scheduler skips the duplicate-host check on
requests that include this flag.
- There is no more <multiple_clients_per_host> scheduler option.
Note: if a project using the old mechanism upgrades to this change,
it will need to use new clients for its grid deployment.
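A minimal sketch of the new scheduler-side behavior (names illustrative):

    struct SCHED_REQUEST_SKETCH {
        long hostid;                   // 0 if the client has no host ID yet
        bool allow_multiple_clients;   // parsed from <allow_multiple_clients>
    };

    // The duplicate-host lookup is now controlled per request by the
    // client-supplied flag, not by a project-wide config option.
    bool should_check_for_duplicate_host(const SCHED_REQUEST_SKETCH& req) {
        if (req.hostid) return false;         // known host; nothing to match
        return !req.allow_multiple_clients;   // grid clients skip the lookup
    }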
svn path=/trunk/boinc/; revision=21839