RAM to run the job, but when we actually run the job
not enough GPU RAM is free, so the application fails.
This can cause a large number of jobs to fail.
Solution:
- app_plan() can specify the GPU RAM requirements of an app version.
This is passed to the client in a new field
<gpu_ram> of the <app_version> element.
- before starting or restarting a GPU app, the client
checks the amount of free RAM on the GPU assigned to the job.
If it's not enough for the app version,
the client doesn't start the job,
and arranges for its job scheduler to ignore the job for 5 minutes
(by which point there might be more free GPU RAM).
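A minimal C++ sketch of the start-time check described above, assuming hypothetical AppVersion and Task structs (the client's real data structures and function names differ). It refuses to start a task whose app version needs more GPU RAM than is free on the GPU assigned to it, and records a hypothetical schedule_backoff_until time 5 minutes ahead; a task still inside that window is also skipped.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for the client's app version and task records.
struct AppVersion {
    std::string name;
    double gpu_ram;  // bytes needed, from the <gpu_ram> element
};

struct Task {
    const AppVersion* avp;
    int gpu_index;                  // GPU assigned in enforce_schedule()
    double schedule_backoff_until;  // hypothetical: skip the task until then
};

constexpr double GPU_RAM_BACKOFF = 300;  // 5 minutes, in seconds

// Returns true if the task may start now; free_gpu_ram is the free RAM
// reported for the task's assigned GPU.
bool can_start_gpu_task(Task& t, double free_gpu_ram, double now) {
    assert(t.avp != nullptr);
    assert(t.gpu_index >= 0);  // the check needs a specific GPU
    if (now < t.schedule_backoff_until) {
        return false;  // still inside an earlier backoff window
    }
    if (t.avp->gpu_ram > free_gpu_ram) {
        // Not enough free GPU RAM: don't start, and back off for 5 minutes.
        t.schedule_backoff_until = now + GPU_RAM_BACKOFF;
        return false;
    }
    return true;
}
```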
Notes:
1) this change takes effect only when
both client and scheduler are updated.
2) the check is done in enforce_schedule(),
rather than schedule_cpus(),
because only at that point
have we assigned a specific GPU to the job.
3) there's another case to deal with:
a GPU app's malloc of GPU RAM fails in the middle of the job.
Currently the job fails.
I plan to add an API call, boinc_temporary_exit(x), so
that the job can exit and potentially restart in x seconds.
(In principle this mechanism is sufficient for all cases,
but it could lead to a lot of starting and exiting,
so the current change is still worthwhile.)
svn path=/trunk/boinc/; revision=19864
can increase or decrease at N times real time.
My checkin of 7 Dec reflects this by changing
the STD limits to +- N*MAX_STD.
This looks like a bug to users.
Instead, scale the rate of STD change by 1/N,
and keep the old limits of +- MAX_STD.
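The corrected update rule might look like the following sketch. It assumes N is the same factor as in the note above and that MAX_STD is one day (86400 seconds, matching the "+- 1 day" limit mentioned in a later note); update_std() is a hypothetical name. The raw change is scaled by 1/N and the result is clamped to the unchanged limits of +- MAX_STD.

```cpp
#include <algorithm>
#include <cassert>

// Assumption: one day, in seconds (a later note mentions +- 1 day).
constexpr double MAX_STD = 86400;

// raw_delta is the STD change as previously computed, which can move at up
// to N times real time. Scaling by 1/N brings it back to real-time rate, so
// the original +-MAX_STD limits can be kept.
double update_std(double std_val, double raw_delta, int n) {
    assert(n > 0);
    double v = std_val + raw_delta / n;
    return std::max(-MAX_STD, std::min(MAX_STD, v));
}
```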
svn path=/trunk/boinc/; revision=19851
Old: if a project has RR sim deadline misses,
select jobs to run high-priority on the basis of:
1) deadline (earliest first)
2) estimated time to completion (least first)
This ignores whether jobs missed their deadline in RR sim,
so it may choose to run a job that's actually in no
danger of missing its deadline over one that is.
New: choose only jobs that miss their deadline in RR sim.
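A sketch of the new selection rule, assuming a hypothetical Job struct whose rr_sim_misses_deadline flag is set by the round-robin simulation, and assuming the old ordering (deadline, then estimated time to completion) is still applied among the remaining jobs. A job that is safe in RR sim is no longer a candidate, even if its deadline is the earliest.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical job record.
struct Job {
    std::string name;
    double report_deadline;         // absolute time
    double est_time_to_completion;  // seconds
    bool rr_sim_misses_deadline;    // set by the round-robin simulation
};

// Keep only jobs that miss their deadline in RR sim, then order them by
// earliest deadline, breaking ties by least estimated time to completion.
std::vector<Job> select_high_priority(const std::vector<Job>& jobs) {
    std::vector<Job> candidates;
    for (const Job& j : jobs) {
        if (j.rr_sim_misses_deadline) candidates.push_back(j);
    }
    std::sort(candidates.begin(), candidates.end(),
              [](const Job& a, const Job& b) {
                  if (a.report_deadline != b.report_deadline) {
                      return a.report_deadline < b.report_deadline;
                  }
                  return a.est_time_to_completion < b.est_time_to_completion;
              });
    return candidates;
}
```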
svn path=/trunk/boinc/; revision=19826
Let them float around with other projects.
Fixes a problem where, when a project finishes its last job
and has a negative STD, it gets an unfair increment
by being set to zero.
svn path=/trunk/boinc/; revision=19804
Source of proxy info (descending priority)
- GUI RPC (Manager or boinccmd)
This, and only this, is saved in the state file.
If neither an HTTP nor a SOCKS server name is present,
it is treated as not present.
- environment vars
- cc_config.xml
Show the sources of proxy info in the message log.
If one is present but overridden, show a message to that effect.
This fixes a problem where someone had set a proxy environment variable
and forgotten about it;
they got an erroneous message saying no proxy was being used.
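A sketch of the resolution order described above, with hypothetical ProxyInfo and ProxySource types. Sources are passed in descending priority; the first one that is present wins, and each lower-priority source that is present gets a message saying it was overridden. As an assumption, the "present only if an HTTP or SOCKS server name is set" rule is applied to every source here, not just the GUI RPC one.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical proxy record: only the server names matter for presence.
struct ProxyInfo {
    std::string http_server_name;
    std::string socks_server_name;
    bool present() const {
        return !http_server_name.empty() || !socks_server_name.empty();
    }
};

struct ProxySource {
    std::string label;  // e.g. "GUI RPC", "environment", "cc_config.xml"
    ProxyInfo info;
};

// sources must be in descending priority. Returns the index of the source
// used, or -1 if none is present; appends message-log lines to log.
int choose_proxy(const std::vector<ProxySource>& sources,
                 std::vector<std::string>& log) {
    int chosen = -1;
    for (std::size_t i = 0; i < sources.size(); i++) {
        const ProxySource& s = sources[i];
        if (!s.info.present()) continue;  // treated as not present
        if (chosen < 0) {
            chosen = static_cast<int>(i);
            log.push_back("Using proxy info from " + s.label);
        } else {
            log.push_back("Proxy info from " + s.label +
                          " is overridden by " + sources[chosen].label);
        }
    }
    if (chosen < 0) log.push_back("No proxy info found");
    return chosen;
}
```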
svn path=/trunk/boinc/; revision=19785
It computed an "overall STD" as the sum of CPU and coprocs,
weighted by the coproc's speed, as we do for LTD.
This was the wrong idea; in the presence of GPUs,
STDs quickly get pushed to +- 1 day and are truncated there.
New scheme: STD is maintained per (resource type, project).
This fixes the above problem,
and it opens the door to round-robin scheduling of GPUs.
- client: the calculation of "anticipated debt" was scaling
by relative resource share.
This seems incorrect to me.
- client: rename "debt" to "long_term_debt" in a few places
(but not in the client state file, for compatibility)
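The new per-(resource type, project) bookkeeping can be sketched as a map keyed by that pair. ResourceType and ShortTermDebts are hypothetical names chosen for illustration; the point is only that each resource type gets its own STD per project, instead of one speed-weighted overall value.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical resource types.
enum class ResourceType { CPU, NVIDIA_GPU, ATI_GPU };

// One short-term debt per (resource type, project).
class ShortTermDebts {
public:
    double get(ResourceType r, const std::string& project) const {
        auto it = debts_.find({r, project});
        return it == debts_.end() ? 0.0 : it->second;  // default: no debt
    }
    void add(ResourceType r, const std::string& project, double delta) {
        debts_[{r, project}] += delta;
    }
private:
    std::map<std::pair<ResourceType, std::string>, double> debts_;
};
```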
svn path=/trunk/boinc/; revision=19777
only if the offset is positive.
- client: some cmdline args set members of config.
However, config was being cleared after cmdline args were parsed,
so these args had no effect.
Instead, clear config before parsing the cmdline.
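The fix is an ordering change. In this sketch, Config and init_config() are simplified stand-ins, and the --example_option flag and its config member are hypothetical, used only as an example of an argument that sets a config member; config is cleared first so the value set by the argument survives.

```cpp
#include <cassert>
#include <cstring>

// Simplified stand-in for the client's config object.
struct Config {
    bool example_option = false;  // hypothetical member set by a cmdline arg
    void clear() { *this = Config(); }
};

// Clear config first, then parse cmdline args into it. In the old order,
// clear() ran after parsing and discarded the values the args had set.
void init_config(Config& config, int argc, const char* const argv[]) {
    config.clear();
    for (int i = 1; i < argc; i++) {
        if (std::strcmp(argv[i], "--example_option") == 0) {
            config.example_option = true;
        }
    }
}
```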
svn path=/trunk/boinc/; revision=19776
Old: it's based entirely on CPU time.
So a GPU project, whose app uses only a fraction
of a CPU, accrues positive debt.
This is OK if the project has only GPU apps,
since STD is not (currently) used for GPU scheduling.
But some projects have both CPU and GPU apps.
New: STD is based on total processing.
It has a term for each resource type.
The notion of "runnable resource share" is specific to a resource type.
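A sketch of a per-type runnable resource share, assuming a hypothetical Project struct that records whether the project has runnable CPU jobs and runnable GPU jobs. A project's fraction for a resource type is its share divided by the total share of projects runnable on that type, so a CPU-only project does not dilute the GPU fractions.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical resource types and project record.
enum class ResourceType { CPU, GPU };

struct Project {
    std::string name;
    double resource_share;
    bool runnable_cpu;  // has runnable jobs that use the CPU
    bool runnable_gpu;  // has runnable jobs that use a GPU
    bool runnable(ResourceType r) const {
        return r == ResourceType::CPU ? runnable_cpu : runnable_gpu;
    }
};

// Fraction of the total resource share, counting only projects that are
// runnable on resource type r.
double runnable_share_fraction(const std::vector<Project>& projects,
                               const Project& p, ResourceType r) {
    if (!p.runnable(r)) return 0;
    double total = 0;
    for (const Project& q : projects) {
        if (q.runnable(r)) total += q.resource_share;
    }
    return total > 0 ? p.resource_share / total : 0;
}
```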
Note: the notion of "resource share fraction" appears in
a couple of other places:
- it's passed to apps in app_init_data.xml
- it's passed in scheduler requests.
It should be broken down by resource type in these cases too.
Note to self: do this later.
svn path=/trunk/boinc/; revision=19762