pointers to dynamically allocated COPROC-derived objects,
just have the objects themselves.
Dynamic allocation should be avoided at all costs.
svn path=/trunk/boinc/; revision=21564
Report it to the manager
(it was already in CC_STATUS, but not populated)
- manager: fix system tray icon popup text
svn path=/trunk/boinc/; revision=21481
apply same random backoff to all transfers.
- client: when changing ncpus via config file,
don't modify host_info.p_ncpus
- client: show effective #CPUs separately from physical #
svn path=/trunk/boinc/; revision=21470
Some of them allow only 1 CUDA context at a time.
You need to create a CUDA context to get available VRAM.
So the client would run a CUDA job, then immediately kill it.
Solution:
- If a GPU app is running,
let it keep running regardless of available VRAM
(if it's still running, it has enough VRAM).
- But don't start new apps if there's not enough available VRAM,
or it the amount is unknown
(if the client can't create a CUDA context,
the app won't be able to either)
- client: if <coproc_debug> is set, print available GPU RAM periodically
svn path=/trunk/boinc/; revision=21253
of other jobs of that type.
They're waiting for GPU RAM, which may now be available.
- client: bug fix in GPU RAM availability
- client: fix testing setup for GPU RAM availability
svn path=/trunk/boinc/; revision=21206
old: assign GPUs, then check available RAM
Problem: may cause starvation on multi-GPU systems.
new: use available RAM info in the assignment process.
Prevents starvation, also reduces the number of driver calls.
svn path=/trunk/boinc/; revision=21205
to multiple jobs at the same time.
I fixed one error (reference arg to assign_coprocs())
but I can't see why this would explain the problem.
I added a lot of extra <coproc_debug> log messages.
- user web: give scientists moderator privileges
svn path=/trunk/boinc/; revision=21158
Removed my changes of 19 Jan 2010, which didn't work.
Added new mechanism: keep track of whether a job J has ever run in EDF.
If so, and if another job of the same project and resource type as J
is marked as deadline miss, then mark J as deadline miss,
so that it won't get preempted.
- web: change "result" to "task" in server status page
- admin web: show server stable SVN revision, not trunk
svn path=/trunk/boinc/; revision=20805
no longer means simulate zero CPUs.
There are several places that divide by ncpus.
Zero CPUs doesn't make any sense anyway.
svn path=/trunk/boinc/; revision=20474
if job A is unstarted and EDF,
and there's a job B that is later in the list,
is started, has the same app version,
and has the same arrival time,
move A after B.
- client: remove the "temp_dcf" mechanism,
which had the same goal but didn't work.
- client: in computing overall debt for a project,
subtract a term that reflects pending work.
This should reduce repeated fetches from the same project.
- client simulator: tweaks
svn path=/trunk/boinc/; revision=20223
"There is a bug in tra() that causes problems if one of the arguments
contains a replacement marker itself. For example, if the first
argument contains an encoded URL, which contains '%2', the second
argument may appear in the middle of the URL."
- client simulator: further fiddling around. Not done.
svn path=/trunk/boinc/; revision=20201
RAM to run job, but when we actually run the job
not enough GPU RAM is free, so the application fails.
This can cause a large number of jobs to fail.
Solution:
- app_plan() can specify the GPU RAM requirements of an app version.
This is passed to the client in a new field
<gpu_ram> of the <app_version> element.
- prior to starting or restarting a GPU app, the client
checks the amount of free RAM on the particular GPU.
If it's not enough for the app version,
the client doesn't start it,
and arranges for the scheduler to ignore it for 5 minutes
(by which point there might be more free GPU RAM)
Notes:
1) this change will have effect only when
both client and scheduler are updated.
2) the check is done in enforce_schedule(),
rather than schedule_cpus(),
because only at that point
have we assigned a specific GPU to the job.
3) there's another case to deal with:
a GPU app's malloc of GPU RAM fails in the middle of the job.
Currently the job fails.
I plan to add an API call boinc_temporary_exit(x) so
that the job can exit and potentially restart in x seconds.
(In principle this mechanism is sufficient for all cases,
but it could lead to a lot of starting/exiting,
so the current change is worthwhile).
svn path=/trunk/boinc/; revision=19864
Old: if a project has RR sim deadline misses,
select jobs to run high-priority on the basis of:
1) deadline (earliest first)
2) estimated time to completion (least first)
This ignores whether jobs missed their deadline in RR sim,
so it may choose to run a job that's actually in no
danger of missing its deadline over one that is.
New: choose only jobs that miss their deadline in RR sim
svn path=/trunk/boinc/; revision=19826
It computed an "overall STD" as the sum of CPU and coprocs,
weighted by the coproc's speed, as we do for LTD.
This was the wrong idea; in the presence of GPUs,
STDs quickly get pushed to +- 1 day and are truncated there.
New scheme: STD is maintained per (resource type, project).
This fixes the above problem,
and it opens to door to round-robin scheduling of GPUs.
- client: the calculation of "anticipated debt" was scaling
by relative resource share.
This wasn't correct, seems to me.
- client: rename "debt" to "long_term_debt" in a few places
(but not in the client state file, for compatibility)
svn path=/trunk/boinc/; revision=19777
Old: it's based entirely on CPU time.
So a GPU project, whose app uses only a fraction
of a CPU, accrues positive debt.
This is OK if the project has only GPU apps,
since STD is not (currently) used for GPU scheduling.
But some projects have both CPU and GPU apps.
New: STD is based on total processing.
It has terms for each resource type.
The notion of "runnable resource share" is specific to a type.
Note: the notion of "resource share fraction" appears in
a couple of other places:
- it's passed to apps in app_init_data.xml
- it's passed in scheduler requests.
It should be broken down by resource type in these cases too.
Note to self: do this later.
svn path=/trunk/boinc/; revision=19762
This tells you what's running, not why
- client: add <std_debug> log flag; changes in STD
The above are to let you log just stuff relevant to debt.
Right now I'm not sure why we need STD at all.
svn path=/trunk/boinc/; revision=19726