Work fetch with max concurrent constraints
current:
rr sim (pick_jobs_to_run): if a project reaches an MC (max concurrent) limit, stop picking its jobs (and take it out of the simulation). We need to do this to avoid starvation.
work fetch: don't fetch from a project that is at its MC limit.
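A minimal C++ sketch of the current rule, for reference. Project, Job, pick_jobs_to_run(), may_fetch_from() and the n_running counter are hypothetical simplifications (not the client's actual PROJECT/RESULT code), and each job is assumed to use exactly one instance:

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified types (not the client's PROJECT/RESULT classes).
struct Project {
    int max_concurrent;   // MC limit, in jobs (0 = no limit)
    int n_running = 0;    // jobs picked so far in this simulation step
    explicit Project(int mc) : max_concurrent(mc) {}
    bool at_mc_limit() const {
        return max_concurrent > 0 && n_running >= max_concurrent;
    }
};

struct Job {
    Project* project;     // assumption: each job uses one instance
};

// rr sim (current rule): once a project reaches its MC limit, stop picking
// its jobs, i.e. take it out of the simulation.
std::vector<Job*> pick_jobs_to_run(std::vector<Job>& queue, int ninstances) {
    assert(ninstances >= 0);
    std::vector<Job*> picked;
    for (Job& j : queue) {
        if (static_cast<int>(picked.size()) >= ninstances) break;
        if (j.project->at_mc_limit()) continue;   // skip the rest of its jobs
        j.project->n_running++;
        picked.push_back(&j);
    }
    return picked;
}

// Work fetch (current rule): don't fetch from a project at its MC limit.
bool may_fetch_from(const Project& p) {
    return !p.at_mc_limit();
}
```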
problem: we don't buffer work for projects with MC limits
solution:
rr sim:
  keep simulating a project even if it's at its MC limit
  for each MC project P and resource R, keep track of MI(P,R) = max # of instances of R that P is able to use, given its MC restriction (this may be all instances)
  maintain an "MC shortfall" MCS(R,P) for each MC project P; in update_stats():
    y = MI(P,R) - # instances of R in use by P
    x = min(y, nidle)   (nidle = # idle instances of R)
    MCS(R,P) += x*dt
work fetch: allow fetching from MC project P, but use MCS(R,P) instead of the shortfall; don't request work if MCS(R,P) is zero.
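The bookkeeping above can be sketched in C++ as follows. McState, update_mc_shortfall() and should_request_from_mc_project() are hypothetical names; only MI(P,R), MCS(R,P), nidle and dt come from the notes, and ignoring a negative x (P using more than MI(P,R)) is an added assumption. The amount requested would be s.mcs.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical per-(project P, resource R) state for an MC project.
struct McState {
    double mi;         // MI(P,R): max # instances of R that P can use given its MC limits
    double mcs = 0;    // MCS(R,P): accumulated MC shortfall (instance-seconds)
    explicit McState(double mi_) : mi(mi_) {}
};

// One rr-sim step of length dt, as done in update_stats().
//   nused_by_p: # instances of R in use by P during the step
//   nidle:      # idle instances of R during the step
void update_mc_shortfall(McState& s, double nused_by_p, double nidle, double dt) {
    assert(dt >= 0);
    double y = s.mi - nused_by_p;     // instances P could still use
    double x = std::min(y, nidle);    // ... that are actually idle
    if (x > 0) s.mcs += x * dt;       // assumption: ignore negative x
}

// Work fetch: an MC project is eligible only if its MC shortfall is nonzero;
// the amount requested is MCS(R,P) rather than the ordinary shortfall.
bool should_request_from_mc_project(const McState& s) {
    return s.mcs > 0;
}
```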
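examples (suppose min_buf is 3 and max_buf is 6; each row below is one time unit, each column one device instance)
4 device instances
p = project P, which has a max concurrent constraint
x = other projects
. = idle
* = idle, but usable by P (counted in P's MC shortfall)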
example 1: P has lots of jobs but can use only 2 instances
1 pp..
2 pp..
3 pp..
4 pp..
5 pp..
6 pp..
7 pp..
In this case the shortfall is 6, but we don't want to request any more work from P: it is already using both instances it can use, so y = 0 and its MC shortfall stays zero.
example 2: P has only a couple of jobs. It can use 1 or 2 instances, depending on which app versions run.
1 ppxx
2 pxxx
3 p*xx
4 p*.x
5 p*..
6 **..
7 ....
In this case the MC shortfall for P is 5 (the *'s). If P had the highest scheduling priority, we'd ask it for 5 units of work. After that, it wouldn't be eligible for work fetch because the MC shortfall would be zero. But we'd be able to ask another project for 4 units.
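Both examples can be checked by replaying their timelines through the MCS formula. This sketch assumes each row is one time unit (dt = 1), that '.' and '*' are both idle instances, that MI(P,R) = 2, and that only the first max_buf = 6 rows fall within the simulation horizon (the *'s in example 2 stop at row 6). mc_shortfall() is a hypothetical helper.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Replays a timeline written in the notation of the examples: one string per
// time unit, one character per device instance ('p' = P, 'x' = other
// project, '.' or '*' = idle). Assumes dt = 1 per row and that only the
// first `horizon` rows are inside the simulated buffer.
double mc_shortfall(const std::vector<std::string>& rows, int mi, int horizon) {
    assert(mi >= 0 && horizon >= 0);
    double mcs = 0;
    int n = std::min(horizon, static_cast<int>(rows.size()));
    for (int t = 0; t < n; t++) {
        const std::string& r = rows[t];
        int used = static_cast<int>(std::count(r.begin(), r.end(), 'p'));
        int idle = static_cast<int>(std::count(r.begin(), r.end(), '.')
                                    + std::count(r.begin(), r.end(), '*'));
        int x = std::min(mi - used, idle);   // x = min(y, nidle)
        if (x > 0) mcs += x;                 // MCS += x * dt, with dt = 1
    }
    return mcs;
}
```

Example 1 (seven rows of "pp..") gives 0, and example 2 gives 5, matching the counts above.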
(Aug 2024) MC limits (per app or per project) are expressed in terms of jobs, but a job can use more than one instance of each resource type (e.g. a multithreaded job uses several CPUs).
work_fetch.cpp computes the max # of instances a project can use, given its MC constraints, as
max_nused = p->app_configs.project_min_mc
where project_min_mc is the min of max_concurrent over the project and its apps.
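This is wrong, because it counts jobs rather than instances: if max_concurrent is 2 and each job uses 4 CPUs, the project can use 8 CPUs (on a host with at least 8), not 2. Instead, for each project P and resource R compute
x = max usage of R over P's app versions that use R
mc_max_could_use = min(m*x, R.ninstances)
where m is P's smallest max concurrent value (over all apps).
Use this as the basis for determining the MC shortfall (it plays the role of MI(P,R) above).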
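A C++ sketch of the proposed formula. AppVersion and its usage field are hypothetical stand-ins for the client's app version data, and m is passed in rather than derived from app_configs; only x, m, mc_max_could_use and ninstances come from the notes. Usage is a double because a job may use a fraction of an instance (e.g. 0.5 GPU).

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical, simplified app version: instances of resource R used per job
// (0 if this app version doesn't use R).
struct AppVersion {
    double usage;
};

// x = max usage of R over P's app versions that use R
// m = P's smallest max_concurrent value (over the project and its apps)
// mc_max_could_use = min(m*x, R.ninstances)
double mc_max_could_use(const std::vector<AppVersion>& avs, int m, int ninstances) {
    assert(m > 0 && ninstances >= 0);
    double x = 0;
    for (const AppVersion& av : avs) {
        if (av.usage > 0) x = std::max(x, av.usage);
    }
    return std::min(m * x, static_cast<double>(ninstances));
}
```

With app versions using 1 and 4 CPUs, m = 2 and a 16-CPU host this gives 8, where project_min_mc gives 2; on a 6-CPU host it is capped at 6.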
This is an improvement, but it's still very crude: e.g. the MC limit might be on a single app, but we take the min over all apps. Also, we take the min over apps for which there might be no jobs.
But I think these approximations can only cause over-fetching, not starvation.