Job runtime estimation
The old system
Jobs have a FLOP count estimate, wu.rsc_fpops_est.
When sending an app version to a host, the scheduler estimates the app version's FLOPS. This is either the CPU benchmark or a value assigned by the app_plan() function, which is expected to predict the performance of an app on all possible hosts.
The client maintains a per-project duration correction factor (DCF), intended to measure the efficiency of the project's apps, and the systematic error in wu.rsc_fpops_est. DCF is used to scale runtime estimates on both client and server side.
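As a rough illustration of the old scheme, the sketch below assumes DCF is applied as a simple multiplicative factor on the raw estimate (FLOP count divided by estimated FLOPS). The function name and argument names are hypothetical and not taken from the BOINC source.

```python
def old_runtime_estimate(rsc_fpops_est: float, flops: float, dcf: float) -> float:
    """Estimated runtime in seconds under the old scheme (illustrative only).

    rsc_fpops_est: the job's FLOP count estimate (wu.rsc_fpops_est)
    flops:         the app version's estimated FLOPS on this host
    dcf:           the per-project duration correction factor
    """
    if flops <= 0:
        raise ValueError("flops must be positive")
    # Assumption: DCF scales the raw estimate multiplicatively.
    return dcf * rsc_fpops_est / flops
```

Because there is one DCF per project, every app in the project is scaled by the same factor, which is the first problem listed below.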
Problems with the old system:
- Projects can have lots of apps. A single DCF does not suffice.
- Projects can't be expected to predict app performance, either in wu.rsc_fpops_est or in app_plan().
The new system
Projects still have to supply wu.rsc_fpops_est.
The new system has a large overlap with the new credit system; read that document first. In particular, we now maintain:
- A host_app_version database record per (host, app version), or per (host, app, resource type) in the case of anonymous platform. This record includes the average elapsed time per wu.rsc_fpops_est.
- For each app version, a pfc_scale that approximates the efficiency of the app version relative to the most efficient version.
The app_plan() function now returns peak FLOPS, not the expected actual FLOPS.
DCF is no longer used.
In the process of selecting an app version for each job, the scheduler estimates its actual FLOPS. This is stored in BEST_APP_VERSION.HOST_USAGE.flops.
Regular case
An app version's FLOPS estimate is initially the peak FLOPS. We then look at the host_app_version record. If it exists, and there are sufficient samples, we set
estimated_flops = 1/host_app_version.et.avg
Otherwise, if app_version.pfc_scale is defined,
estimated_flops *= app_version.pfc_scale
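The regular-case logic above can be sketched as follows. This is a minimal illustration, not the scheduler's actual code: the HostAppVersion class, its et_n field, and the MIN_SAMPLES threshold are assumptions introduced here, since the text only says "sufficient samples".

```python
from dataclasses import dataclass
from typing import Optional

MIN_SAMPLES = 10  # hypothetical threshold for "sufficient samples"


@dataclass
class HostAppVersion:
    et_avg: float  # average elapsed seconds per unit of wu.rsc_fpops_est
    et_n: int      # number of samples behind et_avg (assumed field)


def estimate_flops_regular(peak_flops: float,
                           hav: Optional[HostAppVersion],
                           pfc_scale: Optional[float]) -> float:
    """Estimate an app version's actual FLOPS on a host (regular case)."""
    # Start from the peak FLOPS returned by app_plan().
    estimated_flops = peak_flops
    if hav is not None and hav.et_n >= MIN_SAMPLES and hav.et_avg > 0:
        # Enough per-host history: seconds per FLOP inverted gives FLOPS.
        return 1.0 / hav.et_avg
    if pfc_scale is not None:
        # No usable history: scale peak by the version's relative efficiency.
        estimated_flops *= pfc_scale
    return estimated_flops
```

With no host_app_version record and no pfc_scale, the estimate simply stays at the peak FLOPS.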
Anonymous platform case
If the host_app_version record exists and there are sufficient samples,
estimated_flops = 1/host_app_version.et.avg
Otherwise, we use the estimate supplied by the client. This may be specified in the app_info.xml file. If not, the current client passes the peak FLOPS.
Older clients (predating GPU support) don't pass a FLOPS estimate. In this case we use the CPU benchmark.
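The fallback order for the anonymous platform case can be sketched like this. The class, the MIN_SAMPLES threshold, and the convention that a missing client estimate is represented by None are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

MIN_SAMPLES = 10  # hypothetical threshold for "sufficient samples"


@dataclass
class HostAppVersion:
    et_avg: float  # average elapsed seconds per unit of wu.rsc_fpops_est
    et_n: int      # number of samples behind et_avg (assumed field)


def estimate_flops_anonymous(hav: Optional[HostAppVersion],
                             client_flops: Optional[float],
                             cpu_benchmark_flops: float) -> float:
    """Estimate FLOPS for an anonymous-platform app version."""
    if hav is not None and hav.et_n >= MIN_SAMPLES and hav.et_avg > 0:
        # Measured history takes priority over anything the client reports.
        return 1.0 / hav.et_avg
    if client_flops is not None:
        # Value from app_info.xml, or the peak FLOPS passed by current clients.
        return client_flops
    # Older clients send no estimate: fall back to the CPU benchmark.
    return cpu_benchmark_flops
```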
The estimated FLOPS is used to estimate job runtime on the server side.
However, the only way to change the client's runtime estimate is to adjust the wu.rsc_fpops_est that we send to the client. So, in the first case above (where the estimate comes from the host_app_version record), we scale wu.rsc_fpops_est by
(old estimated FLOPS)/(new estimated FLOPS)
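A short worked sketch shows why this scaling has the intended effect. It assumes the client computes runtime as the FLOP count it receives divided by the FLOPS value it already holds (the old estimate). The function names are hypothetical.

```python
def scaled_fpops_est(rsc_fpops_est: float, old_flops: float, new_flops: float) -> float:
    """FLOP count to send so the client's estimate matches the server's."""
    if old_flops <= 0 or new_flops <= 0:
        raise ValueError("FLOPS values must be positive")
    return rsc_fpops_est * old_flops / new_flops


def client_runtime(fpops_est: float, client_flops: float) -> float:
    """Runtime as the client would compute it (assumed: FLOPs / FLOPS)."""
    return fpops_est / client_flops


# Example: the client believes 1e10 FLOPS; history says 2.5e9 FLOPS.
sent = scaled_fpops_est(1e12, old_flops=1e10, new_flops=2.5e9)
client_view = client_runtime(sent, 1e10)   # what the client will estimate
server_view = 1e12 / 2.5e9                 # what the server estimates
```

The old FLOPS value cancels out, so in this example both client_view and server_view come to about 400 seconds.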
Implementation notes
At the start of send_work(), the scheduler enumerates all host_app_version records for this host. At the end of the request, when host_scale_time is updated, we do updates or inserts as appropriate.
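The load-then-write-back pattern described here could look roughly like the following in-memory sketch. The class and method names are hypothetical, and the real scheduler works against database records rather than Python dicts. The sketch only shows how updates are told apart from inserts at the end of a request.

```python
class HostAppVersionCache:
    """Per-request cache of one host's host_app_version records (illustrative)."""

    def __init__(self, db_rows: dict):
        # db_rows: records enumerated from the DB at the start of the request,
        # keyed by app version ID.
        self.records = dict(db_rows)
        self.existing = set(self.records)  # keys that already exist in the DB
        self.dirty = set()                 # keys modified during this request

    def lookup(self, app_version_id):
        return self.records.get(app_version_id)

    def update(self, app_version_id, record):
        self.records[app_version_id] = record
        self.dirty.add(app_version_id)

    def flush(self):
        """Return (updates, inserts): IDs to UPDATE and IDs to INSERT."""
        updates = sorted(k for k in self.dirty if k in self.existing)
        inserts = sorted(k for k in self.dirty if k not in self.existing)
        return updates, inserts
```

Records that were read but never modified appear in neither list, so they cost no write at the end of the request.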