Commit Graph

192 Commits

Author SHA1 Message Date
David Anderson 9411118774 client: fix bug where GPU jobs were not suspended
There was a bug where, when you suspend GPU activity,
GPU jobs show as suspended but are not actually suspended.
This was because of recent changes to distinguish GPU and non-GPU coprocs.
Change things so that coprocs are by default GPUs.
If you want to declare a non-GPU coproc in your cc_config.xml,
you must put <non_gpu/> in its <coproc> element.
2014-12-08 20:38:56 -08:00
David Anderson 115894f1e3 client emulator: don't crash if > 64 coproc instances specified 2014-11-24 23:07:21 -08:00
David Anderson 2b2b04188a client: "suspend GPUs" shouldn't suspend non-GPU coprocessors
The following should apply to GPUs but not other coprocs (e.g. miner ASICs):
- "suspend GPUs" command in GUI
- prefs for suspending GPUs
- always removing app from memory when suspended
2014-11-07 00:57:39 -08:00
David Anderson 74f3d25106 client: small code cleanup 2014-10-10 14:45:40 -07:00
David Anderson c2a0421074 scheduler: add support for miner_asic coprocessor type
I.e. treat miner ASICs as a distinct processor type;
send miner_asic jobs only if the client requests them.

Note: I was planning to do this in a more general way,
in which the scheduler wouldn't have a hard-wired list of processor types.
However, that would be a large code change,
so for now I just added miner_asic to the list of processor types
(nvidia, ati, intel_gpu),
and made various changes to get things to work.

Also: in the job dispatch logic, try to send coproc jobs
before CPU jobs.
That way if e.g. there's a limit on jobs in progress,
we'll preferentially send coproc jobs.
2014-09-21 21:08:09 -07:00
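The dispatch-order change described above can be illustrated with a minimal sketch. It is not the scheduler's actual code: `Job`, `uses_coproc`, and `pick_jobs` are hypothetical names, and the jobs-in-progress limit is modeled as a plain count.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical job record; the real scheduler's structures differ.
struct Job {
    std::string name;
    bool uses_coproc;  // true for GPU / miner ASIC jobs
};

// Choose up to `limit` jobs, considering coprocessor jobs before CPU jobs.
// stable_partition keeps the original order within each group.
std::vector<Job> pick_jobs(std::vector<Job> candidates, std::size_t limit) {
    std::stable_partition(candidates.begin(), candidates.end(),
                          [](const Job& j) { return j.uses_coproc; });
    if (candidates.size() > limit) {
        candidates.resize(limit);  // the limit cuts off CPU jobs first
    }
    return candidates;
}
```

With a limit of 3 and candidates cpu1, gpu1, cpu2, asic1 (in that order), this sketch selects gpu1, asic1, and cpu1, so the limit is consumed by coprocessor jobs first.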
David Anderson 89b51ea43d scheduler: preliminary support for generic coprocessors
A "generic" coprocessor is one that's reported by the client,
but is not of a type that the scheduler knows about (NVIDIA, AMD, Intel).

With this commit the following works:
- On the client, define a <coproc> in your cc_config.xml
  with a custom name, say 'miner_asic'.
- define a plan class such as
  <plan_class>
    <name>foobar</name>
    <gpu_type>miner_asic</gpu_type>
    <cpu_frac>0.5</cpu_frac>
  </plan_class>
- App versions of this plan class will be sent only to hosts
  that report a coproc of type "miner_asic".
  The <app_version>s in the scheduler reply will include
  a <coproc> element with the given name and count=1.
  This will cause the client (at least the current client)
  to run only one of these jobs at a time,
  and to schedule the CPU appropriately.

Note: there's a lot missing from this:
- app version FLOPS will be those of a CPU app;
- jobs will be sent only if CPU work is requested
... and many other things.
Fixing these issues requires a significant re-architecture of the scheduler,
in particular getting rid of the PROC_TYPE_* constants
and the associated arrays,
which hard-wire the 3 fixed GPU types.
2014-07-25 12:40:35 -07:00
Charlie Fenton 512e8e2cfe client: continue adding support for OpenCL devices (GPUs and accelerators) other than AMD/ATI, NVIDIA or Intel GPUs.
For now, handle AMD/ATI, NVIDIA or Intel GPUs as before.  But for other, "new" vendors, we treat each device as a separate resource, creating an entry for each instance in the COPROCS::coprocs[] array and copying the device name COPROC::opencl_prop.name into the COPROC::type field (instead of the vendor name.)
For devices from "new" vendors, set <gpu_type> field in init_data.xml file to the vendor string supplied by OpenCL.  This should allow boinc_get_opencl_ids() to work correctly with these "new" devices without modification.
2014-07-23 05:18:51 -07:00
David Anderson 035541f7a7 scheduler: recent coproc changes caused crash; undo them 2014-07-22 09:13:29 -07:00
Charlie Fenton 7fb69fe924 client: generalize naming scheme for OpenCL devices, add more general have_rsrc() functions 2014-07-17 02:22:26 -07:00
Charlie Fenton b37cf4cd9a client: begin adding support for OpenCL devices other than AMD/ATI, NVIDIA or Intel 2014-07-16 04:33:26 -07:00
David Anderson ac9e2b088d client emulator: make it work again 2014-05-21 10:41:55 -07:00
Charlie Fenton 9895066353 client: fix to commit 6b1a073 (don't try to run OpenCL jobs on non-OpenCL GPUs)
For unknown reasons, testing opencl_device_ids[i] works only for debug builds, so add a new array bool have_opencls[] to COPROC struct in which we record which devices are OpenCL-capable before we clear the ati_opencls and nvidia_opencls vectors.
2014-05-14 03:40:58 -07:00
David Anderson d6da81b862 client: fix bugs with CPU throttling and GPU apps
Various bad things could happen when CPU throttling was used together w/ GPU apps.
Examples:
- on a multi-GPU system, several GPU tasks are assigned to the same GPU
- a suspended GPU task remains in memory (tying up its GPU resources)
while other tasks try to use the GPU.

The problem was that parts of the code assumed that suspended
GPU processes don't exist - i.e. that when a GPU task is suspended
it's always removed from memory.
This isn't true in the presence of CPU throttling.

So I made the following changes:
- When assigning GPUs to tasks, treat suspended tasks like running tasks
  (i.e. reserve their GPUs)
- At the end of the CPU-scheduling logic, if there are any GPU tasks
  that are suspended and not scheduled, remove them from memory,
  and trigger a reschedule so we can reallocate their GPUs.

Also, a cosmetic change: in the resource usage string shown in the GUI,
include "(device X)" even if the task is suspended (i.e. because of throttling).

Also: zero out COPROC::opencl_device_indexes[] so we don't write
a garbage number to init_data.xml for non-OpenCL jobs
2013-11-29 11:44:09 -08:00
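The first change listed above (treating suspended tasks like running tasks when assigning GPUs) might look roughly like the following sketch. The types and the function `free_devices` are hypothetical and stand in for the client's real data structures.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical task states; the client's real state machine is richer.
enum class TaskState { Running, Suspended, NotStarted };

struct GpuTask {
    TaskState state;
    int device;  // index of the GPU the task holds, or -1 if none
};

// Report which devices are free. A suspended task may still be in memory
// (e.g. because of CPU throttling), so it reserves its GPU just as a
// running task does.
std::vector<bool> free_devices(const std::vector<GpuTask>& tasks,
                               std::size_t ndevices) {
    std::vector<bool> is_free(ndevices, true);
    for (const GpuTask& t : tasks) {
        const bool holds_gpu = t.state == TaskState::Running ||
                               t.state == TaskState::Suspended;
        if (holds_gpu && t.device >= 0 &&
            static_cast<std::size_t>(t.device) < ndevices) {
            is_free[static_cast<std::size_t>(t.device)] = false;
        }
    }
    return is_free;
}
```

Under this model, a new GPU task is never placed on a device that a throttled (suspended) task still occupies, which is the multi-GPU collision the commit message describes.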
David Anderson feb2f1971d scheduler: fix bug that prevented Intel GPU work from being sent to anonymous platform clients 2013-11-21 22:31:15 -08:00
David Anderson c1ee47216b Move OpenCL-related code to a separate file 2013-08-25 14:13:14 -07:00
Charlie Fenton eb15b04d4a client: implement support for OpenCL detection of CPUs
Notes:
- The same CPU can have a different cpu_opencl_prop for each of multiple OpenCL platforms.  We send them all to the project server because:
   - Different OpenCL platforms report different values for the same CPU.
   - Some OpenCL CPU apps may work better with certain OpenCL platforms.
- OpenCL has only 64 bits for global_mem_size, so it can report a max of only 4GB; get the CPU RAM size from gstate.hostinfo.m_nbytes.
2013-08-22 05:06:54 -07:00
Charlie Fenton 631e236b08 client: tweaks to code for detecting GPUs via a child process.
Added safety features requested by Rom Walton:
* Change COPROC_ATI::get_available_ram and COPROC_NVIDIA::get_available_ram to static routines to prevent calling them without first loading CAL or CUDA libraries.
* Add tests for NULL library calls in these routines.
* Add comments warning about need to call from a separate child process on dual-GPU laptops, proper library initialization, etc.
2013-06-27 02:36:20 -07:00
Charlie Fenton 4d74c5abbd client: tweaks to code for detecting GPUs via a child process and change sprintf calls to safer snprintf. 2013-06-26 05:00:25 -07:00
Charlie Fenton 737ab61bce client: tweaks to code for detecting GPUs via a child process and change sprintf calls to safer snprintf. 2013-06-26 02:24:36 -07:00
Charlie Fenton e2b2370e9d client: optionally detect GPUs via a child process, for dual-GPU laptops.
Some dual-GPU laptops (e.g., MacBook Pro) don't power down the more powerful GPU until all applications that used it have exited. To save battery life, the client launches a second instance of the client as a child process to detect and get info about the GPUs.
The child process writes the info to a temp file which our main client then reads.
This option is enabled at compile time by defining USE_CHILD_PROCESS_TO_DETECT_GPUS as non-zero in gpu_detect.cpp
2013-06-25 04:31:34 -07:00
David Anderson c6d79d1172 client: fix bug that could cause client to never contact project
if a project sends us <no_rsc_apps> flags for all processor types,
then by default the client will never do a scheduler RPC to that project again.
This could happen because of a transient condition in the project,
e.g. it deprecates all its app versions for a while.

To avoid this situation, the client now checks whether the no_rsc_apps flags
are set for all processor types.
If they are, it clears them all.
This will cause work fetch to use backoff,
and the client will occasionally contact the project.
2013-05-17 10:25:03 -07:00
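The check described above can be sketched as follows. This is an illustration only: `ProjectFlags`, `kNumProcTypes`, and `clear_if_all_set` are hypothetical names, and the number of processor types is an assumed value.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>

// Assumed number of processor types (CPU plus GPU types); illustrative only.
constexpr std::size_t kNumProcTypes = 4;

// Hypothetical stand-in for the per-project state kept by the client.
struct ProjectFlags {
    // no_rsc_apps[i] is true if the project reported no apps for type i.
    std::array<bool, kNumProcTypes> no_rsc_apps{};
};

// If the flag is set for every processor type, clear all of them so that
// work fetch falls back to backoff and still contacts the project
// occasionally. Returns true if the flags were cleared.
bool clear_if_all_set(ProjectFlags& p) {
    const bool all_set =
        std::all_of(p.no_rsc_apps.begin(), p.no_rsc_apps.end(),
                    [](bool b) { return b; });
    if (all_set) {
        p.no_rsc_apps.fill(false);
    }
    return all_set;
}
```

If only some flags are set, the function leaves them untouched, so the client keeps skipping just those processor types.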
Eric J Korpela 24353261c4 Fixed problems with FCGI compiles introduced in recent checkins. 2013-04-25 12:46:22 -07:00
David Anderson 35390ef974 - client: add support for CPU OpenCL apps.
Add OPENCL_DEVICE_PROP cpu_opencl_prop to HOST_INFO;
    this stores info about the host's ability to run CPU OpenCL apps.
    Detect this, and report it in scheduler requests.
2013-04-16 22:42:29 -07:00
Eric J Korpela 61609281c1 - added opencl_driver_revision to OPENCL_DEVICE_PROP and PLAN_CLASS_SPEC. This
was necessary because ATI is releasing OpenCL drivers that don't work.
2013-04-10 18:20:22 -07:00
David Anderson b93e80c6f5 - client: code cleanup. Some variable/function/constant names
contained "debt" when they actually refer to REC.
    Change these names to use "rec".
2013-03-24 11:22:01 -07:00
David Anderson 13925d7c7b - compile fixes. Fixes #1219 2013-03-04 17:39:24 +01:00
Charlie Fenton 60f750e395 client: COPROC_NVIDIA, COPROC_ATI and COPROC_INTEL constructors must initialize the entire struct, not just the GPU type field 2013-03-04 17:01:36 +01:00
Charlie Fenton bc69fe301d Restore changes lost due to GIT confusion 2013-03-04 17:01:36 +01:00
Charlie Fenton 6d6403545a client: clean up redundant and confusing GPU descriptions 2013-03-04 16:42:16 +01:00
Oliver Bock 508b9b572b Merge branch 'master' of ssh://boinc.berkeley.edu/boinc
Conflicts:
	checkin_notes
	client/acct_mgr.cpp
	client/cs_statefile.cpp
	client/gpu_opencl.cpp
	lib/coproc.cpp

Additional changes:
	client/Makefile.am

Dropped changes:
	client/cs_scheduler.cpp (516eff6)
	sched/sched_send.cpp (2dd8288)
2013-03-04 16:35:08 +01:00
Rom Walton 516eff60b0 - client: Hook up the XML portion of the Intel GPU detection code so
the server scheduler knows about it.
    - client: Print out the peak flops for the Intel GPU, the regular
        OpenCL descriptions do not show peak flops.
2013-03-04 15:30:03 +01:00
Charlie Fenton ce87ec9848 OpenCL: First pass at adding support for Intel Ivy Bridge GPUs 2013-03-04 15:23:39 +01:00
Charlie Fenton 58d41b142f OpenCL: Add definition of GPU_TYPE_INTEL to match definitions of GPU_TYPE_ATI and GPU_TYPE_NVIDIA 2013-03-04 15:23:38 +01:00
David Anderson 777f1f11e8 - client: change work fetch policy to avoid starving GPUs in situations where GPU exclusions are used.
- client: fix bug in round-robin simulation when GPU exclusions are used.
Note: this fixes a major problem (starvation)
    with project-level GPU exclusion.
    However, project-level GPU exclusion interferes with most of
    the client's scheduling policies.
    E.g., round-robin simulation doesn't take GPU exclusion into account,
    and the resulting completion estimates and device shortfalls
    can be wrong by an order of magnitude.

    The only way I can see to fix this would be to model each
    GPU instance as a separate resource,
    and to associate each job with a particular GPU instance.
    This would be a sweeping change in both client and server.
2013-03-01 15:31:41 +01:00
Charlie Fenton eb40422c34 client: If OpenCL detection gets an error for a platform or device, finish detection of the remaining platforms and / or devices
svn path=/trunk/boinc/; revision=26047
2012-08-20 10:04:19 +00:00
David Anderson d085bc4ee6 - Client/manager: there was a bug because some code was writing
"cpu" in XML, and other code was looking for "CPU".
    To fix this and prevent similar problems,
    processor type names are now encapsulated in proc_type_name_xml().
    Code should use this rather than having hard-wired names.
    Redefine: GPU_TYPE_* as macros that call proc_type_name_xml().


svn path=/trunk/boinc/; revision=25996
2012-08-08 23:09:43 +00:00
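The encapsulation idea above (one function as the only source of processor type names, with macros that call it) can be sketched like this. The enum, the `_sketch` function name, the macro name, and the exact strings are assumptions for illustration, not the definitions in the BOINC source.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical processor type identifiers.
enum ProcType { PROC_CPU, PROC_NVIDIA, PROC_ATI };

// Single place where XML names are spelled, so writers and parsers cannot
// disagree on case ("cpu" vs. "CPU"). The strings here are illustrative.
inline const char* proc_type_name_xml_sketch(ProcType t) {
    switch (t) {
        case PROC_CPU:    return "CPU";
        case PROC_NVIDIA: return "NVIDIA";
        case PROC_ATI:    return "ATI";
    }
    return "unknown";
}

// A macro in the style the commit describes: it expands to a call rather
// than to a hard-wired string literal.
#define SKETCH_GPU_TYPE_NVIDIA proc_type_name_xml_sketch(PROC_NVIDIA)
```

Code that writes XML and code that parses it would both go through the function (or a macro that wraps it), so a spelling change happens in one place.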
David Anderson 2e71ade9c5 Win compile fixes
svn path=/trunk/boinc/; revision=25937
2012-08-01 21:02:54 +00:00
David Anderson 68f9880615 - client: remove "device" entry from CUDA_DEVICE_PROP,
and change types of mem-size fields from int to double.
    These fields are size_t in NVIDIA's version of this;
    however, cuDeviceGetAttribute() returns them as int,
    so I don't see where this makes any difference.
- client: fix bug in handling of <no_rsc_apps> element.
- scheduler: message tweaks.
    Note: [foo] means that the message is enabled by <debug_foo>.



svn path=/trunk/boinc/; revision=25849
2012-07-05 20:24:17 +00:00
David Anderson 8d8662adb2 - more code cleanup
svn path=/trunk/boinc/; revision=25838
2012-07-02 19:31:34 +00:00
David Anderson 8c71f6d59a - scheduler: add support for Intel GPUs, and restructure things
to make it easier to add other GPU types in the future


svn path=/trunk/boinc/; revision=25792
2012-06-25 23:09:45 +00:00
David Anderson fd0983b991 - web: server status page should show elapsed time, not CPU time
svn path=/trunk/boinc/; revision=25785
2012-06-22 07:35:54 +00:00
David Anderson b050deecf7 - client: compile fixes
svn path=/trunk/boinc/; revision=25773
2012-06-18 20:41:37 +00:00
David Anderson 5e61c29cc3 - client: split GPU detection code into separate files
svn path=/trunk/boinc/; revision=25771
2012-06-18 20:12:30 +00:00
David Anderson 0cc0370f02 - client, GUI RPC: detect and export the PCI bus, device, and domain #s.
- scheduler: increase #GPU limit from 8 to 64


svn path=/trunk/boinc/; revision=25761
2012-06-15 20:49:11 +00:00
David Anderson 759c23ed27 - server: create a harness for testing validator code.
If you link your functions (init_result(), compare_results(),
    cleanup_result()) with validate_test.cpp,
    you'll get a program that you can run as
        validate_test file1 file2
    and it will compare the two files
    (this works only for validators that expect 1 file per result).

    I added a makefile, sched/makefile_validator_test,
    that you can use for this.
- server: shuffle code so that the above doesn't need to
    link MySQL libraries
- client: if we fetch a master file and it contains no scheduler URLs,
    show a message of class INTERNAL_ERROR
- client/scheduler: make CUDA_DEVICE_PROP.totalGlobalMem a double,
    and remove dtotalGlobalMem.
    Although NVIDIA reports RAM size as a size_t,
    there's no reason to store it as an integer after that.


svn path=/trunk/boinc/; revision=25542
2012-04-10 00:32:35 +00:00
David Anderson 36529da919 - client: change some unsigned int to size_t in our versions
of NVIDIA APIs.  This apparently caused crashes
    (in app, not client, which I don't understand) for Einstein@Home.
    From Steffen Moller.


svn path=/trunk/boinc/; revision=25527
2012-04-02 21:31:02 +00:00
David Anderson 82d64e9403 - msg tweak and fix compile warnings
svn path=/trunk/boinc/; revision=25408
2012-03-12 23:34:41 +00:00
Charlie Fenton 6688c21c11 client: On Mac only, get ATI RAM sizes from OpenGL
svn path=/trunk/boinc/; revision=25358
2012-03-01 02:35:45 +00:00
Charlie Fenton 7ee9f28e54 client: Always use GPU model name from OpenCL if available for ATI / AMD GPUs
svn path=/trunk/boinc/; revision=25275
2012-02-17 00:10:36 +00:00
Charlie Fenton 65b5930423 client: don't defer scheduling a task based on insufficient GPU RAM
svn path=/trunk/boinc/; revision=25166
2012-01-30 10:09:44 +00:00