Checking on supported extensions. #19

Friedrich2 · 2024-12-20T00:45:37Z

This is a follow-up to fjarri/reikna#73, with an idea for how to approach the problem.
The goal there was a more detailed error message for newcomers like me when the kernel source code that is generated internally uses types which the GPU does not support.

The main idea is a heuristic: deliver better error messages in the common use cases (mostly a single existing context) without changing the behaviour in the other use cases. Whenever the more detailed error message is given, it should be correct, but it is not guaranteed to appear in every possible problematic situation. I tried to detect a missing 64-bit extension as well as a missing half-float extension.

Imagine the function check_extensions() being called here. I simulated a dangerous type by setting dtype = "float2". The function checks whether every context object, or rather every one of its associated devices, lacks support for 64-bit or 16-bit floating point numbers. To "find" the contexts, I iterated over all variables via the Python equivalent of Java reflection: globals() and locals().
As a basis for discussion, I am posting a rough sketch in this issue. Do you have any critique, tips, or ideas for improvement? E.g.:

Is the heuristic main idea OK for you, or is it not appropriate? Alternatively, instead of checking that the extension is missing "for all" devices, we could check whether "there is a device with a missing extension", reducing the error message to a warning (which becomes moot if the user later chooses an appropriate device that has the extensions the source code needs).
You mentioned in the reikna issue that "it is a little tricky to determine if the device supports double precision in OpenCL". What risks did you have in mind regarding this part of the OpenCL API standard text (which seems relatively clear to me)? Since we do not have to give the error message in every case, we can specify conditions / restrictions. Do we have to check whether OpenCL 3.0 is used? And, out of pure interest: is it theoretically possible that the extensions are absent but the types are nevertheless supported through a software-side definition by the user (leading to a false positive in this presumably improbable case)?
I have not yet checked whether dtype in ctype() is float2 or complex128 or the like -- it's too late in the evening now :-D.
Did I miss any cases in the if conditions?
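
The two policies from the first question ("for all" devices versus "there is a device") differ only in how the per-device results are combined. The following is a minimal sketch of that difference, assuming the extension strings have already been read from the devices (for example with device.get_info(cl.device_info.EXTENSIONS)). The helper names has_extension and classify_missing_extension, as well as the sample extension strings, are hypothetical and only serve the illustration.

```python
def has_extension(extensions: str, needed: str) -> bool:
    # Extension strings are space separated; comparing whole tokens avoids
    # accidental substring matches and stray-whitespace mistakes.
    return needed in extensions.split()


def classify_missing_extension(device_extensions: list, needed: str) -> str:
    """Return "ok", "warning" or "error" for one needed extension.

    "error":   every known device lacks the extension ("for all").
    "warning": at least one device lacks it ("there is a device"), but
               another one has it, so the user may still pick that one.
    "ok":      no device lacks it, or there are no devices to check.
    """
    missing = [not has_extension(ext, needed) for ext in device_extensions]
    if missing and all(missing):
        return "error"
    if any(missing):
        return "warning"
    return "ok"


# Made-up extension strings, for illustration only.
without_fp64 = "cl_khr_byte_addressable_store cl_khr_fp16"
with_fp64 = "cl_khr_fp64 cl_khr_byte_addressable_store"

print(classify_missing_extension([without_fp64], "cl_khr_fp64"))             # error
print(classify_missing_extension([without_fp64, with_fp64], "cl_khr_fp64"))  # warning
print(classify_missing_extension([with_fp64], "cl_khr_fp64"))                # ok
```

Returning a label instead of raising directly leaves it to the caller whether "warning" is reported via warnings.warn() or ignored, which keeps the behaviour unchanged for the use cases the heuristic does not cover.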

#!/usr/bin/env python

import numpy as np

import pyopencl as cl
import os

os.environ["PYOPENCL_COMPILER_OUTPUT"] = "1"


rng = np.random.default_rng()
a_np = rng.random(50000, dtype=np.float32)
b_np = rng.random(50000, dtype=np.float32)

ctx = cl.create_some_context(interactive=False)
queue = cl.CommandQueue(ctx)


dtype = "float2"   # or complex64 or so???



# Returns the error message string for a missing extension needed internally.
def extension_error_message(needed_extension: str, goal: str,
                            type_name: str, var_names: dict) -> str:
    if len(var_names) == 0:
        checked_variables = "(none)"
    else:
        checked_variables = ", ".join(str(var_name) for var_name in var_names)

    return "For every context created up to now the OpenCL extension " \
        + needed_extension + " (concerning the use of " + goal + \
        ") is missing, but internally, the type " \
        + type_name + " shall be used, which needs this extension.\n" \
        "The contexts which were checked are: " + checked_variables


# Checks whether the internally built kernel source code uses types which
# are not supported by any device of any known context (regarding 64-bit
# and 16-bit floating point numbers).
def check_extensions() -> None:
    # Find the contexts by variable name. Note that locals() evaluated here
    # only sees this function's own variables (none yet), so in practice
    # the contexts come from globals().
    candidates = {**globals(), **locals()}
    contexts = {name: value for name, value in candidates.items()
                if isinstance(value, cl.Context)}

    # One extension string per device, over all contexts found.
    device_extensions = [
        device.get_info(cl.device_info.EXTENSIONS)
        for context in contexts.values()
        for device in context.get_info(cl.context_info.DEVICES)]
    if not device_extensions:
        return  # nothing to check: keep the behaviour unchanged

    # (triggering type names, needed extension, goal). The type names are
    # placeholders from this sketch; see the open question on ctype() above.
    checks = [
        (("float2", "double2"), "cl_khr_fp64",
         "64-bit floating point numbers"),
        (("float16", "double16"), "cl_khr_fp16",
         "16-bit floating point numbers"),
    ]
    for type_names, needed_extension, goal in checks:
        # Complain only if the extension is missing on every device; split()
        # compares whole extension names.
        if dtype in type_names and all(
                needed_extension not in extensions.split()
                for extensions in device_extensions):
            raise ValueError(extension_error_message(needed_extension,
                                                     goal, dtype,
                                                     contexts))


check_extensions()

mf = cl.mem_flags
a_g = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a_np)
b_g = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=b_np)

prg = cl.Program(ctx, """
    #if defined(cl_khr_fp64)
    #pragma OPENCL EXTENSION cl_khr_fp64: enable
    #elif defined(cl_amd_fp64)
    #pragma OPENCL EXTENSION cl_amd_fp64: enable
    #else
    #error "Double precision not supported."
    #endif
__kernel void sum(
    __global const float *a_g, __global const float *b_g, __global float *res_g)
{
  int gid = get_global_id(0);
  if (gid < 16)
      printf("hallkdiekdikkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkko     ");
  res_g[gid] = a_g[gid] + b_g[gid];
}
""").build()

res_g = cl.Buffer(ctx, mf.WRITE_ONLY, a_np.nbytes)
knl = prg.sum  # Use this Kernel object for repeated calls
knl(queue, a_np.shape, None, a_g, b_g, res_g)

res_np = np.empty_like(a_np)
cl.enqueue_copy(queue, res_np, res_g)

# Check on CPU with Numpy:
error_np = res_np - (a_np + b_np)
print(f"Error:\n{error_np}")
print(f"Norm: {np.linalg.norm(error_np):.16e}")
assert np.allclose(res_np, a_np + b_np)

The output on my computer (which has one Intel GPU as its sole OpenCL device) is currently:


Traceback (most recent call last):
  File "C:\Users\johndoe\Desktop\opencldemo.py", line 85, in <module>
    check_extensions()
  File "C:\Users\johndoe\Desktop\opencldemo.py", line 73, in check_extensions
    raise ValueError(extension_error_message(needed_extension,
ValueError: For every context created up to now the OpenCL extension cl_khr_fp64 (concerning the use of 64-bit floating point numbers) is missing, but internally, the type float2 shall be used, which needs this extension.
The contexts which where checked are: ctx
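
One detail of the reflection approach is worth spelling out: locals() called inside check_extensions() returns that function's own local variables, not those of the caller. The following is a minimal sketch of how the caller's variables could be reached through the standard inspect module instead. The helper find_instances and the stand-in class FakeContext are hypothetical; in the script above the class would be cl.Context. Frame inspection is a CPython-oriented technique, so this is a sketch under that assumption rather than a portable solution.

```python
import inspect


def find_instances(cls: type) -> dict:
    """Map variable names to instances of cls visible to the caller."""
    frame = inspect.currentframe()
    caller = frame.f_back if frame is not None else None
    if caller is None:
        return {}
    try:
        # The caller's locals take precedence over its globals on name clashes.
        candidates = dict(caller.f_globals)
        candidates.update(caller.f_locals)
        return {name: value for name, value in candidates.items()
                if isinstance(value, cls)}
    finally:
        del frame, caller  # avoid reference cycles through frame objects


class FakeContext:
    """Stand-in for cl.Context so the sketch runs without an OpenCL device."""


def user_code():
    local_ctx = FakeContext()
    return find_instances(FakeContext)


global_ctx = FakeContext()
print(sorted(user_code()))  # ['global_ctx', 'local_ctx']
```

Contexts stored in containers, object attributes, or frames further up the call stack are still not found this way, which fits the heuristic nature of the check.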

Friedrich2 mentioned this issue Dec 20, 2024

Type problem doing an FFT using an Intel graphic card. fjarri/reikna#73

Open
