This is a follow-up issue of fjarri/reikna#73 with a potential idea on how to approach the problem.
The issue was to get a more detailed error message for newbies like me if the program uses types internally within the kernel source code which are not supported by the GPU.
The main idea is a heuristic: deliver better error messages in the common use cases (mostly a single existing context) and leave the behaviour unchanged in all other cases. Whenever the more detailed error message is given, it should be correct, but it is not guaranteed to appear in every problematic situation. I tried to recognize a missing 64-bit floating point extension as well as a missing half-float extension.
Imagine the function check_extensions() being called here. I simulated a dangerous type by setting dtype = "float2". The function checks whether every context object, or rather every one of its associated devices, lacks support for 64-bit or 16-bit floating point numbers, respectively. To "find" the contexts, I iterated over all variables via the Python equivalent of Java reflection: globals() and locals().
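To make the lookup via globals() and locals() concrete, here is a minimal sketch that runs without OpenCL. FakeContext and find_instances are hypothetical names used only for illustration; FakeContext stands in for cl.Context. It also shows a limitation of the approach: locals() evaluated inside a helper function only sees that function's own scope, so in practice only module-level contexts are found.

```python
class FakeContext:
    """Hypothetical stand-in for pyopencl.Context (assumption, not the real class)."""

    def __init__(self, label):
        self.label = label


def find_instances(namespace, cls):
    """Return {variable_name: value} for every value in namespace that is a cls instance."""
    return {name: value for name, value in namespace.items()
            if isinstance(value, cls)}


ctx = FakeContext("gpu")   # a module-level "context"
unrelated = 42             # ignored by the lookup


def lookup_from_function():
    # locals() here is this function's own (empty) scope, not the caller's,
    # so it cannot contain any context created elsewhere.
    in_locals = find_instances(locals(), FakeContext)
    # globals() is the namespace of the module in which this function is defined.
    in_globals = find_instances(globals(), FakeContext)
    return in_locals, in_globals


found_locals, found_globals = lookup_from_function()
print("found via locals(): ", sorted(found_locals))
print("found via globals():", sorted(found_globals))
```

Contexts held in another function's local variables, in object attributes, or in containers stay invisible to this lookup, which fits the heuristic nature of the idea.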
As a basis for discussion, I am posting a rough sketch in this issue. Do you have any critiques, tips, or ideas for improvements? For example:
Is the heuristic main idea OK for you, or is it not appropriate? Alternatively, instead of checking whether the extension is missing "for all" devices, we could check whether "there is a device with a missing extension" and reduce the error message to a warning (which would be moot if the user later chooses an appropriate device that has the extensions the source code needs).
You mentioned in the reikna issue that "it is a little tricky to determine if the device supports double precision in OpenCL". What risks did you have in mind regarding this part of the OpenCL API standard text (which seems relatively clear to me)? Since we do not have to give the error message in every case, we can specify conditions / restrictions. Do we have to check whether OpenCL 3.0 is used? And, out of pure interest: is it theoretically possible that the extensions are not there, but that the types are supported through a software-side definition by the user (leading to a false positive in this presumably improbable case)?
I have not yet checked whether dtype in ctype() is float2 or complex128 or similar; it is too late in the evening now :-D.
Did I forget any cases in the if conditions?
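The "for all devices" versus "there is a device" alternative could be sketched as below, again without requiring OpenCL. The names REQUIRED_EXTENSION, required_extension and classify are hypothetical, and the device extension strings are made-up inputs. The sketch assumes that only the OpenCL C base types double and half depend on cl_khr_fp64 and cl_khr_fp16. A vector type such as float2 is two 32-bit floats and would then need neither extension, so it differs from the simulated float2 case in this issue.

```python
# Assumed mapping from OpenCL C base type to the extension it depends on.
REQUIRED_EXTENSION = {"double": "cl_khr_fp64", "half": "cl_khr_fp16"}


def required_extension(type_name):
    """Strip a vector width suffix (double2 -> double) and look up the extension."""
    base = type_name.rstrip("0123456789")
    return REQUIRED_EXTENSION.get(base)


def classify(type_name, device_extensions):
    """Classify a type against devices given as {label: extension string}.

    Returns "error" if every device lacks the needed extension,
    "warning" if only some devices lack it, and "ok" otherwise.
    """
    needed = required_extension(type_name)
    if needed is None:
        return "ok"
    # Compare whole tokens, so a longer extension name cannot match by substring.
    missing = [label for label, extensions in device_extensions.items()
               if needed not in extensions.split()]
    if not missing:
        return "ok"
    if len(missing) == len(device_extensions):
        return "error"
    return "warning"


devices = {
    "integrated_gpu": "cl_khr_icd cl_khr_fp16",
    "discrete_gpu": "cl_khr_icd cl_khr_fp64",
}
for name in ("float2", "double2", "half"):
    print(name, "->", classify(name, devices))
```

With this split, a hard error is raised only when no known device can compile the kernel, and the mixed case stays a warning.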
#!/usr/bin/env python
import numpy as np
import pyopencl as cl
import os

os.environ["PYOPENCL_COMPILER_OUTPUT"] = "1"

rng = np.random.default_rng()
a_np = rng.random(50000, dtype=np.float32)
b_np = rng.random(50000, dtype=np.float32)

ctx = cl.create_some_context(interactive=False)
queue = cl.CommandQueue(ctx)

# NOTE: "float2" only simulates a problematic type for this sketch.
dtype = "float2"  # or complex64 or so???


# Returns the error message string for a missing extension needed internally.
def extension_error_message(needed_extension: str, goal: str,
                            type_name: str, var_names: set) -> str:
    if len(var_names) == 0:
        checked_variables = "(none)"
    else:
        checked_variables = ", ".join(str(var_name) for var_name in var_names)
    return ("For every context created up to now the OpenCL extension "
            + needed_extension + " (concerning the use of " + goal
            + ") is missing, but internally, the type "
            + type_name + " shall be used, which needs this extension.\n"
            "The contexts which where checked are: " + checked_variables)


# Checks if the internal built kernel source code uses types which are
# not supported by any device (regarding 64-bit and 16-bit floating point
# numbers).
def check_extensions() -> None:
    # Collect every cl.Context reachable through a variable name.
    var_global = [value for name, value in globals().items()
                  if isinstance(value, cl.Context)]
    var_global_names = [name for name, value in globals().items()
                        if isinstance(value, cl.Context)]
    # locals() only sees this function's own scope, not the caller's.
    var_local = [value for name, value in locals().items()
                 if isinstance(value, cl.Context)]
    var_local_names = [name for name, value in locals().items()
                       if isinstance(value, cl.Context)]
    context_variables = set(var_global).union(var_local)
    context_var_names = set(var_global_names).union(var_local_names)

    fp64ok = True
    fp16ok = True
    for variable in context_variables:
        devices = variable.get_info(cl.context_info.DEVICES)
        for device in devices:
            extensions = device.get_info(cl.device_info.EXTENSIONS)
            if (dtype == "float2" or dtype == "double2") \
                    and "cl_khr_fp64" not in extensions:
                fp64ok = False
                break
            # The original compared against "cl_khr_fp16 " (trailing space),
            # which fails to match when the name is last in the string.
            elif (dtype == "float16" or dtype == "double16") \
                    and "cl_khr_fp16" not in extensions:
                fp16ok = False
                break

    if not fp64ok:
        needed_extension = "cl_khr_fp64"
        goal = "64-bit floating point numbers"
        raise ValueError(extension_error_message(needed_extension,
                                                 goal, dtype,
                                                 context_var_names))
    if not fp16ok:
        needed_extension = "cl_khr_fp16"
        goal = "16-bit floating point numbers"
        raise ValueError(extension_error_message(needed_extension,
                                                 goal, dtype,
                                                 context_var_names))


check_extensions()

mf = cl.mem_flags
a_g = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a_np)
b_g = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=b_np)

prg = cl.Program(ctx, """
    #if defined(cl_khr_fp64)
    #pragma OPENCL EXTENSION cl_khr_fp64: enable
    #elif defined(cl_amd_fp64)
    #pragma OPENCL EXTENSION cl_amd_fp64: enable
    #else
    #error "Double precision not supported."
    #endif
    __kernel void sum(
        __global const float *a_g, __global const float *b_g, __global float *res_g)
    {
        int gid = get_global_id(0);
        if (gid < 16)
            printf("hallkdiekdikkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkko ");
        res_g[gid] = a_g[gid] + b_g[gid];
    }
    """).build()

res_g = cl.Buffer(ctx, mf.WRITE_ONLY, a_np.nbytes)
knl = prg.sum  # Use this Kernel object for repeated calls
knl(queue, a_np.shape, None, a_g, b_g, res_g)

res_np = np.empty_like(a_np)
cl.enqueue_copy(queue, res_np, res_g)

# Check on CPU with Numpy:
error_np = res_np - (a_np + b_np)
print(f"Error:\n{error_np}")
print(f"Norm: {np.linalg.norm(error_np):.16e}")
assert np.allclose(res_np, a_np + b_np)
The output for my computer (having one Intel GPU as sole OpenCL device) is currently:
Traceback (most recent call last):
  File "C:\Users\johndoe\Desktop\opencldemo.py", line 85, in <module>
    check_extensions()
  File "C:\Users\johndoe\Desktop\opencldemo.py", line 73, in check_extensions
    raise ValueError(extension_error_message(needed_extension,
ValueError: For every context created up to now the OpenCL extension cl_khr_fp64 (concerning the use of 64-bit floating point numbers) is missing, but internally, the type float2 shall be used, which needs this extension.
The contexts which where checked are: ctx