Skip to content

Use experimental complex extension for all complex arithmetic #2069

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

ndgrigorian
Copy link
Collaborator

@ndgrigorian ndgrigorian commented May 1, 2025

With current nightly compiler, dpctl will fail to link when built for HIP backend due to undefined symbols (i.e., __muldc3).

Since compiler does not support arithmetic with std::complex on HIP backend, this PR takes the steps to refactor complex arithmetic throughout tensor library

  • libtensor/include/kernels/elementwise_functions/sycl_complex.hpp is now moved to libtensor/include/utils/sycl_complex.hpp and refactored to defined sycl_complex_t<T>, aliasing the complex type defined in the extension
  • sycl_complex.hpp still indirectly sets SYCL_EXT_ONEAPI_COMPLEX and includes the header for experimental extension, but no longer defines exprm_ns namespace alias. This is now left to individual files.
  • All uses of std::real, std::imag, etc. have been removed
  • sycl_utils.hpp now defines new custom functors a la Maximum and Minimum, Plus and Multiplies. These structs are used by accumulation operations (sum, cumulative_sum, GEMM, etc.). They perform casting of std::complex inputs to SYCL equivalent, perform operations, and then return as std::complex
  • Several element-wise functions are updated to properly perform operations in SYCL complex type
  • Have you provided a meaningful PR description?
  • Have you added a test, reproducer or referred to an issue with a reproducer?
  • Have you tested your changes locally for CPU and GPU devices?
  • Have you made sure that new changes do not introduce compiler warnings?
  • Have you checked performance impact of proposed changes?
  • Have you added documentation for your changes, if necessary?
  • Have you added your changes to the changelog?
  • If this PR is a work in progress, are you opening the PR as a draft?

@ndgrigorian ndgrigorian force-pushed the use-sycl-complex-conversions branch 2 times, most recently from cb75d56 to 2dd0a72 Compare May 1, 2025 01:36
Copy link

github-actions bot commented May 1, 2025

Copy link

github-actions bot commented May 1, 2025

Array API standard conformance tests for dpctl=0.20.0dev0=py310h93fe807_183 ran successfully.
Passed: 1108
Failed: 4
Skipped: 119

@ndgrigorian ndgrigorian force-pushed the use-sycl-complex-conversions branch from 2dd0a72 to 52bb73e Compare May 1, 2025 02:38
Copy link

github-actions bot commented May 1, 2025

Array API standard conformance tests for dpctl=0.20.0dev0=py310h93fe807_183 ran successfully.
Passed: 1109
Failed: 3
Skipped: 119

@coveralls
Copy link
Collaborator

Coverage Status

coverage: 86.419%. remained the same
when pulling 52bb73e on use-sycl-complex-conversions
into 29eeac7 on master.

Comment on lines +29 to +36
#ifndef SYCL_EXT_ONEAPI_COMPLEX
#define SYCL_EXT_ONEAPI_COMPLEX 1
#endif
#if __has_include(<sycl/ext/oneapi/experimental/sycl_complex.hpp>)
#include <sycl/ext/oneapi/experimental/sycl_complex.hpp>
#else
#include <sycl/ext/oneapi/experimental/complex/complex.hpp>
#endif
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to include #include "sycl_complex.hpp" here instead or it was intentional?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. It was originally intentional, because instead of sycl_complex.hpp, I had made the include part of sycl_utils.hpp, but decided it was too complicated. I'll change it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I don't think sycl_utils.hpp is actually using sycl_complex.hpp anymore, so it should be removed there.

@@ -133,6 +161,20 @@ template <typename T> T logaddexp(T x, T y)
}
}

template <typename T> T plus_complex(const T &x1, const T &x2)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need here something like below?

Suggested change
template <typename T> T plus_complex(const T &x1, const T &x2)
template <typename T, typename = std::enable_if_t<is_complex_v<T>> T plus_complex(const T &x1, const T &x2)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems applicable to many declaration here expecting complex template type only

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this would be a good idea, good suggestion

{
T operator()(const T &x, const T &y) const
{
if constexpr (detail::IsComplex<T>::value) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can reuse is_complex_v here:

Suggested change
if constexpr (detail::IsComplex<T>::value) {
if constexpr (dpctl::tensor::type_utils::is_complex_v<T>) {

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

type_utils.hpp isn't included in this header. Maybe it could be though.

@@ -133,6 +161,20 @@ template <typename T> T logaddexp(T x, T y)
}
}

template <typename T> T plus_complex(const T &x1, const T &x2)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to use inline here explicitly?
As I can see compiler is actively uses inline __attribute__((__visibility__("hidden"), __always_inline__)) in $CONDA_PREFIX/include/sycl/ext/oneapi/experimental/complex/detail/complex.hpp header and I wonder if we should do the same or similar. Or it will be well optimized and inlined implicitly anyway?

Copy link
Collaborator Author

@ndgrigorian ndgrigorian May 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could, but I'm not sure it would do much good: function is only really used in sycl_utils.hpp for Plus struct, which is just our own implementation of sycl::plus.

It's arguable whether this function is needed at all. It only really avoids (directly) including sycl_complex.hpp in sycl_utils.hpp.

using sycl_complexT = exprm_ns::complex<realT>;
sycl_complexT z1 = sycl_complexT(x1);
sycl_complexT z2 = sycl_complexT(x2);
realT real1 = exprm_ns::real(z1);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general it can be simplified by:

Suggested change
realT real1 = exprm_ns::real(z1);
realT real1 = z1.real();

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably it is not a bad idea to implement the wrappers for real and imag functions also, since anyway it will be declared with theconstexpr:

template<typename Tp>
constexpr Tp real_complex(const std::complex<Tp> &z) {
    return sycl_complex_t<Tp>(z).real();
}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, we can stop using function calls to real and imag and just use the methods everywhere, which I think is pretty sensible. The methods also don't seem to trip up HIP compiler.

Then again, the std::real and std::imag functions actually didn't either, I just changed them for consistency. I vote to just use methods everywhere, they look cleaner anyway.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do see your point though, the wrapper would be good anywhere we don't want actually use the result of sycl_complex_t<Tp>

namespace sycl_utils
{

template <typename T>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
template <typename T>
template <typename T, typename std::enable_if_t<std::is_floating_point_v<T>>

{

template <typename T>
using sycl_complex_t = sycl::ext::oneapi::experimental::complex<T>;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It sounds like we have to use sycl_complex_t type everywhere to have it working with HIP backend.
In that case would it be worse idea to cast usm_ndarray to sycl_complex_t type from the beginning (or at least in case of building with HIP support enabled)? It might be that sycl_complex_t = std::complex<T> by default.

It would impact all type matrix where we need to replace std::complex with sycl_complex_t and probably to update some helper functions and aliases. But that would help us avoid casting to sycl_complex_t every time we need to do some math operation.

Copy link
Collaborator Author

@ndgrigorian ndgrigorian May 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I considered this, but the flaw is especially with MKL and oneMath (for dpnp): they expect std::complex inputs. oneMath actually casts std::complex inputs to the SYCL complex type internally, and never expects the SYCL type in its signatures.

(see here: https://github.com/uxlfoundation/oneMath/blob/4ad4dfb5db834117248ad5f8fbded5cfc1097005/src/blas/backends/generic/generic_level3.cxx#L35)

this would mean that in calls to MKL/oneMath, dpnp would have to copy into a fresh allocation of type std::complex, and then if using AMD/CUDA backend, it would have to be copied again by oneMath.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants