Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[opencl plugins] If no out of order queue support, then temporary queues get created that never get deleted. #11156

Open
coldav opened this issue Sep 12, 2023 · 10 comments
Labels
bug Something isn't working Stale

Comments

@coldav
Copy link
Contributor

coldav commented Sep 12, 2023

Describe the bug
The OpenCL plugin support for dpc++ tries to create a queue with the property CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE.
If this fails (such as in the oneAPI Construction Kit), then it may create temporary queues to enforce dependencies.
It never deletes these temporary queues.

To Reproduce

  • Build oneAPI Construction Kit
  • Set it to be the OpenCL plugin for dpc++
  • Run a simple test such as vector-add in oneAPI-Samples with SYCL_PI_TRACE=-1
  • Compare number of queue creations with queue releases

In detail:

Use the following script, it assumes that we already have LD_LIBRARY_PATH and PATH set up correctly for dpc++

Please describe the steps to reproduce the behavior:

git clone git@github.com:codeplaysoftware/oneapi-construction-kit.git
cd oneapi-construction-kit
git checkout 142c2d4964e44480871bd68ba8d454e6fc1051e2
export ONEAPI_CON_KIT_INSTALL_DIR=$PWD/build/install

cmake -Bbuild -DCA_LLVM_INSTALL_DIR=$LLVM_INSTALL_DIR -DCA_ENABLE_API=cl -DCMAKE_INSTALL_PREFIX=$ONEAPI_CON_KIT_INSTALL_DIR $PWD
cd build
make -j8 install
cd ../..

git clone git@github.com:oneapi-src/oneAPI-samples.git 
cd oneAPI-samples/DirectProgramming/C++SYCL/DenseLinearAlgebra/vector-add
export OCL_ICD_FILENAMES=$ONEAPI_CON_KIT_INSTALL_DIR/lib/libCL.so
export ONEAPI_DEVICE_SELECTOR="*:cpu"
export SYCL_CONFIG_FILE_NAME=null.cfg
clang++ -fsycl src/vector-add-buffers.cpp -o vector-add-buffers
SYCL_PI_TRACE=-1 ./vector-add-buffers 2>&1 | tee /tmp/run.txt
grep Queue /tmp/run.txt

This shows four piextQueueCreate and one piQueueRelease

Environment (please complete the following information):

  • OS: Ubuntu20
  • Target device and vendor: Codeplay oneAPI Construction Kit
  • DPC++ version: 68dc47d (also tried with 2023.2.0)
  • Dependencies version:oneApi Construction Kit (see above)

Additional context
Add any other context about the problem here.

@coldav coldav added the bug Something isn't working label Sep 12, 2023
coldav added a commit to coldav/oneapi-construction-kit that referenced this issue Oct 2, 2023
This is due to an issue with dpc++ where it can leave temporary queues
unreleased. This can then have the effect on the closing down of the
device.

This is resolved by registering and deregistering the queues on creation
and deletion and then releasing any queues which still have an external
reference when `atexit` is called.

This should be reviewed when intel/llvm#11156
is fixed.
coldav added a commit to coldav/oneapi-construction-kit that referenced this issue Oct 2, 2023
This is due to an issue with dpc++ where it can leave temporary queues
unreleased. This can then have the effect on the closing down of the
device.

This is resolved by registering and deregistering the queues on creation
and deletion and then releasing any queues which still have an external
reference when `atexit` is called.

This should be reviewed when intel/llvm#11156
is fixed.
coldav added a commit to coldav/oneapi-construction-kit that referenced this issue Oct 5, 2023
This is due to an issue with dpc++ where it can leave temporary queues
unreleased. This can then have the effect on the closing down of the
device.

This is resolved by registering and deregistering the queues on creation
and deletion and then releasing any queues which still have an external
reference when `atexit` is called.

This should be reviewed when intel/llvm#11156
is fixed.
coldav added a commit to coldav/oneapi-construction-kit that referenced this issue Oct 5, 2023
This is due to an issue with dpc++ where it can leave temporary queues
unreleased. This can then have the effect on the closing down of the
device.

This is resolved by registering and deregistering the queues on creation
and deletion and then releasing any queues which still have an external
reference when `atexit` is called.

This should be reviewed when intel/llvm#11156
is fixed.
coldav added a commit to coldav/oneapi-construction-kit that referenced this issue Oct 9, 2023
This is due to an issue with dpc++ where it can leave temporary queues
unreleased. This can then have the effect on the closing down of the
device.

This is resolved by registering and deregistering the queues on creation
and deletion and then releasing any queues which still have an external
reference when `atexit` is called. This required an additional device
mutex to avoid threading issues.

This should be reviewed when intel/llvm#11156
is fixed.
@coldav
Copy link
Contributor Author

coldav commented Oct 24, 2023

This is a major issue for us, is there any idea when it might be addressed?

@bader
Copy link
Contributor

bader commented Oct 24, 2023

@coldav, do you observe this issue with other back-ends (e.g. Level Zero)?
@kbenzie, could you take a look, please?

@coldav
Copy link
Contributor Author

coldav commented Oct 24, 2023

I don't have level zero as a route to testing the oneapi construction kit, and I'm not sure how to repeat this any other way.

@bader
Copy link
Contributor

bader commented Oct 24, 2023

Potentially this might be a memory leak in the runtime as well.
Tagging @bso-intel and @intel/llvm-reviewers-runtime for awareness.

@coldav
Copy link
Contributor Author

coldav commented Oct 24, 2023

To be clear, I haven't been able to show this since the UR change over, since it didn't have the out of order aspect written, but the thought seemed to be that the failure was elsewhere. (If we do claim we support out of order, then I don't believe the temporary queues are created).

@kbenzie
Copy link
Contributor

kbenzie commented Oct 25, 2023

@kbenzie, could you take a look, please?

I've been chatting with @coldav about this internally, we don't think there's much to go wrong in the OpenCL adapter around this. Seems more likely to me its an interaction in the SYCL RT causing the issue.

@coldav
Copy link
Contributor Author

coldav commented Oct 30, 2023

I can confirm I still see this even with the yet to be merged https://github.com/oneapi-src/unified-runtime/pull/975/files (which means that a lot of tests pass).

@coldav
Copy link
Contributor Author

coldav commented Dec 4, 2023

I'm guessing this won't be looked at before the final oneapi release?

hvdijk added a commit to hvdijk/oneapi-construction-kit that referenced this issue Dec 11, 2023
Out-of-order queues impose additional restrictions on user code and
grant additional freedom to the implementation, including the freedom to
continue to execute in order. Therefore, we can trivially support
out-of-order queues by removing the code that is specifically there to
reject them, and continuing to run them in order.

DPC++ aims to support OpenCL implementations that lack out of order
execution support, but it does so by creating temporary queues that
appear to not get released (intel/llvm#11156). By claiming out of order
execution support, we avoid leaks, and the specification for
CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE only permits out-of-order
execution, it does not and in general cannot mandate it, so this should
be a valid minimal implementation.

In order to ease testing, this is optional for now, but enabled by
default. -DCA_ENABLE_OUT_OF_ORDER_EXEC_MODE=OFF can be used to restore
the prior behavior.

* CMakeLists.txt: add CA_ENABLE_OUT_OF_ORDER_EXEC_MODE option.
* config.h.in: report the CA_ENABLE_OUT_OF_ORDER_EXEC_MODE option.
* _cl_command_queue::create: tolerate
  CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE and pass it along to
  _cl_command_queue, provided CA_ENABLE_OUT_OF_ORDER_EXEC_MODE is
  enabled.
* cl::EnqueueWaitForEvents: tolerate
  CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE and ignore it, provided
  CA_ENABLE_OUT_OF_ORDER_EXEC_MODE is enabled.

Tests for unsupported properties are updated to test CL_QUEUE_ON_DEVICE
rather than CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE.
Copy link
Contributor

This issue is stale because it has been open 180 days with no activity. Remove stale label or comment or this will be automatically closed in 30 days.

Copy link
Contributor

This issue is stale because it has been open 180 days with no activity. Remove stale label or comment or this will be automatically closed in 30 days.

@github-actions github-actions bot added the Stale label Mar 26, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working Stale
Projects
None yet
Development

No branches or pull requests

4 participants