ConcurrentOpenConcurrentCloseHandles and ConcurrentGetConcurrentPutHandles tests fail when NTHREADS == utils_get_num_cores() #1169

Open
bratpiorka opened this issue Mar 7, 2025 · 9 comments
Labels: bug, high priority
Milestone: v0.12.x

bratpiorka commented Mar 7, 2025

ConcurrentOpenConcurrentCloseHandles and ConcurrentGetConcurrentPutHandles tests fail when NTHREADS == utils_get_num_cores()

Environment Information

Please provide a reproduction of the bug:

All 24 failed jobs in the build: https://github.com/oneapi-src/unified-memory-framework/actions/runs/15040980329?pr=1315
of the PR: #1315

Linux:

35/55 Test #35: test_provider_level_zero ......................***Exception: SegFault  0.17 sec
Running main() from /home/test-user/actions-runner/_work/unified-memory-framework/unified-memory-framework/build/_deps/googletest-src/googletest/src/gtest_main.cc
[==========] Running 46 tests from 4 test suites.
[----------] Global test environment set-up.

...

[----------] 14 tests from umfLevelZeroProviderTestSuite/umfIpcTest
[ RUN      ] umfLevelZeroProviderTestSuite/umfIpcTest.GetIPCHandleSize/0
[       OK ] umfLevelZeroProviderTestSuite/umfIpcTest.GetIPCHandleSize/0 (0 ms)
[ RUN      ] umfLevelZeroProviderTestSuite/umfIpcTest.GetIPCHandleSizeInvalidArgs/0
[       OK ] umfLevelZeroProviderTestSuite/umfIpcTest.GetIPCHandleSizeInvalidArgs/0 (0 ms)
[ RUN      ] umfLevelZeroProviderTestSuite/umfIpcTest.GetIPCHandleInvalidArgs/0
[       OK ] umfLevelZeroProviderTestSuite/umfIpcTest.GetIPCHandleInvalidArgs/0 (0 ms)
[ RUN      ] umfLevelZeroProviderTestSuite/umfIpcTest.CloseIPCHandleInvalidPtr/0
[       OK ] umfLevelZeroProviderTestSuite/umfIpcTest.CloseIPCHandleInvalidPtr/0 (0 ms)
[ RUN      ] umfLevelZeroProviderTestSuite/umfIpcTest.BasicFlow/0
[       OK ] umfLevelZeroProviderTestSuite/umfIpcTest.BasicFlow/0 (4 ms)
[ RUN      ] umfLevelZeroProviderTestSuite/umfIpcTest.GetPoolByOpenedHandle/0
[       OK ] umfLevelZeroProviderTestSuite/umfIpcTest.GetPoolByOpenedHandle/0 (12 ms)
[ RUN      ] umfLevelZeroProviderTestSuite/umfIpcTest.AllocFreeAllocTest/0
[       OK ] umfLevelZeroProviderTestSuite/umfIpcTest.AllocFreeAllocTest/0 (1 ms)
[ RUN      ] umfLevelZeroProviderTestSuite/umfIpcTest.openInTwoIpcHandlers/0
[       OK ] umfLevelZeroProviderTestSuite/umfIpcTest.openInTwoIpcHandlers/0 (1 ms)
[ RUN      ] umfLevelZeroProviderTestSuite/umfIpcTest.ConcurrentGetConcurrentPutHandles/0

CRASH

Windows (locally):

      Start 21: test_ipc
21/23 Test #21: test_ipc ...............................Exit code 0xc0000409
***Exception:   0.73 sec
Running main() from C:\Users\ldorau\work\unified-memory-framework\build-2\_deps\googletest-src\googletest\src\gtest_main.cc
[==========] Running 16 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 16 tests from umfIpcTestSuite/umfIpcTest
[ RUN      ] umfIpcTestSuite/umfIpcTest.GetIPCHandleSize/0
[       OK ] umfIpcTestSuite/umfIpcTest.GetIPCHandleSize/0 (0 ms)
[ RUN      ] umfIpcTestSuite/umfIpcTest.GetIPCHandleSizeInvalidArgs/0
[       OK ] umfIpcTestSuite/umfIpcTest.GetIPCHandleSizeInvalidArgs/0 (0 ms)
[ RUN      ] umfIpcTestSuite/umfIpcTest.GetIPCHandleInvalidArgs/0
[       OK ] umfIpcTestSuite/umfIpcTest.GetIPCHandleInvalidArgs/0 (0 ms)
[ RUN      ] umfIpcTestSuite/umfIpcTest.CloseIPCHandleInvalidPtr/0
[       OK ] umfIpcTestSuite/umfIpcTest.CloseIPCHandleInvalidPtr/0 (0 ms)
[ RUN      ] umfIpcTestSuite/umfIpcTest.BasicFlow/0
[       OK ] umfIpcTestSuite/umfIpcTest.BasicFlow/0 (0 ms)
[ RUN      ] umfIpcTestSuite/umfIpcTest.AllocFreeAllocTest/0
[       OK ] umfIpcTestSuite/umfIpcTest.AllocFreeAllocTest/0 (0 ms)
[ RUN      ] umfIpcTestSuite/umfIpcTest.openInTwoIpcHandlers/0
[       OK ] umfIpcTestSuite/umfIpcTest.openInTwoIpcHandlers/0 (0 ms)
[ RUN      ] umfIpcTestSuite/umfIpcTest.ConcurrentGetConcurrentPutHandles/0
C:\Program Files\Microsoft Visual Studio\2022\Professional\VC\Tools\MSVC\14.35.32215\include\array(531) : Assertion failed: array subscript out of range

      Start 22: test_ipc_max_opened_limit
22/23 Test #22: test_ipc_max_opened_limit ..............Exit code 0xc0000409
***Exception:   0.66 sec
Running main() from C:\Users\ldorau\work\unified-memory-framework\build-2\_deps\googletest-src\googletest\src\gtest_main.cc
[==========] Running 16 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 16 tests from umfIpcTestSuite/umfIpcTest
[ RUN      ] umfIpcTestSuite/umfIpcTest.GetIPCHandleSize/0
[       OK ] umfIpcTestSuite/umfIpcTest.GetIPCHandleSize/0 (0 ms)
[ RUN      ] umfIpcTestSuite/umfIpcTest.GetIPCHandleSizeInvalidArgs/0
[       OK ] umfIpcTestSuite/umfIpcTest.GetIPCHandleSizeInvalidArgs/0 (0 ms)
[ RUN      ] umfIpcTestSuite/umfIpcTest.GetIPCHandleInvalidArgs/0
[       OK ] umfIpcTestSuite/umfIpcTest.GetIPCHandleInvalidArgs/0 (0 ms)
[ RUN      ] umfIpcTestSuite/umfIpcTest.CloseIPCHandleInvalidPtr/0
[       OK ] umfIpcTestSuite/umfIpcTest.CloseIPCHandleInvalidPtr/0 (0 ms)
[ RUN      ] umfIpcTestSuite/umfIpcTest.BasicFlow/0
[       OK ] umfIpcTestSuite/umfIpcTest.BasicFlow/0 (0 ms)
[ RUN      ] umfIpcTestSuite/umfIpcTest.AllocFreeAllocTest/0
[       OK ] umfIpcTestSuite/umfIpcTest.AllocFreeAllocTest/0 (0 ms)
[ RUN      ] umfIpcTestSuite/umfIpcTest.openInTwoIpcHandlers/0
[       OK ] umfIpcTestSuite/umfIpcTest.openInTwoIpcHandlers/0 (0 ms)
[ RUN      ] umfIpcTestSuite/umfIpcTest.ConcurrentGetConcurrentPutHandles/0
C:\Program Files\Microsoft Visual Studio\2022\Professional\VC\Tools\MSVC\14.35.32215\include\array(531) : Assertion failed: array subscript out of range

How often bug is revealed:

ALWAYS when NTHREADS >= utils_get_num_cores()
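
The MSVC assertion above ("array subscript out of range" in `array`) means a `std::array` was indexed at or past its size. The test source is not shown in this thread, so the following is only a hypothetical sketch of that class of failure, assuming per-thread slots are stored in a fixed-size array while the thread count is chosen at run time. `kMaxSlots`, `fixed_slots_fit` and `dynamic_slots` are made-up names, not UMF code.

```cpp
#include <array>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical compile-time capacity; not taken from the UMF test sources.
constexpr std::size_t kMaxSlots = 8;

// Returns true if one slot per thread fits into a std::array of kMaxSlots
// elements. at() is used on purpose: it throws std::out_of_range, whereas
// operator[] with a bad index is undefined behavior (a debug assertion on
// MSVC, possibly a segfault elsewhere).
bool fixed_slots_fit(std::size_t nthreads) {
    std::array<int, kMaxSlots> slots{};
    try {
        for (std::size_t i = 0; i < nthreads; ++i) {
            slots.at(i) = static_cast<int>(i);
        }
    } catch (const std::out_of_range &) {
        return false;
    }
    return true;
}

// Sizing the container from the run-time thread count removes the mismatch.
std::size_t dynamic_slots(std::size_t nthreads) {
    std::vector<int> slots(nthreads);
    for (std::size_t i = 0; i < nthreads; ++i) {
        slots[i] = static_cast<int>(i);
    }
    return slots.size();
}
```

With these assumptions, `fixed_slots_fit(8)` succeeds and `fixed_slots_fit(9)` reports the overflow. Whether the real tests fail for this reason is not confirmed anywhere in this issue.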

##############################################################################

Previous version of this issue: sporadic fail in Level Zero IPC tests

umfLevelZeroProviderTestSuite/umfIpcTest.ConcurrentGetConcurrentPutHandles/0

Environment Information

  • UMF version (hash commit or a tag): latest
  • OS(es) version(s): Ubuntu Release

Please provide a reproduction of the bug:

https://github.com/oneapi-src/unified-memory-framework/actions/runs/13719665034/job/38374211866?pr=1147

35/55 Test #35: test_provider_level_zero ......................***Exception: SegFault  0.17 sec
Running main() from /home/test-user/actions-runner/_work/unified-memory-framework/unified-memory-framework/build/_deps/googletest-src/googletest/src/gtest_main.cc
[==========] Running 46 tests from 4 test suites.
[----------] Global test environment set-up.

...

[----------] 14 tests from umfLevelZeroProviderTestSuite/umfIpcTest
[ RUN      ] umfLevelZeroProviderTestSuite/umfIpcTest.GetIPCHandleSize/0
[       OK ] umfLevelZeroProviderTestSuite/umfIpcTest.GetIPCHandleSize/0 (0 ms)
[ RUN      ] umfLevelZeroProviderTestSuite/umfIpcTest.GetIPCHandleSizeInvalidArgs/0
[       OK ] umfLevelZeroProviderTestSuite/umfIpcTest.GetIPCHandleSizeInvalidArgs/0 (0 ms)
[ RUN      ] umfLevelZeroProviderTestSuite/umfIpcTest.GetIPCHandleInvalidArgs/0
[       OK ] umfLevelZeroProviderTestSuite/umfIpcTest.GetIPCHandleInvalidArgs/0 (0 ms)
[ RUN      ] umfLevelZeroProviderTestSuite/umfIpcTest.CloseIPCHandleInvalidPtr/0
[       OK ] umfLevelZeroProviderTestSuite/umfIpcTest.CloseIPCHandleInvalidPtr/0 (0 ms)
[ RUN      ] umfLevelZeroProviderTestSuite/umfIpcTest.BasicFlow/0
[       OK ] umfLevelZeroProviderTestSuite/umfIpcTest.BasicFlow/0 (4 ms)
[ RUN      ] umfLevelZeroProviderTestSuite/umfIpcTest.GetPoolByOpenedHandle/0
[       OK ] umfLevelZeroProviderTestSuite/umfIpcTest.GetPoolByOpenedHandle/0 (12 ms)
[ RUN      ] umfLevelZeroProviderTestSuite/umfIpcTest.AllocFreeAllocTest/0
[       OK ] umfLevelZeroProviderTestSuite/umfIpcTest.AllocFreeAllocTest/0 (1 ms)
[ RUN      ] umfLevelZeroProviderTestSuite/umfIpcTest.openInTwoIpcHandlers/0
[       OK ] umfLevelZeroProviderTestSuite/umfIpcTest.openInTwoIpcHandlers/0 (1 ms)
[ RUN      ] umfLevelZeroProviderTestSuite/umfIpcTest.ConcurrentGetConcurrentPutHandles/0

CRASH

How often bug is revealed:

rare

bratpiorka added the bug label Mar 7, 2025

ldorau commented Mar 27, 2025

@vinser52

ldorau commented Mar 27, 2025

Maybe this ASAN error has something to do with it?
https://github.com/ldorau/unified-memory-framework/actions/runs/14083824988/job/39443119450

==12506==ERROR: AddressSanitizer: use-after-poison on address 0x7fa3f2a69188 at pc 0x55fdfc47fde2 bp 0x7fa3e93feb30 sp 0x7fa3e93feb20
READ of size 8 at 0x7fa3f2a69188 thread T17
    #0 0x55fdfc47fde1 in utils_atomic_load_acquire_u64 /home/runner/work/unified-memory-framework/unified-memory-framework/src/utils/utils_concurrency.h:165
    #1 0x55fdfc4814e6 in umfMemoryTrackerAdd /home/runner/work/unified-memory-framework/unified-memory-framework/src/provider/provider_tracking.c:202
    #2 0x55fdfc48407a in trackingAlloc /home/runner/work/unified-memory-framework/unified-memory-framework/src/provider/provider_tracking.c:481
    #3 0x55fdfc47ccbe in umfMemoryProviderAlloc /home/runner/work/unified-memory-framework/unified-memory-framework/src/memory_provider.c:245
    #4 0x55fdfc49f34a in proxy_aligned_malloc /home/runner/work/unified-memory-framework/unified-memory-framework/src/pool/pool_proxy.c:51
    #5 0x55fdfc49f470 in proxy_malloc /home/runner/work/unified-memory-framework/unified-memory-framework/src/pool/pool_proxy.c:64
    #6 0x55fdfc47a010 in umfPoolMalloc /home/runner/work/unified-memory-framework/unified-memory-framework/src/memory_pool.c:189

ldorau commented Mar 27, 2025

See: #1224

vinser52 commented

> Maybe this ASAN error has something to do with it? https://github.com/ldorau/unified-memory-framework/actions/runs/14083824988/job/39443119450
>
> ==12506==ERROR: AddressSanitizer: use-after-poison on address 0x7fa3f2a69188 at pc 0x55fdfc47fde2 bp 0x7fa3e93feb30 sp 0x7fa3e93feb20
> READ of size 8 at 0x7fa3f2a69188 thread T17
>     #0 0x55fdfc47fde1 in utils_atomic_load_acquire_u64 /home/runner/work/unified-memory-framework/unified-memory-framework/src/utils/utils_concurrency.h:165
>     #1 0x55fdfc4814e6 in umfMemoryTrackerAdd /home/runner/work/unified-memory-framework/unified-memory-framework/src/provider/provider_tracking.c:202
>     #2 0x55fdfc48407a in trackingAlloc /home/runner/work/unified-memory-framework/unified-memory-framework/src/provider/provider_tracking.c:481
>     #3 0x55fdfc47ccbe in umfMemoryProviderAlloc /home/runner/work/unified-memory-framework/unified-memory-framework/src/memory_provider.c:245
>     #4 0x55fdfc49f34a in proxy_aligned_malloc /home/runner/work/unified-memory-framework/unified-memory-framework/src/pool/pool_proxy.c:51
>     #5 0x55fdfc49f470 in proxy_malloc /home/runner/work/unified-memory-framework/unified-memory-framework/src/pool/pool_proxy.c:64
>     #6 0x55fdfc47a010 in umfPoolMalloc /home/runner/work/unified-memory-framework/unified-memory-framework/src/memory_pool.c:189

I am not sure because the issue above is related to the allocation flow, right?

ldorau commented Mar 27, 2025

> I am not sure because the issue above is related to the allocation flow, right?

Most probably, or the free() path.

bratpiorka added this to the v0.12.x milestone Apr 8, 2025

ldorau commented May 14, 2025

@vinser52
When there are more threads, the failure is no longer sporadic:
https://github.com/oneapi-src/unified-memory-framework/actions/runs/15019481136/job/42206135042

ldorau changed the title from "sporadic fail in Level Zero IPC tests" to "ConcurrentOpenConcurrentCloseHandles and ConcurrentGetConcurrentPutHandles tests fail when NTHREADS == utils_get_num_cores()" on May 16, 2025

ldorau commented May 16, 2025

New version of the issue: identical to the updated issue description at the top of this thread.
