
[Issue]: clr-rocm-6.0.2/rocclr/os/os_posix.cpp:321: static void amd::Os::currentStackInfo(unsigned char**, size_t*): Assertion `Os::currentStackPtr() >= *base - *size && Os::currentStackPtr() < *base && "just checking"' failed. #61

Closed
darkbasic opened this issue Feb 26, 2024 · 11 comments

Comments

@darkbasic

darkbasic commented Feb 26, 2024

Problem Description

I'm on Gentoo Linux ppc64le (4K page size) using linux-6.7.6.
GPU is AMD RX 570 (Mesa 24.0.1).
LLVM is 17.0.6.
I managed to build rocm-opencl-runtime-6.0.2 successfully, but I had to pass the -DNO_WARN_X86_INTRINSICS compile flag; otherwise the build fails.
Full build log without -DNO_WARN_X86_INTRINSICS: rocm-opencl-runtime-6.0.2.build.log
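Some context on the flag. On POWER, GCC and Clang ship compatibility versions of the x86 intrinsic headers (xmmintrin.h and related headers) that map SSE intrinsics onto POWER vector instructions. These headers stop with a diagnostic unless NO_WARN_X86_INTRINSICS is defined as an explicit opt-in. The sketch below assumes a GCC or Clang toolchain. Defining the macro before the include has the same effect as passing -DNO_WARN_X86_INTRINSICS, and on x86_64 the macro is simply ignored. The helper name is hypothetical.

```cpp
// Same effect as compiling with -DNO_WARN_X86_INTRINSICS. The guard avoids a
// redefinition if the flag is also passed on the command line.
#ifndef NO_WARN_X86_INTRINSICS
#define NO_WARN_X86_INTRINSICS 1
#endif
#include <xmmintrin.h>
#include <cassert>

// Hypothetical helper: adds two scalars through 4-lane SSE vectors and
// returns lane 0. On ppc64le the compatibility header implements these
// intrinsics with POWER vector instructions.
static float add_via_sse(float a, float b) {
  __m128 va = _mm_set1_ps(a);                // broadcast a into all four lanes
  __m128 vb = _mm_set1_ps(b);                // broadcast b into all four lanes
  return _mm_cvtss_f32(_mm_add_ps(va, vb));  // extract lane 0
}
```

For example, `assert(add_via_sse(1.5f, 2.0f) == 3.5f);` holds on both x86_64 and ppc64le.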
I'm also carrying this patch, which I've been using since v5 to fix the tests:

--- ./opencl/tests/ocltst/module/perf/OCLPerfKernelThroughput.h.orig    2024-02-26 09:53:53.925778934 +0100
+++ ./opencl/tests/ocltst/module/perf/OCLPerfKernelThroughput.h 2024-02-26 09:54:09.165774504 +0100
@@ -45,7 +45,7 @@
 #define UNSIGNED_LARGE_INT unsigned long long
 #define MAX_LOOP_ITER 10
 typedef cl_float4 float4;
-typedef void (*CPUKernel)(__m128 *, __m128 *, unsigned int);
+typedef void (*CPUKernel)(__ibm128 *, __ibm128 *, unsigned int);
 
 class OCLPerfKernelThroughput : public OCLTestImp {
  public:

Unfortunately, both clinfo and rocminfo still fail at runtime, just as they did with 5.4.3:

talos2 ~ # clinfo 
clinfo: /var/tmp/portage/dev-libs/rocm-opencl-runtime-6.0.2/work/clr-rocm-6.0.2/rocclr/os/os_posix.cpp:321: static void amd::Os::currentStackInfo(unsigned char**, size_t*): Assertion `Os::currentStackPtr() >= *base - *size && Os::currentStackPtr() < *base && "just checking"' failed.
Aborted (core dumped)

clinfo: /var/tmp/portage/dev-libs/rocm-opencl-runtime-6.0.2/work/clr-rocm-6.0.2/rocclr/os/os_posix.cpp:321: static void amd::Os::currentStackInfo(unsigned char**, size_t*): Assertion `Os::currentStackPtr() >= *base - *size && Os::currentStackPtr() < *base && "just checking"' failed.

Program received signal SIGABRT, Aborted.
0x00003ffff7ca819c in ?? () from /usr/lib64/libc.so.6
(gdb) backtrace
#0  0x00003ffff7ca819c in ?? () from /usr/lib64/libc.so.6
#1  0x00003ffff7c4525c in raise () from /usr/lib64/libc.so.6
#2  0x00003ffff7c2543c in abort () from /usr/lib64/libc.so.6
#3  0x00003ffff7c39398 in ?? () from /usr/lib64/libc.so.6
#4  0x00003ffff7c39444 in __assert_fail () from /usr/lib64/libc.so.6
#5  0x00003ffff78cd504 in amd::Os::currentStackInfo (base=base@entry=0x100073630, size=size@entry=0x100073638) at /var/tmp/portage/dev-libs/rocm-opencl-runtime-6.0.2/work/clr-rocm-6.0.2/rocclr/os/os_posix.cpp:321
#6  0x00003ffff78fbd98 in amd::HostThread::HostThread (this=0x1000735d0) at /var/tmp/portage/dev-libs/rocm-opencl-runtime-6.0.2/work/clr-rocm-6.0.2/rocclr/thread/thread.cpp:34
#7  0x00003ffff78fbe8c in amd::Thread::init () at /var/tmp/portage/dev-libs/rocm-opencl-runtime-6.0.2/work/clr-rocm-6.0.2/rocclr/thread/thread.cpp:170
#8  0x00003ffff78ccae8 in amd::Os::init () at /var/tmp/portage/dev-libs/rocm-opencl-runtime-6.0.2/work/clr-rocm-6.0.2/rocclr/os/os_posix.cpp:170
#9  amd::Os::init () at /var/tmp/portage/dev-libs/rocm-opencl-runtime-6.0.2/work/clr-rocm-6.0.2/rocclr/os/os_posix.cpp:155
#10 0x00003ffff783d0b8 in amd::init () at /var/tmp/portage/dev-libs/rocm-opencl-runtime-6.0.2/work/clr-rocm-6.0.2/rocclr/os/os_posix.cpp:136
#11 0x00003ffff7fa5dfc in ?? () from /lib64/ld64.so.2
#12 0x00003ffff7fb9f18 in ?? () from /lib64/ld64.so.2
#13 0x00003ffff7f9f420 in _dl_catch_exception () from /lib64/ld64.so.2
#14 0x00003ffff7fba0d8 in ?? () from /lib64/ld64.so.2
#15 0x00003ffff7f9f37c in _dl_catch_exception () from /lib64/ld64.so.2
#16 0x00003ffff7fbb97c in ?? () from /lib64/ld64.so.2
#17 0x00003ffff7c9ed24 in ?? () from /usr/lib64/libc.so.6
#18 0x00003ffff7f9f37c in _dl_catch_exception () from /lib64/ld64.so.2
#19 0x00003ffff7f9f4fc in ?? () from /lib64/ld64.so.2
#20 0x00003ffff7c9e5f8 in ?? () from /usr/lib64/libc.so.6
#21 0x00003ffff7c9ee34 in dlopen () from /usr/lib64/libc.so.6
#22 0x00003ffff7f408a0 in ?? () from /usr/lib64/libOpenCL.so.1
#23 0x00003ffff7f3419c in ?? () from /usr/lib64/libOpenCL.so.1
#24 0x00003ffff7f40228 in ?? () from /usr/lib64/libOpenCL.so.1
#25 0x00003ffff7f404e4 in ?? () from /usr/lib64/libOpenCL.so.1
#26 0x00003ffff7cacf40 in ?? () from /usr/lib64/libc.so.6
#27 0x00003ffff7f40858 in ?? () from /usr/lib64/libOpenCL.so.1
#28 0x00003ffff7f34118 in ?? () from /usr/lib64/libOpenCL.so.1
#29 0x00003ffff7f36498 in clGetPlatformIDs () from /usr/lib64/libOpenCL.so.1
#30 0x0000000100008b58 in ?? ()
#31 0x00003ffff7c25c2c in ?? () from /usr/lib64/libc.so.6
#32 0x00003ffff7c25e6c in __libc_start_main () from /usr/lib64/libc.so.6
#33 0x0000000000000000 in ?? ()
talos2 ~ # rocminfo 
ROCk module is loaded
Segmentation fault (core dumped)

ROCk module is loaded

Program received signal SIGSEGV, Segmentation fault.
0x00003ffff7e5840c in rocr::os::callback (info=0x3fffffffda60, size=<optimized out>, data=0x3fffffffdb40) at /var/tmp/portage/dev-libs/rocr-runtime-6.0.2/work/ROCR-Runtime-rocm-6.0.2/src/core/util/lnx/os_linux.cpp:314
warning: 314	/var/tmp/portage/dev-libs/rocr-runtime-6.0.2/work/ROCR-Runtime-rocm-6.0.2/src/core/util/lnx/os_linux.cpp: No such file or directory
(gdb) backtrace
#0  0x00003ffff7e5840c in rocr::os::callback (info=0x3fffffffda60, size=<optimized out>, data=0x3fffffffdb40) at /var/tmp/portage/dev-libs/rocr-runtime-6.0.2/work/ROCR-Runtime-rocm-6.0.2/src/core/util/lnx/os_linux.cpp:314
#1  0x00003ffff77be50c in dl_iterate_phdr () from /usr/lib64/libc.so.6
#2  0x00003ffff7e58780 in rocr::os::GetLoadedToolsLib () at /var/tmp/portage/dev-libs/rocr-runtime-6.0.2/work/ROCR-Runtime-rocm-6.0.2/src/core/util/lnx/os_linux.cpp:332
#3  0x00003ffff7ebc3a8 in rocr::core::Runtime::LoadTools (this=this@entry=0x10003f1b0) at /var/tmp/portage/dev-libs/rocr-runtime-6.0.2/work/ROCR-Runtime-rocm-6.0.2/src/core/runtime/runtime.cpp:1745
#4  0x00003ffff7ebd460 in rocr::core::Runtime::Load (this=0x10003f1b0) at /var/tmp/portage/dev-libs/rocr-runtime-6.0.2/work/ROCR-Runtime-rocm-6.0.2/src/core/runtime/runtime.cpp:1539
#5  0x00003ffff7ebd688 in rocr::core::Runtime::Acquire () at /var/tmp/portage/dev-libs/rocr-runtime-6.0.2/work/ROCR-Runtime-rocm-6.0.2/src/core/runtime/runtime.cpp:116
#6  0x00003ffff7e8e1e8 in rocr::HSA::hsa_init () at /var/tmp/portage/dev-libs/rocr-runtime-6.0.2/work/ROCR-Runtime-rocm-6.0.2/src/core/runtime/hsa.cpp:206
#7  0x00003ffff7ed42fc in hsa_init () at /var/tmp/portage/dev-libs/rocr-runtime-6.0.2/work/ROCR-Runtime-rocm-6.0.2/src/core/common/hsa_table_interface.cpp:68
#8  0x00000001000027cc in ?? ()
#9  0x00003ffff7625c2c in ?? () from /usr/lib64/libc.so.6
#10 0x00003ffff7625e6c in __libc_start_main () from /usr/lib64/libc.so.6
#11 0x0000000000000000 in ?? ()

Operating System

Gentoo Linux ppc64le (4K page size)

CPU

IBM Power 9

GPU

AMD RX 570

ROCm Version

ROCm 6.0.2

ROCm Component

No response

Steps to Reproduce

No response

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

No response

Additional Information

No response

This was referenced Feb 26, 2024
@cjatin
Contributor

cjatin commented Feb 26, 2024

AFAIK HIP is not tested on the POWER architecture and is written with x86_64 in mind, so getting this to work might require more work than just fixing compilation errors caused by missing intrinsics.

The GPU you have is also not supported on ROCm 6.0

https://rocm.docs.amd.com/projects/install-on-linux/en/latest/reference/system-requirements.html

@darkbasic
Author

AFAIK HIP is not tested on the POWER architecture and is written with x86_64 in mind, so getting this to work might require more work than just fixing compilation errors caused by missing intrinsics.

Early versions of ROCm claimed to support ppc64le. Also, Adam Tran from AMD said it should work starting with 6.0.2, which is why I re-tested it.

The GPU you have is also not supported on ROCm 6.0

Yeah, I know, but at least the OpenCL part works on x86_64 (or at least it did the last time I tested it).

@darkbasic
Author

I've found a similar error for the RX 6900 XT on x86_64: Mozilla-Ocho/llamafile#214
Is it possible that ROCm somehow regressed and the RX 570 no longer works on x86_64? Can someone confirm? I'm sure OpenCL used to work, but a couple of years have passed since I last tested it on x86_64.

@ppanchad-amd

@darkbasic ROCm supports only x86_64-based CPU architectures, and the GPU you are using is no longer supported. Please see the following link for supported hardware: https://rocm.docs.amd.com/projects/install-on-linux/en/latest/reference/system-requirements.html

@ppanchad-amd closed this as not planned (won't fix, can't repro, duplicate, stale) on Sep 19, 2024
@Xeonacid

Xeonacid commented Jan 1, 2025

A ppc64le implementation of Os::currentStackPtr is needed.

FYI, I added a RISC-V implementation in #117. Without the patch, clinfo throws the same error on riscv64; with the patch, it works fine.

A friend of mine also confirmed that everything works fine after commenting out the assert line. A search of the source code suggests that Os::currentStackPtr is not used anywhere except in this sanity-check assert.
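To illustrate what the assertion checks, here is a minimal C++ sketch of an architecture-specific stack-pointer read combined with the same bounds test the failing assertion performs (base - size <= sp < base). On ppc64/ppc64le the stack pointer lives in register r1. The sketch assumes Linux with glibc (for pthread_getattr_np) and GCC/Clang inline assembly. The function names are hypothetical, and this is neither rocclr's code nor the patches in #117 or #145.

```cpp
// Hypothetical sketch, not rocclr code: read the stack pointer per architecture
// and apply the same bounds check as amd::Os::currentStackInfo()'s assertion.
// Assumes Linux with glibc (pthread_getattr_np) and GCC/Clang inline asm.
#include <pthread.h>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Returns the current stack pointer.
static inline std::uintptr_t stack_ptr_now() {
  std::uintptr_t sp;
#if defined(__powerpc64__)
  __asm__ volatile("mr %0, 1" : "=r"(sp));       // r1 is the stack pointer
#elif defined(__x86_64__)
  __asm__ volatile("mov %%rsp, %0" : "=r"(sp));
#elif defined(__riscv)
  __asm__ volatile("mv %0, sp" : "=r"(sp));
#else
  volatile char marker = 0;                       // a local lives on the stack
  sp = reinterpret_cast<std::uintptr_t>(&marker);
#endif
  return sp;
}

// Fills *base with the highest address of the calling thread's stack (the
// stack grows down from it) and *size with its size. Returns false on error.
static bool thread_stack_bounds(unsigned char** base, std::size_t* size) {
  pthread_attr_t attr;
  if (pthread_getattr_np(pthread_self(), &attr) != 0) return false;
  void* lowest = nullptr;
  std::size_t bytes = 0;
  const int rc = pthread_attr_getstack(&attr, &lowest, &bytes);
  pthread_attr_destroy(&attr);
  if (rc != 0) return false;
  *base = static_cast<unsigned char*>(lowest) + bytes;
  *size = bytes;
  return true;
}

// The condition from the failing assertion: base - size <= sp < base.
static bool stack_ptr_within_bounds() {
  unsigned char* base = nullptr;
  std::size_t size = 0;
  if (!thread_stack_bounds(&base, &size)) return false;
  const std::uintptr_t sp = stack_ptr_now();
  const std::uintptr_t top = reinterpret_cast<std::uintptr_t>(base);
  return sp >= top - size && sp < top;
}
```

The check compares only addresses. An incorrect stack-pointer read on an architecture the code does not handle is therefore enough to trip it even when the stack itself is fine. That is consistent with the reports that the runtime works once the assert is removed.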

@runlevel5
Copy link

A ppc64le implementation of Os::currentStackPtr is needed.

#145 is my naive implementation. I doubt ROCm will accept it, but I hope Linux distros will use it in their packaging.

As @Xeonacid pointed out earlier, Os::currentStackPtr is only used for the sanity-check assert, so in theory you could comment out this assertion and everything should work as expected.
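The assertion message and the __assert_fail frame in the backtrace show that this is an ordinary C/C++ assert. Besides commenting it out, any build that defines NDEBUG (for example via -DNDEBUG) compiles it away. The sketch below uses a hypothetical stand-in function rather than rocclr code to show that behaviour: with NDEBUG defined, the asserted expression is never evaluated. Note that this only silences the check and does not make Os::currentStackPtr correct.

```cpp
// Compile the "sanity check" the way a -DNDEBUG build would.
#define NDEBUG
#include <cassert>

static int g_evaluations = 0;

// Hypothetical stand-in for the failing stack-pointer comparison:
// it records that it ran and always reports failure.
static bool stack_check_passes() {
  ++g_evaluations;
  return false;
}

// With NDEBUG defined, assert(expr) expands to ((void)0), so the
// condition is never evaluated and nothing aborts.
static int run_sanity_check() {
  assert(stack_check_passes() && "just checking");
  return g_evaluations;  // still 0: the check never ran
}

// <cassert> may be included again; once NDEBUG is undefined, assert()
// is active for the rest of the file.
#undef NDEBUG
#include <cassert>
```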

@darkbasic
Author

@runlevel5 Thanks, but honestly, if this is AMD's attitude towards collaborative development, I'm not sure ppc64le support will ever be a thing.

@runlevel5

@darkbasic I can totally understand why they would not support anything but x86_64 for now. I think the best approach is to maintain a patchset downstream and convince major Linux distributions to ship it.

@darkbasic
Author

@runlevel5 @Xeonacid I think you should try submitting your patches to https://github.com/lamikr/rocm_sdk_builder.
It looks like they might accept them: https://www.phoronix.com/forums/forum/phoronix/latest-phoronix-articles/1532372-unofficial-rocm-sdk-builder-expanded-to-support-more-gpus#post1532436

@runlevel5

@darkbasic I am going to focus on introducing patches downstream in Fedora: https://src.fedoraproject.org/rpms/rocm-runtime. The maintainer is open to ppc64le changes and quite supportive.

@ratijas

ratijas commented Apr 2, 2025

OK, so what if I'm getting the exact same assertion error on a normal x86-64 machine while trying to run Insta360 Studio through Wine? It breaks right at launch, while "Probing system configuration".

$ WINEPREFIX=/home/ratijas/.wine wine 'C:\ProgramData\Microsoft\Windows\Start Menu\Programs\Insta360 Studio\Insta360 Studio.lnk'
[...]
Studio\Insta360 Studio.exe: /usr/src/debug/rocm-opencl-runtime/clr-rocm-6.3.2/rocclr/os/os_posix.cpp:321: static void amd::Os::currentStackInfo(unsigned char**, size_t*): Assertion `Os::currentStackPtr() >= *base - *size && Os::currentStackPtr() < *base && "just checking"' failed.

Operating System: Arch Linux
KDE Plasma Version: 6.3.3
KDE Frameworks Version: 6.11.0
Qt Version: 6.8.2
Kernel Version: 6.13.6-arch1-1 (64-bit)
Graphics Platform: X11
Processors: 16 × AMD Ryzen 9 7940HS w/ Radeon 780M Graphics
Memory: 60.6 GiB of RAM
Graphics Processor 1: AMD Radeon RX 7700S
Graphics Processor 2: AMD Radeon 780M
Manufacturer: Framework
Product Name: Laptop 16 (AMD Ryzen 7040 Series)
System Version: AJ

ROCm: 6.3.2
