
feat(ml): rocm #16613

Open: wants to merge 24 commits into base: main
Conversation

mertalev
Contributor

@mertalev mertalev commented Mar 5, 2025

Description

This PR introduces support for AMD GPUs through ROCm. It's a rebased version of #11063 with updated dependencies.

It also once again removes algo caching, as the concurrency issue with caching seems to be more subtle than originally thought. While disabling caching is wasteful (it essentially runs a benchmark every time instead of only once), it's still better than the current alternative of either lowering concurrency to 1 or not having ROCm support.

Zelnes and others added 7 commits March 5, 2025 09:37
use 3.12

use 1.19.2
guard algo benchmark results

mark mutex as mutable

re-add /bin/sh (?)

use 3.10

use 6.1.2
1.19.2

fix variable name

fix variable reference

aaaaaaaaaaaaaaaaaaaa
@mertalev mertalev requested a review from bo0tzz as a code owner March 5, 2025 14:46
@github-actions github-actions bot added documentation Improvements or additions to documentation 🧠machine-learning labels Mar 5, 2025
@mertalev mertalev added changelog:feature and removed documentation Improvements or additions to documentation labels Mar 5, 2025
@github-actions github-actions bot added the documentation Improvements or additions to documentation label Mar 5, 2025
Contributor

github-actions bot commented Mar 5, 2025

📖 Documentation deployed to pr-16613.preview.immich.app

steps:
- name: Login to GitHub Container Registry
@NicholasFlamy (Member) Mar 5, 2025

There are some changes in indentation, as well as changes from double quotes to single quotes. Was this intended? I know it's from the first commit of the original PR, but I don't think it was ever addressed.

mertalev (Contributor, Author)

VS Code did this when I saved. I'm not sure why it's different.

@NicholasFlamy (Member) Mar 5, 2025

Is there a PR check that runs prettier on the workflow files? I would think the inconsistency exists because there likely isn't.


WORKDIR /code

RUN apt-get update && apt-get install -y --no-install-recommends wget git python3.10-venv migraphx migraphx-dev half
@NicholasFlamy (Member) Mar 7, 2025

Suggested change
RUN apt-get update && apt-get install -y --no-install-recommends wget git python3.10-venv migraphx migraphx-dev half
RUN apt-get update && apt-get install -y --no-install-recommends wget git python3.10-venv migraphx-dev

Only migraphx-dev is needed, as the other two are dependencies of it.

Edit: don't change it now, though, because it's already building.

@@ -80,11 +111,14 @@ COPY --from=builder-armnn \
/opt/ann/build.sh \
/opt/armnn/

FROM rocm/dev-ubuntu-22.04:6.3.4-complete AS prod-rocm
Member

I know there were already comments on this, but I think copying the deps manually may result in a smaller, yet still working image. It might be worth re-investigating.

@@ -15,6 +15,34 @@ RUN mkdir /opt/armnn && \
cd /opt/ann && \
sh build.sh

# Warning: 25GiB+ disk space required to pull this image
# TODO: find a way to reduce the image size
FROM rocm/dev-ubuntu-22.04:6.3.4-complete AS builder-rocm

This comment was marked as resolved.

Member

Nope. Not it.

@przemekbialek

They're inconsistent, and they define "supported" as "our team will help you on GitHub with certain stuff"; anything not on the list may still work (e.g. Vega GPUs work fine), but they won't help you with it.

Yeah, but the official ROCm build will not work with gfx1103 at all: applications built against it (e.g. prebuilt PyTorch) will not work with gfx1103, and building against it for gfx1103 will not work either. I'm not sure what the exact steps are to get gfx1103 into ROCm, but I do know it requires a custom build/version of ROCm. And while, as you said, AMD's stance is "it may work but we won't help you out", that does not mean it will work without this custom ROCm build.

Edit: So my question would be, how does one check what's supported by the build they are running?

I'm not quite sure. On Fedora, the gfx1103 build is provided as a separate package and listed as a separate folder, but the officially supported gfx1102 falls under gfx1100 here, so it's not a reliable check:

$ ls /usr/lib64/rocm/
gfx10  gfx11  gfx1100  gfx1103  gfx8  gfx9  gfx90a  gfx942
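One rough way to answer "what's supported by the build I'm running" is to list the Tensile code-object files a rocBLAS install ships, one per gfx target. A hedged sketch: the default path below is an assumption based on the usual ROCm container layout (on Fedora the files sit under the per-arch folders shown above), so set `TENSILE_DIR` to match your install:

```shell
# Sketch: print the gfx targets a rocBLAS build ships, by scanning its
# TensileLibrary_lazy_gfx*.dat files. The path is an assumption (typical
# Ubuntu/ROCm container layout) and may differ on your distribution.
TENSILE_DIR="${TENSILE_DIR:-/opt/rocm/lib/rocblas/library}"
for f in "$TENSILE_DIR"/TensileLibrary_lazy_gfx*.dat; do
  [ -e "$f" ] || continue               # glob matched nothing
  arch="${f##*/TensileLibrary_lazy_}"   # strip directory and prefix
  echo "${arch%.dat}"                   # strip suffix -> e.g. gfx1102
done
```

Note this only tells you what rocBLAS itself was built for; as discussed below, onnxruntime has its own, shorter arch list.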

Fedora's rocBLAS patch for gfx1103 support looks like a copy of gfx1102 (navi33); only the names and ISA versions differ. I diffed a few of the files, and I think these are the only differences:

-- phoenix
-- gfx1103
-- [Device 1586]
+- navi33
+- gfx1102
+- [Device 73f0]
 - AllowNoFreeDims: false
   AssignedDerivedParameters: true
   Batched: true
@@ -112,7 +112,7 @@
     GroupLoadStore: false
     GuaranteeNoPartialA: false
     GuaranteeNoPartialB: false
-    ISA: [11, 0, 3]
+    ISA: [11, 0, 2]

I'm interested in additional GPU support because I have a mini PC with a Ryzen 8845HS (Radeon 780M) for testing, and a second one with a Ryzen 5825U.
I tried running the ghcr.io/immich-app/immich-machine-learning:pr-16613-rocm image with HSA_OVERRIDE_GFX_VERSION=11.0.0, but this setup crashes my card under heavy load (only the default Immich models work, and only when I run one type of job in a single thread). I read that for the 780M the best choice is gfx1102, but when I set HSA_OVERRIDE_GFX_VERSION=11.0.2 I get errors. I think that's because onnxruntime isn't compiled with support for this arch. I'm now trying to build machine-learning with ROCm onnxruntime support, with a small patch that I think enables gfx900 and gfx1102 support in onnxruntime; if and when the build completes, I will try it.

diff --git a/cmake/CMakeLists.txt b/cmake/CMakeLists.txt
index d90a2a355..bb1a7de12 100644
--- a/cmake/CMakeLists.txt
+++ b/cmake/CMakeLists.txt
@@ -295,7 +295,7 @@ if (onnxruntime_USE_ROCM)
   endif()

   if (NOT CMAKE_HIP_ARCHITECTURES)
-    set(CMAKE_HIP_ARCHITECTURES "gfx908;gfx90a;gfx1030;gfx1100;gfx1101;gfx940;gfx941;gfx942;gfx1200;gfx1201")
+    set(CMAKE_HIP_ARCHITECTURES "gfx900;gfx908;gfx90a;gfx1030;gfx1100;gfx1101;gfx1102;gfx940;gfx941;gfx942;gfx1200;gfx1201")
   endif()

   file(GLOB rocm_cmake_components ${onnxruntime_ROCM_HOME}/lib/cmake/*)

@SharkWipf

but this setup crashes my card under heavy load

My 780m locks up my desktop roughly 50% of the time when using ROCm llama.cpp/whisper.cpp with any ROCm version (1100, 1102, 1103). I'd hoped it would be less of an issue headless or with different applications, but if you have the same issue with Immich that does not bode well...

@NicholasFlamy
Member

HSA_OVERRIDE_GFX_VERSION=11.0.2

This is not a valid version from what I've observed. So far, there are only 3 valid options:

HSA_OVERRIDE_GFX_VERSION=11.0.0
HSA_OVERRIDE_GFX_VERSION=10.3.0
HSA_OVERRIDE_GFX_VERSION=9.0.0

@przemekbialek

but this setup crashes my card under heavy load

My 780m locks up my desktop roughly 50% of the time when using ROCm llama.cpp/whisper.cpp with any ROCm version (1100, 1102, 1103). I'd hoped it would be less of an issue headless or with different applications, but if you have the same issue with Immich that does not bode well...

Unfortunately, adding support for gfx1102 doesn't solve the crashing problems on the Radeon 780M, but I'm happy because I succeeded in getting it to work on the Ryzen 5825U's GPU.

@NicholasFlamy
Member

Radeon 780M

They also specifically say certain iGPUs crash. I would bet that they're just bleeding edge.

Ryzen 5825U GPU

That model or similar is known to work.

@przemekbialek

przemekbialek commented Mar 9, 2025

HSA_OVERRIDE_GFX_VERSION=11.0.2

This is not a valid version from what I've observed. So far, there are only 3 valid options:

HSA_OVERRIDE_GFX_VERSION=11.0.0
HSA_OVERRIDE_GFX_VERSION=10.3.0
HSA_OVERRIDE_GFX_VERSION=9.0.0

The ROCm in the image created by this PR is compiled for the arches listed below, so 11.0.2 is a valid option, because it means gfx1102. Here is a directory listing from the image:

-rw-r--r-- 1 root root     17653 Dec 11 10:06 TensileLibrary_lazy_gfx1010.dat
-rw-r--r-- 1 root root     17653 Dec 11 10:06 TensileLibrary_lazy_gfx1012.dat
-rw-r--r-- 1 root root     23026 Dec 11 10:06 TensileLibrary_lazy_gfx1030.dat
-rw-r--r-- 1 root root     24186 Dec 11 10:06 TensileLibrary_lazy_gfx1100.dat
-rw-r--r-- 1 root root     24186 Dec 11 10:06 TensileLibrary_lazy_gfx1101.dat
-rw-r--r-- 1 root root     24186 Dec 11 10:06 TensileLibrary_lazy_gfx1102.dat
-rw-r--r-- 1 root root     17653 Dec 11 10:06 TensileLibrary_lazy_gfx1151.dat
-rw-r--r-- 1 root root     17653 Dec 11 10:06 TensileLibrary_lazy_gfx1200.dat
-rw-r--r-- 1 root root     17653 Dec 11 10:06 TensileLibrary_lazy_gfx1201.dat
-rw-r--r-- 1 root root     26537 Dec 11 10:06 TensileLibrary_lazy_gfx900.dat
-rw-r--r-- 1 root root     31798 Dec 11 10:06 TensileLibrary_lazy_gfx906.dat
-rw-r--r-- 1 root root     34732 Dec 11 10:06 TensileLibrary_lazy_gfx908.dat
-rw-r--r-- 1 root root     62265 Dec 11 10:06 TensileLibrary_lazy_gfx90a.dat
-rw-r--r-- 1 root root     58949 Dec 11 10:06 TensileLibrary_lazy_gfx942.dat

Without a patch to onnxruntime, HSA_OVERRIDE_GFX_VERSION=9.0.0 isn't a valid option in immich-machine-learning, because that arch isn't compiled by default.
By default, onnxruntime builds for these arches:

set(CMAKE_HIP_ARCHITECTURES "gfx908;gfx90a;gfx1030;gfx1100;gfx1101;gfx940;gfx941;gfx942;gfx1200;gfx1201")
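As an aside, the override string appears to be just the gfx target name split into `<major>.<minor>.<step>`, with the trailing step character read as a hex digit; that's why 11.0.2 corresponds to gfx1102. A sketch of this observed convention (`gfx_to_override` is a made-up helper name, and the mapping is inferred from the examples in this thread, not taken from AMD documentation):

```shell
# Sketch of the apparent naming convention: gfx<major><minor><step>, where the
# trailing step character is a hex digit (so gfx90a would read as 9.0.10).
# Observed behavior only; gfx_to_override is a hypothetical helper name.
gfx_to_override() {
  rest="${1#gfx}"                                # e.g. 1102, 900, 90a
  step=$(printf '%d' "0x${rest#"${rest%?}"}")    # last char, hex -> decimal
  rest="${rest%?}"
  minor="${rest#"${rest%?}"}"                    # next-to-last char
  major="${rest%?}"                              # leading digit(s)
  echo "${major}.${minor}.${step}"
}

gfx_to_override gfx1102   # 11.0.2
gfx_to_override gfx900    # 9.0.0
gfx_to_override gfx90a    # 9.0.10
```

Whether a given override actually works still depends on that arch being present in both rocBLAS and the onnxruntime build, as discussed above.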

@przemekbialek

I created my image from commit 1c550fa with the following changes:

diff --git a/machine-learning/Dockerfile b/machine-learning/Dockerfile
index f3b7c60c4..253430cac 100644
--- a/machine-learning/Dockerfile
+++ b/machine-learning/Dockerfile
@@ -1,4 +1,4 @@
-ARG DEVICE=cpu
+ARG DEVICE=rocm

 FROM python:3.11-bookworm@sha256:68a8863d0625f42d47e0684f33ca02f19d6094ef859a8af237aaf645195ed477 AS builder-cpu

@@ -36,10 +36,12 @@ WORKDIR /code/onnxruntime
 # TODO: find a way to fix this without disabling algo caching
 COPY ./patches/0001-disable-rocm-conv-algo-caching.patch /tmp/
 RUN git apply /tmp/0001-disable-rocm-conv-algo-caching.patch
+COPY ./patches/onnxruntime_add_gfx900_gfx1102.patch /tmp/
+RUN git apply /tmp/onnxruntime_add_gfx900_gfx1102.patch

 RUN /bin/sh ./dockerfiles/scripts/install_common_deps.sh
 # Note: the `parallel` setting uses a substantial amount of RAM
-RUN ./build.sh --allow_running_as_root --config Release --build_wheel --update --build --parallel 17 --cmake_extra_defines\
+RUN ./build.sh --allow_running_as_root --config Release --build_wheel --update --build --parallel 16 --cmake_extra_defines\
     ONNXRUNTIME_VERSION=1.20.1 --use_rocm --rocm_home=/opt/rocm
 RUN mv /code/onnxruntime/build/Linux/Release/dist/*.whl /opt/

diff --git a/machine-learning/patches/onnxruntime_add_gfx900_gfx1102.patch b/machine-learning/patches/onnxruntime_add_gfx900_gfx1102.patch
new file mode 100644
index 000000000..81bbdb3d6
--- /dev/null
+++ b/machine-learning/patches/onnxruntime_add_gfx900_gfx1102.patch
@@ -0,0 +1,13 @@
+diff --git a/cmake/CMakeLists.txt b/cmake/CMakeLists.txt
+index d90a2a355..bb1a7de12 100644
+--- a/cmake/CMakeLists.txt
++++ b/cmake/CMakeLists.txt
+@@ -295,7 +295,7 @@ if (onnxruntime_USE_ROCM)
+   endif()
+
+   if (NOT CMAKE_HIP_ARCHITECTURES)
+-    set(CMAKE_HIP_ARCHITECTURES "gfx908;gfx90a;gfx1030;gfx1100;gfx1101;gfx940;gfx941;gfx942;gfx1200;gfx1201")
++    set(CMAKE_HIP_ARCHITECTURES "gfx900;gfx908;gfx90a;gfx1030;gfx1100;gfx1101;gfx1102;gfx940;gfx941;gfx942;gfx1200;gfx1201")
+   endif()
+
+   file(GLOB rocm_cmake_components ${onnxruntime_ROCM_HOME}/lib/cmake/*)

Probably a better way is to set CMAKE_HIP_ARCHITECTURES in the Dockerfile, but I didn't know how to do that, so I brute-forced it ;)

@NicholasFlamy
Member

NicholasFlamy commented Mar 9, 2025

Alright, so I learned something. Some gfx versions such as gfx1031 (RX 6700 XT, my GPU) are bundled with other versions such as gfx1030, while others are not. I had thought there were only 3 bundles, but now I know there are more.

This PR should have these versions as of ghcr.io/immich-app/immich-machine-learning:commit-ded3bbb033615385a2339e27171b6ab682a31df9-rocm:

-rw-r--r-- 1 root root     17653 Dec 11 10:06 TensileLibrary_lazy_gfx1010.dat
-rw-r--r-- 1 root root     17653 Dec 11 10:06 TensileLibrary_lazy_gfx1012.dat
-rw-r--r-- 1 root root     23026 Dec 11 10:06 TensileLibrary_lazy_gfx1030.dat
-rw-r--r-- 1 root root     24186 Dec 11 10:06 TensileLibrary_lazy_gfx1100.dat
-rw-r--r-- 1 root root     24186 Dec 11 10:06 TensileLibrary_lazy_gfx1101.dat
-rw-r--r-- 1 root root     24186 Dec 11 10:06 TensileLibrary_lazy_gfx1102.dat
-rw-r--r-- 1 root root     17653 Dec 11 10:06 TensileLibrary_lazy_gfx1151.dat
-rw-r--r-- 1 root root     17653 Dec 11 10:06 TensileLibrary_lazy_gfx1200.dat
-rw-r--r-- 1 root root     17653 Dec 11 10:06 TensileLibrary_lazy_gfx1201.dat
-rw-r--r-- 1 root root     26537 Dec 11 10:06 TensileLibrary_lazy_gfx900.dat
-rw-r--r-- 1 root root     31798 Dec 11 10:06 TensileLibrary_lazy_gfx906.dat
-rw-r--r-- 1 root root     34732 Dec 11 10:06 TensileLibrary_lazy_gfx908.dat
-rw-r--r-- 1 root root     62265 Dec 11 10:06 TensileLibrary_lazy_gfx90a.dat
-rw-r--r-- 1 root root     58949 Dec 11 10:06 TensileLibrary_lazy_gfx942.dat
Edit: I also experimented on my system with a Python script, setting different values:

HSA_OVERRIDE_GFX_VERSION=10.3.1:

rocBLAS error: Cannot read /home/nicholas/development/image-rotation-AI-classification/resnet-ixion/.venv/lib/python3.10/site-packages/torch/lib/rocblas/library/TensileLibrary.dat: Illegal seek for GPU arch : gfx1031
 List of available TensileLibrary Files : 
"/home/nicholas/development/image-rotation-AI-classification/resnet-ixion/.venv/lib/python3.10/site-packages/torch/lib/rocblas/library/TensileLibrary_lazy_gfx1030.dat"
"/home/nicholas/development/image-rotation-AI-classification/resnet-ixion/.venv/lib/python3.10/site-packages/torch/lib/rocblas/library/TensileLibrary_lazy_gfx90a.dat"
"/home/nicholas/development/image-rotation-AI-classification/resnet-ixion/.venv/lib/python3.10/site-packages/torch/lib/rocblas/library/TensileLibrary_lazy_gfx906.dat"
"/home/nicholas/development/image-rotation-AI-classification/resnet-ixion/.venv/lib/python3.10/site-packages/torch/lib/rocblas/library/TensileLibrary_lazy_gfx900.dat"
"/home/nicholas/development/image-rotation-AI-classification/resnet-ixion/.venv/lib/python3.10/site-packages/torch/lib/rocblas/library/TensileLibrary_lazy_gfx1201.dat"
"/home/nicholas/development/image-rotation-AI-classification/resnet-ixion/.venv/lib/python3.10/site-packages/torch/lib/rocblas/library/TensileLibrary_lazy_gfx1100.dat"
"/home/nicholas/development/image-rotation-AI-classification/resnet-ixion/.venv/lib/python3.10/site-packages/torch/lib/rocblas/library/TensileLibrary_lazy_gfx1200.dat"
"/home/nicholas/development/image-rotation-AI-classification/resnet-ixion/.venv/lib/python3.10/site-packages/torch/lib/rocblas/library/TensileLibrary_lazy_gfx942.dat"
"/home/nicholas/development/image-rotation-AI-classification/resnet-ixion/.venv/lib/python3.10/site-packages/torch/lib/rocblas/library/TensileLibrary_lazy_gfx1101.dat"
"/home/nicholas/development/image-rotation-AI-classification/resnet-ixion/.venv/lib/python3.10/site-packages/torch/lib/rocblas/library/TensileLibrary_lazy_gfx908.dat"

HSA_OVERRIDE_GFX_VERSION=11.0.2, or anything else on the list, resulted in my system locking up, because AMD.

@przemekbialek

przemekbialek commented Mar 9, 2025

Alright, so I learned something. Some gfx versions such as gfx1031 (RX 6700 XT, my GPU) are bundled with other versions such as gfx1030, while others are not. I had thought there were only 3 bundles, but now I know there are more.

Some arches are built into ROCm and then work without HSA_OVERRIDE_GFX_VERSION. If your arch isn't supported, you may try an arch override. Sometimes it works.

This PR should have these versions as of ghcr.io/immich-app/immich-machine-learning:commit-ded3bbb033615385a2339e27171b6ab682a31df9-rocm

Yes. Support in ROCm doesn't mean that there is support in other software. I added only two for myself, because those are the ones I can test on my hardware. In onnxruntime, the default arch list is shorter than in ROCm. Adding two arches to my image adds about 0.1 GB to the size of the original one:

immich-machine-learning-rocm-agfx            latest                                                 9bd3da10dee5   2 hours ago    31.9GB
ghcr.io/immich-app/immich-machine-learning   commit-ded3bbb033615385a2339e27171b6ab682a31df9-rocm   9f0b5a801e8a   3 days ago     31.8GB

@NicholasFlamy
Member

Probably a better way is to set CMAKE_HIP_ARCHITECTURES in the Dockerfile, but I didn't know how to do that, so I brute-forced it ;)

I might try that. I could pass that into the Dockerfile.

@przemekbialek

but this setup crashes my card under heavy load

My 780m locks up my desktop roughly 50% of the time when using ROCm llama.cpp/whisper.cpp with any ROCm version (1100, 1102, 1103). I'd hoped it would be less of an issue headless or with different applications, but if you have the same issue with Immich that does not bode well...

Finally, I found a workaround for the 780M. When I set the HSA_USE_SVM=0 environment variable, the crashes are gone. I can use HSA_OVERRIDE_GFX_VERSION=11.0.2 or HSA_OVERRIDE_GFX_VERSION=11.0.0, and everything works in immich-machine-learning as expected. I can set concurrency for Smart Search and Face Detection to 5 and set a bigger multilingual model for Smart Search, and it all works, but I don't use a graphical environment on this machine.
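To make the workaround concrete, here is a hypothetical compose fragment; the service name and device passthrough follow the stock Immich ROCm setup, and the image tag is the PR build discussed above. Adjust the override value for your GPU:

```yaml
services:
  immich-machine-learning:
    image: ghcr.io/immich-app/immich-machine-learning:pr-16613-rocm
    devices:
      - /dev/kfd   # ROCm compute interface
      - /dev/dri   # GPU render nodes
    environment:
      - HSA_OVERRIDE_GFX_VERSION=11.0.2  # or 11.0.0; both reported working on the 780M
      - HSA_USE_SVM=0                    # workaround for iGPU crashes under heavy load
```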

@NicholasFlamy
Member

NicholasFlamy commented Mar 10, 2025

When I set HSA_USE_SVM=0 environment variable crashes are gone.

I might have to try that on my RX 6700 XT lol.

Edit: It still has a few buggy things when running this PR.

@przemekbialek

I tested this PR once again. I used the latest commit with the newer ROCm version, but rolled back the MIGraphX changes, because they crash for me. I also added the gfx900 and gfx1102 arches to onnxruntime, this time the proper way. Below are the changes I made:

diff --git a/machine-learning/Dockerfile b/machine-learning/Dockerfile
index 216b15fca..51f60ab8c 100644
--- a/machine-learning/Dockerfile
+++ b/machine-learning/Dockerfile
@@ -1,4 +1,4 @@
-ARG DEVICE=cpu
+ARG DEVICE=rocm

 FROM python:3.11-bookworm@sha256:68a8863d0625f42d47e0684f33ca02f19d6094ef859a8af237aaf645195ed477 AS builder-cpu

@@ -40,7 +40,7 @@ RUN git apply /tmp/0001-disable-rocm-conv-algo-caching.patch
 RUN /bin/sh ./dockerfiles/scripts/install_common_deps.sh
 # Note: the `parallel` setting uses a substantial amount of RAM
 RUN ./build.sh --allow_running_as_root --config Release --build_wheel --update --build --parallel 17 --cmake_extra_defines\
-    ONNXRUNTIME_VERSION=1.20.1 --skip_tests --use_migraphx --migraphx_home=/opt/rocm
+    ONNXRUNTIME_VERSION=1.20.1 CMAKE_HIP_ARCHITECTURES="gfx900;gfx908;gfx90a;gfx1030;gfx1100;gfx1101;gfx1102;gfx940;gfx941;gfx942;gfx1200;gfx1201" --skip_tests --use_rocm --rocm_home=/opt/rocm
 RUN mv /code/onnxruntime/build/Linux/Release/dist/*.whl /opt/

 FROM builder-${DEVICE} AS builder
@@ -118,7 +118,7 @@ FROM prod-${DEVICE} AS prod
 ARG DEVICE

 RUN apt-get update && \
-    apt-get install -y --no-install-recommends tini $(if ! [ "$DEVICE" = "openvino" ] && ! [ "$DEVICE" = "rocm" ]; then echo "libmimalloc2.0"; fi) $(if [ "$DEVICE" = "rocm" ]; then echo "migraphx"; fi) && \
+    apt-get install -y --no-install-recommends tini $(if ! [ "$DEVICE" = "openvino" ] && ! [ "$DEVICE" = "rocm" ]; then echo "libmimalloc2.0"; fi) && \
     apt-get autoremove -yqq && \
     apt-get clean && \
     rm -rf /var/lib/apt/lists/*
diff --git a/machine-learning/app/models/constants.py b/machine-learning/app/models/constants.py
index 5824cd6c5..43088741b 100644
--- a/machine-learning/app/models/constants.py
+++ b/machine-learning/app/models/constants.py
@@ -65,7 +65,7 @@ _INSIGHTFACE_MODELS = {

 SUPPORTED_PROVIDERS = [
     "CUDAExecutionProvider",
-    "MIGraphXExecutionProvider",
+    "ROCMExecutionProvider",
     "OpenVINOExecutionProvider",
     "CPUExecutionProvider",
 ]
diff --git a/machine-learning/app/sessions/ort.py b/machine-learning/app/sessions/ort.py
index 00c7ad50a..d15f2d354 100644
--- a/machine-learning/app/sessions/ort.py
+++ b/machine-learning/app/sessions/ort.py
@@ -88,7 +88,7 @@ class OrtSession:
             match provider:
                 case "CPUExecutionProvider":
                     options = {"arena_extend_strategy": "kSameAsRequested"}
-                case "CUDAExecutionProvider":
+                case "CUDAExecutionProvider" | "ROCMExecutionProvider":
                     options = {"arena_extend_strategy": "kSameAsRequested", "device_id": settings.device_id}
                 case "OpenVINOExecutionProvider":
                     options = {

The configuration above was tested on the hardware listed below:

  • Radeon RX 6800 XT (gfx1030): everything works without tinkering.

  • Ryzen 8845HS with Radeon 780M (gfx1103): I used the HSA_OVERRIDE_GFX_VERSION=11.0.2 and HSA_OVERRIDE_GFX_VERSION=11.0.0 environment variables; to work around the crashes, I had to set HSA_USE_SVM=0.

  • Ryzen 5825U (gfx90c): I used the HSA_OVERRIDE_GFX_VERSION=9.0.0 environment variable.

When everything is set as above, I have no problems running Smart Search and Facial Detection in parallel with job concurrency set to 5 (2 on the Ryzen 5825U) for both jobs. I also tested with bigger models, and that works too.
All tests ran on the same collection: 1,068 photos (3 GiB) and 144 videos (75 GiB).


On the MIGraphX image, ML workers crashed instantly, even on the Radeon RX 6800 XT:

immich_machine_learning  | [03/10/25 20:44:56] ERROR    Worker (pid:542) was sent code 139!

In dmesg I see:

[ 2309.167940] gunicorn[27327]: segfault at 0 ip 00007f9405d7231c sp 00007f94337f9710 error 4 in libmigraphx.so.2011000.0.60304[1fb531c,7f9405a48000+1c0e000] likely on CPU 22 (core 6, socket 0)
[ 2309.167950] Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc 55 41 57 41 56 53 48 83 ec 38 48 89 d3 49 89 fe 48 8d 7a 10 e8 c7 37 84 01 48 8b 30 <48> 8b 06 4c 8d 7c 24 08 4c 89 ff ff 90 a0 00 00 00 4c 89 ff 4c 89
[ 2324.523610] gunicorn[27881]: segfault at 27 ip 00007f93664586ba sp 00007f94413f49f0 error 4 in libmigraphx.so.2011000.0.60304[1d976ba,7f936634c000+1c0e000] likely on CPU 5 (core 5, socket 0)
[ 2324.523619] Code: 48 8d 47 68 c3 66 66 2e 0f 1f 84 00 00 00 00 00 48 89 f8 c3 66 66 66 2e 0f 1f 84 00 00 00 00 00 53 48 89 fb 48 8b 36 48 8b 06 <ff> 50 20 48 89 d8 5b c3 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00
[ 2339.303738] gunicorn[28525]: segfault at 20 ip 00007f9364b726ba sp 00007f942bff99f0 error 4 in libmigraphx.so.2011000.0.60304[1d976ba,7f9364a66000+1c0e000] likely on CPU 1 (core 1, socket 0)
[ 2339.303747] Code: 48 8d 47 68 c3 66 66 2e 0f 1f 84 00 00 00 00 00 48 89 f8 c3 66 66 66 2e 0f 1f 84 00 00 00 00 00 53 48 89 fb 48 8b 36 48 8b 06 <ff> 50 20 48 89 d8 5b c3 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00

@NicholasFlamy
Member

On the MIGraphX image, ML workers crashed instantly, even on the Radeon RX 6800 XT:

Hmm, okay, do you have MIGraphX installed on the host? For me it doesn't even get there, because it errors out about the MIGraphX dependency.

@przemekbialek

przemekbialek commented Mar 10, 2025

On the MIGraphX image, ML workers crashed instantly, even on the Radeon RX 6800 XT:

Hmm, okay, do you have MIGraphX installed on the host? For me it doesn't even get there, because it errors out about the MIGraphX dependency.

No. I don't know MIGraphX, and I assumed that everything needed is in the Docker image.

I found a command to test the MIGraphX installation, and it runs in the Docker image without problems:

Running [ MIGraphX Version: 2.11.0.4b20cbc9 ]: /opt/rocm-6.3.4/bin/migraphx-driver perf --test
Compiling ...
module: "main"
@0 = check_context::migraphx::gpu::context -> float_type, {}, {}
output = @param:output -> float_type, {4, 3}, {3, 1}
b = @param:b -> float_type, {5, 3}, {3, 1}
a = @param:a -> float_type, {4, 5}, {5, 1}
@4 = gpu::code_object[code_object=4328,symbol_name=mlir_dot,global=128,local=128,](a,b,output) -> float_type, {4, 3}, {3, 1}


Allocating params ...
Running performance report ...
@0 = check_context::migraphx::gpu::context -> float_type, {}, {}: 0.00036214ms, 2%
output = @param:output -> float_type, {4, 3}, {3, 1}: 0.00027652ms, 2%
b = @param:b -> float_type, {5, 3}, {3, 1}: 0.00025078ms, 2%
a = @param:a -> float_type, {4, 5}, {5, 1}: 0.00025372ms, 2%
@4 = gpu::code_object[code_object=4328,symbol_name=mlir_dot,global=128,local=128,](a,b,output) -> float_type, {4, 3}, {3, 1}: 0.0187583ms, 95%

Summary:
gpu::code_object::mlir_dot: 0.0187583ms / 1 = 0.0187583ms, 95%
@param: 0.00078102ms / 3 = 0.00026034ms, 4%
check_context::migraphx::gpu::context: 0.00036214ms / 1 = 0.00036214ms, 2%

Batch size: 1
Rate: 51131.3 inferences/sec
Total time: 0.0195575ms
Total instructions time: 0.0199014ms
Overhead time: 0.0006051ms, -0.00034396ms
Overhead: 3%, -2%
[ MIGraphX Version: 2.11.0.4b20cbc9 ] Complete: /opt/rocm-6.3.4/bin/migraphx-driver perf --test

I found that Smart Search alone works with MIGraphX, even with concurrency 5. The problem is with Face Detection.

@NicholasFlamy
Member

I found that Smart Search alone works with MIGraphX, even with concurrency 5.

Oh wow, good job getting it running!

The problem is with Face Detection.

I have problems with Face Detection with regular ROCm too.

@przemekbialek

I found that Smart Search alone works with MIGraphX, even with concurrency 5.

Oh wow, good job getting it running!

Thanks go to @mertalev. I only pulled image from this PR and tested it.

The problem is with Face Detection.

I have problems with Face Detection with regular ROCm too.

What problems do you have?

@ricklahaye

ricklahaye commented Mar 11, 2025

I found that Smart Search alone works with MIGraphX, even with concurrency 5.

Oh wow, good job getting it running!

Thanks go to @mertalev. I only pulled image from this PR and tested it.

The problem is with Face Detection.

I have problems with Face Detection with regular ROCm too.

What problems do you have?

Smart Search seems to run correctly.

Face detection fails:

immich_machine_learning  | [03/11/25 16:50:25] INFO     Setting execution providers to
immich_machine_learning  |                              ['MIGraphXExecutionProvider',
immich_machine_learning  |                              'CPUExecutionProvider'], in descending order of
immich_machine_learning  |                              preference
immich_machine_learning  | [03/11/25 16:50:25] DEBUG    Setting execution provider options to [{},
immich_machine_learning  |                              {'arena_extend_strategy': 'kSameAsRequested'}]
immich_machine_learning  | [03/11/25 16:50:25] DEBUG    Setting execution_mode to ORT_SEQUENTIAL
immich_machine_learning  | [03/11/25 16:50:25] DEBUG    Setting inter_op_num_threads to 0
immich_machine_learning  | [03/11/25 16:50:25] DEBUG    Setting intra_op_num_threads to 0
immich_machine_learning  | [03/11/25 16:50:29] ERROR    Worker (pid:9) was sent code 139!
immich_machine_learning  | [03/11/25 16:50:29] INFO     Booting worker with pid: 79
immich_machine_learning  | [03/11/25 16:50:30] DEBUG    Could not load ANN shared libraries, using ONNX:
immich_machine_learning  |                              libmali.so: cannot open shared object file: No such
immich_machine_learning  |                              file or directory
immich_machine_learning  | [03/11/25 16:50:31] INFO     Started server process [79]
immich_machine_learning  | [03/11/25 16:50:31] INFO     Waiting for application startup.

@mertalev
Contributor Author

mertalev commented Mar 11, 2025

For those saying Smart Search runs correctly with MIGraphX, can you confirm that GPU utilization is high? (It's okay for it to initially have high CPU utilization due to compilation, but it should eventually be primarily GPU utilization.)

@ricklahaye

ricklahaye commented Mar 11, 2025

For those saying smart search runs correctly with MIGraphX, can you confirm that GPU utilization is high (it's okay for it to initially have high CPU utilization due to compilation, but it should eventually be primarily GPU utilization).

I checked, and it's indeed using the CPU and not the GPU! Apologies for my previous statement saying 'it works'.
I don't see any GPU utilization at all during Smart Search.

@przemekbialek

For those saying smart search runs correctly with MIGraphX, can you confirm that GPU utilization is high (it's okay for it to initially have high CPU utilization due to compilation, but it should eventually be primarily GPU utilization).

I checked, and it's indeed using the CPU and not the GPU! Apologies for my previous statement saying 'it works'. I don't see any GPU utilization at all during Smart Search.

Same for me. Only CPU utilization on migraphx image.

@NicholasFlamy
Member

For those saying smart search runs correctly with MIGraphX, can you confirm that GPU utilization is high (it's okay for it to initially have high CPU utilization due to compilation, but it should eventually be primarily GPU utilization).

I checked, and it's indeed using the CPU and not the GPU! Apologies for my previous statement saying 'it works'. I don't see any GPU utilization at all during Smart Search.

Same for me. Only CPU utilization on migraphx image.

Yes, this was my issue. The error I had was resolved by a commit that mert added after I hit it. I forgot that the error had been fixed, and then it was only using the CPU.

@ricklahaye

I do want to add for everyone that, when trying the later commit, hardware acceleration works for Smart Search and Face Detection!

I tried the original image/PR ghcr.io/immich-app/immich-machine-learning:pr-16613-rocm and that did not work, but later commits did work for me!

The one I used was ghcr.io/immich-app/immich-machine-learning:commit-ded3bbb033615385a2339e27171b6ab682a31df9-rocm

I did add the following environment variables: HSA_OVERRIDE_GFX_VERSION=11.0.0 and HSA_USE_SVM=0

@NicholasFlamy
Member

NicholasFlamy commented Mar 11, 2025

The one I used was ghcr.io/immich-app/immich-machine-learning:commit-ded3bbb033615385a2339e27171b6ab682a31df9-rocm

Yep, that version works, but have you tried 100 images with Smart Search and Face Detection running at the same time? I hit a deadlock doing that.

Also, what GPU and what OS are you running?

@przemekbialek

For those saying smart search runs correctly with MIGraphX, can you confirm that GPU utilization is high (it's okay for it to initially have high CPU utilization due to compilation, but it should eventually be primarily GPU utilization).

I tried to run some tests with the MIGraphX image: I ran docker exec into it, then pip install transformers psutil torch, and then ran:

python3 -m onnxruntime.transformers.benchmark -g -m bert-base-cased --provider migraphx

This command fails with the following error:

Please install onnxruntime-gpu or onnxruntime-directml package instead of onnxruntime, and use a machine with GPU for testing gpu performance.

I'm not an expert and don't know what I'm doing, but maybe this will help you. :D

Labels
changelog:feature documentation Improvements or additions to documentation 🧠machine-learning
9 participants