feat(ml): rocm #16613
base: main
Conversation
Commits:
- use OrtMutex
- use 3.12
- use 1.19.2
- guard algo benchmark results
- mark mutex as mutable
- re-add /bin/sh (?)
- use 3.10
- use 6.1.2
- 1.19.2
- fix variable name
- fix variable reference
- aaaaaaaaaaaaaaaaaaaa
📖 Documentation deployed to pr-16613.preview.immich.app
steps:
  - name: Login to GitHub Container Registry
There are some changes in indentation as well as changes from double quotes to single quotes. Was this intended? I know it's from the first commit of the original PR, but I don't think that was ever addressed.
VS Code did this when I saved. I'm not sure why it's different
Is there a PR check that runs prettier on the workflow files? I would think the inconsistency exists because there likely isn't.
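For reference, a minimal check of that kind could look like the job below. This is hypothetical — the job name, trigger, and glob are my assumptions, and the repo's actual CI setup may differ:

```yaml
# Hypothetical job: fails the PR if workflow files drift from prettier's style.
name: lint-workflows
on: pull_request
jobs:
  prettier:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npx prettier --check ".github/workflows/**/*.{yml,yaml}"
```

With a check like this in place, editor-on-save reformatting (like the VS Code change above) would surface as a CI failure instead of landing silently.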
WORKDIR /code

RUN apt-get update && apt-get install -y --no-install-recommends wget git python3.10-venv migraphx migraphx-dev half
Suggested change:

```diff
-RUN apt-get update && apt-get install -y --no-install-recommends wget git python3.10-venv migraphx migraphx-dev half
+RUN apt-get update && apt-get install -y --no-install-recommends wget git python3.10-venv migraphx-dev
```
Only migraphx-dev is needed, as the other two are dependencies.
Edit: don't change it now, though, because it's already building.
@@ -80,11 +111,14 @@ COPY --from=builder-armnn \
    /opt/ann/build.sh \
    /opt/armnn/

FROM rocm/dev-ubuntu-22.04:6.3.4-complete AS prod-rocm
I know there were already comments on this, but I think copying the deps manually may result in a smaller, yet still working image. It might be worth re-investigating.
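One possible shape for that, sketched below. The paths are assumptions for illustration — the actual set of runtime files the ROCm execution provider needs would have to be verified:

```dockerfile
# Sketch only: library paths are assumptions, not verified against the image.
FROM rocm/dev-ubuntu-22.04:6.3.4-complete AS rocm-dev

FROM ubuntu:22.04 AS prod-rocm
# Copy only the runtime libraries instead of inheriting the 25GiB+ dev image.
COPY --from=rocm-dev /opt/rocm/lib /opt/rocm/lib
ENV LD_LIBRARY_PATH=/opt/rocm/lib
```

The risk is missing a transitively required file, so this would need testing against real inference workloads before replacing the current base image.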
@@ -15,6 +15,34 @@ RUN mkdir /opt/armnn && \
    cd /opt/ann && \
    sh build.sh

# Warning: 25GiB+ disk space required to pull this image
# TODO: find a way to reduce the image size
FROM rocm/dev-ubuntu-22.04:6.3.4-complete AS builder-rocm
Nope. Not it.
The Fedora rocBLAS patch for gfx1103 support looks like a copy of the gfx1102 (navi33) one; only the names and ISA versions differ. I diffed a few of the files and I think those are the only differences.
I'm interested in additional GPU support because I have a mini PC with a Ryzen 8845HS (Radeon 780M) for testing, and a second one with a Ryzen 5825U.
My 780M locks up my desktop roughly 50% of the time when using ROCm llama.cpp/whisper.cpp with any ROCm version (1100, 1102, 1103). I'd hoped it would be less of an issue headless or with different applications, but if you have the same issue with Immich, that does not bode well...
This is not a valid version from what I've observed. So far, there are only 3 valid options:
Unfortunately, adding support for gfx1102 doesn't solve the crashing problems on the Radeon 780M, but I'm happy because I succeeded in getting it to work on the Ryzen 5825U GPU.
They also specifically say certain iGPUs crash. I would bet that they're just bleeding edge.
That model or similar is known to work.
The ROCm in the image created by this PR is compiled for the archs below, so 11.0.2 is a valid option because it means gfx1102. Below is a directory listing from the image.
Without a patch to onnxruntime, HSA_OVERRIDE_GFX_VERSION=9.0.0 isn't a valid option in immich-machine-learning, because that arch isn't compiled by default.
I created my image from commit 1c550fa with the following changes:
Probably a better way is to set CMAKE_HIP_ARCHITECTURES in the Dockerfile, but I didn't know how to do it, so I brute-forced it ;)
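A hedged sketch of the CMAKE_HIP_ARCHITECTURES approach: the ROCM_ARCHES build arg is hypothetical, and this assumes onnxruntime's build.sh forwards --cmake_extra_defines to CMake (not verified against this PR's actual Dockerfile):

```dockerfile
# Sketch only: ROCM_ARCHES is a hypothetical build arg; adjust the list to the
# GPUs you want to support (e.g. add gfx900 / gfx1102 as described above).
ARG ROCM_ARCHES="gfx900;gfx1030;gfx1100;gfx1101;gfx1102"

# Assumes onnxruntime's build.sh passes --cmake_extra_defines through to CMake.
RUN ./build.sh --config Release --use_rocm --rocm_home /opt/rocm \
    --cmake_extra_defines CMAKE_HIP_ARCHITECTURES="${ROCM_ARCHES}"
```

This would let image variants be built for different arch sets without patching source files by hand.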
Alright, so I learned something. Some gfx versions such as This PR should have these versions as of
Edit: I did also experiment on my system with a Python script, setting different values:
Some archs are built into ROCm and then work without HSA_OVERRIDE_GFX_VERSION. If your arch isn't supported, then you may try the arch override. Sometimes it works.
Yes. Support in ROCm doesn't mean that there is support in other software. I added only two archs for myself, because those are the ones I can test on my hardware. In ONNX Runtime, the default arch list is shorter than in ROCm. Adding two archs to my image added about 0.1 GB to the size of the original one:
I might try that. I could pass that into the Dockerfile.
Finally, I found a workaround for the 780M. When I set the HSA_USE_SVM=0 environment variable, the crashes are gone. I can use HSA_OVERRIDE_GFX_VERSION=11.0.2 or HSA_OVERRIDE_GFX_VERSION=11.0.0, and everything works in immich-machine-learning as expected. I can set concurrency in Smart Search and Face Detection to 5 and set a bigger multilingual model for Smart Search, and it all works — but I don't use a graphical environment on this machine.
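For anyone wanting to reproduce the workaround, it could be expressed as compose config like the following. The service name and device mappings are illustrative assumptions; the environment values are the ones reported in this thread:

```yaml
services:
  immich-machine-learning:
    image: ghcr.io/immich-app/immich-machine-learning:pr-16613-rocm
    devices:
      - /dev/kfd   # ROCm compute interface
      - /dev/dri   # GPU render nodes
    environment:
      HSA_USE_SVM: "0"                   # workaround for the 780M lockups
      HSA_OVERRIDE_GFX_VERSION: "11.0.2" # or 11.0.0, per the report above
```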
I might have to try that on my RX 6700 XT, lol. Edit: It still has a few buggy things when running this PR.
I tested this PR once again. I used the latest commit with the newer ROCm version, but rolled back the migraphx changes because they crash for me. I also added the gfx900 and gfx1102 archs to ONNX Runtime, this time the proper way. Below are the changes that I made:
The configuration above was tested on the hardware listed below:
When everything is set as above, I have no problems running Smart Search and Facial Detection in parallel with job concurrency set to 5 (2 on the Ryzen 5825U) for both jobs. I also tested bigger models and that also works. On the migraphx image, the ML workers crashed instantly, even on a Radeon RX 6800 XT:
In dmesg I see:
Hmm okay, do you have MIGraphX installed on the host? For me it doesn't even get there, because it errors about the MIGraphX dependency.
No. I don't know migraphx, and I assumed that everything needed is in the Docker image. I found a command to test the migraphx installation and ran it in the Docker image without problems:
I found that Smart Search alone works with migraphx, even with concurrency 5. The problem is with Face Detection.
Oh wow, good job getting it running!
I have problems with Face Detection with regular ROCm too.
Thanks go to @mertalev. I only pulled the image from this PR and tested it.
What problems do you have?
Smart search seems to run correctly. Face detection fails:
For those saying smart search runs correctly with MIGraphX, can you confirm that GPU utilization is high? (It's okay for it to initially have high CPU utilization due to compilation, but it should eventually be primarily GPU utilization.)
I checked, and it's indeed using the CPU and not the GPU! Apologies for my previous statement saying "it works".
Same for me. Only CPU utilization on the migraphx image.
Yes, this was my issue. The error I had was resolved by a commit that mert added after I hit it. I forgot that that error was fixed, and then it was only using the CPU.
I do want to add for everyone that, when trying the later commit, hardware acceleration works for smart search and face detection! I tried the original image from this PR (ghcr.io/immich-app/immich-machine-learning:pr-16613-rocm) and that did not work, but later commits did work for me. The one I used was ghcr.io/immich-app/immich-machine-learning:commit-ded3bbb033615385a2339e27171b6ab682a31df9-rocm. I did add the following environment variables: HSA_OVERRIDE_GFX_VERSION=11.0.0 and HSA_USE_SVM=0.
Yep, that version works, but have you tried 100 images while running Smart Search and Face Detection at the same time? I hit a deadlock doing that. Also, what GPU and what OS are you running?
I tried to run some tests with the migraphx image: I ran docker exec into it, then pip install transformers psutil torch, and then ran: python3 -m onnxruntime.transformers.benchmark -g -m bert-base-cased --provider migraphx. This command returns the following error:
I'm not an expert and don't know what I'm doing, but maybe this will help you. :D
Description
This PR introduces support for AMD GPUs through ROCm. It's a rebased version of #11063 with updated dependencies.
It also once again removes algo caching, as the concurrency issue with caching seems to be more subtle than originally thought. While disabling caching is wasteful (it essentially runs a benchmark every time instead of only once), it's still better than the current alternatives of either lowering concurrency to 1 or not having ROCm support.
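To illustrate the trade-off generically — this is a hypothetical Python sketch, not Immich's or onnxruntime's actual code — a cached benchmark result must be checked and filled under a lock, or concurrent workers can race on the cache; dropping the cache entirely sidesteps the race at the cost of re-running the benchmark on every call:

```python
import threading

# Hypothetical sketch of the caching hazard, not Immich/onnxruntime code.
# Concurrent workers race to fill a shared "best algorithm" cache; holding a
# lock around the check-and-fill makes the expensive benchmark run exactly once.
cache = {}
lock = threading.Lock()
benchmark_runs = 0

def pick_algo(shape):
    global benchmark_runs
    with lock:  # serialize the check-and-fill; without it, workers can race
        if shape not in cache:
            benchmark_runs += 1  # stand-in for an expensive one-time benchmark
            cache[shape] = "algo-for-%dx%d" % shape
        return cache[shape]

threads = [threading.Thread(target=pick_algo, args=((224, 224),)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

assert benchmark_runs == 1  # ran once despite 8 concurrent workers
```

The subtle part the PR description alludes to is getting this synchronization right inside the runtime itself; re-benchmarking every time (no cache, no lock contention on a shared entry) is the conservative fallback.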