[TPU] Avoid Triton Import #15589
Conversation
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.
Just a reminder: PRs do not trigger a full CI run by default; only a reduced default set of checks runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. 🚀
Looks good.
Btw, shall we consider having a common interface for layers, with the implementations for different backends placed in different files?
Yes, I can catch you up offline in a separate thread, @houseroad, on what our plan is. We have been planning a refactor led by Lucas, but it's just been hard to prioritize it with all the models coming out. If your team has capacity to help with it, we can collaborate.
Makes sense, unfortunately
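As a rough illustration of the idea discussed above (all class and module names below are hypothetical, not vLLM's actual layout): a backend-neutral layer interface, with each backend's implementation kept in its own file that is only imported when that platform is selected, so a TPU host never touches the Triton-backed code path.

```python
# Hypothetical sketch only: names are illustrative, not vLLM's real modules.
from abc import ABC, abstractmethod

import torch


class FusedMoEBase(torch.nn.Module, ABC):
    """Backend-neutral interface; each backend implements it in its own file."""

    @abstractmethod
    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        ...


def create_fused_moe(platform: str) -> FusedMoEBase:
    # Each backend module is imported lazily, so a TPU host never imports
    # the Triton-backed implementation (and therefore never imports triton).
    if platform == "cuda":
        from .fused_moe_triton import TritonFusedMoE  # hypothetical module
        return TritonFusedMoE()
    if platform == "tpu":
        from .fused_moe_pallas import PallasFusedMoE  # hypothetical module
        return PallasFusedMoE()
    raise NotImplementedError(f"No FusedMoE implementation for {platform!r}")
```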
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
LGTM, thanks!
+1, this breaks other backends as well.
Thanks for this fix! BTW, I'm doing similar things in PR #15099.
This would be a much better solution. Our code is riddled with delayed imports for this reason.
Thanks @MengqingCao - will take a look |
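For readers following along, a minimal sketch of the "delayed import" workaround being referred to (the op below is made up for illustration): the triton import is moved from module scope into the function body, so the module stays importable on hosts without triton and only fails if a Triton code path actually runs.

```python
# Module-scope imports would make this file unimportable wherever triton is
# missing:
#
#   import triton
#   import triton.language as tl


def fused_rms_norm(x, weight, eps=1e-6):  # hypothetical op, illustration only
    # Delayed import: evaluated only when a CUDA code path actually calls
    # this function, so TPU/CPU hosts can import the module safely.
    import triton  # noqa: F401
    import triton.language as tl  # noqa: F401
    ...  # kernel launch omitted in this sketch
```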
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> Signed-off-by: xinyuxiao <xinyuxiao2024@gmail.com>
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> Signed-off-by: Louis Ulmer <ulmerlouis@gmail.com>
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
Hello @robertgshaw2-redhat,
Non-CUDA backends will also run into the triton import.
Could you try #15099, which is a PR addressing the triton import issue on non-CUDA devices? Feel free to review that PR or report back your results, thanks!
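One possible shape for such a fix, sketched here only as an illustration (this is not necessarily how #15099 or this PR actually implements it): centralize the availability check in one place so individual modules never import triton directly and can still be imported on non-CUDA hosts.

```python
# Hypothetical sketch of a centralized triton availability guard.
from importlib.util import find_spec

# Single source of truth for triton availability.
HAS_TRITON = find_spec("triton") is not None

if HAS_TRITON:
    import triton
    import triton.language as tl
else:
    triton = None  # modules can still be imported on non-CUDA hosts
    tl = None


def require_triton(fn):
    """Raise a clear error only when a triton-backed op is actually called."""
    def wrapper(*args, **kwargs):
        if not HAS_TRITON:
            raise RuntimeError(
                f"{fn.__name__} requires triton, which is not installed")
        return fn(*args, **kwargs)
    return wrapper
```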
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> Signed-off-by: Mu Huai <tianbowen.tbw@antgroup.com>
SUMMARY: