Understanding the WASM behind Web-LLM #684
Comments
Your phrasing is a bit opaque, but to clarify: it's all managed and orchestrated by WebLLM. There's no magic handoff where WebLLM goes inactive. WebLLM calls the WASM engine for the inference step, waits for the result, and then continues managing the overall process. Hope this helps a bit!
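For reference, a minimal TypeScript sketch of that control flow. The names here (`WasmInferenceEngine`, `runInferenceStep`) are hypothetical stand-ins, not WebLLM's actual internals; the point is only that JavaScript stays in control and delegates each inference step to the WASM side:

```typescript
// Hypothetical sketch of the orchestration described above:
// WebLLM-style JS code stays in the driver's seat and calls into
// the WASM engine once per decode step.
interface WasmInferenceEngine {
  // Assumed shape: tokens in, one sampled token id out.
  runInferenceStep(inputTokens: Int32Array): Promise<number>;
}

async function generate(
  engine: WasmInferenceEngine,
  prompt: Int32Array,
  maxTokens: number,
  eosToken: number
): Promise<number[]> {
  const output: number[] = [];
  let input = prompt;
  for (let i = 0; i < maxTokens; i++) {
    // Hand off only the inference step to WASM, then resume in JS.
    const token = await engine.runInferenceStep(input);
    if (token === eosToken) break;
    output.push(token);
    input = Int32Array.from([token]); // next step consumes the new token
  }
  return output;
}
```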
Hi, yes, thanks, I have understood that. In the WebLLM project, the WASM for the corresponding model is loaded, and the WASM binaries are also in a repo. Is the inference code viewable somewhere as open source (as code, not as a binary)?
Hi there! Thanks for the discussion. MLC-LLM and TVM are the two sources for the implementation of the WASM (both the WebGPU kernels and the necessary runtime support, such as tensor manipulation). For instance, the following line in … loads a compiled function called …
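As a rough illustration of the mechanism only (not WebLLM's actual code, which goes through the TVM JS runtime rather than the raw API), this is how a compiled function inside a `.wasm` binary becomes callable from JavaScript. A real MLC-LLM model library would additionally require the import objects that the TVM runtime supplies; the empty import object here assumes a self-contained module:

```typescript
// Minimal sketch using the standard WebAssembly API. WebLLM itself
// loads the model library through the TVM JS runtime, which wraps
// this mechanism and adds tensor/device management on top.
async function loadCompiledFunction(
  wasmUrl: string,      // e.g. a compiled model library .wasm
  functionName: string  // name of a function compiled by MLC-LLM/TVM
): Promise<Function> {
  const response = await fetch(wasmUrl);
  // Real TVM-compiled modules need runtime-provided imports here;
  // an empty import object only works for self-contained modules.
  const { instance } = await WebAssembly.instantiateStreaming(response, {});
  const fn = instance.exports[functionName];
  if (typeof fn !== "function") {
    throw new Error(`Export "${functionName}" is not a function`);
  }
  return fn;
}
```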
Hi! I really love this project and have already built several small websites with WebLLM integrated into them. Now I'm starting to explore the more technical side of how it's built, and I did a deep dive into the code. What I'm missing is where I can find the WASM libs for the different LLM architectures. As far as I've understood (please correct me if I'm wrong), WebLLM handles everything until inference starts, and the actual inference and buildup of the LLM architecture are done in the WASM for performance reasons?
I want to look into the WASM libs to understand the whole inference engine a little better and would appreciate any help 😄
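One concrete way to locate those WASM libs is through the prebuilt model config that web-llm ships. This sketch assumes a recent `@mlc-ai/web-llm` release where the config is exported as `prebuiltAppConfig`; exact field names may differ across versions (older releases used `model_lib_url` instead of `model_lib`):

```typescript
import { prebuiltAppConfig } from "@mlc-ai/web-llm";

// Print each prebuilt model id alongside the URL of its compiled
// WASM model library, so the binaries can be fetched and inspected.
for (const record of prebuiltAppConfig.model_list) {
  console.log(record.model_id, "->", record.model_lib);
}
```

The source those binaries are compiled from lives in the MLC-LLM and TVM repositories, as noted above.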