Understanding the WASM behind Web-LLM #684


Open
Yingrjimsch opened this issue Apr 25, 2025 · 3 comments

@Yingrjimsch

Hi, I really love the project and have already built several small websites with WebLLM integrated somewhere inside them. Now I'm starting to explore the more technical side, how it's built, and did a deep dive into the code. What I'm missing is where I can find the WASM libs for the different LLM architectures. As far as I've understood (and please correct me if I'm wrong), WebLLM handles everything until inference starts, and the actual inference and buildup of the LLM architecture is done in the WASM for performance reasons?

I want to look into the WASM libs to understand the whole inference engine a little better and would appreciate any help 😄

@ElituGo commented May 2, 2025

Your phrasing is a bit opaque, but to clarify: it's all managed and orchestrated by WebLLM. There's no magic handoff where WebLLM goes inactive. WebLLM calls the WASM engine for the inference step, waits for the result, and then continues managing the overall process. Hope this helps you a bit.
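
To make that concrete, here is a minimal sketch of that orchestration from the application side, using the public @mlc-ai/web-llm API (the model id below is just an example from the prebuilt list):

    import { CreateMLCEngine } from "@mlc-ai/web-llm";

    async function main() {
      // WebLLM downloads the model weights and the matching compiled WASM
      // lib, sets up WebGPU, and orchestrates everything from here on.
      const engine = await CreateMLCEngine("Llama-3.1-8B-Instruct-q4f32_1-MLC");

      // The request is managed by WebLLM end to end; the compiled
      // WASM/WebGPU kernels are only invoked for the actual inference steps.
      const reply = await engine.chat.completions.create({
        messages: [{ role: "user", content: "Hello!" }],
      });
      console.log(reply.choices[0].message.content);
    }

    main();

So WebLLM stays in control across the whole call; the compiled WASM is invoked per inference step, not handed the whole process.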

@Yingrjimsch (Author)

Hi, yes, thanks, I've understood that. In the WebLLM project the WASM is loaded for the corresponding model, and the WASM binaries are also in a repo. Is the inference code viewable somewhere as open source (not as a binary, but as code)?

@CharlieFRuan (Contributor) commented May 5, 2025

Hi there! Thanks for the discussion. MLC-LLM and TVM are the two sources of the implementation of the WASM (both the WebGPU kernels and the necessary runtime support, such as tensor manipulation).

For instance, the following snippet in llm_chat.ts:

    // Fetch the compiled "prefill" function from the TVM virtual machine
    // and detach it so it survives beyond the current TVM scope.
    this.prefill = this.tvm.detachFromCurrentScope(
      this.vm.getFunction("prefill"),
    );

loads a compiled function called prefill that is defined in MLC-LLM. Each model architecture has its own prefill; here is Llama's: https://github.com/mlc-ai/mlc-llm/blob/d2118b3c9d56da6d1e66dfe2667f650020417010/python/mlc_llm/model/llama/llama_model.py#L330
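
The same pattern repeats for the other compiled entry points in that file. As a rough sketch (the member names here are illustrative; "embed" and "decode", like "prefill" above, are names of functions the compiled model module exposes):

    // Sketch of the same pattern for the other entry points that the
    // MLC-LLM compiler emits into the model WASM.
    this.embed = this.tvm.detachFromCurrentScope(
      this.vm.getFunction("embed"),   // token ids -> hidden states
    );
    this.decoding = this.tvm.detachFromCurrentScope(
      this.vm.getFunction("decode"),  // one autoregressive decode step
    );

The bodies of those functions are what MLC-LLM/TVM generate and compile into WebGPU kernels plus WASM glue, which is why the readable source lives in those two repos rather than in WebLLM itself.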
