Understanding the WASM behind Web-LLM #684


Open
Yingrjimsch opened this issue Apr 25, 2025 · 3 comments

@Yingrjimsch

Hi, I really love the project and have already built several small websites with WebLLM integrated somewhere inside them. Now I'm starting to explore the more technical side, how it's built, and did a deep dive into the code. What I'm missing is where I can find the WASM libs for the different LLM architectures. As far as I've understood (and please correct me if I'm wrong), WebLLM handles everything until inference starts, and the actual inference and buildup of the LLM architecture is done in the WASM for performance reasons?

I want to look into the WASM libs to understand the whole inference engine a little better and would appreciate any help 😄

@ElituGo commented May 2, 2025

Your phrasing is a bit opaque, but to clarify: it's all managed and orchestrated by WebLLM. There's no magic handoff where WebLLM goes inactive. WebLLM calls the WASM engine for the inference step, waits for the result, and then continues managing the overall process. Hope this helps you a bit.
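
To make that concrete, here is a minimal sketch of that orchestration from the application side, using the public @mlc-ai/web-llm API (the model id below is just an example from the prebuilt list):

    import { CreateMLCEngine } from "@mlc-ai/web-llm";

    async function main() {
      // WebLLM downloads the model weights and the matching compiled WASM
      // lib, sets up WebGPU, and orchestrates everything from here on.
      const engine = await CreateMLCEngine("Llama-3.1-8B-Instruct-q4f32_1-MLC");

      // The request is managed by WebLLM end to end; the compiled
      // WASM/WebGPU kernels are only invoked for the actual inference steps.
      const reply = await engine.chat.completions.create({
        messages: [{ role: "user", content: "Hello!" }],
      });
      console.log(reply.choices[0].message.content);
    }

    main();

So WebLLM stays in control across the whole call; the compiled WASM is invoked per inference step, not handed the whole process.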

@Yingrjimsch (Author)

Hi, yes, thanks, I've understood that. In the WebLLM project the WASM is loaded for the corresponding model, and the WASM binaries are also in a repo. Is the inference code viewable somewhere as open source (not as a binary, but as code)?

@CharlieFRuan (Contributor) commented May 5, 2025

Hi there! Thanks for the discussion. MLC-LLM and TVM are the two sources of the implementation of the WASM (both the WebGPU kernels and the necessary runtime support, such as tensor manipulation).

For instance, the following snippet in llm_chat.ts:

    // Fetch the compiled "prefill" function from the TVM virtual machine
    // and detach it so it survives beyond the current TVM scope.
    this.prefill = this.tvm.detachFromCurrentScope(
      this.vm.getFunction("prefill"),
    );

loads a compiled function called prefill that is defined in MLC-LLM. Each model architecture has its own prefill; here is Llama's: https://github.com/mlc-ai/mlc-llm/blob/d2118b3c9d56da6d1e66dfe2667f650020417010/python/mlc_llm/model/llama/llama_model.py#L330
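
The same pattern repeats for the other compiled entry points in that file. As a rough sketch (the member names here are illustrative; "embed" and "decode", like "prefill" above, are names of functions the compiled model module exposes):

    // Sketch of the same pattern for the other entry points that the
    // MLC-LLM compiler emits into the model WASM.
    this.embed = this.tvm.detachFromCurrentScope(
      this.vm.getFunction("embed"),   // token ids -> hidden states
    );
    this.decoding = this.tvm.detachFromCurrentScope(
      this.vm.getFunction("decode"),  // one autoregressive decode step
    );

The bodies of those functions are what MLC-LLM/TVM generate and compile into WebGPU kernels plus WASM glue, which is why the readable source lives in those two repos rather than in WebLLM itself.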
