Breaking up large (20 MB+) .wasm files #1166
@szilvaa feel free to chip in if I missed anything.
This certainly seems like a use case that we should have an answer for. I don't have any outstanding starting points ATM for a proposal, unfortunately. I do have one comment on your "hot" to "cold" bridge paying the cost of Wasm->JS->Wasm every time, however. I think if you made the bridge calls into indirect calls through a Table, then you would only have to do a Wasm -> JS call once, and it could stub its entry in the table with the "cold" Wasm function. At least in JSC (although I expect all engines do this), if we see a Wasm -> Wasm indirect call, we will bypass the JS entrypoint code. Thus, only the first "cold" call will pay the Wasm->JS->Wasm boundary cost.
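The JS side of that table-patching idea can be sketched as follows. The wasm bytes are a stand-in for cold.wasm (a hand-encoded module exporting add(i32, i32)); in a real setup the hot module would perform a call_indirect through this shared table rather than JS calling table.get.

```javascript
// A shared table: the hot module would call through slot 0 indirectly.
const table = new WebAssembly.Table({ element: "anyfunc", initial: 1 });

// Stand-in for the downloaded cold.wasm: a minimal hand-encoded module
// exporting add(i32, i32) -> i32.
const coldBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,       // magic + version
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type: (i32,i32)->i32
  0x03, 0x02, 0x01, 0x00,                               // one function of that type
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export "add"
  0x0a, 0x09, 0x01, 0x07, 0x00,                         // code section, one body
  0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b,                   // local.get 0; local.get 1; i32.add; end
]);
const cold = new WebAssembly.Instance(new WebAssembly.Module(coldBytes));

// The one-time Wasm -> JS call would land here: patch the table slot so
// every later call_indirect goes straight Wasm -> Wasm.
table.set(0, cold.exports.add);

console.log(table.get(0)(2, 3)); // 5
```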
Twitter context, with @TheLarkInn saying that Webpack is looking at doing this 😁 Of course, other tools doing it would be great too. One sad thing about inserting wasm->js->wasm calls where there used to be only wasm->wasm is that tools now need to re-write all …
@camwest you mentioned the streaming process. WebAssembly supports instantiation using a stream (probably your HTTP request). Have you tried this approach?
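For reference, the streaming API looks roughly like this. The URL and import object are illustrative; the runnable part below uses the non-streaming form with an in-memory empty module, since there is no network here.

```javascript
// Browser usage (illustrative URL/import object):
//   const { instance } = await WebAssembly.instantiateStreaming(
//     fetch("/engine.wasm"), importObject);
// Compilation overlaps the download instead of waiting for all bytes.

// The non-streaming form takes a buffer; shown here with the smallest
// valid module (magic number + version, no sections).
const emptyModule = new Uint8Array([0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00]);
WebAssembly.instantiate(emptyModule).then(({ instance }) => {
  console.log(instance instanceof WebAssembly.Instance); // true
});
```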
In our Cheerp C++ compiler we are currently exploring a solution based on tagging functions/classes/namespaces at the C++ level (something like [[cheerp::hot]] or [[cheerp::first]], as we already do to automatically generate JS bridges for DOM interaction) to mark them for inclusion in a loader module. A PGO-based approach could also be considered.
Agreed that it would be cool if tools could help here. @camwest Is this 29.6 MB before Content-Encoding: gzip? Assuming the usual 3x reduction we see with gzip, is the download time of the ~10 MB compressed payload problematic, or is it more the subsequent compilation time you see in release browsers? If it's the latter, that should be improving significantly in the coming months, especially if you use the streaming compilation API so that compilation can overlap the download.
Still, the fastest compilation will occur when we don't have to compile anything 😄 |
@lukewagner What we (I work with @camwest) are hoping is that streaming compilation can be further pipelined all the way to the "run" stage, so in the end we could start running wasm before it is fully downloaded. We expect this would further improve performance, provided that the code in the wasm is arranged such that the first bytes are the first to run (via PGO).
@jfbastien Agreed. @szilvaa I can see why that's attractive, but having an arbitrary synchronous wasm call block on the network seems to risk the app freezing (if the user does something outside the profiled path or if the network is extra slow) and is also at odds with the general non-blocking-io design of the web. Also, the network can be a lot slower than local i/o, so it's not quite so analogous to what native does when launching a local app. |
Running WASM code before the file is fully downloaded is not possible with the current design: the download process must check for the presence of a data section (which comes after the code section) before the instance can be returned. If the implementation had an option to disable the data section, then there's potential. A solution where WASM exports could be directly connected to imports without bridging through JavaScript seems ideal to me. It provides efficient solutions to this problem (i.e., on-demand loading of rarely-used code) and enables 64-bit integers (and future types) to be transferred directly.
@RyanLamansky I don't quite understand how the WASM import/export mechanism avoids the problem that @lukewagner mentions above (i.e. an arbitrary wasm call may block on network I/O). The import dependency is either resolved at instantiation time, in which case it really does not help at all with startup performance, or it is resolved at runtime, in which case the provider of the import may not be present yet, so the call must block. I think the only way to avoid "freezing" the UI thread is to run your WASM on a worker (which is what we do). This suggests that maybe this sort of pipelined code execution should only be available in a worker.
cc @sokra for coverage. This scenario is something we'll likely discuss helping to solve in webpack.
webpack will not help you with a big wasm file. It supports Code Splitting with … This probably requires you to restructure your native code, at least at the boundaries where you want to load on demand. It's a kind of distributed architecture: multiple wasm components communicating asynchronously over JS.
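The shape of that code-splitting pattern can be sketched like this. The loader map and chunk contents here are made up; in a real webpack build each entry would be a dynamic `import()` that webpack turns into a separately fetched chunk.

```javascript
// On-demand features, cached after the first load.
const loaders = {
  // in a real build: draw: () => import("./commands/draw.js")
  draw: async () => ({ run: () => "drew a line" }),
};
const loaded = new Map();

async function feature(name) {
  if (!loaded.has(name)) loaded.set(name, await loaders[name]());
  return loaded.get(name);
}

// First call triggers the (async) load; later calls hit the cache.
feature("draw").then((m) => console.log(m.run())); // "drew a line"
```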
@szilvaa Yeah, I can imagine a pure toolchain solution almost working in workers; the main limitation is the effective lack of …
Regarding "We don't currently have a good strategy for defining split points." in the OP, have you considered a coarse-grained strategy of splitting the app up into an exe with asynchronously-loaded DLLs? IIUC, Emscripten provides support for dynamic linking (where the exe and each dll turn into a .wasm) and, now that we have …
@lukewagner Yes, of course, we have considered this. In fact, the code already has an exe/dll break-up on Windows/macOS, but these boundaries are not on the hot vs. cold code boundary for our current web scenarios. But let's say we have a hot.wasm and a cold.wasm. As far as I understand, we couldn't use an import …
Yep, I imagined a JS bridge between two WASM modules with an async API in between. The bridge would use …
But @lukewagner's approach, where JS only fills imports into a Table, also sounds nice. I guess this results in faster WASM-to-WASM calls, and you could use a sync interface.
Sync download + instantiation in WebWorkers doesn't look like a nice approach from a UX perspective to me. You basically block your complete native part while chunks are downloaded.
Yes, it should basically be the same call as a plain pointer-to-function call which is going to be a factor faster than thunking through JS.
So it sounds like the current impl of |
Rather than communicating through JavaScript, a better approach may be to reload the entire application: ultimately, the application is binary data that can be modified with string concatenation. You can download a "hot" wasm file and then a patch: the difference between the "hot" wasm file and the wasm file for the entire application. It should be possible to save the state from the hot application, and then restore it with the full application.
Or, you could streaming-instantiate the new module in a new WebWorker, and also instantiate the existing module, by passing the |
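The hand-off idea hinges on the fact that a `WebAssembly.Memory` created or obtained in JS can be supplied to a later instantiation as an import. A minimal sketch (the `env.memory` import name and the "saved state" byte are illustrative conventions, not part of any proposal):

```javascript
// Linear memory the "old" instance has been writing into...
const memory = new WebAssembly.Memory({ initial: 1 });
new Uint8Array(memory.buffer)[0] = 42; // pretend this is saved app state

// ...can be handed to the replacement module's instantiation, e.g.
//   WebAssembly.instantiateStreaming(fetch("full.wasm"), importObject)
// where the new module imports it and sees the same contents:
const importObject = { env: { memory } };
console.log(new Uint8Array(importObject.env.memory.buffer)[0]); // 42
```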
@qm3ster, but the application would still be running midway when the data is loaded into the new worker, so wouldn't we need to exit the application first?
@awtcode not before the application is finished loading in the new worker. |
Hi folks,
I was asked by @jfbastien to post here based on a twitter conversation we had: https://twitter.com/jfbastien/status/941170112014327808
AutoCAD web is a flavor of AutoCAD which runs entirely in the browser thanks to WebAssembly. We took our core engine and removed everything we could to get it to the smallest possible size. The resulting wasm file is currently 29.6 MB. It's now in beta if you want to try it out: http://client.autocad360.com/
Problem
Our team has been working on this for a while now, and we expect many different users with slow internet connections to use the application. We want to optimize for first-time use as much as possible. We also realize that this represents one of the largest web applications out there.
Ideally, we'd break up the wasm file into smaller chunks, where the first chunks downloaded would only represent the minimum code necessary to display graphics, show the cursor, and let the user zoom and pan around while the commands and other modules are lazy loaded.
We don't currently have a good strategy for defining split points. The desktop variant of the product uses a virtual memory manager (VMM) and profile guided optimization (PGO) to optimize startup time, and our hunch is that this is a good strategy for partitioning our code into "hot" and "cold" chunks. It's not feasible for us to re-design the core engine around manually defined split points.
Tool-based Solution
We think we can solve this problem by investing in internal tooling: extend PGO so that we emit two wasm files, where the hot wasm file contains stubs that, when called, block until the cold wasm file is downloaded and started. Managing two wasm files in JavaScript feels like a pretty nasty hack, since the boundary between the hot and cold wasm files would most likely involve crossing from wasm to JS and back to wasm.
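Inside a worker, a blocking stub like that could be built on `Atomics.wait` over a `SharedArrayBuffer` flag that the loader flips once cold.wasm is instantiated; this is our own sketch, not a design from the proposal. The runnable fragment below shows only the handshake primitive, with the flag pre-set so the wait returns immediately rather than actually blocking:

```javascript
// One i32 flag shared between the worker running hot.wasm and the loader.
const flag = new Int32Array(new SharedArrayBuffer(4));

// Loader side, once cold.wasm is ready: flip the flag and wake waiters.
Atomics.store(flag, 0, 1);
Atomics.notify(flag, 0);

// Stub side (worker): block while the flag is still 0. Here it is
// already 1, so this returns "not-equal" without blocking.
console.log(Atomics.wait(flag, 0, 0)); // "not-equal"
```

Note that this only works off the main thread in browsers, which matches the earlier observation in the thread that pipelined execution may only make sense in a worker.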
Browser-based Solution
It would be even better if we could work with the browser vendors and solve the problem by extending the design of WebAssembly to support this use case. I expect lots of larger applications would be able to take advantage of it.
For example, what if we could pipeline a PGO-optimized wasm file so that it was downloading, compiling, instantiating, and executing as one streaming process? The process could also raise events when a cold stub was hit, allowing us to design an experience around startup.