Skip to content

An API to detect crash inside worker thread #3

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
Bubbler-4 opened this issue Jun 18, 2021 · 6 comments · Fixed by #4
Closed

An API to detect crash inside worker thread #3

Bubbler-4 opened this issue Jun 18, 2021 · 6 comments · Fixed by #4

Comments

@Bubbler-4
Copy link
Contributor

Bubbler-4 commented Jun 18, 2021

I found that the current Thread API does not support the situation where the worker crashes while running the given task. (Please correct me if I'm wrong) So I made a patched version inside a small demo app (live version here). The patch adds an onerror hook to the created JS Worker instance, and an is_terminated() method to allow polling if the thread crashed or not.

Should I open a PR on this?

@j-devel
Copy link
Contributor

j-devel commented Jun 21, 2021

Thank you for the feedback!

I found that the current Thread API does not support the situation where the worker crashes while running the given task.

That's true; and thanks for point this out! It's been a while since 'src/atw.rs' was implemented, and handling the worker's crash cases has been totally forgotten! So I'd be very happy to have such a PR that improves the situation.

Regarding the patch, I have some suggestions though.

According to the demo, it's really nice that we can trigger the closure made by create_onerror() upon Rust panics. But within that closure, can it be like broadcasting Promise rejections (maybe sth like this), rather than signalling "termination" of the worker? I think this will allow us to actually handle the error more naturally.

As far as I tested, the worker is still "alive" even after computation_panic() crashes. So I guess we shouldn't treat it as the same as worker termination?

@Bubbler-4
Copy link
Contributor Author

But within that closure, can it be like broadcasting Promise rejections (maybe sth like this), rather than signalling "termination" of the worker?

Yeah, that sounds much better than my initial idea. Even better if the thread can be actually reused after a job crashes, though I guess we'll need an experiment for that part.

@Bubbler-4
Copy link
Contributor Author

I tried copying the body of cancel_pending_requests function into create_onerror(). It works pretty well (the Err result is correctly passed as the result of exec!), and the thread can indeed accept more jobs without terminating. The updated demo is here.

One major(?) caveat is that every crash inside the thread seems to occupy the memory it was holding. If you repeatedly click the "error" button like 30 or 40 times and then click the "good" button, you get OOM panic on that too. A good news is that it can still be fixed by terminating the thread and starting a new one.

I'm honestly not sure how to deal with multiple jobs being run at once. I guess it is probably safe to reject all jobs (as it is now) since other parts of Rust code could be affected by the panic inside the thread.

@j-devel
Copy link
Contributor

j-devel commented Jun 24, 2021

Nice to hear that it worked as we expected! (with an interesting caveat 😄 )

One major(?) caveat is that every crash inside the thread seems to occupy the memory it was holding.

Looks related to this issue. So, once https://github.com/WebAssembly/exception-handling becomes available in future, I think we will be able to do something about it (e.g. informing the developer about the ID of the exact panic'ed job, and/or providing a callback interface).

how to deal with multiple jobs being run at once. I guess it is probably safe to reject all jobs (as it is now) since other parts of Rust code could be affected by the panic inside the thread.

Yeah, and due to the point above, it's likely that the developer ends up calling Thread::terminate() for now(?)

Now, it looks like we are ready; will you file a PR?

@Bubbler-4
Copy link
Contributor Author

I guess you can cargo publish the new version now :)

@j-devel
Copy link
Contributor

j-devel commented Jun 29, 2021

Indeed. wasm-mt = "0.1.2" published!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants