-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move JIT_InitPInvokeFrame implementation to assembler on x86 and ARM32 #113791
Conversation
Tagging subscribers to this area: @mangod9 |
// Thread (r5) = pointer to Thread | ||
// | ||
// | ||
LEAF_ENTRY JIT_InitPInvokeFrame, _TEXT |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it actually help perf to have this method in assembly on Linux arm32? I would expect that the C/C++ version is going to have better perf thanks to better optimized TLS access that will outweigh the few extra instructions in JITed code to deal with the standard calling convention.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dtto for Linux x86
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They use custom calling convention, unfortunately, and I didn't want to modify JIT to account for that. On x86 this generated equivalent code to the old stub. On arm32 I moved some register saves to prolog/epilog because it was easier and more efficient.
Moreover, the C version currently behaves differently in regards to who pushes the frame so that would need to be unified too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(We can optimize the TLS access; the JIT_PInvokeBegin
helper used on R2R would benefit from that too; and it's very similar code also written in assembly.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They use custom calling convention, unfortunately
My guess is that it is a left-over from fragile NGen where we were happy to do a lot of complicated things to save a few bytes of binary size here and there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the asm implementation is ok for win-x86 since win-x86 is weird in many different ways, but I would switch to regular C++ for everything else.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be a much larger change to do that. I'm not necessarily opposed to that in principle, but I went with a change that is less disruptive for a few reasons. Namely, there's already assembler code for this in the R2R PInvoke helpers, so this puts similar code in a similar place which makes any changes to it easier to keep in sync. It removes some of the odd cases for x86/ARM32 even if not all. It produces slightly more optimal code on ARM32 at no expense to readability. It removes unnecessary startup cost (albeit a tiny one) and one place with dynamic code generation.
If we were to revisit this with JIT changes then I would prefer to address more things. Currently we have disparity with usage of USE_PER_FRAME_PINVOKE_INIT
on various platforms. There's also the fact that JIT_PInvokeBegin
and JIT_InitPInvokeFrame
helpers do almost the same thing, except in the later case the JIT has to know more of the internal layout of InlinedCallFrame
. I tried to switch the JIT to use the R2R P/Invoke helpers for all P/Invokes but that resulted in triggering hidden JIT bug (fix submitted in PR) and it painfully made me aware of the fact that JIT_PInvokeBegin
/JIT_PInvokeEnd
is not aware of SuppressGCTransition
. Compared to all these open questions I consider the custom calling convention just a small problem.
(Lastly, the reason I run into this in the first place is that in my x86/new-EH experiment I need to errect a SEH frame on P/Invoke transitions. That's difficult to address with the current state of the code. At very least this removes the need to write a custom code generator for that.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On second look it seems that going with JIT_PInvokeBegin
/JIT_PInvokeEnd
for all P/Invokes may be a viable alternative for a long term solution (unless it causes some unforeseen regressions; possibly restricted to 32-bit platforms if there are reasons to keep the frame optimizations in 64-bit JIT). There's still the issue with the behavior not being the same with SuppressGCTransition
, but it seems that the GC starvation issues I have seen are unrelated (caused by the recent JIT_PollGC
rewrite, issue #113807).
That said, I would still like to get this in as an improvement over status quo that is unlikely to cause any regressions.
(The test failures look unrelated. I'll double check the ARM code on device before marking this ready for review.) |
/ba-g linux-arm64 is #113810 (it is not possible to create a reliable pattern match for it) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I agree that this is an improvement.
No description provided.