Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move JIT_InitPInvokeFrame implementation to assembler on x86 and ARM32 #113791

Merged
merged 1 commit into from
Mar 23, 2025

Conversation

filipnavara
Copy link
Member

No description provided.

@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Mar 22, 2025
Copy link
Contributor

Tagging subscribers to this area: @mangod9
See info in area-owners.md if you want to be subscribed.

// Thread (r5) = pointer to Thread
//
//
LEAF_ENTRY JIT_InitPInvokeFrame, _TEXT
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it actually help perf to have this method in assembly on Linux arm32? I would expect that the C/C++ version is going to have better perf thanks to better optimized TLS access that will outweigh the few extra instructions in JITed code to deal with the standard calling convention.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dtto for Linux x86

Copy link
Member Author

@filipnavara filipnavara Mar 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They use custom calling convention, unfortunately, and I didn't want to modify JIT to account for that. On x86 this generated equivalent code to the old stub. On arm32 I moved some register saves to prolog/epilog because it was easier and more efficient.

Moreover, the C version currently behaves differently in regards to who pushes the frame so that would need to be unified too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(We can optimize the TLS access; the JIT_PInvokeBegin helper used on R2R would benefit from that too; and it's very similar code also written in assembly.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They use custom calling convention, unfortunately

My guess is that it is a left-over from fragile NGen where we were happy to do a lot of complicated things to save a few bytes of binary size here and there.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the asm implementation is ok for win-x86 since win-x86 is weird in many different ways, but I would switch to regular C++ for everything else.

Copy link
Member Author

@filipnavara filipnavara Mar 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be a much larger change to do that. I'm not necessarily opposed to that in principle, but I went with a change that is less disruptive for a few reasons. Namely, there's already assembler code for this in the R2R PInvoke helpers, so this puts similar code in a similar place which makes any changes to it easier to keep in sync. It removes some of the odd cases for x86/ARM32 even if not all. It produces slightly more optimal code on ARM32 at no expense to readability. It removes unnecessary startup cost (albeit a tiny one) and one place with dynamic code generation.

If we were to revisit this with JIT changes then I would prefer to address more things. Currently we have disparity with usage of USE_PER_FRAME_PINVOKE_INIT on various platforms. There's also the fact that JIT_PInvokeBegin and JIT_InitPInvokeFrame helpers do almost the same thing, except in the later case the JIT has to know more of the internal layout of InlinedCallFrame. I tried to switch the JIT to use the R2R P/Invoke helpers for all P/Invokes but that resulted in triggering hidden JIT bug (fix submitted in PR) and it painfully made me aware of the fact that JIT_PInvokeBegin/JIT_PInvokeEnd is not aware of SuppressGCTransition. Compared to all these open questions I consider the custom calling convention just a small problem.

(Lastly, the reason I run into this in the first place is that in my x86/new-EH experiment I need to errect a SEH frame on P/Invoke transitions. That's difficult to address with the current state of the code. At very least this removes the need to write a custom code generator for that.)

Copy link
Member Author

@filipnavara filipnavara Mar 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On second look it seems that going with JIT_PInvokeBegin/JIT_PInvokeEnd for all P/Invokes may be a viable alternative for a long term solution (unless it causes some unforeseen regressions; possibly restricted to 32-bit platforms if there are reasons to keep the frame optimizations in 64-bit JIT). There's still the issue with the behavior not being the same with SuppressGCTransition, but it seems that the GC starvation issues I have seen are unrelated (caused by the recent JIT_PollGC rewrite, issue #113807).

That said, I would still like to get this in as an improvement over status quo that is unlikely to cause any regressions.

@filipnavara
Copy link
Member Author

(The test failures look unrelated. I'll double check the ARM code on device before marking this ready for review.)

@jkotas
Copy link
Member

jkotas commented Mar 23, 2025

/ba-g linux-arm64 is #113810 (it is not possible to create a reliable pattern match for it)

Copy link
Member

@jkotas jkotas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I agree that this is an improvement.

@jkotas jkotas merged commit a50b4a0 into dotnet:main Mar 23, 2025
93 of 97 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-VM-coreclr community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants