LD_AUDIT sacrifices some performance #15
Comments
That's a fairly old bug report @amonakov but thank you for bringing it up. I'm happy to do some benchmarking if you have a copy of the original test program, so we can test it against liblsi-intercept. If there is still a slowdown, then we'll just patch glibc and document that, as the module only implements |
The original test program is attached to the aforementioned bug report; here's a direct link to the attachment: https://sourceware.org/bugzilla/attachment.cgi?id=7044 Note that if your toolchain enables hardening by default (-z relro -z now) you won't see the slowdown because the test program won't use PLT (but games aren't usually compiled like that). The reason I've brought it up is exactly because this Glibc bug remains unfixed. If you prefer to patch Glibc on your end, what would be your recommendation to people packaging this on other distros? |
I'm definitely seeing a minor regression with your test case here:
However, when I build your libaudit with the distro CFLAGS:
Note that libaudit is being built with the CFLAGS, not the binary (representing a proprietary game). If I reintroduce your
Thus I assume this is more about symbol resolution time, thus, I hacked the demo to call some gtk_ calls:
Even building everything with hardening didn't make a significant difference afterwards. Finally, after installing your patch, even with a hardened toolchain (which Solus uses by default), I tested with full relro on the main binary and the audit lib, and then replaced the audit lib with the LSI lib:
Basically, we need the rtld-audit interface, and we also need your patch. Given that LSI is aimed at distribution integrators, my hope is that they also integrate your patch (we can add this to Solus without issue). It seems your original patch thread died out, perhaps now is the time to upstream it so that all the distributions benefit from it? Distributions like Ubuntu are more willing to import an out of series patch to fix a bug when it has already landed in the VCS of the upstream project. :) |
Oh, and as a final metric, using your installed patches and your original test:
|
glibc patch import into Solus: https://dev.solus-project.com/R927:afa5b639e8a9b62618457a304d1e6fb42a9f2066 |
Thinking further on this, and correct me if I'm wrong, but the performance regression should only come from initial symbol resolution during the initial mapping, thus affecting only startup time and module load time, right? Anyway, this further illustrates the need for a self-contained LSI bundle that is free from distro issues. |
No, of course not, please read If only initial calls get slower, that's not a major issue for games in the first place. |
Ah well that's not good at all. Just read properly through OK so I'm going to document this issue within the README, just so integrators know the story. Obviously it would be fantastic if upstream accepts your patch (thank you for that!). FWIW LSI does allow you to turn off the intercept module, which may actually come in useful for those wanting to do benchmarks with and without the patch inside the games themselves. FWIW I'm aware of the pressure on distributions when faced with integrating Steam, and it is becoming a heavy burden for them. This is why I'm looking to third party application systems with the view of building a specialised (ABI compatible) runtime containing a strict-mode LSI (and your glibc patch ofc!) that would effectively be a Solus-based runtime to provide the same Steam experience everywhere, even on distributions not supporting multilib. In these third party systems we can ensure only our own libraries are used, and there is no more cross contamination, and distributions wouldn't have to worry about these issues anymore. :) |
This goes some way to satisfy the concerns of issue #15 so that everyone knows where they stand, how to mitigate this **now**, and what we intend to do about this in future. Signed-off-by: Ikey Doherty <ikey@solus-project.com>
^ I've documented this in the README - if you feel it needs more clarification or details, please let me know :) |
Users with older AVX-capable CPUs, especially the famous Sandy Bridge generation (i5-2500 and such), should especially beware, since the penalty due to this issue is highest there. My test indicates roughly 420 extra cycles per call (this is very high!); of those, I believe 140 are two 70-cycle AVX transition penalties. I didn't try to accurately analyze the rest. |
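To put the per-call figure above in perspective, here is a rough back-of-the-envelope helper. Only the 420-cycle penalty comes from the comment above; the function name, the call counts, and the 3.3 GHz clock (the i5-2500 base frequency) are assumptions chosen for illustration, not measurements.

```c
/* Rough estimate of the extra time spent in the LD_AUDIT PLT hook.
 * All inputs are caller-supplied assumptions; nothing here is measured.
 *
 *   calls                  number of calls that go through the PLT
 *   extra_cycles_per_call  added cost per call (about 420 in the report above)
 *   clock_hz               CPU clock in Hz (3.3e9 assumed for an i5-2500)
 *
 * Returns the estimated overhead in milliseconds. */
double plt_overhead_ms(double calls, double extra_cycles_per_call, double clock_hz)
{
    return calls * extra_cycles_per_call / clock_hz * 1000.0;
}
```

Under these assumptions, `plt_overhead_ms(1e6, 420.0, 3.3e9)` comes to about 127 ms per million cross-library calls, and a hypothetical 10,000 such calls per frame would cost about 1.3 ms of a 16.7 ms frame budget at 60 fps.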
Damn - very common CPU too. |
Gonna close this now as the issue is documented, Solus is patched, and we're gonna provide a Snap with a patched glibc. |
The patch ensures that performance handlers aren't installed (breaking PLT lookup) when the RTLD interface (`liblsi-intercept`) doesn't do any PLT mangling, but only implements basic la_objsearch functions. LSI issue: solus-project/linux-steam-integration#15 glibc issue: https://sourceware.org/bugzilla/show_bug.cgi?id=15533 Signed-off-by: Ikey Doherty <ikey@solus-project.com>
There's an issue in Glibc that, when LD_AUDIT is non-empty, causes all calls via the PLT (i.e. normally all calls to functions implemented in shared libraries) to go via a hook that saves some registers (including vector registers that may hold passed arguments) on the stack, calls into the dynamic linker to invoke la_pltenter hooks (even if none are registered), restores the registers, invokes the original destination function, invokes la_pltexit hooks, and finally returns to the caller. Obviously this is slow and unnecessary if all the audit module wants is to redirect some libraries. The Glibc bug report is here: https://sourceware.org/bugzilla/show_bug.cgi?id=15533 (I hit this issue back then while playing with an idea similar to yours).
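For readers unfamiliar with the rtld-audit interface, here is a minimal sketch of the kind of module described above: one that only redirects libraries via la_objsearch and defines no la_pltenter or la_pltexit hooks. It is not the liblsi-intercept source; the library name and replacement path are hypothetical placeholders.

```c
#define _GNU_SOURCE   /* exposes the rtld-audit declarations in <link.h> */
#include <assert.h>
#include <link.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names, for illustration only. */
static const char *const blocked_name = "libexample-bundled.so.1";
static const char *const replacement_path = "/usr/lib/libexample.so.1";

/* Mandatory handshake: tell the dynamic linker which audit ABI we use. */
unsigned int la_version(unsigned int version)
{
    (void)version;
    return LAV_CURRENT;
}

/* Called when the dynamic linker searches for a library.
 * Returning a different path redirects the lookup; returning the
 * name unchanged lets the search proceed normally. */
char *la_objsearch(const char *name, uintptr_t *cookie, unsigned int flag)
{
    (void)cookie;
    assert(name != NULL);
    if (flag == LA_SER_ORIG && strcmp(name, blocked_name) == 0)
        return (char *)replacement_path;
    return (char *)name;
}

/* Note: no la_symbind*, la_pltenter or la_pltexit is defined, so this
 * module has no use for per-call PLT hooks. */
```

Such a module would typically be built as a shared object (for example `cc -shared -fPIC -o libaudit-sketch.so audit.c`, file names assumed) and loaded with `LD_AUDIT=./libaudit-sketch.so ./program`.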
It appears you're setting LD_AUDIT for all child processes including games, so that slows games to some degree. If not, can you add a clarifying comment somewhere?
(edited: grammar and clarity)