-
-
Notifications
You must be signed in to change notification settings - Fork 31.9k
Disable inlining of _PyStack_AsTuple() to reduce the stack consumption #73420
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
While working on the issue bpo-28870, I noticed that _PyStack_AsTuple() is inlined. Compared to Python 3.5 which didn't have this function, it seems like _PyStack_AsTuple() is responsible for an increase of the stack consumption. |
noinline_msvs.patch: implement _Py_NO_INLINE for Microsoft Visual Studio #elif defined(_MSC_VER) I didn't push this change because I don't have access to a Windows VM yet. @Steve: Does noinline_msvs.patch look good to you? I decided to make the macro private because I don't know if _Py_NO_INLINE can be combined with PyAPI_FUNC() which also uses __declspec(): __declspec(dllexport). Is it possible to use __declspec() multiple times per function declaration? |
New changeset 6478e6d0476f by Victor Stinner in branch 'default': |
I'm fairly sure dllexport implies noinline, and it certainly does for the exported function (but maybe PGO will inline it within the module? hard to know). In any case, I agree with keeping it private as it's purely about optimization. Do you have a test that can run against the released build? That will be the easiest way to tell whether it matters at all. |
What is a stack usage effect of disabling inlining _PyStack_AsTuple()? Does it impact performance? Should inlining _PyStack_AsDict() be disabled too? Is it worth to extract slow paths of _PyObject_FastCallDict() and _PyObject_FastCallKeywords() into separate non-inlined functions? |
Full commit message: Issue bpo-29234: Inlining _PyStack_AsTuple() into callers increases their stack Add _Py_NO_INLINE: use __attribute__((noinline)) of GCC and Clang. It reduces the stack consumption, bytes per call, before => after: test_python_call: 1040 => 976 (-64 B) => total: 3136 => 2944 (- 192 B) Serhiy Storchaka: "What is a stack usage effect of disabling inlining _PyStack_AsTuple()?" The total effect on the 3 tests is to reduce the stack consumption by 192 bytes/call, or 64 bytes/call (8 CPU words) for each test.
I ran a benchmark on 3 changes at once. The effect is a speedup, not a slowdown: I don't expect any significant performance impact for the change 6478e6d0476f.
Good question. I didn't try to write a benchmark calling this function. It would help to have numbers to take a decision. I tried to push the fewer changes which have the largest impact on the stack consumption. There is still room to reduce it even further.
Do you mean for performance or stack consumption? I don't know. If you would like to know, you should run a benchmark to measure that. |
The initial issue has been fixed. With the issue bpo-28870 (msg285169): "The default branch is now as good as Python 3.4, in term of stack consumption, and Python 3.4 was the Python version which used the least stack memory according to my tests." Serhiy: As I wrote, for _PyStack_AsDict(), we need numbers to see if this function is inlined or not, and check its usage of the stack. Since I consider that the stack usage is now low enough, I'm not interested to continue to investigate the stack usage. Feel free to continue the effort, but please open a new issue in this case. I close the issue. |
Thank you Victor. Your work on reducing the stack consumption is great. |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: