Skip to content

Disable inlining of _PyStack_AsTuple() to reduce the stack consumption #73420

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
vstinner opened this issue Jan 11, 2017 · 8 comments
Closed
Labels
3.7 (EOL) end of life interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement

Comments

@vstinner
Copy link
Member

BPO 29234
Nosy @vstinner, @serhiy-storchaka, @zooba
Files
  • noinline_msvs.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2017-02-01.17:03:48.688>
    created_at = <Date 2017-01-11.00:05:11.914>
    labels = ['interpreter-core', 'type-feature', '3.7']
    title = 'Disable inlining of _PyStack_AsTuple() to reduce the stack consumption'
    updated_at = <Date 2017-02-01.17:21:59.725>
    user = 'https://github.com/vstinner'

    bugs.python.org fields:

    activity = <Date 2017-02-01.17:21:59.725>
    actor = 'serhiy.storchaka'
    assignee = 'none'
    closed = True
    closed_date = <Date 2017-02-01.17:03:48.688>
    closer = 'vstinner'
    components = ['Interpreter Core']
    creation = <Date 2017-01-11.00:05:11.914>
    creator = 'vstinner'
    dependencies = []
    files = ['46248']
    hgrepos = []
    issue_num = 29234
    keywords = ['patch']
    message_count = 8.0
    messages = ['285168', '285170', '285172', '285182', '285191', '285201', '286656', '286663']
    nosy_count = 4.0
    nosy_names = ['vstinner', 'python-dev', 'serhiy.storchaka', 'steve.dower']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = None
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue29234'
    versions = ['Python 3.7']

    @vstinner
    Copy link
    Member Author

    While working on the issue bpo-28870, I noticed that _PyStack_AsTuple() is inlined. Compared to Python 3.5 which didn't have this function, it seems like _PyStack_AsTuple() is responsible for an increase of the stack consumption.

    @vstinner vstinner added 3.7 (EOL) end of life interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement labels Jan 11, 2017
    @vstinner
    Copy link
    Member Author

    noinline_msvs.patch: implement _Py_NO_INLINE for Microsoft Visual Studio

    #elif defined(_MSC_VER)
    # define _Py_NO_INLINE __declspec(noinline)

    I didn't push this change because I don't have access to a Windows VM yet.

    @Steve: Does noinline_msvs.patch look good to you?

    I decided to make the macro private because I don't know if _Py_NO_INLINE can be combined with PyAPI_FUNC() which also uses __declspec(): __declspec(dllexport).

    Is it possible to use __declspec() multiple times per function declaration?

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Jan 11, 2017

    New changeset 6478e6d0476f by Victor Stinner in branch 'default':
    Disable _PyStack_AsTuple() inlining
    https://hg.python.org/cpython/rev/6478e6d0476f

    @zooba
    Copy link
    Member

    zooba commented Jan 11, 2017

    I'm fairly sure dllexport implies noinline, and it certainly does for the exported function (but maybe PGO will inline it within the module? hard to know). In any case, I agree with keeping it private as it's purely about optimization.

    Do you have a test that can run against the released build? That will be the easiest way to tell whether it matters at all.

    @serhiy-storchaka
    Copy link
    Member

    What is a stack usage effect of disabling inlining _PyStack_AsTuple()? Does it impact performance? Should inlining _PyStack_AsDict() be disabled too? Is it worth to extract slow paths of _PyObject_FastCallDict() and _PyObject_FastCallKeywords() into separate non-inlined functions?

    @vstinner
    Copy link
    Member Author

    Full commit message:
    ---
    Disable _PyStack_AsTuple() inlining

    Issue bpo-29234: Inlining _PyStack_AsTuple() into callers increases their stack
    consumption, Disable inlining to optimize the stack consumption.

    Add _Py_NO_INLINE: use __attribute__((noinline)) of GCC and Clang.

    It reduces the stack consumption, bytes per call, before => after:

    test_python_call: 1040 => 976 (-64 B)
    test_python_getitem: 976 => 912 (-64 B)
    test_python_iterator: 1120 => 1056 (-64 B)

    => total: 3136 => 2944 (- 192 B)
    ---

    Serhiy Storchaka: "What is a stack usage effect of disabling inlining _PyStack_AsTuple()?"

    The total effect on the 3 tests is to reduce the stack consumption by 192 bytes/call, or 64 bytes/call (8 CPU words) for each test.

    Does it impact performance?

    I ran a benchmark on 3 changes at once. The effect is a speedup, not a slowdown:
    http://bugs.python.org/issue28870#msg285173

    I don't expect any significant performance impact for the change 6478e6d0476f.

    Should inlining _PyStack_AsDict() be disabled too?

    Good question. I didn't try to write a benchmark calling this function. It would help to have numbers to take a decision.

    I tried to push the fewer changes which have the largest impact on the stack consumption. There is still room to reduce it even further.

    Is it worth to extract slow paths of _PyObject_FastCallDict() and _PyObject_FastCallKeywords() into separate non-inlined functions?

    Do you mean for performance or stack consumption? I don't know. If you would like to know, you should run a benchmark to measure that.

    @vstinner
    Copy link
    Member Author

    vstinner commented Feb 1, 2017

    The initial issue has been fixed.

    With the issue bpo-28870 (msg285169): "The default branch is now as good as Python 3.4, in term of stack consumption, and Python 3.4 was the Python version which used the least stack memory according to my tests."

    Serhiy: As I wrote, for _PyStack_AsDict(), we need numbers to see if this function is inlined or not, and check its usage of the stack.

    Since I consider that the stack usage is now low enough, I'm not interested to continue to investigate the stack usage. Feel free to continue the effort, but please open a new issue in this case.

    I close the issue.

    @vstinner vstinner closed this as completed Feb 1, 2017
    @serhiy-storchaka
    Copy link
    Member

    Thank you Victor. Your work on reducing the stack consumption is great.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.7 (EOL) end of life interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants