gh-107674: Improve performance of `sys.settrace` #114986

gaogaotiantian · 2024-02-04T07:42:20Z

We observed a severe performance regression on sys.settrace when PEP 669 is implemented. One of the important reason is how we deal with the instrumented lines in the trace function.

In 3.11, it's a simple check for tstate->tracing in frame eval. However, because we have INSTRUMENTED_LINE now, the check is deeper than before. The extra calls and unnecessary code when tstate->tracing != 0 contributed to the majority of the overhead.

In a real life application, a trace function is normally complicated. Take coverage as an example, for each line that is really "instrumented", there will be many lines in the trace function that are "instrumented" but should not trigger instrument callbacks. Therefore, tstate->tracing != 0 is the real hot path for tracing.

I made a microbenchmark:

import sys

def do_nothing():
    a = 1

def trace(frame, event, arg):
    for _ in range(2000):
        a = 1
    return trace

sys.settrace(trace)


for i in range(10000):
    do_nothing()

This is a bit extreme as there will be a couple thousand "suppressed" line events in the trace function for each real line event. The result is:

before:

real    0m2.009s
user    0m2.008s
sys     0m0.001s

after:

real    0m1.128s
user    0m1.127s
sys     0m0.001s

3.11:

real    0m0.779s
user    0m0.768s
sys     0m0.011s

Then I realized - we should not instrument the code object when we are tracing. Many functions are only called in tracing callbacks (same as sys.monitoring callbacks), and they should not be instrumented. Not only we saved time by not instrumenting them and not calling the instrumented version of bytecode, we can actually enjoy specializing in tracing functions.

So I made some changes to the RESUME and INSTRUMENTED_RESUME instructions, which brings the result to

real    0m0.908s
user    0m0.898s
sys     0m0.011s

The code for RESUME has an extra check for tstate->tracing, but for normal code without monitoring, it should be immediately specialized to RESUME_CHECK so the impact should not be significant.

Even considering the fact that the example is a bit extreme, the result is super significant. I believe this has a better effect than #107780

Is there a way to do a more comprehensive benchmark on this PR?

One difference would be asserts. Moving the check out of the instrumentation function skips the assertions - which we could add back but it would make the code a bit uglier. Admittedly, structure wise, it might not be as pretty as the current code, but I think this amount of performance improvement is worth it.

Issue: sys.settrace dramatic slowdown in 3.12 #107674

markshannon

A nice performance improvement, thanks.

One minor issue, but otherwise looks good.

Python/instrumentation.c

bedevere-app · 2024-02-05T16:16:06Z

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

gaogaotiantian · 2024-02-13T21:11:18Z

Hi @markshannon , is there anything I need to do on this PR? Thanks!

markshannon

One suggestion. Otherwise looks good.

markshannon · 2024-02-26T16:53:53Z

Python/bytecodes.c

-                next_instr = this_instr;
-            }
-            else {
+            if (tstate->tracing == 0) {


Because RESUME converts itself to RESUME_CHECK, this shouldn't matter in terms of performance, so maybe keep it simple?

Ah I remembered why I did not set this instr to RESUME_CHECK when tracing != 0 - it will convert this instr to RESUME_CHECK but RESUME_CHECK will be deopted as the version does not add up.

For your question - the key is to stop the code from being instrumented, removing this check will still result in an instrumented code, and instrumented code is always slower.

However, we should probably check if the code is tracing and do not deopt RESUME_CHECK if so.

Another way to do this is to DISPATCH() in the instrument case, so we don't have to write the "normal" case twice. Is it frown upon to have multiple DISPATCH() in one bytecode?

Multiple DISPATCH()s should be fine.

markshannon

Thanks for looking into this.

markshannon · 2024-02-27T14:08:28Z

Python/bytecodes.c

-                next_instr = this_instr;
-            }
-            else {
+            if (tstate->tracing == 0) {


Multiple DISPATCH()s should be fine.

gaogaotiantian · 2024-02-27T19:20:12Z

Okay I cleaned up the code a bit - using an extra DISPATCH() in instrument case to avoid duplicate code. This PR should be ready to go.

…nGH-114986)" This reverts commit 0a61e23.

…H-114986)" (GH-116178) Revert "gh-107674: Improve performance of `sys.settrace` (GH-114986)" This reverts commit 0a61e23.

….settrace` (pythonGH-114986)" (pythonGH-116178) Revert "pythongh-107674: Improve performance of `sys.settrace` (pythonGH-114986)" This reverts commit 0a61e23.

Improve performance of sys.settrace

6e53f53

gaogaotiantian requested review from markshannon and gvanrossum as code owners February 4, 2024 07:42

bedevere-app bot added the awaiting review label Feb 4, 2024

bedevere-app bot mentioned this pull request Feb 4, 2024

sys.settrace dramatic slowdown in 3.12 #107674

Open

2 tasks

blurb-it bot and others added 3 commits February 4, 2024 07:45

📜🤖 Added by blurb_it.

a42d7c8

Do not instrument code when tracing

d5b1dec

Reduce extra branch and save some operations

2a1eefb

markshannon requested changes Feb 5, 2024

View reviewed changes

Python/instrumentation.c Outdated Show resolved Hide resolved

bedevere-app bot added awaiting changes and removed awaiting review labels Feb 5, 2024

Change the comment to assertion

cb09c55

gaogaotiantian requested a review from markshannon February 5, 2024 19:08

gaogaotiantian added 2 commits February 23, 2024 18:18

Merge branch 'main' into sys-settrace-optimize

410a07a

Add missing instr change

74d3375

markshannon reviewed Feb 26, 2024

View reviewed changes

gaogaotiantian added 2 commits February 26, 2024 11:50

Do not deopt RESUME_CHECK if it's tracing

47ff554

Regen executor_cases

79f43b9

markshannon approved these changes Feb 27, 2024

View reviewed changes

Python/bytecodes.c

next_instr = this_instr;

}

else {

if (tstate->tracing == 0) {

Copy link

Member

markshannon Feb 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Multiple DISPATCH()s should be fine.

bedevere-app bot added awaiting merge and removed awaiting changes labels Feb 27, 2024

Clean up bytecode code

0c72ada

markshannon merged commit 0a61e23 into python:main Feb 28, 2024

bedevere-app bot removed the awaiting merge label Feb 28, 2024

gaogaotiantian deleted the sys-settrace-optimize branch February 28, 2024 18:27

gaogaotiantian added a commit to gaogaotiantian/cpython that referenced this pull request Mar 1, 2024

Revert "pythongh-107674: Improve performance of sys.settrace (pytho…

30bf8e8

…nGH-114986)" This reverts commit 0a61e23.

encukou pushed a commit that referenced this pull request Mar 1, 2024

gh-116098: Revert "gh-107674: Improve performance of sys.settrace (G…

7895a61

…H-114986)" (GH-116178) Revert "gh-107674: Improve performance of `sys.settrace` (GH-114986)" This reverts commit 0a61e23.

woodruffw pushed a commit to woodruffw-forks/cpython that referenced this pull request Mar 4, 2024

pythongh-107674: Improve performance of sys.settrace (pythonGH-114986)

b1f0e7f

gaogaotiantian added a commit to gaogaotiantian/cpython that referenced this pull request Mar 21, 2024

pythongh-107674: Improve performance of sys.settrace (pythonGH-114986)

ce40742

adorilson pushed a commit to adorilson/cpython that referenced this pull request Mar 25, 2024

pythongh-107674: Improve performance of sys.settrace (pythonGH-114986)

45cb900

diegorusso pushed a commit to diegorusso/cpython that referenced this pull request Apr 17, 2024

pythongh-107674: Improve performance of sys.settrace (pythonGH-114986)

3bd5cbc

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

gh-107674: Improve performance of `sys.settrace` #114986

gh-107674: Improve performance of `sys.settrace` #114986

gaogaotiantian commented Feb 4, 2024 •

edited

Loading

markshannon left a comment

bedevere-app bot commented Feb 5, 2024

gaogaotiantian commented Feb 13, 2024

markshannon left a comment

markshannon Feb 26, 2024

gaogaotiantian Feb 26, 2024

gaogaotiantian Feb 26, 2024

markshannon Feb 27, 2024

markshannon left a comment

markshannon Feb 27, 2024

gaogaotiantian commented Feb 27, 2024

gh-107674: Improve performance of sys.settrace #114986

gh-107674: Improve performance of sys.settrace #114986

Conversation

gaogaotiantian commented Feb 4, 2024 • edited Loading

markshannon left a comment

Choose a reason for hiding this comment

bedevere-app bot commented Feb 5, 2024

gaogaotiantian commented Feb 13, 2024

markshannon left a comment

Choose a reason for hiding this comment

markshannon Feb 26, 2024

Choose a reason for hiding this comment

gaogaotiantian Feb 26, 2024

Choose a reason for hiding this comment

gaogaotiantian Feb 26, 2024

Choose a reason for hiding this comment

markshannon Feb 27, 2024

Choose a reason for hiding this comment

markshannon left a comment

Choose a reason for hiding this comment

markshannon Feb 27, 2024

Choose a reason for hiding this comment

gaogaotiantian commented Feb 27, 2024

gh-107674: Improve performance of `sys.settrace` #114986

gh-107674: Improve performance of `sys.settrace` #114986

gaogaotiantian commented Feb 4, 2024 •

edited

Loading