-
-
Notifications
You must be signed in to change notification settings - Fork 31.9k
gh-107674: Improve performance of sys.settrace
#114986
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-107674: Improve performance of sys.settrace
#114986
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A nice performance improvement, thanks.
One minor issue, but otherwise looks good.
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
Hi @markshannon , is there anything I need to do on this PR? Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One suggestion. Otherwise looks good.
next_instr = this_instr; | ||
} | ||
else { | ||
if (tstate->tracing == 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because RESUME
converts itself to RESUME_CHECK
, this shouldn't matter in terms of performance, so maybe keep it simple?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I remembered why I did not set this instr to RESUME_CHECK
when tracing != 0
- it will convert this instr to RESUME_CHECK
but RESUME_CHECK
will be deopted as the version does not add up.
For your question - the key is to stop the code from being instrumented, removing this check will still result in an instrumented code, and instrumented code is always slower.
However, we should probably check if the code is tracing and do not deopt RESUME_CHECK
if so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another way to do this is to DISPATCH()
in the instrument case, so we don't have to write the "normal" case twice. Is it frown upon to have multiple DISPATCH()
in one bytecode?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Multiple DISPATCH()
s should be fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for looking into this.
next_instr = this_instr; | ||
} | ||
else { | ||
if (tstate->tracing == 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Multiple DISPATCH()
s should be fine.
Okay I cleaned up the code a bit - using an extra |
…nGH-114986)" This reverts commit 0a61e23.
….settrace` (pythonGH-114986)" (pythonGH-116178) Revert "pythongh-107674: Improve performance of `sys.settrace` (pythonGH-114986)" This reverts commit 0a61e23.
….settrace` (pythonGH-114986)" (pythonGH-116178) Revert "pythongh-107674: Improve performance of `sys.settrace` (pythonGH-114986)" This reverts commit 0a61e23.
….settrace` (pythonGH-114986)" (pythonGH-116178) Revert "pythongh-107674: Improve performance of `sys.settrace` (pythonGH-114986)" This reverts commit 0a61e23.
We observed a severe performance regression on
sys.settrace
when PEP 669 is implemented. One of the important reason is how we deal with the instrumented lines in the trace function.In 3.11, it's a simple check for
tstate->tracing
in frame eval. However, because we haveINSTRUMENTED_LINE
now, the check is deeper than before. The extra calls and unnecessary code whentstate->tracing != 0
contributed to the majority of the overhead.In a real life application, a trace function is normally complicated. Take
coverage
as an example, for each line that is really "instrumented", there will be many lines in the trace function that are "instrumented" but should not trigger instrument callbacks. Therefore,tstate->tracing != 0
is the real hot path for tracing.I made a microbenchmark:
This is a bit extreme as there will be a couple thousand "suppressed" line events in the trace function for each real line event. The result is:
before:
after:
3.11:
Then I realized - we should not instrument the code object when we are tracing. Many functions are only called in tracing callbacks (same as
sys.monitoring
callbacks), and they should not be instrumented. Not only we saved time by not instrumenting them and not calling the instrumented version of bytecode, we can actually enjoy specializing in tracing functions.So I made some changes to the
RESUME
andINSTRUMENTED_RESUME
instructions, which brings the result toThe code for
RESUME
has an extra check fortstate->tracing
, but for normal code without monitoring, it should be immediately specialized toRESUME_CHECK
so the impact should not be significant.Even considering the fact that the example is a bit extreme, the result is super significant. I believe this has a better effect than #107780
Is there a way to do a more comprehensive benchmark on this PR?
One difference would be asserts. Moving the check out of the instrumentation function skips the assertions - which we could add back but it would make the code a bit uglier. Admittedly, structure wise, it might not be as pretty as the current code, but I think this amount of performance improvement is worth it.