Fix to #35208 - Query/Perf: don't compile liftable constant resolvers in interpretation mode #35209
Conversation
I'm noting that the performance gap between running interpreted vs. compiled lambdas (as measured in the benchmark above) is indeed known and wasn't in question; the problem is what to do with lambdas that we know are only ever going to be evaluated once.
In #29814 (for 8.0) I changed some compiled delegates to interpreted (in the funcletizer), since the overhead of compiling/JIT is quite big, the lambdas to be evaluated are generally very small, and they're only ever going to get evaluated once in this context (see this benchmark). In other words, if these resolver lambdas really are only evaluated once (as part of query compilation), this change should actually make query compilation slower rather than faster (and have no impact on runtime, since liftable constant resolvers shouldn't be part of runtime at all).
So I'd definitely want us to actually understand what's going on here... But since we're on a tight schedule, approving for now.
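For context, here is a minimal standalone sketch (not EF Core code) of the trade-off being described: interpretation skips the JIT cost, which pays off for a delegate that runs exactly once, while a regularly compiled delegate is cheaper per invocation when called repeatedly.

```csharp
using System;
using System.Linq.Expressions;

class CompileVsInterpretSketch
{
    static void Main()
    {
        Expression<Func<int, int>> expr = x => x * 2 + 1;

        // Interpreted: cheap to produce, slower per invocation.
        // A good fit when the delegate is only ever evaluated once (e.g. during funcletization).
        Func<int, int> interpreted = expr.Compile(preferInterpretation: true);

        // Regularly compiled: more up-front (JIT) cost, but fast per invocation.
        // A good fit when the delegate is invoked many times (e.g. per row while a query runs).
        Func<int, int> compiled = expr.Compile();

        Console.WriteLine(interpreted(21)); // 43
        Console.WriteLine(compiled(21));    // 43
    }
}
```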
@@ -198,7 +199,8 @@ protected virtual ConstantExpression InlineConstant(LiftableConstantExpression l
     // Make sure there aren't any problematic un-lifted constants within the resolver expression.
     _unsupportedConstantChecker.Check(resolverExpression);

-    var resolver = resolverExpression.Compile(preferInterpretation: true);
+    var containsLambda = _lambdaChecker.ContainsLambda(resolverExpression.Body);
Is there a reason to make the interpretation conditional on the existence of a lambda inside? Is it faster to interpret when there's no lambda?
I did it this way to limit the impact of this change, as we plan to ship it in a patch. There is a marginal difference (if any) between compiling in regular mode only the resolvers that contain nested lambdas vs. compiling everything in regular mode:
only nested lambdas
Method | Async | Mean | Error | StdDev | Op/s | Gen0 | Gen1 | Allocated |
---|---|---|---|---|---|---|---|---|
MultiInclue | False | 456.7 ms | 8.17 ms | 7.64 ms | 2.189 | 11000.0000 | 6000.0000 | 67.92 MB |
MultiInclue | True | 440.7 ms | 3.36 ms | 2.98 ms | 2.269 | 11000.0000 | 6000.0000 | 67.92 MB |
everything
Method | Async | Mean | Error | StdDev | Op/s | Gen0 | Gen1 | Allocated |
---|---|---|---|---|---|---|---|---|
MultiInclue | False | 447.2 ms | 2.02 ms | 1.69 ms | 2.236 | 11000.0000 | 6000.0000 | 67.92 MB |
MultiInclue | True | 448.1 ms | 5.25 ms | 4.91 ms | 2.232 | 11000.0000 | 6000.0000 | 67.92 MB |
I'm perfectly happy with doing the blanket Compile(); it will simplify the code a bit and save us a visitation.
I think making another behavior variation (contains lambda or not) makes the situation even more complicated and unpredictable, i.e. something works one way, and you introduce a lambda into it and suddenly it switches to a completely different way... If there's really no specific reason to have a behavior split, I'd prefer having the same behavior indeed...
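For illustration, here is a self-contained sketch of what a nested-lambda check like the `_lambdaChecker.ContainsLambda` call in the diff might look like; `NestedLambdaDetector` is a hypothetical name, not the actual EF Core type, which is internal and may differ.

```csharp
using System.Linq.Expressions;

// Hypothetical stand-in for the lambda checker used in the diff above.
internal sealed class NestedLambdaDetector : ExpressionVisitor
{
    private bool _found;

    public bool ContainsLambda(Expression expression)
    {
        _found = false;
        Visit(expression);
        return _found;
    }

    protected override Expression VisitLambda<T>(Expression<T> node)
    {
        // Any lambda reached here (the caller passes resolverExpression.Body, so the
        // root lambda itself is excluded) means the compiled resolver would produce
        // a nested delegate.
        _found = true;
        return node;
    }
}
```

With such a checker, the conditional variant discussed above is roughly `containsLambda ? resolverExpression.Compile() : resolverExpression.Compile(preferInterpretation: true)`, while the blanket variant simply calls `resolverExpression.Compile()` unconditionally.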
… in interpretation mode when the resolver itself contains a lambda

In EF9 we changed the way we generate shapers in preparation for AOT scenarios. We can no longer embed arbitrary objects into the shaper; instead we need to provide a way to construct that object in code (using the LiftableConstant mechanism), or simulate the functionality it used to provide. At the end of our processing, we find all liftable constants and, for the non-AOT case, we compile their resolver lambdas and invoke the result with the liftable context object to produce the constant object we initially wanted (in the AOT case we generate code from the resolver lambda).

The problem is that we compile the resolver lambda in interpretation mode - if the final product is itself a delegate, that delegate will itself run in interpreter mode and therefore be less efficient. The fix is to use regular compilation rather than interpretation.

Fixes #35208

This is part of a fix for a larger perf issue: #35053
Port of #35209

**Description**

In EF9 we changed the way we generate shapers in preparation for AOT scenarios. We can no longer embed arbitrary objects into the shaper; instead we need to provide a way to construct that object in code (using the LiftableConstant mechanism), or simulate the functionality it used to provide. At the end of our processing, we find all liftable constants and, for the non-AOT case, we compile their resolver lambdas and invoke the result with the liftable context object to produce the constant object we initially wanted (in the AOT case we generate code from the resolver lambda). The problem is that we compile the resolver lambda in interpretation mode - if the final product is itself a delegate, that delegate will itself run in interpreter mode and therefore be less efficient when invoked multiple times as the query runs. The fix is to use regular compilation rather than interpretation.

**Customer impact**

Queries using collection navigations over a significant amount of data suffer a large performance degradation compared with EF8. No good workaround.

**How found**

Multiple customer reports on 9.0.0.

**Regression**

Yes, from 8.0.

**Testing**

Ad-hoc perf testing with BenchmarkDotNet. The functional change is already covered by numerous tests.

**Risk**

Low; quirk added.
In EF9 we changed the way we generate shapers in preparation for AOT scenarios. We can no longer embed arbitrary objects into the shaper; instead we need to provide a way to construct that object in code (using the LiftableConstant mechanism), or simulate the functionality it used to provide. At the end of our processing, we find all liftable constants and, for the non-AOT case, we compile their resolver lambdas and invoke the result with the liftable context object to produce the constant object we initially wanted (in the AOT case we generate code from the resolver lambda). The problem is that we compile the resolver lambda in interpretation mode - if the final product is itself a delegate, that delegate will itself run in interpreter mode and therefore be less efficient.
The fix is to scan the resolver expression looking for nested lambdas inside - if we find any, compile the resolver in regular mode instead.
Fixes #35208
This is part of a fix for a larger perf issue: #35053
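To make the non-AOT resolution path described above concrete, here is a minimal sketch under assumed names (`FakeLiftableConstantContext` and `ResolveConstant` are illustrative only, not the actual EF Core API):

```csharp
using System;
using System.Linq.Expressions;

// Illustrative stand-in for EF's liftable-constant context object.
public sealed class FakeLiftableConstantContext
{
    public IServiceProvider Services { get; init; } = default!;
}

public static class LiftableConstantSketch
{
    // Non-AOT path as described above: compile the resolver lambda and invoke it
    // with the context object to materialize the constant the shaper needs.
    public static object ResolveConstant(
        Expression<Func<FakeLiftableConstantContext, object>> resolverExpression,
        FakeLiftableConstantContext context)
    {
        // Before the fix this was Compile(preferInterpretation: true). If the resolved
        // object is itself a delegate, that delegate stays interpreted and is slower on
        // every invocation while the query runs; regular Compile() avoids that.
        Func<FakeLiftableConstantContext, object> resolver = resolverExpression.Compile();
        return resolver(context);
    }
}
```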