Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize MIN/MAX over DISTINCT #34699

Merged
merged 3 commits into from
Sep 17, 2024
Merged

Conversation

ranma42
Copy link
Contributor

@ranma42 ranma42 commented Sep 17, 2024

When a value is computed as MAX or MIN of a subquery, DISTINCT has no effect and can be removed.

Fixes #34483.

@ranma42 ranma42 force-pushed the optimize-distinct-minmax branch from 252e4f6 to 61a58a7 Compare September 17, 2024 17:31
SELECT DISTINCT [o].[OrderID]
FROM [Orders] AS [o]
) AS [o0]
SELECT MAX([o].[OrderID])
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these tests seem to be aimed at checking exactly this behavior; should I add something similar or are they adequate?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They look good to me...

Copy link
Member

@roji roji left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @ranma42! Can you just confirm that there are tests still exercising DISTINCT in both scenarios (e.g. with Count)?

SELECT DISTINCT [o].[OrderID]
FROM [Orders] AS [o]
) AS [o0]
SELECT MAX([o].[OrderID])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They look good to me...

/// Drops DISTINCT operator on the selector of the <see cref="EnumerableExpression" />.
/// </summary>
/// <returns>The new expression with specified component updated.</returns>
public virtual EnumerableExpression DropDistinct()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I'd rather have a single SetDistinct(bool) here (or similar) rather than two to keep the API surface tighter

@roji roji self-assigned this Sep 17, 2024
@ranma42 ranma42 force-pushed the optimize-distinct-minmax branch from 61a58a7 to 6820eb6 Compare September 17, 2024 19:53
@ranma42
Copy link
Contributor Author

ranma42 commented Sep 17, 2024

LGTM, thanks @ranma42! Can you just confirm that there are tests still exercising DISTINCT in both scenarios (e.g. with Count)?

Yes 👍

  • Select_distinct_{average,count,sum} cover the case in which the queryable is aggregated into a scalar
  • GroupBy_Select_Distinct_aggregate and GroupBy_group_Distinct_Select_Distinct_aggregate cover the case in which the aggregation is performed within a query (they both exercise AVG/COUNT/SUM in addition to MAX/MIN)

@ranma42 ranma42 marked this pull request as ready for review September 17, 2024 21:26
@roji roji enabled auto-merge (squash) September 17, 2024 21:31
@roji
Copy link
Member

roji commented Sep 17, 2024

Thanks!

@roji roji merged commit df997f7 into dotnet:main Sep 17, 2024
7 checks passed
@ranma42 ranma42 deleted the optimize-distinct-minmax branch September 17, 2024 22:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Remove DISTINCT inside MIN/MAX operations
2 participants