Compare performance: Detect 'shadowed' cache hits and a few other fixes #4018

Merged · 16 commits into github:main · Apr 30, 2025

Conversation

@asgerf (Contributor) commented Apr 25, 2025

Adds detection of 'shadowed' cache hits in the compare performance view, greatly improving its usefulness when comparing two query runs where the cache was not cleared in between. Previously, one side of the comparison would get blamed for the cost of evaluating a lot of predicates that the other side could omit because of the cache. Detecting this is harder than it sounds; see the explanatory comment in 06fcd6f.
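
To illustrate the core idea (an editor's sketch, not the PR's code; the event shape and field names such as `wasCacheHit` are hypothetical): a predicate evaluated on one side is "shadowed" when the other side got the same predicate for free from the cache, so its cost should be discounted rather than blamed on the side that happened to evaluate it.

```ts
// Editor's sketch of the core idea only; the real implementation is more
// subtle (see the explanatory comment in 06fcd6f). All shapes here are
// hypothetical, not the extension's actual types.
interface PredicateEvent {
  name: string; // predicate name
  millis: number; // evaluation time attributed to this predicate
  wasCacheHit: boolean; // true if the result came straight from the cache
}

// Names of predicates whose cost on `evaluatedSide` should be discounted,
// because the other side skipped them thanks to a warm cache.
function findShadowedCacheHits(
  evaluatedSide: PredicateEvent[],
  otherSide: PredicateEvent[],
): Set<string> {
  const cacheHitsOnOtherSide = new Set(
    otherSide.filter((e) => e.wasCacheHit).map((e) => e.name),
  );
  return new Set(
    evaluatedSide
      .filter((e) => !e.wasCacheHit && cacheHitsOnOtherSide.has(e.name))
      .map((e) => e.name),
  );
}
```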

Also fixes a few other things, such as the NAMED_LOCAL evaluation strategy not being handled, and the millis field on COMPUTE_RECURSIVE events referring to the cost of the entire SCC, which is not what we want.
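
A rough sketch of the millis issue (the field names below are assumptions, not the actual evaluator-log schema): on a COMPUTE_RECURSIVE event the top-level millis covers the whole SCC, so attributing it to each member predicate would double-count; a per-predicate figure has to be derived from per-iteration timings instead.

```ts
// Hypothetical event shape for illustration; the real structured evaluator
// log differs. The point: for COMPUTE_RECURSIVE, `millis` is the cost of
// the entire SCC, not of this predicate alone.
interface SummaryEvent {
  evaluationStrategy: "COMPUTE_SIMPLE" | "COMPUTE_RECURSIVE";
  millis: number;
  iterationMillis?: number[]; // hypothetical per-iteration timings
}

function predicateSelfMillis(event: SummaryEvent): number {
  if (event.evaluationStrategy === "COMPUTE_RECURSIVE") {
    // Using `event.millis` here would charge every member of the SCC with
    // the full SCC cost; sum this predicate's own iterations instead.
    return (event.iterationMillis ?? []).reduce((a, b) => a + b, 0);
  }
  return event.millis;
}
```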

Commit-by-commit review is strongly recommended.

asgerf added 2 commits April 25, 2025 11:41
There's a linting check that fails if we export something without
importing it anywhere.
@asgerf marked this pull request as ready for review April 28, 2025 12:12
@asgerf requested a review from a team as a code owner April 28, 2025 12:12
@aeisenberg (Contributor) left a comment:

Looks good. I have one real comment and a couple of copy-edit nits.

I haven't tried this locally.

Comment on lines +886 to +887
// Note: we can't import "crypto" here because it is not available in the browser,
// so we just concatenate the hashes of the individual pipelines.
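For context, the fingerprinting code under discussion presumably does something like the following sketch (only the two comment lines above come from the actual diff; the function and its shapes are the editor's assumptions):

```ts
// Hypothetical sketch: `pipelineHashes` maps an RA pipeline name to a
// precomputed hash string. Node's "crypto" module is unavailable in the
// webview, so instead of hashing the combined string we concatenate the
// per-pipeline hashes in a deterministic order.
function fingerprintPredicate(pipelineHashes: Map<string, string>): string {
  return [...pipelineHashes.entries()]
    .sort(([a], [b]) => a.localeCompare(b))
    .map(([name, hash]) => `${name}:${hash}`)
    .join(";");
}
```
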
@aeisenberg (Contributor) commented:

How long will this be? Can we possibly have hundreds of keys? It might be safer to add some checks to ensure that the result is never larger than a fixed amount. And if it is, you can sample the first n chars of each key to keep it under the limit, retain a reasonable amount of stochasticity, and ensure the result is easy to reproduce given the same input.
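
The suggestion might look roughly like this (an editor's sketch; the cap and truncation strategy are assumptions, not code from the PR):

```ts
const MAX_FINGERPRINT_LENGTH = 256; // hypothetical cap

function boundedFingerprint(hashes: string[]): string {
  const joined = hashes.join(";");
  if (joined.length <= MAX_FINGERPRINT_LENGTH) {
    return joined;
  }
  // Keep only the first n characters of each hash so the result stays
  // under the cap while remaining deterministic for the same input.
  const n = Math.max(1, Math.floor(MAX_FINGERPRINT_LENGTH / hashes.length) - 1);
  return hashes.map((h) => h.slice(0, n)).join(";");
}
```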

@asgerf (Contributor, Author) replied:

Back when the "standard order" existed, there could be multiple pipelines, but nowadays there are 2 for recursive predicates and 1 for all other predicates. So it's really small.

asgerf and others added 2 commits April 29, 2025 09:07
@asgerf merged commit c0261ec into github:main on Apr 30, 2025
18 of 19 checks passed