-
Notifications
You must be signed in to change notification settings - Fork 13.3k
Inline smaller callees first #130696
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Inline smaller callees first #130696
Conversation
Then limit the total size and number of inlined things, to allow more top-down inlining (particularly important after calling something generic that couldn't inline internally) without getting exponential blowup. Fixes 130590
Some changes occurred to MIR optimizations cc @rust-lang/wg-mir-opt |
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
Inline smaller callees first Then limit the total size and number of inlined things, to allow more top-down inlining (particularly important after calling something generic that couldn't inline internally) without getting exponential blowup. Fixes rust-lang#130590 r? saethlin
The job Click to see the possible cause of the failure (guessed by this bot)
|
☀️ Try build successful - checks-actions |
Queued 018727d with parent 80aa6fa, future comparison URL. |
|
||
let mut changed = false; | ||
let mut remaining_cost = tcx.sess.opts.unstable_opts.inline_mir_total_threshold.unwrap_or(300); | ||
let mut remaining_count = MAX_INLINED_CALLEES_PER_BODY; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this go by some size heuristic? if the callee is absolutely tiny (e.g. just another call) then I'd expect the cost of inlining to be ~zero (replacing a call with another call), which means we can inline an unlimited amount of those cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, looking at the testcases I guess the scopes make this non-zero cost?
Are those debug-only?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a heterogeneous recursion example in the tests that will go ≈infinitely if there's no numeric limit, so there needs to be a limit more than just the total cost to keep that from excessively exploding.
I can definitely experiment with upping the count limit and have the normal case be hitting the cost limit, though, since as you say we almost always want to inline the ≈free things.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could also try some tricks like adding synthetic cost the deeper the inlining, or something, so that it ends up hitting the cost limits anyway. Might be better than the count limit anyway since it'll encourage more breadth instead of depth...
This comment was marked as outdated.
This comment was marked as outdated.
Some impressive size results here:
But it's almost all red for compile-time, so I've moved this to draft until I can rework it to hopefully not have that impact. |
We could limit it to nonincremental release builds. It may also be of interest to embedded folk when optimizing for size |
@oli isn't the inliner already off for incremental? rust/compiler/rustc_mir_transform/src/inline.rs Lines 59 to 67 in 58420a0
Or is mir-opt-level set to 3 with opt-level=3? |
Considering incremental regressed, I'd say we're indeed running on level 3 |
When optimizations are enabled, |
Any improvement to the inliner tends to regress incremental. I think this happens because the standard library is not built with incremental, so better inlining means more of the standard library gets transitively dragged into third party CGUs. |
This is for generic MIR exported from the library or for prebuilt monomorphic code? Would it make sense to distinguish between MIR for export (and maybe defer optimization to the consuming crate?) and MIR generated for same-crate codegen? |
It's for MIR exported from the library; not all of that MIR is generic.
I suspect this would have similarly problematic consequences for incremental builds. What might help is having two version of |
I still want to do something like this, but closing for now because it probably needs some other reworks first. Note to self: perhaps #126640 can help. |
Then limit the total size and number of inlined things, to allow more top-down inlining (particularly important after calling something generic that couldn't inline internally) without getting exponential blowup.
Fixes #130590
r? saethlin