
Inline smaller callees first #130696


Closed
wants to merge 1 commit

Conversation

scottmcm
Member

Then limit the total size and number of inlined things, to allow more top-down inlining (particularly important after calling something generic that couldn't inline internally) without getting exponential blowup.

Fixes #130590
r? saethlin
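The strategy described above — try smaller callees first, then stop once either a total-cost budget or a count limit runs out — can be sketched roughly like this. All names here (`Callsite`, `pick_inline_order`) are illustrative, not rustc's actual inliner code:

```rust
// Hypothetical sketch of budget-limited, smallest-first inlining.
// The 300-point default budget mirrors the threshold in this PR's diff.
struct Callsite {
    name: &'static str,
    cost: u32,
}

fn pick_inline_order(
    mut candidates: Vec<Callsite>,
    mut budget: u32,
    mut count: u32,
) -> Vec<&'static str> {
    // Smaller callees first, so many cheap calls aren't starved by one big one.
    candidates.sort_by_key(|c| c.cost);
    let mut inlined = Vec::new();
    for c in candidates {
        if count == 0 {
            break; // numeric limit, guards against recursive blowup
        }
        if c.cost > budget {
            break; // sorted ascending, so nothing later fits either
        }
        budget -= c.cost;
        count -= 1;
        inlined.push(c.name);
    }
    inlined
}

fn main() {
    let sites = vec![
        Callsite { name: "big", cost: 250 },
        Callsite { name: "tiny", cost: 5 },
        Callsite { name: "mid", cost: 60 },
    ];
    // With a 300-point budget, the tiny and mid callees are taken before
    // the big one exhausts the budget.
    println!("{:?}", pick_inline_order(sites, 300, 10)); // ["tiny", "mid"]
}
```

The key design point, per the PR description, is that sorting by cost leaves budget for more top-down inlining instead of letting one large callee consume it all.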

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Sep 22, 2024
@rustbot
Collaborator

rustbot commented Sep 22, 2024

Some changes occurred to MIR optimizations

cc @rust-lang/wg-mir-opt

@scottmcm
Member Author

@bors try @rust-timer queue

@rust-timer

This comment has been minimized.

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Sep 22, 2024
bors added a commit to rust-lang-ci/rust that referenced this pull request Sep 22, 2024
Inline smaller callees first

Then limit the total size and number of inlined things, to allow more top-down inlining (particularly important after calling something generic that couldn't inline internally) without getting exponential blowup.

Fixes rust-lang#130590
r? saethlin
@bors
Collaborator

bors commented Sep 22, 2024

⌛ Trying commit 51efba2 with merge 018727d...

@rust-log-analyzer
Collaborator

The job x86_64-gnu-llvm-18 failed! Check out the build log: (web) (plain)

Click to see the possible cause of the failure (guessed by this bot)
------
 > importing cache manifest from ghcr.io/rust-lang/rust-ci-cache:30ca74372d8b771363f68f939a58b017a592fae4f69398600dc51145997160f03e9652051f957840c41898984a88855e9757fa23464703a5a4ba21316ddebb04:
------
##[endgroup]
Setting extra environment values for docker:  --env ENABLE_GCC_CODEGEN=1 --env GCC_EXEC_PREFIX=/usr/lib/gcc/
[CI_JOB_NAME=x86_64-gnu-llvm-18]
---
sccache: Starting the server...
##[group]Configure the build
configure: processing command line
configure: 
configure: build.configure-args := ['--build=x86_64-unknown-linux-gnu', '--llvm-root=/usr/lib/llvm-18', '--enable-llvm-link-shared', '--set', 'rust.randomize-layout=true', '--set', 'rust.thin-lto-import-instr-limit=10', '--set', 'change-id=99999999', '--enable-verbose-configure', '--enable-sccache', '--disable-manage-submodules', '--enable-locked-deps', '--enable-cargo-native-static', '--set', 'rust.codegen-units-std=1', '--set', 'dist.compression-profile=balanced', '--dist-compression-formats=xz', '--set', 'rust.lld=false', '--disable-dist-src', '--release-channel=nightly', '--enable-debug-assertions', '--enable-overflow-checks', '--enable-llvm-assertions', '--set', 'rust.verify-llvm-ir', '--set', 'rust.codegen-backends=llvm,cranelift,gcc', '--set', 'llvm.static-libstdcpp', '--enable-new-symbol-mangling']
configure: target.x86_64-unknown-linux-gnu.llvm-config := /usr/lib/llvm-18/bin/llvm-config
configure: llvm.link-shared     := True
configure: rust.randomize-layout := True
configure: rust.thin-lto-import-instr-limit := 10

@bors
Collaborator

bors commented Sep 22, 2024

☀️ Try build successful - checks-actions
Build commit: 018727d (018727d766cb382a9e117d9d50a6abc0f5bd7ef0)

@rust-timer
Collaborator

Queued 018727d with parent 80aa6fa, future comparison URL.
There is currently 1 preceding artifact in the queue.
It will probably take at least ~2.3 hours until the benchmark run finishes.


let mut changed = false;
let mut remaining_cost = tcx.sess.opts.unstable_opts.inline_mir_total_threshold.unwrap_or(300);
let mut remaining_count = MAX_INLINED_CALLEES_PER_BODY;
Member

Shouldn't this go by some size heuristic? If the callee is absolutely tiny (e.g. just another call), then I'd expect the cost of inlining to be ~zero (replacing a call with another call), which means we could inline an unlimited number of those cases.

@the8472
Member

the8472 Sep 22, 2024

Ah, looking at the testcases I guess the scopes make this non-zero cost?
Are those debug-only?

Member Author

There's a heterogeneous recursion example in the tests that will recurse ≈infinitely if there's no numeric limit, so there needs to be a limit beyond just the total cost to keep that from exploding.

I can definitely experiment with upping the count limit and have the normal case be hitting the cost limit, though, since as you say we almost always want to inline the ≈free things.

Member Author

I could also try some tricks like adding synthetic cost the deeper the inlining goes, so that it ends up hitting the cost limit anyway. That might be better than the count limit since it'll encourage more breadth instead of depth...
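The depth-penalty idea floated above could look roughly like this. This is purely illustrative; `effective_cost` and the penalty value are made up for the sketch, not taken from the compiler:

```rust
// Illustrative only: charge a synthetic penalty per level of inlining depth,
// so deeper inlining eats the cost budget faster and breadth is favored.
fn effective_cost(base_cost: u32, depth: u32, penalty_per_level: u32) -> u32 {
    base_cost + depth * penalty_per_level
}

fn main() {
    // A 10-point callee at the top level vs. three levels deep,
    // with a hypothetical 15-point penalty per level.
    println!("{}", effective_cost(10, 0, 15)); // 10
    println!("{}", effective_cost(10, 3, 15)); // 55
}
```

With such a penalty, a cost budget alone can bound recursive inlining, since each extra level costs strictly more than the last.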

@saethlin

This comment was marked as outdated.

@scottmcm scottmcm marked this pull request as draft September 27, 2024 06:46
@scottmcm
Member Author

Some impressive size results here:

  • diesel opt-full 13,372,460 → 11,831,084
  • hyper opt-full 5,393,450 → 4,634,454

But it's almost all red for compile-time, so I've moved this to draft until I can rework it to hopefully not have that impact.

@oli-obk
Contributor

oli-obk commented Sep 27, 2024

We could limit it to non-incremental release builds.

It may also be of interest to embedded folks when optimizing for size.

@scottmcm
Member Author

@oli isn't the inliner already off for incremental?

match sess.mir_opt_level() {
    0 | 1 => false,
    2 => {
        (sess.opts.optimize == OptLevel::Default
            || sess.opts.optimize == OptLevel::Aggressive)
            && sess.opts.incremental == None
    }
    _ => true,
}

Or is mir-opt-level set to 3 with opt-level=3?

@oli-obk
Contributor

oli-obk commented Sep 27, 2024

Considering incremental regressed, I'd say we're indeed running on level 3

@bjorn3
Member

bjorn3 commented Sep 27, 2024

When optimizations are enabled, -Zmir-opt-level=2 is used by default, and -Zmir-opt-level=1 when optimizations are disabled. Incr check and debug builds also regress, so it probably has more to do with this optimization running on the standard library.

@saethlin
Member

saethlin commented Sep 27, 2024

Considering incremental regressed

Any improvement to the inliner tends to regress incremental. I think this happens because the standard library is not built with incremental, so better inlining means more of the standard library gets transitively dragged into third party CGUs.

@the8472
Member

the8472 commented Sep 27, 2024

Considering incremental regressed

Any improvement to the inliner tends to regress incremental. I think this happens because the standard library is not built with incremental, so better inlining means more of the standard library gets transitively dragged into third party CGUs.

This is for generic MIR exported from the library or for prebuilt monomorphic code? Would it make sense to distinguish between MIR for export (and maybe defer optimization to the consuming crate?) and MIR generated for same-crate codegen?

@saethlin
Member

saethlin commented Sep 27, 2024

This is for generic MIR exported from the library or for prebuilt monomorphic code?

It's for MIR exported from the library; not all of that MIR is generic.

and maybe defer optimization to the consuming crate?

I suspect this would have similarly problematic consequences for incremental builds.

What might help is having two versions of optimized_mir: one with all interprocedural optimizations disabled, used for incremental builds. It's not clear to me whether the compiler complexity and metadata size increase would be worth it.

@Dylan-DPC Dylan-DPC added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 20, 2024
@scottmcm
Member Author

I still want to do something like this, but closing for now because it probably needs some other reworks first.

Note to self: perhaps #126640 can help.

@scottmcm scottmcm closed this Dec 20, 2024
Successfully merging this pull request may close these issues.

Range<usize>::next should fully MIR-inline