Skip to content

LoopFullUnrollPass failed due to constant inference in opaque pointer mode. #65763

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
dianqk opened this issue Sep 8, 2023 · 7 comments · Fixed by #65875
Closed

LoopFullUnrollPass failed due to constant inference in opaque pointer mode. #65763

dianqk opened this issue Sep 8, 2023 · 7 comments · Fixed by #65875

Comments

@dianqk
Copy link
Member

dianqk commented Sep 8, 2023

I tried to disable the opaque pointer in rust-lang/rust#115339 to get the SLPVectorizer optimization.
See https://godbolt.org/z/GqdPGarsn.

According to my investigation, the success of optimization in non-opaque pointer mode relates to SROA.cpp#L1831-L1845 or InstructionCombining.cpp#L2220-L2274. Unfortunately, these two transformations have been removed during opaque pointer migration.

My plan is to restore the behavior of InstructionCombining.cpp#L2220-L2274 using alloc instead of bitcast.
Looking at the code comments, this seems to be an important optimization. So it might be worth backporting to LLVM 17?

@fhahn
Copy link
Contributor

fhahn commented Sep 8, 2023

Could you share the LLVM IR before the SLP vectorizer?

@dianqk
Copy link
Member Author

dianqk commented Sep 9, 2023

Could you share the LLVM IR before the SLP vectorizer?

There are two smaller IRs:

But I dont's see the value. See #65764 (comment).

@dianqk
Copy link
Member Author

dianqk commented Sep 10, 2023

Could you share the LLVM IR before the SLP vectorizer?

I added an IR that looks pretty good in #65875.

dianqk added a commit that referenced this issue Sep 19, 2023
Closes #65763.
This will provide more opportunities for constant propagation for
subsequent optimizations.
@krasimirgg
Copy link
Contributor

Hey @dianqk, it appears that after 2d1e8a0, one of rust's codegen tests started failing:
https://buildkite.com/llvm-project/rust-llvm-integrate-prototype/builds/22405#018aafae-043a-4a37-86ef-02493663ee73/689-694. I'm not a rust or LLVM expert, but naively it looks like a possible regression. Could you please take a look?

The test code looks like this: https://github.com/rust-lang/rust/blob/ed33e408c5299b4c1fd87a37e050180db9d642d5/tests/codegen/simd/simd-wide-sum.rs#L52-L59

#[no_mangle]
// CHECK-LABEL: @wider_reduce_into_iter
pub fn wider_reduce_into_iter(x: Simd<u8, N>) -> u16 {
    // CHECK: zext <8 x i8>
    // CHECK-SAME: to <8 x i16>
    // CHECK: call i16 @llvm.vector.reduce.add.v8i16(<8 x i16>
    x.to_array().into_iter().map(u16::from).sum()
}

With 2d1e8a0, it looks like now the code is exploded into a long sequence that doesn't call llvm vector anymore: https://buildkite.com/llvm-project/rust-llvm-integrate-prototype/builds/22405#018aafae-043a-4a37-86ef-02493663ee73/689-740:

33: ; Function Attrs: mustprogress nofree norecurse nosync nounwind nonlazybind willreturn memory(argmem: read) uwtable
--
  | 34: define noundef i16 @wider_reduce_into_iter(ptr noalias nocapture noundef readonly align 8 dereferenceable(8) %x) unnamed_addr #1 personality ptr @rust_eh_personality {
  | check:55'0                                               X~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ error: no match found
  | 35: start:
  | check:55'0     ~~~~~~~
  | 36:  %0 = load i64, ptr %x, align 8
  | check:55'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  | 37:  %self.sroa.4.16.extract.trunc = trunc i64 %0 to i16
  | check:55'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  | 38:  %_0.i.i.i.i.i.i.i.i.i = and i16 %self.sroa.4.16.extract.trunc, 255
  | check:55'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  | 39:  %1 = trunc i64 %0 to i16
  | check:55'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~
  | 40:  %2 = lshr i16 %1, 8
  | check:55'0     ~~~~~~~~~~~~~~~~~~~~~
  | 41:  %_4.0.i.i.i.i.i.i.i.i.1 = add nuw nsw i16 %_0.i.i.i.i.i.i.i.i.i, %2
  | check:55'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  | 42:  %self.sroa.4.18.extract.shift = lshr i64 %0, 16
  | check:55'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  | 43:  %self.sroa.4.18.extract.trunc = trunc i64 %self.sroa.4.18.extract.shift to i16
  | check:55'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  | 44:  %_0.i.i.i.i.i.i.i.i.i.2 = and i16 %self.sroa.4.18.extract.trunc, 255
  | check:55'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  | 45:  %_4.0.i.i.i.i.i.i.i.i.2 = add nuw nsw i16 %_4.0.i.i.i.i.i.i.i.i.1, %_0.i.i.i.i.i.i.i.i.i.2
  | check:55'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  | 46:  %self.sroa.4.19.extract.shift = lshr i64 %0, 24
  | check:55'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  | 47:  %self.sroa.4.19.extract.trunc = trunc i64 %self.sroa.4.19.extract.shift to i16
  | check:55'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  | 48:  %_0.i.i.i.i.i.i.i.i.i.3 = and i16 %self.sroa.4.19.extract.trunc, 255
  | check:55'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  | 49:  %_4.0.i.i.i.i.i.i.i.i.3 = add nuw nsw i16 %_4.0.i.i.i.i.i.i.i.i.2, %_0.i.i.i.i.i.i.i.i.i.3
  | check:55'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  | 50:  %self.sroa.4.20.extract.shift = lshr i64 %0, 32
  | check:55'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  | 51:  %self.sroa.4.20.extract.trunc = trunc i64 %self.sroa.4.20.extract.shift to i16
  | check:55'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  | 52:  %_0.i.i.i.i.i.i.i.i.i.4 = and i16 %self.sroa.4.20.extract.trunc, 255
  | check:55'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  | 53:  %_4.0.i.i.i.i.i.i.i.i.4 = add nuw nsw i16 %_4.0.i.i.i.i.i.i.i.i.3, %_0.i.i.i.i.i.i.i.i.i.4
  | check:55'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  | 54:  %self.sroa.4.21.extract.shift = lshr i64 %0, 40
  | check:55'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  | 55:  %self.sroa.4.21.extract.trunc = trunc i64 %self.sroa.4.21.extract.shift to i16
  | check:55'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  | 56:  %_0.i.i.i.i.i.i.i.i.i.5 = and i16 %self.sroa.4.21.extract.trunc, 255
  | check:55'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  | 57:  %_4.0.i.i.i.i.i.i.i.i.5 = add nuw nsw i16 %_4.0.i.i.i.i.i.i.i.i.4, %_0.i.i.i.i.i.i.i.i.i.5
  | check:55'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  | 58:  %self.sroa.4.22.extract.shift = lshr i64 %0, 48
  | check:55'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  | 59:  %self.sroa.4.22.extract.trunc = trunc i64 %self.sroa.4.22.extract.shift to i16
  | check:55'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  | 60:  %_0.i.i.i.i.i.i.i.i.i.6 = and i16 %self.sroa.4.22.extract.trunc, 255
  | check:55'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  | 61:  %_4.0.i.i.i.i.i.i.i.i.6 = add nuw nsw i16 %_4.0.i.i.i.i.i.i.i.i.5, %_0.i.i.i.i.i.i.i.i.i.6
  | check:55'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  | 62:  %self.sroa.4.23.extract.shift = lshr i64 %0, 56
  | check:55'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  | 63:  %self.sroa.4.23.extract.trunc = trunc i64 %self.sroa.4.23.extract.shift to i16
  | check:55'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  | 64:  %_4.0.i.i.i.i.i.i.i.i.7 = add nuw nsw i16 %_4.0.i.i.i.i.i.i.i.i.6, %self.sroa.4.23.extract.trunc
  | check:55'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  | 65:  ret i16 %_4.0.i.i.i.i.i.i.i.i.7
  | check:55'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  | 66: }

I suspect that this may be a potential regression since the other related functions in this test still expand to short sequences that use llvm.vector as expected, e.g.:

6: ; Function Attrs: mustprogress nofree nosync nounwind nonlazybind willreturn memory(argmem: read) uwtable
--
  | 7: define noundef i16 @wider_reduce_simd(ptr noalias nocapture noundef readonly align 8 dereferenceable(8) %x) unnamed_addr #0 {
  | 8: start:
  | 9:  %0 = load <8 x i8>, ptr %x, align 8
  | 10:  %1 = zext <8 x i8> %0 to <8 x i16>
  | 11:  %2 = tail call i16 @llvm.vector.reduce.add.v8i16(<8 x i16> %1)
  | 12:  ret i16 %2
  | 13: }

https://buildkite.com/llvm-project/rust-llvm-integrate-prototype/builds/22405#018aafae-043a-4a37-86ef-02493663ee73/689-712.

@dianqk
Copy link
Member Author

dianqk commented Sep 20, 2023

Hey @dianqk, it appears that after 2d1e8a0, one of rust's codegen tests started failing: https://buildkite.com/llvm-project/rust-llvm-integrate-prototype/builds/22405#018aafae-043a-4a37-86ef-02493663ee73/689-694. I'm not a rust or LLVM expert, but naively it looks like a possible regression. Could you please take a look?

I'm sorry for the trouble. I'm not sure yet. But I'll look into it tomorrow. If this blocks something, you can revert this commit

@dianqk dianqk reopened this Sep 20, 2023
@nikic
Copy link
Contributor

nikic commented Sep 20, 2023

This regression is expected (we've hit it before, see #55693). You can adjust the codegen test.

@dianqk
Copy link
Member Author

dianqk commented Sep 21, 2023

This regression is expected (we've hit it before, see #55693). You can adjust the codegen test.

Yes. I close this issue. The new one is vectorize-related.

@dianqk dianqk closed this as completed Sep 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment