LoopFullUnrollPass failed due to constant inference in opaque pointer mode. #65763

dianqk · 2023-09-08T14:23:16Z

I tried to disable the opaque pointer in rust-lang/rust#115339 to get the SLPVectorizer optimization.
See https://godbolt.org/z/GqdPGarsn.

According to my investigation, the success of optimization in non-opaque pointer mode relates to SROA.cpp#L1831-L1845 or InstructionCombining.cpp#L2220-L2274. Unfortunately, these two transformations have been removed during opaque pointer migration.

My plan is to restore the behavior of InstructionCombining.cpp#L2220-L2274 using alloc instead of bitcast.
Looking at the code comments, this seems to be an important optimization. So it might be worth backporting to LLVM 17?

The text was updated successfully, but these errors were encountered:

fhahn · 2023-09-08T14:43:54Z

Could you share the LLVM IR before the SLP vectorizer?

dianqk · 2023-09-09T03:45:33Z

Could you share the LLVM IR before the SLP vectorizer?

There are two smaller IRs:

But I dont's see the value. See #65764 (comment).

dianqk · 2023-09-10T05:56:04Z

Could you share the LLVM IR before the SLP vectorizer?

I added an IR that looks pretty good in #65875.

Closes #65763. This will provide more opportunities for constant propagation for subsequent optimizations.

krasimirgg · 2023-09-20T14:17:22Z

Hey @dianqk, it appears that after 2d1e8a0, one of rust's codegen tests started failing:
https://buildkite.com/llvm-project/rust-llvm-integrate-prototype/builds/22405#018aafae-043a-4a37-86ef-02493663ee73/689-694. I'm not a rust or LLVM expert, but naively it looks like a possible regression. Could you please take a look?

The test code looks like this: https://github.com/rust-lang/rust/blob/ed33e408c5299b4c1fd87a37e050180db9d642d5/tests/codegen/simd/simd-wide-sum.rs#L52-L59

#[no_mangle]
// CHECK-LABEL: @wider_reduce_into_iter
pub fn wider_reduce_into_iter(x: Simd<u8, N>) -> u16 {
    // CHECK: zext <8 x i8>
    // CHECK-SAME: to <8 x i16>
    // CHECK: call i16 @llvm.vector.reduce.add.v8i16(<8 x i16>
    x.to_array().into_iter().map(u16::from).sum()
}

With 2d1e8a0, it looks like now the code is exploded into a long sequence that doesn't call llvm vector anymore: https://buildkite.com/llvm-project/rust-llvm-integrate-prototype/builds/22405#018aafae-043a-4a37-86ef-02493663ee73/689-740:

33: ; Function Attrs: mustprogress nofree norecurse nosync nounwind nonlazybind willreturn memory(argmem: read) uwtable
--
  | 34: define noundef i16 @wider_reduce_into_iter(ptr noalias nocapture noundef readonly align 8 dereferenceable(8) %x) unnamed_addr #1 personality ptr @rust_eh_personality {
  | check:55'0                                               X~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ error: no match found
  | 35: start:
  | check:55'0     ~~~~~~~
  | 36:  %0 = load i64, ptr %x, align 8
  | check:55'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  | 37:  %self.sroa.4.16.extract.trunc = trunc i64 %0 to i16
  | check:55'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  | 38:  %_0.i.i.i.i.i.i.i.i.i = and i16 %self.sroa.4.16.extract.trunc, 255
  | check:55'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  | 39:  %1 = trunc i64 %0 to i16
  | check:55'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~
  | 40:  %2 = lshr i16 %1, 8
  | check:55'0     ~~~~~~~~~~~~~~~~~~~~~
  | 41:  %_4.0.i.i.i.i.i.i.i.i.1 = add nuw nsw i16 %_0.i.i.i.i.i.i.i.i.i, %2
  | check:55'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  | 42:  %self.sroa.4.18.extract.shift = lshr i64 %0, 16
  | check:55'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  | 43:  %self.sroa.4.18.extract.trunc = trunc i64 %self.sroa.4.18.extract.shift to i16
  | check:55'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  | 44:  %_0.i.i.i.i.i.i.i.i.i.2 = and i16 %self.sroa.4.18.extract.trunc, 255
  | check:55'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  | 45:  %_4.0.i.i.i.i.i.i.i.i.2 = add nuw nsw i16 %_4.0.i.i.i.i.i.i.i.i.1, %_0.i.i.i.i.i.i.i.i.i.2
  | check:55'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  | 46:  %self.sroa.4.19.extract.shift = lshr i64 %0, 24
  | check:55'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  | 47:  %self.sroa.4.19.extract.trunc = trunc i64 %self.sroa.4.19.extract.shift to i16
  | check:55'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  | 48:  %_0.i.i.i.i.i.i.i.i.i.3 = and i16 %self.sroa.4.19.extract.trunc, 255
  | check:55'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  | 49:  %_4.0.i.i.i.i.i.i.i.i.3 = add nuw nsw i16 %_4.0.i.i.i.i.i.i.i.i.2, %_0.i.i.i.i.i.i.i.i.i.3
  | check:55'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  | 50:  %self.sroa.4.20.extract.shift = lshr i64 %0, 32
  | check:55'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  | 51:  %self.sroa.4.20.extract.trunc = trunc i64 %self.sroa.4.20.extract.shift to i16
  | check:55'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  | 52:  %_0.i.i.i.i.i.i.i.i.i.4 = and i16 %self.sroa.4.20.extract.trunc, 255
  | check:55'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  | 53:  %_4.0.i.i.i.i.i.i.i.i.4 = add nuw nsw i16 %_4.0.i.i.i.i.i.i.i.i.3, %_0.i.i.i.i.i.i.i.i.i.4
  | check:55'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  | 54:  %self.sroa.4.21.extract.shift = lshr i64 %0, 40
  | check:55'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  | 55:  %self.sroa.4.21.extract.trunc = trunc i64 %self.sroa.4.21.extract.shift to i16
  | check:55'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  | 56:  %_0.i.i.i.i.i.i.i.i.i.5 = and i16 %self.sroa.4.21.extract.trunc, 255
  | check:55'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  | 57:  %_4.0.i.i.i.i.i.i.i.i.5 = add nuw nsw i16 %_4.0.i.i.i.i.i.i.i.i.4, %_0.i.i.i.i.i.i.i.i.i.5
  | check:55'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  | 58:  %self.sroa.4.22.extract.shift = lshr i64 %0, 48
  | check:55'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  | 59:  %self.sroa.4.22.extract.trunc = trunc i64 %self.sroa.4.22.extract.shift to i16
  | check:55'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  | 60:  %_0.i.i.i.i.i.i.i.i.i.6 = and i16 %self.sroa.4.22.extract.trunc, 255
  | check:55'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  | 61:  %_4.0.i.i.i.i.i.i.i.i.6 = add nuw nsw i16 %_4.0.i.i.i.i.i.i.i.i.5, %_0.i.i.i.i.i.i.i.i.i.6
  | check:55'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  | 62:  %self.sroa.4.23.extract.shift = lshr i64 %0, 56
  | check:55'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  | 63:  %self.sroa.4.23.extract.trunc = trunc i64 %self.sroa.4.23.extract.shift to i16
  | check:55'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  | 64:  %_4.0.i.i.i.i.i.i.i.i.7 = add nuw nsw i16 %_4.0.i.i.i.i.i.i.i.i.6, %self.sroa.4.23.extract.trunc
  | check:55'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  | 65:  ret i16 %_4.0.i.i.i.i.i.i.i.i.7
  | check:55'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  | 66: }

I suspect that this may be a potential regression since the other related functions in this test still expand to short sequences that use llvm.vector as expected, e.g.:

6: ; Function Attrs: mustprogress nofree nosync nounwind nonlazybind willreturn memory(argmem: read) uwtable
--
  | 7: define noundef i16 @wider_reduce_simd(ptr noalias nocapture noundef readonly align 8 dereferenceable(8) %x) unnamed_addr #0 {
  | 8: start:
  | 9:  %0 = load <8 x i8>, ptr %x, align 8
  | 10:  %1 = zext <8 x i8> %0 to <8 x i16>
  | 11:  %2 = tail call i16 @llvm.vector.reduce.add.v8i16(<8 x i16> %1)
  | 12:  ret i16 %2
  | 13: }

https://buildkite.com/llvm-project/rust-llvm-integrate-prototype/builds/22405#018aafae-043a-4a37-86ef-02493663ee73/689-712.

dianqk · 2023-09-20T14:29:27Z

Hey @dianqk, it appears that after 2d1e8a0, one of rust's codegen tests started failing: https://buildkite.com/llvm-project/rust-llvm-integrate-prototype/builds/22405#018aafae-043a-4a37-86ef-02493663ee73/689-694. I'm not a rust or LLVM expert, but naively it looks like a possible regression. Could you please take a look?

I'm sorry for the trouble. I'm not sure yet. But I'll look into it tomorrow. If this blocks something, you can revert this commit

nikic · 2023-09-20T14:51:39Z

This regression is expected (we've hit it before, see #55693). You can adjust the codegen test.

dianqk · 2023-09-21T04:10:59Z

This regression is expected (we've hit it before, see #55693). You can adjust the codegen test.

Yes. I close this issue. The new one is vectorize-related.

dianqk added llvm:optimizations missed-optimization labels Sep 8, 2023

dianqk self-assigned this Sep 8, 2023

github-actions bot added the new issue label Sep 8, 2023

This was referenced Sep 8, 2023

Performance regression of array::IntoIter vs slice::Iter rust-lang/rust#115339

Closed

Add visitGEPOfAlloc to restore the visitGEPOfBitcast part of the behavior of non-opaque pointers. #65764

Closed

EugeneZelenko added loopoptim and removed new issue llvm:optimizations labels Sep 8, 2023

dianqk mentioned this issue Sep 10, 2023

[EarlyCSE] Compare GEP instructions based on offset #65875

Merged

dianqk closed this as completed in #65875 Sep 19, 2023

dianqk added a commit that referenced this issue Sep 19, 2023

[EarlyCSE] Compare GEP instructions based on offset (#65875)

2d1e8a0

Closes #65763. This will provide more opportunities for constant propagation for subsequent optimizations.

dianqk reopened this Sep 20, 2023

dianqk closed this as completed Sep 21, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

LoopFullUnrollPass failed due to constant inference in opaque pointer mode. #65763

LoopFullUnrollPass failed due to constant inference in opaque pointer mode. #65763

dianqk commented Sep 8, 2023

fhahn commented Sep 8, 2023

dianqk commented Sep 9, 2023

dianqk commented Sep 10, 2023

krasimirgg commented Sep 20, 2023

dianqk commented Sep 20, 2023

nikic commented Sep 20, 2023

dianqk commented Sep 21, 2023

LoopFullUnrollPass failed due to constant inference in opaque pointer mode. #65763

LoopFullUnrollPass failed due to constant inference in opaque pointer mode. #65763

Comments

dianqk commented Sep 8, 2023

fhahn commented Sep 8, 2023

dianqk commented Sep 9, 2023

dianqk commented Sep 10, 2023

krasimirgg commented Sep 20, 2023

dianqk commented Sep 20, 2023

nikic commented Sep 20, 2023

dianqk commented Sep 21, 2023