Skip to content

[SelectionDAG] Use unaligned store to legalize EXTRACT_VECTOR_ELT type when Stack is non-realignable #98176

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jul 29, 2024

Conversation

Nirhar
Copy link
Contributor

@Nirhar Nirhar commented Jul 9, 2024

This patch ports the commit a6614ec to SelectionDAG TypeLegalization.

Fixes #98044

@llvmbot llvmbot added backend:X86 llvm:SelectionDAG SelectionDAGISel as well labels Jul 9, 2024
@llvmbot
Copy link
Member

llvmbot commented Jul 9, 2024

@llvm/pr-subscribers-backend-amdgpu
@llvm/pr-subscribers-llvm-selectiondag

@llvm/pr-subscribers-backend-x86

Author: Manish Kausik H (Nirhar)

Changes

This patch ports the commit a6614ec to SelectionDAG TypeLegalization.

Fixes #98044


Full diff: https://github.com/llvm/llvm-project/pull/98176.diff

2 Files Affected:

  • (modified) llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp (+19-5)
  • (modified) llvm/test/CodeGen/X86/unaligned_extract_from_vector_through_stack.ll (+18)
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
index bbf08e862da12..ec13835e6e6fb 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
@@ -24,6 +24,7 @@
 #include "llvm/Analysis/MemoryLocation.h"
 #include "llvm/Analysis/VectorUtils.h"
 #include "llvm/CodeGen/ISDOpcodes.h"
+#include "llvm/CodeGen/MachineFrameInfo.h"
 #include "llvm/IR/DataLayout.h"
 #include "llvm/Support/ErrorHandling.h"
 #include "llvm/Support/TypeSize.h"
@@ -221,6 +222,21 @@ void DAGTypeLegalizer::ScalarizeVectorResult(SDNode *N, unsigned ResNo) {
     SetScalarizedVector(SDValue(N, ResNo), R);
 }
 
+// Helper function that generates an MMO that considers the alignment of the
+// stack, and the size of the stack object
+static MachineMemOperand *getStackAlignedMMO(SDValue StackPtr,
+                                             MachineFunction &MF,
+                                             bool isObjectScalable) {
+  auto &MFI = MF.getFrameInfo();
+  int FI = cast<FrameIndexSDNode>(StackPtr)->getIndex();
+  MachinePointerInfo PtrInfo = MachinePointerInfo::getFixedStack(MF, FI);
+  LocationSize ObjectSize = isObjectScalable
+                                ? LocationSize::beforeOrAfterPointer()
+                                : LocationSize::precise(MFI.getObjectSize(FI));
+  return MF.getMachineMemOperand(PtrInfo, MachineMemOperand::MOStore,
+                                 ObjectSize, MFI.getObjectAlign(FI));
+}
+
 SDValue DAGTypeLegalizer::ScalarizeVecRes_BinOp(SDNode *N) {
   SDValue LHS = GetScalarizedVector(N->getOperand(0));
   SDValue RHS = GetScalarizedVector(N->getOperand(1));
@@ -3534,11 +3550,9 @@ SDValue DAGTypeLegalizer::SplitVecOp_EXTRACT_VECTOR_ELT(SDNode *N) {
   Align SmallestAlign = DAG.getReducedAlign(VecVT, /*UseABI=*/false);
   SDValue StackPtr =
       DAG.CreateStackTemporary(VecVT.getStoreSize(), SmallestAlign);
-  auto &MF = DAG.getMachineFunction();
-  auto FrameIndex = cast<FrameIndexSDNode>(StackPtr.getNode())->getIndex();
-  auto PtrInfo = MachinePointerInfo::getFixedStack(MF, FrameIndex);
-  SDValue Store = DAG.getStore(DAG.getEntryNode(), dl, Vec, StackPtr, PtrInfo,
-                               SmallestAlign);
+  MachineMemOperand *StoreMMO = getStackAlignedMMO(
+      StackPtr, DAG.getMachineFunction(), VecVT.isScalableVector());
+  SDValue Store = DAG.getStore(DAG.getEntryNode(), dl, Vec, StackPtr, StoreMMO);
 
   // Load back the required element.
   StackPtr = TLI.getVectorElementPointer(DAG, StackPtr, VecVT, Idx);
diff --git a/llvm/test/CodeGen/X86/unaligned_extract_from_vector_through_stack.ll b/llvm/test/CodeGen/X86/unaligned_extract_from_vector_through_stack.ll
index 52d0c2b509128..629f44b52bc05 100644
--- a/llvm/test/CodeGen/X86/unaligned_extract_from_vector_through_stack.ll
+++ b/llvm/test/CodeGen/X86/unaligned_extract_from_vector_through_stack.ll
@@ -17,4 +17,22 @@ entry:
   ret i32 %b
 }
 
+define i32 @foo2(i32 %arg1) #1 {
+; CHECK-LABEL: foo2:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    # kill: def $edi killed $edi def $rdi
+; CHECK-NEXT:    vxorps %xmm0, %xmm0, %xmm0
+; CHECK-NEXT:    vmovups %ymm0, -{{[0-9]+}}(%rsp)
+; CHECK-NEXT:    vmovups %ymm0, -{{[0-9]+}}(%rsp)
+; CHECK-NEXT:    andl $31, %edi
+; CHECK-NEXT:    movzwl -72(%rsp,%rdi,2), %eax
+; CHECK-NEXT:    vzeroupper
+; CHECK-NEXT:    retq
+entry:
+  %a = extractelement <32 x i16> zeroinitializer, i32 %arg1
+  %b = zext i16 %a to i32
+  ret i32 %b
+}
+
 attributes #0 = { "no-realign-stack" "target-cpu"="skylake-avx512" }
+attributes #1 = { "no-realign-stack" "target-cpu"="skylake" }

@Nirhar
Copy link
Contributor Author

Nirhar commented Jul 9, 2024

@phoebewang @RKSimon @arsenm

@RKSimon RKSimon requested review from arsenm, RKSimon and phoebewang July 9, 2024 15:57
@Nirhar Nirhar force-pushed the range-missed-opt branch 2 times, most recently from 22d2e83 to 54a99c8 Compare July 10, 2024 11:00
Comment on lines 3556 to 3558
MachineMemOperand *StoreMMO = getStackAlignedMMO(
StackPtr, DAG.getMachineFunction(), VecVT.isScalableVector());
SDValue Store = DAG.getStore(DAG.getEntryNode(), dl, Vec, StackPtr, StoreMMO);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assumed we only need to change SmallestAlign. Why do we still need these change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry! I thought your case covers some extra conditions that I didn't exactly understand. I've reduced my patch to only your suggestion. A follow up question: Could this patch a6614ec also be rewritten in the form suggested by you ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also your suggestion made me change many lit tests, can you verify if they are correct?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes in lit test are functionally correct, but I'm not sure if we prefer to align them or not.

Revisit the original change, I think we also realigned the stack in the foo2, but it's not used. I guess some later optimizations may schedule the order of mov instructions. So that approach is a bit suboptimal to me.

@RKSimon which solution do you prefer?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unnecessarily losing alignment on vector spills to stack that we know is aligned isn't great - we'll likely to see perf reductions as soon as the stores cross cachelines (ymm/zmm stores in particular).

Shouldn't we be addressing any issues we have inside SelectionDAG::getReducedAlign?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have implemented (2) and have corrected lit tests to suit it as well. Please check @phoebewang @RKSimon .

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change is similar to previous one, but I interpret @RKSimon 's question is we prefer to stack realignment for performance. Maybe I misunderstood it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these new changes would definitely force unaligned access to memory, where we do not need one. In my use case stack alignment was disabled. Perhaps I should go back to the first version of the patch that uses getStackAlignedMMO ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a mirror problem with #82130. The key is no-realign-stack. Can we clamp it down to StackAlign only under this attribute? We would solve all the problem then.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have updated the patch to only limit to the stack alignment, if the stack is not realignable. Please check now @phoebewang

Copy link
Contributor

@arsenm arsenm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reword title to explain why, this is trying to avoid introducing stack realignment

@Nirhar Nirhar force-pushed the range-missed-opt branch from 6a74776 to aa39184 Compare July 29, 2024 05:21
…ype when Stack is non-realignable

This patch sets the alignment of store instructions generated during type
legalization of extractelement instruction, after considering stack
alignment, if the stack is not realignable.

Fixes llvm#98044
@Nirhar Nirhar force-pushed the range-missed-opt branch from aa39184 to 0d4798d Compare July 29, 2024 05:25
@Nirhar Nirhar changed the title [SelectionDAG] Use unaligned store to legalize EXTRACT_VECTOR_ELT type [SelectionDAG] Use unaligned store to legalize EXTRACT_VECTOR_ELT type when Stack is non-realignable Jul 29, 2024
Copy link
Contributor

@phoebewang phoebewang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@Nirhar
Copy link
Contributor Author

Nirhar commented Jul 29, 2024

@phoebewang can you merge it on my behalf, I see that I dont have the option to merge

@phoebewang phoebewang merged commit f6d1d6f into llvm:main Jul 29, 2024
7 checks passed
@AZero13
Copy link
Contributor

AZero13 commented Aug 2, 2024

"This leads to the generation of a vmovdqa instruction, which can lead to a General Protection Fault. Here is the link to the godbolt demo: https://godbolt.org/z/33h7YGc5K"

Given this, should this backported to 19.x or is that not a problem there?

@phoebewang phoebewang added this to the LLVM 19.X Release milestone Aug 3, 2024
@phoebewang
Copy link
Contributor

/cherry-pick

@phoebewang
Copy link
Contributor

"This leads to the generation of a vmovdqa instruction, which can lead to a General Protection Fault. Here is the link to the godbolt demo: https://godbolt.org/z/33h7YGc5K"

Given this, should this backported to 19.x or is that not a problem there?

I think it's good to backport.

@llvmbot
Copy link
Member

llvmbot commented Aug 3, 2024

Failed to create pull request for issue98176 https://github.com/llvm/llvm-project/actions/runs/10224552516

@phoebewang
Copy link
Contributor

/cherry-pick

@llvmbot
Copy link
Member

llvmbot commented Aug 3, 2024

Failed to create pull request for issue98176 https://github.com/llvm/llvm-project/actions/runs/10224712349

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Development

Successfully merging this pull request may close these issues.

16 byte Aligned load generated to load 32 byte wide AVX register from stack memory
6 participants