-
Notifications
You must be signed in to change notification settings - Fork 13.5k
[SelectionDAG] Use unaligned store to legalize EXTRACT_VECTOR_ELT
type when Stack is non-realignable
#98176
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-backend-amdgpu @llvm/pr-subscribers-backend-x86 Author: Manish Kausik H (Nirhar) ChangesThis patch ports the commit a6614ec to SelectionDAG TypeLegalization. Fixes #98044 Full diff: https://github.com/llvm/llvm-project/pull/98176.diff 2 Files Affected:
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
index bbf08e862da12..ec13835e6e6fb 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
@@ -24,6 +24,7 @@
#include "llvm/Analysis/MemoryLocation.h"
#include "llvm/Analysis/VectorUtils.h"
#include "llvm/CodeGen/ISDOpcodes.h"
+#include "llvm/CodeGen/MachineFrameInfo.h"
#include "llvm/IR/DataLayout.h"
#include "llvm/Support/ErrorHandling.h"
#include "llvm/Support/TypeSize.h"
@@ -221,6 +222,21 @@ void DAGTypeLegalizer::ScalarizeVectorResult(SDNode *N, unsigned ResNo) {
SetScalarizedVector(SDValue(N, ResNo), R);
}
+// Helper function that generates an MMO that considers the alignment of the
+// stack, and the size of the stack object
+static MachineMemOperand *getStackAlignedMMO(SDValue StackPtr,
+ MachineFunction &MF,
+ bool isObjectScalable) {
+ auto &MFI = MF.getFrameInfo();
+ int FI = cast<FrameIndexSDNode>(StackPtr)->getIndex();
+ MachinePointerInfo PtrInfo = MachinePointerInfo::getFixedStack(MF, FI);
+ LocationSize ObjectSize = isObjectScalable
+ ? LocationSize::beforeOrAfterPointer()
+ : LocationSize::precise(MFI.getObjectSize(FI));
+ return MF.getMachineMemOperand(PtrInfo, MachineMemOperand::MOStore,
+ ObjectSize, MFI.getObjectAlign(FI));
+}
+
SDValue DAGTypeLegalizer::ScalarizeVecRes_BinOp(SDNode *N) {
SDValue LHS = GetScalarizedVector(N->getOperand(0));
SDValue RHS = GetScalarizedVector(N->getOperand(1));
@@ -3534,11 +3550,9 @@ SDValue DAGTypeLegalizer::SplitVecOp_EXTRACT_VECTOR_ELT(SDNode *N) {
Align SmallestAlign = DAG.getReducedAlign(VecVT, /*UseABI=*/false);
SDValue StackPtr =
DAG.CreateStackTemporary(VecVT.getStoreSize(), SmallestAlign);
- auto &MF = DAG.getMachineFunction();
- auto FrameIndex = cast<FrameIndexSDNode>(StackPtr.getNode())->getIndex();
- auto PtrInfo = MachinePointerInfo::getFixedStack(MF, FrameIndex);
- SDValue Store = DAG.getStore(DAG.getEntryNode(), dl, Vec, StackPtr, PtrInfo,
- SmallestAlign);
+ MachineMemOperand *StoreMMO = getStackAlignedMMO(
+ StackPtr, DAG.getMachineFunction(), VecVT.isScalableVector());
+ SDValue Store = DAG.getStore(DAG.getEntryNode(), dl, Vec, StackPtr, StoreMMO);
// Load back the required element.
StackPtr = TLI.getVectorElementPointer(DAG, StackPtr, VecVT, Idx);
diff --git a/llvm/test/CodeGen/X86/unaligned_extract_from_vector_through_stack.ll b/llvm/test/CodeGen/X86/unaligned_extract_from_vector_through_stack.ll
index 52d0c2b509128..629f44b52bc05 100644
--- a/llvm/test/CodeGen/X86/unaligned_extract_from_vector_through_stack.ll
+++ b/llvm/test/CodeGen/X86/unaligned_extract_from_vector_through_stack.ll
@@ -17,4 +17,22 @@ entry:
ret i32 %b
}
+define i32 @foo2(i32 %arg1) #1 {
+; CHECK-LABEL: foo2:
+; CHECK: # %bb.0: # %entry
+; CHECK-NEXT: # kill: def $edi killed $edi def $rdi
+; CHECK-NEXT: vxorps %xmm0, %xmm0, %xmm0
+; CHECK-NEXT: vmovups %ymm0, -{{[0-9]+}}(%rsp)
+; CHECK-NEXT: vmovups %ymm0, -{{[0-9]+}}(%rsp)
+; CHECK-NEXT: andl $31, %edi
+; CHECK-NEXT: movzwl -72(%rsp,%rdi,2), %eax
+; CHECK-NEXT: vzeroupper
+; CHECK-NEXT: retq
+entry:
+ %a = extractelement <32 x i16> zeroinitializer, i32 %arg1
+ %b = zext i16 %a to i32
+ ret i32 %b
+}
+
attributes #0 = { "no-realign-stack" "target-cpu"="skylake-avx512" }
+attributes #1 = { "no-realign-stack" "target-cpu"="skylake" }
|
22d2e83
to
54a99c8
Compare
MachineMemOperand *StoreMMO = getStackAlignedMMO( | ||
StackPtr, DAG.getMachineFunction(), VecVT.isScalableVector()); | ||
SDValue Store = DAG.getStore(DAG.getEntryNode(), dl, Vec, StackPtr, StoreMMO); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assumed we only need to change SmallestAlign
. Why do we still need these change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry! I thought your case covers some extra conditions that I didn't exactly understand. I've reduced my patch to only your suggestion. A follow up question: Could this patch a6614ec also be rewritten in the form suggested by you ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also your suggestion made me change many lit tests, can you verify if they are correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes in lit test are functionally correct, but I'm not sure if we prefer to align them or not.
Revisit the original change, I think we also realigned the stack in the foo2
, but it's not used. I guess some later optimizations may schedule the order of mov instructions. So that approach is a bit suboptimal to me.
@RKSimon which solution do you prefer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unnecessarily losing alignment on vector spills to stack that we know is aligned isn't great - we'll likely to see perf reductions as soon as the stores cross cachelines (ymm/zmm stores in particular).
Shouldn't we be addressing any issues we have inside SelectionDAG::getReducedAlign?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have implemented (2) and have corrected lit tests to suit it as well. Please check @phoebewang @RKSimon .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change is similar to previous one, but I interpret @RKSimon 's question is we prefer to stack realignment for performance. Maybe I misunderstood it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these new changes would definitely force unaligned access to memory, where we do not need one. In my use case stack alignment was disabled. Perhaps I should go back to the first version of the patch that uses getStackAlignedMMO
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a mirror problem with #82130. The key is no-realign-stack
. Can we clamp it down to StackAlign only under this attribute? We would solve all the problem then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have updated the patch to only limit to the stack alignment, if the stack is not realignable. Please check now @phoebewang
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reword title to explain why, this is trying to avoid introducing stack realignment
…ype when Stack is non-realignable This patch sets the alignment of store instructions generated during type legalization of extractelement instruction, after considering stack alignment, if the stack is not realignable. Fixes llvm#98044
EXTRACT_VECTOR_ELT
typeEXTRACT_VECTOR_ELT
type when Stack is non-realignable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
@phoebewang can you merge it on my behalf, I see that I dont have the option to merge |
"This leads to the generation of a vmovdqa instruction, which can lead to a General Protection Fault. Here is the link to the godbolt demo: https://godbolt.org/z/33h7YGc5K" Given this, should this backported to 19.x or is that not a problem there? |
/cherry-pick |
I think it's good to backport. |
Failed to create pull request for issue98176 https://github.com/llvm/llvm-project/actions/runs/10224552516 |
/cherry-pick |
Failed to create pull request for issue98176 https://github.com/llvm/llvm-project/actions/runs/10224712349 |
This patch ports the commit a6614ec to SelectionDAG TypeLegalization.
Fixes #98044