[AArch64][SVE] Don't require 16-byte aligned SVE loads/stores with +strict-align #119732

Merged: 3 commits, Dec 16, 2024

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp (13 additions, 0 deletions)

@@ -2569,6 +2569,19 @@ MVT AArch64TargetLowering::getScalarShiftAmountTy(const DataLayout &DL,
bool AArch64TargetLowering::allowsMisalignedMemoryAccesses(
    EVT VT, unsigned AddrSpace, Align Alignment, MachineMemOperand::Flags Flags,
    unsigned *Fast) const {

  // Allow SVE loads/stores where the alignment >= the size of the element type,
  // even with +strict-align. Predicated SVE loads/stores (e.g. ld1/st1), used
  // for stores that come from IR, only require element-size alignment (even if
  // unaligned accesses are disabled). Without this, these will be forced to
  // have 16-byte alignment with +strict-align (and fail to lower as we don't
  // yet support TLI.expandUnalignedLoad() and TLI.expandUnalignedStore()).
  if (VT.isScalableVector()) {
    unsigned ElementSizeBits = VT.getScalarSizeInBits();

Collaborator commented:

Align(VT.getScalarSizeInBits() / 8) will fail an assert when VT < MVT::i8 (like a predicate MVT::i1), so this would fail when the +strict-align feature is not set. Could you add a test for this case?

Not caused by your patch, but I am surprised to see LLVM actually generate regular loads/stores when the alignment is smaller than the element size, e.g. load <4 x i32>, ptr %ptr, align 1 when the +strict-align flag is not set at all. Is this a bug?
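
For the first point, here is a minimal IR sketch of the sub-byte element case (the function name is illustrative; the committed nxv16i1 tests below exercise this type with and without +strict-align):

; Sketch of the predicate-type case: getScalarSizeInBits() is 1 for
; <vscale x 16 x i1>, so a bare Align(ElementSizeBits / 8) would be Align(0)
; and hit the assert mentioned above; the committed check sidesteps this with
; the "ElementSizeBits % 8 == 0" guard.
define void @predicate_load_store(ptr %ldptr, ptr %stptr) {
  %p = load <vscale x 16 x i1>, ptr %ldptr, align 2
  store <vscale x 16 x i1> %p, ptr %stptr, align 2
  ret void
}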

Member Author replied:

I don't think it's a bug. Looking at AArch64.UnalignedAccessFaults() (used as part of the FP loads, https://developer.arm.com/documentation/ddi0602/2023-09/Shared-Pseudocode/aarch64-functions-memory?lang=en#AArch64.UnalignedAccessFaults.3), it looks like unaligned accesses are supported (depending on the configuration).

Also, the LangRef states:

The optional constant align argument specifies the alignment of the operation (that is, the alignment of the memory address). It is the responsibility of the code emitter to ensure that the alignment information is correct.
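
As a reference point for the question above, here is a hedged sketch of the fixed-width case (function name assumed; the expectation that this lowers to plain q-register ldr/str without +strict-align follows from the discussion, not from a test in this PR):

; Under-aligned fixed-width vector access. The stated alignment (1) is below
; the 16-byte natural alignment, but without +strict-align the target reports
; misaligned accesses as allowed, so no expansion is needed.
define void @fixed_v4i32(ptr %ldptr, ptr %stptr) {
  %v = load <4 x i32>, ptr %ldptr, align 1
  store <4 x i32> %v, ptr %stptr, align 1
  ret void
}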

    if (ElementSizeBits % 8 == 0 && Alignment >= Align(ElementSizeBits / 8))
      return true;
  }

  if (Subtarget->requiresStrictAlign())
    return false;

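To make the boundary of the new check concrete, here is a hedged sketch (assumed behaviour, inferred from the check above and from the FIXME in the second test file below; these functions are not committed tests): for <vscale x 4 x i32> the element size is 4 bytes, so align 4 is now accepted even with +strict-align, while align 2 still is not.

; Boundary sketch for nxv4i32 under +strict-align (illustrative only).
define void @nxv4i32_align4(ptr %ldptr, ptr %stptr) {
  ; align 4 >= element size (4 bytes): accepted by the new check, expected to
  ; select predicated ld1w/st1w (see sve-load-store-strict-align.ll below).
  %a = load <vscale x 4 x i32>, ptr %ldptr, align 4
  store <vscale x 4 x i32> %a, ptr %stptr, align 4
  ret void
}

define void @nxv4i32_align2(ptr %ldptr, ptr %stptr) {
  ; align 2 < element size: the new check does not apply, so with +strict-align
  ; the access is rejected and, since TLI.expandUnalignedLoad()/Store() is not
  ; yet supported for scalable vectors, currently fails to lower (see FIXME).
  %b = load <vscale x 4 x i32>, ptr %ldptr, align 2
  store <vscale x 4 x i32> %b, ptr %stptr, align 2
  ret void
}
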
llvm/test/CodeGen/AArch64/sve-load-store-strict-align.ll (62 additions, 0 deletions)

@@ -0,0 +1,62 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 4
; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s
; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve,+strict-align < %s | FileCheck %s

define void @nxv16i8(ptr %ldptr, ptr %stptr) {
; CHECK-LABEL: nxv16i8:
; CHECK: // %bb.0:
; CHECK-NEXT: ptrue p0.b
; CHECK-NEXT: ld1b { z0.b }, p0/z, [x0]
; CHECK-NEXT: st1b { z0.b }, p0, [x1]
; CHECK-NEXT: ret
%l3 = load <vscale x 16 x i8>, ptr %ldptr, align 1
store <vscale x 16 x i8> %l3, ptr %stptr, align 1
ret void
}

define void @nxv8i16(ptr %ldptr, ptr %stptr) {
; CHECK-LABEL: nxv8i16:
; CHECK: // %bb.0:
; CHECK-NEXT: ptrue p0.h
; CHECK-NEXT: ld1h { z0.h }, p0/z, [x0]
; CHECK-NEXT: st1h { z0.h }, p0, [x1]
; CHECK-NEXT: ret
%l3 = load <vscale x 8 x i16>, ptr %ldptr, align 2
store <vscale x 8 x i16> %l3, ptr %stptr, align 2
ret void
}

define void @nxv4i32(ptr %ldptr, ptr %stptr) {
; CHECK-LABEL: nxv4i32:
; CHECK: // %bb.0:
; CHECK-NEXT: ptrue p0.s
; CHECK-NEXT: ld1w { z0.s }, p0/z, [x0]
; CHECK-NEXT: st1w { z0.s }, p0, [x1]
; CHECK-NEXT: ret
%l3 = load <vscale x 4 x i32>, ptr %ldptr, align 4
store <vscale x 4 x i32> %l3, ptr %stptr, align 4
ret void
}

define void @nxv2i64(ptr %ldptr, ptr %stptr) {
; CHECK-LABEL: nxv2i64:
; CHECK: // %bb.0:
; CHECK-NEXT: ptrue p0.d
; CHECK-NEXT: ld1d { z0.d }, p0/z, [x0]
; CHECK-NEXT: st1d { z0.d }, p0, [x1]
; CHECK-NEXT: ret
%l3 = load <vscale x 2 x i64>, ptr %ldptr, align 8
store <vscale x 2 x i64> %l3, ptr %stptr, align 8
ret void
}

define void @nxv16i1(ptr %ldptr, ptr %stptr) {
; CHECK-LABEL: nxv16i1:
; CHECK: // %bb.0:
; CHECK-NEXT: ldr p0, [x0]
; CHECK-NEXT: str p0, [x1]
; CHECK-NEXT: ret
%l3 = load <vscale x 16 x i1>, ptr %ldptr, align 2
store <vscale x 16 x i1> %l3, ptr %stptr, align 2
ret void
}
llvm/test/CodeGen/AArch64/sve-unaligned-load-store-strict-align.ll (31 additions, 0 deletions)

@@ -0,0 +1,31 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 4
; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s
; RUN: not --crash llc -mtriple=aarch64-linux-gnu -mattr=+sve,+strict-align < %s 2>&1 | FileCheck %s --check-prefix=CHECK-FIXME

; REQUIRES: asserts

; FIXME: Support TLI.expandUnalignedLoad()/TLI.expandUnalignedStore() for SVE.
; CHECK-FIXME: LLVM ERROR: Invalid size request on a scalable vector.

define void @unaligned_nxv16i1(ptr %ldptr, ptr %stptr) {
; CHECK-LABEL: unaligned_nxv16i1:
; CHECK: // %bb.0:
; CHECK-NEXT: ldr p0, [x0]
; CHECK-NEXT: str p0, [x1]
; CHECK-NEXT: ret
%l3 = load <vscale x 16 x i1>, ptr %ldptr, align 1
store <vscale x 16 x i1> %l3, ptr %stptr, align 1
ret void
}

define void @unaligned_nxv2i64(ptr %ldptr, ptr %stptr) {
; CHECK-LABEL: unaligned_nxv2i64:
; CHECK: // %bb.0:
; CHECK-NEXT: ptrue p0.d
; CHECK-NEXT: ld1d { z0.d }, p0/z, [x0]
; CHECK-NEXT: st1d { z0.d }, p0, [x1]
; CHECK-NEXT: ret
%l3 = load <vscale x 2 x i64>, ptr %ldptr, align 4
store <vscale x 2 x i64> %l3, ptr %stptr, align 4
ret void
}