[NVPTX] Make i16x2 a native type and add supported vec instructions #65432


Merged
merged 11 commits into llvm:main from nvptx_i16x2_2 on Sep 7, 2023

Conversation

@ThomasRaoux (Contributor)

On sm_90, some instructions now support i16x2, which allows the hardware to execute add, min, and max instructions more efficiently.

To support that, we need to make i16x2 a native type in the backend. This patch makes the necessary changes to treat i16x2 as a native type and adds support for the instructions that natively support i16x2.

This caused a negative test in the NVPTX SLP vectorizer tests to start passing. I changed the test to a positive one, as the IR is now correctly vectorized.
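(As an illustration of what this enables: with v2i16 as a native type, IR like the following hypothetical example, which is not a test from the patch, can be selected as a single packed instruction on sm_90 instead of being scalarized into two 16-bit adds.)

; Hypothetical example: a packed 16x2 add that the backend can now
; select to a single sm_90 instruction rather than two scalar adds.
define <2 x i16> @add_v2i16(<2 x i16> %a, <2 x i16> %b) {
  %r = add <2 x i16> %a, %b
  ret <2 x i16> %r
}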

@ThomasRaoux force-pushed the nvptx_i16x2_2 branch 2 times, most recently from 6595701 to de49dce on September 6, 2023 at 18:38
@Artem-B (Member) left a comment

Thank you for the patch.
It looks fine in general. One thing I'd suggest is to extract generic tablegen changes (adding specific parameter types) into a separate patch so they do not clutter the v2i16 changes.

…porting it

On sm_90, some instructions now support i16x2, which allows the hardware
to execute add, min, and max instructions more efficiently.

To support that, we need to make i16x2 a native type in the backend.
This makes the necessary changes to treat i16x2 as a native type and
adds support for the instructions that natively support i16x2.

This caused a negative test in the NVPTX SLP vectorizer tests to start
passing. Changed the test to a positive one, as the IR is now correctly
vectorized.
@ThomasRaoux (Contributor, author) left a comment

Thanks, I rebased on top of the tablegen changes and addressed the comments.

@ThomasRaoux ThomasRaoux requested a review from Artem-B September 6, 2023 21:27
@ThomasRaoux ThomasRaoux requested a review from Artem-B September 6, 2023 22:30
@ThomasRaoux ThomasRaoux requested a review from Artem-B September 6, 2023 23:11
@Artem-B (Member) left a comment

LGTM. Thank you!

@ThomasRaoux ThomasRaoux merged commit db5d845 into llvm:main Sep 7, 2023
@Artem-B (Member) commented Sep 8, 2023

Looks like we've missed lowering of bitcasts between v2f16 and v2i16 and it breaks XLA.

LLVM ERROR: Cannot select: t119: v2f16 = bitcast t118
  t118: v2i16 = or t375, t401
    t375: v2i16 = BUILD_VECTOR t374, t372
      t374: i16 = select t247, Constant:i16<8960>, t360

@ThomasRaoux (Contributor, author)

> Looks like we've missed lowering of bitcasts between v2f16 and v2i16 and it breaks XLA.
>
> LLVM ERROR: Cannot select: t119: v2f16 = bitcast t118
>   t118: v2i16 = or t375, t401
>     t375: v2i16 = BUILD_VECTOR t374, t372
>       t374: i16 = select t247, Constant:i16<8960>, t360

Oops, I can send a patch today unless someone else already has a fix.

@gribozavr (Collaborator)

Thank you @Artem-B for the breakage analysis!

@ThomasRaoux I hope you don't mind, I'm going to revert this change to unbreak XLA.

@ThomasRaoux (Contributor, author)

> Thank you @Artem-B for the breakage analysis!
>
> @ThomasRaoux I hope you don't mind, I'm going to revert this change to unbreak XLA.

Can we fix it forward? I'll prepare a patch.

@Artem-B (Member) commented Sep 8, 2023

We'll probably also need similar bitcast lowering for v2bf16. I suspect that now that v2i16 is available, we'll see LLVM picking it for various loads/stores, and bf16 will likely end up using that path, too. Not sure if we'll need bitcasts between v2f16 and v2bf16 in practice, but it would not hurt to add them just in case, too.
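(For illustration, the additional bitcasts in question would look like the following hypothetical sketch; these are not tests from the patch.)

; Hypothetical examples of the v2bf16 bitcasts being discussed; both
; are 32-bit-to-32-bit reinterpretations, like the v2f16/v2i16 case.
define <2 x bfloat> @bitcast_2xi16_to_2xbf16(<2 x i16> %a) {
  %r = bitcast <2 x i16> %a to <2 x bfloat>
  ret <2 x bfloat> %r
}

define <2 x bfloat> @bitcast_2xhalf_to_2xbf16(<2 x half> %a) {
  %r = bitcast <2 x half> %a to <2 x bfloat>
  ret <2 x bfloat> %r
}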

@ThomasRaoux (Contributor, author)

> We'll probably also need similar bitcast lowering for v2bf16. I suspect that now that v2i16 is available, we'll see LLVM picking it for various loads/stores, and bf16 will likely end up using that path, too. Not sure if we'll need bitcasts between v2f16 and v2bf16 in practice, but it would not hurt to add them just in case, too.

Sure, let me add those.

@Artem-B (Member) commented Sep 8, 2023

> Can we fix it forward? I'll prepare a patch.

The fix itself should be simple (a tablegen pattern converting the bitcast to a proxy register?) but we'll also need tests. Should be doable, but I don't know @gribozavr's time constraints.

@Artem-B (Member) commented Sep 8, 2023

In any case, it probably does not matter much one way or another -- it's just one more cherry-picked revision to include in a new pull request.

@ThomasRaoux (Contributor, author)

Looking at it, we do have some tests for bitcast 2xhalf -> 2xi16:

define <2 x i16> @test_bitcast_2xhalf_to_2xi16(<2 x half> %a) #0 {
  %r = bitcast <2 x half> %a to <2 x i16>
  ret <2 x i16> %r
}

@Artem-B (Member) commented Sep 8, 2023

Hmm. Why did they work then? Loads/stores/bitcasts for v2i16 should not have depended on whether the target is sm_90 or not. Or maybe they did, if we set the lowering action to Expand on older GPUs.

You may try running the tests with -mcpu=sm_90

@ThomasRaoux (Contributor, author)

> Hmm. Why did they work then? Loads/stores/bitcasts for v2i16 should not have depended on whether the target is sm_90 or not. Or maybe they did, if we set the lowering action to Expand on older GPUs.
>
> You may try running the tests with -mcpu=sm_90

I tried with -mcpu=sm_90 -mattr=+ptx80 and without it, and I can't reproduce the problem.

gribozavr added a commit that referenced this pull request Sep 8, 2023
…ctions (#65432)"

This reverts commit db5d845.

As per PR discussion "Looks like we've missed lowering of bitcasts
between v2f16 and v2i16 and it breaks XLA."
@Artem-B (Member) commented Sep 8, 2023

Another theory is that the bitcast in the tests didn't actually make it to the lowering and we only had to deal with loads/stores.
https://godbolt.org/z/81e31jj8a
nvptx-isel appears to convert the IR straight into:

# *** IR Dump After NVPTX DAG->DAG Pattern Instruction Selection (nvptx-isel) ***:
# Machine code for function test_bitcast_2xhalf_to_2xi16: IsSSA, TracksLiveness

bb.0 (%ir-block.0):
  %0:int32regs = LD_i32_avar 0, 4, 1, 0, 32, &test_bitcast_2xhalf_to_2xi16_param_0 :: (dereferenceable invariant load (s32) from `ptr addrspace(101) null`, addrspace 101)
  StoreRetvalI32 killed %0:int32regs, 0 :: (store (s32), align 1)
  Return

We may need something more elaborate, like manually constructing v2i16 from an input i16.
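(For example, a test shaped like the following hypothetical sketch would keep the bitcast alive through lowering, since the v2i16 value is built from scalar i16 inputs rather than loaded directly as an i32; the names here are made up.)

; Hypothetical test: building the v2i16 from scalars prevents isel from
; folding the bitcast into a plain 32-bit load/store.
define <2 x half> @test_bitcast_built_v2i16(i16 %x, i16 %y) {
  %v0 = insertelement <2 x i16> undef, i16 %x, i32 0
  %v1 = insertelement <2 x i16> %v0, i16 %y, i32 1
  %r = bitcast <2 x i16> %v1 to <2 x half>
  ret <2 x half> %r
}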

@gribozavr (Collaborator) commented Sep 8, 2023

> Can we fix it forward? I'll prepare a patch.

The HEAD has been broken for us for more than a day, and we need to get to green ASAP. I pushed the revert.

@Artem-B (Member) commented Sep 8, 2023

Aha. This reproduces the problem: https://godbolt.org/z/K1c4PqYoP
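(Judging from the crash log above, the reproducer presumably has roughly the following shape, where the or keeps the bitcast from being folded away; this is a sketch, not necessarily the exact contents of the godbolt link.)

; Sketch of a reproducer: the or forces the v2i16 -> v2f16 bitcast to
; survive into instruction selection, triggering "Cannot select".
define <2 x half> @repro(<2 x i16> %a, <2 x i16> %b) {
  %v = or <2 x i16> %a, %b
  %r = bitcast <2 x i16> %v to <2 x half>
  ret <2 x half> %r
}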

@ThomasRaoux (Contributor, author)

> Aha. This reproduces the problem: https://godbolt.org/z/K1c4PqYoP

Thanks for the repro. I'm going to land it again with the fix (2bd3ce7).

ThomasRaoux added a commit that referenced this pull request Sep 8, 2023
avillega pushed a commit to avillega/llvm-project that referenced this pull request Sep 11, 2023
…lvm#65432)

On sm_90, some instructions now support i16x2, which allows the hardware
to execute add, min, and max instructions more efficiently.

To support that, we need to make i16x2 a native type in the backend.
This makes the necessary changes to treat i16x2 as a native type and
adds support for the instructions that natively support i16x2.

This caused a negative test in the NVPTX SLP vectorizer tests to start
passing. Changed the test to a positive one, as the IR is now correctly
vectorized.
@cheshire (Contributor)

This causes a nearly 2x regression in many Triton kernels on Ampere; we'll post a reproducer.

thomasfaingnaert pushed a commit to thomasfaingnaert/llvm-project that referenced this pull request Jan 15, 2024
qihangkong pushed a commit to rvgpu/llvm that referenced this pull request Apr 18, 2024