Skip to content

Calculation of regunitmasks seems wrong #78942

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
kaz7 opened this issue Jan 22, 2024 · 13 comments · Fixed by #79831
Closed

Calculation of regunitmasks seems wrong #78942

kaz7 opened this issue Jan 22, 2024 · 13 comments · Fixed by #79831

Comments

@kaz7
Copy link
Contributor

kaz7 commented Jan 22, 2024

Calculation of regunitmasks seems wrong after https://reviews.llvm.org/D157864.

Hi, @jayfoad and @qcolombet, VE is having following troubles after above patch. The existing regression tests are passed, but the calculation of livein and regunitmasks become wrong. I'll explain what are causing problems. Please consider what is the best way to solve this problem. (I add @jayfoad since I though you are foad on llvm phablicator. Sorry if it's not correct.)

  1. Livein is calculated wrongly because of following code.
diff --git a/llvm/lib/CodeGen/MachineSink.cpp b/llvm/lib/CodeGen/MachineSink.cpp
index 8fda4c3e3e83..b4cbb93d758e 100644
--- a/llvm/lib/CodeGen/MachineSink.cpp
+++ b/llvm/lib/CodeGen/MachineSink.cpp
@@ -1704,10 +1704,9 @@ static void updateLiveIn(MachineInstr *MI, MachineBasicBlock *SuccBB,
   for (auto U : UsedOpsInCopy) {
     Register SrcReg = MI->getOperand(U).getReg();
     LaneBitmask Mask;
-    for (MCRegUnitMaskIterator S(SrcReg, TRI); S.isValid(); ++S) {
+    for (MCRegUnitMaskIterator S(SrcReg, TRI); S.isValid(); ++S)
       Mask |= (*S).second;
-    }
-    SuccBB->addLiveIn(SrcReg, Mask.any() ? Mask : LaneBitmask::getAll());
+    SuccBB->addLiveIn(SrcReg, Mask);
   }
   SuccBB->sortUniqueLiveIns();
 }
  • LaneBitmasks from all MCRegUnitMaskIterator of SrcReg sometimes becomes none() because of 2nd problem. As a result, the register is treated as not livein although it is alive. IMO, it's better to think a register is alive here. Since, even whole lanebitmasks don't match, a register is used somehow from the point of livein view.
  1. computeRegUnitLaneMasks calculates different lanebitmask for each regunit.
diff --git a/llvm/utils/TableGen/CodeGenRegisters.cpp b/llvm/utils/TableGen/CodeGenRegisters.cpp
index 74fb80610b9c..b8dde6e946a4 100644
--- a/llvm/utils/TableGen/CodeGenRegisters.cpp
+++ b/llvm/utils/TableGen/CodeGenRegisters.cpp
@@ -2123,8 +2123,8 @@ void CodeGenRegBank::computeRegUnitLaneMasks() {
   for (auto &Register : Registers) {
     // Create an initial lane mask for all register units.
     const auto &RegUnits = Register.getRegUnits();
-    CodeGenRegister::RegUnitLaneMaskList
-        RegUnitLaneMasks(RegUnits.count(), LaneBitmask::getNone());
+    CodeGenRegister::RegUnitLaneMaskList RegUnitLaneMasks(
+        RegUnits.count(), LaneBitmask::getAll());
     // Iterate through SubRegisters.
     typedef CodeGenRegister::SubRegMap SubRegMap;
     const SubRegMap &SubRegs = Register.getSubRegs();
@@ -2143,7 +2143,7 @@ void CodeGenRegBank::computeRegUnitLaneMasks() {
         unsigned u = 0;
         for (unsigned RU : RegUnits) {
           if (SUI == RU) {
-            RegUnitLaneMasks[u] |= LaneMask;
+            RegUnitLaneMasks[u] &= LaneMask;
             assert(!Found);
             Found = true;
           }

I'm not sure this is correct or not. However, the result of calculation is different. For example, let's consider that we have two subreg with 0x0001 and 0x0002 lanemasks. original code works like this. 0x0001 | 0x0002 => 0x0003.

computeRegUnitLaneMasks 0, 0000000000000000 |= 0000000000000001 -> 0000000000000001
computeRegUnitLaneMasks 0, 0000000000000001 |= 0000000000000002 -> 0000000000000003

The modified code works like this. 0x0001 & 0x0002 => 0x0000.

computeRegUnitLaneMasks 0, FFFFFFFFFFFFFFFF &= 0000000000000001 -> 0000000000000001
computeRegUnitLaneMasks 0, 0000000000000001 &= 0000000000000002 -> 0000000000000000    

This calculation causes none() LaneBitmask in the 1st problem.

I'm still investigating these problems. So, I'm not sure what is the best, just reverting previous patch or solving this problem. But, I know above patch is causing above problems as a fact. So, I just decide to let you guys know it. Thanks.

@jayfoad
Copy link
Contributor

jayfoad commented Jan 22, 2024

Could you run this code in a debugger to work out exactly which register and subregisters are involved? I can't understand how you would end up with zero from this calculation. You would need:

  • a single register R
  • with two leaf subregisters S1 and S2
  • S1 and S2 share a regunit U (i.e. they overlap)
  • but the lanemasks for S1 and S2 are disjoint.

@kaz7
Copy link
Contributor Author

kaz7 commented Jan 22, 2024

Thank you for checking this issue. Please give me a few days. I'll give you those information.

@kaz7
Copy link
Contributor Author

kaz7 commented Jan 22, 2024

It didn't take a few days. This is answer to you.

You would need:

  • a single register R

For example, it is a register SX5 (EnumValue is 192).

  • with two leaf subregisters S1 and S2

They are SF5 and SW5.

  • S1 and S2 share a regunit U (i.e. they overlap)

Yes. SX5, 64 bit register, represents 64 bit float or integer value. SF5, high 32 bits, represents 32 bit float value. SW5, low 32 bits, represents 32 bit integer value.

  • but the lanemasks for S1 and S2 are disjoint.

Yes. Those are disjoint.

The source code of VE Register information is below.

https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/VE/VERegisterInfo.td#L127

@jayfoad
Copy link
Contributor

jayfoad commented Jan 22, 2024

It didn't take a few days. This is answer to you.

You would need:

  • a single register R

For example, it is a register SX5 (EnumValue is 192).

  • with two leaf subregisters S1 and S2

They are SF5 and SW5.

  • S1 and S2 share a regunit U (i.e. they overlap)

Yes. SX5, 64 bit register, represents 64 bit float or integer value. SF5, high 32 bits, represents 32 bit float value. SW5, low 32 bits, represents 32 bit integer value.

  • but the lanemasks for S1 and S2 are disjoint.

Yes. Those are disjoint.

The source code of VE Register information is below.

https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/VE/VERegisterInfo.td#L127

Thank you for the example, but why would SF5 and SW5 share a regunit U? If they do not share a regunit then RegUnitLaneMasks[u] &= LaneMask will not be executed twice for the same value of u.

@kaz7
Copy link
Contributor Author

kaz7 commented Jan 22, 2024

Thank you for the example, but why would SF5 and SW5 share a regunit U? If they do not share a regunit then RegUnitLaneMasks[u] &= LaneMask will not be executed twice for the same value of u.

I'm not sure why u is identical, this case it is zero. Moreover, I'm not sure what u means unfortunately. So, let me just explain how it runs using the source code.

The subregs loop, L2151, runs over each subregs of SX5 register. This case, it runs over SF5 and SW5 as subregisters of SX5.

The &= expression, L2166, runs twice for both SF5 and SW5.

  1. Before the subregs loop, L2151, RegUnitLaneMasks[0] is initialized by 0xFFFFFFFFFFFFFFFF.
  2. In the first iteration for SF5, the UnitLanemask is 0x0000000000000001 and u is initialized by 0. RegUnitLaneMasks[u] is modified to 0x0000000000000001 as a result of 0xFFFFFFFFFFFFFFFF & 0x0000000000000001.
  3. In the second iteration for SW5, the UnitLanemask is 0x0000000000000002 and u is again initialized by 0. RegUnitLaneMasks[u] is modified to 0x0000000000000000 as a result of 0x0000000000000001 & 0000000000000002

Does this expression make sense?

@jayfoad
Copy link
Contributor

jayfoad commented Jan 22, 2024

Can you try an experiment please?

diff --git a/llvm/lib/Target/VE/VERegisterInfo.td b/llvm/lib/Target/VE/VERegisterInfo.td
index fbe71d383e1e..529151ef9a87 100644
--- a/llvm/lib/Target/VE/VERegisterInfo.td
+++ b/llvm/lib/Target/VE/VERegisterInfo.td
@@ -105,10 +105,8 @@ foreach I = 0-63 in
   def SW#I : VEReg<I, "sw"#I, [], ["s"#I]>, DwarfRegNum<[I]>;
 
 // Generic floating point registers - 32 bits wide
-//   NOTE: Mark SF#I as alias of SW#I temporary to avoid register allocation
-//         problem.
 foreach I = 0-63 in
-  def SF#I : VEReg<I, "sf"#I, [], ["s"#I], [!cast<VEReg>("SW"#I)]>,
+  def SF#I : VEReg<I, "sf"#I, [], ["s"#I]>,
                    DwarfRegNum<[I]>;
 
 // Generic integer registers - 64 bits wide

@jayfoad
Copy link
Contributor

jayfoad commented Jan 22, 2024

To explain: the use of Aliases ("ad hoc aliasing") changes the behaviour of CodeGenRegister::computeSubRegs in ways that I am not too familiar with.

@kaz7
Copy link
Contributor Author

kaz7 commented Jan 22, 2024

Thank you for that experiment. Applying your patch fixes both problems I mentioned in this ticket.

I'm also not familiar with Aliases behavior. So, I cannot say how it should be...

At least, I think the alias above should be removed if the register allocation problem is solved for VE. I'll check VE's behavior after applying this expreiment modification. But it is out of the scope of this ticket little bit. I appreciate your help. Thank you so much.

kaz7 added a commit to sx-aurora-dev/llvm-project that referenced this issue Jan 22, 2024
Remove SF# register's aliases in order to solve a problem caused by D157864.
    llvm/llvm-project#78942
@kaz7
Copy link
Contributor Author

kaz7 commented Jan 23, 2024

I remember the allocation problem. Like I described, SF5 and SW5 shares an identical SX5 register. Both are disjoint. However, many instructions contaminate other part. So, in fact, it is impossible to use SF5 and SW5 simultaneously. Therefore, I'd like to not use for example SW5 while SF5 is alive. I don't know how to let it know to the register allocators. Therefore, I used alias for that purpose. However, this trick is not useable after D157864.

This problem is still existing. So, I need to find the correct way to not use SF#N and SW#N simultaneously. I appreciate if you know such options by any chance. Thanks!

@kaz7
Copy link
Contributor Author

kaz7 commented Jan 23, 2024

I cannot find any other options to force register allocator to not allocate SF and SW registers simultaneously except Aliases.

  // Aliases - A list of registers that this register overlaps with.  A read or
  // modification of this register can potentially read or modify the aliased
  // registers.
  list<Register> Aliases = [];

And the aliases' expression like A read or modification of this register can potentially read or modify the aliased register fits very well for the relationship between SF and SW registers.

@jayfoad
Copy link
Contributor

jayfoad commented Jan 23, 2024

Like I described, SF5 and SW5 shares an identical SX5 register. Both are disjoint. However, many instructions contaminate other part.

I see. To model that properly I guess you could make the instructions define an SX result and then extract the appropriate SF or SW subreg from it. But that sounds like a lot of work.

Maybe we can partly revert tablegen to the old behaviour. Can you try a patch like this, to see if it fixes your problem without breaking any other in-tree targets?

diff --git a/llvm/utils/TableGen/CodeGenRegisters.cpp b/llvm/utils/TableGen/CodeGenRegisters.cpp
index d1abdb74ea4a..b9d404aaee5f 100644
--- a/llvm/utils/TableGen/CodeGenRegisters.cpp
+++ b/llvm/utils/TableGen/CodeGenRegisters.cpp
@@ -2144,7 +2144,7 @@ void CodeGenRegBank::computeRegUnitLaneMasks() {
     // Create an initial lane mask for all register units.
     const auto &RegUnits = Register.getRegUnits();
     CodeGenRegister::RegUnitLaneMaskList RegUnitLaneMasks(
-        RegUnits.count(), LaneBitmask::getAll());
+        RegUnits.count(), LaneBitmask::getNone());
     // Iterate through SubRegisters.
     typedef CodeGenRegister::SubRegMap SubRegMap;
     const SubRegMap &SubRegs = Register.getSubRegs();
@@ -2163,7 +2163,7 @@ void CodeGenRegBank::computeRegUnitLaneMasks() {
         unsigned u = 0;
         for (unsigned RU : RegUnits) {
           if (SUI == RU) {
-            RegUnitLaneMasks[u] &= LaneMask;
+            RegUnitLaneMasks[u] |= LaneMask;
             assert(!Found);
             Found = true;
           }
@@ -2173,6 +2173,10 @@ void CodeGenRegBank::computeRegUnitLaneMasks() {
         assert(Found);
       }
     }
+    for (auto &Mask : RegUnitLaneMasks) {
+      if (Mask.none())
+        Mask = LaneBitmask::getAll();
+    }
     Register.setRegUnitLaneMasks(RegUnitLaneMasks);
   }
 }

@kaz7
Copy link
Contributor Author

kaz7 commented Jan 23, 2024

Thank you for new partial reverting. I'll try that tomorrow.

Like I described, SF5 and SW5 shares an identical SX5 register. Both are disjoint. However, many instructions contaminate other part.

I see. To model that properly I guess you could make the instructions define an SX result and then extract the appropriate SF or SW subreg from it. But that sounds like a lot of work.

Yes. That's a lot of work. But thank you for modeling idea.

kaz7 added a commit to sx-aurora-dev/llvm-project that referenced this issue Jan 23, 2024
Change computeRegUnitLaneMasks to partially revert D157864 modification since
this cause some problems on VE backend.
  llvm/llvm-project#78942 (comment)
@kaz7
Copy link
Contributor Author

kaz7 commented Jan 24, 2024

This partial reverting works great. All regression tests including internal tests which caused errors previously are passed. I'm looking forward to seeing this patch in main branch. Thank you!!

jayfoad added a commit to jayfoad/llvm-project that referenced this issue Jan 29, 2024
When a whole register is added a basic block's liveins, use
LaneBitmask::getAll for the live lanes instead of trying to calculate an
accurate mask of the lanes that comprise the register.

This simplifies the code and matches other places where a whole register
is marked as livein.

This also avoids problems when regunits that are synthesized by TableGen
to represent ad hoc aliasing have a lane mask of 0.

Fixes llvm#78942
jayfoad added a commit that referenced this issue Feb 15, 2024
When a whole register is added a basic block's liveins, use
LaneBitmask::getAll for the live lanes instead of trying to calculate an
accurate mask of the lanes that comprise the register.

This simplifies the code and matches other places where a whole register
is marked as livein.

This also avoids problems when regunits that are synthesized by TableGen
to represent ad hoc aliasing have a lane mask of 0.

Fixes #78942
LunNova pushed a commit to LunNova/llvm-project-rocm that referenced this issue Dec 22, 2024
When a whole register is added a basic block's liveins, use
LaneBitmask::getAll for the live lanes instead of trying to calculate an
accurate mask of the lanes that comprise the register.

This simplifies the code and matches other places where a whole register
is marked as livein.

This also avoids problems when regunits that are synthesized by TableGen
to represent ad hoc aliasing have a lane mask of 0.

Fixes llvm#78942
justinkb pushed a commit to justinkb/llvm-project that referenced this issue Jan 29, 2025
When a whole register is added a basic block's liveins, use
LaneBitmask::getAll for the live lanes instead of trying to calculate an
accurate mask of the lanes that comprise the register.

This simplifies the code and matches other places where a whole register
is marked as livein.

This also avoids problems when regunits that are synthesized by TableGen
to represent ad hoc aliasing have a lane mask of 0.

Fixes llvm#78942
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants