-
Notifications
You must be signed in to change notification settings - Fork 21
Support MULH[US] for all integer types #190
Support MULH[US] for all integer types #190
Conversation
@@ -45,6 +45,30 @@ extern bool RA_InSpillerCode; | |||
AVRInstrInfo::AVRInstrInfo() | |||
: AVRGenInstrInfo(AVR::ADJCALLSTACKDOWN, AVR::ADJCALLSTACKUP), RI() {} | |||
|
|||
// FIXME: BEFORE MERGING: Is there a cleaner tablegen method of doing this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wish - the overhead of creating a new TableGen generator is quite large. Would be awesome if TableGen generators could be written as ruby scripts or something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To clarify, I was mostly asking because we define these registers like
def R25R24 : AVRReg<24, "r25:r24", [R24, R25]>, DwarfRegNum<[24]>;
So something knows that R25R24
is composed of [R24, R25]
. If there was a way to get the "sub registers", then maybe we could avoid some duplication. If we can't, we can't.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yes there is a way to do this. Will show you an example. Give me a sec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See AVRExpandPseudo::splitRegs
// where Reg is the register number, and TRI is a TargetRegisterInfo instance.
LoReg = TRI->getSubReg(Reg, AVR::sub_lo);
HiReg = TRI->getSubReg(Reg, AVR::sub_hi);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This worked well, thanks! The harder question: is this a good idea? Does it make sense to try to copy from a 16-bit register to an 8-bit one? What about the other way around?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking harder on this, I don't quite understand what the generated code is doing. This LLVM IR
define i1 @unsigned_multiplication_did_overflow(i8, i8) unnamed_addr {
entry-block:
%2 = tail call { i8, i1 } @llvm.umul.with.overflow.i8(i8 %0, i8 %1)
%3 = extractvalue { i8, i1 } %2, 1
ret i1 %3
}
Generates this AVR assembly:
unsigned_multiplication_did_overflow:
;; 8-bit inputs seem to come as 16-bit values
mul r24, r22 ;; Multiply the low bytes; result is in r1:r0
mov r24, r1
eor r1, r1 ;; Inefficient - no need to reset zero register
cpi r24, 0 ;; Check if the high byte was zero
brne LBB0_2
ldi r18, 0 ;; No overflow, return 0 as 16-bit value
ldi r19, 0
mov r24, r19 ;; ??? Why are we doing this ???
ret
LBB0_2: ;; It was not zero; we had overflow
ldi r18, 1
ldi r19, 0 ;; Return 1 as a 16-bit value
mov r24, r19 ;; ??? Why are we doing this ????
This seems to make sense up until the end — why are those rogue mov
s at the end? Those are the ones that generate the move from a 16-bit register to an 8-bit one. That seems super suspicious.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aha, I have it. Once we merge the fix-up-travis junk, I'll submit an updated PR. It's much cleaner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I remembered the calling convention notes saying that r0
should generally be 0
but it doesn't look that way.
IIRC @agnat implemented clearing r0
after multiplication.
Did you figure out the weird mov
s?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I remembered the calling convention notes saying that r0 should generally be 0 but it doesn't look that way.
Yeah, that code is in AVRTargetLowering::insertMul
, and it unconditionally inserts the clear to zero. I don't think that the rest of LLVM knows that r1
(not r0
) is supposed to contain zero, so it always loads an immediate zero elsewhere.
And in this case, we don't need the zero for any other work in this method, so we could have left the value in place and cleared it to zero before we returned.
Did you figure out the weird movs?
Yes, those were the result of the "move 16-bit register to 8-bit register" strangeness and are now gone.
a5a1627
to
923d3f3
Compare
@dylanmckay this one is ready for review! |
; RUN: llc < %s -march=avr | FileCheck %s | ||
|
||
define i1 @signed_multiplication_did_overflow(i8, i8) unnamed_addr { | ||
; CHECK-LABEL: signed_multiplication_did_overflow: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you be able to add CHECK
and CHECK-NEXT
lines to make sure the correct sequence of instructions is generated? Just something basic would suffice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
signed_multiplication_did_overflow: ; @signed_multiplication_did_overflow
; BB#0: ; %entry-block
muls r24, r22
mov r25, r1
eor r1, r1
mov r18, r0
asr r18
asr r18
asr r18
asr r18
asr r18
asr r18
asr r18
ldi r24, 1
cp r25, r18
brne LBB0_2
; BB#1: ; %entry-block
ldi r24, 0
LBB0_2: ; %entry-block
ret
This is a bit more involved than unsigned. What shape of tests would you like to see here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something like this would be good (with patterns instead of registers):
define i1 @ signed_multiplication_did_overflow(i8, i8) unnamed_addr {
; CHECK-LABEL: signed_multiplication_did_overflow:
entry-block:
%2 = tail call { i8, i1 } @llvm.umul.with.overflow.i8(i8 %0, i8 %1)
%3 = extractvalue { i8, i1 } %2, 1
ret i1 %3
; CHECK-NEXT: muls r24, r22
; CHECK-NEXT: mov r25, r1
; ...
}
declare { i8, i1 } @llvm.umul.with.overflow.i8(i8, i8)
Of course, don't worry about any calling convention or prologue/epilogue junk.
Same with the other test btw
Once MULHS was expanded, this exposed an issue where the condition register was thought to be 16-bit. This caused an attempt to copy a 16-bit register to an 8-bit register.
923d3f3
to
211dd8f
Compare
@dylanmckay tests added; let me know how close I got to what you wanted. |
@shepmaster perfect! |
Before merging:
i16
version failsBoth of these now compile:
and