Skip to content
This repository was archived by the owner on Sep 2, 2018. It is now read-only.

Support MULH[US] for all integer types #190

Merged
merged 2 commits into from
May 17, 2016

Conversation

shepmaster
Copy link

@shepmaster shepmaster commented May 8, 2016

Before merging:

  • Discuss two commented regions
  • Add automated tests
  • Figure out why i16 version fails

Both of these now compile:

; RUN: llc < %s -march=avr | FileCheck %s
; XFAIL:

define i1 @foo(i8, i8) unnamed_addr {
; CHECK-LABEL: foo:
entry-block:
  %2 = tail call { i8, i1 } @llvm.umul.with.overflow.i8(i8 %0, i8 %1)
  %3 = extractvalue { i8, i1 } %2, 1
  ret i1 %3
}

declare { i8, i1 } @llvm.umul.with.overflow.i8(i8, i8)

and

; RUN: llc < %s -march=avr | FileCheck %s
; XFAIL:

define i1 @multiplication_did_overflow(i8, i8) unnamed_addr {
; CHECK-LABEL: multiplication_did_overflow:
entry-block:
  %2 = tail call { i8, i1 } @llvm.smul.with.overflow.i8(i8 %0, i8 %1)
  %3 = extractvalue { i8, i1 } %2, 1
  ret i1 %3
}

declare { i8, i1 } @llvm.smul.with.overflow.i8(i8, i8)

@@ -45,6 +45,30 @@ extern bool RA_InSpillerCode;
AVRInstrInfo::AVRInstrInfo()
: AVRGenInstrInfo(AVR::ADJCALLSTACKDOWN, AVR::ADJCALLSTACKUP), RI() {}

// FIXME: BEFORE MERGING: Is there a cleaner tablegen method of doing this?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wish - the overhead of creating a new TableGen generator is quite large. Would be awesome if TableGen generators could be written as ruby scripts or something.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To clarify, I was mostly asking because we define these registers like

def R25R24 : AVRReg<24, "r25:r24", [R24, R25]>, DwarfRegNum<[24]>;

So something knows that R25R24 is composed of [R24, R25]. If there was a way to get the "sub registers", then maybe we could avoid some duplication. If we can't, we can't.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yes there is a way to do this. Will show you an example. Give me a sec.

Copy link
Member

@dylanmckay dylanmckay May 11, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See AVRExpandPseudo::splitRegs

// where Reg is the register number, and TRI is a TargetRegisterInfo instance.
LoReg = TRI->getSubReg(Reg, AVR::sub_lo);
HiReg = TRI->getSubReg(Reg, AVR::sub_hi);

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This worked well, thanks! The harder question: is this a good idea? Does it make sense to try to copy from a 16-bit register to an 8-bit one? What about the other way around?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking harder on this, I don't quite understand what the generated code is doing. This LLVM IR

define i1 @unsigned_multiplication_did_overflow(i8, i8) unnamed_addr {
entry-block:
  %2 = tail call { i8, i1 } @llvm.umul.with.overflow.i8(i8 %0, i8 %1)
  %3 = extractvalue { i8, i1 } %2, 1
  ret i1 %3
}

Generates this AVR assembly:

unsigned_multiplication_did_overflow:
                         ;; 8-bit inputs seem to come as 16-bit values
        mul     r24, r22 ;; Multiply the low bytes; result is in r1:r0
        mov     r24, r1
        eor     r1, r1   ;; Inefficient - no need to reset zero register
        cpi     r24, 0   ;; Check if the high byte was zero
        brne    LBB0_2
        ldi     r18, 0   ;; No overflow, return 0 as 16-bit value
        ldi     r19, 0
        mov     r24, r19 ;; ??? Why are we doing this ???
        ret
LBB0_2:                  ;; It was not zero; we had overflow
        ldi     r18, 1
        ldi     r19, 0   ;; Return 1 as a 16-bit value
        mov     r24, r19 ;; ??? Why are we doing this ????

This seems to make sense up until the end — why are those rogue movs at the end? Those are the ones that generate the move from a 16-bit register to an 8-bit one. That seems super suspicious.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aha, I have it. Once we merge the fix-up-travis junk, I'll submit an updated PR. It's much cleaner.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I remembered the calling convention notes saying that r0 should generally be 0 but it doesn't look that way.

IIRC @agnat implemented clearing r0 after multiplication.

Did you figure out the weird movs?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I remembered the calling convention notes saying that r0 should generally be 0 but it doesn't look that way.

Yeah, that code is in AVRTargetLowering::insertMul, and it unconditionally inserts the clear to zero. I don't think that the rest of LLVM knows that r1 (not r0) is supposed to contain zero, so it always loads an immediate zero elsewhere.

And in this case, we don't need the zero for any other work in this method, so we could have left the value in place and cleared it to zero before we returned.

Did you figure out the weird movs?

Yes, those were the result of the "move 16-bit register to 8-bit register" strangeness and are now gone.

@shepmaster shepmaster force-pushed the mulh-for-everyone branch from a5a1627 to 923d3f3 Compare May 15, 2016 15:36
@shepmaster
Copy link
Author

@dylanmckay this one is ready for review!

; RUN: llc < %s -march=avr | FileCheck %s

define i1 @signed_multiplication_did_overflow(i8, i8) unnamed_addr {
; CHECK-LABEL: signed_multiplication_did_overflow:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you be able to add CHECK and CHECK-NEXT lines to make sure the correct sequence of instructions is generated? Just something basic would suffice.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

signed_multiplication_did_overflow:     ; @signed_multiplication_did_overflow
; BB#0:                                 ; %entry-block
    muls    r24, r22
    mov r25, r1
    eor r1, r1
    mov r18, r0
    asr r18
    asr r18
    asr r18
    asr r18
    asr r18
    asr r18
    asr r18
    ldi r24, 1
    cp  r25, r18
    brne    LBB0_2
; BB#1:                                 ; %entry-block
    ldi r24, 0
LBB0_2:                                 ; %entry-block
    ret

This is a bit more involved than unsigned. What shape of tests would you like to see here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like this would be good (with patterns instead of registers):

define i1 @ signed_multiplication_did_overflow(i8, i8) unnamed_addr {
; CHECK-LABEL: signed_multiplication_did_overflow:
entry-block:
  %2 = tail call { i8, i1 } @llvm.umul.with.overflow.i8(i8 %0, i8 %1)
  %3 = extractvalue { i8, i1 } %2, 1
  ret i1 %3
; CHECK-NEXT: muls r24, r22
; CHECK-NEXT: mov r25, r1
; ...
}

declare { i8, i1 } @llvm.umul.with.overflow.i8(i8, i8)

Of course, don't worry about any calling convention or prologue/epilogue junk.

Same with the other test btw

Once MULHS was expanded, this exposed an issue where the condition
register was thought to be 16-bit. This caused an attempt to copy a
16-bit register to an 8-bit register.
@shepmaster shepmaster force-pushed the mulh-for-everyone branch from 923d3f3 to 211dd8f Compare May 17, 2016 01:21
@shepmaster
Copy link
Author

@dylanmckay tests added; let me know how close I got to what you wanted.

@dylanmckay
Copy link
Member

@shepmaster perfect!

@shepmaster shepmaster merged commit fdb1c14 into avr-llvm:avr-support May 17, 2016
@shepmaster shepmaster deleted the mulh-for-everyone branch May 17, 2016 01:51
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants