-
Notifications
You must be signed in to change notification settings - Fork 13.5k
[flang] Fixed LoopVersioning for array slices. #65703
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The first test case added in the LIT test demonstrates the problem. Even though we did not consider the inner loop as a candidate for the transformation due to the array_coor with a slice, we decided to version the outer loop for the same function argument. During the cloning of the outer loop we dropped the slicing completely producing invalid code. I restructured the code so that we record all arg uses that cannot be transformed (regardless of the reason), and then fixup the usage information across the loop nests. I also noticed that we may generate redundant contiguity checks for the inner loops, so I fixed it since it was easy with the new way of keeping the usage data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for finding and fixing this! The changes look good to me.
if (op->getParentOfType<fir::DoLoopOp>() != loop) | ||
return; | ||
mlir::Value operand = op->getOperand(0); | ||
for (auto a : argsOfInterest) { | ||
if (a.arg == normaliseVal(operand)) { | ||
// use the reboxed value, not the block arg when re-creating the loop: | ||
// Use the reboxed value, not the block arg when re-creating the loop. | ||
// TODO: should we check that the operand dominates the loop? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did I understand this correctly?
operand
is an operand to the array indexing operation so it must dominate it. In the normal case it will also dominate the loop (the declare and reboxing are generated in the function entry block). But one could write valid IR where the reboxing does not dominate the outer loop so we ought to handle that case. Nice spot!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it is the operand of array_coor/coordinate_of that would normally also dominate the loop, but this is not guaranteed. I am glad that we are on the same page. I will try to create a reproducer in FIR and fix it in a separate check-in.
Thank you for the review!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vzakhari is this issue fixed? If not, do you have a test that you could share?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vzakhari I'm looking into the "check that the operand dominates the loop", but I'm struggling to come up with some FIR that is valid and shows this issue. Did you get somewhere on a reproducer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this may still be a problem. I just hand-modified the first case from loop-versioning.fir
:
module {
func.func @sum1d(%arg0: !fir.box<!fir.array<?xf64>> {fir.bindc_name = "a"}, %arg1: !fir.ref<i32> {fir.bindc_name = "n"}) {
%decl = fir.declare %arg0 {uniq_name = "a"} : (!fir.box<!fir.array<?xf64>>) -> !fir.box<!fir.array<?xf64>>
%0 = fir.alloca i32 {bindc_name = "i", uniq_name = "_QMmoduleFsum1dEi"}
%1 = fir.alloca f64 {bindc_name = "sum", uniq_name = "_QMmoduleFsum1dEsum"}
%cst = arith.constant 0.000000e+00 : f64
fir.store %cst to %1 : !fir.ref<f64>
%c1_i32 = arith.constant 1 : i32
%2 = fir.convert %c1_i32 : (i32) -> index
%3 = fir.load %arg1 : !fir.ref<i32>
%4 = fir.convert %3 : (i32) -> index
%c1 = arith.constant 1 : index
%5 = fir.convert %2 : (index) -> i32
%6:2 = fir.do_loop %arg2 = %2 to %4 step %c1 iter_args(%arg3 = %5) -> (index, i32) {
%rebox = fir.rebox %decl : (!fir.box<!fir.array<?xf64>>) -> !fir.box<!fir.array<?xf64>>
fir.store %arg3 to %0 : !fir.ref<i32>
%7 = fir.load %1 : !fir.ref<f64>
%8 = fir.load %0 : !fir.ref<i32>
%9 = fir.convert %8 : (i32) -> i64
%c1_i64 = arith.constant 1 : i64
%10 = arith.subi %9, %c1_i64 : i64
%11 = fir.coordinate_of %rebox, %10 : (!fir.box<!fir.array<?xf64>>, i64) -> !fir.ref<f64>
%12 = fir.load %11 : !fir.ref<f64>
%13 = arith.addf %7, %12 fastmath<contract> : f64
fir.store %13 to %1 : !fir.ref<f64>
%14 = arith.addi %arg2, %c1 : index
%15 = fir.convert %c1 : (index) -> i32
%16 = fir.load %0 : !fir.ref<i32>
%17 = arith.addi %16, %15 : i32
fir.result %14, %17 : index, i32
}
fir.store %6#1 to %0 : !fir.ref<i32>
return
}
}
It fails with an assertion: Assertion
changed && "Expected operations to have changed"' failed.`, but I am not sure what it means exactly.
If I understand it correctly, as soon as the loop is proven to be safe to multiversion, we will transform it and we will use %rebox
for generating the contiguity check before the loop, while it is defined inside the loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I'll do some debugging. The assert basically means "we didn't find anything in the loop to update", which of course means versioning didn't achieve anything.
We should identify this case and just say "nope, not doing that".
The first test case added in the LIT test demonstrates the problem.
Even though we did not consider the inner loop as a candidate for
the transformation due to the array_coor with a slice, we decided to
version the outer loop for the same function argument.
During the cloning of the outer loop we dropped the slicing completely
producing invalid code.
I restructured the code so that we record all arg uses that cannot be
transformed (regardless of the reason), and then fixup the usage
information across the loop nests. I also noticed that we may generate
redundant contiguity checks for the inner loops, so I fixed it
since it was easy with the new way of keeping the usage data.