-
Notifications
You must be signed in to change notification settings - Fork 13.5k
[Convergence] allow non-convergent ops before entry and loop intrinsics #65939
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The only real requirement is that entry and loop intrinsics should not be preceded by convergent operations in the same basic block. They do not need to be the first in the block. Relaxing the constraint on the entry and loop intrinsics avoids having to make changes in the construction of LLVM IR, such as getFirstInsertionPt(). It also avoids added complexity in the lowering to Machine IR, where COPY instructions may be added to the start of the basic block.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems reasonable to me.
@@ -63,6 +63,8 @@ template <typename ContextT> class GenericConvergenceVerifier { | |||
// and not the token values. | |||
DenseMap<const InstructionT *, const InstructionT *> Tokens; | |||
|
|||
bool SeenFirstConvOp = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The value will not be reset across basic block, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It gets reset at the start of every block in the visit method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think not? The visit() method you pointed out is only called in Verifier::visitCallBase(). So it cannot reset the flag in a new block.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ouch! You are right. I was paying too much attention on the MachineConvergenceVerifier (not checked in yet), and didn't see how the ConvergenceVerifier traverses the function. I have a fix in my branch, will push it out real soo now. Thanks for catching it!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably useful for alloca
…cs (llvm#65939) The only real requirement is that entry and loop intrinsics should not be preceded by convergent operations in the same basic block. They do not need to be the first in the block. Relaxing the constraint on the entry and loop intrinsics avoids having to make changes in the construction of LLVM IR, such as getFirstInsertionPt(). It also avoids added complexity in the lowering to Machine IR, where COPY instructions may be added to the start of the basic block.
The only real requirement is that entry and loop intrinsics should not be preceded by convergent operations in the same basic block. They do not need to be the first in the block.
Relaxing the constraint on the entry and loop intrinsics avoids having to make changes in the construction of LLVM IR, such as getFirstInsertionPt(). It also avoids added complexity in the lowering to Machine IR, where COPY instructions may be added to the start of the basic block.