Optimized _DivideHLBC, fixed incorrect behavior for INT_MIN inputs #596
Inspired by #595, I took a look at optimizing GraphX's rounded-down signed division routine. Since it wasn't initially designed to round down and the rounding fixup was added later, I thought about ways to make it round down from the start. Eventually, I came up with the idea of transforming the signed division into unsigned, which always rounds down. So, after conditionally negating both the dividend and divisor such that the divisor is positive, the unsigned division can be done as follows:
If x >= 0: Return x / y
If x < 0: Return ((y * 2^24) + x) / y
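To illustrate the identity (this is a rough C model, not the actual eZ80 assembly; `floor_div` is a hypothetical name, and 32-bit ints stand in for the eZ80's 24-bit registers, so the bias is 2^32 rather than 2^24):

```c
#include <stdint.h>

/* Floor division via unsigned division, mirroring the identity above.
 * 32-bit ints stand in for 24-bit registers, so the bias is 2^32.
 * Assumes y != 0 and avoids the INT32_MIN-negation overflow case. */
static int32_t floor_div(int32_t x, int32_t y)
{
    /* Conditionally negate both operands so the divisor is positive. */
    if (y < 0) { x = -x; y = -y; }
    if (x >= 0)
        return (int32_t)((uint32_t)x / (uint32_t)y);
    /* (y * 2^32 + x) / y: bias the sign-extended dividend by y << 32
     * in 64-bit arithmetic, then divide as unsigned. The quotient's
     * low 32 bits are the two's-complement floor result. */
    uint64_t biased = (uint64_t)((int64_t)x + ((int64_t)y << 32));
    return (int32_t)(uint32_t)(biased / (uint32_t)y);
}
```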
This setup can be handled by an optimized first iteration: the remainder register is initialized to either 0 or y-1 based on the sign bit of the dividend, and that sign bit is copied directly into the sign bit of the result, allowing the rest of the division to be completed in 23 iterations.
The one exceptional case is when the dividend is INT_MIN and gets negated: its sign bit is then a value bit rather than a sign, so the same setup can't be used. In that case, the routine simply falls back to 24 iterations of plain unsigned division.
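The whole scheme can be sketched as a bit-level C model (again an illustration rather than the actual GraphX assembly; `div24_floor` and the explicit 24-bit masking are stand-ins, with operands held in the low 24 bits of a `uint32_t`):

```c
#include <stdint.h>

#define SIGN_BIT 0x800000u  /* bit 23 of a 24-bit value */
#define MASK_24  0xFFFFFFu

/* Bit-level model of the routine described above. Operands are 24-bit
 * two's-complement values in the low bits of a uint32_t; assumes y != 0. */
static uint32_t div24_floor(uint32_t x, uint32_t y)
{
    int negated = 0;

    /* Conditionally negate both operands so the divisor is positive. */
    if (y & SIGN_BIT) {
        x = -x & MASK_24;
        y = -y & MASK_24;
        negated = 1;
    }

    uint32_t rem, quot;
    int iters;

    if (negated && x == SIGN_BIT) {
        /* Dividend was INT_MIN and got negated: its top bit is now a
         * value bit, not a sign, so fall back to a plain 24-iteration
         * unsigned division. */
        rem = 0; quot = 0; iters = 24;
    } else if (x & SIGN_BIT) {
        /* Negative dividend: optimized first iteration. Start the
         * remainder at y-1 and emit the sign bit as the top quotient
         * bit, leaving 23 iterations for the remaining bits. */
        rem = y - 1; quot = 1;
        x = (x << 1) & MASK_24;  /* consume the sign bit */
        iters = 23;
    } else {
        /* Non-negative dividend: remainder starts at 0, and the known
         * zero sign bit lets the first iteration be skipped as well. */
        rem = 0; quot = 0;
        x = (x << 1) & MASK_24;
        iters = 23;
    }

    /* Standard restoring (shift-and-subtract) unsigned division. */
    while (iters--) {
        rem = (rem << 1) | (x >> 23);
        x = (x << 1) & MASK_24;
        quot <<= 1;
        if (rem >= y) { rem -= y; quot |= 1; }
    }
    return quot & MASK_24;
}
```

Note how the negative-dividend branch realizes the (y * 2^24) + x formula: the high base-2^24 digit of the biased dividend works out to y-1 once the sign bit is folded into the low 24 bits, which is exactly the initial remainder.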