-
Notifications
You must be signed in to change notification settings - Fork 13.3k
Update Iterator::eq implementation for better performance #44729
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
The itertools code is not the end station of course, there could be improvements on top of that. |
I'd like to take a shot at this - I'll go check out Gitter when I have time later today (I'm currently on JST) |
hey @mister-walter, are you still interested in working on this? If not, I'd like to give this issue a try 😃 |
Feel free to find me on irc or here |
I'm really sorry about that, I got super caught up in work and this fell by
the wayside...
Matt, feel free to take this up!
…On Oct 1, 2017 5:00 PM, "Matt" ***@***.***> wrote:
hey @mister-walter <https://github.com/mister-walter>, are you still
interested in working on this? If not, I'd like to give this issue a try
😃
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#44729 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ABysYa9eVqtCXnK1dE4gj3lcp2pY_ZYBks5sn0ahgaJpZM4PeiOM>
.
|
Awesome, thanks! I tried a few approaches out and it seems that the matching on the tuples does indeed cause slower code to be generated (best on some quick benchmarking). The current implementation does the ff: loop {
match (self.next(), other.next()) {
(None, None) => return true,
(None, _) | (_, None) => return false,
(Some(x), Some(y)) => if x != y { return false },
}
} Modifying the code to separate the matching resulted in some nice gains: loop {
match self.next() {
None => return other.next().is_none(),
Some(x) => match other.next() {
None => return false,
Some(y) => if x != y { return false },
},
};
} Alternatively (to reduce nesting): loop {
let x = match self.next() {
None => return other.next().is_none(),
Some(val) => val,
};
let y = match other.next() {
None => return false,
Some(val) => val,
};
if x != y {
return false;
}
} I haven't taken a good look at the MIR generated though. I also tested it out on the other comparison functions and noticed similar gains. I'll try to take a deeper look at the code generated soon. |
Oh nice, the new versions are side-effect compatible as well right? self.next() and other.next() are called exactly as many times as before. |
Yes, that should be correct. They should be equivalent. I tried comparing the assembly code generated (with the function monomorphized for the iterator of an integer vector and array, inlining of the functions disabled, and optimized for release) and noticed that that the code generated for the non-tuple matches[1] were, as expected, identical. On the other hand, the tuple match generated something a bit more convoluted. I didn't bother checking the MIR anymore since I don't think it does any optimizations there right now that would affect it. Do you have any suggestions on further investigating this issue? Or any tests you want me to try? If not, I'll be submitting a PR to manually optimize the tuple matching on the implementations of the following functions: [1] This includes the later two code blocks I posted in the earlier comment, as well as the while-let and if-let version originally posted. |
I was following this a bit, should an issue be open to investigate why there is such a difference in performance when using match on a tuple vs using nested matches ? It was at least 2x slower on my modest (and maybe not that representative) benchmarks to match on tuples. |
Yeah, I would've expected it to optimize into the same thing. I'm not sure if it's an issue with rustc or with llvm though. Looking into the assembly code didn't really provide me with any clues on what's preventing the compiler from further optimizing the code either (tbh, I'm not very experienced with assembly though). |
when looking at the mir code from https://godbolt.org/g/W8tiyS it feels like rustc can generate mir code that is easier to optimize. maybe it is possible to remove the packing/unpacking of data in to the tuple it looks like the biggest difference in the mir code. |
According to the MIR RFC, it seems like the packing is intentionally done: https://github.com/nox/rust-rfcs/blob/master/text/1211-mir.md#aggregates-and-further-lowering. Interestingly enough, even in a simpler case, the compiler would produce less efficient code when matching on tuples: https://godbolt.org/g/jvcLuQ. Looking at the MIR generated, aside from the tuple packing, we might be able to optimize branches that may lead to a branch marked unreachable to simply go to the reachable branch instead. I doubt that will actually improve things, but I think it's still worth a shot. |
Optimize comparison functions of Iterator Replaced matching on tuples which led to less performant code generation. Testing on microbenchmarks consistently showed ~1.35x improvement in performance on my machine. Fixes #44729.
…scottmcm Remove duplicated code from Iterator::{ne, lt, le, gt, ge} This PR delegates `Iterator::ne` to `Iterator::eq` and `Iterator::{lt, le, gt, ge}` to `Iterator::partial_cmp`. Oddly enough, this change actually simplifies the generated assembly [in some cases](https://rust.godbolt.org/z/riBtNe), although I don't understand assembly well enough to see if the longer assembly is doing something clever. I also added two extremely simple benchmarks: ``` // before test iter::bench_lt ... bench: 98,404 ns/iter (+/- 21,008) test iter::bench_partial_cmp ... bench: 62,437 ns/iter (+/- 5,009) // after test iter::bench_lt ... bench: 61,757 ns/iter (+/- 8,770) test iter::bench_partial_cmp ... bench: 62,151 ns/iter (+/- 13,753) ``` I have no idea why the current `lt`/`le`/`gt`/`ge` implementations don't seem to be compiled optimally, but simply having them call `partial_cmp` seems to be an improvement. See rust-lang#44729 for a previous discussion.
The itertools equal function expresses the same functionality as
Iterator::eq
in a way that compiles better / performs better. Import their implementation into libstd, also see if other iterator comparison functions can be improved.The problem could be that the libstd code is using
match
on a tuple of options.The text was updated successfully, but these errors were encountered: