
Update Iterator::eq implementation for better performance #44729


Closed
bluss opened this issue Sep 20, 2017 · 12 comments
Labels
- C-bug: Category: This is a bug.
- E-easy: Call for participation: Easy difficulty. Experience needed to fix: Not much. Good first issue.
- E-mentor: Call for participation: This issue has a mentor. Use #t-compiler/help on Zulip for discussion.
- I-slow: Issue: Problems and improvements with respect to performance of generated code.
- T-libs-api: Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@bluss
Member

bluss commented Sep 20, 2017

The itertools equal function expresses the same functionality as Iterator::eq in a way that compiles better / performs better. Import their implementation into libstd, also see if other iterator comparison functions can be improved.

The problem could be that the libstd code is using match on a tuple of options.

@bluss added the C-bug, E-easy, E-mentor, I-slow, and T-libs-api labels Sep 20, 2017
@bluss
Member Author

bluss commented Sep 20, 2017

The itertools code is not the end of the road, of course; there could be further improvements on top of it.

@mister-walter

I'd like to take a shot at this - I'll go check out Gitter when I have time later today (I'm currently on JST)

@lezgomatt
Contributor

hey @mister-walter, are you still interested in working on this? If not, I'd like to give this issue a try 😃

@bluss
Member Author

bluss commented Oct 1, 2017

Feel free to find me on irc or here

@mister-walter

mister-walter commented Oct 1, 2017 via email

@lezgomatt
Contributor

lezgomatt commented Oct 1, 2017

Awesome, thanks!

I tried a few approaches out, and it seems that matching on the tuple does indeed cause slower code to be generated (based on some quick benchmarking).

The current implementation does the following:

```rust
loop {
    match (self.next(), other.next()) {
        (None, None) => return true,
        (None, _) | (_, None) => return false,
        (Some(x), Some(y)) => if x != y { return false },
    }
}
```

Modifying the code to separate the matching resulted in some nice gains:

```rust
loop {
    match self.next() {
        None => return other.next().is_none(),
        Some(x) => match other.next() {
            None => return false,
            Some(y) => if x != y { return false },
        },
    };
}
```

Alternatively (to reduce nesting):

```rust
loop {
    let x = match self.next() {
        None => return other.next().is_none(),
        Some(val) => val,
    };

    let y = match other.next() {
        None => return false,
        Some(val) => val,
    };

    if x != y {
        return false;
    }
}
```

I haven't taken a good look at the MIR generated though. I also tested it out on the other comparison functions and noticed similar gains. I'll try to take a deeper look at the code generated soon.
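For anyone who wants to reproduce this locally, a rough standalone micro-benchmark along these lines shows the shape of the comparison (a hypothetical sketch, not the benchmark actually used; `eq_tuple` mirrors the tuple-match code above and `eq_nested` the rewritten version; compile with `--release` for meaningful numbers):

```rust
use std::time::Instant;

// Tuple-match version, mirroring the current libstd shape.
fn eq_tuple<I, J>(mut a: I, mut b: J) -> bool
where
    I: Iterator,
    J: Iterator<Item = I::Item>,
    I::Item: PartialEq,
{
    loop {
        match (a.next(), b.next()) {
            (None, None) => return true,
            (None, _) | (_, None) => return false,
            (Some(x), Some(y)) => if x != y { return false },
        }
    }
}

// Nested-match version from the comment above.
fn eq_nested<I, J>(mut a: I, mut b: J) -> bool
where
    I: Iterator,
    J: Iterator<Item = I::Item>,
    I::Item: PartialEq,
{
    loop {
        match a.next() {
            None => return b.next().is_none(),
            Some(x) => match b.next() {
                None => return false,
                Some(y) => if x != y { return false },
            },
        }
    }
}

fn main() {
    let v: Vec<u64> = (0..1_000_000).collect();
    let w = v.clone();

    let t = Instant::now();
    let r1 = eq_tuple(v.iter().copied(), w.iter().copied());
    let d1 = t.elapsed();

    let t = Instant::now();
    let r2 = eq_nested(v.iter().copied(), w.iter().copied());
    let d2 = t.elapsed();

    // Both must agree that the equal vectors compare equal.
    assert!(r1 && r2);
    println!("tuple: {:?}, nested: {:?}", d1, d2);
}
```

Timings from a harness this small are noisy; treat it as a starting point rather than a proper benchmark.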

@bluss
Member Author

bluss commented Oct 1, 2017

Oh nice, the new versions are side-effect compatible as well, right? `self.next()` and `other.next()` are called exactly as many times as before.
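One way to sanity-check the call counts is a small counting wrapper around an iterator (a hypothetical sketch; `Counted`, `eq_tuple`, and `eq_nested` are illustrative names, with `eq_tuple` mirroring the tuple-match code quoted earlier):

```rust
use std::cell::Cell;

// Wrapper that counts how many times `next` is called, so the two
// implementations can be compared for side-effect compatibility.
struct Counted<'a, I> {
    inner: I,
    calls: &'a Cell<usize>,
}

impl<'a, I: Iterator> Iterator for Counted<'a, I> {
    type Item = I::Item;
    fn next(&mut self) -> Option<I::Item> {
        self.calls.set(self.calls.get() + 1);
        self.inner.next()
    }
}

// Tuple-match version (the current libstd shape).
fn eq_tuple<I, J>(mut a: I, mut b: J) -> bool
where
    I: Iterator,
    J: Iterator<Item = I::Item>,
    I::Item: PartialEq,
{
    loop {
        match (a.next(), b.next()) {
            (None, None) => return true,
            (None, _) | (_, None) => return false,
            (Some(x), Some(y)) => if x != y { return false },
        }
    }
}

// Nested-match version (the proposed rewrite).
fn eq_nested<I, J>(mut a: I, mut b: J) -> bool
where
    I: Iterator,
    J: Iterator<Item = I::Item>,
    I::Item: PartialEq,
{
    loop {
        match a.next() {
            None => return b.next().is_none(),
            Some(x) => match b.next() {
                None => return false,
                Some(y) => if x != y { return false },
            },
        }
    }
}

fn main() {
    let data = [1u32, 2, 3];
    let other = [1u32, 2, 4];

    let (ca, cb) = (Cell::new(0), Cell::new(0));
    eq_tuple(
        Counted { inner: data.iter().copied(), calls: &ca },
        Counted { inner: other.iter().copied(), calls: &cb },
    );
    let tuple_counts = (ca.get(), cb.get());

    let (ca, cb) = (Cell::new(0), Cell::new(0));
    eq_nested(
        Counted { inner: data.iter().copied(), calls: &ca },
        Counted { inner: other.iter().copied(), calls: &cb },
    );
    let nested_counts = (ca.get(), cb.get());

    // Both versions should call `next` the same number of times.
    assert_eq!(tuple_counts, nested_counts);
    println!("calls (self, other): {:?}", tuple_counts);
}
```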

@lezgomatt
Contributor

Yes, that should be correct. They should be equivalent.

I tried comparing the generated assembly (with the function monomorphized for iterators over an integer vector and array, inlining disabled, and optimized for release) and noticed that the code generated for the non-tuple matches[1] was, as expected, identical. On the other hand, the tuple match generated something a bit more convoluted. I didn't bother checking the MIR further, since I don't think any optimizations happen there right now that would affect it.

Do you have any suggestions for further investigating this issue? Or any tests you'd like me to try? If not, I'll submit a PR to manually optimize away the tuple matching in the implementations of the following functions: `cmp`, `partial_cmp`, `eq`, `ne`, `lt`, `le`, `gt`, `ge`. Also, which implementation do you think is preferable: the shorter but more nested version, or the longer but less nested one?

[1] This includes the latter two code blocks from my earlier comment, as well as the while-let and if-let versions originally posted.
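Applied to `partial_cmp`, the same de-tupled pattern might look roughly like this (a free-function sketch of the approach, not the actual PR code; `partial_cmp_iters` is an illustrative name):

```rust
use std::cmp::Ordering;

// Sketch: the "avoid matching on a tuple" rewrite applied to a
// `partial_cmp`-style lexicographic comparison of two iterators.
fn partial_cmp_iters<I, J>(mut a: I, mut b: J) -> Option<Ordering>
where
    I: Iterator,
    J: Iterator<Item = I::Item>,
    I::Item: PartialOrd,
{
    loop {
        let x = match a.next() {
            // Left side ran out: shorter is Less unless both ended.
            None => {
                return if b.next().is_none() {
                    Some(Ordering::Equal)
                } else {
                    Some(Ordering::Less)
                }
            }
            Some(val) => val,
        };

        let y = match b.next() {
            // Right side ran out first: left is Greater.
            None => return Some(Ordering::Greater),
            Some(val) => val,
        };

        match x.partial_cmp(&y) {
            Some(Ordering::Equal) => (),
            non_eq => return non_eq,
        }
    }
}

fn main() {
    assert_eq!(
        partial_cmp_iters([1, 2].iter(), [1, 2, 3].iter()),
        Some(Ordering::Less)
    );
    assert_eq!(
        partial_cmp_iters([1, 3].iter(), [1, 2, 3].iter()),
        Some(Ordering::Greater)
    );
    println!("ok");
}
```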

@mdevlamynck
Contributor

I was following this a bit; should an issue be opened to investigate why there is such a difference in performance between matching on a tuple and using nested matches? In my modest (and maybe not that representative) benchmarks, matching on tuples was at least 2x slower.

@lezgomatt
Contributor

Yeah, I would've expected them to optimize into the same thing. I'm not sure whether it's an issue with rustc or with LLVM, though. Looking at the assembly didn't really give me any clues about what's preventing the compiler from optimizing the code further, either (to be honest, I'm not very experienced with assembly).

@andjo403
Contributor

andjo403 commented Oct 3, 2017

When looking at the MIR from https://godbolt.org/g/W8tiyS, it feels like rustc could generate MIR that is easier to optimize. Maybe it is possible to remove the packing/unpacking of data into the tuple; that looks like the biggest difference in the MIR.

@lezgomatt
Contributor

lezgomatt commented Oct 7, 2017

According to the MIR RFC, the packing seems to be intentional: https://github.com/nox/rust-rfcs/blob/master/text/1211-mir.md#aggregates-and-further-lowering. Interestingly enough, even in a simpler case the compiler produces less efficient code when matching on tuples: https://godbolt.org/g/jvcLuQ.

Looking at the generated MIR, aside from the tuple packing, we might be able to optimize branches that may lead to a branch marked unreachable so that they jump directly to the reachable branch instead. I doubt that will actually improve things, but I think it's still worth a shot.
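For illustration, the "simpler case" is the kind of function pair below: the same logic written once as a tuple match and once as nested matches (a hypothetical reconstruction with illustrative names, not the exact godbolt snippet):

```rust
// Tuple-match form: packs both Options into a tuple before matching.
fn opt_eq_tuple(a: Option<u32>, b: Option<u32>) -> bool {
    match (a, b) {
        (None, None) => true,
        (Some(x), Some(y)) => x == y,
        _ => false,
    }
}

// Nested-match form: matches each Option in turn, no tuple involved.
fn opt_eq_nested(a: Option<u32>, b: Option<u32>) -> bool {
    match a {
        None => b.is_none(),
        Some(x) => match b {
            None => false,
            Some(y) => x == y,
        },
    }
}

fn main() {
    // The two forms agree on every combination of inputs.
    for &a in [None, Some(1u32), Some(2)].iter() {
        for &b in [None, Some(1u32), Some(2)].iter() {
            assert_eq!(opt_eq_tuple(a, b), opt_eq_nested(a, b));
        }
    }
    println!("ok");
}
```

Behaviorally identical, so any codegen difference between the two comes purely from how the match is lowered.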

bors added a commit that referenced this issue Oct 12, 2017
Optimize comparison functions of Iterator

Replaced matching on tuples, which led to less performant code generation. Testing on microbenchmarks consistently showed a ~1.35x performance improvement on my machine.

Fixes #44729.
Centril added a commit to Centril/rust that referenced this issue Apr 2, 2019

Remove duplicated code from Iterator::{ne, lt, le, gt, ge}

This PR delegates `Iterator::ne` to `Iterator::eq` and `Iterator::{lt, le, gt, ge}` to `Iterator::partial_cmp`.

Oddly enough, this change actually simplifies the generated assembly [in some cases](https://rust.godbolt.org/z/riBtNe), although I don't understand assembly well enough to tell whether the longer assembly was doing something clever.

I also added two extremely simple benchmarks:
```
// before
test iter::bench_lt               ... bench:      98,404 ns/iter (+/- 21,008)
test iter::bench_partial_cmp      ... bench:      62,437 ns/iter (+/- 5,009)

// after
test iter::bench_lt               ... bench:      61,757 ns/iter (+/- 8,770)
test iter::bench_partial_cmp      ... bench:      62,151 ns/iter (+/- 13,753)
```

I have no idea why the current `lt`/`le`/`gt`/`ge` implementations don't seem to compile optimally, but simply having them call `partial_cmp` seems to be an improvement.

See rust-lang#44729 for a previous discussion.
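At the level of free functions, the delegation described in this PR amounts to something like the following sketch (illustrative names, not the actual trait method bodies):

```rust
use std::cmp::Ordering;

// Minimal `eq` over two iterators (nested-match shape from the issue).
fn eq_iters<I: Iterator<Item = i32>, J: Iterator<Item = i32>>(mut a: I, mut b: J) -> bool {
    loop {
        match a.next() {
            None => return b.next().is_none(),
            Some(x) => match b.next() {
                None => return false,
                Some(y) => if x != y { return false },
            },
        }
    }
}

// Minimal lexicographic `partial_cmp` over two iterators.
fn partial_cmp_iters<I: Iterator<Item = i32>, J: Iterator<Item = i32>>(
    mut a: I,
    mut b: J,
) -> Option<Ordering> {
    loop {
        let x = match a.next() {
            None => {
                return Some(if b.next().is_none() { Ordering::Equal } else { Ordering::Less })
            }
            Some(v) => v,
        };
        let y = match b.next() {
            None => return Some(Ordering::Greater),
            Some(v) => v,
        };
        match x.partial_cmp(&y) {
            Some(Ordering::Equal) => {}
            other => return other,
        }
    }
}

// The delegations: no duplicated loop logic.
fn ne_iters<I: Iterator<Item = i32>, J: Iterator<Item = i32>>(a: I, b: J) -> bool {
    !eq_iters(a, b)
}

fn lt_iters<I: Iterator<Item = i32>, J: Iterator<Item = i32>>(a: I, b: J) -> bool {
    partial_cmp_iters(a, b) == Some(Ordering::Less)
}

fn main() {
    assert!(ne_iters([1, 2].iter().copied(), [1, 3].iter().copied()));
    assert!(lt_iters([1, 2].iter().copied(), [1, 2, 3].iter().copied()));
    println!("ok");
}
```

The point of the PR is exactly this shape: one comparison loop each for `eq` and `partial_cmp`, with the other methods defined in terms of them.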