
proposal: x/exp/xiter: new package with iterator adapters #61898

rsc opened this issue Aug 9, 2023 · 251 comments

@rsc
Contributor

rsc commented Aug 9, 2023

We propose to add a new package golang.org/x/exp/xiter that defines adapters on iterators. Perhaps these would one day be moved to the iter package or perhaps not. There are concerns about how these would affect idiomatic Go code. It seems worth defining them in x/exp to help that discussion along, and then we can decide whether they move anywhere else when we have more experience with them.

The package is called xiter to avoid a collision with the standard library iter (see proposal #61897). An alternative would be to have xiter define wrappers and type aliases for all the functions and types in the standard iter package, but the type aliases would depend on #46477, which is not yet implemented.

This is one of a collection of proposals updating the standard library for the new 'range over function' feature (#61405). It would only be accepted if that proposal is accepted. See #61897 for a list of related proposals.

Edit, 2024-05-15: Added some missing 2s in function names, and also changed Reduce to take the function first, instead of between sum and seq.

Edit, 2024-07-17: Updated code to match the final Go 1.23 language change. Corrected various typos.


/*
Package xiter implements basic adapters for composing iterator sequences:

  • [Concat] and [Concat2] concatenate sequences.
  • [Equal], [Equal2], [EqualFunc], and [EqualFunc2] check whether two sequences contain equal values.
  • [Filter] and [Filter2] filter a sequence according to a function f.
  • [Limit] and [Limit2] truncate a sequence after n items.
  • [Map] and [Map2] apply a function f to a sequence.
  • [Merge], [Merge2], [MergeFunc], and [MergeFunc2] merge two ordered sequences.
  • [Reduce] and [Reduce2] combine the values in a sequence.
  • [Zip] and [Zip2] iterate over two sequences in parallel.

*/
package xiter

import (
	"cmp"
	"iter"
)

// Concat returns an iterator over the concatenation of the sequences.
func Concat[V any](seqs ...iter.Seq[V]) iter.Seq[V] {
	return func(yield func(V) bool) {
		for _, seq := range seqs {
			for e := range seq {
				if !yield(e) {
					return
				}
			}
		}
	}
}

// Concat2 returns an iterator over the concatenation of the sequences.
func Concat2[K, V any](seqs ...iter.Seq2[K, V]) iter.Seq2[K, V] {
	return func(yield func(K, V) bool) {
		for _, seq := range seqs {
			for k, v := range seq {
				if !yield(k, v) {
					return
				}
			}
		}
	}
}

// Equal reports whether the two sequences are equal.
func Equal[V comparable](x, y iter.Seq[V]) bool {
	for z := range Zip(x, y) {
		if z.Ok1 != z.Ok2 || z.V1 != z.V2 {
			return false
		}
	}
	return true
}

// Equal2 reports whether the two sequences are equal.
func Equal2[K, V comparable](x, y iter.Seq2[K, V]) bool {
	for z := range Zip2(x, y) {
		if z.Ok1 != z.Ok2 || z.K1 != z.K2 || z.V1 != z.V2 {
			return false
		}
	}
	return true
}

// EqualFunc reports whether the two sequences are equal according to the function f.
func EqualFunc[V1, V2 any](x iter.Seq[V1], y iter.Seq[V2], f func(V1, V2) bool) bool {
	for z := range Zip(x, y) {
		if z.Ok1 != z.Ok2 || !f(z.V1, z.V2) {
			return false
		}
	}
	return true
}

// EqualFunc2 reports whether the two sequences are equal according to the function f.
func EqualFunc2[K1, V1, K2, V2 any](x iter.Seq2[K1, V1], y iter.Seq2[K2, V2], f func(K1, V1, K2, V2) bool) bool {
	for z := range Zip2(x, y) {
		if z.Ok1 != z.Ok2 || !f(z.K1, z.V1, z.K2, z.V2) {
			return false
		}
	}
	return true
}

// Filter returns an iterator over seq that only includes
// the values v for which f(v) is true.
func Filter[V any](f func(V) bool, seq iter.Seq[V]) iter.Seq[V] {
	return func(yield func(V) bool) {
		for v := range seq {
			if f(v) && !yield(v) {
				return
			}
		}
	}
}

// Filter2 returns an iterator over seq that only includes
// the pairs k, v for which f(k, v) is true.
func Filter2[K, V any](f func(K, V) bool, seq iter.Seq2[K, V]) iter.Seq2[K, V] {
	return func(yield func(K, V) bool) {
		for k, v := range seq {
			if f(k, v) && !yield(k, v) {
				return
			}
		}
	}
}

// Limit returns an iterator over seq that stops after n values.
func Limit[V any](seq iter.Seq[V], n int) iter.Seq[V] {
	return func(yield func(V) bool) {
		if n <= 0 {
			return
		}
		for v := range seq {
			if !yield(v) {
				return
			}
			if n--; n <= 0 {
				break
			}
		}
	}
}

// Limit2 returns an iterator over seq that stops after n key-value pairs.
func Limit2[K, V any](seq iter.Seq2[K, V], n int) iter.Seq2[K, V] {
	return func(yield func(K, V) bool) {
		if n <= 0 {
			return
		}
		for k, v := range seq {
			if !yield(k, v) {
				return
			}
			if n--; n <= 0 {
				break
			}
		}
	}
}

// Map returns an iterator over f applied to seq.
func Map[In, Out any](f func(In) Out, seq iter.Seq[In]) iter.Seq[Out] {
	return func(yield func(Out) bool) {
		for in := range seq {
			if !yield(f(in)) {
				return
			}
		}
	}
}

// Map2 returns an iterator over f applied to seq.
func Map2[KIn, VIn, KOut, VOut any](f func(KIn, VIn) (KOut, VOut), seq iter.Seq2[KIn, VIn]) iter.Seq2[KOut, VOut] {
	return func(yield func(KOut, VOut) bool) {
		for k, v := range seq {
			if !yield(f(k, v)) {
				return
			}
		}
	}
}

// Merge merges two sequences of ordered values.
// Values appear in the output once for each time they appear in x
// and once for each time they appear in y.
// If the two input sequences are not ordered,
// the output sequence will not be ordered,
// but it will still contain every value from x and y exactly once.
//
// Merge is equivalent to calling MergeFunc with cmp.Compare[V]
// as the ordering function.
func Merge[V cmp.Ordered](x, y iter.Seq[V]) iter.Seq[V] {
	return MergeFunc(x, y, cmp.Compare[V])
}

// MergeFunc merges two sequences of values ordered by the function f.
// Values appear in the output once for each time they appear in x
// and once for each time they appear in y.
// When equal values appear in both sequences,
// the output contains the values from x before the values from y.
// If the two input sequences are not ordered by f,
// the output sequence will not be ordered by f,
// but it will still contain every value from x and y exactly once.
func MergeFunc[V any](x, y iter.Seq[V], f func(V, V) int) iter.Seq[V] {
	return func(yield func(V) bool) {
		next, stop := iter.Pull(y)
		defer stop()
		v2, ok2 := next()
		for v1 := range x {
			for ok2 && f(v1, v2) > 0 {
				if !yield(v2) {
					return
				}
				v2, ok2 = next()
			}
			if !yield(v1) {
				return
			}
		}
		for ok2 {
			if !yield(v2) {
				return
			}
			v2, ok2 = next()
		}
	}
}

// Merge2 merges two sequences of key-value pairs ordered by their keys.
// Pairs appear in the output once for each time they appear in x
// and once for each time they appear in y.
// If the two input sequences are not ordered by their keys,
// the output sequence will not be ordered by its keys,
// but it will still contain every pair from x and y exactly once.
//
// Merge2 is equivalent to calling MergeFunc2 with cmp.Compare[K]
// as the ordering function.
func Merge2[K cmp.Ordered, V any](x, y iter.Seq2[K, V]) iter.Seq2[K, V] {
	return MergeFunc2(x, y, cmp.Compare[K])
}

// MergeFunc2 merges two sequences of key-value pairs ordered by the function f.
// Pairs appear in the output once for each time they appear in x
// and once for each time they appear in y.
// When pairs with equal keys appear in both sequences,
// the output contains the pairs from x before the pairs from y.
// If the two input sequences are not ordered by f,
// the output sequence will not be ordered by f,
// but it will still contain every pair from x and y exactly once.
func MergeFunc2[K, V any](x, y iter.Seq2[K, V], f func(K, K) int) iter.Seq2[K, V] {
	return func(yield func(K, V) bool) {
		next, stop := iter.Pull2(y)
		defer stop()
		k2, v2, ok2 := next()
		for k1, v1 := range x {
			for ok2 && f(k1, k2) > 0 {
				if !yield(k2, v2) {
					return
				}
				k2, v2, ok2 = next()
			}
			if !yield(k1, v1) {
				return
			}
		}
		for ok2 {
			if !yield(k2, v2) {
				return
			}
			k2, v2, ok2 = next()
		}
	}
}

// Reduce combines the values in seq using f.
// For each value v in seq, it updates sum = f(sum, v)
// and then returns the final sum.
// For example, if iterating over seq yields v1, v2, v3,
// Reduce returns f(f(f(sum, v1), v2), v3).
func Reduce[Sum, V any](f func(Sum, V) Sum, sum Sum, seq iter.Seq[V]) Sum {
	for v := range seq {
		sum = f(sum, v)
	}
	return sum
}

// Reduce2 combines the values in seq using f.
// For each pair k, v in seq, it updates sum = f(sum, k, v)
// and then returns the final sum.
// For example, if iterating over seq yields (k1, v1), (k2, v2), (k3, v3)
// Reduce returns f(f(f(sum, k1, v1), k2, v2), k3, v3).
func Reduce2[Sum, K, V any](f func(Sum, K, V) Sum, sum Sum, seq iter.Seq2[K, V]) Sum {
	for k, v := range seq {
		sum = f(sum, k, v)
	}
	return sum
}

// A Zipped is a pair of zipped values, one of which may be missing,
// drawn from two different sequences.
type Zipped[V1, V2 any] struct {
	V1  V1
	Ok1 bool // whether V1 is present (if not, it will be zero)
	V2  V2
	Ok2 bool // whether V2 is present (if not, it will be zero)
}

// Zip returns an iterator that iterates x and y in parallel,
// yielding Zipped values of successive elements of x and y.
// If one sequence ends before the other, the iteration continues
// with Zipped values in which either Ok1 or Ok2 is false,
// depending on which sequence ended first.
//
// Zip is a useful building block for adapters that process
// pairs of sequences. For example, Equal can be defined as:
//
//	func Equal[V comparable](x, y iter.Seq[V]) bool {
//		for z := range Zip(x, y) {
//			if z.Ok1 != z.Ok2 || z.V1 != z.V2 {
//				return false
//			}
//		}
//		return true
//	}
func Zip[V1, V2 any](x iter.Seq[V1], y iter.Seq[V2]) iter.Seq[Zipped[V1, V2]] {
	return func(yield func(z Zipped[V1, V2]) bool) {
		next, stop := iter.Pull(y)
		defer stop()
		v2, ok2 := next()
		for v1 := range x {
			if !yield(Zipped[V1, V2]{v1, true, v2, ok2}) {
				return
			}
			v2, ok2 = next()
		}
		var zv1 V1
		for ok2 {
			if !yield(Zipped[V1, V2]{zv1, false, v2, ok2}) {
				return
			}
			v2, ok2 = next()
		}
	}
}

// A Zipped2 is a pair of zipped key-value pairs,
// one of which may be missing, drawn from two different sequences.
type Zipped2[K1, V1, K2, V2 any] struct {
	K1  K1
	V1  V1
	Ok1 bool // whether K1, V1 are present (if not, they will be zero)
	K2  K2
	V2  V2
	Ok2 bool // whether K2, V2 are present (if not, they will be zero)
}

// Zip2 returns an iterator that iterates x and y in parallel,
// yielding Zipped2 values of successive elements of x and y.
// If one sequence ends before the other, the iteration continues
// with Zipped2 values in which either Ok1 or Ok2 is false,
// depending on which sequence ended first.
//
// Zip2 is a useful building block for adapters that process
// pairs of sequences. For example, Equal2 can be defined as:
//
//	func Equal2[K, V comparable](x, y iter.Seq2[K, V]) bool {
//		for z := range Zip2(x, y) {
//			if z.Ok1 != z.Ok2 || z.K1 != z.K2 || z.V1 != z.V2 {
//				return false
//			}
//		}
//		return true
//	}
func Zip2[K1, V1, K2, V2 any](x iter.Seq2[K1, V1], y iter.Seq2[K2, V2]) iter.Seq[Zipped2[K1, V1, K2, V2]] {
	return func(yield func(z Zipped2[K1, V1, K2, V2]) bool) {
		next, stop := iter.Pull2(y)
		defer stop()
		k2, v2, ok2 := next()
		for k1, v1 := range x {
			if !yield(Zipped2[K1, V1, K2, V2]{k1, v1, true, k2, v2, ok2}) {
				return
			}
			k2, v2, ok2 = next()
		}
		var zk1 K1
		var zv1 V1
		for ok2 {
			if !yield(Zipped2[K1, V1, K2, V2]{zk1, zv1, false, k2, v2, ok2}) {
				return
			}
			k2, v2, ok2 = next()
		}
	}
}
@rsc rsc added the Proposal label Aug 9, 2023
@gopherbot gopherbot added this to the Proposal milestone Aug 9, 2023
@ianlancetaylor ianlancetaylor moved this to Incoming in Proposals Aug 9, 2023
@gophun

gophun commented Aug 9, 2023

The duplication of each function is the first thing that catches the eye. Are there thoughts on why this is acceptable?

@gophun

gophun commented Aug 9, 2023

What about an adapter that converts an iter.Seq[V] to an iter.Seq2[int, V] and an adapter that converts an iter.Seq2[K, V] to an iter.Seq[V]?

@zephyrtronium
Contributor

Some typos: EqualFunc2, Map2, Merge2, and MergeFunc2 lack the 2 suffixes on their actual names. They're all correct in the corresponding documentation.

@earthboundkid
Contributor

May I humbly suggest that the name "iterutils" is less susceptible to, uh, unfortunate mispronunciation.

@earthboundkid
Contributor

For Reduce, the callback should go last: func Reduce[Sum, V any](sum Sum, seq Seq[V], f func(Sum, V) Sum) Sum.

@DeedleFake

DeedleFake commented Aug 9, 2023

For Reduce, the callback should go last: func Reduce[Sum, V any](sum Sum, seq Seq[V], f func(Sum, V) Sum) Sum.

I'd actually prefer func Reduce[Sum, V any](seq Seq[V], sum Sum, f func(Sum, V) Sum) Sum.

Edit: I just realized that if Reduce() is being used to build an array, putting sum first puts everything in the same order as Append() and other functions that put the destination first. I'm not sure if that's worth it or not.

@rsc rsc moved this from Incoming to Active in Proposals Aug 9, 2023
@rsc
Contributor Author

rsc commented Aug 9, 2023

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@DeedleFake

The more I think about it, the more that I think that API design for this should wait until after a decision is made on #49085. Multiple other languages have proven over and over that a left-to-right chained syntax is vastly superior ergonomically to simple top-level functions for iterators. For example, compare

nonNegative := xiter.Filter(
  xiter.Map(
    bufio.Lines(r),
    parseLine,
  ),
  func(v int) bool { return v >= 0 },
)

vs.

nonNegative := bufio.Lines(r).
  Map(parseLine).
  Filter(func(v int) bool { return v >= 0 })

Go's a little weird because of the need to put the . on the previous line, but other than that one oddity, which I could get used to, the second is better in every way. It reads in the order that actions actually happen, it's less repetitive, etc. The only real way to emulate it currently is something like

lines := bufio.Lines(r)
intlines := xiter.Map(lines, parseLine)
nonNegative := xiter.Filter(intlines, func(v int) bool { return v >= 0 })

That works, but it clutters up the local namespace and it's significantly harder to edit. For example, if you decide you need to add a new step in the chain, you have to make sure that all of the variables for each iterator match up in the previous and succeeding calls.

@ianlancetaylor
Contributor

What type does bufio.Lines return to make that work in Go? What methods does that type support? What is the type of nonNegative? I mean these as honest questions. Can we write this kind of code in Go today, or would we need new language features?

@hherman1

You would probably have to wrap the base iterator like:

stream.New(bufio.Lines).
    Filter(…).
    …

@DeedleFake

DeedleFake commented Aug 10, 2023

@ianlancetaylor

Sorry. I should have stuck a comment in. I was just coming up with some hypothetical function that would give an iter.Seq[string]. In this case, the idea was that it would internally use a bufio.Scanner to yield lines from an io.Reader or something. My original code had an anonymous func(string) int instead of the vague parseLine but I removed it because it was clogging up the example with irrelevant code and I didn't clarify when I did.

@hherman1

Not necessarily. The transformative and sink functions on iterators could just be defined as methods on iter.Seq.

@hherman1

hherman1 commented Aug 10, 2023

But iter.Seq is an interface type no? Are you saying it should be a struct type?

I was wrong, it’s not an interface.

@benhoyt
Contributor

benhoyt commented Aug 10, 2023

Why do some functions take the f func as the last parameter, but Filter and Map take it as the first, and Reduce in the middle? Most other functions in the stdlib take funcs as the last parameter, such as sort.Slice, slices.*Func, ServeMux.HandleFunc, and so on. This makes code that uses them with inline function literals more readable:

names := xiter.Map(func(p Person) string {
	return p.Name
}, people) // "people" gets lost

// vs

names := xiter.Map(people, func(p Person) string {
	return p.Name
})

@Merovius
Contributor

Merovius commented Aug 10, 2023

@DeedleFake There won't be a "decision" on #49085 anytime soon. There are good reasons not to do it yet, but we also don't want to say it never happens. The issue exists to reflect that state. What it comes down to is, would you rather have no iterators (for the foreseeable future) or ones which can't be "chained"?

@DeedleFake

DeedleFake commented Aug 10, 2023

What it comes down to is, would you rather have no iterators (for the foreseeable future) or ones which can't be "chained"?

No iterators, definitely. I've done fine without them for over a decade. I can wait a bit longer. If a bad implementation goes in, I'll never get a good version. Plus, I can just write my own implementation of whatever iterator functions I need as long as range-over-func exists while I wait.

@gophun

gophun commented Aug 10, 2023

Neither chaining nor functional programming has ever been a decisive or recommended technique in Go. Instead, iteration—specifically, procedural 'for' loops—has always been a core technique since the language's inception. The iterator proposals aim to enhance this core approach. While I don't know what the overall plans are, if you're hoping for Go to follow the path of Java Streams or C# LINQ, you might be in for disappointment.

@Merovius
Contributor

I can wait a bit longer. If a bad implementation goes in, I'll never get a good version.

I think "a bit" is misleading. We are talking years - if at all. And I don't believe the second part of that sentence is true either, we could always release a v2 of the relevant packages, if we ever manage to do #49085 in a decade or so.

@DeedleFake

DeedleFake commented Aug 10, 2023

While I don't know what the overall plans are, if you're hoping for Go to follow the path of Java Streams or C# LINQ, you might be in for disappointment.

Is that not the intention of these proposals? To build a standardized iterator system that works similarly to those? Why else is there a proposal here for Map(), Filter(), and Reduce(), among others? I have no problem with functions like slices.Backwards() and other source function proposals. My only problem is the transformative and sink functions.

I think "a bit" is misleading. We are talking years - if at all. And I don't believe the second part of that sentence is true either, we could always release a v2 of the relevant packages, if we ever manage to do #49085 in a decade or so.

Edit: The way this proposal is phrased does actually imply that they may be heavily reevaluated enough in x/exp that they may not go into the standard library at all, so maybe my point is moot after all. I still think that this is a valid issue with the API design to bring up, but maybe it's a bit off-topic for this particular proposal and should wait until after they're in x/exp and it can be more easily demonstrated how awkward they are to use. I don't like the idea that existing code will be broken when some variant of them does potentially go into the standard library, but it's less of a problem than I was worried about. Never mind. Please ignore my rambling below.

That issue has only been open for 2 years. I think assuming that it'll take a decade to solve is a bit unfair. Yes, a v2 is an option, especially if #61716 is accepted, but that was created out of necessity to deal with problems in an existing package, while this would essentially be purposefully putting problems into a new package. It's not like I'm saying that iterators are unacceptable to me in this state, just that features have been delayed or cut before because of possible changes coming later and that I think that it's prudent to discuss the possibility here. That just happened in the last few weeks in the maps package because of the possibility of the acceptance of #61405. I think the same should be done with the transformative and sink functions for now, or at the very least those functions should be planned to stay in x/exp until some way to clean up the API is decided on, that's all.

One of my favorite things about Go is how slow and methodical it (usually) is in introducing new features. I think that the fact that it took over a decade to add generics is a good thing, and I really wanted generics. One of the purposes of that approach is to try avoid having to fix it later. Adding those functions in the proposed manner will almost definitely necessitate that later fix, and I very much would like to avoid that if at all possible.

@gophun

gophun commented Aug 10, 2023

Is that not the intention of these proposals? To build a standardized iterator system that works similarly to those?

Java Streams and .NET LINQ build on a standardized iterator system, but they are more than that. Both languages had a generic iterator system before. Iterators are useful without chaining or functional programming.

Why else is there a proposal here for Map(), Filter(), and Reduce(), among others?

That would be this very proposal, and it comes with a caveat: "... or perhaps not. There are concerns about how these would affect idiomatic Go code. "

This means that not everyone who has read these proposals in advance believes that this part is a good idea.

@jba
Contributor

jba commented Aug 10, 2023

While I don't know what the overall plans are, if you're hoping for Go to follow the path of Java Streams or C# LINQ, you might be in for disappointment.

Is that not the intention of these proposals? To build a standardized iterator system that works similarly to those? Why else is there a proposal here for Map(), Filter(), and Reduce(), among others?

Maybe chaining leads to too much of a good thing. It becomes more tempting to write long, hard-to-read chains of functions. You're less likely to do that if you have to nest calls.

As an analogy, Go has if. Isn't the intention of if to allow conditional execution? Why then shouldn't Go have the ternary operator ?:? Because it often leads to hard-to-read code.

@rsc
Contributor Author

rsc commented Aug 10, 2023

Re #49085, generic methods either require (A) dynamic code generation or (B) terrible speed or (C) hiding those methods from dynamic interface checks or (D) not doing them at all. We have chosen option (D). The issue remains open like so many suggestions people have made, but I don't see a realistic path forward where we choose A, B, or C, nor do I see a fifth option. So it makes sense to assume generic methods are not going to happen and do our work accordingly.

@Merovius
Contributor

Merovius commented Aug 10, 2023

@DeedleFake The issue is not lack of understanding what a lack of parameterized methods means. It's just that, as @rsc said, wanting them doesn't make them feasible. The issue only being 2 years old is deceptive. The underlying problem is actually as old as Go and one of the main reasons we didn't have generics for most of that. Which you should consider, when you say

I think that the fact that it took over a decade to add generics is a good thing, and I really wanted generics.

We got generics by committing to keep implementation strategies open, thus avoiding the generics dilemma. Not having parametric methods is a pretty direct consequence of that decision.

@DeedleFake

Well, I tried. If that's the decision then that's the decision. I'm disappointed, but I guess I'll just be satisfied with what I do like about the current proposal, even if it has, in my opinion, some fairly major problems. Sorry for dragging this a bit off-topic there.

@thediveo
Contributor

Hope that it's not noise: I wondered whether naming the parameter sum might imply to less experienced devs that reduce only does summation, so I looked at JavaScript's array reduce: it uses accumulator. I don't know if that is much better; I just wanted to point it out. If anything, let's have a good laugh.

@earthboundkid
Contributor

earthboundkid commented Dec 20, 2024

Python does the equivalent of iter.Seq[iter.Seq[T]] and I have found it extremely inconvenient.

You could also do cooldown with

for i, item := range enumerate(stream) {
   if i != 0 && i%chunkSize == 0 {
      time.Sleep(cooldown)
   }
   doSomething(item)
}

That seems clearer to me.

@jub0bs
Contributor

jub0bs commented Jan 21, 2025

I don't want to derail the discussion, but I want to report some benchmark results for my SortedFromMap function. @earthboundkid's suggestion to rely on a binary heap was sound! My implementation of her idea seems to outperform the naive approach (consisting in sorting all the map's keys upfront) in most cases:


Edit: Unfortunately, I had a bug in my reference "upfront_sort" implementation. I've fixed it and re-run my benchmarks. The new results indicate that the binary-heap implementation is only advantageous as the total number of pairs grows and
when the number of consumed pairs is somewhat smaller than the total number of pairs.

goos: darwin
goarch: amd64
pkg: github.com/jub0bs/iterutil
cpu: Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz
                                               │ upfront_sort │             binary_heap              │
                                               │    sec/op    │    sec/op     vs base                │
SortedFromMap/pairs=16/consumed=at_most_16-8      583.0n ± 1%    726.1n ± 1%  +24.55% (p=0.000 n=10)
SortedFromMap/pairs=16/consumed=half-8            499.2n ± 0%    559.3n ± 1%  +12.04% (p=0.000 n=10)
SortedFromMap/pairs=16/consumed=all-8             583.1n ± 1%    723.0n ± 1%  +24.00% (p=0.000 n=10)
SortedFromMap/pairs=64/consumed=at_most_16-8      3.067µ ± 1%    2.460µ ± 1%  -19.80% (p=0.000 n=10)
SortedFromMap/pairs=64/consumed=half-8            3.328µ ± 1%    3.540µ ± 1%   +6.37% (p=0.000 n=10)
SortedFromMap/pairs=64/consumed=all-8             3.664µ ± 2%    5.095µ ± 1%  +39.06% (p=0.000 n=10)
SortedFromMap/pairs=256/consumed=at_most_16-8    14.663µ ± 1%    7.565µ ± 0%  -48.41% (p=0.000 n=10)
SortedFromMap/pairs=256/consumed=half-8           16.02µ ± 1%    15.35µ ± 1%   -4.16% (p=0.000 n=10)
SortedFromMap/pairs=256/consumed=all-8            18.18µ ± 1%    23.69µ ± 2%  +30.32% (p=0.000 n=10)
SortedFromMap/pairs=1024/consumed=at_most_16-8    66.41µ ± 1%    26.95µ ± 0%  -59.42% (p=0.000 n=10)
SortedFromMap/pairs=1024/consumed=half-8          79.15µ ± 2%    70.85µ ± 3%  -10.50% (p=0.000 n=10)
SortedFromMap/pairs=1024/consumed=all-8           90.21µ ± 1%   109.31µ ± 1%  +21.18% (p=0.000 n=10)
SortedFromMap/pairs=4096/consumed=at_most_16-8    300.5µ ± 2%    100.6µ ± 1%  -66.51% (p=0.000 n=10)
SortedFromMap/pairs=4096/consumed=half-8          348.0µ ± 2%    303.1µ ± 1%  -12.90% (p=0.000 n=10)
SortedFromMap/pairs=4096/consumed=all-8           398.1µ ± 1%    494.7µ ± 2%  +24.26% (p=0.000 n=10)
geomean                                           15.20µ         13.70µ        -9.85%

(no differences in terms of allocations)

I still believe that a binary heap is valuable. If x/exp/xiter ever sees the light of day and provides something like SortedFromMap, relying on a binary heap indeed seems like a good idea.

@morlay

morlay commented Feb 12, 2025

A small idea to make xiter adapters composable: add pipe helpers.

package xiter

type Operator[I any, O any] func(in I) O

// pipe helpers
// (add more PipeN variants to support more operators)
func Pipe[I, O any](
	in I,
	op1 Operator[I, O],
) O {
	return op1(in)
}
func Pipe2[I, A, O any](
	in I,
	op1 Operator[I, A],
	op2 Operator[A, O],
) O {
	return op2(op1(in))
}
func Pipe3[I, A, B, O any](
	in I,
	op1 Operator[I, A],
	op2 Operator[A, B],
	op3 Operator[B, O],
) O {
	return op3(op2(op1(in)))
}
func Pipe4[I, A, B, C, O any](
	in I,
	op1 Operator[I, A],
	op2 Operator[A, B],
	op3 Operator[B, C],
	op4 Operator[C, O],
) O {
	return op4(op3(op2(op1(in))))
}

// Do adapts a two-argument function into an Operator by binding its second argument.
func Do[I any, Arg1 any, O any](
	fn func(src I, arg1 Arg1) O,
	arg1 Arg1,
) Operator[I, O] {
	return func(src I) O {
		return fn(src, arg1)
	}
}

func Do2[I any, Arg1 any, Arg2 any, O any](
	fn func(src I, arg1 Arg1, arg2 Arg2) O,
	arg1 Arg1, arg2 Arg2,
) Operator[I, O] {
	return func(src I) O {
		return fn(src, arg1, arg2)
	}
}

Then we could write:

package main

func main() {
	// Must declare seq as iter.Seq[int] so that Pipe* can infer the correct types
	var seq iter.Seq[int] = func(yield func(i int) bool) {
		for i := range 10 {
			if !yield(i) {
				return
			}
		}
	}

        // process and collect
	_ = xiter.Pipe3(
		seq,
		xiter.Do(xiter.Filter, func(x int) bool { return x%2 == 0 }),
		xiter.Do(xiter.Map, func(x int) string { return fmt.Sprintf("%d", x*x) }),
		slices.Collect,
	)

        // process each sub-chunk,
        // then collect
	_ = xiter.Pipe3(
		seq,
		xiter.Do(xiter.Chunk[int], 3),
		xiter.Do(xiter.Map, func(values []int) int {
			return xiter.Pipe(
				slices.Values(values),
				xiter.Do2(xiter.Reduce, 0, func(ret int, v int) int { return ret + v }),
			)
		}),
		slices.Collect,
	)
}

@Jackuczys

I was starting to lose track so I went through and tried to find all the official positions and related proposals so far. Let me know if I got anything wrong.
[...]
Separate proposals: (please keep discussion of the below in their respective threads)

@jimmyfrasche would it be possible to link the proposals you listed in that comment in the issue description as well (or at least add a link to your comment, buried behind "Load more")? This is an enormous thread, so most people probably won't see your quite helpful summary. The remarks about potentially dropping/renaming Zip and adding Map12/Map21 also seem useful to mention. This issue is quite discoverable through Google searches, more so than the other proposals :)


I'm not sure if there's a point in piling things up in this proposal by suggesting yet another function but it does support the case for addition of "classical" Zip (rather than just ZipLongest) and Zip12 so I figure what I'll say below is at least partially relevant to this issue.
One thing that I think could be useful in terms of "classical" Zip (not as much the one from issue description that I'd call ZipLongest), i.e. this definition of it:

func Zip[V0, V1 any](iter.Seq[V0], iter.Seq[V1]) iter.Seq2[V0, V1]

would be something that allows creating an iterator that repeats a given item (or items, as slices.Repeat does for slices; Python has something similar in itertools.cycle()) a defined number of times, or even an infinite number of times (since Zip would just stop once the shorter iterator stops yielding items):

// count == -1 could mean an infinite iterator
func Repeat[T any](x T, count int) iter.Seq[T]

This would allow something like:

list1 := []string{"apple", "banana", "orange"}
list2 := []string{"grapefruit", "banana", "pineapple"}
// merge into a deduplicated set
set := map[string]struct{}{}
// potentially could split this into xiter.Repeat and xiter.RepeatN like in `strings` package
valueIter := xiter.Repeat(struct{}{}, -1)

maps.Insert(set, xiter.Zip(slices.Values(list1), valueIter))
maps.Insert(set, xiter.Zip(slices.Values(list2), valueIter))

This specific case is also solved by a less universal Lift proposed earlier in this issue:

func Lift[V any](seq Seq[V]) Seq2[V, struct{}]

but it's a lot more restrictive compared to Repeat which can achieve the same while still allowing the user to not define their own anonymous function. You can also note that, when defining an anonymous function, all of this can technically be achieved with Map12 without using Zip at all:

func Map12[V1, K2, V2 any](f func(V1) (K2, V2), seq Seq[V1]) Seq2[K2, V2]

in the following way:

valueFunc := func(s string) (string, struct{}) { return s, struct{}{} }
maps.Insert(set, xiter.Map12(valueFunc, slices.Values(list1)))
maps.Insert(set, xiter.Map12(valueFunc, slices.Values(list2)))

though it starts to get tricky when trying to replicate itertools.cycle rather than just a single item repeat.

@dzherb

dzherb commented Apr 2, 2025

By the way, we can easily chain iterators like this:

type Stream[V any] iter.Seq[V]

func FromSeq[V any](seq iter.Seq[V]) Stream[V] {
	return Stream[V](seq)
}

func (s Stream[V]) Iterator() iter.Seq[V] {
	return iter.Seq[V](s)
}

func (s Stream[V]) Filter(predicate func(V) bool) Stream[V] {
	return FromSeq(xiter.Filter(predicate, s.Iterator()))
}

func (s Stream[V]) Map(mapper func(V) V) Stream[V] {
	return FromSeq(xiter.Map(mapper, s.Iterator()))
}

st := stream.FromSeq(seq).
	Filter(func(i int) bool { return i != 2 }).
	Map(func(i int) int { return i * 2 })

for s := range st {
	fmt.Println(s)
}

I created a draft of a package to explore this more, if you are interested :)

@Merovius
Contributor

Merovius commented Apr 3, 2025

@dzherb Note that Map is usually expected to be able to have different input and output type arguments. That's not possible with a method. Hence the preference for functions, for consistency.

@rsc
Contributor Author

rsc commented May 13, 2025

I think it is probably time to wind down this proposal. Well past time, actually. It was a good discussion, but I don't think we have consensus on actually adding any of these to the standard library, which would encourage what in many cases is overabstraction (even though in some cases it is not).

Most importantly, if we're not going to do xiter, we should stop talking about it and create more space for third-party packages, instead of standing in their way. We licked the cookie, but the cookie is now quite stale. Better to toss it and let other people bake some new ones. Or something like that.

@leb-kuchen

Without a heterogeneous collection xiter is going nowhere. I am not sure about third-party packages either, because it does not seem worth it to import one for an iterator adapter.

and encourage what in many cases is overabstraction (even though in some cases it is not).

Isn't this already solved by making iterator adapters verbose to write? Coroutines and iterators enable reusable algorithms; both are better options than repeating for loops.

@DeedleFake

DeedleFake commented May 13, 2025

@rsc, the problem with this proposal is not the idea itself. Most people in here seem to like the idea of having iterator adapters in the standard library. The problem is that without generic methods and/or a pipe operator and shorter, easier-to-write anonymous functions, they are ergonomically unsound. I have an iterator adapter library of my own and while I have used it, I sometimes don't purely because the syntax is so awkward to write, not because it wouldn't simplify logic flow. These problems make evaluation of this proposal skewed because the biggest issues are with syntax and ergonomics, not utility or functionality.

@Merovius
Contributor

@DeedleFake We have to evaluate library proposals in the context of the language that we have. That's not skewed. If the language ever changes, it seems easy enough to revisit this. But given that those changes are not really on the horizon, it makes no sense to add things to the standard library that aren't useful without them.

@earthboundkid
Contributor

I am basically in favor of withdrawing this proposal because in practice while I have used iterators for real code, I haven't needed a swiss army set of iterator utilities. Instead I generally just write the ones I need at the time. That said, it does feel a little incomplete not to have map/filter/reduce in the standard library.

@adonovan
Member

adonovan commented May 13, 2025

I am basically in favor of withdrawing this proposal because in practice while I have used iterators for real code, I haven't needed a swiss army set of iterator utilities. Instead I generally just write the ones I need at the time.

That's been my experience too. Perhaps at some point we can scan the corpus for the most commonly implemented helpers and consider adding them to iter based on the data.

@leb-kuchen

leb-kuchen commented May 14, 2025

the problem with this proposal is not the idea itself. Most people in here seem to like the idea of having iterator adapters in the standard library. The problem is that without generic methods and/or a pipe operator and shorter, easier-to-write anonymous functions, they are ergonomically unsound. I have an iterator adapter library of my own and while I have used it, I sometimes don't purely because the syntax is so awkward to write, not because it wouldn't simplify logic flow. These problems make evaluation of this proposal skewed because the biggest issues are with syntax and ergonomics, not utility or functionality.

You're asking for type inference and the omission of the return keyword in closures, both syntactic sugar, since neither adds new language features.
The verbosity of closures would be tied to the frequency of iterator adapters. This makes future language design harder, since Go may one day decide to become less verbose without sacrificing readability.

I consider it bad practice to chain more than 2 iterator adapters, especially those that take functions as parameters.
There are multiple problems with this approach:

  • Optimizing iterator adapters requires aggressive inlining, but Go does not have a sophisticated inliner.
  • Inlining is performed before optimizations are applied.
  • The Go compiler generally performs fewer optimizations in order to speed up compile times.
  • Closures and generics are not statically dispatched, so there is even less opportunity for optimization.
  • If applied in the standard library, this would lead to a consistent performance penalty.
  • Finally, it makes the code harder to read and understand, both for humans and for the Go compiler.

Heavy use of iterator adapters is something LLVM-based languages can afford, but unfortunately Go cannot without sacrificing performance.

That's been my experience too. Perhaps at some point we can scan the corpus for the most commonly implemented helpers and consider adding them to iter based on the data.

I don't think this requires research; either iterator adapters are added or they are not.

@dfinkel
Contributor

dfinkel commented May 14, 2025

I think that even if we don't include the rest, Concat and Concat2 are useful enough (and needed commonly enough) that they should be considered for inclusion in either xiter or iter proper. (I've written both of those multiple times because they avoid a bunch of sequential loops.)

FWIW, I'd be fine with a "this is less performant than an equivalent loop" warning on any iterator adapter.

@apparentlymart

I think it's probably true that I've not been using iter.Seq and iter.Seq2 with as much gusto as others in this discussion and so my experience might not be representative, but as I review the accumulated uses of both in some of the codebases I maintain I think the most notable gaps for me are represented by the separate proposal #68947.

In those cases, I have a function that normally returns a more interesting sequence but has some situations where it returns a fixed result of either an empty sequence or a sequence containing just one placeholder value, primarily just to satisfy the function signature in a way that avoids the caller needing to make any special accommodation, similar to returning nil slices or maps so that a caller will just naturally take no action on the result.

The only example from this proposal that I've seen arise more than once is "filter", but even that is marginal since we've typically been able to do the filtering either at the source or the destination, rather than as an intermediate step. Our main focus has been to avoid materializing temporary slices just to conveniently iterate over them, and not to significantly change approach otherwise.

Of course I can't and don't intend to speak for anyone else's experience here, but I'd generally agree that it's not clear yet just how extensive the standard library needs to be in this area, or whether future language changes would want these to be designed in a different way, so I might suggest that this larger proposal be closed and instead focus on smaller proposals like #68947 that each solve a specific set of well-understood problems. (While hopefully keeping in mind all of the general API design ideas already worked through above so that the individual proposals can all have consistent design elements.)
