Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ignore: fix confusing lifetimes in ParallelVisitor{,Builder} with an associated type #3012

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

cosmicexplorer
Copy link

@cosmicexplorer cosmicexplorer commented Mar 6, 2025

Problem

The docs for WalkParallel::visit() state the following:

Typically, creating a custom visitor is useful if you need to perform some kind of cleanup once traversal is finished. This can be achieved by implementing Drop for your builder (or for your visitor, if you want to execute cleanup for every thread that is launched).

However, by accepting an &mut dyn ParallelVisitorBuilder<'_>, the visit() method makes it rather unclear when the builder's Drop method is called. There's also no example of this custom visitor usage in the repo, and all examples instead use the neighboring .run() method.

It's also relatively unclear what the purpose of the lifetime 's in ParallelVisitorBuilder<'s> is--I understand it's supposed to just avoid the need to enforce a 'static lifetime so we can use std::thread::scope(), but implementors may not know what to do with it. While looking to use this API, I was under the impression that the builder would be able to create visitors scoped to the builder's lifetime, but this ended up not being possible, because of the &mut self requirement meaning that the builder can't hand out references to its own fields. This further increases confusion about where and how the Drop impl of the builder is supposed to get called.

Finally, the API requires creating a Box where none is necessary. This is very unlikely to cause any performance issues, but is a little awkward.

Solution

  1. Make the ParallelVisitorBuilder return a lifetime-annotated associated type Self::Visitor<'s>.
  2. Make the build() method return a concrete instance of Self::Visitor, scoped to the builder's own lifetime.
  3. Make the WalkParallel::visit() method accept an impl ParallelVisitorBuilder, which allows the user to move a builder into the visit() method so that the Drop method can be called immediately after the conclusion of the parallel traversal.
  4. Make the WalkParallel::visit() method return the ParallelVisitorBuilder argument after completion.

Result

The .run() and .visit() methods' signatures were modified, but this should make them strictly more general. I've removed a few Box::new() calls to demonstrate how this works, but the code will still compile with the Box::new() calls. The newly added example script looks like:

fn walk_dir(
    dir: impl AsRef<Path>,
) -> Result<BTreeSet<PathBuf>, ignore::Error> {
    ignore::WalkBuilder::new(dir)
        .build_parallel()
        .visit(VisitorBuilder::new())
        // This is an impl method on the returned VisitorBuilder:
        .into_result()
}

fn main() {
    println!("success: {:?}", walk_dir(".").unwrap());
    println!("err: {:?}", walk_dir("asdf"));
}

Because of the more expansive API which allows the builder to be consumed by the visit() call, it is now clear from a glance that the result of the successful first crawl will be printed to stdout, before the failing second crawl is started at all. It is also clear that if the failing second crawl is removed, there will be no deadlock, because we were able to have the builder take ownership of its data and return a result on the stack instead of creating extraneous Arc and mpsc::channel instances.

It is my belief that this approach is much easier to follow, and actually fulfills the expectations of the existing docstring for WalkParallel::visit().

@cosmicexplorer cosmicexplorer changed the title ignore: avoid boxing in ParallelVisitor{,Builder} with an associated type ignore: fix confusing lifetimes in ParallelVisitor{,Builder} with an associated type instead of boxing Mar 6, 2025
@cosmicexplorer cosmicexplorer changed the title ignore: fix confusing lifetimes in ParallelVisitor{,Builder} with an associated type instead of boxing ignore: fix confusing lifetimes in ParallelVisitor{,Builder} with an associated type Mar 6, 2025
@cosmicexplorer cosmicexplorer force-pushed the ignore-visitor-associated-type branch from bf74e03 to 6fafb72 Compare March 6, 2025 09:52
@cosmicexplorer
Copy link
Author

I believe I have just figured out how to allow the builder to return scoped data (on the stack) by having WalkParallel::visit() return the provided builder argument. This makes it possible for a ParallelBuilderVisitor implementation to produce a really slick synchronous API if desired, by consuming the builder returned by the visit() method (this is not required, and fully asynchronous/channel-based message passing still works, without any changes to the code--this is strictly more general). The added example script demonstrates how returning the builder argument can be used in a high-level API.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant