Add multithreading for singular files #468
Nice idea that is on my wish list of things to consider, but in practice it is not as simple to implement in a generic grep tool. It would work fine for options like `-c` (count matches) and `-l` (list matching files). If we restrict to these use cases, threaded search on a single file may be faster if we can generally assume that file IO is not the bottleneck. But often it is the bottleneck, since large files aren't cached in memory when they are searched for the first time, or when a file is several GB and won't fit in "spare" memory for caching. Furthermore, it's not going to speed up recursive searching, which uses worker thread pools that already saturate the CPU cores (with some limits, because saturating them all is not ideal for performance when other OS threads are busy). Therefore, it is nice to have, but there are caveats. IMHO a dedicated new utility to only count matches or find a matching file is more appropriate.
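To make the restriction concrete, here is a minimal C++ sketch of the counting-only case (the equivalent of `-c`): the file is divided into byte ranges snapped forward to newline boundaries, each range is scanned by its own thread, and the per-thread counts are summed. It uses `std::regex` rather than ugrep's matcher, and the chunking scheme and `count_range` helper are hypothetical illustrations, not ugrep internals; options that must print matches in order, or patterns that can span lines, would need far more machinery.

```cpp
#include <algorithm>
#include <cstddef>
#include <fstream>
#include <iostream>
#include <regex>
#include <string>
#include <thread>
#include <vector>

// Count regex matches in [begin, end) of the file, where both offsets are
// already snapped to line boundaries so no line straddles two chunks.
static std::size_t count_range(const std::string& path, const std::regex& re,
                               std::streamoff begin, std::streamoff end) {
  std::ifstream in(path, std::ios::binary);
  in.seekg(begin);
  std::size_t count = 0;
  std::string line;
  while (in.tellg() >= 0 && in.tellg() < end && std::getline(in, line))
    if (std::regex_search(line, re))
      ++count;
  return count;
}

int main(int argc, char** argv) {
  if (argc != 3) {
    std::cerr << "usage: " << argv[0] << " PATTERN FILE\n";
    return 2;
  }
  const std::regex re(argv[1]);
  const std::string path = argv[2];

  std::ifstream in(path, std::ios::binary | std::ios::ate);
  const std::streamoff size = in.tellg();
  const unsigned n = std::max(1u, std::thread::hardware_concurrency());

  // Nominal chunk edges, each snapped forward to the next newline so that
  // every line belongs to exactly one chunk.
  std::vector<std::streamoff> edge{0};
  for (unsigned i = 1; i < n; ++i) {
    in.clear();
    in.seekg(size * i / n);
    std::string skip;
    std::getline(in, skip);  // discard the partial line at the chunk edge
    edge.push_back(in ? std::streamoff(in.tellg()) : size);
  }
  edge.push_back(size);

  std::vector<std::size_t> counts(edge.size() - 1);
  std::vector<std::thread> pool;
  for (std::size_t i = 0; i + 1 < edge.size(); ++i)
    pool.emplace_back(
        [&, i] { counts[i] = count_range(path, re, edge[i], edge[i + 1]); });
  for (auto& t : pool) t.join();

  std::size_t total = 0;
  for (std::size_t c : counts) total += c;
  std::cout << total << "\n";
}
```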
Currently ugrep (and, I believe, all other grep-inspired software) can only dedicate one thread to each file, which can be a major performance bottleneck when searching a single large file. One logical core simply cannot keep up with the 7 to 14 GB/s read speeds of today's consumer SSDs, leading to unused bandwidth and wasted time.
The obvious workaround is to split the file into smaller chunks so that ugrep can process them in parallel. That works, but it takes extra time, writing the chunks puts wear on the SSD, and it is an unnecessary extra step overall.
My suggestion is to add a mode of multithreading to ugrep in which multiple worker threads search the same file at once.
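As a rough sketch of what that could look like (assuming POSIX I/O and, for simplicity, a fixed-string pattern rather than a full regex): each worker thread reads its own byte range of the same file with `pread()`, so no temporary chunk files are ever written. Each range is extended by an overlap of the pattern length minus one so a match straddling a chunk edge is still seen, and a thread only counts matches that start inside its own range to avoid double counting. The `count_fixed` helper and the chunking scheme are illustrative, not anything ugrep implements.

```cpp
#include <fcntl.h>
#include <unistd.h>

#include <algorithm>
#include <cstddef>
#include <iostream>
#include <string>
#include <string_view>
#include <thread>
#include <vector>

// Count occurrences of a fixed string whose first byte lies inside
// [offset, offset + length) of the file behind fd. pread() lets all
// threads share one descriptor without a shared file position.
static std::size_t count_fixed(int fd, const std::string& needle,
                               off_t offset, std::size_t length) {
  // Read the range plus needle.size() - 1 overlap bytes so matches that
  // straddle the chunk edge are visible to this thread.
  std::string buf(length + needle.size() - 1, '\0');
  const ssize_t got = pread(fd, buf.data(), buf.size(), offset);
  if (got <= 0) return 0;
  const std::string_view view(buf.data(), static_cast<std::size_t>(got));
  std::size_t count = 0, pos = 0;
  // Only count matches that *start* in-range; a match starting in the
  // overlap belongs to the next chunk and is counted there.
  while ((pos = view.find(needle, pos)) != std::string_view::npos &&
         pos < length) {
    ++count;
    ++pos;
  }
  return count;
}

int main(int argc, char** argv) {
  if (argc != 3) {
    std::cerr << "usage: " << argv[0] << " STRING FILE\n";
    return 2;
  }
  const std::string needle = argv[1];
  const int fd = open(argv[2], O_RDONLY);
  if (fd < 0 || needle.empty()) return 2;

  const off_t size = lseek(fd, 0, SEEK_END);
  const unsigned n = std::max(1u, std::thread::hardware_concurrency());
  const off_t chunk = (size + n - 1) / n;  // bytes per worker, rounded up

  std::vector<std::size_t> counts(n);
  std::vector<std::thread> pool;
  for (unsigned i = 0; i < n; ++i)
    pool.emplace_back([&, i] {
      const off_t off = static_cast<off_t>(i) * chunk;
      if (off < size)
        counts[i] = count_fixed(
            fd, needle, off,
            static_cast<std::size_t>(std::min<off_t>(chunk, size - off)));
    });
  for (auto& t : pool) t.join();

  std::size_t total = 0;
  for (std::size_t c : counts) total += c;
  std::cout << total << "\n";
  close(fd);
}
```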