gh-129005: Align FileIO.readall allocation #129424

cmaloney · 2025-01-29T05:27:56Z

Both `_io` and `_pyio` now use a pre-allocated buffer of length `bufsize`, fill it using `os.readinto` / `_Py_read`, and have matching "expand buffer" logic.
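For readers who want the shape of that loop, here is a minimal sketch of "pre-allocate, `readinto`, expand when full". It is not the code from this PR. The name `readall_sketch` is hypothetical, the doubling growth policy is an assumption standing in for the real "expand buffer" logic, and it calls the raw object's `readinto()` method rather than `os.readinto` so that it runs on Python versions that lack the latter. It also assumes a blocking stream.

```python
import io


def readall_sketch(raw, bufsize=io.DEFAULT_BUFFER_SIZE):
    """Read from `raw` until EOF into a single growing bytearray.

    `raw` is any object with a readinto() method (assumed to be blocking).
    """
    # Pre-allocate the buffer up front; guard against a zero-sized request.
    result = bytearray(max(bufsize, 1))
    bytes_read = 0
    while True:
        if bytes_read >= len(result):
            # Buffer is full: expand it. Doubling is an assumption made for
            # this sketch, not the growth policy used by _io or _pyio.
            result += bytes(len(result))
        # Read directly into the unused tail of the buffer. Both views are
        # released on leaving the `with`, so the bytearray can be resized.
        with memoryview(result) as view, view[bytes_read:] as tail:
            n = raw.readinto(tail)
        if not n:
            # 0 means EOF; None (non-blocking, no data) also ends the sketch.
            break
        bytes_read += n
    # Drop the unused tail, then convert (this final bytes() call copies).
    del result[bytes_read:]
    return bytes(result)
```

For example, `readall_sketch(io.BytesIO(b"abc"), bufsize=2)` fills the two-byte buffer, grows it once, and returns `b"abc"`.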

On my machine (Linux, debug build), this takes `./python -m test -M8g -uall test_largefile -m test_large_read -v` from ~3.7 seconds to ~3.3 seconds.

`_pyio` still uses 2x the memory; there are two remaining copies:

- `bytes(result)` currently copies. I'd like to either rely on duck typing (a bytearray is close enough to bytes) or do something similar to C++ move semantics, where the bytes object takes ownership of the bytearray's data buffer without copying. Not sure which is most Pythonic.
- `_pyio.BufferedIO._read_unlocked`, in the read-all case where no buffer has been allocated, always does `return buf[:pos] + chunk`, which causes another copy. A patch for that case ("if buf is length 0, just return") is coming shortly.
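The two copies can be illustrated with a small sketch. The helper name `join_buffered` is hypothetical and only mirrors the "if buf is length 0, just return" idea described above. It is not the patch itself.

```python
def join_buffered(leftover, chunk):
    """Return buffered leftover bytes followed by chunk (hypothetical helper)."""
    if not leftover:
        # Nothing was buffered: hand back the chunk itself, no new object.
        return chunk
    # Otherwise a new bytes object holding both parts has to be built.
    return leftover + chunk


# bytes(bytearray) makes an independent copy of the data: mutating the
# bytearray afterwards does not change the bytes object.
scratch = bytearray(b"abc")
frozen = bytes(scratch)
scratch[0] = ord("z")
```

After this runs, `frozen` is still `b"abc"` while `scratch` starts with `z`, which is why the final `bytes(result)` doubles peak memory for a large read.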

Issue: Reduce copies when reading files in pyio, match behavior of _io #129005

Both now use a pre-allocated buffer of length `bufsize`, fill it using a readinto, and have matching "expand buffer" logic. On my machine this takes: `./python -m test -M8g -uall test_largefile -m test_large_read -v` from ~3.7 seconds to ~3.3 seconds

Lib/_pyio.py

cmaloney · 2025-01-29T22:07:19Z

Not sure how to get the Tests / Ubuntu / build and test job unbroken; the runs seem to be auto-cancelling fast, which isn't happening on any of my other open PRs.

cmaloney · 2025-01-29T23:19:24Z

Rebased on main and opened #129458 as a replacement that will hopefully avoid the cancelled test runs.

cmaloney added 2 commits January 28, 2025 21:19

add blurb

871979d

bedevere-app bot added the awaiting review label Jan 29, 2025

bedevere-app bot mentioned this pull request Jan 29, 2025

Reduce copies when reading files in pyio, match behavior of _io #129005

Open

cmaloney force-pushed the cmaloney/pyio_fileio_readall branch from e382660 to 871979d Compare January 29, 2025 05:40

cmaloney added 2 commits January 28, 2025 22:35

Update 2025-01-28-21-22-44.gh-issue-129005.h57i9j.rst

5cee34e

Merge branch 'main' into cmaloney/pyio_fileio_readall

6c010f2

cmaloney commented Jan 29, 2025

Lib/_pyio.py (outdated, resolved)

cmaloney and others added 2 commits January 29, 2025 13:59

Remove unneecssary variable

869a31b

Merge branch 'main' into cmaloney/pyio_fileio_readall

188b239

cmaloney closed this Jan 29, 2025
