gh-129005: Align FileIO.readall allocation #129424
Closed
Both `_io` and `_pyio` now use a pre-allocated buffer of length `bufsize`, fill it using `os.readinto` / `_Py_read`, and have matching "expand buffer" logic.

On my machine (Linux, Debug build) this takes:
./python -m test -M8g -uall test_largefile -m test_large_read -v
from ~3.7 seconds to ~3.3 seconds.

`_pyio` still uses 2x the memory; there are two remaining copies:

- `bytes(result)` currently copies. I'd like to either just rely on "duck typing" (bytearray is close enough to bytes), or do something similar to C++ "move" semantics, where the bytes could take ownership of the data buffer from the bytearray without copying... Not sure what is most Pythonic.
- `_pyio.BufferedIO._read_unlocked`, in the read-all case where no buffer has been allocated, always does `return buf[:pos] + chunk`, which causes another copy. Patch for that case / "if buf is length 0, just return" coming shortly.
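
For readers who want to see the shape of the pattern described above, here is a minimal sketch. It is not the code from this PR. The names `read_all` and `concat_pending` are hypothetical, the doubling growth policy is an assumption, and the sketch calls the file object's own `readinto` method rather than `os.readinto` so it runs on older Python versions. It also assumes a blocking file object.

```python
import io


def read_all(raw, bufsize=io.DEFAULT_BUFFER_SIZE):
    """Read until EOF into one pre-allocated bytearray (illustrative sketch)."""
    if bufsize <= 0:
        raise ValueError("bufsize must be positive")
    result = bytearray(bufsize)  # pre-allocated, zero-filled buffer
    bytes_read = 0
    while True:
        if bytes_read >= len(result):
            # "Expand buffer" step: doubling is an assumed policy,
            # not necessarily what _io / _pyio do.
            result += bytes(len(result))
        # Read directly into the unused tail; both views are released
        # before the next resize, which bytearray requires.
        with memoryview(result) as view, view[bytes_read:] as tail:
            n = raw.readinto(tail)
        if not n:
            # 0 means EOF. None (non-blocking, no data) is treated
            # like EOF in this sketch.
            break
        bytes_read += n
    del result[bytes_read:]  # trim the unused tail in place
    return result            # a bytearray; bytes(result) would copy


def concat_pending(pending, chunk):
    """Join already-buffered bytes with a new chunk, skipping the
    concatenation (and its copy) when nothing is buffered."""
    if not pending:
        return chunk
    return pending + chunk


if __name__ == "__main__":
    payload = b"x" * 100_000
    data = read_all(io.BytesIO(payload), bufsize=4096)
    print(len(data), type(data).__name__)
```

Returning the bytearray directly corresponds to the "duck typing" option mentioned above; wrapping it in `bytes(...)` would reintroduce the copy. `concat_pending` illustrates the "if buf is length 0, just return" idea for the read-all path.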