
PERF: 7% faster pip install, remove fsync call in temporary file, not necessary #12859


Open: wants to merge 5 commits into main

Conversation

@morotti (Contributor) commented Jul 19, 2024

Hello,

Could I have a review of this code?

Profiling a run of pip install jupyter:

The _install_wheel() function takes 3670 ms; that's the installation phase.
fsync accounts for 594 ms of that, about 15% of the install phase (or 7% of the whole pip run).

fsync is called 3 times per wheel, via calls like:

    with _generate_file(installer_path) as installer_file:

That's at the end of the wheel installation, in the _generate_file() context manager, which is used:

  • to create the dist-info/INSTALLER file, which always contains the string "pip"
  • to create the dist-info/direct_url.json file, which contains a one-line JSON document with the install path/URL
  • to create the dist-info/RECORD file, which contains a list of all installed files (allegedly the most important file)

The way it works: it creates a temporary file in the same directory as the final path, then flush() + fsync() + close() + an atomic rename() of the file to the final path (see the sketch below).
I think the fsync call is not necessary.
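For reference, the pattern looks roughly like this (a simplified sketch of the write-flush-fsync-rename sequence, not pip's exact source; write_atomic is an illustrative name):

    import os
    from tempfile import NamedTemporaryFile

    def write_atomic(path: str, data: bytes) -> None:
        # The temp file lives in the same directory as the target, so
        # os.replace() stays on one filesystem and remains atomic.
        with NamedTemporaryFile(dir=os.path.dirname(path) or ".", delete=False) as f:
            f.write(data)
            f.flush()              # push Python's buffer to the OS
            os.fsync(f.fileno())   # the call this PR proposes to remove
        os.replace(f.name, path)   # atomic rename to the final path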

fsync calls are surprisingly expensive: I consistently see multiple milliseconds on various types of disks on my machine, even on /tmp.
Only 3 files are fsync'ed per extracted wheel, yet they take almost as long as everything else.
I think the run has 5564 calls to save() to extract files, plus 116 calls to fsync() for metadata files, for a total of 58 wheels to install.

[profiler screenshots]

@pfmoore (Member) left a comment:

This looks reasonable, but I don't know enough about filesystem management to be sure that the fsync call is unnecessary. The history of the code doesn't give any indication.

Assuming the tests pass, and none of the other maintainers feels that this is a risk, I'm in favour. We can revert and add a note explaining why the fsync is needed if we do hit issues.

@morotti (Contributor, Author) commented Jul 19, 2024

> This looks reasonable, but I don't know enough about filesystem management to be sure that the fsync call is unnecessary. The history of the code doesn't give any indication.

This was added in 2019 with a comment "The file is created securely and is ensured to be written to disk after the context reaches its end."

My best guess: the person who wrote this code went over the top with both flush() and fsync() to ensure the content is written. He probably thought the RECORD file is very important and went all in! :D
(The initial commit might have been bugged because it flushed the file, result.file.flush(), instead of the buffer, result.flush().)

As far as I understand, the fsync() is unnecessary because the flush() will flush AND the end of the with block will autoclose and flush too (closing implies flushing).
If something crashes, like a power loss, before the write is complete and the rename is done, it's okay because the file still has a temporary filename. It's the atomic rename() after the write that ensures consistency.
(By the way, I don't think that fsync() provides any guarantee that the data will be written to the media on modern hardware.)

The code itself was probably a copy/pasted snippet from somewhere else (that didn't do atomic rename) or from previous code in pip (the commit message implies pip was writing in place before).

@ichard26 (Member) left a comment:

Taking a look at how adjacent_tmp_file() is used across the codebase, it's only used as part of a write-replace sequence to ensure atomic writes:

  • In the HTTP cache write implementation:

        with adjacent_tmp_file(path) as f:
            f.write(data)
        replace(f.name, path)

  • For the self-check state file:

        with adjacent_tmp_file(self._statefile_path) as f:
            f.write(text.encode())

        try:
            # Since we have a prefix-specific state file, we can just
            # overwrite whatever is there, no need to check.
            replace(f.name, self._statefile_path)
        except OSError:
            pass

  • For _generate_file() as mentioned in this PR:

        @contextlib.contextmanager
        def _generate_file(path: str, **kwargs: Any) -> Generator[BinaryIO, None, None]:
            with adjacent_tmp_file(path, **kwargs) as f:
                yield f
            os.chmod(f.name, generated_file_mode)
            replace(f.name, path)

I did some reading, notably https://blog.gocept.com/2013/07/15/reliable-file-updates-with-python/ and https://thunk.org/tytso/blog/2009/03/15/dont-fear-the-fsync/. Assuming I haven't misunderstood something, the main benefit of os.fsync() is system crash resistance (durability). While flushing ensures that the written data is handed off to the OS (and isn't stored in a Python-level buffer which could be easily lost if pip crashes), it provides zero guarantee that the data was written to the physical media. In many situations, the disk write is deferred until some later time. If the entire system crashes or the power goes out, the OS buffer and any pending writes will be lost.

os.fsync() ensures that the writes have been accepted by the storage controller (i.e. flushed from the OS buffers). This is necessary for durable[1] writes of the temporary file and the eventually renamed file. Say a user pip installs six. During that install, pip writes six's RECORD file using adjacent_tmp_file(). While at no point during the install will the OS or other programs see a partially written RECORD file, thanks to the atomic replace, if the system crashes after the install but before the OS has flushed the writes to the original temporary file to disk, the RECORD file may be corrupted.

It's certainly nice to have atomic and durable RECORD files[2], but I'm not certain whether that's a guarantee pip advertises/documents for package installation or any other part of its functionality. I'll also note that the replace operation is currently not durable. We'd have to fsync the containing directory to ensure the rename has been written to disk (see the sketch below).

TL;DR: the proposed logic is atomic, but only when the system does not crash before the relevant data is stored safely on disk (i.e. it isn't durable).
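For contrast, a fully durable version would also fsync the containing directory after the rename. A rough sketch of what that would look like (POSIX-only, since os.open() on a directory for fsync isn't supported on Windows; illustrative, not current pip behaviour):

    import os
    from tempfile import NamedTemporaryFile

    def write_durable(path: str, data: bytes) -> None:
        dirname = os.path.dirname(path) or "."
        with NamedTemporaryFile(dir=dirname, delete=False) as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())   # hand the data to the storage controller
        os.replace(f.name, path)   # atomic swap into place
        # Persist the rename itself by fsyncing the directory entry.
        dir_fd = os.open(dirname, os.O_RDONLY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)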

Footnotes

  1. I wrote "accepted by the storage controller" for a reason. The controller may store the data in a buffer or cache temporarily before writing it to non-volatile storage. So, in theory a poorly timed power outage could still result in only some of the data being written to disk.

  2. I haven't thought about the importance of atomicity/durability for the self-check and cache use cases.

@morotti (Contributor, Author) commented Jul 30, 2024

I think this patch is ready to merge.

@ichard26 per the link you gave, the ext4 filesystem ensures that a rename can only be committed after pending writes to the file have completed, so the fsync is not necessary.
From the messages on the mailing list, they fixed the filesystem because yum, the package manager on CentOS/RHEL, was also doing rename-after-write to ensure atomic changes, and the filesystem had to handle it.

@ichard26 (Member) commented:

Isn't that a kernel/filesystem-level hack? You know that we have other operating systems and filesystems to support. I'm not saying that we need to support durable installations, but if we want to, then relying on an ext4 feature is not the way to go.

@morotti (Contributor, Author) commented Dec 11, 2024

Reping, I forgot to follow up.

It would be nice to merge the patch to get the performance improvement! :D

@webknjaz (Member) commented Dec 11, 2024

@morotti it looks like you forgot to respond to Richard's concern? It sounds like said guarantee is filesystem-specific, and other filesystems might behave differently. This might be important in light of pip's cross-platform nature. Even among GNU/Linux distros, people tend to use different filesystems, and beyond that, it's rarely possible to use the same filesystems under different operating systems. Moreover, on GNU/Linux it's quite common for block devices to be layered on top of others (LUKS/LVM/btrfs). Do you know what implications this would have on them and on filesystems like NTFS/HFS/exFAT, etc.? What are the implications of sync-on-rename for those? Note that a rename across filesystem boundaries is typically implemented as copy+remove.
It sounds safer to do the fsync syscall than not.

@pradyunsg (Member) left a comment:

> relying on an ext4 feature is not the way to go.

I'd like us to have confirmation that this isn't relying on a filesystem-specific thing or, if it is, let's only use it when the filesystem is known to be using a specific format that's compatible.

@pradyunsg (Member) left a second comment:

Marking as request changes, since the above is a blocking concern.

@morotti (Contributor, Author) commented Dec 15, 2024

Sorry for the delay; there is no issue as far as I am aware.

The long discussion above is misleading: it was mainly around a bug ticket on RHEL/CentOS/ext4. Early versions of ext4 started reordering filesystem operations in unsafe ways and caused data corruption. This was noticed with yum install leaving empty files, and that was quite a problem. It was fixed forever ago, after the issue was understood.
I'm not aware of any filesystem being affected by a similar issue.

By the way, the reason this code does an fsync() in the first place is that it is decades-old code that used to write files in place, which was unsafe (regardless of whether fsync was called). The code was fixed years ago to write to a separate file + flush + atomic rename, but they forgot to remove the fsync call (see the illustration below).
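To illustrate (a hypothetical contrast, not the old pip source): with an in-place write, a crash mid-write leaves a truncated file at the final path, and fsync does not help with that at all:

    import os

    def write_in_place(path: str, data: bytes) -> None:
        # Hypothetical old-style in-place write: if the process or the
        # machine dies between the truncating open() and close(), the
        # destination holds a partial RECORD, fsync or not.
        with open(path, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())   # durability doesn't fix atomicity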

@pfmoore (Member) commented Dec 15, 2024

> but they forgot to remove the fsync call

You've asserted this a few times now, but what's your proof? You said yourself that this is your "best guess". But isn't it just as likely that the fsync was left in because it addressed some corner case (possibly on an obscure platform) that you're not aware of?

Your arguments seem to suggest that the fsync syscall is essentially useless. If that's the case, why does it even exist? It's clearly not a no-op, as it takes a non-trivial amount of time to execute.

While performance improvements are nice, we shouldn't make them at the expense of correctness. So the blocking question here is twofold:

  1. Do we have documentation explaining why the fsync call in this situation is unnecessary? It's not sufficient for you to just assert that's the case.
  2. Can we check for the conditions that make it unnecessary, and only omit the fsync in those cases? Note, for example, that on Windows, the os.fsync function calls the MS CRT function _commit. Neither of the articles you linked to above discuss Windows at all, so why do you assume the behaviour there is the same as Linux?

As a reminder - I originally said that I was OK with this change as long as the other maintainers didn't have concerns. It turns out the other maintainers do have concerns, and if you don't address them, this PR will remain blocked. I'm simply trying to clarify what the blocking concern is, but it's not me you have to persuade, it's @ichard26 and @pradyunsg

@ichard26 added the type: performance label (Commands take too long to run) on Dec 26, 2024.