PERF: 7% faster pip install, remove fsync call in temporary file, not necessary #12859
Conversation
This looks reasonable, but I don't know enough about filesystem management to be sure that the fsync call is unnecessary. The history of the code doesn't give any indication.
Assuming the tests pass, and none of the other maintainers feels that this is a risk, I'm in favour. We can revert and add a note explaining why the fsync is needed if we do hit issues.
This was added in 2019 with a comment "The file is created securely and is ensured to be written to disk after the context reaches its end." My best guess: the guy who wrote this code went over the top with both flush() and fsync() to ensure the content is written. He probably thought the record file is very important and went all in! :D As far as I understand, the fsync() is unnecessary. The code itself was probably a copy/pasted snippet from somewhere else (that didn't do an atomic rename) or from previous code in pip (the commit message implies pip was writing in place before).
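For context, the write-replace pattern being discussed looks roughly like this (a minimal sketch of the idea, not pip's actual code; write_replace is a hypothetical name):

    import os
    from tempfile import NamedTemporaryFile

    def write_replace(path: str, data: bytes) -> None:
        # Write to a temporary file in the same directory, so the final
        # rename stays on one filesystem and is therefore atomic.
        with NamedTemporaryFile(dir=os.path.dirname(path) or ".", delete=False) as f:
            f.write(data)
            f.flush()              # hand the data from Python's buffer to the OS
            os.fsync(f.fileno())   # the call this PR proposes to remove
        os.replace(f.name, path)   # atomically swap the new file into place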
Taking a look at how adjacent_tmp_file() is used across the codebase, it's only used as part of a write-replace sequence to ensure atomic writes:
- In the HTTP cache write implementation (pip/src/pip/_internal/network/cache.py, lines 77 to 80 in 203780b):

      with adjacent_tmp_file(path) as f:
          f.write(data)
      replace(f.name, path)

- For the self-check state file (pip/src/pip/_internal/self_outdated_check.py, lines 115 to 121 in 203780b):

      with adjacent_tmp_file(self._statefile_path) as f:
          f.write(text.encode())
      try:
          # Since we have a prefix-specific state file, we can just
          # overwrite whatever is there, no need to check.
          replace(f.name, self._statefile_path)

- For _generate_file() as mentioned in this PR (pip/src/pip/_internal/operations/install/wheel.py, lines 661 to 666 in 203780b):

      @contextlib.contextmanager
      def _generate_file(path: str, **kwargs: Any) -> Generator[BinaryIO, None, None]:
          with adjacent_tmp_file(path, **kwargs) as f:
              yield f
          os.chmod(f.name, generated_file_mode)
          replace(f.name, path)
I did some reading, notably https://blog.gocept.com/2013/07/15/reliable-file-updates-with-python/ and https://thunk.org/tytso/blog/2009/03/15/dont-fear-the-fsync/. Assuming I haven't misunderstood something, the main benefit of os.fsync() is system crash resistance (durability). While flushing ensures that the written data is handed off to the OS (and isn't stored in a Python-level buffer which could be easily lost if pip crashes), it provides zero guarantee that the data was written to the physical media. In many situations, the disk write is deferred until some later time. If the entire system crashes or the power goes out, the OS buffer and any pending writes will be lost.
os.fsync() ensures that the writes have been accepted by the storage controller (aka flushed from the OS buffers). This is necessary for durable [1] writes of the temporary file and the eventual renamed file. Say a user pip installs six. During that install, pip writes six's RECORD file using adjacent_tmp_file(). While at no point during the install will the OS or other programs see a partially written record file, due to the atomic replace, if the system crashes after the install but before the OS has flushed the writes to the original temporary file to disk, the record file may be corrupted.

It's certainly nice to have atomic and durable record files [2], but I'm not certain whether that's a guarantee pip advertises/documents for package installation or any other part of its functionality. I'll also note that the replace operation is currently not durable. We'd have to fsync the containing directory to ensure the rename has been written to disk.

TL;DR: the proposed logic is atomic, but only when the system does not crash before the relevant data is stored safely on disk (i.e. it isn't durable).
Footnotes
1. I wrote "accepted by the storage controller" for a reason. The controller may store the data in a buffer or cache temporarily before writing it to non-volatile storage. So, in theory, a poorly timed power outage could still result in only some of the data being written to disk.
2. I haven't thought about the importance of atomicity/durability for the self-check and cache use cases.
I think this patch is ready to merge. @ichard26 per the link you gave, the ext4 filesystem ensures that a rename can only be committed after pending writes to the file have completed. the fsync is not necessary.
Isn't that a kernel/filesystem-level hack? You know that we have other operating systems and filesystems to support. I'm not saying that we need to support durable installations, but if we want to, then relying on an ext4 feature is not the way to go.
reping. forgot to follow up. would be nice to merge the patch to get the performance improvement! :D
@morotti looks like you forgot to respond to Richard's concern? It sounds like said guarantee is file-system-specific and other file systems might behave differently. This might be important in light of pip's cross-platform nature. Even under GNU/Linux distros, people tend to use different file systems, but beyond that, it's rarely possible to use the same file systems under different operating systems. Moreover, on GNU/Linux it's quite common for block devices to be layered on top of others (LUKS/LVM/btrfs). Do you know what implications this would have for them, and for file systems like NTFS/HFS/exFAT, etc.? What are the implications of sync-on-rename for those? Note that a rename across file system boundaries is typically implemented as copy+remove.
"relying on an ext4 feature is not the way to go."
I'd like us to have confirmation that this isn't relying on a file-system-specific thing or, if it does, let's only use it when the file system is known to be using a specific format that's compatible.
Marking as request changes, since the above is a blocking concern.
sorry for delay, there is no issue as far as I am aware. the long discussion above is misleading, it was mainly around a bug ticket on RHEL/CentOS/ext4. early versions of ext4 started reordering filesystem operations in unsafe ways and caused data corruption. this was noticed with yum install leaving empty files and that was quite a problem. it was fixed forever ago after the issue was understood. by the way, the reason this code does a fsync() in the first place is because it is decades old code that used to write files in place, which was unsafe (regardless of whether fsync was called or not). the code was fixed years ago to write to a separate file + flush + atomic rename, but they forgot to remove the fsync call.
You've asserted this a few times now, but what's your proof? You said yourself that this is your "best guess". But isn't it just as likely that the fsync was left in because it addressed some corner case (possibly on an obscure platform) that you're not aware of? Your arguments seem to suggest that the fsync syscall is essentially useless. If that's the case, why does it even exist? It's clearly not a no-op, as it takes a non-trivial amount of time to execute. While performance improvements are nice, we shouldn't make them at the expense of correctness. So the blocking question here is twofold: what is the evidence that the fsync is actually redundant, and does that hold on every platform and filesystem that pip supports?
As a reminder - I originally said that I was OK with this change as long as the other maintainers didn't have concerns. It turns out the other maintainers do have concerns, and if you don't address them, this PR will remain blocked. I'm simply trying to clarify what the blocking concern is, but it's not me you have to persuade, it's @ichard26 and @pradyunsg.
Hello,
could I have a review on this code?
profiling on a run of pip install jupyter: the _install_wheel() function is 3670 ms; that's the installation phase. fsync is 594 ms out of that, about 15% of the install phase (or 7% of the whole pip run). It is called here, 3 times:
pip/src/pip/_internal/operations/install/wheel.py, line 672 in 8eadcab
that's the end of the wheel installation, in the _generate_file() context manager:
- to create the dist-info/INSTALLER file, which always contains the string "pip"
- to create the dist-info/direct_url.json file, which contains a one-line json with the path/url
- to create the dist-info/RECORD file, which contains the list of all installed files (allegedly the most important file)

the way it works: it creates a temporary file in the same directory as the final path, then flush() + fsync() + close() + atomic rename() of the file to the final path.
I think the fsync call is not necessary.
fsync calls are surprisingly expensive; I am always seeing multiple milliseconds on various types of disks on my machine, even on /tmp.
there are only 3 files fsync'ed per extracted wheel, yet they take almost as long as everything else.
I think the run has 5564 calls to save() to extract files + 116 calls to fsync() for metadata files, for a total of 58 wheels to install.
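For anyone who wants to reproduce the cost estimate, a rough micro-benchmark along these lines should show the per-call fsync latency (a sketch; absolute numbers vary widely across disks, filesystems, and operating systems):

    import os
    import time
    from tempfile import NamedTemporaryFile

    N = 100
    total = 0.0
    with NamedTemporaryFile(delete=False) as f:
        for _ in range(N):
            f.write(b"x" * 1024)   # small, metadata-sized write
            f.flush()
            start = time.perf_counter()
            os.fsync(f.fileno())   # the call being measured
            total += time.perf_counter() - start
    os.unlink(f.name)
    print(f"average fsync latency: {total / N * 1000:.2f} ms")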