Skip to content

#272 Fix race condition in DirectoryStore #318

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Mar 19, 2019

Conversation

jmswaney
Copy link
Contributor

@jmswaney jmswaney commented Oct 25, 2018

Multiple workers may attempt to create subfolders when using NestedDirectoryStore, leading to FileExistsError raised. This fix catches OSError (for python 2 support) and checks if the directory already exists--if it does, then no exception is raised.

Fixes #272

zarr/storage.py Outdated
@@ -758,6 +758,9 @@ def __setitem__(self, key, value):
if not os.path.exists(dir_path):
try:
os.makedirs(dir_path)
except OSError:
if not os.path.isdir(dir_path):
raise KeyError(key)
except Exception:
raise KeyError(key)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm...so this except case doesn't catch it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The except Exception block was catching it before, but it was just throwing KeyError instead--so my code would still error out. The block that I added only raises KeyError if the folder doesn't exist already. This way, workers can go on with their business if the folder already exists.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we then drop the except Exception entirely and only use except OSError? It seems bad to mask other errors by raising KeyError instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's a good idea. I wasn't sure about that part so I left it, but I'm not sure what other exceptions you can even expect from os.makedirs. If you got OSError because of permissions or something else that prevented making the folder, then the block that I added would raise KeyError anyway.

Copy link
Member

@jakirkham jakirkham Feb 20, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. We can improve this a bit then by checking the errno of the OSError. Namely if we find e.errno == errno.EEXIST, then we know this is a FileExistsError. If it's any other errno, we can just re-raise the exception as is.

ref: https://docs.python.org/3/library/exceptions.html#FileExistsError
ref: https://docs.python.org/3/library/errno.html#errno.EEXIST

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have taken the liberty of pushing a commit to your branch using the errno bit. Please let me know what you think.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't sure about that part so I left it...

That makes sense. Honestly this is an orthogonal issue. So we need not address it here. Will raise an issue to follow-up on this after.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Went ahead and dropped the generic exception case as it didn't seem to be covered. Happy to revisit if we figure out it was needed for some reason.

On Python 3, we would only want to catch the race case here by checking
for the `FileExistsError` and suppressing raising for that case. However
there is no `FileExistsError` on Python 2. Only `OSError` exists on
Python 2/3. However that catches some other errors that we might not
want to suppress here. So we check the `errno` of the `OSError` instance
to verify it is caused by an existing file/directory represented by the
`EEXIST` error number. This amounts to catching the `FileExistsError` in
a Python 2/3 friendly way. All other `OSErrors` we raise as before.
@pep8speaks
Copy link

pep8speaks commented Feb 20, 2019

Hello @jmswaney! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-03-19 06:22:00 UTC

@jakirkham jakirkham mentioned this pull request Feb 20, 2019
@jakirkham jakirkham requested a review from alimanfoo February 26, 2019 08:23
@jakirkham
Copy link
Member

@alimanfoo, when you have a chance, could you please give this a quick look? Shouldn't take too long I hope. :)

As we now properly handle the existing file case properly, go ahead and
drop the generic exception case. This should now raise some errors that
were hidden previously.
@jakirkham jakirkham merged commit 579b683 into zarr-developers:master Mar 19, 2019
@jakirkham
Copy link
Member

Thanks @jmswaney!

@jakirkham
Copy link
Member

Have added the release note to PR ( #409 ). If there are any other issues, please let us know and we can follow-up on them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Race condition in storage.py for NestedDirectoryStore writes
4 participants