-
-
Notifications
You must be signed in to change notification settings - Fork 330
groundwork for V3 group tests #1743
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…and add some docstrings, and remove redundant def
…ocalStore.get_partial_values to properly handle the byte_range parameter of LocalStore.get.
Some of my group tests are running into bugs from the storage side. So I'm broadening the scope of this effort to include tests for the stores. |
Hello @d-v-b! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
Comment last updated at 2024-05-10 08:58:14 UTC |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't get to reviewing the tests yet but this is looking good.
src/zarr/v3/group.py
Outdated
# if `key` names an object in storage, it cannot be an array or group | ||
if await store_path.exists(): | ||
raise KeyError(key) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should include this check. This adds a roundtrip to the store to protect against a situation I have a hard time imaging being a real issue.
For one, most storage backends forbid such a situation but I don't think the spec really cares.
/store/a/b/c
/store/a/b/c/zarr.json
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we don't have this check and we are on v3, then zarr_json_bytes = await (store_path / ZARR_JSON).get()
will be None
for any object, which will result in an implicit group. I don't think we want this. I'm actually thinking that implicit groups might be more trouble than they are worth.
In fact, getitem
will never raise a KeyError
for zarr v3, since any prefix is Group
| Array
, thanks to implicit groups.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I opened zarr-developers/zarr-specs#291 to discuss removing implicit groups from the spec. the PR is at zarr-developers/zarr-specs#292
src/zarr/v3/group.py
Outdated
# would be nice to make these special keys accessible programmatically, | ||
# and scoped to specific zarr versions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please rework this comment to explain what the subkeys_filtered variable is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You suggestion to move the tuple below is a good one though.
src/zarr/v3/group.py
Outdated
subkeys = await self.store_path.store.list_dir(self.store_path.path) | ||
# would be nice to make these special keys accessible programmatically, | ||
# and scoped to specific zarr versions | ||
subkeys_filtered = filter(lambda v: v not in ("zarr.json", ".zgroup", ".zattrs"), subkeys) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still confused here. Don't we want to be looking for .zarray
to find v2 arrays?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if foo/bar/
is the prefix for a group, then we might expect to find either v2 group stuff (foo/bar/.zattrs
, foo/bar/.zgroup
) or v3 group stuff (foo/bar/zarr.json
), but foo/bar/.zarray
would be unexpected, since we know foo/bar
is a group, and the same path can't correspond to both a group and an array.
src/zarr/v3/group.py
Outdated
) | ||
|
||
raise ValueError(msg) | ||
subkeys = await self.store_path.store.list_dir(self.store_path.path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the ideal async way to do this would be for list_dir
to return an AsyncGenerator
. Then your for loop below would look something like:
async for key in subkeys:
# filter
if key in ("zarr.json", ".zgroup", ".zattrs"):
...
yield await self.getitem(subkey)
...
…(name, AsyncArray / AsyncGroup) pairs; Group.members repackages these into a dict.
…or group, largely to appease mypy
def _parse_async_node(node: AsyncGroup) -> Group: ... | ||
|
||
|
||
def _parse_async_node(node: AsyncArray | AsyncGroup) -> Array | Group: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this exists as something that can be used with map
over an iterable of AsyncArray
/ AsyncGroup
to transform each one to its synchronous counterpart. Previously we used a tuple around a generator expression, but mypy couldn't type check this correctly.
return tuple(ret) | ||
_members = self._sync_iter(self._async_group.members()) | ||
|
||
result = tuple(map(lambda kv: (kv[0], _parse_async_node(kv[1])), _members)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -88,4 +88,4 @@ async def list_dir(self, prefix: str) -> AsyncGenerator[str, None]: | |||
else: | |||
for key in self._store_dict: | |||
if key.startswith(prefix + "/") and key != prefix: | |||
yield key.strip(prefix + "/").split("/")[0] | |||
yield key.removeprefix(prefix + "/").split("/")[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this fixed a subtle bug in the list_dir
method of MemoryStore
. str.strip
treats its argument as a set of characters to remove, not a substring / prefix. e.g.,
>>> 'a/b/a/b/b/x'.strip('a/b')
'x'
vs
>>> 'a/b/a/b/b/x'.removeprefix('a/b')
'/a/b/b/x'
I think this is ready to go |
) | ||
|
||
async def create_array(self, path: str, **kwargs) -> AsyncArray: | ||
async def create_array( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
group.create_group
and group.create_array
should preserve the zarr format of the group
, which means we can't accept **kwargs
in these methods without checking the contents of **kwargs for the absence of zarr_format
. Since I'm not a huge fan of **kwargs
anyway, I just elevated the underlying function signatures to this method, sans zarr_format
, which is set automatically based on the zarr format of the group
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to redo that in #1857
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking great @d-v-b. A few small comments for you.
src/zarr/fixture/.zgroup
Outdated
@@ -0,0 +1,3 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this was added in error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah right, this is the zombie fixture directory that gets created by tests, see #1813
tests/v3/test_storage.py
Outdated
@@ -1,10 +1,802 @@ | |||
# import array |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe a bad merge conflict resolution here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's a good guess, honestly I have no idea what happened there. it should be fixed now.
one large, but minor (or petty) change in the latest version of this effort: I renamed |
Adds tests for the Group API in v3. In progress. I will remove the in progress label when it's ready for a final review, but I encourage people with an interest in the testing infrastructure to look at the early changes and weigh in if you see anything you don't like. It will be easier to course-correct before a lot of code is written.
To outline the strategy I'm using:
This is a deliberate departure from the strategy used in the v2 tests, where test classes are defined that inherit from a base test class, potentially overriding methods. I'm pretty sure we can use pytest fixtures to express the same functionality without needing the cognitive overhead of inheritance, but this is my personal judgment, and I'm open to other ideas here. The goal is just to have a test suite that's easy for developers to understand, which (in my experience) is not a criterion the v2 tests always meet.
I don't think I will fix failing tests here. That should probably done in separate PRs, once we agree on the shape of the testing infrastructure.edit: amendment to this: since test failures do slow down the creation of new tests, I'm going to apply some fixes to obvious bugs revealed in the process of adding tests. These fixes should be separable in the commit history, and on the basis of the files they affect.depends on #1726
TODO: