Skip to content

groundwork for V3 group tests #1743

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 63 commits into from
May 15, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
63 commits
Select commit Hold shift + click to select a range
e492be2
feat: functional .children method for groups
d-v-b Mar 26, 2024
b7f66c7
changes necessary for correctly generating list of children
d-v-b Mar 27, 2024
c7b333a
add stand-alone test for group.children
d-v-b Mar 27, 2024
a64b342
give type hints a glow-up
d-v-b Mar 27, 2024
3d11fc0
test: use separate assert statements to avoid platform-dependent orde…
d-v-b Mar 27, 2024
cf34afc
test: put fixtures in conftest, add MemoryStore fixture
d-v-b Mar 27, 2024
16cb226
docs: release notes
d-v-b Mar 27, 2024
b28eaee
test: remove prematurely-added mock s3 fixture
d-v-b Mar 30, 2024
0215b97
chore: move v3 tests into v3 folder
d-v-b Apr 4, 2024
e33ca6e
chore: type hints
d-v-b Apr 4, 2024
f237172
test: add schema for group method tests
d-v-b Apr 4, 2024
4605446
chore: add type for zarr_formats
d-v-b Apr 4, 2024
6e5ded9
chore: remove localstore for now
d-v-b Apr 4, 2024
f81b6f3
test: add __init__.py to support imports from top-level conftest.py, …
d-v-b Apr 4, 2024
c768ca8
fix: return valid JSON from GroupMetadata.to_bytes for v2 metadata
d-v-b Apr 4, 2024
c405506
fix: don't use a type as a value
d-v-b Apr 4, 2024
5b96554
test: add getitem test
d-v-b Apr 4, 2024
d77d55f
fix: replace reference to nonexistent method in with , which does e…
d-v-b Apr 5, 2024
05f89c6
test: declare v3ness via directory structure, not test file name
d-v-b Apr 5, 2024
b2f9bee
add a docstring to _get, and pass auto_mkdir to _put
d-v-b Apr 5, 2024
88f4c46
fix: add docstring to LocalStore.get_partial_values; adjust body of L…
d-v-b Apr 5, 2024
624ff77
test: add tests for localstore init, set, get, get_partial
d-v-b Apr 5, 2024
b762fa4
fix: Rename children to members; AsyncGroup.members yields tuples of …
d-v-b Apr 11, 2024
5574226
fix: make Group.members return a tuple of str, Array | Group pairs
d-v-b Apr 11, 2024
bcd5c7d
Merge branch 'v3' of github.com:zarr-developers/zarr-python into grou…
d-v-b Apr 11, 2024
d634cbf
fix: revert changes to synchronization code; this is churn that we ne…
d-v-b Apr 11, 2024
9b9d146
Merge branch 'v3' into group_children_fix
d-v-b Apr 12, 2024
d264f71
chore: move v3 tests into v3 folder
d-v-b Apr 4, 2024
0741bed
chore: type hints
d-v-b Apr 12, 2024
eb8a535
test: add schema for group method tests
d-v-b Apr 4, 2024
ee2e233
chore: add type for zarr_formats
d-v-b Apr 4, 2024
01eec6f
chore: remove localstore for now
d-v-b Apr 4, 2024
acae77a
test: add __init__.py to support imports from top-level conftest.py, …
d-v-b Apr 4, 2024
ebe1548
fix: return valid JSON from GroupMetadata.to_bytes for v2 metadata
d-v-b Apr 4, 2024
3dce5e3
fix: don't use a type as a value
d-v-b Apr 4, 2024
7f82fdf
test: add getitem test
d-v-b Apr 4, 2024
1655ff8
fix: replace reference to nonexistent method in with , which does e…
d-v-b Apr 5, 2024
e8514b1
test: declare v3ness via directory structure, not test file name
d-v-b Apr 5, 2024
dacacc8
add a docstring to _get, and pass auto_mkdir to _put
d-v-b Apr 5, 2024
5d2a532
fix: add docstring to LocalStore.get_partial_values; adjust body of L…
d-v-b Apr 5, 2024
06d8b04
test: add tests for localstore init, set, get, get_partial
d-v-b Apr 5, 2024
d8749de
fix: remove pre-emptive fetching from group.open
d-v-b Apr 16, 2024
a2fe4e0
Merge branch 'v3_group_tests' of github.com:d-v-b/zarr-python into v3…
d-v-b Apr 16, 2024
eed03c8
fix: use removeprefix (removes a substring) instead of strip (removes…
d-v-b Apr 16, 2024
75f75b1
xfail v2 tests that are sure to fail; add delitem tests; partition xf…
d-v-b Apr 16, 2024
8a14e3b
fix: handle byte_range[0] being None
d-v-b Apr 16, 2024
459bb42
fix: adjust test for localstore.get to check that get on nonexistent …
d-v-b Apr 16, 2024
8ef3fec
fix: add zarr_format parameter to array creation routines (which rais…
d-v-b Apr 17, 2024
b5a7698
test: add group init test
d-v-b Apr 17, 2024
49f1505
feature(store): make list_* methods async generators (#110)
jhamman Apr 19, 2024
c62e686
Merge branch 'v3' of github.com:zarr-developers/zarr-python into v3_g…
d-v-b May 8, 2024
b1767ce
Merge branch 'v3' of github.com:zarr-developers/zarr-python into v3_g…
d-v-b May 10, 2024
8e61ff9
Merge branch 'v3' of github.com:zarr-developers/zarr-python into v3_g…
d-v-b May 10, 2024
b996aff
fix: define utility for converting asyncarray to array, and similar f…
d-v-b May 10, 2024
923fc18
chore: remove checks that only existed because of implicit groups
d-v-b May 10, 2024
7c6cfb4
chore: clean up docstring and modernize some type hints
d-v-b May 10, 2024
17997f8
chore: move imports to top-level
d-v-b May 10, 2024
1140b5c
Merge branch 'v3' of github.com:zarr-developers/zarr-python into v3_g…
d-v-b May 14, 2024
22c0889
remove fixture files
d-v-b May 15, 2024
57f564f
remove commented imports
d-v-b May 15, 2024
a821808
remove explicit asyncio marks; use __eq__ method of LocalStore for test
d-v-b May 15, 2024
6803f26
rename test_storage to test_store
d-v-b May 15, 2024
d236b24
modern type hints
d-v-b May 15, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion src/zarr/abc/store.py
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
from abc import abstractmethod, ABC

from collections.abc import AsyncGenerator

from typing import List, Tuple, Optional


Expand Down
2 changes: 2 additions & 0 deletions src/zarr/array.py
Original file line number Diff line number Diff line change
Expand Up @@ -28,6 +28,7 @@
ChunkCoords,
Selection,
SliceSelection,
ZarrFormat,
concurrent_map,
)
from zarr.config import config
Expand Down Expand Up @@ -89,6 +90,7 @@ async def create(
dimension_names: Optional[Iterable[str]] = None,
attributes: Optional[Dict[str, Any]] = None,
exists_ok: bool = False,
zarr_format: ZarrFormat = 3,
) -> AsyncArray:
store_path = make_store_path(store)
if not exists_ok:
Expand Down
29 changes: 14 additions & 15 deletions src/zarr/common.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,8 +5,6 @@
Union,
Tuple,
Iterable,
Dict,
List,
TypeVar,
overload,
Any,
Expand All @@ -18,7 +16,7 @@
import functools

if TYPE_CHECKING:
from typing import Any, Awaitable, Callable, Iterator, Optional, Type
from typing import Awaitable, Callable, Iterator, Optional, Type

import numpy as np

Expand All @@ -27,25 +25,26 @@
ZGROUP_JSON = ".zgroup"
ZATTRS_JSON = ".zattrs"

BytesLike = Union[bytes, bytearray, memoryview]
ChunkCoords = Tuple[int, ...]
BytesLike = bytes | bytearray | memoryview
ChunkCoords = tuple[int, ...]
ChunkCoordsLike = Iterable[int]
SliceSelection = Tuple[slice, ...]
Selection = Union[slice, SliceSelection]
JSON = Union[str, None, int, float, Enum, Dict[str, "JSON"], List["JSON"], Tuple["JSON", ...]]
SliceSelection = tuple[slice, ...]
Selection = slice | SliceSelection
ZarrFormat = Literal[2, 3]
JSON = Union[str, None, int, float, Enum, dict[str, "JSON"], list["JSON"], tuple["JSON", ...]]


def product(tup: ChunkCoords) -> int:
return functools.reduce(lambda x, y: x * y, tup, 1)


T = TypeVar("T", bound=Tuple[Any, ...])
T = TypeVar("T", bound=tuple[Any, ...])
V = TypeVar("V")


async def concurrent_map(
items: List[T], func: Callable[..., Awaitable[V]], limit: Optional[int] = None
) -> List[V]:
items: list[T], func: Callable[..., Awaitable[V]], limit: Optional[int] = None
) -> list[V]:
if limit is None:
return await asyncio.gather(*[func(*item) for item in items])

Expand Down Expand Up @@ -127,18 +126,18 @@ def parse_configuration(data: JSON) -> JSON:
@overload
def parse_named_configuration(
data: JSON, expected_name: Optional[str] = None
) -> Tuple[str, Dict[str, JSON]]: ...
) -> tuple[str, dict[str, JSON]]: ...


@overload
def parse_named_configuration(
data: JSON, expected_name: Optional[str] = None, *, require_configuration: bool = True
) -> Tuple[str, Optional[Dict[str, JSON]]]: ...
) -> tuple[str, Optional[dict[str, JSON]]]: ...


def parse_named_configuration(
data: JSON, expected_name: Optional[str] = None, *, require_configuration: bool = True
) -> Tuple[str, Optional[JSON]]:
) -> tuple[str, Optional[JSON]]:
if not isinstance(data, dict):
raise TypeError(f"Expected dict, got {type(data)}")
if "name" not in data:
Expand All @@ -153,7 +152,7 @@ def parse_named_configuration(
return name_parsed, configuration_parsed


def parse_shapelike(data: Any) -> Tuple[int, ...]:
def parse_shapelike(data: Any) -> tuple[int, ...]:
if not isinstance(data, Iterable):
raise TypeError(f"Expected an iterable. Got {data} instead.")
data_tuple = tuple(data)
Expand Down
95 changes: 65 additions & 30 deletions src/zarr/group.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,21 +5,19 @@
import asyncio
import json
import logging
import numpy.typing as npt

if TYPE_CHECKING:
from typing import (
Any,
AsyncGenerator,
Literal,
AsyncIterator,
)
from typing import Any, AsyncGenerator, Literal, Iterable
from zarr.abc.codec import Codec
from zarr.abc.metadata import Metadata

from zarr.array import AsyncArray, Array
from zarr.attributes import Attributes
from zarr.common import ZARR_JSON, ZARRAY_JSON, ZATTRS_JSON, ZGROUP_JSON
from zarr.common import ZARR_JSON, ZARRAY_JSON, ZATTRS_JSON, ZGROUP_JSON, ChunkCoords
from zarr.store import StoreLike, StorePath, make_store_path
from zarr.sync import SyncMixin, sync
from typing import overload

logger = logging.getLogger("zarr.group")

Expand All @@ -41,6 +39,26 @@ def parse_attributes(data: Any) -> dict[str, Any]:
raise TypeError(msg)


@overload
def _parse_async_node(node: AsyncArray) -> Array: ...


@overload
def _parse_async_node(node: AsyncGroup) -> Group: ...


def _parse_async_node(node: AsyncArray | AsyncGroup) -> Array | Group:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this exists as something that can be used with map over an iterable of AsyncArray / AsyncGroup to transform each one to its synchronous counterpart. Previously we used a tuple around a generator expression, but mypy couldn't type check this correctly.

"""
Wrap an AsyncArray in an Array, or an AsyncGroup in a Group.
"""
if isinstance(node, AsyncArray):
return Array(node)
elif isinstance(node, AsyncGroup):
return Group(node)
else:
assert False


@dataclass(frozen=True)
class GroupMetadata(Metadata):
attributes: dict[str, Any] = field(default_factory=dict)
Expand All @@ -53,7 +71,7 @@ def to_bytes(self) -> dict[str, bytes]:
return {ZARR_JSON: json.dumps(self.to_dict()).encode()}
else:
return {
ZGROUP_JSON: json.dumps({"zarr_format": 2}).encode(),
ZGROUP_JSON: json.dumps({"zarr_format": self.zarr_format}).encode(),
ZATTRS_JSON: json.dumps(self.attributes).encode(),
}

Expand Down Expand Up @@ -113,11 +131,11 @@ async def open(
(store_path / ZGROUP_JSON).get(), (store_path / ZATTRS_JSON).get()
)
if zgroup_bytes is None:
raise KeyError(store_path) # filenotfounderror?
raise FileNotFoundError(store_path)
elif zarr_format == 3:
zarr_json_bytes = await (store_path / ZARR_JSON).get()
if zarr_json_bytes is None:
raise KeyError(store_path) # filenotfounderror?
raise FileNotFoundError(store_path)
elif zarr_format is None:
zarr_json_bytes, zgroup_bytes, zattrs_bytes = await asyncio.gather(
(store_path / ZARR_JSON).get(),
Expand Down Expand Up @@ -168,17 +186,14 @@ async def getitem(
key: str,
) -> AsyncArray | AsyncGroup:
store_path = self.store_path / key
logger.warning("key=%s, store_path=%s", key, store_path)

# Note:
# in zarr-python v2, we first check if `key` references an Array, else if `key` references
# a group,using standalone `contains_array` and `contains_group` functions. These functions
# are reusable, but for v3 they would perform redundant I/O operations.
# Not clear how much of that strategy we want to keep here.

# if `key` names an object in storage, it cannot be an array or group
if await store_path.exists():
raise KeyError(key)

if self.metadata.zarr_format == 3:
zarr_json_bytes = await (store_path / ZARR_JSON).get()
if zarr_json_bytes is None:
Expand Down Expand Up @@ -248,16 +263,42 @@ def attrs(self):
def info(self):
return self.metadata.info

async def create_group(self, path: str, **kwargs) -> AsyncGroup:
async def create_group(
self, path: str, exists_ok: bool = False, attributes: dict[str, Any] = {}
) -> AsyncGroup:
return await type(self).create(
self.store_path / path,
**kwargs,
attributes=attributes,
exists_ok=exists_ok,
zarr_format=self.metadata.zarr_format,
)

async def create_array(self, path: str, **kwargs) -> AsyncArray:
async def create_array(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

group.create_group and group.create_array should preserve the zarr format of the group, which means we can't accept **kwargs in these methods without checking the contents of **kwargs for the absence of zarr_format. Since I'm not a huge fan of **kwargs anyway, I just elevated the underlying function signatures to this method, sans zarr_format, which is set automatically based on the zarr format of the group.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to redo that in #1857

self,
path: str,
shape: ChunkCoords,
dtype: npt.DTypeLike,
chunk_shape: ChunkCoords,
fill_value: Any | None = None,
chunk_key_encoding: tuple[Literal["default"], Literal[".", "/"]]
| tuple[Literal["v2"], Literal[".", "/"]] = ("default", "/"),
codecs: Iterable[Codec | dict[str, Any]] | None = None,
dimension_names: Iterable[str] | None = None,
attributes: dict[str, Any] | None = None,
exists_ok: bool = False,
) -> AsyncArray:
return await AsyncArray.create(
self.store_path / path,
**kwargs,
shape=shape,
dtype=dtype,
chunk_shape=chunk_shape,
fill_value=fill_value,
chunk_key_encoding=chunk_key_encoding,
codecs=codecs,
dimension_names=dimension_names,
attributes=attributes,
exists_ok=exists_ok,
zarr_format=self.metadata.zarr_format,
)

async def update_attributes(self, new_attributes: dict[str, Any]):
Expand Down Expand Up @@ -348,7 +389,7 @@ async def array_keys(self) -> AsyncGenerator[str, None]:
yield key

# todo: decide if this method should be separate from `array_keys`
async def arrays(self) -> AsyncIterator[AsyncArray]:
async def arrays(self) -> AsyncGenerator[AsyncArray, None]:
async for key, value in self.members():
if isinstance(value, AsyncArray):
yield value
Expand Down Expand Up @@ -472,19 +513,13 @@ def nmembers(self) -> int:
@property
def members(self) -> tuple[tuple[str, Array | Group], ...]:
"""
Return the sub-arrays and sub-groups of this group as a `tuple` of (name, array | group)
Return the sub-arrays and sub-groups of this group as a tuple of (name, array | group)
pairs
"""
_members: list[tuple[str, AsyncArray | AsyncGroup]] = self._sync_iter(
self._async_group.members()
)
ret: list[tuple[str, Array | Group]] = []
for key, value in _members:
if isinstance(value, AsyncArray):
ret.append((key, Array(value)))
else:
ret.append((key, Group(value)))
return tuple(ret)
_members = self._sync_iter(self._async_group.members())

result = tuple(map(lambda kv: (kv[0], _parse_async_node(kv[1])), _members))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return result

def __contains__(self, member) -> bool:
return self._sync(self._async_group.contains(member))
Expand Down
23 changes: 12 additions & 11 deletions src/zarr/store/local.py
Original file line number Diff line number Diff line change
Expand Up @@ -4,20 +4,20 @@
import shutil
from collections.abc import AsyncGenerator
from pathlib import Path
from typing import Union, Optional, List, Tuple

from zarr.abc.store import Store
from zarr.common import BytesLike, concurrent_map, to_thread


def _get(path: Path, byte_range: Optional[Tuple[int, Optional[int]]] = None) -> bytes:
def _get(path: Path, byte_range: tuple[int, int | None] | None) -> bytes:
"""
Fetch a contiguous region of bytes from a file.

Parameters
----------
path: Path
The file to read bytes from.
byte_range: Optional[Tuple[int, Optional[int]]] = None
byte_range: tuple[int, int | None] | None = None
The range of bytes to read. If `byte_range` is `None`, then the entire file will be read.
If `byte_range` is a tuple, the first value specifies the index of the first byte to read,
and the second value specifies the total number of bytes to read. If the total value is
Expand Down Expand Up @@ -49,7 +49,7 @@ def _get(path: Path, byte_range: Optional[Tuple[int, Optional[int]]] = None) ->
def _put(
path: Path,
value: BytesLike,
start: Optional[int] = None,
start: int | None = None,
auto_mkdir: bool = True,
) -> int | None:
if auto_mkdir:
Expand All @@ -71,7 +71,7 @@ class LocalStore(Store):
root: Path
auto_mkdir: bool

def __init__(self, root: Union[Path, str], auto_mkdir: bool = True):
def __init__(self, root: Path | str, auto_mkdir: bool = True):
if isinstance(root, str):
root = Path(root)
assert isinstance(root, Path)
Expand All @@ -88,9 +88,7 @@ def __repr__(self) -> str:
def __eq__(self, other: object) -> bool:
return isinstance(other, type(self)) and self.root == other.root

async def get(
self, key: str, byte_range: Optional[Tuple[int, Optional[int]]] = None
) -> Optional[bytes]:
async def get(self, key: str, byte_range: tuple[int, int | None] | None = None) -> bytes | None:
assert isinstance(key, str)
path = self.root / key

Expand All @@ -100,8 +98,8 @@ async def get(
return None

async def get_partial_values(
self, key_ranges: List[Tuple[str, Tuple[int, int]]]
) -> List[Optional[bytes]]:
self, key_ranges: list[tuple[str, tuple[int, int]]]
) -> list[bytes | None]:
"""
Read byte ranges from multiple keys.
Parameters
Expand All @@ -124,7 +122,7 @@ async def set(self, key: str, value: BytesLike) -> None:
path = self.root / key
await to_thread(_put, path, value, auto_mkdir=self.auto_mkdir)

async def set_partial_values(self, key_start_values: List[Tuple[str, int, bytes]]) -> None:
async def set_partial_values(self, key_start_values: list[tuple[str, int, bytes]]) -> None:
args = []
for key, start, value in key_start_values:
assert isinstance(key, str)
Expand Down Expand Up @@ -169,6 +167,9 @@ async def list_prefix(self, prefix: str) -> AsyncGenerator[str, None]:
-------
AsyncGenerator[str, None]
"""
for p in (self.root / prefix).rglob("*"):
if p.is_file():
yield str(p)

to_strip = str(self.root) + "/"
for p in (self.root / prefix).rglob("*"):
Expand Down
2 changes: 1 addition & 1 deletion src/zarr/store/memory.py
Original file line number Diff line number Diff line change
Expand Up @@ -88,4 +88,4 @@ async def list_dir(self, prefix: str) -> AsyncGenerator[str, None]:
else:
for key in self._store_dict:
if key.startswith(prefix + "/") and key != prefix:
yield key.strip(prefix + "/").split("/")[0]
yield key.removeprefix(prefix + "/").split("/")[0]
Copy link
Contributor Author

@d-v-b d-v-b May 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this fixed a subtle bug in the list_dir method of MemoryStore. str.strip treats its argument as a set of characters to remove, not a substring / prefix. e.g.,

>>> 'a/b/a/b/b/x'.strip('a/b')
'x'

vs

>>> 'a/b/a/b/b/x'.removeprefix('a/b')
'/a/b/b/x'

2 changes: 1 addition & 1 deletion tests/v2/conftest.py
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
import pathlib
import pytest
import pathlib


@pytest.fixture(params=[str, pathlib.Path])
Expand Down
Empty file added tests/v3/__init__.py
Empty file.
Loading
Loading