-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Provide protocols for creating structural subtypes of DataArray/Dataset #6462
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
I can't comment on the implementation, but this is certainly a feature I would appreciate. I have been using xarray a lot recently, and have found myself frequently writing functions that map Datasets to Datasets. The ability to encode the expected schemas in type annotations would be very valuable for correctness and documentation. |
Sorry not to respond for a few days. I think this is a good idea if there's demand for it. It would require a small amount of maintenance as the dataset implementation changes, but very manageable. FWIW I frequently find myself having a ds being passing around in an app, and it would be awesome to be able to define and statically check the data variables. (We've discussed also being able to define and statically check the dimensions, that's a separate issue though) |
I'm considering leveraging the power of xarray to greatly simplify a codebase that has its own types that essentially implement a very poor version of xarray's functionality. However to be able to justify integrating it into a large codebase with multiple developers, type hints for linting, autocomplete, and (possibly) static type checking are completely non-optional. Adding this functionality to xarray would make it a shoo-in, and I believe the approach suggested by @rsokl is probably the best. |
Does this package fulfill this requirement? |
This looked promising at a glance, but I think they have gotten the design wrong in very bad ways. Consider the following: from dataclasses import dataclass
from xarray_dataclasses import AsDataArra
@dataclass
class Image(AsDataArray):
"""2D image as DataArray."""
...
def process_image(x: Image):
... If you use img = Image.ones((3, 3))
process_image(img) # pyright error: "DataArray" is incompatible with "Image" "No problem, I'll just >>> isinstance(img, Image)
False WHAT!? So... we have this nice, structured Also the very first example in their readme doesn't type check: Ultimately |
Is your feature request related to a problem?
I frequently find myself wanting to annotate functions in terms of xarray objects that adhere to a particular schema. Given that a dataset's adherence to a schema is a matter of its structure/contents, it is unnatural to try to describe a schema as a subtype of
xr.Dataset
(orDataArray
) (i.e. a type-checker ought not care that a dataset is an instance of a specific subclass ofDataset
).Describe the solution you'd like
Instead, it would be ideal to define a schema as a Protocol (structural subtype) of
xr.Dataset
. Unfortunately, one cannot subclass a normal class to create a protocol.Thus, I am proposing that
xarray
provide Protocol-based descriptions ofDataArray
andDataset
so that users can describe schemas as structural subtypes of these classes. E.g.The contents of
DatasetProtocol
would essentially look like a modified type stub forxarray.Dataset
so the implementation details are relatively simple, I believe.Describe alternatives you've considered
Creating a strict subtype of
Dataset
is not ideal for a few reasons:xarray.Dataset
is too broad for describing a schema. E.g. the presence of__getattr__
prevents type checkers from flagging access to non-existent data variables and coordinates during static analysis.DatasetProtocol
would need to be designed to be less permissive than this.Additional context
Hopefully this could be leveraged by the likes of xarray-schema so that xarray schemas can be used to provide both runtime and static validation capabilities.
I'd love to get feedback on this, and would be happy to open a PR if xarray devs are willing to weigh in on the design of these protocols.
The text was updated successfully, but these errors were encountered: