Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add raster index #846

Open
wants to merge 13 commits into
base: master
Choose a base branch
from
Open

Add raster index #846

wants to merge 13 commits into from

Conversation

benbovy
Copy link

@benbovy benbovy commented Feb 20, 2025

  • Tests added
  • Fully documented, including docs/history.rst for all changes and docs/rioxarray.rst for new API

This work in progress PR adds a custom RasterIndex Xarray index that is built on top of pydata/xarray#9543 (available in Xarray's main branch).

  • Opt-in feature via rioxarray.set_option(use_raster_index=True)
    • Alternatively or in addition we could expose this option as an IO backend option. The global option is handy for the other places where new spatial coordinates are generated (e.g., reproject).
  • RasterIndex should work with both x/y independent axis 1D coordinates (rectilinear affine transform with no rotation) vs. x/y dependent axis 2D coordinates. In the latter case data selection is more limited.
  • The spatial coordinates freshly generated with a RasterIndex are lazy (in both 1D and 2D cases): they wrap the affine transform.
  • There are still quite a few things to implement and test.
  • I skipped implementing CRS-aware functionality here (Experimentally support CRSIndex #588) as I'm focusing on the coordinate transform (also this PR is already getting big), but I guess this could be added later on top of this.

TODO on the Xarray Index API implementation (could also be done after this PR, if needed):

  • RasterIndex.from_variables: compute the affine transformation from explicit coordinate values, although maybe not very useful?
  • RasterIndex.concat: possible to compute a new affine transformations by combining the transformations of multiple raster indexes?
  • RasterIndex.join: like for concat we could perhaps combine the transformations in some simple cases (check that the two affine transforms are compatible)?
  • RasterIndex.reindex_like: maybe this could be implemented using a forward transformation of the current index chained with a reverse transformation of the other index?
  • RasterIndex.rename: must propagate the new coordinate and/or dimension names to the underlying AffineTransform objects!
  • RasterIndex.copy (not sure we need to override Index.copy)
  • RasterIndex.__getitem__ (not sure we really need to implement it)

Here is an example notebook that I'll update as I continue working on this PR.

There are a few issues that we need to fix in Xarray so this PR can work properly (I've added a few temporary workarounds when possible):

cc @scottyhq @dcherian @maxrjones

Based on coordinate transform examples copied and adapted from
pydata/xarray#9543.
Use a mapping of coordinate name(s) to underlying index(es) behind the
hood.
I encountered pre-commit/pre-commit#1375 as I
didn't have python 3.10 installed in my environment so pre-commit
couldn't find it.

I applied this suggestion:
pre-commit/pre-commit#1375 (comment).
Create a RasterIndex if option use_raster_index is True.
return False

return all(
index.equals(other._wrapped_indexes[k]) # type: ignore[arg-type]
Copy link
Author

@benbovy benbovy Feb 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to relax the typing constraints of xarray.core.indexes.CoordinateTransformIndex.equals() (the other argument should be any xarray.Index) so we can remove the # type: ignore here.

from xarray.core.indexing import IndexSelResult, merge_sel_results


class AffineTransform(CoordinateTransform):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the Affine* classes should live in Xarray. They are useful for both geo and bio-imaging domains at least (and possibly astro too).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I agree these are useful in many domains! The wrapped affine.Affine object is here restricted to 2D affine transformation of the plane so it might be limited for general usage, though. These classes are also thin wrappers.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are "thin" in code but 'deep' in understanding of the Xarray model. That's why I think we should vendor it. Even so, vendoring will save a bunch of your time as you iterate on this since you wouldn't need to keep code in two places in-sync.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe the whole content of raster_index.py including the RasterIndex class could (eventually) be moved into Xarray or some dedicated package? There's nothing IO specific and it could be useful in other downstream packages like odc-geo, geoxarray, etc.

self.axis_transform = transform
self.dim = transform.dim

def isel( # type: ignore[override]
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again here we should make the return type of xarray.core.Index.isel() more permissive (i.e., any Index object or None) so we can remove the # type: ignore comment.

Comment on lines +377 to +382
if new_indexes:
# TODO: if there's only a single PandasIndex can we just return it?
# (maybe better to keep it wrapped if we plan to later make RasterIndex CRS-aware)
return RasterIndex(new_indexes)
else:
return None
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When it is possible, it would be convenient to return a PandasIndex instead of a RasterIndex for reusing the selection results in further Xarray operations that require alignment with other Xarray objects. The reason is that Xarray alignment is based on strict Index types, e.g., it cannot compare or align a RasterIndex (encapsulating a single PandasIndex) with another PandasIndex.

However, we can only return a PandasIndex in a limited number of cases, e.g.,

  • it is possible for da.isel(x=[...], y=0) (or da.isel(x=0, y=[...])) to return a "x" (or "y") coordinate with a PandasIndex and to drop the index for the "y" (or "x") scalar coordinate.
  • it is not possible for da.isel(x=[...], y=[...]) to return both "x" and "y" coordinates each with a PandasIndex. This is because xarray.Index.isel() doesn't currently allow returning multiple index objects.

Returning a PandasIndex in some cases and a RasterIndex in other cases would make the behavior complex and pretty confusing.

Alternatively, we could always return a RasterIndex for both "x" and "y" coordinates even when selecting it with scalar values. This would provide simple behavior and might facilitate CRS-aware operations even with scalar spatial coordinates, but this would be even less Xarray-alignment friendly.

So in short: it is not straightforward to choose the right behavior here. It is a trade-off.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants