-
Notifications
You must be signed in to change notification settings - Fork 88
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Expose sample on the rioxarray accessor #800
Comments
I would be interested to know if this method is any faster. If it is, that may enable us to add def sample(self, *args, **kwargs):
try:
return self._manager.acquire().sample(*args, **kwargs)
except AttributeError:
pass
with MemoryFile() as memfile:
self._xds.rio.to_raster(memfile.name)
with memfile.open() as dst:
return dst.sample(*args, **kwargs) |
Using the DTM from the notebook above, the trip via It seems that exposing rasterio's sample is not a feasible option. Thanks anyway! |
These observations are also useful to be aware of: xarray-contrib/xvec#81 (comment) Thanks for the proposal! |
I just looked into this a little bit, and notice that selecting points via xarray first loads the all pixels a buffered envelope around all the points, rather than the import logging
import rasterio
import xarray as xr
logging.basicConfig(level=logging.DEBUG,
handlers=[logging.StreamHandler()]
)
# Size is 1296001, 417601
url = 'https://opentopography.s3.sdsc.edu/raster/SRTM_GL1_Ellip/SRTM_GL1_Ellip_srtm.vrt'
ENV = rasterio.Env(GDAL_DISABLE_READDIR_ON_OPEN='EMPTY_DIR',
CPL_DEBUG=True,
)
# Sample a few points spread far apart
# https://docs.xarray.dev/en/latest/user-guide/interpolation.html#advanced-interpolation
points = [(-122.33, 47.60), (-74.00, 40.71), (-95.37, 29.76)]
xi = xr.DataArray([p[0] for p in points], dims="z")
yi = xr.DataArray([p[1] for p in points], dims="z")
with rasterio.open(url) as src:
samples_rasterio = list(rasterio.sample.sample_gen(src, points))
# IO window xoff=304668.0 yoff=108864.0 width=1.0 height=1.0
# IO window xoff=381600.0 yoff=69444.0 width=1.0 height=1.0
# IO window xoff=304668.0 yoff=108864.0 width=1.0 height=1.0
# NOTE: xarray tries to load a large buffered window arround all points
# This will likely crash, requiring too much memory
daOT = xr.open_dataarray(url, engine='rasterio').squeeze()
samples_xarray = daOT.sel(x=xi, y=yi, method='nearest').compute()
# IO window xoff=207612.0 yoff=44640.0 width=173989.0 height=64225.0 @benbovy, do you know if there is a way to make xarray only sample neighborhoods around points? Or is this behavior unique to rioxarray? I admit I haven't looked at the code close enough to know where that envelope window is coming from... |
I think it would be nice to have a built-in function to make this simple and efficient with rioxarray (or xvec) as it's a common use-case. The workaround I'm currently using is to compute windows ahead of time and use # Loop over custom windows
# NOTE: could be various sizes to enable bilinear interpolation etc
# e.g. https://github.com/kylebarron/demquery/blob/5871ad22541c7645b456a8fb998dc2be57b5ef91/demquery/demquery.py#L43
from rasterio.windows import Window
values = []
with rasterio.open(url) as src:
for point in points:
row, col = src.index(point[0], point[1])
window = Window(col, row, 1, 1)
values.append(daOT.rio.isel_window(window).to_numpy()) Edit: actually, it's more straightforward to just loop & stick with da.sel (single points trigger IO windows with width=1.0 height=1.0): pts_da = np.zeros_like(xi)
for i,p in enumerate(points):
pts_da[i] = dtm_da.sel(x=p[0], y=p[1], method="nearest").squeeze().to_numpy() |
This seems to be handled by Line 291 in 4745aec
More specifically Maybe |
Line 457 in 4745aec
You'll have to chnage this to |
Is it (already) recommended for 3rd-party backend array classes to migrate to |
no anderson and my work is pending a review from Stephan :( |
I was recently comparing the performance of point sampling from a raster in xvec (xarray-contrib/xvec#81) and learned that when using xarray's
sel
, the sampling is about 40x slower than if I use rasterio'ssample
method (see the notebook in the linked issue).I can possibly use
sample
from the rasterio object available within therio._manager
aslist(dtm_da.rio._manager.acquire().sample(list(zip(x, y))))
but that is using a private API of rioxarray.Would it be in scope to expose
sample
directly on therio
accessor? We could then consume it in xvec when dealing with rioxarray-backed DataArrays.The text was updated successfully, but these errors were encountered: