
feat: new blockwise for large data #707

Open · wants to merge 3 commits into main
Conversation

adebardo (Contributor)

Lots of work still to do, but we will use this PR as the basis for tomorrow's discussion.

@adebardo (Contributor Author)

[image attachment]

@rhugonnet (Member) commented Mar 29, 2025

Great! 🙂

I went through all the code, as we mentioned in our discussion yesterday. All good on the functionality. In terms of the structure and consistency with the rest of the package, here are my remarks (difficulty level of each change in parentheses):

  1. (Easy) As mentioned in the call, we should likely add plane_fit_ransac as an option of apply(), keeping the behaviour otherwise consistent with BlockwiseCoreg,
  2. (Easy-Moderate) I don't understand why reference_dem and to_be_aligned_dem need to be passed at __init__. The tiling_grid could be defined during fit(), as it only relates to reference_dem (and not to the coregistration method itself), and the profile could be saved there too?
  3. (Moderate) After reviewing the functions, it would be a bit of effort, but not as difficult as I anticipated, to integrate these changes into BlockwiseCoreg directly, and to trigger the out-of-memory behaviour by passing a MultiprocConfig to fit() and apply() (see the usage sketch just below).
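
To make 3. concrete, here is a minimal usage sketch of what triggering the out-of-memory path could look like. Everything here is assumed for illustration: the import path of MultiprocConfig, its constructor arguments, and the mp_config keyword are hypothetical, not the final API.

```python
# Hypothetical usage sketch only: the MultiprocConfig import path, its
# constructor arguments, and the mp_config keyword are assumptions.
import xdem
from geoutils.raster import MultiprocConfig  # assumed import path

reference_dem = xdem.DEM("reference_dem.tif")
to_be_aligned_dem = xdem.DEM("to_be_aligned_dem.tif")

# Tile size and output file for out-of-memory processing.
mp_config = MultiprocConfig(chunk_size=1024, outfile="aligned_dem.tif")

blockwise = xdem.coreg.BlockwiseCoreg(step=xdem.coreg.NuthKaab(), subdivision=16)

# Passing the config would trigger the out-of-memory behaviour; omitting it
# would keep the current in-memory behaviour.
blockwise.fit(reference_dem, to_be_aligned_dem, mp_config=mp_config)
aligned_dem = blockwise.apply(to_be_aligned_dem, mp_config=mp_config)
```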

For 3.: Why? I realized we actually don't need to mirror the full complexity of #525 for BlockwiseCoreg. This is thanks to BlockwiseCoreg having its own fit()/apply() functions independent of Coreg.fit()/apply().
As BlockwiseCoreg splits the input array very early on (and once we have blocks, we don't need any memory considerations), we don't need as many complex out-of-memory processing steps during the rest of the coregistration as we do for the generic Coreg.fit()/apply() steps in #525. For instance, ignoring the preprocessing, coreg.NuthKaab() still requires out-of-memory subsampling, then out-of-memory interpolation that re-runs at each iteration of the algorithm, depending on the shift of that iteration.
Here, we'd only need two simple, one-time, out-of-memory steps at the very beginning, and that's it! Those are:

  • Reprojecting the to-be-aligned DEM onto the reference grid, written to disk as the "reprojected_dem" file,
  • Creating the "inlier_mask", also written to disk.

In short: we could pass the MultiprocConfig down to the preprocess_fit/apply functions, and define filenames in MultiprocConfig for these two "break points" (which will create a file for the "reprojected_dem" and for the "inlier_mask"). Then we just continue the rest of the coregistration with these two files!
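
As a rough, self-contained sketch of that idea (none of the names below are the real xdem/geoutils API, and the reprojection is simplified to a plain tile-by-tile copy), the preprocessing would write the two break-point files block-by-block and hand the rest of the coregistration their paths:

```python
# Hypothetical sketch of the two one-time out-of-memory "break points".
# All names are illustrative, not the real xdem/geoutils API.
from dataclasses import dataclass

import rasterio
from rasterio.windows import Window


@dataclass
class BreakPointConfig:
    """Stand-in for the filenames a MultiprocConfig could carry."""
    chunk_size: int
    reprojected_dem_path: str
    inlier_mask_path: str


def _write_blockwise(src_path: str, dst_path: str, chunk_size: int) -> None:
    """Copy a raster to disk tile-by-tile, never holding it fully in memory."""
    with rasterio.open(src_path) as src, \
            rasterio.open(dst_path, "w", **src.profile) as dst:
        for row in range(0, src.height, chunk_size):
            for col in range(0, src.width, chunk_size):
                win = Window(col, row,
                             min(chunk_size, src.width - col),
                             min(chunk_size, src.height - row))
                dst.write(src.read(window=win), window=win)


def preprocess_blockwise(tba_dem_path: str, mask_path: str,
                         config: BreakPointConfig) -> tuple[str, str]:
    """Break point 1: reproject the to-be-aligned DEM onto the reference grid
    (simplified here to a plain copy). Break point 2: write the inlier mask.
    The rest of the coregistration then only reads blocks from these files."""
    _write_blockwise(tba_dem_path, config.reprojected_dem_path, config.chunk_size)
    _write_blockwise(mask_path, config.inlier_mask_path, config.chunk_size)
    return config.reprojected_dem_path, config.inlier_mask_path
```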

So, in short, I think 1-2. shouldn't be a problem.
For 3., it's a bit more effort but not as much as initially thought, and would be ideal to avoid splitting into two classes. I'm happy to help out on this to finalize it in time 😉

Finally, I also have one question: is there any core difference between our interp_points() and resample_dem()?
We have a grid-resample wrapper based on interp_points() (tested to match GDAL.warp exactly within the same CRS), which would be easy to run with multiprocessing:

`_reproject_horizontal_shift_samecrs()`

We generally prefer to use this interp_points() implementation for consistency (see the usage sketch after this list), because of:

  • Its integrated nodata propagation scheme, which depends on the resampling method, to avoid unexpected outliers,
  • Its understanding of pixel interpretation (center or corner of pixel), to avoid unexpected shifts,
  • It being the resampling used by the coregistration algorithms, which ensures the resulting DEM alignment is fully consistent (otherwise, with a different resampling here, re-running the coregistration algorithm would likely find another shift again instead of (0,0,0)).
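
For reference, a minimal usage sketch of interp_points() (the exact signature may vary between geoutils versions, and the file path is a placeholder):

```python
# Minimal sketch: sample a DEM at arbitrary points with interp_points().
# The exact signature may differ between geoutils versions.
import numpy as np
import geoutils as gu

dem = gu.Raster("dem.tif")  # placeholder path

# Points given in the raster's CRS; interp_points propagates nodata and
# accounts for pixel interpretation (center vs corner), as described above.
x = np.array([500000.0, 500100.0])
y = np.array([4500000.0, 4500050.0])
values = dem.interp_points((x, y), method="linear")
```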

Some of the issues I mention above sound like the ones you described in the call, so this might be part of the solution.
In terms of performance of our resampling: the default "bilinear" sampling relies on scipy.ndimage.map_coordinates (to harness the regular-grid nature of a raster, where the X and Y axes each have uniform spacing), and is therefore much faster than Xarray/SciPy's interpn (which works on a rectilinear grid, where the X and Y coordinates can be arbitrary).
It's likely not as fast as cars-resample, but shouldn't be too far off 😉
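
To illustrate why the regular-grid assumption helps (a simplified sketch, not the actual interp_points implementation): on a regular grid, map coordinates convert to fractional pixel indices through a simple affine transform, so scipy.ndimage.map_coordinates can interpolate directly, without the per-axis coordinate lookups that interpn needs for a rectilinear grid.

```python
# Simplified sketch (not the xdem/geoutils implementation): bilinear sampling
# on a regular grid via scipy.ndimage.map_coordinates.
import numpy as np
from scipy.ndimage import map_coordinates


def bilinear_sample(arr, x, y, x0, y0, dx, dy):
    """Sample `arr` at map coordinates (x, y) on a regular grid whose
    top-left pixel center is (x0, y0) with pixel sizes (dx, dy)."""
    # Converting map coordinates to fractional pixel indices is a
    # constant-time affine transform per point on a regular grid.
    cols = (x - x0) / dx
    rows = (y - y0) / dy
    # order=1 selects bilinear interpolation.
    return map_coordinates(arr, [rows, cols], order=1, mode="nearest")


# Tiny example: 4x4 grid, origin (0, 0), 1 m pixels.
grid = np.arange(16, dtype=float).reshape(4, 4)
print(bilinear_sample(grid, np.array([1.5]), np.array([2.5]), 0.0, 0.0, 1.0, 1.0))
# -> [11.5]
```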
