-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Bug]: Getting data CF-compliant for regridding (with xesmf) #718
Comments
I see in xesmf docs that:
Updating this to False doesn't fix the issue. |
The following seems to be working as a work-around:
I think this might indicate that something changed with how xcdat passes information to xesmf (or xesmf changed). |
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
File ~/bin/anaconda3/envs/xcdat/lib/python3.12/site-packages/xesmf/frontend.py:75, in _get_lon_lat(ds)
74 try:
---> 75 lon = ds.cf['longitude']
76 lat = ds.cf['latitude'] I've isolated the code block in xESMF where the Also your workaround above might indicate that xCDAT is dropping or not copying over attributes to the lat/lon (which is a regression in behavior from previous versions I think?). |
Looking at the CF info via cf_xarray, both ngrid.cf
Coordinates:
CF Axes: * X: ['lon']
* Y: ['lat']
Z, T: n/a
CF Coordinates: * longitude: ['lon']
* latitude: ['lat']
vertical, time: n/a
Cell Measures: area, volume: n/a
Standard Names: n/a
Bounds: n/a
Grid Mappings: n/a
Data Variables:
Cell Measures: area, volume: n/a
Standard Names: n/a
Bounds: X: ['lon_bnds']
Y: ['lat_bnds']
lat: ['lat_bnds']
latitude: ['lat_bnds']
lon: ['lon_bnds']
longitude: ['lon_bnds']
Grid Mappings: n/a ds.cf
Coordinates:
CF Axes: X: ['lon', 'nlon']
Y: ['lat', 'nlat']
* T: ['time']
Z: n/a
CF Coordinates: longitude: ['lon']
latitude: ['lat']
* time: ['time']
vertical: n/a
Cell Measures: area, volume: n/a
Standard Names: latitude: ['lat']
longitude: ['lon']
* time: ['time']
Bounds: n/a
Grid Mappings: n/a
Data Variables:
Cell Measures: area, volume: n/a
Standard Names: surface_downward_mass_flux_of_carbon_dioxide_expressed_as_carbon: ['fgco2']
Bounds: T: ['time_bnds']
X: ['lon_bnds', 'nlon_bnds']
Y: ['lat_bnds']
lat: ['lat_bnds']
latitude: ['lat_bnds']
lon: ['lon_bnds']
longitude: ['lon_bnds']
nlon: ['nlon_bnds']
time: ['time_bnds']
Grid Mappings: n/a |
@tomvothecoder – thank you for digging into this...before you get too far – can you reproduce the error? Maybe the issue is on my end (weird environment and old/new cf-xarray or something)? |
@pochedls I can reproduce your issue with the latest I found the root cause of this issue. The dataset has two X and Y axes ( <xarray.Dataset> Size: 2GB
Dimensions: (time: 3420, nlat: 384, nlon: 320, d2: 2, vertices: 4, bnds: 2)
Coordinates:
lat (nlat, nlon) float64 983kB dask.array<chunksize=(384, 320), meta=np.ndarray>
lon (nlat, nlon) float64 983kB dask.array<chunksize=(384, 320), meta=np.ndarray>
* nlat (nlat) int32 2kB 1 2 3 4 5 6 7 8 ... 378 379 380 381 382 383 384
* nlon (nlon) int32 1kB 1 2 3 4 5 6 7 8 ... 314 315 316 317 318 319 320
* time (time) object 27kB 2015-01-15 13:00:00.000007 ... 2299-12-15 1... The input grid that is extracted (code here) from the dataset consists of ds.nlat.attrs
{'long_name': 'cell index along second dimension', 'units': '1'}
ds.nlon.attrs
{'long_name': 'cell index along first dimension', 'units': '1', 'bounds': 'nlon_bnds'} Meanwhile, ds.lat.attrs
{'axis': 'Y', 'bounds': 'lat_bnds', 'standard_name': 'latitude', 'title': 'Latitude', 'type': 'double', 'units': 'degrees_north', 'valid_max': np.float64(90.0), 'valid_min': np.float64(-90.0)}
ds.lon.attrs
{'axis': 'X', 'bounds': 'lon_bnds', 'standard_name': 'longitude', 'title': 'Longitude', 'type': 'double', 'units': 'degrees_east', 'valid_max': np.float64(360.0), 'valid_min': np.float64(0.0)} Are |
Got it. I think xcdat used to be able to sort this out since this data structure is what is used for unstructured grids (which can be regridded with ESMF). I'll see if I can find examples in which this did work in the past (I know we tested regridding sea ice or sea surface temperature). |
It is hard to trace, but I think that an example dataset that we used to be able to regrid, but is now problematic is this one:
I think @jasonb5 was able to get this working by resolving this issue. |
One data point: the code above works on This version of xesmf and esmpy do not work with |
@pochedls Thank you for checking this. I was trying to build an environment but ran into dependency conflicts. I'll analyze the differences in code between |
Root cause of issueThe dataset in this GitHub issue contains With Coordinates:
lat (nlat, nlon) float64 dask.array<chunksize=(384, 320), meta=np.ndarray>
lon (nlat, nlon) float64 dask.array<chunksize=(384, 320), meta=np.ndarray>
* nlat (nlat) int32 1 2 3 4 5 6 7 8 9 ... 377 378 379 380 381 382 383 384
* nlon (nlon) int32 1 2 3 4 5 6 7 8 9 ... 313 314 315 316 317 318 319 320
* time (time) object 2015-01-15 13:00:00.000007 ... 2299-12-15 12:00:00 Unfortunately, Code Traceback
Possible Solutions
Additional InfoDiff between |
@tomvothecoder @pochedls I'm thinking there's a bug in xcdat/xcdat/regridder/accessor.py Line 108 in b260fcb
xcdat/xcdat/regridder/accessor.py Line 313 in b260fcb
ds.cf["lat"] if name == "Y" else ds.cf["lon"] . This works and builds an input grid that xESMF can work with since lat and lon are in the input grid now.
The last issues is the bounds, the above change gets us a new issue. ---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
File ~/.local/lib/python3.10/site-packages/xesmf/frontend.py:95, in _get_lon_lat_bounds(ds)
94 try:
---> 95 lon_bnds = ds.cf.get_bounds('longitude')
96 lat_bnds = ds.cf.get_bounds('latitude')
File ~/.local/lib/python3.10/site-packages/cf_xarray/accessor.py:2463, in CFDatasetAccessor.get_bounds(self, key)
2462 if not results:
-> 2463 raise KeyError(f"No results found for {key!r}.")
2465 return self._obj[results[0] if len(results) == 1 else results]
KeyError: "No results found for 'longitude'."
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
Cell In[11], line 2
1 import pdb
----> 2 pdb.run("ds_regrid = ds.regridder.horizontal('fgco2', ngrid, tool='xesmf', method='conservative_normed', periodic=True)")
File [~/conda-envs/xcdat-736/lib/python3.10/pdb.py:1607](https://nimbus.llnl.gov/user/jasonb5/lab/tree/conda-envs/xcdat-736/lib/python3.10/pdb.py#line=1606), in run(statement, globals, locals)
1606 def run(statement, globals=None, locals=None):
-> 1607 Pdb().run(statement, globals, locals)
File [~/conda-envs/xcdat-736/lib/python3.10/bdb.py:598](https://nimbus.llnl.gov/user/jasonb5/lab/tree/conda-envs/xcdat-736/lib/python3.10/bdb.py#line=597), in Bdb.run(self, cmd, globals, locals)
596 sys.settrace(self.trace_dispatch)
597 try:
--> 598 exec(cmd, globals, locals)
599 except BdbQuit:
600 pass
File <string>:1
File [~/devel/xcdat/xcdat/regridder/accessor.py:240](https://nimbus.llnl.gov/user/jasonb5/lab/tree/devel/xcdat/xcdat/regridder/accessor.py#line=239), in RegridderAccessor.horizontal(self, data_var, output_grid, tool, **options)
238 input_grid = _get_input_grid(self._ds, data_var, ["X", "Y"])
239 regridder = regrid_tool(input_grid, output_grid, **options)
--> 240 output_ds = regridder.horizontal(data_var, self._ds)
242 return output_ds
File [~/devel/xcdat/xcdat/regridder/xesmf.py:158](https://nimbus.llnl.gov/user/jasonb5/lab/tree/devel/xcdat/xcdat/regridder/xesmf.py#line=157), in XESMFRegridder.horizontal(self, data_var, ds)
153 if input_da is None:
154 raise KeyError(
155 f"The data variable '{data_var}' does not exist in the dataset."
156 )
--> 158 regridder = xe.Regridder(
159 self._input_grid,
160 self._output_grid,
161 method=self._method,
162 **self._extra_options,
163 )
165 output_da = regridder(input_da, keep_attrs=True)
167 output_ds = xr.Dataset({data_var: output_da}, attrs=ds.attrs)
File ~/.local/lib/python3.10/site-packages/xesmf/frontend.py:919, in Regridder.__init__(self, ds_in, ds_out, method, locstream_in, locstream_out, periodic, parallel, **kwargs)
917 grid_in, shape_in, input_dims = ds_to_ESMFlocstream(ds_in)
918 else:
--> 919 grid_in, shape_in, input_dims = ds_to_ESMFgrid(
920 ds_in, need_bounds=need_bounds, periodic=periodic
921 )
922 if locstream_out:
923 grid_out, shape_out, output_dims = ds_to_ESMFlocstream(ds_out)
File ~/.local/lib/python3.10/site-packages/xesmf/frontend.py:167, in ds_to_ESMFgrid(ds, need_bounds, periodic, append)
164 grid = Grid.from_xarray(lon.T, lat.T, periodic=periodic, mask=None)
166 if need_bounds:
--> 167 lon_b, lat_b = _get_lon_lat_bounds(ds)
168 lon_b, lat_b = as_2d_mesh(np.asarray(lon_b), np.asarray(lat_b))
169 add_corner(grid, lon_b.T, lat_b.T)
File ~/.local/lib/python3.10/site-packages/xesmf/frontend.py:100, in _get_lon_lat_bounds(ds)
97 except KeyError: # bounds are not already present
98 if ds.cf['longitude'].ndim > 1:
99 # We cannot infer 2D bounds, raise KeyError as custom "lon_b" is missing.
--> 100 raise KeyError('lon_b')
101 lon_name = ds.cf['longitude'].name
102 lat_name = ds.cf['latitude'].name
KeyError: 'lon_b' This is probably just an issue in |
Looks like Line 135 in b260fcb
obj.indexes.keys() .
I'm wondering if the solution is to update the regridding code to distinguish between rectilinear and curvilinear grids and change the behavior there. |
I have not stepped through the code (so this may be off base), but in looking at your comments (and maybe from past versions of code), it might be useful to have/use a |
I think The issue here sounds we need to add support for curvilinear grids during regridding, which can have
If we take a look at the variable's dims, it is As mentioned earlier, cf-xarray/xCDAT recognizes
We'll need to update This sounds like a potential path forward. Option 1
Option 2Or we can just use cf-xarray directly for both cases, which would simplify the code |
Continuing my comment above, I'm currently exploring possible solutions. Another one is updating |
@tomvothecoder Sounds like I'll add some properties to the |
Thanks @jasonb5. Let me know if you need anything. |
Another option is to use SGRID which would allow you to distinguish between the two: https://cf-xarray.readthedocs.io/en/latest/sgrid_ugrid.html . Cf-xarray does not interpret |
Nice! Thanks for the rec @dcherian. We'll look into SGRID. |
What happened?
In regridding some ocean data, many (not all) datasets are returning:
I'm having trouble getting the datasets to be CF-compliant on xcdat 0.6.1 / 0.7.1 / 0.7.3. I think xcdat used to better handle ocean grids (though I'm not sure when this might have changed).
What did you expect to happen? Are there are possible answers you came across?
Ideally xcdat could figure out the input grid automatically. If not, this issue is still useful for sorting out what we need to specify in the metadata to make the dataset CF-compliant for xesmf.
Minimal Complete Verifiable Example (MVCE)
Relevant log output
Anything else we need to know?
Note the same code works with
p = '/p/css03/esgf_publish/CMIP6/ScenarioMIP/BCC/BCC-CSM2-MR/ssp585/r1i1p1f1/Omon/fgco2/gn/v20190319/'
(though I now think this may be because the grid is rectilinear).Environment
xcdat 0.7.3
xesmf 0.8.8
The text was updated successfully, but these errors were encountered: