setting variables named in CF attributes as coordinate variables #4215

Closed
dcherian opened this issue Jul 10, 2020 · 5 comments
@dcherian
Contributor

This came up in #2844 by @DWesl (see also #3689).

Currently we have decode_coords which sets variables named in attrs["coordinates"] as coordinate variables.

There are a number of other CF attributes that can contain variable names.

  1. bounds
  2. grid_mapping
  3. ancillary_variables
  4. cell_measures
  5. maybe more?
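To make the list above concrete, here is a minimal sketch of collecting the variable names that such attributes point to. It works on a plain dict of attributes rather than on an xarray object, and `referenced_names` and `NAME_LIST_ATTRS` are hypothetical names, not part of xarray. It assumes each attribute holds a simple blank-separated list of names, so `cell_measures` (which uses `measure: name` pairs) and the extended `grid_mapping` form are left out.

```python
# Hypothetical helper for illustration; not part of xarray's API.
# CF attributes assumed here to hold a blank-separated list of variable names.
NAME_LIST_ATTRS = ("coordinates", "bounds", "grid_mapping", "ancillary_variables")


def referenced_names(attrs):
    """Return the variable names mentioned in the list-valued CF attributes."""
    names = []
    for key in NAME_LIST_ATTRS:
        for name in attrs.get(key, "").split():
            if name not in names:  # keep first occurrence, preserve order
                names.append(name)
    return names


# Attributes of an imaginary temperature variable.
temperature_attrs = {
    "units": "K",
    "coordinates": "lat lon",
    "grid_mapping": "crs",
    "ancillary_variables": "temperature_error temperature_flag",
}

print(referenced_names(temperature_attrs))
# ['lat', 'lon', 'crs', 'temperature_error', 'temperature_flag']
```

Only the first two names would be picked up if `coordinates` were the sole attribute consulted.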

As in #3689, it's hard to see why many of the variables named in these attributes would be useful as "data variables".

Question: Should we allow decode_coords to control whether variables mentioned in these attributes are set as coordinate variables?

cc @jthielen

@jthielen
Contributor

jthielen commented Jul 10, 2020

I agree with #3689 that it makes the most sense to have decode_coords set those variables referenced in bounds as coordinates. By the same reasoning, I would think the other special variable-linked attrs of CF Section 7 should be treated similarly:

  • cell_measures
  • climatology
  • geometry
  • node_coordinates
  • node_count
  • part_node_count
  • interior_ring

grid_mapping and ancillary_variables were also brought up. grid_mapping definitely makes sense to be interpreted as a coordinate variable, since it is inherently linked to the CRS of the data. I would say no however to ancillary_variables, since those are not really about coordinates and instead about linked data variables (like uncertainties).
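The split proposed in this comment could be sketched roughly as follows. This is an illustration on plain dicts, not xarray behaviour: `PROMOTE_TO_COORDS` and `split_variables` are hypothetical names, the variable names are made up, and only the simple single-name form of `grid_mapping` is assumed.

```python
# Hypothetical policy table based on the comment above; not xarray behaviour.
PROMOTE_TO_COORDS = {
    "coordinates", "bounds", "cell_measures", "climatology", "geometry",
    "node_coordinates", "node_count", "part_node_count", "interior_ring",
    "grid_mapping",
}  # ancillary_variables is deliberately absent


def split_variables(variables):
    """Partition variable names into (coords, data_vars).

    `variables` maps each variable name to its attribute dict.
    """
    promoted = set()
    for attrs in variables.values():
        for key, value in attrs.items():
            if key in PROMOTE_TO_COORDS:
                # Skip "measure:" style keys, e.g. the "area:" in "area: cell_area".
                promoted.update(t for t in value.split() if not t.endswith(":"))
    coords = sorted(name for name in variables if name in promoted)
    data_vars = sorted(name for name in variables if name not in promoted)
    return coords, data_vars


variables = {
    "temperature": {
        "grid_mapping": "crs",
        "cell_measures": "area: cell_area",
        "ancillary_variables": "temperature_error",
    },
    "temperature_error": {},
    "crs": {"grid_mapping_name": "latitude_longitude"},
    "cell_area": {"units": "m2"},
}

coords, data_vars = split_variables(variables)
print(coords)     # ['cell_area', 'crs']
print(data_vars)  # ['temperature', 'temperature_error']
```

Under this policy `temperature_error` stays a data variable because it is only reachable through `ancillary_variables`.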

My one concern with #2844 is clarifying the role of encoding vs. attrs. I don't have any good conclusions about it, but I'd want to be very cautious about not having these "links" defined by the CF conventions disappear unexpectedly because they were decoded by decode_coords, moved to encoding, and then erased due to some xarray operation clearing encoding on the returned data. I'd hope to keep them around in some fashion so that they are still usable by libraries like cf-xarray and MetPy, among others.

@shoyer
Member

shoyer commented Jul 10, 2020

Sounds good to me! coordinates were the main example that came up when I wrote this (and we needed them for xarray's data model), but these other attributes look like they serve a similar role.

> Question: Should we allow decode_coords to control whether variables mentioned in these attributes are set as coordinate variables?

I don't think this is necessary. It's easy to explicitly set or reset coordinates afterwards if desired.

> My one concern with #2844 is clarifying the role of encoding vs. attrs.

I think we should probably ensure that xarray always propagates encoding exactly like how it propagates attrs.

@DWesl
Contributor

DWesl commented Jul 14, 2020

formula_terms is another attribute with variable names, although it requires a bit more parsing.
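As a rough illustration of the extra parsing involved: a `formula_terms` value is a sequence of `term: variable` pairs, so the variable names have to be separated from the term keys. The function name `parse_formula_terms` below is hypothetical, and the sketch assumes well-formed, whitespace-separated pairs.

```python
def parse_formula_terms(value):
    """Parse a CF formula_terms string into a {term: variable_name} dict."""
    tokens = value.split()
    if len(tokens) % 2 != 0:
        raise ValueError(f"expected 'term: variable' pairs, got {value!r}")
    terms = {}
    # Even positions hold "term:" keys, odd positions hold variable names.
    for term, name in zip(tokens[0::2], tokens[1::2]):
        if not term.endswith(":"):
            raise ValueError(f"term {term!r} is missing its trailing colon")
        terms[term[:-1]] = name
    return terms


formula = "sigma: sigma eta: eta depth: depth"
print(parse_formula_terms(formula))
# {'sigma': 'sigma', 'eta': 'eta', 'depth': 'depth'}
```

The dict values are the names that would be candidates for promotion to coordinates.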

> Question: Should we allow decode_coords to control whether variables mentioned in these attributes are set as coordinate variables?

> I don't think this is necessary. It's easy to explicitly set or reset coordinates afterwards if desired.

Is that "putting the variables in these attributes in coords is out of scope for XArray" or "putting the variables in these attributes in coords is out of scope for decode_coords" or something else?

> I would say no however to ancillary_variables, since those are not really about coordinates and instead about linked data variables (like uncertainties).

I tend to think of uncertainties and status flags as important for the interpretation of the associated variables that should stay with the data variables unless a decision is explicitly made to drop them. On the other hand, since XArray seems to associate coordinates with dimensions rather than with variables, I can see why this might be less than desirable. This argument would also apply to grid_mapping.

> My one concern with #2844 is clarifying the role of encoding vs. attrs.

> I think we should probably ensure that xarray always propagates encoding exactly like how it propagates attrs.

Should this be part of #2844 or should preserving encoding be a separate PR?

@dcherian
Contributor Author

dcherian commented Jul 14, 2020

formula_terms might make more sense here: xarray-contrib/cf-xarray#34

> Is that "putting the variables in these attributes in coords is out of scope for XArray" or "putting the variables in these attributes in coords is out of scope for decode_coords" or something else?

I think this is "we should put things in coords without adding a new flag". It is a behaviour change though, so maybe we should start issuing a warning now.
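One way such a warning might look, purely as a sketch: the function name, the attribute tuple, and the message text below are all hypothetical and do not reflect what xarray actually emits.

```python
import warnings

# Hypothetical set of attributes whose targets would become coordinates later.
FUTURE_COORD_ATTRS = ("bounds", "grid_mapping", "cell_measures")


def warn_about_future_coords(var_name, attrs):
    """Emit a FutureWarning for each attribute affected by the behaviour change."""
    hits = [key for key in FUTURE_COORD_ATTRS if key in attrs]
    for key in hits:
        warnings.warn(
            f"variables named in {var_name}.attrs[{key!r}] will become "
            "coordinate variables in a future version",
            FutureWarning,
            stacklevel=2,
        )
    return hits


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warn_about_future_coords("temperature", {"units": "K", "grid_mapping": "crs"})

print(len(caught), caught[0].category.__name__)  # 1 FutureWarning
```

Users who already set these variables as coordinates themselves would see no change in results, only the notice.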

> I would say no however to ancillary_variables, since those are not really about coordinates and instead about linked data variables (like uncertainties).

The only way to link variables in xarray objects is to set them as coords. So I think it still makes sense in xarray-world to do this.

> should preserving encoding be a separate PR?

Separate PR. It will be a reasonably big change throughout the code base.

@dcherian
Contributor Author

I think this can be closed thanks to @DWesl
