Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add placeholder CMIP7 MIP tables, and CMIP7_CV.json (#762) #778

Merged
merged 20 commits into from
Mar 9, 2025

Conversation

durack1
Copy link
Contributor

@durack1 durack1 commented Feb 26, 2025

Ref #762

@mauzey1 @taylor13 @matthew-mizielinski @wolfiex ping - first pass at placeholder template files

@mauzey1
Copy link
Collaborator

mauzey1 commented Feb 26, 2025

Will the variable tables in this pull request eventually have the branded variable format? Will those changes be reflected in mip-cmor-tables?

@taylor13
Copy link
Collaborator

I'll be happy to edit the python code that creates these tables to be consistent with the table structure described in #762 if you point me to where it is.

@taylor13 taylor13 marked this pull request as draft February 27, 2025 22:49
@mauzey1
Copy link
Collaborator

mauzey1 commented Mar 3, 2025

Aside from needing CMIP7_coordinate.json and CMIP7_formula_terms.json, there should be the following sections in CMIP7_CV.json.

  • required_global_attributes
  • institution_id
  • experiment_id
  • sub_experiment_id
  • source_id.

@durack1
Copy link
Contributor Author

durack1 commented Mar 3, 2025

@mauzey1 great, I'll pull some placeholder entries across so we're starting to build out a file for CMIP7/CMOR3.10 testing - expect an updated commit to drop later today

@taylor13
Copy link
Collaborator

taylor13 commented Mar 3, 2025

regarding #778 (comment), do we really expect the structure of the coordinates, formulas, and CMIP_CV.json file to be different from CMIP6? I think only if we need to associate an approx_interval with each frequency, but not if that's hard-wired within CMOR. Everything else should be the same as CMIP6, right? There will certainly be changes in content, but we have to wait for that.

@durack1
Copy link
Contributor Author

durack1 commented Mar 4, 2025

@taylor13 I think you're right, the contents of CMIP6_coordinates.json, *formula_terms.json, and *grids.json are not likely to need changing at this moment, so these can be used directly from CMIP6/CMIP6Plus or any other project that has duplicated the contents.

The current version of this PR now includes the changes I believe are captured in #762 - might have some final tweaks, but these are almost ready to start working with

Copy link
Collaborator

@taylor13 taylor13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lots of changes needed as indicated. A couple may deserve some discussion.

@durack1
Copy link
Contributor Author

durack1 commented Mar 6, 2025

@taylor13 ok this is where I am up to - this is getting very close. Feel free to nit on what I have in the file changed in this PR

Copy link

@matthew-mizielinski matthew-mizielinski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thoughts;

  • I'm still a little uncomfortable with the removal of an allowed set of frequencies for each variable. I'm not saying we shouldn't do this, but it does mean that any variable can be published at any frequency. It does give modelling groups more freedom to publish data, but it could lead to inconsistencies in the output of different groups. This is an item that I think warrants discussion, if only briefly, by the WIP.
  1. I'd be a little cautious about completing the max and min entries (fine for testing purposes) as there may (a) be a resolution dependence in certain variables and (b) be a frequency dependence (range of hourly data >> that of monthly means)

@durack1
Copy link
Contributor Author

durack1 commented Mar 6, 2025

Adding some extra nits, following @taylor13 replies

CMIP7 data archive_id = "WCRP" which distinguishes the MIP projects from other projects like E3SM and distinguishes one governance body/vocabularies/name spaces/formats/data structures/ from a different set of these.

The regions needed for CMIP7 are "global", "antarctica", "greenland", and possibly "northern_hemisphere" and "southern_hemisphere" (will need to check with the CMIP7 data request)

@mauzey1
Copy link
Collaborator

mauzey1 commented Mar 6, 2025

Another section that is needed in the CV is the nominal_resolution section. ✅

        "nominal_resolution":[
            "0.5 km",
            "1 km",
            "10 km",
            "100 km",
            "1000 km",
            "10000 km",
            "1x1 degree",
            "2.5 km",
            "25 km",
            "250 km",
            "2500 km",
            "5 km",
            "50 km",
            "500 km",
            "5000 km"
        ],

@taylor13
Copy link
Collaborator

taylor13 commented Mar 6, 2025

Hi Matt,
some responses:

I'm still a little uncomfortable with the removal of an allowed set of frequencies for each variable. I'm not saying we shouldn't do this, but it does mean that any variable can be published at any frequency. It does give modelling groups more freedom to publish data, but it could lead to inconsistencies in the output of different groups. This is an item that I think warrants discussion, if only briefly, by the WIP.

I agree that users will be able to write a wider variety of unrequested datasets than in CIMP6. In CMIP6, any variable found in any cmor table could be written even if that variable was not requested for a given experiment or a particular portion of an experiment. And I'm sure lots of unrequested data was written. Now a user will, in addition, be able to write data at unrequested frequencies. If we are concerned that unrequested data (either from unrequested experiments or unrequested frequencies) will clutter up the archive, then we could write a bit of software whereby a user (or the publisher) could interrogate the data request and check whether a variable is in the data request (given an experiment, a time-slice, a frequency, and a region). Then the that variable would be skipped. I'm not sure who might step up and volunteer to do that,
but we could advertise, I suppose.

I'd be a little cautious about completing the max and min entries (fine for testing purposes) as there may (a) be a resolution dependence in certain variables and (b) be a frequency dependence (range of hourly data >> that of monthly means)

Definitely. Given our current time-line, I agree that at most we should impose limits only on variables that are constrained physically by a max or min value (e.g., area fractions and other fractions, positive definite quantities like certain vertical fluxes and measures of quantity like mass which have a lower limit of 0, etc.). Even then, some modelers will complain because in some models you can get things like slightly negative specific humidity (but values so small they don't affect anything). If someone has the time and energy, they could also think of safe limits for ok_max_mean_abs and ok_min_mean_abs which would trap units problems in some variables. For example, for land_area_fraction, if you set those limits at 100 and 2, you would trap someone trying to report this as a fraction and not a percentage, as requested. (This test would raise a false error in a regional model place over the ocean (with no land) where the true mean land fraction is 0, independent of unit, which is less the ok_min_mean_abs values).

@taylor13
Copy link
Collaborator

taylor13 commented Mar 8, 2025

Yes, @mauzey1, we need to add the nominal_resolution CV. It has changed a bit from the above.

@durack1
Copy link
Contributor Author

durack1 commented Mar 9, 2025

Additional tweaks - I believe these are all the outstanding ones - if not please markup any additional changes in the Files changed tab - here

tweak

  • frequency (account for 3hr, 6hr pt values)

add

  • data_archive_id
  • nominal_resolution
  • region

@durack1
Copy link
Contributor Author

durack1 commented Mar 9, 2025

  • I'm still a little uncomfortable with the removal of an allowed set of frequencies for each variable. I'm not saying we shouldn't do this, but it does mean that any variable can be published at any frequency. It does give modelling groups more freedom to publish data, but it could lead to inconsistencies in the output of different groups. This is an item that I think warrants discussion, if only briefly, by the WIP.

Just replying to @matthew-mizielinski's comment above. The role of CMOR is to produce CF and project-compliant data (of which there are now quite a few projects other than CMIP), irrespective of whether this data has been requested by a project that is using the software. For e.g. data that was not included in the CMIP6 Data Request/DR was routinely produced for specific experiment(s), and made available either locally, or through the local node ESGF publication in some cases to those users.

The role of the ESGF publisher is to validate that any data that is being submitted to a project complies with project needs (global attributes, known variables, known frequencies, known activities/MIPs, known experiments required to populate the index matching search facets), and this could include a check whether certain data was requested by the CMIP7 project. While most CMIP contributors use CMOR all don't, which further suggests that the publisher (rather than CMOR) is the right place to do such project-compliance checks.

Ultimately, only data that is of broad interest (most likely requested in the CMIP7 DR) and produced by many groups will get replicated, and so I am less worried about your concern.

I've just finalized the "test" file formats, so will merge this PR, and we can continue this dialogue about compliance and how to enforce it elsewhere. This might be a good discussion to elevate to a WIP meeting agenda, rather squirreled away in a CMOR PR chatter

@durack1 durack1 marked this pull request as ready for review March 9, 2025 22:41
@durack1 durack1 merged commit c83e67e into main Mar 9, 2025
13 of 15 checks passed
@durack1 durack1 deleted the issue762_durack1_newMIPTableTemplates branch March 9, 2025 22:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants