Add placeholder CMIP7 MIP tables, and CMIP7_CV.json (#762) #778

durack1 · 2025-02-26T01:02:37Z

@mauzey1 @taylor13 @matthew-mizielinski @wolfiex ping - first pass at placeholder template files

mauzey1 · 2025-02-26T15:19:59Z

Will the variable tables in this pull request eventually have the branded variable format? Will those changes be reflected in mip-cmor-tables?

taylor13 · 2025-02-26T21:36:32Z

I'll be happy to edit the python code that creates these tables to be consistent with the table structure described in #762 if you point me to where it is.

mauzey1 · 2025-03-03T20:24:22Z

Aside from needing CMIP7_coordinate.json and CMIP7_formula_terms.json, CMIP7_CV.json should contain the following sections:

required_global_attributes
institution_id
experiment_id
sub_experiment_id
source_id
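
A minimal sketch of how the presence of these sections could be checked, using the list as proposed in this comment. It assumes, as in CMIP6_CV.json, that the sections sit under a top-level "CV" key; the function name and the example dictionary are hypothetical and are not part of CMOR.

```python
# Section names taken from the comment above.
REQUIRED_CV_SECTIONS = [
    "required_global_attributes",
    "institution_id",
    "experiment_id",
    "sub_experiment_id",
    "source_id",
]


def missing_cv_sections(cv, required=REQUIRED_CV_SECTIONS):
    """Return the required section names that are absent from a loaded CV."""
    # Assumption: sections live under a top-level "CV" key; fall back to
    # the dictionary itself if that key is not present.
    body = cv.get("CV", cv)
    return [name for name in required if name not in body]


# Hypothetical, deliberately incomplete CV used only for illustration.
example_cv = {
    "CV": {
        "required_global_attributes": [],
        "institution_id": {},
        "experiment_id": {},
    }
}

print(missing_cv_sections(example_cv))  # ['sub_experiment_id', 'source_id']
```

For a real file, the dictionary would come from `json.load` on the opened CMIP7_CV.json.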

durack1 · 2025-03-03T21:57:51Z

@mauzey1 great, I'll pull some placeholder entries across so we're starting to build out a file for CMIP7/CMOR3.10 testing - expect an updated commit to drop later today

taylor13 · 2025-03-03T22:39:09Z

Regarding #778 (comment): do we really expect the structure of the coordinates, formulas, and CMIP_CV.json files to be different from CMIP6? I think only if we need to associate an approx_interval with each frequency, but not if that's hard-wired within CMOR. Everything else should be the same as CMIP6, right? There will certainly be changes in content, but we have to wait for that.

durack1 · 2025-03-04T04:23:31Z

@taylor13 I think you're right, the contents of CMIP6_coordinates.json, *formula_terms.json, and *grids.json are not likely to need changing at this moment, so these can be used directly from CMIP6/CMIP6Plus or any other project that has duplicated the contents.

The current version of this PR now includes the changes I believe are captured in #762 - might have some final tweaks, but these are almost ready to start working with

taylor13

Lots of changes needed as indicated. A couple may deserve some discussion.

TestTables/CMIP7_atmos2d.json

TestTables/CMIP7_CV.json

TestTables/CMIP7_atmos2d.json

TestTables/CMIP7_oceanLev.json

TestTables/CMIP7_CV.json

durack1 · 2025-03-06T01:55:44Z

@taylor13 ok this is where I am up to - this is getting very close. Feel free to nit on what I have in the file changed in this PR

TestTables/CMIP7_CV.json

TestTables/CMIP7_atmos2d.json

matthew-mizielinski

My thoughts:

I'm still a little uncomfortable with the removal of an allowed set of frequencies for each variable. I'm not saying we shouldn't do this, but it does mean that any variable can be published at any frequency. It does give modelling groups more freedom to publish data, but it could lead to inconsistencies in the output of different groups. This is an item that I think warrants discussion, if only briefly, by the WIP.

I'd be a little cautious about completing the max and min entries (fine for testing purposes) as there may (a) be a resolution dependence in certain variables and (b) be a frequency dependence (range of hourly data >> that of monthly means)

TestTables/CMIP7_CV.json

durack1 · 2025-03-06T17:48:38Z

Adding some extra nits, following @taylor13 replies

CMIP7 data archive_id = "WCRP", which distinguishes the MIP projects from other projects like E3SM, and distinguishes one set of governance body/vocabularies/name spaces/formats/data structures from a different set of these. ✅
The regions needed for CMIP7 are "global", "antarctica", "greenland", and possibly "northern_hemisphere" and "southern_hemisphere" (will need to check with the CMIP7 data request). ✅

mauzey1 · 2025-03-06T18:30:08Z

Another section that is needed in the CV is the nominal_resolution section. ✅

        "nominal_resolution":[
            "0.5 km",
            "1 km",
            "10 km",
            "100 km",
            "1000 km",
            "10000 km",
            "1x1 degree",
            "2.5 km",
            "25 km",
            "250 km",
            "2500 km",
            "5 km",
            "50 km",
            "500 km",
            "5000 km"
        ],
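
As a rough illustration of how such a section might be used, the sketch below checks a candidate value against the list exactly as posted above (a later comment notes the final CV list differs somewhat). The helper names are hypothetical. Because the list is sorted as strings, a small helper is included to view the km entries in numeric order.

```python
# Values copied from the nominal_resolution list posted above.
NOMINAL_RESOLUTIONS = [
    "0.5 km", "1 km", "10 km", "100 km", "1000 km", "10000 km",
    "1x1 degree", "2.5 km", "25 km", "250 km", "2500 km",
    "5 km", "50 km", "500 km", "5000 km",
]


def is_valid_nominal_resolution(value, allowed=NOMINAL_RESOLUTIONS):
    """Exact string match against the CV list (spacing matters)."""
    return value in allowed


def km_value(entry):
    """Return the numeric km value of an entry such as '2.5 km', else None."""
    number, _, unit = entry.partition(" ")
    return float(number) if unit == "km" else None


# The km entries in numeric rather than string order.
sorted_km = sorted(
    (entry for entry in NOMINAL_RESOLUTIONS if km_value(entry) is not None),
    key=km_value,
)

print(is_valid_nominal_resolution("25 km"))  # True
print(is_valid_nominal_resolution("25km"))   # False: no space before the unit
print(sorted_km[:3])                         # ['0.5 km', '1 km', '2.5 km']
```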

taylor13 · 2025-03-06T23:56:13Z

Hi Matt,
some responses:

> I'm still a little uncomfortable with the removal of an allowed set of frequencies for each variable. I'm not saying we shouldn't do this, but it does mean that any variable can be published at any frequency. It does give modelling groups more freedom to publish data, but it could lead to inconsistencies in the output of different groups. This is an item that I think warrants discussion, if only briefly, by the WIP.

I agree that users will be able to write a wider variety of unrequested datasets than in CMIP6. In CMIP6, any variable found in any CMOR table could be written even if that variable was not requested for a given experiment or a particular portion of an experiment. And I'm sure lots of unrequested data was written. Now a user will, in addition, be able to write data at unrequested frequencies. If we are concerned that unrequested data (either from unrequested experiments or unrequested frequencies) will clutter up the archive, then we could write a bit of software whereby a user (or the publisher) could interrogate the data request and check whether a variable is in the data request (given an experiment, a time-slice, a frequency, and a region). If it is not, that variable would be skipped. I'm not sure who might step up and volunteer to do that, but we could advertise, I suppose.
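
A minimal sketch of the kind of lookup described above, assuming the data request can be reduced to a set of (experiment, variable, frequency, region) entries. Everything here is hypothetical: the `Request` type, the example entries, and the function names are illustrative only, the time-slice dimension is omitted for brevity, and no real data-request interface is used.

```python
from typing import NamedTuple


class Request(NamedTuple):
    """One requested combination (time-slice omitted for brevity)."""
    experiment: str
    variable: str
    frequency: str
    region: str


# Hypothetical stand-in for the contents of a data request.
REQUESTED = {
    Request("historical", "tas", "mon", "global"),
    Request("historical", "tas", "day", "global"),
}


def is_requested(candidate, requested=REQUESTED):
    """True if this exact combination appears in the data request."""
    return candidate in requested


def keep_requested(candidates, requested=REQUESTED):
    """Drop candidate datasets that were not requested."""
    return [c for c in candidates if is_requested(c, requested)]


candidates = [
    Request("historical", "tas", "mon", "global"),
    Request("historical", "tas", "3hr", "global"),  # unrequested frequency
]
print(keep_requested(candidates))
```

Only the monthly entry survives the filter; the 3-hourly one is skipped because that frequency is not in the stand-in request.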

> I'd be a little cautious about completing the max and min entries (fine for testing purposes) as there may (a) be a resolution dependence in certain variables and (b) be a frequency dependence (range of hourly data >> that of monthly means)

Definitely. Given our current timeline, I agree that at most we should impose limits only on variables that are constrained physically by a max or min value (e.g., area fractions and other fractions, positive definite quantities like certain vertical fluxes, and measures of quantity like mass which have a lower limit of 0, etc.). Even then, some modelers will complain because in some models you can get things like slightly negative specific humidity (but values so small they don't affect anything). If someone has the time and energy, they could also think of safe limits for ok_max_mean_abs and ok_min_mean_abs which would trap units problems in some variables. For example, for land_area_fraction, if you set those limits at 100 and 2, you would trap someone trying to report this as a fraction and not a percentage, as requested. (This test would raise a false error for a regional model placed over the ocean, with no land, where the true mean land fraction is 0 independent of unit, which is less than the ok_min_mean_abs value.)
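
The land_area_fraction example can be sketched as follows. This is an illustration of the idea only, not CMOR's implementation: the function names are hypothetical, and the limits 2 and 100 are the ones suggested in the comment above.

```python
def mean_abs(values):
    """Mean of the absolute values of a non-empty sequence."""
    return sum(abs(v) for v in values) / len(values)


def mean_abs_in_range(values, ok_min_mean_abs, ok_max_mean_abs):
    """True if the mean absolute value lies within the given limits."""
    return ok_min_mean_abs <= mean_abs(values) <= ok_max_mean_abs


# Limits suggested above for a land area fraction reported in percent.
OK_MIN, OK_MAX = 2.0, 100.0

as_percent = [30.0, 0.0, 100.0, 45.0]    # reported in %, as requested
as_fraction = [0.30, 0.0, 1.0, 0.45]     # same field reported as a fraction
all_ocean = [0.0, 0.0, 0.0, 0.0]         # regional domain with no land

print(mean_abs_in_range(as_percent, OK_MIN, OK_MAX))   # True
print(mean_abs_in_range(as_fraction, OK_MIN, OK_MAX))  # False: units problem trapped
print(mean_abs_in_range(all_ocean, OK_MIN, OK_MAX))    # False: the false alarm noted above
```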

taylor13 · 2025-03-08T17:35:03Z

Yes, @mauzey1, we need to add the nominal_resolution CV. It has changed a bit from the above.

durack1 · 2025-03-09T21:19:23Z

Additional tweaks - I believe these are all the outstanding ones - if not please markup any additional changes in the Files changed tab - here

tweak

frequency (account for 3hr, 6hr pt values)

add

data_archive_id
nominal_resolution
region

TestTables/CMIP7_CV.json

durack1 · 2025-03-09T22:41:28Z

> I'm still a little uncomfortable with the removal of an allowed set of frequencies for each variable. I'm not saying we shouldn't do this, but it does mean that any variable can be published at any frequency. It does give modelling groups more freedom to publish data, but it could lead to inconsistencies in the output of different groups. This is an item that I think warrants discussion, if only briefly, by the WIP.

Just replying to @matthew-mizielinski's comment above. The role of CMOR is to produce CF- and project-compliant data (and there are now quite a few projects other than CMIP), irrespective of whether this data has been requested by a project that is using the software. For example, data that was not included in the CMIP6 Data Request (DR) was routinely produced for specific experiments, and made available to those users either locally or, in some cases, through local-node ESGF publication.

The role of the ESGF publisher is to validate that any data being submitted to a project complies with project needs (global attributes, known variables, known frequencies, known activities/MIPs, known experiments required to populate the index matching search facets), and this could include a check of whether certain data was requested by the CMIP7 project. While most CMIP contributors use CMOR, not all do, which further suggests that the publisher (rather than CMOR) is the right place to do such project-compliance checks.
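
To make the idea of a publisher-side check concrete, here is a minimal sketch that compares a dataset's global attributes with a CV. It assumes both are available as plain dictionaries; the function name and the example CV contents are hypothetical, and nothing here reflects the actual ESGF publisher code.

```python
def compliance_problems(attrs, cv, checked=("frequency", "region", "nominal_resolution")):
    """List the ways a dataset's global attributes disagree with the CV."""
    problems = []
    # Every required global attribute must be present.
    for name in cv.get("required_global_attributes", []):
        if name not in attrs:
            problems.append(f"missing global attribute: {name}")
    # Attributes with a CV section must take one of the known values.
    for name in checked:
        allowed = cv.get(name)
        if allowed is not None and name in attrs and attrs[name] not in allowed:
            problems.append(f"{name}={attrs[name]!r} is not in the CV")
    return problems


# Hypothetical, heavily trimmed CV and dataset attributes.
example_cv = {
    "required_global_attributes": ["frequency", "region", "source_id"],
    "frequency": ["mon", "day"],
    "region": ["global", "antarctica", "greenland"],
}
example_attrs = {"frequency": "yr", "region": "global"}

for problem in compliance_problems(example_attrs, example_cv):
    print(problem)
```

With these inputs the check reports the missing source_id and the unknown frequency value; a dataset-requested-or-not test could be added as one more rule in the same function.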

Ultimately, only data that is of broad interest (most likely requested in the CMIP7 DR) and produced by many groups will get replicated, and so I am less worried about your concern.

I've just finalized the "test" file formats, so I will merge this PR, and we can continue this dialogue about compliance and how to enforce it elsewhere. This might be a good discussion to elevate to a WIP meeting agenda, rather than leaving it squirreled away in CMOR PR chatter.

adding placeholder CMIP7 MIP tables, and CMIP7_CV.json (#762)

500e0d4

durack1 mentioned this pull request Feb 26, 2025

CMIP7 requirements: "branded variable" and new mip_table specification #762

Closed

adding CMIP7 *.ipynb (#762)

0aa2d04

taylor13 marked this pull request as draft February 27, 2025 22:49

first commit @taylor13 (#762)

cf6a003

implement branded_variables (#762)

b9f2809

taylor13 reviewed Mar 5, 2025

durack1 added 6 commits March 5, 2025 10:56

remove sub_experiment_id, table_id (#762)

7090469

augmenting branch_*, parent_*, and *_label entries into req_glob_att …

5d65521

…CV (#762)

update branded_variable names (#762)

5263814

remove frequency from variable_entry (#762)

2562dc0

brand_description, variable_title implemented (#762)

fb27951

remove duplicated/redundant valid/obs_*/positive; fix realm (#762)

a1a17dc

durack1 added 3 commits March 5, 2025 18:36

fix superscript transcription in *_labels (#762)

a9e50cf

remove sub_experiment* from req_glob_att (#762)

bd93688

Merge branch 'main' into issue762_durack1_newMIPTableTemplates

0db67d1

durack1 commented Mar 6, 2025

TestTables/CMIP7_CV.json (outdated, resolved)

durack1 commented Mar 6, 2025

TestTables/CMIP7_atmos2d.json (resolved)

durack1 added 4 commits March 5, 2025 20:17

convert *_labels to descriptive dictionary (#762)

077eae2

correct *_label lists to dicts (#762)

8a3bd1a

correct CV:area_label:u description (#762)

b85317e

group all branding_labels (#762)

0fb25de

matthew-mizielinski reviewed Mar 6, 2025

TestTables/CMIP7_CV.json (outdated, resolved)

durack1 added 2 commits March 9, 2025 14:10

further tweaks: nominal_resolution, region, *data_archive (#762)

46f63de

update frequency to account for pt samples (#762)

511110e

durack1 mentioned this pull request Mar 9, 2025

Handle variables that have a branding suffix #779

Merged

durack1 commented Mar 9, 2025

TestTables/CMIP7_CV.json (resolved)

TestTables/CMIP7_CV.json (outdated, resolved)

TestTables/CMIP7_CV.json (outdated, resolved)

updates: nominal_resolution, region identifiers, remove monC freq (#762)

3d88f69

durack1 marked this pull request as ready for review March 9, 2025 22:41

durack1 merged commit c83e67e into main Mar 9, 2025
13 of 15 checks passed

durack1 deleted the issue762_durack1_newMIPTableTemplates branch March 9, 2025 22:41
