Call for Prototype/Implementation Owners for Different GeoZarr Conformance Classes #63

briannapagan opened this issue Apr 3, 2025 · 43 comments


@briannapagan

briannapagan commented Apr 3, 2025

In the April 2nd monthly meeting, @christophenoel gave a great presentation explaining abstract data models, file formats, and encodings. He gave great context explaining how HDF, CF, and GDAL work, and proposed a meta-model as a bridge to zarr. More importantly he identified that what we have struggled with most in GeoZarr is trying to resolve issues that stem from diverging abstract geospatial data models.

In this proposed unified data model, there would be specific Profiles (labeled in this slide as Feature Types, but the group agrees to move forward with the terminology of Profiles):

[Image: presentation slide of the proposed Profiles]

The group that attended the call agreed with this characterization and approach. To move this conversation forward, we want to identify points of contact who will each own a specific type of Profile that is desired to work with GeoZarr. These owners would be responsible for prototyping specific encodings in GeoZarr and supporting full round-trip translation from the existing data model implementation to GeoZarr and back.

Below is the working list of Profiles that we need to identify owners for - please add additional Profiles to this list and suggestions of the best people to engage with:

| Profile | Proposed Owners |
| --- | --- |
| RGB Raster | |
| Single Variable Raster | |
| 3D Raster (XYT, XYZ) | @maxrjones? @ethanrd? @briannapagan @dcherian |
| Hyperspectral | |
| SAR SLC | @emmanuelmathot |
| DEM | @emmanuelmathot |

Once we have an agreed upon list of Profiles and identified potential owners, I suggest this focus group to meet at a more frequent interval than monthly to coordinate. Of course open to any suggestions and feedback!

@tylere

tylere commented Apr 3, 2025

Here are some other potential profiles, that might be considered new items or could be folded into items in the list above:

  • Multi-spectral - more than 3-band RGB, but less than hyperspectral. Examples: Landsat, MODIS, Sentinel-2/-3/-5P products

  • Topography - 2-D raster with one or more height bands. Examples: SRTM DEM, DTMs, DSMs

  • SAR Single Look Complex (SLC) - Images containing both scalar and complex values. Examples: Sentinel-1 bursts

@emmanuelmathot

emmanuelmathot commented Apr 3, 2025

I volunteer for SAR SLC and DEM. I believe it is important to define the ideal profile for access patterns that align with various use cases: simple screening, terrain correction, interferometry using TOPSAR burst processing, ...

EDIT: and for multispectral, having different resolution groups must also be addressed.

@mdsumner

mdsumner commented Apr 3, 2025

I would add the GDAL multidimensional model, and at least be clear that "GDAL" (as above) usually means "2D classic raster" (setting aside the warper API and the geolocation frameworks); I don't think that's been considered.

edit: I wrote a bit about it here, it's not much but I've only just got my teeth into it in recent weeks https://www.hypertidy.org/posts/2025-03-12-r-py-multidim/r-py-multidim

@mdsumner

mdsumner commented Apr 4, 2025

oops, my apologies I see @christophenoel did cover this, very glad to see

(awesome having this video and transcript!)

@felixcremer

@christophenoel could you share the slides of this presentation?

@echarles

echarles commented Apr 4, 2025

> @christophenoel could you share the slides of this presentation?

the link to the slides has been shared in the public geozarr google group on https://groups.google.com/u/0/g/geozarr/c/9NbEa84BBSA and is https://drive.google.com/file/d/1zoIhQK-J4fSM3dsRdWXXXW9v57GrhjTi/view?usp=sharing

@dcherian

dcherian commented Apr 4, 2025

I'd love to assist with the 3D raster feature.

@christophenoel

Thanks for sharing the link. I'm interested in the first four items, with a preference to start with the RGB case and a single-variable example initially.

For info, I created a branch cnl-examples with an initial RGB raster profile example in both Zarr V2 and Zarr V3 formats. The examples are provided in a Jupyter Notebook intended for automatic launch via MyBinder.

You can test by creating the Jupyter environment simply by accessing the URL: binder
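For readers who can't open the notebook, here is a rough sketch of what an RGB raster profile's metadata might look like. All attribute names below are hypothetical illustrations, not taken from the actual cnl-examples branch:

```python
import json

import numpy as np

# Hypothetical sketch: an RGB raster stored as a (band, y, x) array, with
# attributes labelling the dimensions and bands. Attribute names are
# illustrative only, not from the cnl-examples branch.
data = np.zeros((3, 256, 256), dtype=np.uint8)  # red, green, blue planes
attrs = {
    "_ARRAY_DIMENSIONS": ["band", "y", "x"],  # xarray-style dimension names
    "band_names": ["red", "green", "blue"],   # lets a client detect RGB
    "grid_mapping": "spatial_ref",            # pointer to CRS information
}
print(json.dumps(attrs, indent=2))
```

The point of such a profile is that a client can inspect `band_names` (or whatever the spec settles on) and decide the data is directly displayable as an image.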


@christophenoel

Note: the V3 was created using an old library. I will fix this.

@mdsumner

mdsumner commented Apr 7, 2025

I'm up for RGB, single-variable, and DEM, especially where they overlap with VRT or GTI (COP30, GEBCO, terrain RGB)

How about XYZT? THREDDS servers via fileServer vs dodsC; there are a few good examples on NCI here

don't know anything about hyperspectral 😀

@christophenoel

While drafting the raster profiles and their examples, it became clear that some profiles, such as time-series-raster, serve best as complementary extensions to core 2D raster profiles (e.g. scalar-raster, rgb-raster). They add requirements for specific dimensions (e.g. time) but do not redefine the overall structure.

To maintain interoperability and simplicity, the number of combinations must remain limited. Excessive flexibility would increase complexity for applications and hinder standardisation efforts.

📦 These initial drafts are available in a dedicated branch cnl-examples, along with working examples:

📓 You can explore them directly in a Jupyter environment using Binder: 👉 Launch examples notebook

@briannapagan
Author

Thank you @christophenoel for these examples already! @emmanuelmathot, @maxrjones and I chatted yesterday about how to address this work and I want to make sure folks volunteering aren't diverging too much in expectations. Can I propose folks who have volunteered to find a time to meet next week and discuss goals of this exercise?

@emmanuelmathot

Thank you, @christophenoel. In the meantime, could you transform the cnl-branch into a PR to allow commenting on your input?

@christophenoel

@emmanuelmathot I prefer to wait until the work is split into distinct tasks. This approach avoids dealing with a large PR that generates scattered discussions. I think a branch for each profile should be created. Additionally, the current branch contains only early drafts.

Note that I am preparing examples for Zarr v3.

The constraint concerning projected coordinates (projection_x_coordinate) seems overly restrictive and could be handled through an additional profile.

@emmanuelmathot

I think rgb_raster is not necessary and band_raster can encompass that role. This is also closer to the STAC band construct model and allows for better alignment in the future.

@christophenoel

I agree. But maybe rgb_raster can be a profile refining band_raster (meaning it includes at least red, green, and blue)

@mdsumner

mdsumner commented Apr 9, 2025

Separate interpretation of sets of bands from their type

I don't think we have to model ambiguity of ZT from sets of types. It's a convention of sorts to model colour vs time vs depth vs any arbitrary coordinate space

TIFF can only specify grey, RGB, RGBA, or multiband of any number of a given -type-

I wonder if we're mixing GDAL heuristics with actual tiff models

@christophenoel

Hi @mdsumner, I'm not sure I understand what you're replying to exactly.
What is important to me is to give a client application the ability to detect that there are RGB colours that can be displayed. There are multiple possible approaches of course, but such a standard profile would make sense to me.

@mdsumner

mdsumner commented Apr 9, 2025

Bear with me; I think I'm so used to human detection of interpretation that I can't even imagine a standard for it

@christophenoel

christophenoel commented Apr 9, 2025

To eliminate the ambiguity between data type and interpretation, the symbology extension (based on OGC symbology) proposed in the initial GeoZarr draft appears to offer a suitable solution.
(see https://github.com/zarr-developers/geozarr-spec/blob/main/geozarr-spec.md#portrayals-and-symbology )

(Edit: however, a lot of GeoTIFFs would match this RGB profile, and it allows detecting a possible mapping/export to GeoTIFF.)

@briannapagan
Author

@christophenoel @emmanuelmathot @mdsumner In the CNG #geozarr slack channel I posed some times for next week to chat.

@christophenoel

@christophenoel @emmanuelmathot @mdsumner @rabernat

Defining a single profile to cover all kinds of rasters and datacubes is difficult. These datasets can include many different combinations—such as time, height, or wavelength—and can use either a projected or geographic coordinate system. In OGC, a profile is meant to tailor a standard for a specific use or community, not to describe every possible variation.

From my point of view, a better approach is to use OGC conformance classes (see conformance classes). These are clear, testable building blocks. Each dataset can declare which classes it follows, like “has time”, “uses projected coordinates”, or “includes multiple bands”.

This makes it easier to describe what a dataset contains, and to check that it meets the expected rules. Instead of one big profile, each dataset is a combination of smaller, well-defined parts.


@christophenoel

Note: regarding the "meta model" spec approach, see the PR: #64

@christophenoel

The latest Editor's Draft version of the OGC GeoZarr Specification is found here in HTML or PDF

@briannapagan briannapagan changed the title Call for Prototype/Implementation Owners for Different GeoZarr Profiles Call for Prototype/Implementation Owners for Different GeoZarr Conformance Classes Apr 17, 2025
@rabernat

Here's some text on the approach I proposed at the last meeting

GeoZarr Composable Conformance Classes

Defining a single profile to cover all kinds of rasters and datacubes is difficult. These datasets can include many different combinations—such as time, height, or wavelength—and can use either a projected or geographic coordinate system.

The Four Dimensions of Profiles

GeoZarr datasets may be classified within a multi-dimensional space of options.
This option space includes:

  • Data variables types - This characterizes the data values themselves. Options include
    • Single-band raster (a single array)
    • Multi-band raster - multiple bands with the same dtype and resolution stored as an additional band dimension on an array, with named bands (e.g. B01, red)
    • Hyperspectral raster - similar to multi-band raster, but with more bands and an encoding of the band dimension as specific wavelength ranges
    • CF-style data variables - Following CF conventions, each variable is stored as a separate array with a standard_name attribute.
  • Horizontal geospatial coordinate type - This describes how the horizontal (x, y) coordinates of the data are specified. Broadly speaking, options include
    • GDAL-style projected raster coordinates. Here the data are treated as pixels on a rectangular grid, with georeferences provided by a GeoTransform and CRS.
    • CF-style explicit coordinates. Here the coordinates are encoded using NetCDF / CF conventions, with all of the possibilities allowed therein (e.g. independent latitude and longitude coordinates, two-dimensional latitude/longitude coordinates)
    • Discrete Global Grid Systems (DGGS). Here the data are represented as cells within a specific DGGS. (Encoding is still TBD.)
  • Vertical coordinate type - This describes the vertical dimension of the data. Options include
    • None - no vertical dimension provided
    • CF-Style vertical coordinate (ref)
  • Temporal coordinate type
    • None - no temporal dimension provided.
    • CF-style time coordinate (ref)
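The contrast between the first two horizontal-coordinate options can be sketched numerically. Below is an illustrative example (all values assumed) showing that a GDAL-style GeoTransform implicitly defines the same pixel-centre coordinates that a CF-style encoding would store explicitly as arrays:

```python
import numpy as np

# GDAL-style: coordinates are implied by a six-element GeoTransform,
# here (x_origin, x_res, 0, y_origin, 0, -y_res). Values are made up.
gt = (500000.0, 10.0, 0.0, 4600000.0, 0.0, -10.0)
cols, rows = np.arange(4), np.arange(3)
x_gdal = gt[0] + (cols + 0.5) * gt[1]  # pixel-centre x coordinates
y_gdal = gt[3] + (rows + 0.5) * gt[5]  # pixel-centre y coordinates

# CF-style: the same coordinates written out explicitly as 1-D arrays,
# as they would be stored alongside the data variable.
x_cf = np.array([500005.0, 500015.0, 500025.0, 500035.0])
y_cf = np.array([4599995.0, 4599985.0, 4599975.0])

assert np.allclose(x_gdal, x_cf) and np.allclose(y_gdal, y_cf)
```

The GDAL encoding is compact but limited to regular grids; the CF encoding costs more storage but also covers irregular and two-dimensional coordinate cases.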


Examples

| dataset | variable type | horiz. coord. | vert. coord. | time coord. |
| --- | --- | --- | --- | --- |
| Sentinel 2 Scene | Multiband Raster | GDAL | None | None |
| Harmonized Sentinel Datacube | Multiband Raster | GDAL | None | CF |
| CMIP6 Output | CF | CF | CF | CF |

@briannapagan
Author

Thank you for kicking this off Ryan, this is a great foundation to build from! I am trying to think where sparse/ragged data would fall in the existing table. It looks like CF conventions work: https://www.ncei.noaa.gov/netcdf-ragged-array-format, so perhaps there is another column of just 'n coord' where variable and 'n-coord' would fall under CF. Also trying to follow: pydata/xarray#7988

@tylere

tylere commented Apr 24, 2025

I like the idea of the "building block" options for constructing profiles, but I have some questions/comments on the current option descriptions.

In regards to "multi-band raster" type, many imaging satellite data products may not fit this definition, because the pixel spacing ("resolution") and/or dtype differs between bands. For example, Landsat 9 has bands with 15m, 30m, and 100m pixels, and the bands have different data types (INT16, UINT16, UINT8). Could the data variable type dimension option be expanded to accommodate this, or would this require a satellite data product to be composed of multiple GeoZarr multi-band rasters?

Also I'm not sure about the distinction between multi-band and hyper-spectral... multi-spectral satellite data products often have bands that have specific wavelength ranges, and this information is important when trying to harmonize bands between different sensors (example: Landsat 9 and Sentinel-2).

@rabernat

> I am trying to think where sparse/ragged data would fall into the existing table

They currently don't. What I wrote above is focused on dense rasters. Could you clarify the specific use case you have in mind here (e.g. an example from an existing data product)?

> many imaging satellite data products may not fit this definition, because the pixel spacing ("resolution") and/or dtype differs between bands

This is a good point Tyler. Zarr can't treat arrays of different shape or dtype as part of the same array. (Related perhaps to Brianna's comment about ragged arrays.) In this case, the different bands of different resolution would have to be stored as distinct arrays. In the CF coordinate model, they would also need distinct dimension coordinates. (Not sure how the GDAL raster coordinate model handles that case; is the affine transform the same?)

But in summary, yes, we would need to modify this categorization to allow for this scenario.

> Also I'm not sure about the distinction between multi-band and hyper-spectral

I agree it's a fuzzy distinction. Is there an existing metadata convention that covers this somehow, e.g. in STAC? AFAIK CF does not.

@mdsumner

mdsumner commented Apr 25, 2025

> (Not sure how the GDAL raster coordinate model handles that case; is the affine transform the same?)

GDAL calls these subdatasets, and that case (can't be stored on the same array) is exactly when a container format will present as subdatasets. Each one then has its own crs and transform (these could be grouped together but will or won't be depending on driver details, I think)

e.g. snipping out a few subdatasets from this file to show the range of array sizes (here unrolled as bands in GDAL classic mode for dims > yx); each "*_NAME=" here is a classic 2D raster with its own transform and CRS

```
gdalinfo "ZARR:\"/vsizip//vsicurl/https://eopf-public.s3.sbg.perf.cloud.ovh.net/eoproducts/S02MSIL1C_20230629T063559_0000_A064_T3A5.zarr.zip\""

...
Subdatasets:
  SUBDATASET_1_NAME=ZARR:"/vsizip//vsicurl/https://eopf-public.s3.sbg.perf.cloud.ovh.net/eoproducts/S02MSIL1C_20230629T063559_0000_A064_T3A5.zarr.zip":/conditions/geometry/mean_viewing_incidence_angles
  SUBDATASET_1_DESC=[13x2] /conditions/geometry/mean_viewing_incidence_angles (Float64)
  SUBDATASET_2_NAME=ZARR:"/vsizip//vsicurl/https://eopf-public.s3.sbg.perf.cloud.ovh.net/eoproducts/S02MSIL1C_20230629T063559_0000_A064_T3A5.zarr.zip":/conditions/geometry/sun_angles
  SUBDATASET_2_DESC=[2x23x23] /conditions/geometry/sun_angles (Float64)
...
  SUBDATASET_3_NAME=ZARR:"/vsizip//vsicurl/https://eopf-public.s3.sbg.perf.cloud.ovh.net/eoproducts/S02MSIL1C_20230629T063559_0000_A064_T3A5.zarr.zip":/conditions/geometry/viewing_incidence_angles
  SUBDATASET_3_DESC=[13x4x2x23x23] /conditions/geometry/viewing_incidence_angles (Float64)
...
  SUBDATASET_7_NAME=ZARR:"/vsizip//vsicurl/https://eopf-public.s3.sbg.perf.cloud.ovh.net/eoproducts/S02MSIL1C_20230629T063559_0000_A064_T3A5.zarr.zip":/conditions/mask/detector_footprint/r10m/b08
  SUBDATASET_7_DESC=[10980x10980] /conditions/mask/detector_footprint/r10m/b08 (Byte)
  SUBDATASET_8_NAME=ZARR:"/vsizip//vsicurl/https://eopf-public.s3.sbg.perf.cloud.ovh.net/eoproducts/S02MSIL1C_20230629T063559_0000_A064_T3A5.zarr.zip":/conditions/mask/detector_footprint/r20m/b05
  SUBDATASET_8_DESC=[5490x5490] /conditions/mask/detector_footprint/r20m/b05 (Byte)
```

that's classic mode, in multidimensional mode it's a lot more like zarr groups and arrays

@briannapagan
Author

briannapagan commented Apr 25, 2025

@rabernat The specific dataset I was thinking of as an example was OCO-2. https://disc.gsfc.nasa.gov/datasets/OCO2_L2_Lite_FP_11.2r/summary?keywords=oco2

Any sounding type of dataset or level-2 product would be similar.


@christophenoel

> Here's some text on the approach I proposed at the last meeting

However a few thoughts:

  • CF-style data variables: These requirements appear to be applicable across the three raster types, not in a dedicated requirement class from my point of view.
  • Coordinate encoding (GDAL-style transform vs CF-style array): Indeed, while I assume the model supports both types of coordinate encoding, why not advertise how the coordinates are encoded through a requirement class... However, the choice between coordinate styles should be viewed as a decision made by the data provider, rather than one dictated by the mission.
  • Other classes: I'm not sure this will be limited to these four dimensions (while I don't have a relevant example yet), but seems a good start.
  • Extension capability of conformance classes: The mechanism should be designed to support extensions. A mission may define its own requirement classes, which can specify custom constraints—such as group naming conventions, as seen in the EOS-EOF data model.

@christophenoel

> In regards to "multi-band raster" type, many imaging satellite data products may not fit this definition, because the pixel spacing ("resolution") and/or dtype differs between bands. For example, Landsat 9 has bands with 15m, 30m, and 100m pixels, and the bands have different data types (INT16, UINT16, UINT8).

Great example of additional requirement classes, as this is a use case we have already faced when working with Zarr.
A requirement class might be needed to provide a standard means of accommodating such a common case. Just as an example (from what we did), multi-resolution raster:

  • For each raster resolution, a group named with the resolution identifier (e.g., 15m, 30m, 100m) should be created
  • TBD
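A minimal sketch of that layout, with toy shapes and dtypes loosely modelled on the Landsat 9 example above (group and array names are illustrative, not from any spec):

```python
import numpy as np

# Hypothetical multi-resolution hierarchy: bands of different pixel spacing
# and dtype cannot share one Zarr array, so each resolution gets its own
# group holding arrays of a consistent grid. Shapes here are toy values.
store = {
    "r15m/pan":  np.zeros((8, 8), dtype=np.uint16),  # 15 m panchromatic
    "r30m/red":  np.zeros((4, 4), dtype=np.uint16),  # 30 m reflectance bands
    "r30m/nir":  np.zeros((4, 4), dtype=np.uint16),
    "r100m/tir": np.zeros((2, 2), dtype=np.int16),   # 100 m thermal
}
# Each group would also carry its own dimension coordinates (or its own
# affine transform), since the grids differ between resolutions.
for path, arr in sorted(store.items()):
    print(path, arr.shape, arr.dtype)
```

This mirrors rabernat's point above: bands of different resolution become distinct arrays with distinct dimensions, grouped so that a reader can still treat each resolution level uniformly.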

> Also I'm not sure about the distinction between multi-band and hyper-spectral... multi-spectral satellite data products often have bands that have specific wavelength ranges, and this information is important when trying to harmonize bands between different sensors (example: Landsat 9 and Sentinel-2).

I would name them:

  • multi-band: band dimension
  • multi-spectral: wavelength dimension (might be multi-spectral or hyperspectral)

@christophenoel

I added requirement-classes.ipynb to illustrate my current opinion about what requirement / conformance classes should be.

The approach structures the specification into complementary and exclusive building blocks, offering flexibility while ensuring interoperability across diverse Earth Observation (EO) and environmental data products.

At the core, the raster requirement class establishes the baseline for CF-compliant geospatial rasters. From this foundation:

  • Exactly one horizontal coordinate type must be selected, choosing between affine_coordinates, projection_coordinates, or geographic_coordinates.
  • Exactly one band structure must be selected, choosing between single-band, multiband, or multispectral.
  • Optional complementary dimensions, vertical-coordinate and temporal-coordinate, may extend the data model to support three-dimensional or time-varying datasets.
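As a sketch of how a dataset might advertise such a composition in its root attributes (the attribute name and class identifiers below are purely hypothetical, not from the spec):

```python
import json

# Hypothetical root-level declaration composing requirement classes:
# exactly one horizontal coordinate type, exactly one band structure,
# plus optional complementary dimensions. All names are illustrative.
root_attrs = {
    "geozarr_conformance": [
        "raster",                  # baseline CF-compliant raster class
        "projection_coordinates",  # the chosen horizontal coordinate type
        "multiband",               # the chosen band structure
        "temporal-coordinate",     # optional complementary dimension
    ]
}
print(json.dumps(root_attrs, indent=2))
```

A validator could then check the declared list against the "exactly one of" constraints described above.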

[Image: requirement classes diagram]

This structure provides a flexible and rigorous foundation upon which further specialised requirement classes can be defined. Extensions may include, but are not limited to:

  • Handling multi-resolution raster datasets where multiple pyramidal levels of detail are stored.
  • Defining mission-specific encodings for particular EO products, such as Sentinel-2 Level-2A reflectance datasets or Landsat-8 surface temperature products.

The goal is to offer a scalable standardisation approach: simple datasets can conform to minimal classes, while complex, high-level products can be described through aggregation and extension of the core building blocks.

@christophenoel

Note: there are multiple possible representation alternatives, such as:
[Image: representation alternatives]

@rabernat

Christoph, in your hierarchy, where does weather and climate model data fit in? Or Level 4 data with variables like wind speed, temperature, etc? I would not call these data "rasters" and I would not describe the data variables as "bands". Do you feel that is out of scope for GeoZarr?

@mdsumner

Where in that framework do CRS and non-degenerate coordinates fit? That is, I think, the more relevant question here. xarray and Zarr need to define the standards and elevate us from the legacy of CF; I don't see why that's not clear (?)

@rabernat

> Where in that framework does crs and non degenerate coordinates fit?

In Christoph's framework, I believe "non-degenerate coordinates" (Michael's terminology) would correspond to the "AffineCoordinates" or "ProjectionCoordinates" options. CRS is mandatory for all datasets.

> elevate us from the legacy of CF

Simply abandoning CF is not feasible, as it is a mandatory standard for many data providers (e.g. CMIP). I'm not sure if that's what you're suggesting. Our intention is to leverage CF conventions wherever appropriate (not reinvent the wheel) while providing some additional options beyond CF (e.g. the affine coordinates) where needed.

@christophenoel

> Christoph, in your hierarchy, where does weather and climate model data fit in? Or Level 4 data with variables like wind speed, temperature, etc? I would not call these data "rasters" and I would not describe the data variables as "bands". Do you feel that is out of scope for GeoZarr?

  1. Indeed, the diagram only shows a small part of the whole picture (I imagine we'll find maybe a lot more). Again, let's see it as a starting point.
  2. For me, a raster is a grid of cells (or pixels), where each cell has a value representing information such as temperature, elevation, or colour. It is commonly used for storing images or spatial data in Earth observation and GIS.
  3. For data variables such as wind speed:
    a. If this is defined by at least 2D x,y coordinates, then why isn't it a raster? (How do you define raster, by the way?)
    b. If you consider it not a raster, then there might be no requirement classes at all (do we need to standardise something for such feature types?), or a generic requirement class for CF-compliant variables, or we can define a requirement class if relevant.
  4. I think there is a very large set of datasets which provide multispectral data, so I believe that at least one requirement class should be available to use the popular name of band for the dimension (like all GDAL datasets). This does not force anybody to use the dimension name band, only those supporting that requirement class.

Maybe we should continue the discussion to get a better understanding of the above points, because I'm not sure we're aligned.

@christophenoel

> elevate us from the legacy of CF
>
> Simply abandoning CF is not feasible, as it is a mandatory standard for many data providers (e.g. CMIP). I'm not sure if that's what you're suggesting. Our intention is to leverage CF conventions wherever appropriate (not reinvent the wheel) while providing some additional options beyond CF (e.g. the affine coordinates) where needed.

@mdsumner I fully agree with @rabernat . Our customers and partners are supportive and keen to rely on CF wherever possible. As a compromise, the data model remains permissive (i.e. not strictly CF-compliant), but most of the requirement classes (which are optional) are expected to build upon CF conventions.

@christophenoel

@rabernat One final important point to move forward (apologies for the repeated messages): if there is a commonly used dataset or use case that does not fit within the initial set of requirement classes, we can then assess which additional classes would be appropriate and extend the class diagram accordingly. (?)

@maxrjones
Member

maxrjones commented May 7, 2025

I put the following in the agenda for today, but sadly won't be able to make the meeting so I'm copying here if anyone wants to discuss asynchronously.

I observed at EGU that most geospatial use-cases for Zarr currently are simple translations (or virtualizations) from existing well-defined standards, including OGC GeoTIFF/COG and NetCDF CF conventions. I'd like to propose that we prioritize a GeoZarr v1.0 release that includes conformance classes matching exactly existing CF and OGC standards, with the only changes being those necessary to match the Zarr data structure and specification. I think that this path would allow us to move much quicker and prompt adoption from those already producing geospatial Zarr, before next prioritizing features not directly supported by OGC GeoTIFF standards (i.e., n-dimensionality) or CF (i.e., functional coordinate representation).

Here would be the steps for accomplishing this proposal:

  • Translate NetCDF CF conventions v1.12 to match Zarr's data/metadata structure and parlance
  • Translate OGC GeoTIFF standard v1.1 to match Zarr's data/metadata structure and parlance
  • Translate OGC Cloud Optimized GeoTIFF (COG) Standard v1.0 to match Zarr's data/metadata structure and parlance
  • Define a lightweight standard for defining which conformance class is used at the root-level metadata of the structure.
  • Complete two independent implementations of the three mentioned conformance classes to ensure the proposed translations are valid
  • Complete OGC voting process
  • Release GeoZarr v1.0
  • Add an extension to the CF conformance class that covers the “functional coordinate representation”
  • Add an extension to the GeoTIFF/COG conformance classes that allows defining dimension names for N-D datasets
  • Complete two implementations of the extensions
  • Release GeoZarr v1.1
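The "lightweight standard for declaring which conformance class is used" step could be as small as a root-attribute check. Here is a hypothetical sketch; the attribute name `geozarr:conformsTo` and the class identifiers are invented for illustration, not part of any draft:

```python
# Hypothetical registry of the three conformance classes proposed above.
# All names here are illustrative assumptions, not from the spec.
KNOWN_CLASSES = {"cf-1.12", "geotiff-1.1", "cog-1.0"}


def declared_classes(root_attrs: dict) -> set:
    """Return the conformance classes a store declares in its root
    metadata, validated against the (hypothetical) known-class registry."""
    declared = set(root_attrs.get("geozarr:conformsTo", []))
    unknown = declared - KNOWN_CLASSES
    if unknown:
        raise ValueError(f"unknown conformance classes: {sorted(unknown)}")
    return declared


# A reader can branch on the declared classes instead of sniffing structure.
print(sorted(declared_classes({"geozarr:conformsTo": ["cf-1.12", "cog-1.0"]})))
```

The design point is that implementers only need to support the classes they care about, and can fail fast on stores declaring classes they do not recognize.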

This may be exactly what @christophenoel has already been proposing; admittedly I've been trying hard to understand some of the discussions happening in this issue and how they would translate to a specification and implementations.

@christophenoel
Copy link

A few clarifications may help ensure consistency with the ideas I shared:

  • GeoZarr’s abstract data model is designed to harmonise concepts from both CF and GDAL. This is articulated in the data model section of the specification.
  • Zarr encoding must be defined for all relevant constructs derived from CF and GDAL conventions (see encoding section).
  • GeoZarr as a format is intentionally permissive—it does not restrict implementations to CF or GDAL subsets. Instead, it allows full compatibility where possible.

The spec would provide requirement classes that allow data producers to advertise conformance to specific conventions (e.g. presence of lat/lon, CF compliance, multiscale support, affine transforms, time dimension, etc.). These classes are meant to be composable and declarative, rather than restrictive.

Note: I’m not aiming to rush decisions on including or excluding topics from the specification. From my perspective, once the initial PR is reviewed and accepted, we should establish dedicated working groups for each topic (with dedicated PR) and observe which ones converge more quickly.

The priority task seems to me to properly define the data model and its encoding into Zarr. The definition of requirement classes is secondary at this stage.

@maxrjones
Member

> A few clarifications may help ensure consistency with the ideas I shared:
>
>   • GeoZarr’s abstract data model is designed to harmonise concepts from both CF and GDAL. This is articulated in the data model section of the specification.
>   • Zarr encoding must be defined for all relevant constructs derived from CF and GDAL conventions (see encoding section).
>   • GeoZarr as a format is intentionally permissive—it does not restrict implementations to CF or GDAL subsets. Instead, it allows full compatibility where possible.
>
> The spec would provide requirement classes that allow data producers to advertise conformance to specific conventions (e.g. presence of lat/lon, CF compliance, multiscale support, affine transforms, time dimension, etc.). These classes are meant to be composable and declarative, rather than restrictive.
>
> Note: I’m not aiming to rush decisions on including or excluding topics from the specification. From my perspective, once the initial PR is reviewed and accepted, we should establish dedicated working groups for each topic (with dedicated PR) and observe which ones converge more quickly.
>
> The priority task seems to me to properly define the data model and its encoding into Zarr. The definition of requirement classes is secondary at this stage.

Thanks for your clarifications, and apologies for any terseness in my comments; I just want to rapidly share thoughts in advance of the meeting since I won't be there. I think we need two sources of input -

  1. Information from Unidata about why the abstract data model they defined, which is the basis for the new GeoZarr structure, didn't really take off, to make sure we're not repeating old mistakes.
  2. Buy-in from implementers. The mix-and-match harmonization approach sounds way more challenging to support than supporting CF and OGC with a lightweight extension mechanism. I think we need to hear which of the two approaches GDAL devs, xarray devs, and other possible implementers would prefer. One of my main use-cases is virtualization (VirtualiZarr could be considered an implementation), and I would prefer to support data mostly subscribing to CF or OGC rather than an arbitrary mix of features from either.
