Attributes added by intake-esm contain invalid characters for netCDF #490

aulemahal · 2022-07-14T15:58:33Z

intake-esm creates global attributes based on the catalog. By default, they follow the format: intake_esm_attr/(x). While the first part of the "path" is customizable (#460), the separator character "/" is not. However, it is an illegal character when writing netCDF files (it is valid for zarr datasets though).

Would it be reasonable to have this character either:

- changed to something legal ("_", "-", "", ...), or
- made customizable by downstream applications?
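
In the meantime, a downstream workaround could be to rename the offending attributes before writing to netCDF. The following is only a minimal sketch, not part of intake-esm: the helper name `sanitize_attr_names`, its parameters, and the sample attribute values are made up for illustration, and it works on a plain dict such as `ds.attrs`.

```python
def sanitize_attr_names(attrs, bad="/", sep="_"):
    """Return a copy of ``attrs`` with ``bad`` replaced by ``sep`` in every key.

    Hypothetical helper for illustration; not an intake-esm function.
    """
    cleaned = {}
    for name, value in attrs.items():
        new_name = name.replace(bad, sep)
        # Refuse to silently overwrite an attribute that already uses the new name.
        if new_name in cleaned:
            raise ValueError(f"renaming {name!r} collides with existing {new_name!r}")
        cleaned[new_name] = value
    return cleaned


# Illustrative attributes following the "intake_esm_attr/(x)" format.
attrs = {"intake_esm_attr/source_id": "some-model", "title": "demo"}
print(sanitize_attr_names(attrs))
```

With xarray, this would be used as `ds.attrs = sanitize_attr_names(ds.attrs)` right before `ds.to_netcdf(...)`.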

andersy005 · 2022-07-14T19:32:48Z

> changed to something legal ("_", "-", "", ...)

👍🏽 for using a reasonable alternative. i'm curious... is : a valid option?

i've always wondered whether it's worth keeping these attributes after we're done merging datasets. should we add a global option to keep or drop these attributes whenever intake-esm finishes assembling the datasets?
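
For readers who want the "drop" behaviour today, it can be approximated downstream. This is a rough sketch under assumptions: `drop_prefixed_attrs` is a hypothetical helper, not an existing intake-esm option, and the default prefix simply follows the `intake_esm_attr/(x)` format quoted above (pass your own prefix if you customized it).

```python
def drop_prefixed_attrs(attrs, prefix="intake_esm_attr"):
    """Return a copy of ``attrs`` without the keys that start with ``prefix``.

    Hypothetical helper for illustration; not an intake-esm option.
    """
    return {name: value for name, value in attrs.items() if not name.startswith(prefix)}


# Illustrative attributes; only the non-prefixed one is kept.
attrs = {"intake_esm_attr/variable_id": "tas", "title": "demo"}
print(drop_prefixed_attrs(attrs))
```

Applied to a dataset, this would look like `ds.attrs = drop_prefixed_attrs(ds.attrs)`.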

andersy005 · 2022-07-14T19:33:34Z

@aulemahal, if you have time, a PR is welcome :)

aulemahal · 2022-07-14T20:41:25Z

Test:

import xarray as xr

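# Try each candidate separator in a global attribute name and report
# which netCDF backends accept it.
ds = xr.tutorial.open_dataset('air_temperature')
for char in ['-', ':', '.', '/', "\\", ';', '·', '~', '|', '=', '>', '▶', '🪕']:
    print(char, ':')
    # Work on a copy so each test attribute is added to a clean dataset.
    ds2 = ds.copy()
    ds2.attrs[f'cat{char}name'] = 'attr'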
    for engine in ['netcdf4', 'h5netcdf', 'scipy']:
        try:
            ds2.to_netcdf('/tmp/test.nc', engine=engine)
        except Exception as err:
            print(f'    {engine}: {err}')
        else:
            print(f'    {engine}: ok!')

Prints:

- :
    netcdf4: ok!
    h5netcdf: ok!
    scipy: ok!
: :
    netcdf4: ok!
    h5netcdf: ok!
    scipy: ok!
. :
    netcdf4: ok!
    h5netcdf: ok!
    scipy: ok!
/ :
    netcdf4: NetCDF: Name contains illegal characters
    h5netcdf: ok!
    scipy: Not a valid attribute name
\ :
    netcdf4: ok!
    h5netcdf: ok!
    scipy: ok!
; :
    netcdf4: ok!
    h5netcdf: ok!
    scipy: ok!
· :
    netcdf4: ok!
    h5netcdf: ok!
    scipy: ok!
~ :
    netcdf4: ok!
    h5netcdf: ok!
    scipy: ok!
| :
    netcdf4: ok!
    h5netcdf: ok!
    scipy: ok!
= :
    netcdf4: ok!
    h5netcdf: ok!
    scipy: ok!
> :
    netcdf4: ok!
    h5netcdf: ok!
    scipy: ok!
▶ :
    netcdf4: ok!
    h5netcdf: ok!
    scipy: 'latin-1' codec can't encode character '\u25b6' in position 3: ordinal not in range(256)
🪕 :
    netcdf4: ok!
    h5netcdf: ok!
    scipy: 'latin-1' codec can't encode character '\U0001fa95' in position 3: ordinal not in range(256)

It really looks like '/' is the only tested character that netCDF4 doesn't support (scipy rejects it too, along with the two non-latin-1 characters). While I'm not suggesting using the banjo 🪕, it shows that UTF-8 support is quite good.
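
The results above can be condensed into a small check. This is a sketch that encodes only what the test showed ('/' fails with netcdf4 and scipy, non-latin-1 characters fail with scipy); it is not the full netCDF naming rule set, and `attr_name_problems` is a hypothetical name.

```python
def attr_name_problems(name):
    """List the reasons ``name`` failed in the backend test above (empty if none)."""
    problems = []
    if "/" in name:
        problems.append("contains '/' (rejected by netcdf4 and scipy)")
    try:
        # The scipy backend raised a 'latin-1' codec error for ▶ and 🪕.
        name.encode("latin-1")
    except UnicodeEncodeError:
        problems.append("not latin-1 encodable (rejected by scipy)")
    return problems


for candidate in ["cat:name", "cat/name", "cat▶name"]:
    print(repr(candidate), attr_name_problems(candidate) or "ok")
```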

I like : and \, but the latter is slightly harder to write because you need to escape it as '\\'.

I might have time for a PR. I'll want to tackle a few other attribute issues at the same time I think.

aulemahal mentioned this issue Jul 14, 2022: attributes created by intake-esm contain illegal characters Ouranosinc/xscen#13 (Closed)

andersy005 added the enhancement label Jul 14, 2022

andersy005 mentioned this issue Aug 22, 2022: Ensure global attributes added by intake-esm are compatible with netCDF and Zarr #509 (Merged)

andersy005 closed this as completed in #509 Aug 23, 2022
