Attributes added by intake-esm contain invalid characters for netCDF #490

Closed
aulemahal opened this issue Jul 14, 2022 · 3 comments · Fixed by #509
Labels
enhancement Issues that are found to be a reasonable candidate feature additions

Comments

@aulemahal
Contributor

intake-esm creates global attributes based on the catalog. By default, they follow the format intake_esm_attr/(x). While the first part of this "path" is customizable (#460), the separator character "/" is not. However, "/" is an illegal character in attribute names when writing netCDF files (it is valid for zarr datasets, though).

Would it be reasonable to have this character:

  • changed to something legal ("_", "-", "", ...), or
  • made customizable by downstream applications?
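
As a stopgap on the user side, the first option could be sketched as below. This is only an illustration, not intake-esm's API: the helper name `rename_attr_separator`, the attribute key `source_id`, and its value are made up for the example, and it operates on a plain dict such as `ds.attrs`.

```python
def rename_attr_separator(attrs, prefix="intake_esm_attr", old_sep="/", new_sep=":"):
    """Return a copy of attrs where keys of the form prefix + old_sep + name
    use new_sep as the separator instead. Other keys are left unchanged."""
    renamed = {}
    start = prefix + old_sep
    for key, value in attrs.items():
        if key.startswith(start):
            # Only the separator right after the prefix is replaced.
            key = prefix + new_sep + key[len(start):]
        renamed[key] = value
    return renamed


# Hypothetical attributes, shaped like the ones described above.
attrs = {"intake_esm_attr/source_id": "CanESM5", "title": "demo"}
print(rename_attr_separator(attrs))
```

With an xarray dataset, something like `ds.attrs = rename_attr_separator(ds.attrs)` before `to_netcdf` would apply it.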
@andersy005
Member

changed to something legal ("_", "-", "", ...)

👍🏽 for using a reasonable alternative. i'm curious... is : a valid option?

i've always wondered whether it's worth keeping these attributes after we're done merging datasets. should we add a global option to keep or drop these attributes whenever intake-esm finishes assembling the datasets?
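
for illustration, dropping them after assembly could look roughly like the sketch below. the function name `drop_prefixed_attrs` is hypothetical and no such option exists in intake-esm as far as this thread shows; it just filters a plain attrs dict by prefix.

```python
def drop_prefixed_attrs(attrs, prefix="intake_esm_attr"):
    """Return a copy of attrs without the keys that start with prefix."""
    return {key: value for key, value in attrs.items() if not key.startswith(prefix)}


# Hypothetical attributes: one added by the catalog, one from the original file.
attrs = {"intake_esm_attr/source_id": "CanESM5", "title": "demo"}
print(drop_prefixed_attrs(attrs))
```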

@andersy005 added the enhancement label Jul 14, 2022
@andersy005
Member

@aulemahal, if you have time, a PR is welcome :)

@aulemahal
Contributor Author

Test:

import xarray as xr

# Load a small sample dataset to write out with each engine.
ds = xr.tutorial.open_dataset('air_temperature')

# Candidate separator characters, including a few non-ASCII ones.
for char in ['-', ':', '.', '/', "\\", ';', '·', '~', '|', '=', '>', '▶', '🪕']:
    print(char, ':')
    ds2 = ds.copy()
    # Add a global attribute whose name contains the candidate character.
    ds2.attrs[f'cat{char}name'] = 'attr'
    for engine in ['netcdf4', 'h5netcdf', 'scipy']:
        try:
            ds2.to_netcdf('/tmp/test.nc', engine=engine)
        except Exception as err:
            print(f'    {engine}: {err}')
        else:
            print(f'    {engine}: ok!')

Prints:

- :
    netcdf4: ok!
    h5netcdf: ok!
    scipy: ok!
: :
    netcdf4: ok!
    h5netcdf: ok!
    scipy: ok!
. :
    netcdf4: ok!
    h5netcdf: ok!
    scipy: ok!
/ :
    netcdf4: NetCDF: Name contains illegal characters
    h5netcdf: ok!
    scipy: Not a valid attribute name
\ :
    netcdf4: ok!
    h5netcdf: ok!
    scipy: ok!
; :
    netcdf4: ok!
    h5netcdf: ok!
    scipy: ok!
· :
    netcdf4: ok!
    h5netcdf: ok!
    scipy: ok!
~ :
    netcdf4: ok!
    h5netcdf: ok!
    scipy: ok!
| :
    netcdf4: ok!
    h5netcdf: ok!
    scipy: ok!
= :
    netcdf4: ok!
    h5netcdf: ok!
    scipy: ok!
> :
    netcdf4: ok!
    h5netcdf: ok!
    scipy: ok!
▶ :
    netcdf4: ok!
    h5netcdf: ok!
    scipy: 'latin-1' codec can't encode character '\u25b6' in position 3: ordinal not in range(256)
🪕 :
    netcdf4: ok!
    h5netcdf: ok!
    scipy: 'latin-1' codec can't encode character '\U0001fa95' in position 3: ordinal not in range(256)

It really looks like '/' is the only character tested that the netcdf4 engine doesn't support (scipy rejects it too, along with the characters outside Latin-1). While I'm not suggesting using the banjo 🪕, it shows that the UTF-8 support is quite good.

I like ':' and '\', but the latter is slightly harder to write because it needs to be escaped as '\\'.
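
To illustrate the escaping point, here is a small sketch in plain Python (the attribute name `cat\name` mirrors the test above; nothing here is specific to intake-esm):

```python
# In a regular string literal the backslash must be doubled;
# a raw string avoids that.
escaped = "cat\\name"
raw = r"cat\name"

print(escaped == raw)  # True: both spell the same 8-character name
print(len("\\"))       # 1: the escaped form is a single character

# With ':' no escaping is needed at all.
colon = "cat:name"
print(colon)
```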

I might have time for a PR. I'll want to tackle a few other attribute issues at the same time I think.
