default_dtype set to something else than None in open_mdsdataset ? #29

Closed
serazing opened this issue Jan 12, 2017 · 14 comments

@serazing
Contributor

serazing commented Jan 12, 2017

I had some trouble figuring out that I needed to specify default_dtype to read llc4320 outputs without having a .meta file for each variable.

By default, it is set to 'f4' in mds_store._MDSDataStore(xr.backends.common.AbstractDataStore), but it is probably overridden by the None coming from the default arguments of mds_store.open_mdsdataset.

Would it be better to set default_dtype to 'f4' in mds_store.open_mdsdataset?

This would avoid raising the IOError in utils.read_mds when the code tries to build the missing metadata.

    except IOError as e:
        # we can recover from not having a .meta file if dtype and shape have
        # been specified already
        if (shape is None) or (dtype is None):
            raise e
        else:
            nrecs = 1
            shape = list(shape)
            shape.insert(0, nrecs)
            name = os.path.basename(fname)
@serazing serazing changed the title Is d Could default_dtype be set to something else than None in mds_store.open_mdsdataset by default ? Jan 12, 2017
@serazing serazing changed the title Could default_dtype be set to something else than None in mds_store.open_mdsdataset by default ? default_dtype set to something else than None in open_mdsdataset ? Jan 12, 2017
@rabernat
Member

@serazing, are you using the version on PyPI or the latest master from GitHub?

I did some serious re-factoring of the LLC code path in #25. Could you check to see if that fixes your problem?

I think I should release v0.2 soon, since I have made lots of updates.

@serazing
Contributor Author

I am using the latest master from GitHub, which I updated last week. I think I will still have this issue with the current master, but I have to test it again to be sure (soon, because I'm currently attending a meeting).

By the way, I have successfully extracted data from llc4320 to a netCDF file. It only takes a couple of lines. Thank you for this nice work: you've made the binaries quite easy to read as xarray objects.

@rabernat
Member

Ok, glad to hear it! Did you do this on Pleiades? Or some other system?

Could you post your code as a gist perhaps?

I am still having big problems on Pleiades.

@serazing
Contributor Author

Not yet, I am still prototyping with a few snapshots on a local cluster. I have to figure out the best way to convert several timesteps into netCDF files. I'll be glad to share this extraction code afterwards.

@rabernat
Member

You can read multiple timesteps with open_mdsdataset. Did this not work?

And you can write multiple netCDF files using xarray.save_mfdataset.

@rabernat
Member

So to clarify the original issue: are you saying that the default_dtype argument does not work?

If so, I would welcome a pull request to fix the problem.

@serazing
Contributor Author

serazing commented Jan 17, 2017

The default_dtype argument works fine, but if it is left at its default value (None), the code raises an error when there are no metadata files associated with the variables.

I could suggest setting another default value, and I could make a pull request for that.

@rabernat
Member

the code raises an error when there is no metadata files associated with the variables.

This was on purpose. I don't want to assume the datatype. If the user wants to be able to read .data files without any .meta files, the user has to manually put this information in. If we assume a datatype and it is the wrong datatype, the user will get nonsense and not know why.
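
As a minimal illustration of that last point (plain NumPy, not xmitgcm code; the array values are made up for the example), the same bytes can be read with a wrong dtype without any error being raised:

```python
import numpy as np

# Made-up sample values, stored as big-endian float32 ('>f4'),
# the dtype discussed in this thread.
values = np.array([1.0, 2.0, 3.0, 4.0], dtype=">f4")
raw = values.tobytes()

# Correct dtype: the original values come back.
right = np.frombuffer(raw, dtype=">f4")

# Wrong precision: the 16 bytes are reinterpreted as two float64 numbers.
wrong_precision = np.frombuffer(raw, dtype=">f8")

# Wrong byte order: still four numbers, but not the ones that were written.
wrong_order = np.frombuffer(raw, dtype="<f4")

print(right)
print(wrong_precision)
print(wrong_order)
```

None of the three reads fails, which is why a silently assumed dtype is hard to debug.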

However, I think the documentation and error message could be improved to clarify this. Can you post the error you get if you don't specify the datatype?

@serazing
Contributor Author

serazing commented Jan 17, 2017

That's fair enough.

Here is the complete error I get.

  File "extract_llc_eta.py", line 27, in <module>
    ds = open_mdsdataset(main_dir, grid_dir=grid_dir, geometry='llc', iters=timestep)
  File "/home/users/serazin3g/miniconda2/envs/xmitgcm-dev/lib/python2.7/site-packages/xmitgcm-0.1.0-py2.7.egg/xmitgcm/mds_store.py", line 180, in open_mdsdataset
    nx=nx, ny=ny, nz=nz)
  File "/home/users/serazin3g/miniconda2/envs/xmitgcm-dev/lib/python2.7/site-packages/xmitgcm-0.1.0-py2.7.egg/xmitgcm/mds_store.py", line 421, in __init__
    for (vname, dims, data, attrs) in self.load_from_prefix(p, iternum):
  File "/home/users/serazin3g/miniconda2/envs/xmitgcm-dev/lib/python2.7/site-packages/xmitgcm-0.1.0-py2.7.egg/xmitgcm/mds_store.py", line 500, in load_from_prefix
    shape=data_shape, llc=self.llc)
  File "/home/users/serazin3g/miniconda2/envs/xmitgcm-dev/lib/python2.7/site-packages/xmitgcm-0.1.0-py2.7.egg/xmitgcm/utils.py", line 87, in read_mds
    raise e
IOError: [Errno 2] No such file or directory: '/srv/share/modelset102/SSH_from_NASA/Eta.0000327888.meta'

It basically says that the *.meta file does not exist. In this case, it would be nice to add a message that encourages the user to specify the default_dtype parameter. The exception handling in utils.read_mds (see my first post) may have to be slightly modified.
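
A rough sketch of what such a message could look like is given here. The function name `fallback_without_meta` and the wording of the message are hypothetical and are not taken from xmitgcm; the recovery branch simply mirrors the snippet quoted in the first post.

```python
import os


def fallback_without_meta(fname, shape=None, dtype=None):
    """Hypothetical stand-in for the no-.meta branch, with a clearer error."""
    if shape is None or dtype is None:
        # Illustrative wording only: point the user at default_dtype.
        raise IOError(
            "No metadata file found for '%s'. To read .data files without "
            ".meta files, pass default_dtype (for example '>f4') to "
            "open_mdsdataset." % fname
        )
    # Same recovery as in the quoted snippet: assume a single record.
    nrecs = 1
    full_shape = [nrecs] + list(shape)
    name = os.path.basename(fname)
    return nrecs, full_shape, name, dtype


# Usage with a made-up path and shape.
print(fallback_without_meta("/tmp/Eta.0000327888", shape=(2, 3), dtype=">f4"))
```

The error still surfaces as an IOError, so existing callers that catch it would keep working under this assumption.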

@rabernat
Member

rabernat commented Jan 17, 2017 via email

@serazing
Contributor Author

Sure.

@serazing
Contributor Author

I have run into another issue with the default_dtype parameter, related to endianness. When metadata files are absent and default_dtype is set only to 'f4', for example, the binary files are read with the wrong byte order. The default value of the endian parameter does not seem to be applied to default_dtype, so the user has to specify the full dtype '>f4'.

I would suggest a small check for the default_dtype parameter:

if default_dtype is not None:
    default_dtype = np.dtype(default_dtype).newbyteorder(endian)

When metadata files are present, everything works, since the byte order is applied to the dtype read from the .meta file.
Lines 87-89 in utils.read_mds

try:
    nrecs, shape, name, dtype, fldlist = get_useful_info_from_meta_file(metafile)
    dtype = dtype.newbyteorder(endian)
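
The check proposed in this comment can be tried in isolation with NumPy alone. This is only a sketch: `apply_endian` is a hypothetical helper name, and the sample values are made up.

```python
import numpy as np


def apply_endian(default_dtype, endian=">"):
    """Hypothetical helper wrapping the check proposed in this comment."""
    if default_dtype is None:
        return None
    # 'f4' alone means native byte order; force the requested one instead.
    return np.dtype(default_dtype).newbyteorder(endian)


# Made-up values written as big-endian float32.
big_endian_bytes = np.array([1.5, -2.25, 1000.0], dtype=">f4").tobytes()

# Reading with the corrected dtype recovers the values on any platform.
fixed = np.frombuffer(big_endian_bytes, dtype=apply_endian("f4", ">"))
print(apply_endian("f4", ">").str)
print(fixed)
```

One consequence of this design is that endian always wins: a user who passes '<f4' together with endian='>' would end up with '>f4'.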

@rabernat
Member

Yes, I agree those keywords are somewhat overlapping and confusing.

Please go ahead and address all of this in your PR.

@rabernat
Member

Closed by #34
