xarray to and from Iris #1750

Merged: 35 commits into pydata:master, Dec 20, 2017
Conversation

@duncanwp (Contributor) commented Nov 30, 2017

nparley and others added 20 commits April 1, 2016 10:31
…s.cube.Cube objects. Uses the same template as the cdms2 conversion.
…rd and DimCoord correctly so 2d coords will work. Use dims variable to convert dimension numbers into names. Add 2d coord to test DataArray.
# Conflicts:
#	ci/requirements-py27-cdat+iris+pynio.yml
# Conflicts:
#	xarray/test/test_dataarray.py
@jhamman (Member) left a comment:

We also should work this into the documentation somewhere. I'm not sure if the I/O page is the right place to mention this but it may be.

except ImportError:
# prior to v1.10
from iris.fileformats._pyke_rules.compiled_krb.fc_rules_cf_fc \
import _parse_cell_methods as parse_cell_methods
Member:

Would it be worth using importlib to clean this up?

@duncanwp (Contributor, author) Dec 1, 2017:

I wonder if it's worth just removing this fallback completely and requiring iris > 1.10. It's at 1.13 now and nearly 2.0... @pelson?

@@ -23,7 +23,11 @@ Enhancements

- :py:func:`~plot.contourf()` learned to contour 2D variables that have both a 1D co-ordinate (e.g. time) and a 2D co-ordinate (e.g. depth as a function of time).
By `Deepak Cherian <https://github.com/dcherian>`_.
- Added :py:meth:`DataArray.to_iris <xray.DataArray.to_iris>` for
converting a data array into an Iris_ Cube with the same data and coordinates (:issue:`621` and :issue:`37`).
By `Neil Parley <https://github.com/nparley>`_.
Member:

Also note the from_iris method.

Contributor (author):

Done

@@ -446,6 +446,7 @@ DataArray methods
DataArray.to_index
DataArray.to_masked_array
DataArray.to_cdms2
DataArray.to_iris
Member:

Also note the from_iris method.

Contributor (author):

Done

@shoyer (Member) commented Dec 1, 2017

@pelson any thoughts here?

I think this may be a reasonable place to start, though it is certainly not leveraging Iris's full metadata decoding capabilities.

It would also be nice to pass dask arrays back and forth, but that will probably need to wait until after Iris 2.0 and #1372 is solved.

@duncanwp (Contributor, author) commented Dec 1, 2017

> I think this may be a reasonable place to start, though it is certainly not leveraging Iris's full metadata decoding capabilities.

It seems like a decent starting point; were there specific decodings you had in mind?

> It would also be nice to pass dask arrays back and forth, but that will probably need to wait until after Iris 2.0 and #1372 is solved.

Agreed.

Do you have a feel for if/where else this should be documented? I'm not sure it really fits in I/O...

@fmaussion (Member)

> Do you have a feel for if/where else this should be documented? I'm not sure it really fits in I/O...

I think it's OK in I/O (for now). This is also the place where we document serialization in pickle, and the from_dict methods.

@shoyer (Member) commented Dec 1, 2017

> I think it's OK in I/O (for now). This is also the place where we document serialization in pickle, and the from_dict methods.

Agreed. The I/O docs have gotten pretty long; in the long term it could make sense to split them into several subpages.

doc/io.rst Outdated
----

The Iris_ tool allows easy reading of common meteorological and climate model formats
(including GRIB and UK MetOffice PP files) into ``Cube``s which are in many ways very
Member:

Write "Cube objects". Sphinx won't render Cubes properly.

doc/io.rst Outdated
The Iris_ tool allows easy reading of common meteorological and climate model formats
(including GRIB and UK MetOffice PP files) into ``Cube``s which are in many ways very
similar to to ``DataArray``s, while enforcing a CF-compliant data model. If iris is
installaed xarray can convert a ``Cube`` into a ``DataArray`` using
Member:

installaed -> installed

doc/io.rst Outdated
da_cube


Conversly, We can create a new cube object from a ``DataArray`` using
Member:

We -> we

doc/io.rst Outdated


Conversly, We can create a new cube object from a ``DataArray`` using
:py:meth:`~xarray.Dataset.from_dict`:
Member:

to_iris?

if 'cell_methods' in dataarray.attrs:
args['cell_methods'] = parse_cell_methods(dataarray.attrs['cell_methods'])

cube = iris.cube.Cube(dataarray.to_masked_array(), **args)
Member:

Rather than loading every dataset into memory, can we raise NotImplementedError for now if a DataArray's data is a dask array? isinstance(dataarray.data, dask_array_type) should do it, where dask_array_type is imported from xarray.core.pycompat.

Contributor (author):

Raising NotImplementedError seems a bit drastic, especially since Iris 2.0 is just around the corner. The latest commit lets Iris decide what to do with the array: it stays a dask array if your version of Iris supports it and is read into memory otherwise.

Otherwise we would have to work out which version of Iris we're converting to/from, which would be a pain... If you're happy with this approach I can add another set of tests for the various dask/numpy Cube/DataArray permutations.

Member:

OK, if it's easy enough to add support for dask, then that is great!

coord_attrs = _iris_obj_to_attrs(coord)
coord_dims = [dims[i] for i in cube.coord_dims(coord)]
if not coord.var_name:
raise ValueError('Coordinate has no var_name')
Member:

Can we print the offending coordinate in the error message? That makes things easier to debug.

try:
dim_coord = cube.coord(dim_coords=True, dimensions=(i,))
dims.append(dim_coord.var_name)
except iris.exceptions.CoordinateNotFoundError:

coord_attrs = _iris_obj_to_attrs(coord)
coord_dims = [dims[i] for i in cube.coord_dims(coord)]
if not coord.var_name:
raise ValueError('Coordinate has no var_name')
Member:

This line also needs coverage in a test.

…sion of dask arrays which may or may not be masked.
…onvert

# Conflicts:
#	ci/requirements-py27-cdat+iris+pynio.yml
@duncanwp (Contributor, author)

OK, here's another go. It's not pretty, but it seems to work. Once the various issues referenced above are solved this can be cleaned up, but I'd rather not wait for them since this has been in the works a while now.

doc/io.rst Outdated

The Iris_ tool allows easy reading of common meteorological and climate model formats
(including GRIB and UK MetOffice PP files) into ``Cube`` objects which are in many ways very
similar to to ``DataArray``s, while enforcing a CF-compliant data model. If iris is
Member:

double "to"

doc/io.rst Outdated

da_cube = xr.Dataset.from_iris(cube)
da_cube

Member:

Are these examples working? If you'd like this code to run at build time you'll have to add iris to the doc build environment and define the cube variable first.

Alternatively, you can use the :verbatim: environment (see the OPeNDAP example below), which is static and therefore easier (though more error-prone in the long term).

@fmaussion (Member) left a comment:

Thanks a lot! I just went over the doc section and have a couple of small comments.

@duncanwp (Contributor, author)

Great - thanks!

@shoyer (Member) commented Dec 18, 2017

dask/dask#2977 needs to be resolved to make it possible to properly translate dask arrays back and forth between xarray/Iris.

I don't want to hold up this PR, which is coming along very nicely (and I'm sure would already be useful), so can we simply defer handling of dask arrays for now? I would suggest raising NotImplementedError and printing an error message with a link to a follow-up issue in xarray's issue tracker.

@duncanwp (Contributor, author)

dask/dask#2977 would certainly make it cleaner, but 95b0197 should work in the meantime, unless there's a specific case I've missed?

@shoyer (Member) left a comment:

Actually, I think we're OK on the dask front, most likely :)

doc/io.rst Outdated

The Iris_ tool allows easy reading of common meteorological and climate model formats
(including GRIB and UK MetOffice PP files) into ``Cube`` objects which are in many ways very
similar to ``DataArray``s, while enforcing a CF-compliant data model. If iris is
Member:

Needs to be "DataArray object"

(RST can't handle anything adjacent to backticks which is why the color looks weird here in GitHub's syntax highlighting. Yes, it's not very intuitive.)

Contributor (author):

Good catch, thanks

from iris.fileformats.netcdf import parse_cell_methods
try:
from dask.array import ma as dask_ma
from dask.array import Array
Member:

Instead of testing for Array, use:

from xarray.core.pycompat import dask_array_type

...
if isinstance(data, dask_array_type):
    # this is always defined, even if dask is not installed


# Create the right type of masked array (should be easier after #1769)
if isinstance(dataarray.data, Array):
masked_data = dask_ma.masked_invalid(dataarray)
Member:

Just put from dask.array import ma as dask_ma right above this line to avoid the need to guard the import in try/except.

Contributor (author):

Much nicer - thanks!

cube_data = cube.core_data() if hasattr(cube, 'core_data') else cube.data

# Deal with dask and numpy masked arrays
if dask_ma and isinstance(cube_data, Array):
Member:

Does Iris always use dask masked arrays, or does it ever use plain dask arrays? @pelson?

(I would guess not, since we can't tell these apart.)

Contributor (author):

I don't think it matters; if the given array isn't masked then ma.filled is just a no-op.
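A quick numpy illustration of that point (numpy is shown because it is universally available; per the discussion, dask.array.ma.filled behaves the same way): on a plain ndarray filled has nothing to do, while masked entries are replaced by the fill value.

```python
import numpy as np

plain = np.array([1.0, 2.0, 3.0])
masked = np.ma.masked_array([1.0, 2.0, 3.0], mask=[False, True, False])

# With no mask there is nothing to fill: the data comes back unchanged.
assert np.array_equal(np.ma.filled(plain, np.nan), plain)

# With a mask, the masked entry is replaced by the fill value.
filled = np.ma.filled(masked, np.nan)
assert np.isnan(filled[1]) and filled[0] == 1.0
```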

Member:

OK, good to hear!

cube_data = cube.core_data() if hasattr(cube, 'core_data') else cube.data

# Deal with dask and numpy masked arrays
if dask_ma and isinstance(cube_data, Array):
Member:

Also, again it would be better to use dask_array_type and import dask.array.ma here rather than above. That would ensure that an error is raised if somehow a cube contains a dask array but the wrong version of dask is installed, rather than letting the error pass silently.
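A hedged sketch of those two suggestions combined for the cube-to-xarray direction: detect dask arrays with a pycompat-style tuple (empty when dask is absent) and import dask.array.ma only inside the dask branch, so a dask-backed cube paired with a dask that lacks the ma module fails loudly. The function name and fill handling are illustrative, not the PR's exact code.

```python
import numpy as np


def cube_data_as_filled(cube_data, fill=np.nan):
    """Return cube data with masked values filled, lazily if possible."""
    try:
        from dask.array import Array
        dask_array_type = (Array,)
    except ImportError:
        dask_array_type = ()  # no dask: the isinstance check is always False
    if isinstance(cube_data, dask_array_type):
        # Deferred import: raises here if this dask has no ma module,
        # instead of silently falling through to the numpy branch.
        from dask.array import ma as dask_ma
        return dask_ma.filled(cube_data, fill)
    return np.ma.filled(cube_data, fill)
```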

doc/io.rst Outdated
:py:meth:`~xarray.Dataset.from_iris`:

.. ipython:: python
:verbatim:
@shoyer (Member) Dec 18, 2017:

:verbatim: is basically designed for copying/pasting output from the IPython command line, so this isn't quite using it right yet.

I would definitely prefer adding Iris to our docs builds if that's feasible. See:
https://github.com/pydata/xarray/blob/master/doc/environment.yml

Contributor (author):

OK, I'll give it a shot. How does the environment work? Are all the examples run in the same interpreter session? And are there pre-built DataArrays I can just use as examples?

Member:

xr.tutorial.load_dataset('air_temperature') is a reasonable choice, though I'm not 100% sure it's sufficiently CF-compliant for Iris :).

I'm OK without actually printing output here (for now at least), but in that case I would set things up a little differently so :verbatim: prints it properly, e.g.,

.. ipython:: python
    :verbatim:

    In [1]: da_cube = xr.Dataset.from_iris(cube)

Basically, don't include the line that would print the repr for the cube/array unless you actually would make the Python objects.

@duncanwp (Contributor, author)

How's this? (I turned the example around so that I create the cube as a DataArray then convert it...)
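The round trip being discussed can be sketched as follows. This is a hedged illustration, not the doc example itself: it requires xarray (and iris for the actual conversion, which is why that import is guarded), and the variable names and toy coordinates are made up.

```python
import numpy as np
import xarray as xr

# Build a small DataArray to convert, mirroring the direction of the
# doc example: create the data in xarray first, then convert.
da = xr.DataArray(
    np.random.rand(4, 5),
    coords=[("lat", np.arange(4.0)), ("lon", np.arange(5.0))],
    name="temperature",
)

try:
    import iris  # noqa: F401  (only needed for the actual conversion)
    cube = da.to_iris()                        # DataArray -> Iris Cube
    roundtrip = xr.DataArray.from_iris(cube)   # Iris Cube -> DataArray
except ImportError:
    cube = roundtrip = None  # Iris not installed; conversion skipped
```

When Iris is available, roundtrip carries the same data, dimension names, and coordinates as da, which is exactly what the tests in this PR exercise.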

@shoyer (Member) commented Dec 18, 2017

@duncanwp that example looks good to me. I assume it runs locally with this branch?

@duncanwp (Contributor, author)

Yep, it's not a very exciting cube, but it runs fine.

@shoyer (Member) left a comment:

I think this is ready. @fmaussion @jhamman any further concerns?

@fmaussion (Member) left a comment:

Thanks a lot @duncanwp , this is a very important addition!

@jhamman (Member) left a comment:

This is great. Thanks for pushing this over the finish line. Can we fix a few style issues before merging:

git diff upstream/master **/*py | flake8 --diff
xarray/convert.py:12:1: F401 '.conventions.decode_cf_variable' imported but unused
xarray/convert.py:13:80: E501 line too long (81 > 79 characters)
xarray/convert.py:16:80: E501 line too long (118 > 79 characters)
xarray/convert.py:17:80: E501 line too long (118 > 79 characters)
xarray/convert.py:18:80: E501 line too long (107 > 79 characters)
xarray/convert.py:20:80: E501 line too long (100 > 79 characters)
xarray/convert.py:120:80: E501 line too long (82 > 79 characters)
xarray/convert.py:185:80: E501 line too long (94 > 79 characters)
xarray/convert.py:189:80: E501 line too long (81 > 79 characters)
xarray/tests/test_dataarray.py:2866:80: E501 line too long (83 > 79 characters)
xarray/tests/test_dataarray.py:2872:80: E501 line too long (80 > 79 characters)
xarray/tests/test_dataarray.py:2878:80: E501 line too long (80 > 79 characters)
xarray/tests/test_dataarray.py:2888:80: E501 line too long (80 > 79 characters)
xarray/tests/test_dataarray.py:2933:80: E501 line too long (105 > 79 characters)
xarray/tests/test_dataarray.py:2939:80: E501 line too long (80 > 79 characters)
xarray/tests/test_dataarray.py:2945:80: E501 line too long (80 > 79 characters)
xarray/tests/test_dataarray.py:2949:80: E501 line too long (89 > 79 characters)
xarray/tests/test_dataarray.py:2958:80: E501 line too long (80 > 79 characters)
xarray/tests/test_dataarray.py:2982:80: E501 line too long (84 > 79 characters)

@duncanwp (Contributor, author)

Ah yes, sorry about that - fixed now.

@duncanwp (Contributor, author)

This looks good to go now - thanks for all your help!

@shoyer shoyer merged commit eeb109d into pydata:master Dec 20, 2017
@pelson commented Jan 3, 2018

Apologies for the delay. This is a great first pass. Definitely more metadata sharing possible, but I'm glad this has been merged - let's grow the capability as required. If any issues crop up, please feel free to ping me & @bjlittle. 👍
