xarray to and from Iris #1750
Conversation
…s.cube.Cube objects. Uses the same template as the cdms2 conversion.
…rd and DimCoord correctly so 2d coords will work. Use dims variable to convert dimension numbers into names. Add 2d coord to test DataArray.
# Conflicts: # .travis.yml
# Conflicts: # ci/requirements-py27-cdat+iris+pynio.yml
# Conflicts: # xarray/test/test_dataarray.py
We also should work this into the documentation somewhere. I'm not sure if the I/O page is the right place to mention this but it may be.
xarray/convert.py
Outdated
except ImportError:
    # prior to v1.10
    from iris.fileformats._pyke_rules.compiled_krb.fc_rules_cf_fc \
        import _parse_cell_methods as parse_cell_methods
Would it be worth using `importlib` to clean this up?
I wonder if it's worth just removing this catch completely and requiring iris > 1.10. It's at 1.13 now and nearly 2.0... @pelson?
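The try/except import fallback under discussion can be generalized with `importlib`, as the earlier comment suggests. Below is a minimal, hypothetical sketch; the helper name `import_first` and the stand-in module names are illustrative only and are not part of xarray or iris:

```python
import importlib


def import_first(*names):
    """Return the first module in *names* that imports successfully."""
    for name in names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError("none of %s could be imported" % (names,))


# Demonstrate with a module that does not exist, falling back to json.
mod = import_first("definitely_not_a_real_module_xyz", "json")
print(mod.__name__)  # json
```

In the PR's case this would collapse the nested try/except into a single call listing the new and old module paths, at the cost of losing the explicit version comment.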
doc/whats-new.rst
Outdated
@@ -23,7 +23,11 @@ Enhancements

- :py:func:`~plot.contourf()` learned to contour 2D variables that have both a 1D co-ordinate (e.g. time) and a 2D co-ordinate (e.g. depth as a function of time).
  By `Deepak Cherian <https://github.com/dcherian>`_.
- Added :py:meth:`DataArray.to_iris <xray.DataArray.to_iris>` for
  converting a data array into an Iris_ Cube with the same data and coordinates (:issue:`621` and :issue:`37`).
  By `Neil Parley <https://github.com/nparley>`_.
Also note the `from_iris` method.
Done
@@ -446,6 +446,7 @@ DataArray methods
   DataArray.to_index
   DataArray.to_masked_array
   DataArray.to_cdms2
   DataArray.to_iris
Also note the `from_iris` method.
Done
It seems like a decent starting point; were there specific decodings you had in mind?
Agreed. Do you have a feel for if/where else this should be documented? I'm not sure it really fits in I/O...
I think it's OK in I/O (for now). This is also the place where we document serialization in pickle, and the
Agreed. In the long term, the I/O docs have gotten pretty long; it could make sense to split them up into several subpages.
…thods interface (this is already pretty old now)
doc/io.rst
Outdated
----

The Iris_ tool allows easy reading of common meteorological and climate model formats
(including GRIB and UK MetOffice PP files) into ``Cube``s which are in many ways very
Write "``Cube`` objects". Sphinx won't render ``Cube``s properly.
doc/io.rst
Outdated
The Iris_ tool allows easy reading of common meteorological and climate model formats
(including GRIB and UK MetOffice PP files) into ``Cube``s which are in many ways very
similar to to ``DataArray``s, while enforcing a CF-compliant data model. If iris is
installaed xarray can convert a ``Cube`` into a ``DataArray`` using
installaed -> installed
doc/io.rst
Outdated
da_cube

Conversly, We can create a new cube object from a ``DataArray`` using
We -> we
doc/io.rst
Outdated
Conversly, We can create a new cube object from a ``DataArray`` using
:py:meth:`~xarray.Dataset.from_dict`:
to_iris?
xarray/convert.py
Outdated
if 'cell_methods' in dataarray.attrs:
    args['cell_methods'] = parse_cell_methods(dataarray.attrs['cell_methods'])

cube = iris.cube.Cube(dataarray.to_masked_array(), **args)
Rather than loading every dataset into memory, can we raise `NotImplementedError` for now if a DataArray's data is a dask array? `isinstance(dataarray.data, dask_array_type)` should do it, where `dask_array_type` is imported from `xarray.core.pycompat`.
Raising `NotImplementedError` seems a bit drastic, especially since Iris 2.0 is just around the corner. The latest commit lets Iris decide what to do with the array - it keeps it as a dask array if your version of Iris supports it but will read it into memory otherwise.
Otherwise we have to work out which version of Iris we're converting to/from, which would be a pain... If you're happy with this approach I can add another set of tests for the various dask/numpy Cube/DataArray permutations.
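The version-agnostic behavior described here (lazy data on new iris, in-memory data on old iris) can be sketched roughly as follows. This is an illustration only; `OldCube` is a hypothetical stand-in for a pre-2.0 iris cube, not a real iris class:

```python
def get_cube_data(cube):
    """Prefer the lazy core_data() accessor when the installed iris
    provides it (iris >= 2); otherwise fall back to .data, which
    realizes the values in memory on older iris versions."""
    if hasattr(cube, "core_data"):
        return cube.core_data()
    return cube.data


class OldCube:
    # Stand-in for a pre-2.0 iris cube that only exposes .data.
    data = [1, 2, 3]


print(get_cube_data(OldCube()))  # [1, 2, 3]
```

The `hasattr` probe is what avoids having to inspect the iris version string directly.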
OK, if it's easy enough to add support for dask, then that is great!
xarray/convert.py
Outdated
coord_attrs = _iris_obj_to_attrs(coord)
coord_dims = [dims[i] for i in cube.coord_dims(coord)]
if not coord.var_name:
    raise ValueError('Coordinate has no var_name')
Can we print the offending coordinate in the error message? That makes things easier to debug.
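One way to include the offending coordinate in the message is shown below. This is a sketch; `require_var_name` and `FakeCoord` are hypothetical names used for illustration, with `FakeCoord` standing in for an iris coordinate object:

```python
def require_var_name(coord):
    """Raise a ValueError that names the offending coordinate."""
    if not getattr(coord, "var_name", None):
        raise ValueError("coordinate has no var_name: %r" % (coord,))
    return coord.var_name


class FakeCoord:
    # Illustrative stand-in for an iris coordinate without a var_name.
    var_name = None

    def __repr__(self):
        return "<FakeCoord standard_name='latitude'>"


try:
    require_var_name(FakeCoord())
except ValueError as err:
    print(err)  # coordinate has no var_name: <FakeCoord standard_name='latitude'>
```

Embedding the `repr` means the traceback immediately identifies which coordinate is missing its name.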
try:
    dim_coord = cube.coord(dim_coords=True, dimensions=(i,))
    dims.append(dim_coord.var_name)
except iris.exceptions.CoordinateNotFoundError:
This line doesn't have any test coverage:
https://coveralls.io/builds/14513466/source?filename=xarray%2Fconvert.py
xarray/convert.py
Outdated
coord_attrs = _iris_obj_to_attrs(coord)
coord_dims = [dims[i] for i in cube.coord_dims(coord)]
if not coord.var_name:
    raise ValueError('Coordinate has no var_name')
This line also needs coverage in a test.
…sion of dask arrays which may or may not be masked.
…onvert # Conflicts: # ci/requirements-py27-cdat+iris+pynio.yml
OK, here's another go. It's not pretty, but it seems to work. Once the various issues referenced above are solved it can be cleaned up, but I'd rather not wait for those since this has been in the works a while now.
doc/io.rst
Outdated
The Iris_ tool allows easy reading of common meteorological and climate model formats
(including GRIB and UK MetOffice PP files) into ``Cube`` objects which are in many ways very
similar to to ``DataArray``s, while enforcing a CF-compliant data model. If iris is
double "to"
doc/io.rst
Outdated
da_cube = xr.Dataset.from_iris(cube)
da_cube
Are these examples working? If you'd like this code to run at build time you'll have to add iris to the doc build environment and define the `cube` variable first.
Alternatively, you can use the `:verbatim:` environment (see the OPeNDAP example below), which is static and therefore easier (though more error-prone in the long term).
Thanks a lot! I just went over the doc section and have a couple of small comments.
Great - thanks!
dask/dask#2977 needs to be resolved to make it possible to properly translate dask arrays back and forth between xarray/Iris. I don't want to hold up this PR, which is coming along very nicely (and I'm sure would already be useful), so can we simply defer handling of dask arrays for now? I would suggest raising
dask/dask#2977 would certainly make it cleaner, but 95b0197 should work in the meantime, unless there's a specific case I've missed?
Actually, I think we're OK on the dask front, most likely :)
doc/io.rst
Outdated
The Iris_ tool allows easy reading of common meteorological and climate model formats
(including GRIB and UK MetOffice PP files) into ``Cube`` objects which are in many ways very
similar to ``DataArray``s, while enforcing a CF-compliant data model. If iris is
Needs to be "``DataArray`` objects".
(RST can't handle anything adjacent to backticks, which is why the color looks weird here in GitHub's syntax highlighting. Yes, it's not very intuitive.)
Good catch, thanks
xarray/convert.py
Outdated
from iris.fileformats.netcdf import parse_cell_methods
try:
    from dask.array import ma as dask_ma
    from dask.array import Array
Instead of testing for `Array`, use:

    # this is always defined, even if dask is not installed
    from xarray.core.pycompat import dask_array_type
    ...
    if isinstance(data, dask_array_type):
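The suggested pattern works because the type tuple is defined unconditionally, so `isinstance` checks never raise even without dask. A self-contained sketch (the names mirror `xarray.core.pycompat` but are defined locally here; `is_dask_backed` is a hypothetical helper for illustration):

```python
try:
    import dask.array
    dask_array_type = (dask.array.Array,)
except ImportError:
    # isinstance(x, ()) is always False, so checks stay safe without dask.
    dask_array_type = ()


def is_dask_backed(data):
    """Return True only when data is a dask array and dask is importable."""
    return isinstance(data, dask_array_type)


print(is_dask_backed([1, 2, 3]))  # False
```

An empty tuple as the fallback is the key trick: `isinstance(x, ())` is defined to be `False` for any `x`, so no guard is needed at each call site.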
xarray/convert.py
Outdated
# Create the right type of masked array (should be easier after #1769)
if isinstance(dataarray.data, Array):
    masked_data = dask_ma.masked_invalid(dataarray)
Just add `from dask.array import ma as dask_ma` right above this line to avoid the need for guarding the import statement in try/except.
Much nicer - thanks!
xarray/convert.py
Outdated
cube_data = cube.core_data() if hasattr(cube, 'core_data') else cube.data

# Deal with dask and numpy masked arrays
if dask_ma and isinstance(cube_data, Array):
Does Iris always use dask masked arrays? Or does it ever use dask arrays of plain numpy data? @pelson?
(I would guess not, since we can't tell these apart.)
I don't think it matters; if the given array isn't masked then `ma.filled` is just a no-op.
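For the NumPy case at least, that no-op behavior is easy to verify (assuming `numpy` is available; the dask equivalent behaves analogously):

```python
import numpy as np

plain = np.array([1.0, 2.0, 3.0])
masked = np.ma.masked_array([1.0, 2.0, 3.0], mask=[False, True, False])

# On a plain ndarray, filled() returns the data unchanged.
print(np.ma.filled(plain, fill_value=-999.0))   # [1. 2. 3.]

# On a masked array, masked entries are replaced by the fill value.
print(np.ma.filled(masked, fill_value=-999.0))  # [   1. -999.    3.]
```

So converting unconditionally with `ma.filled` is safe whether or not the cube's data carries a mask.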
OK, good to hear!
xarray/convert.py
Outdated
cube_data = cube.core_data() if hasattr(cube, 'core_data') else cube.data

# Deal with dask and numpy masked arrays
if dask_ma and isinstance(cube_data, Array):
Also, again it would be better to use `dask_array_type` and import `dask.array.ma` here rather than above. That would ensure that an error is raised if somehow a cube contains a dask array but the wrong version of dask is installed, rather than letting the error pass silently.
doc/io.rst
Outdated
:py:meth:`~xarray.Dataset.from_iris`:

.. ipython:: python
    :verbatim:
`:verbatim:` is basically designed for copying/pasting output from the IPython command line, so this isn't quite using it right yet.
I would definitely prefer adding Iris to our docs builds if that's feasible. See:
https://github.com/pydata/xarray/blob/master/doc/environment.yml
OK, I'll give it a shot. How does the environment work? Are all the examples run in the same interpreter session? And are there pre-built DataArrays I can just use as examples?
We have `xr.tutorial.load_dataset('air_temperature')`, which is a reasonable choice, though I'm not 100% sure it's sufficiently CF compliant for Iris :).
I'm OK without actually printing output here (maybe for now)... but in that case I would set things up a little differently so verbatim prints it properly, e.g.,

.. ipython:: python
    :verbatim:

    In [1]: da_cube = xr.Dataset.from_iris(cube)

Basically, don't include the line that would print the repr for the cube/array unless you actually would make the Python objects.
How's this? (I turned the example around so that I create the cube as a DataArray then convert it...)
@duncanwp that example looks good to me. I assume it runs locally with this branch?
Yep, it's not a very exciting cube, but it runs fine.
I think this is ready. @fmaussion @jhamman any further concerns?
Thanks a lot @duncanwp , this is a very important addition!
This is great. Thanks for pushing this over the finish line. Can we fix a few style issues before merging:
git diff upstream/master **/*py | flake8 --diff
xarray/convert.py:12:1: F401 '.conventions.decode_cf_variable' imported but unused
xarray/convert.py:13:80: E501 line too long (81 > 79 characters)
xarray/convert.py:16:80: E501 line too long (118 > 79 characters)
xarray/convert.py:17:80: E501 line too long (118 > 79 characters)
xarray/convert.py:18:80: E501 line too long (107 > 79 characters)
xarray/convert.py:20:80: E501 line too long (100 > 79 characters)
xarray/convert.py:120:80: E501 line too long (82 > 79 characters)
xarray/convert.py:185:80: E501 line too long (94 > 79 characters)
xarray/convert.py:189:80: E501 line too long (81 > 79 characters)
xarray/tests/test_dataarray.py:2866:80: E501 line too long (83 > 79 characters)
xarray/tests/test_dataarray.py:2872:80: E501 line too long (80 > 79 characters)
xarray/tests/test_dataarray.py:2878:80: E501 line too long (80 > 79 characters)
xarray/tests/test_dataarray.py:2888:80: E501 line too long (80 > 79 characters)
xarray/tests/test_dataarray.py:2933:80: E501 line too long (105 > 79 characters)
xarray/tests/test_dataarray.py:2939:80: E501 line too long (80 > 79 characters)
xarray/tests/test_dataarray.py:2945:80: E501 line too long (80 > 79 characters)
xarray/tests/test_dataarray.py:2949:80: E501 line too long (89 > 79 characters)
xarray/tests/test_dataarray.py:2958:80: E501 line too long (80 > 79 characters)
xarray/tests/test_dataarray.py:2982:80: E501 line too long (84 > 79 characters)
Ah yes, sorry about that - fixed now.
This looks good to go now - thanks for all your help!
Apologies for the delay. This is a great first pass. Definitely more metadata sharing possible, but I'm glad this has been merged - let's grow the capability as required. If any issues crop up, please feel free to ping me & @bjlittle. 👍
Passes `git diff upstream/master **/*py | flake8 --diff`
`whats-new.rst` for all changes and `api.rst` for new API