Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use xarray dataset as internal data container #262

Merged
merged 75 commits into from
Sep 29, 2023
Merged

Conversation

jsmariegaard
Copy link
Member

@jsmariegaard jsmariegaard commented Sep 20, 2023

Changing the internal data structure of observations from pd.DataFrame to xr.Dataset.

Some considerations:

  • When should na values be removed? (and should we keep track na-values? before this PR it was dependent on the input type i.e. different functionality whether you constructed from df or dfs0)
  • Can all "matching" be extracted from _comparison and moved to matching.py? Such that Comparer can only be initialized with an already-matched dataset
  • TrackModelResult is now almost equal to TrackObservation and PointModelResult is almost equal to PointObservation - should we rather make a TrackTimeseries and a PointTimeseries and have most functionality in there?

Objectives of this PR:

  • Consistent .data throughout modelskill
  • Make it easy to match modelresults and observations (as everything is now xr.Dataset)
  • Make it easy to add aux items to observation and modelresults
  • Make it easy to add attrs container to observation and modelresults

@jsmariegaard jsmariegaard marked this pull request as ready for review September 27, 2023 18:38
@jsmariegaard
Copy link
Member Author

Much more to be done, but maybe good to merge soon to ease collaboration and minimize merge conflicts

return data


def _model2obs_interp(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name of this function doesn't reveal that it also remove gaps, also not the docstring. Should gap removal be part of this function.

@ecomodeller
Copy link
Member

I tried to modify a few lines like this

ds[name].attrs["quantity"] = quantity.to_dict()

to

ds[name].attrs["long_name"] = quantity.name
ds[name].attrs["units"] = quantity.unit

To be able to persist the dataset as NetCDF and be compatible with CF convention making easier to plot the data with xarray, but somehow managed to change more than necessary, making a lot of tests to fail. ☹️

So I deleted those changes and only committed the tests that saves the data as NetCDF.

@ecomodeller
Copy link
Member

Plotting the xarray dataarray uses long_name and units
image

@ecomodeller
Copy link
Member

x, y in track data are now coordinates and not data variables.
image

@jsmariegaard
Copy link
Member Author

x, y in track data are now coordinates and not data variables. image

I was wondering about this. Thanks

@ecomodeller ecomodeller merged commit 1196c1a into main Sep 29, 2023
@jsmariegaard jsmariegaard deleted the use-xarray-dataset branch November 20, 2023 08:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants