-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Persist raw model data #261
Conversation
modelskill/comparison/_comparison.py
Outdated
@@ -462,7 +462,7 @@ def __init__( | |||
modeldata=None, | |||
max_model_gap: Optional[TimeDeltaTypes] = None, | |||
matched_data: Optional[xr.Dataset] = None, | |||
raw_mod_data: Optional[Dict[str, pd.DataFrame]] = None, | |||
raw_mod_data: Optional[Dict[str, pd.Series]] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to update the docstring to match? Or is that automated somehow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not automated, but Copilot is very helpful in creating new ones.
def save(self, fn: Union[str, Path]) -> None: | ||
# save to file in netcdf format using xarray | ||
# save each comparer to a netcdf and pack them into a zip file | ||
def save(self, filename: Union[str, Path]) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be useful to add an optional argument to avoid accidentally overwriting existing files? Like wrapping the 'mode' parameter from ZipFIle? Default could be kept as 'w' or 'x' depending on which is more convenient
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I haven't considered that possibility. If users expect a mode
parameter, we should probably add it to all methods that writes files, e.g. mikeio.Dataset.to_dfs()
@@ -324,3 +324,15 @@ def test_trackmodelresult_and_trackobservation_uses_model_name(): | |||
def test_item_selection_items_are_unique(): | |||
with pytest.raises(ValueError): | |||
ItemSelection(obs="foo", model=["foo", "bar"], aux=["baz"]) | |||
|
|||
|
|||
def test_save_comparercollection(o1, o3, tmp_path): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this actually be in test_comparercollection.py?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe, I suppose the entire test suite needs to be (re-)organized. I think it would make sense to split tests into unit tests and integration tests and end-to-end tests, or something along those lines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks pretty good, nice going.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work. I like the content related to the name of the PR.
I am not so sure about the change of behavior for compare() which now always returns a cc. I somehow appreciate the consistency, but to me, it was quite meaningful before: if compare one obs then I get one comparer, if I compare multiple obss then I get a collection. Makes sense. Let's discuss tomorrow :-)
But related to save/load functionality, I guess my only comment is: is "raw_" a special enough name that we will not get in trouble if people accidentally named their station "raw_stn1" or similar? I would suggest "raw" or something else that starts with an underscore to indicate that is an internal structure.
I guess I also agree with @rywm-dhi on the overwrite functionality...
The main reason I changed The |
I'll add an underscore at the beginning, but we could also use NetCDF groups and split the raw data into separate datasets. |
It seems like it could be useful, but after inspecting some other methods writing files in Python, checking if the file exists behavior seem to be very unusual, so I guess it is better to leave this up to the end user, they can add a path.exists(), since the default would be to overwrite anyway. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's go with this for now - and discuss later about return type of compare()
No description provided.