Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Persist raw model data #261

Merged
merged 16 commits into from
Sep 22, 2023
Merged

Persist raw model data #261

merged 16 commits into from
Sep 22, 2023

Conversation

ecomodeller
Copy link
Member

No description provided.

@ecomodeller ecomodeller changed the title Persist_raw_model_data Persist raw model data Sep 20, 2023
@ecomodeller ecomodeller marked this pull request as ready for review September 20, 2023 11:15
@@ -462,7 +462,7 @@ def __init__(
modeldata=None,
max_model_gap: Optional[TimeDeltaTypes] = None,
matched_data: Optional[xr.Dataset] = None,
raw_mod_data: Optional[Dict[str, pd.DataFrame]] = None,
raw_mod_data: Optional[Dict[str, pd.Series]] = None,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to update the docstring to match? Or is that automated somehow?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not automated, but Copilot is very helpful in creating new ones.

def save(self, fn: Union[str, Path]) -> None:
# save to file in netcdf format using xarray
# save each comparer to a netcdf and pack them into a zip file
def save(self, filename: Union[str, Path]) -> None:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be useful to add an optional argument to avoid accidentally overwriting existing files? Like wrapping the 'mode' parameter from ZipFIle? Default could be kept as 'w' or 'x' depending on which is more convenient

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I haven't considered that possibility. If users expect a mode parameter, we should probably add it to all methods that writes files, e.g. mikeio.Dataset.to_dfs()

@@ -324,3 +324,15 @@ def test_trackmodelresult_and_trackobservation_uses_model_name():
def test_item_selection_items_are_unique():
with pytest.raises(ValueError):
ItemSelection(obs="foo", model=["foo", "bar"], aux=["baz"])


def test_save_comparercollection(o1, o3, tmp_path):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this actually be in test_comparercollection.py?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe, I suppose the entire test suite needs to be (re-)organized. I think it would make sense to split tests into unit tests and integration tests and end-to-end tests, or something along those lines.

Copy link
Collaborator

@ryan-kipawa ryan-kipawa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty good, nice going.

Copy link
Member

@jsmariegaard jsmariegaard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work. I like the content related to the name of the PR.

I am not so sure about the change of behavior for compare() which now always returns a cc. I somehow appreciate the consistency, but to me, it was quite meaningful before: if compare one obs then I get one comparer, if I compare multiple obss then I get a collection. Makes sense. Let's discuss tomorrow :-)

But related to save/load functionality, I guess my only comment is: is "raw_" a special enough name that we will not get in trouble if people accidentally named their station "raw_stn1" or similar? I would suggest "raw" or something else that starts with an underscore to indicate that is an internal structure.

I guess I also agree with @rywm-dhi on the overwrite functionality...

@ecomodeller
Copy link
Member Author

I am not so sure about the change of behavior for compare() which now always returns a cc. I somehow appreciate the consistency, but to me, it was quite meaningful before: if compare one obs then I get one comparer, if I compare multiple obss then I get a collection. Makes sense. Let's discuss tomorrow :-)

The main reason I changed compare to return a ComparerCollection, is that multiple return types causes editors like VS Code to misinterpret the returned object, which you can expect writing a script with no active session, but also in a notebook when it should be able to inspect the actual object☹️. Since both the Comparer and the ComparerCollection has a save method, VS Code presents the wrong auto-completion and documentation, which is not helpful.

The compare function should be the main entry-point for people starting out with the library and multiple return types is confusing.

@ecomodeller
Copy link
Member Author

But related to save/load functionality, I guess my only comment is: is "raw_" a special enough name that we will not get in trouble if people accidentally named their station "raw_stn1" or similar? I would suggest "raw" or something else that starts with an underscore to indicate that is an internal structure.

I'll add an underscore at the beginning, but we could also use NetCDF groups and split the raw data into separate datasets.
The structure of this file is a fairly important decision, if we want to be able to support loading files across versions of modelskill.

@ecomodeller
Copy link
Member Author

ecomodeller commented Sep 21, 2023

I guess I also agree with @rywm-dhi on the overwrite functionality...

It seems like it could be useful, but after inspecting some other methods writing files in Python, checking if the file exists behavior seem to be very unusual, so I guess it is better to leave this up to the end user, they can add a path.exists(), since the default would be to overwrite anyway.

Copy link
Member

@jsmariegaard jsmariegaard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's go with this for now - and discuss later about return type of compare()

@ecomodeller ecomodeller merged commit 11bf57c into main Sep 22, 2023
@ecomodeller ecomodeller deleted the persist_raw_model_data branch September 22, 2023 09:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants