
new_metrics_validation_services #241

Merged
merged 6 commits into main from new_metrics on Aug 4, 2023

Conversation

@daniel-caichac-DHI (Collaborator) commented Jul 5, 2023

Hi,

I added 2 new metrics that will most likely be used in the validation services.
1: Explained Variation (EV). That one is straightforward. Formula added in the docstring.
2: Peak Ratio (PR):

  • For this one I need an ancillary function called 'partial_duration_series', very similar to the PoT function from the EVA_editor in the MIKE Zero toolbox (to identify peaks); see the sketch after this list.

  • The peak ratio is then simply the ratio of the mean of the model peaks to the mean of the measured peaks.

  • Added scipy to the dependencies list.

  • Also added tests.
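For orientation, here is a rough sketch of how those two pieces could fit together. The `partial_duration_series` stand-in below is a hypothetical (its real signature is not shown in this PR) using `scipy.signal.find_peaks` with an assumed independence criterion; only the ratio-of-means logic comes from the description above.

```python
import numpy as np
import pandas as pd
from scipy.signal import find_peaks

def partial_duration_series(values: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in: identify independent peaks, PoT-style."""
    peak_idx, _ = find_peaks(values, distance=36)  # assumed inter-event spacing
    return values[peak_idx]

def peak_ratio_sketch(obs: pd.Series, model: pd.Series) -> float:
    """PR = mean(model peaks) / mean(measured peaks), per the description."""
    obs_peaks = partial_duration_series(obs.to_numpy())
    model_peaks = partial_duration_series(model.to_numpy())
    return float(np.mean(model_peaks) / np.mean(obs_peaks))
```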

@daniel-caichac-DHI daniel-caichac-DHI marked this pull request as draft July 5, 2023 09:30
@daniel-caichac-DHI daniel-caichac-DHI marked this pull request as ready for review July 5, 2023 09:55
@jsmariegaard (Member) commented:

Great - thanks for adding this

@jsmariegaard (Member) left a comment:

Great work. Almost there - see comments and maybe add a few more tests

```python
assert obs.size == model.size
if len(obs) == 0:
    return np.nan
time = obs.index
```
@jsmariegaard (Member):
I think we should have an assert statement before this line checking that the inputs are indeed dataframes with a DatetimeIndex. All the other metrics just assume the input is array-like.
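A minimal sketch of the suggested guard (the helper name and placement are assumptions, not the merged code):

```python
import pandas as pd

def _assert_datetime_input(obs: pd.Series, model: pd.Series) -> None:
    """Hypothetical validation helper for the time-based metrics."""
    assert isinstance(obs.index, pd.DatetimeIndex), "obs needs a DatetimeIndex"
    assert isinstance(model.index, pd.DatetimeIndex), "model needs a DatetimeIndex"
```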

@daniel-caichac-DHI (Collaborator, Author) replied Jul 6, 2023:
Actually... we changed that in a previous PR, so the metrics inputs are now pandas.Series as I understand it, see:
#215
I just copied the same lines from all the other metrics. But from that previous pull request I understand that the inputs changed to pandas Series instead of arrays :S so maybe we should change all the metrics from
obs: array to obs: Series

What do you say? Maybe I am wrong and the inputs are still arrays.
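For illustration, the kind of annotation change being proposed, shown on a generic metric (a sketch under the assumption that inputs are indeed pandas Series since #215):

```python
import pandas as pd

# before (array-like convention shared by the other metrics):
# def bias(obs, model) -> float: ...

# after (hypothetical, if inputs really are pandas Series now):
def bias(obs: pd.Series, model: pd.Series) -> float:
    return float((model - obs).mean())
```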


```python
import modelskill.metrics as mtr

@pytest.fixture
```
@jsmariegaard (Member):
Since obs_series and mod_series are only used in a single place, I think it is better to define them inside test_pr.

@daniel-caichac-DHI (Collaborator, Author):
Used in a single place so far... If having them as fixtures makes running the tests slower, I understand, but if not, won't it be easier for future tests to already have them as pytest fixtures? I can move them if you really think that's better, though.
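For context, the two options being weighed look roughly like this (the series values are illustrative, not from the PR):

```python
import pandas as pd
import pytest

# Option 1: shared fixture, reusable by future tests
@pytest.fixture
def obs_series():
    idx = pd.date_range("2020-01-01", periods=5, freq="h")
    return pd.Series([1.0, 3.0, 2.0, 4.0, 1.5], index=idx)

# Option 2: define the data inside the only test that needs it
def test_pr():
    idx = pd.date_range("2020-01-01", periods=5, freq="h")
    obs_series = pd.Series([1.0, 3.0, 2.0, 4.0, 1.5], index=idx)
    mod_series = obs_series * 1.1
    ...  # call the peak ratio metric and assert on the result
```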

$$\mathrm{EV} = \frac{\sum_{i=1}^{n} (obs_i - \overline{obs})^2 - \sum_{i=1}^{n} \left( (obs_i - \overline{obs}) - (model_i - \overline{model}) \right)^2}{\sum_{i=1}^{n} (obs_i - \overline{obs})^2}$$

Range: [0, 1]; Best: 1
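A direct Python transcription of the formula above, as a sketch (not necessarily the merged implementation):

```python
import numpy as np

def ev_sketch(obs, model) -> float:
    """Explained Variation per the docstring formula."""
    obs = np.asarray(obs)
    model = np.asarray(model)
    obs_anom = obs - obs.mean()
    model_anom = model - model.mean()
    ss_obs = np.sum(obs_anom**2)
    ss_res = np.sum((obs_anom - model_anom) ** 2)
    return float((ss_obs - ss_res) / ss_obs)
```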

@jsmariegaard (Member):
Maybe add a "See Also" with reference to r2?

@daniel-caichac-DHI (Collaborator, Author):
The thing is that I am not sure it is the same, as r2 can be negative and this EV cannot.
I do not have a reference for this metric.
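One concrete way the two differ: EV centers both series, so a constant bias leaves it untouched, while r2 (as 1 - SS_res/SS_tot) is driven strongly negative by the same bias. A toy illustration using the `ev_sketch` above (values are made up):

```python
import numpy as np

obs = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
model = obs + 10.0  # perfect anomalies, constant +10 bias

ss_tot = np.sum((obs - obs.mean()) ** 2)        # 10.0
r2 = 1.0 - np.sum((obs - model) ** 2) / ss_tot  # 1 - 500/10 = -49.0
print(r2, ev_sketch(obs, model))                # -49.0 1.0
```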

@daniel-caichac-DHI (Collaborator, Author) commented Jul 6, 2023:

> Great work. Almost there - see comments and maybe add a few more tests

Answered the comments here (waiting for confirmation), but regarding the tests: all metrics seem to have a single test.
I could add a 2nd test for metrics EV and PR with some expected output, but what other test could it be?

OK, just gave it a thought: maybe if the number of peaks <= 1 we should return NaN or something like that.
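Such a guard might look like this (purely a sketch of the idea, not part of this PR):

```python
import numpy as np

def peak_ratio_guarded(obs_peaks: np.ndarray, model_peaks: np.ndarray) -> float:
    """Hypothetical variant: NaN when there are too few peaks for a ratio."""
    if min(len(obs_peaks), len(model_peaks)) <= 1:
        return np.nan
    return float(np.mean(model_peaks) / np.mean(obs_peaks))
```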

@jsmariegaard (Member) commented:

@daniel-caichac-DHI is this PR done in your opinion? You mentioned: "maybe if the number of peaks <= 1 we should return NaN or something like that"... But should we postpone that? Or do you have other things?

@daniel-caichac-DHI (Collaborator, Author) commented Aug 2, 2023:

I was supposed to do more tests but I didn't manage to, as I went on vacation :)
But besides that, I honestly think it is ready to be merged. The number of peaks <= 1 case can be postponed.

@daniel-caichac-DHI (Collaborator, Author) commented Aug 3, 2023:

I added the changes as requested. I deliberately did not add the 'example' for the peak ratio, because it needs a time series as input (instead of just a numpy array) in order to detect the peaks in time, but I did add one for the Explained Variance.
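For context, a doctest-style example for the peak ratio would need something like the following setup, which is why it is harder to keep short (the function name and values here are assumptions):

```python
import numpy as np
import pandas as pd
import modelskill.metrics as mtr

idx = pd.date_range("2020-01-01", periods=1000, freq="h")
rng = np.random.default_rng(0)
obs = pd.Series(rng.gamma(2.0, size=1000), index=idx)
model = obs * 1.1  # every value, hence every peak, scaled by 1.1

print(mtr.peak_ratio(obs, model))  # expected near 1.1
```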

@jsmariegaard (Member) left a comment:
Great work!

@jsmariegaard jsmariegaard merged commit c262abb into main Aug 4, 2023
@jsmariegaard jsmariegaard deleted the new_metrics branch August 4, 2023 08:54