Skill groupby attrs #351

Merged
merged 34 commits into from
Jan 4, 2024

Conversation

jsmariegaard
Member

@jsmariegaard jsmariegaard commented Dec 19, 2023

Extracted by="attrs:gtype" part of #331

e.g. "attrs:gtype" or "attrs:DA" (to distinguish between assimilation and validation stations)

@jsmariegaard
Member Author

The to_dataframe() method returned object dtypes, which I have now changed to category. This, however, causes problems with the groupby, specifically for cc.mean_skill() in cases with multiple variables. I tried different things, including setting observed=True, but that causes problems in gridded_skill() (where empty bins make sense). I think the actual problem is that the default by that mean_skill() passes on to skill() is ["model", "observation", "variable"], even though an observation can only have one variable. So I guess the by should instead be ["model", "observation"], with variable added afterwards. Maybe we should even check that observation and variable do not both occur in the by?
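
The groupby behaviour described above can be reproduced with plain pandas. The following is a minimal sketch on a toy DataFrame, not modelskill's actual code: the column names, station names and values are assumptions made up for illustration. It shows how categorical keys with observed=False produce empty (observation, variable) groups, and how grouping by observation alone and attaching variable afterwards avoids them.

```python
import pandas as pd

# Toy stand-in for matched data (hypothetical names and values):
# each observation has exactly one variable.
df = pd.DataFrame(
    {
        "observation": ["HKNA", "HKNA", "EPL", "EPL"],
        "variable": ["Hm0", "Hm0", "Tp", "Tp"],
        "residual": [0.1, -0.2, 0.3, 0.1],
    }
).astype({"observation": "category", "variable": "category"})

keys = ["observation", "variable"]

# observed=False keeps the full Cartesian product of the categories,
# including (observation, variable) pairs that never occur in the data.
all_combos = df.groupby(keys, observed=False)["residual"].count()

# observed=True keeps only the combinations that are actually present.
seen_combos = df.groupby(keys, observed=True)["residual"].count()

# Alternative along the lines suggested above: group by observation only,
# then attach the (unique) variable of each observation afterwards.
obs_to_var = dict(zip(df["observation"], df["variable"]))
by_obs = df.groupby("observation", observed=True)["residual"].mean().to_frame()
by_obs["variable"] = [obs_to_var[obs] for obs in by_obs.index]

print(all_combos)
print(seen_combos)
print(by_obs)
```

With observed=False the result has four groups, two of which are empty; with observed=True only the two real combinations remain.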

@ecomodeller
Member

Very useful functionality! 👍

Here is a snippet from a slightly incomplete example, where I have added attrs to two of the three observations.

When the attrs entry is absent, the default is now to exclude that observation from the skill table, but setting observed=False includes it under the label False.

...
>>> o1 = ms.PointObservation('HKNA_Hm0.dfs0', attrs={"use": "calibration"})
>>> o2 = ms.PointObservation("eur_Hm0.dfs0",   attrs={"use": "validation"})
>>> o3 = ms.TrackObservation("Alti_c2_Dutch.dfs0")
>>> cc = ms.match(obs=[o1, o2, o3], mod=[mr1, mr2])
>>> cc.skill(by=("model","attrs:use"), observed=False).round(2)
                     n  bias  rmse  urmse   mae    cc    si    r2
model use
SW_1  False        113 -0.00  0.35   0.35  0.29  0.97  0.13  0.90
      calibration  386 -0.19  0.35   0.29  0.25  0.97  0.09  0.91
      validation    67 -0.07  0.22   0.21  0.19  0.97  0.08  0.93
SW_2  False        113  0.08  0.43   0.42  0.36  0.97  0.15  0.85
      calibration  386 -0.10  0.29   0.28  0.21  0.97  0.09  0.93
      validation    67 -0.00  0.23   0.23  0.20  0.97  0.09  0.93
>>> cc.skill(by=("model","attrs:use")).round(2)
                     n  bias  rmse  urmse   mae    cc    si    r2
model use
SW_1  calibration  386 -0.19  0.35   0.29  0.25  0.97  0.09  0.91
      validation    67 -0.07  0.22   0.21  0.19  0.97  0.08  0.93
SW_2  calibration  386 -0.10  0.29   0.28  0.21  0.97  0.09  0.93
      validation    67 -0.00  0.23   0.23  0.20  0.97  0.09  0.93
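
The difference between excluding and including observations without the attribute can be sketched in plain pandas. This is only an analogy under stated assumptions, not how modelskill implements observed: the DataFrame, its values and the "unlabelled" marker are hypothetical.

```python
import pandas as pd

# Toy data (hypothetical values): one row per model and observation,
# where the third observation has no "use" attribute.
df = pd.DataFrame(
    {
        "model": ["SW_1", "SW_1", "SW_1", "SW_2", "SW_2", "SW_2"],
        "use": ["calibration", "validation", None] * 2,
        "residual": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
    }
)

# By default pandas drops rows whose group key is missing,
# so the observation without "use" does not appear in the result.
without_missing = df.groupby(["model", "use"])["residual"].mean()

# Giving the missing key an explicit label turns it into its own group.
labelled = df.assign(use=df["use"].fillna("unlabelled"))
with_missing = labelled.groupby(["model", "use"])["residual"].mean()

print(without_missing)
print(with_missing)
```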

@jsmariegaard
Member Author

Finally managed to do the merge 😬 That was not easy.

@jsmariegaard jsmariegaard marked this pull request as ready for review January 3, 2024 16:47
def test_skill_by_attrs_gtype(cc):
    sk = cc.skill(by="attrs:gtype")
    assert len(sk) == 2
    assert sk.data.index[0] == "point"
Member

It seems overly specific to assert that the index is sorted in this order.

Isn't it enough to verify that:

assert "point" in sk.index
assert "track" in sk.index
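
A minimal sketch of the order-independent check, using a plain pandas DataFrame as a stand-in for the skill table (the frame, its column and its index name are assumptions for illustration; the real test operates on the object returned by cc.skill()):

```python
import pandas as pd

# Stand-in skill table whose rows are deliberately not in alphabetical order.
sk = pd.DataFrame(
    {"n": [10, 5]},
    index=pd.Index(["track", "point"], name="gtype"),
)

# Membership checks pass regardless of how the index is sorted.
assert len(sk) == 2
assert "point" in sk.index
assert "track" in sk.index

# Comparing against a set also rules out unexpected extra labels.
assert set(sk.index) == {"point", "track"}
```

A positional check such as sk.index[0] == "point" would fail on this frame, even though both groups are present.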

Member Author

True, will fix in the next PR, which will be on sorting.

@jsmariegaard jsmariegaard merged commit 2e495c2 into main Jan 4, 2024
@jsmariegaard jsmariegaard deleted the skill-grouby-attrs branch January 4, 2024 12:27