Skill groupby attrs #351
The to_dataframe() method returned object dtypes, which I have now changed to category. This, however, leads to problems with the groupby, specifically for cc.mean_skill() in multiple-variable cases. I tried different things, including setting observed=True, but that causes problems in gridded_skill() (where empty bins make sense). I think that the actual problem is that the "default" ...
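For readers unfamiliar with the pandas behaviour behind this, here is a minimal sketch using plain pandas only. The column names and values are made up for illustration and are not taken from modelskill. It shows that grouping on category columns with observed=False yields every combination of categories (including empty ones), while observed=True, or object dtype, yields only the combinations present in the data.

```python
import pandas as pd

# Illustrative data only: SW_2 has no "Tp" rows
df = pd.DataFrame(
    {
        "model": pd.Categorical(["SW_1", "SW_1", "SW_2"], categories=["SW_1", "SW_2"]),
        "variable": pd.Categorical(["Hm0", "Tp", "Hm0"], categories=["Hm0", "Tp"]),
        "bias": [0.1, -0.2, 0.3],
    }
)

# observed=False: one row per combination of categories, including the
# unobserved (SW_2, Tp) pair, whose mean is NaN
all_bins = df.groupby(["model", "variable"], observed=False)["bias"].mean()

# observed=True: only combinations that actually occur in the data
seen_bins = df.groupby(["model", "variable"], observed=True)["bias"].mean()

# object dtype: grouping only ever yields the combinations that occur
as_object = df.astype({"model": object, "variable": object})
obj_bins = as_object.groupby(["model", "variable"])["bias"].mean()

print(len(all_bins), len(seen_bins), len(obj_bins))
```

This is why empty bins can be wanted in a gridded setting (every bin should appear) but unwanted when averaging over model/variable combinations that never occur.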
Very useful functionality! 👍 Here is a snippet of a slightly incomplete example, where I have added attrs to 2 (of 3) observations. When the attrs are absent, the default is now to exclude that observation from the skill table, but by setting observed=True ...
>>> o1 = ms.PointObservation('HKNA_Hm0.dfs0', attrs={"use": "calibration"})
>>> o2 = ms.PointObservation("eur_Hm0.dfs0", attrs={"use": "validation"})
>>> o3 = ms.TrackObservation("Alti_c2_Dutch.dfs0")
>>> cc = ms.match(obs=[o1, o2, o3], mod=[mr1, mr2])
>>> cc.skill(by=("model","attrs:use"), observed=False).round(2)
                     n  bias  rmse  urmse   mae    cc    si    r2
model use
SW_1  False        113 -0.00  0.35   0.35  0.29  0.97  0.13  0.90
      calibration  386 -0.19  0.35   0.29  0.25  0.97  0.09  0.91
      validation    67 -0.07  0.22   0.21  0.19  0.97  0.08  0.93
SW_2  False        113  0.08  0.43   0.42  0.36  0.97  0.15  0.85
      calibration  386 -0.10  0.29   0.28  0.21  0.97  0.09  0.93
      validation    67 -0.00  0.23   0.23  0.20  0.97  0.09  0.93
>>> cc.skill(by=("model","attrs:use")).round(2)
                     n  bias  rmse  urmse   mae    cc    si    r2
model use
SW_1  calibration  386 -0.19  0.35   0.29  0.25  0.97  0.09  0.91
      validation    67 -0.07  0.22   0.21  0.19  0.97  0.08  0.93
SW_2  calibration  386 -0.10  0.29   0.28  0.21  0.97  0.09  0.93
      validation    67 -0.00  0.23   0.23  0.20  0.97  0.09  0.93
Finally managed to do the merge 😬 That was not easy.
def test_skill_by_attrs_gtype(cc):
    sk = cc.skill(by="attrs:gtype")
    assert len(sk) == 2
    assert sk.data.index[0] == "point"
It seems overly specific to assert that the index is sorted in this order.
Isn't it enough to verify that:
assert "point" in sk.index
assert "track" in sk.index
True, will fix in the next PR, which will be on sorting.
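As an illustration of the order-independent check suggested in this review thread, here is a small sketch on a plain pandas DataFrame. The `sk_data` frame is a hypothetical stand-in for the skill table's underlying data, not the actual modelskill object, and its values are placeholders.

```python
import pandas as pd

# Hypothetical stand-in for the skill table's data, deliberately
# ordered with "track" first to show the checks do not depend on order
sk_data = pd.DataFrame(
    {"n": [1, 2]},
    index=pd.Index(["track", "point"], name="gtype"),
)

# Membership checks pass regardless of row order
assert len(sk_data) == 2
assert "point" in sk_data.index
assert "track" in sk_data.index

# Equivalent single check: the set of labels is exactly what we expect
assert set(sk_data.index) == {"point", "track"}
```

The set comparison additionally guards against unexpected extra labels, which the two membership checks alone would only catch via the length assertion.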
Extracted by="attrs:gtype" part of #331
e.g. "attrs:gtype" or "attrs:DA" (to distinguish between assimilation and validation stations)