Difficulty Score Docs #633
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## trunk #633 +/- ##
==========================================
- Coverage 94.49% 93.57% -0.93%
==========================================
Files 86 86
Lines 5397 5397
Branches 792 792
==========================================
- Hits 5100 5050 -50
- Misses 220 260 +40
- Partials 77 87 +10
docs/metrics/difficulty-score.md
Outdated
# Difficulty Score

Difficulty scores are automatically computed within Kolena to surface datapoints that commonly contribute to poor
I would link `Kolena` and `datapoints` so that anyone landing on this page without any context can navigate themselves to the right places.
With a filter for `datapoint.difficulty_score > 0.9`, we see all the datapoints that significantly struggle
across both the `old` and `new` models, which are common failures that persist over different model iterations.
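
As a rough illustration of the filter described above, here is a minimal Python sketch. It assumes the datapoints have been exported as plain dictionaries carrying a `difficulty_score` field; the sample records and the `filter_difficult` helper are hypothetical and are not part of Kolena's API.

```python
from typing import Dict, List

# Hypothetical records standing in for exported datapoints (not real data).
datapoints: List[Dict] = [
    {"id": "a", "difficulty_score": 0.95},
    {"id": "b", "difficulty_score": 0.40},
    {"id": "c", "difficulty_score": 0.91},
    {"id": "d", "difficulty_score": 0.90},
]


def filter_difficult(rows: List[Dict], threshold: float = 0.9) -> List[Dict]:
    """Keep rows whose difficulty_score is strictly greater than the threshold."""
    return [row for row in rows if row["difficulty_score"] > threshold]


hardest = filter_difficult(datapoints)
print([row["id"] for row in hardest])
```

The comparison is strict, mirroring the `>` in the filter expression, so a datapoint scoring exactly 0.9 is excluded.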

## Implementation Details
Before we jump to the implementation details, should we summarize the logic here? Also, can this be summarized into a formula? One formula for model-level difficulty score and one for the aggregate difficulty score.
docs/metrics/difficulty-score.md
Outdated
#### Multiclass Classification

The `delta` column for a datapoint of a multiclass classification task is simply the number of times a model
What does it mean by `number of times`? Isn't it just a correct vs. incorrect classification per datapoint?
docs/metrics/index.md
Outdated
---

Difficulty scores indicate which datapoints models commonly struggle on based on custom Quality Standards.
`struggle on based on` -> `struggle based on`
I made some final edits, but overall this is a very well written doc, and it should demystify how we are computing the "difficulty score" on our app. Once we get a thumbs up from @mkaramlou, I will merge it.
Changes look good!
Linked issue(s)
Fixes KOL-6787
What change does this PR introduce and why?
Adds docs for difficulty score