Scoring of packages using pagerank, rather than user ratings #818

Open
chessai opened this issue Mar 15, 2019 · 4 comments

@chessai
Member

chessai commented Mar 15, 2019

The 3-star rating of packages has historically not been very useful. Even the most well-known packages receive only a small number of ratings from users. For example:

| Package | Number of votes |
|---|---|
| base | 24 |
| text | 13 |
| bytestring | 11 |
| vector | 4 |

Implementing a rating system based on pagerank could help in the following ways:

  1. Ratings would not be an (effectively) redundant piece of information on a package's page
  2. Users could see a numerical number roughly reflective of the community's trust of the package
  3. PageRank is a better fit for rating over time, since users are very unlikely to retract or change a rating (or even make one at all) after changing their opinion (perhaps due to an improvement in the library).
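As a rough illustration of the idea, here is a minimal power-iteration PageRank sketch. It assumes (the issue does not specify this) that each direct dependency listed in a package's Cabal metadata counts as a link from the package to that dependency, so rank flows toward widely depended-upon libraries. The function name, the damping factor of 0.85, and the package names in the toy graph are hypothetical, not real Hackage data.

```python
def pagerank(deps, damping=0.85, tol=1e-12, max_iter=200):
    """PageRank by power iteration, where every package "links to" each of
    its direct dependencies, so rank flows from dependents to dependencies."""
    # Normalise the graph: dedupe edges and add packages that only appear
    # as dependencies (e.g. a base library) with no outgoing edges.
    graph = {p: sorted(set(ds)) for p, ds in deps.items()}
    for ds in list(graph.values()):
        for q in ds:
            graph.setdefault(q, [])
    nodes = sorted(graph)
    n = len(nodes)
    rank = {p: 1.0 / n for p in nodes}
    for _ in range(max_iter):
        # Packages with no dependencies are "dangling": spread their rank
        # uniformly so that the total rank stays 1.
        dangling = sum(rank[p] for p in nodes if not graph[p])
        base = (1.0 - damping) / n + damping * dangling / n
        new = {p: base for p in nodes}
        for p in nodes:
            if graph[p]:
                share = damping * rank[p] / len(graph[p])
                for q in graph[p]:
                    new[q] += share
        delta = sum(abs(new[p] - rank[p]) for p in nodes)
        rank = new
        if delta < tol:
            break
    return rank


# Hypothetical toy dependency graph (not real Hackage data).
toy_deps = {
    "app-x": ["web-lib", "json-lib"],
    "app-y": ["json-lib"],
    "web-lib": ["text-lib", "base-lib"],
    "json-lib": ["text-lib", "base-lib"],
    "text-lib": ["base-lib"],
}
for pkg, score in sorted(pagerank(toy_deps).items(), key=lambda kv: -kv[1]):
    print(f"{pkg:10} {score:.3f}")
```

Spreading the rank of dependency-free packages uniformly keeps the scores summing to 1. In the toy graph, `base-lib` comes out on top because every other package eventually reaches it.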

@taktoa and I discussed this, and we both think a PageRank-based approach is better than the rule of succession, Bayesian averaging, or anything else that relies on explicit user voting. @taktoa, please comment if you have anything to add.
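For contrast, the vote-based estimators mentioned above are easy to sketch. This assumes ratings are plain numbers on the 3-star scale; the prior mean of 2.0, the prior weight of 10, and the example ratings are hypothetical. With only 4 votes (as for `vector` in the table), the prior still dominates the score, which is the weakness of relying on sparse explicit votes.

```python
def bayesian_average(ratings, prior_mean, prior_weight):
    """Mean of `ratings`, shrunk toward `prior_mean` as if `prior_weight`
    extra ratings equal to the prior had been cast."""
    return (prior_weight * prior_mean + sum(ratings)) / (prior_weight + len(ratings))


def rule_of_succession(upvotes, total_votes):
    """Laplace's rule of succession for up/down votes: (s + 1) / (n + 2)."""
    return (upvotes + 1) / (total_votes + 2)


# Hypothetical prior and ratings, for illustration only.
print(bayesian_average([3] * 4, prior_mean=2.0, prior_weight=10))   # 32 / 14, about 2.29
print(bayesian_average([3] * 24, prior_mean=2.0, prior_weight=10))  # 92 / 34, about 2.71
print(rule_of_succession(3, 4))                                     # 4 / 6, about 0.67
```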

@taktoa

taktoa commented Mar 15, 2019

If it's too slow to recompute the PageRanks every time someone uploads a package, you could implement Fast Incremental and Personalized PageRank instead.
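Fast Incremental and Personalized PageRank works by maintaining stored random-walk segments and updating them as the graph changes. The sketch below is a much simpler Monte Carlo variant in that spirit, not the paper's algorithm. It estimates PageRank from random-walk visit counts and only handles the easy case of a brand-new package: the upload adds outgoing dependency edges but nothing depends on the package yet, so every previously sampled walk stays valid and only walks starting at the new package need to be sampled. Changing an existing package's dependencies (e.g. via a new version) is not handled. The class name, walk count, seed, and package names are hypothetical.

```python
import random
from collections import Counter


class IncrementalMonteCarloPageRank:
    """Monte Carlo PageRank estimate that is cheap to update when a brand-new
    package is uploaded. Edges point from a package to its dependencies."""

    def __init__(self, walks_per_package=500, damping=0.85, seed=0):
        self.deps = {}           # package -> sorted list of direct dependencies
        self.visits = Counter()  # total walk visits per package
        self.walks = walks_per_package
        self.damping = damping
        self.rng = random.Random(seed)

    def _walk(self, start):
        # Follow a random dependency with probability `damping`; stop
        # otherwise, or when reaching a package with no dependencies.
        node = start
        while True:
            self.visits[node] += 1
            out = self.deps[node]
            if not out or self.rng.random() >= self.damping:
                return
            node = self.rng.choice(out)

    def add_package(self, name, dependencies):
        # Nothing depends on a new package yet, so no earlier walk can reach
        # it: all stored visit counts stay valid and only new walks are needed.
        if name in self.deps:
            raise ValueError(f"{name} already exists; changing an existing "
                             "package's dependencies needs walk rerouting")
        unknown = [d for d in dependencies if d not in self.deps]
        if unknown:
            raise ValueError(f"unknown dependencies: {unknown}")
        self.deps[name] = sorted(set(dependencies))
        for _ in range(self.walks):
            self._walk(name)

    def ranks(self):
        # Visit frequencies are proportional to PageRank (with dependency-free
        # packages teleporting uniformly), so normalise by the total.
        total = sum(self.visits.values())
        return {p: self.visits[p] / total for p in self.deps}


pr = IncrementalMonteCarloPageRank()
# Hypothetical packages, uploaded in dependency order.
pr.add_package("base-lib", [])
pr.add_package("text-lib", ["base-lib"])
pr.add_package("json-lib", ["text-lib", "base-lib"])
pr.add_package("web-lib", ["text-lib", "base-lib"])
pr.add_package("app-x", ["web-lib", "json-lib"])
print(max(pr.ranks().items(), key=lambda kv: kv[1]))
```

Each upload costs only `walks_per_package` short walks instead of a full recomputation; the trade-off is sampling noise, which shrinks as the number of walks per package grows.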

@hvr
Member

hvr commented Mar 15, 2019

> Users could see a numerical number roughly reflective of the community's trust of the package

While I agree that the 3-star rating isn't a sufficient metric (and, curiously, there have already been cases of politically motivated downvoting on Hackage, but that's just something you have to live with), I think that claiming PageRank to be a metric of trust is a very misleading premise. While you didn't specify exactly how you'd apply PageRank to the Cabal metadata, PageRank is merely a metric of popularity, certainly not of "trust". It only reflects dependencies that maintainers choose voluntarily, not those forced upon you, either by a lack of alternatives or by other choices you made that lock you into a particular vendor. Then there are also cargo-culting effects, where people are simply not aware of the alternatives, and a PageRank metric might even reinforce this vicious cycle by making people less confident about taking less travelled roads. And fwiw, I can think of a couple of packages which I certainly wouldn't classify as trustworthy, and yet they appear in a majority of install plans across Hackage.

That being said, I welcome adding a PageRank-like metric as an additional number to look at or that you can sort by, but I don't consider it a replacement for the manual user rating metric.

@gbaz
Contributor

gbaz commented Mar 15, 2019

PageRank in this case is being described as a sort of fancy weighted sum over transitive reverse-dependency counts, right? So the first step would be for somebody to jump in and help finish the long-delayed reverse-dependency code in #723.

This code is entirely feature-complete, but still appears to consume excessive memory at full hackagedb scale. With that in place, it would be straightforward in code (but perhaps interesting mathematically) to augment the revdep information further with incremental PageRank data.
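The transitive reverse-dependency counts mentioned above can be sketched with a breadth-first search over the inverted dependency graph. This is a naive illustration, not based on the code in #723; the function name and toy packages are hypothetical. It runs a separate search per package, so it costs roughly O(P·(P + E)) time for P packages and E dependency edges.

```python
from collections import defaultdict, deque


def transitive_revdep_counts(deps):
    """For each package, count the distinct packages that depend on it,
    directly or transitively. `deps` maps a package to its direct deps."""
    # Invert the graph: dependency -> set of direct dependents.
    dependents = defaultdict(set)
    packages = set(deps)
    for pkg, ds in deps.items():
        packages.update(ds)
        for d in ds:
            dependents[d].add(pkg)

    counts = {}
    for target in packages:
        # Breadth-first search over reverse edges starting at `target`.
        seen = set()
        queue = deque(dependents[target])
        while queue:
            pkg = queue.popleft()
            if pkg in seen:
                continue
            seen.add(pkg)
            queue.extend(dependents[pkg] - seen)
        seen.discard(target)  # a dependency cycle should not count the package itself
        counts[target] = len(seen)
    return counts


# Hypothetical toy dependency graph (not real Hackage data).
toy_deps = {
    "app-x": ["web-lib", "json-lib"],
    "app-y": ["json-lib"],
    "web-lib": ["text-lib", "base-lib"],
    "json-lib": ["text-lib", "base-lib"],
    "text-lib": ["base-lib"],
}
print(transitive_revdep_counts(toy_deps))
```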

That said, given the structure of dependency graphs in Haskell, I'd be curious whether PageRank actually adds value over the revdep counts themselves. However, such a question is best answered empirically, by actually implementing things and seeing what happens :-)
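One way to run that empirical comparison is to compute both metrics over the same packages and measure how often they agree on the relative order of each pair, e.g. with Kendall's tau. The sketch below assumes both metrics are given as dictionaries from package name to score; the function name is hypothetical and the scores in the example are made up, not measurements. A tau close to 1 on real Hackage data would suggest PageRank adds little ordering information beyond revdep counts.

```python
from itertools import combinations


def kendall_tau(scores_a, scores_b):
    """Kendall's tau-a between two scorings of the same packages:
    +1 means identical orderings, -1 means exactly reversed."""
    shared = sorted(set(scores_a) & set(scores_b))
    pairs = list(combinations(shared, 2))
    if not pairs:
        raise ValueError("need at least two packages scored by both metrics")
    net = 0
    for p, q in pairs:
        # Concordant if both metrics order p and q the same way; ties in
        # either metric count as neither concordant nor discordant.
        direction = (scores_a[p] - scores_a[q]) * (scores_b[p] - scores_b[q])
        if direction > 0:
            net += 1
        elif direction < 0:
            net -= 1
    return net / len(pairs)


# Made-up scores for illustration only.
revdeps = {"base-lib": 5, "text-lib": 4, "json-lib": 2, "web-lib": 1}
pagerank_scores = {"base-lib": 0.37, "text-lib": 0.20, "json-lib": 0.11, "web-lib": 0.17}
print(kendall_tau(revdeps, pagerank_scores))  # 5 of 6 pairs agree: (5 - 1) / 6
```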

@gbaz
Contributor

gbaz commented Feb 4, 2022

cf: #986

4 participants