This notebook was prompted by a recent discussion on Stack Overflow, where I asked:
I want to estimate the likelihood that a randomly selected item from one group will have a higher score than a randomly selected item from a different group. That is, the Probability of Superiority, sometimes called the Common Language Effect Size; see for example https://rpsychologist.com/d3/cohend/. This can be resolved algebraically if we accept that the data are normally distributed (McGraw and Wong, 1992, Psychological Bulletin, 111), but I know that my data are not normally distributed, making such an estimate unreliable.
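For reference, under the normality assumption the McGraw and Wong estimate reduces to a single `pnorm` call. A minimal sketch (the function name `ps_normal` is mine, not from the discussion):

```r
# Normal-theory Probability of Superiority (McGraw & Wong, 1992):
# P(X > Y) = pnorm((mean(x) - mean(y)) / sqrt(var(x) + var(y)))
ps_normal <- function(x, y) {
  pnorm((mean(x) - mean(y)) / sqrt(var(x) + var(y)))
}
```

It is exactly this reliance on normal-theory means and variances that breaks down for the skewed data described below, which is what motivates the nonparametric alternatives.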
The resulting discussion produced several new alternatives, most of which were a great deal better than my first effort. This notebook runs benchmarking tests to compare the speed of each algorithm at different sample sizes.
Eight alternative functions are benchmarked against each other, each run for 100 iterations at a prespecified set of sample sizes (a minimal harness is sketched at the end of this section). Summary tables are presented for absolute and relative times, and the results are then shown as violin plots. The candidates are:
- My original for-if-else counting function (sketched after this list)
- expand.grid from base R: bring every pair of values together in one data frame (see the cross-join sketches after this list)
- CJ ((C)ross (J)oin): a data.table formed from the cross product of the two vectors
- data.table Cartesian join of two data.tables
- wilcox.test from base R (see the Wilcoxon sketch after this list)
- A modified version of base R's wilcox.test
- Base R AUC, taking advantage of properties of the matrix outer product (see the AUC sketches after this list)
- AUC from the bigstatsr package
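For concreteness, here is the shape of the naive for-if-else approach. This is a sketch of the pattern, not my original code verbatim, and the half-credit for ties is an assumption:

```r
# Count wins pair by pair; ties counted as half a win (an assumption here)
ps_loop <- function(a, b) {
  wins <- 0
  for (x in a) {
    for (y in b) {
      if (x > y) {
        wins <- wins + 1
      } else if (x == y) {
        wins <- wins + 0.5
      }
    }
  }
  wins / (length(a) * length(b))
}
```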
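The cross-join variants all materialise every pair and then average a comparison. A sketch of the expand.grid and CJ versions, with tie handling omitted for brevity:

```r
library(data.table)

# All pairs via base R's expand.grid, then the mean of (a > b)
ps_expandgrid <- function(a, b) {
  g <- expand.grid(a = a, b = b)
  mean(g$a > g$b)
}

# Same idea with data.table's CJ (cross join)
ps_cj <- function(a, b) {
  CJ(a = a, b = b)[, mean(a > b)]
}
```

Both allocate a table with `length(a) * length(b)` rows, which is presumably what the benchmarks penalise at larger sample sizes.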
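The Wilcoxon route works because the Mann-Whitney W statistic reported by wilcox.test is exactly the number of (a, b) pairs with a > b, counting ties as 0.5, so dividing by the number of pairs gives the probability of superiority. A sketch:

```r
# W / (n_a * n_b) is the probability of superiority;
# exact = FALSE avoids warnings about ties without changing the statistic
ps_wilcox <- function(a, b) {
  w <- wilcox.test(a, b, exact = FALSE)$statistic
  unname(w) / (length(a) * length(b))
}
```

The "modified" version in the benchmark presumably strips wilcox.test down to just this statistic, skipping the p-value machinery.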
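The two AUC entries exploit the same identity: the probability of superiority equals the AUC of a classifier that uses the raw scores to separate the two groups. Sketches of a base R outer-product version and a bigstatsr call, assuming bigstatsr's AUC(pred, target) with a 0/1 target:

```r
library(bigstatsr)

# Base R: outer(a, b, "-") holds every pairwise difference;
# sign() maps it to -1/0/1, so (sign + 1) / 2 scores ties as 0.5
ps_outer <- function(a, b) {
  mean((sign(outer(a, b, "-")) + 1) / 2)
}

# bigstatsr::AUC(pred, target): label group a as 1 and group b as 0
ps_bigstatsr <- function(a, b) {
  AUC(c(a, b), rep(1:0, c(length(a), length(b))))
}
```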
In this example, two non-normal data sets are compared, chosen so that it is not obvious which group should have the higher probability of superiority. Group Alpha is the more platykurtic (spread out), with many large negative and positive values, while group Beta follows a chi-squared distribution (df = 1) and so is highly skewed.
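A sketch of how such samples might be drawn. Group Beta's generator is given above, but group Alpha's is not spelled out here, so the wide uniform below is only one platykurtic stand-in, not the notebook's actual code:

```r
set.seed(42)
n <- 1000

# Group Beta: chi-squared with df = 1, as described (highly skewed)
beta <- rchisq(n, df = 1)

# Group Alpha: a wide uniform as one platykurtic stand-in (an assumption)
alpha <- runif(n, min = -10, max = 10)
```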
So the two distributions should look like this:
The base R AUC function is the consistent winner, except when the sample size is unusually small.
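For completeness, here is a minimal version of the harness described above, using the microbenchmark package. The function names refer to the sketches earlier in this section, and the actual notebook's harness may differ:

```r
library(microbenchmark)
library(ggplot2)

res <- microbenchmark(
  loop       = ps_loop(alpha, beta),
  expandgrid = ps_expandgrid(alpha, beta),
  cj         = ps_cj(alpha, beta),
  wilcox     = ps_wilcox(alpha, beta),
  outer      = ps_outer(alpha, beta),
  bigstatsr  = ps_bigstatsr(alpha, beta),
  times      = 100
)

summary(res)                   # absolute timings per function
print(res, unit = "relative")  # relative timings
autoplot(res)                  # violin-style plot of run-time distributions
```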