Whiskers and outliers for geom_boxplot with transformed scale #3336

Closed
mbertolacci opened this issue May 20, 2019 · 7 comments · Fixed by #3380
Labels
bug an unexpected problem or unintended behavior layers 📈

Comments

@mbertolacci

Boxplot whiskers and outliers seem to be incorrect when using geom_boxplot with a transformed scale. It probably affects notches too, but I haven't checked.

The hinges are fine, which makes sense—monotonic transformations preserve quantiles.

library(ggplot2)
df <- data.frame(x = 1, y = c(1, 4, 5, 6, 9))
print(ggplot(df, aes(x, y)) + geom_boxplot())

print(ggplot(df, aes(x, y)) + geom_boxplot() + scale_y_sqrt())

Created on 2019-05-20 by the reprex package (v0.3.0)

I'm using ggplot2 3.1.1.
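
The arithmetic behind the second plot can be checked without ggplot2. This is a minimal sketch, assuming base R's boxplot.stats() applies the same 1.5 × IQR rule as geom_boxplot(); the two define hinges differently in general, but they agree for these five values.

```r
y <- c(1, 4, 5, 6, 9)

# Statistics on the original values: hinges 4 and 6, IQR 2,
# so the fences sit at 4 - 3 = 1 and 6 + 3 = 9.
raw <- boxplot.stats(y)

# Statistics on sqrt(y), which is what the stat sees under scale_y_sqrt().
trans <- boxplot.stats(sqrt(y))

raw$stats      # whisker ends, hinges and median on the original scale
raw$out        # outliers found on the original scale
trans$stats^2  # the same statistics found on the sqrt scale, mapped back
trans$out^2    # outliers found on the sqrt scale, mapped back
```

The hinges are 4 and 6 either way. On the square-root scale, however, the lower fence is 2 − 1.5 × (√6 − 2) ≈ 1.33, so sqrt(1) = 1 falls below it: the point y = 1 is flagged as an outlier and the lower whisker stops at 4.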

@paleolimbot
Member

paleolimbot commented May 20, 2019

You're right, boxplot statistics are computed on the transformed values, which changes where the whiskers end and which points are flagged as outliers. However, StatBoxplot$compute_group() has access to the x and y scales (and their transformations), so it is possible to change this.

https://github.com/tidyverse/ggplot2/blob/master/R/stat-boxplot.r#L82-L85
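
To illustrate what such a change could look like, here is a rough sketch in base R rather than in the ggplot2 internals. The helper boxplot_stats_original() and its transform/inverse arguments are hypothetical names made up for this example; this is not the code in stat-boxplot.r, and not necessarily what any eventual fix does. It back-transforms the values, applies the 1.5 × IQR rule there, and maps the results forward again for drawing. The data differ slightly from the reprex above so that no point sits exactly on a fence, where floating-point round-off from the round trip could flip the result.

```r
# Hypothetical helper: compute boxplot statistics on the original scale
# for values that have already been transformed by a scale.
boxplot_stats_original <- function(y_trans, transform, inverse, coef = 1.5) {
  y <- inverse(y_trans)  # undo the scale transformation
  qs <- quantile(y, c(0.25, 0.5, 0.75), names = FALSE)
  iqr <- qs[3] - qs[1]
  is_out <- y < qs[1] - coef * iqr | y > qs[3] + coef * iqr
  list(
    # whisker ends, hinges and median, mapped forward to the transformed scale
    stats = transform(c(min(y[!is_out]), qs, max(y[!is_out]))),
    outliers = transform(y[is_out])
  )
}

y <- c(1.5, 4, 5, 6, 8)
res <- boxplot_stats_original(
  sqrt(y),
  transform = sqrt,
  inverse = function(v) v^2
)

res$stats^2                 # whiskers span the full range of y
res$outliers                # no points are flagged
boxplot.stats(sqrt(y))$out  # computed on sqrt(y) directly, sqrt(1.5) is flagged
```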

@paleolimbot
Member

It's worth noting that this (surprise at stats being calculated on transformed values) has been reported several times (#2880, #2804), and at least once it was deemed a won't-fix: #2804 (comment)

@mbertolacci
Author

mbertolacci commented May 27, 2019

Ah, sorry I missed those issues; I only searched for boxplot-related ones.

Computing stats on the transformed values is potentially quite reasonable from a statistical perspective. If that's the way this goes, I would then just suggest a documentation update to note the behaviour since it's a bit surprising.

@clauswilke
Member

This topic came up at Lee Wilkinson's talk yesterday at SDSS 2019: Stats are computed on the transformed values. If you want the stats to be calculated on the original values then you need to work with a transformed coordinate system: https://ggplot2.tidyverse.org/reference/coord_trans.html
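
For the reprex at the top of this issue, the difference can be inspected with layer_data(), which returns the statistics computed for a layer. This is a small sketch, assuming a ggplot2 version in which coord_trans() and layer_data() are available under those names.

```r
library(ggplot2)

df <- data.frame(x = 1, y = c(1, 4, 5, 6, 9))

# Scale transformation: the stat sees sqrt(y).
p_scale <- ggplot(df, aes(x, y)) + geom_boxplot() + scale_y_sqrt()

# Coordinate transformation: the stat sees y; only the drawing is transformed.
p_coord <- ggplot(df, aes(x, y)) + geom_boxplot() + coord_trans(y = "sqrt")

stats_scale <- layer_data(p_scale)
stats_coord <- layer_data(p_coord)

stats_scale$outliers[[1]]  # sqrt(1) is flagged as an outlier
stats_coord$outliers[[1]]  # nothing is flagged
stats_coord[, c("ymin", "ymax")]  # whiskers reach the smallest and largest y
```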

This is a common confusion in ggplot2: many people try to use scales for things that should be done in coords. Setting axis limits without dropping data points is another common issue.
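
The axis-limits case follows the same pattern. As a sketch using the same five values: limits set on the scale remove data before the statistics are computed, while limits set on the coord only zoom the view.

```r
library(ggplot2)

df <- data.frame(x = 1, y = c(1, 4, 5, 6, 9))
p <- ggplot(df, aes(x, y)) + geom_boxplot()

# Scale limits: values outside [3, 7] become NA and are dropped (with a
# warning) before the boxplot statistics are computed.
p_scale <- p + scale_y_continuous(limits = c(3, 7))

# Coord limits: all five values are used; the view is only zoomed.
p_coord <- p + coord_cartesian(ylim = c(3, 7))

stats_scale <- suppressWarnings(layer_data(p_scale))
stats_coord <- layer_data(p_coord)

cols <- c("ymin", "lower", "middle", "upper", "ymax")
stats_scale[, cols]  # computed from 4, 5, 6 only
stats_coord[, cols]  # computed from all five values
```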

@paleolimbot
Member

If that's true, it should be much easier to make a pretty-looking plot with coord_trans(). Right now the defaults don't produce the plot most people are looking for, the xlim and ylim parameters aren't named the same as in coord_cartesian(), non-default axis expansion isn't possible (#3338 and #2990), second axes aren't well supported (#2990), and some of the nice things you can do, like using NA as one of the limits, aren't possible in coords (#2907). Even using breaks = scales::log_breaks() doesn't really work with coord_trans().

library(ggplot2)

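p <- ggplot(diamonds, aes(cut, price)) +
  geom_boxplot()

# Transform the data in the aesthetic: stats are computed on log10(price)
p + aes(y = log10(price))

# Transform the scale: stats are still computed on log10(price)
p + scale_y_log10()

# Transform the coordinate system: stats are computed on price
p + coord_trans(y = "log10")

# Same, with log breaks requested on the scale
p + coord_trans(y = "log10") + scale_y_continuous(breaks = scales::log_breaks())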

Created on 2019-06-02 by the reprex package (v0.2.1)

@lock

lock bot commented Dec 28, 2019

This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/

@lock lock bot locked and limited conversation to collaborators Dec 28, 2019
3 participants