stat_summary_bin strange behavior when combined with scale_y_* #2880

Closed
mkoohafkan opened this issue Sep 4, 2018 · 8 comments

@mkoohafkan

I'm attempting to compute a binned sum of data via fun.y with stat_summary_bin and draw it as a line. I get strange y-axis values when combining it with a scale_y_* transformation. It looks like a bug to me, but perhaps I misunderstand how to use stat_summary_bin.

library(ggplot2)
ggplot(mtcars) +
  aes(x = disp, y = mpg) +
  stat_summary_bin(fun.y = sum, binwidth = 2.5,
    geom = "line", size = 1)

ggplot(mtcars) +
  aes(x = disp, y = mpg) +
  stat_summary_bin(fun.y = sum, binwidth = 2.5,
    geom = "line", size = 1) +
  scale_y_log10()

ggplot(mtcars) +
  aes(x = disp, y = mpg) +
  stat_summary_bin(fun.y = sum, binwidth = 2.5,
    geom = "line", size = 1) +
  scale_y_sqrt()

Created on 2018-09-04 by the reprex package (v0.2.0).

@yutannihilation
Member

This issue seems to be the same as #2804 (comment). stat_summary_bin() summarises the values after the scale transformation has been applied. This is an implementation limitation, so it is unlikely to change.

You can undo the transformation (e.g. 10^x), then summarise (e.g. sum()), and finally apply the transformation (e.g. log10()) again. But it would be easier to summarise the data before passing it to ggplot2...

library(ggplot2)

ggplot(mtcars) +
  aes(x = disp, y = mpg) +
  stat_summary_bin(fun.y = function(x) log10(sum(10^x)), binwidth = 2.5,
                   geom = "line", size = 1) +
  scale_y_log10()

Created on 2018-09-05 by the reprex package (v0.2.0).
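
As a rough illustration of why the workaround above is needed, here is a minimal base R sketch. The three values are hypothetical numbers standing in for the raw y data of a single bin; they are not taken from mtcars.

```r
# Hypothetical raw y values falling into one bin
y <- c(10, 100, 1000)

# With scale_y_log10(), the summary function receives transformed values
y_log <- log10(y)

# fun.y = sum therefore adds up the logs (the log of the product, not of the sum)
sum_of_logs <- sum(y_log)

# What was actually wanted: the log of the sum of the raw values
log_of_sum <- log10(sum(y))

# Undo the transformation, summarise, then transform again
restored <- log10(sum(10^y_log))

isTRUE(all.equal(restored, log_of_sum))     # the workaround recovers the wanted value
isTRUE(all.equal(sum_of_logs, log_of_sum))  # the plain sum of logs does not
```

The same pattern should apply to scale_y_sqrt() with x^2 and sqrt() in place of 10^x and log10().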

@mkoohafkan
Author

mkoohafkan commented Sep 5, 2018

Thanks for the explanation @yutannihilation. The workaround makes sense, and after reading that issue I understand that changing this behavior would be a big undertaking. I think a warning or explanation in the documentation for stat_summary_bin would be helpful for future reference.

It's nice to be able to do this operation with stat_summary_bin rather than summarizing beforehand because defining the bins and axis labels for passing to ggplot can get pretty annoying.
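
For comparison, summarising beforehand might look roughly like the sketch below. The binning rule is an assumption (left-closed bins of width 2.5 starting at multiples of 2.5, plotted at their midpoints); stat_summary_bin() may place its bin edges differently, so the result is not guaranteed to match its output. The names `binned`, `summed`, and `p_pre` are made up for this example.

```r
library(ggplot2)

# Assumed binning rule: floor to a multiple of the bin width, then use the midpoint
binwidth <- 2.5
binned <- data.frame(
  bin = floor(mtcars$disp / binwidth) * binwidth + binwidth / 2,
  mpg = mtcars$mpg
)

# Sum the raw mpg values within each bin, before any scale transformation
summed <- aggregate(mpg ~ bin, data = binned, FUN = sum)

# The scale now only transforms values that are already summed
p_pre <- ggplot(summed) +
  aes(x = bin, y = mpg) +
  geom_line() +
  scale_y_log10()
```

This keeps the sum on the raw scale, at the cost of defining the bins by hand.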

@yutannihilation
Member

Ah, sorry, I was wrong! While searching the documentation for this, I found that we can use coord_trans(). ?stat_summary says:

# Transforming the scale means the data are transformed
# first, after which statistics are computed:
p1 <- m2 + scale_y_log10()
# Transforming the coordinate system occurs after the
# statistic has been computed. This means we're calculating the summary on the raw data
# and stretching the geoms onto the log scale.  Compare the widths of the
# standard errors.
p2 <- m2 + coord_trans(y="log10")

So, in this case, you could write it like p2 in the following code:

library(ggplot2)

p <- ggplot(mtcars) +
  aes(x = disp, y = mpg) +
  stat_summary_bin(fun.y = sum, binwidth = 2.5,
                   geom = "line", size = 1)

p1 <- p + scale_y_log10()
p2 <- p + coord_trans(y = "log10")

egg::ggarrange(p1, p2)

Created on 2018-09-05 by the reprex package (v0.2.0).
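
One way to check the difference numerically, rather than by eye, is to inspect the computed layer data. This is a sketch under assumptions: it uses `fun`, which replaced `fun.y` in ggplot2 3.3.0 and later, and relies on layer_data() returning the per-bin values before the coordinate system is applied. The names `d1` and `d2` are made up for this example.

```r
library(ggplot2)

p <- ggplot(mtcars) +
  aes(x = disp, y = mpg) +
  stat_summary_bin(fun = sum, binwidth = 2.5, geom = "line")

# layer_data() returns the data computed for the layer, one row per bin
d1 <- layer_data(p + scale_y_log10())
d2 <- layer_data(p + coord_trans(y = "log10"))

# Scale transformation: mpg is log10-transformed first, so each y is a sum of logs
all.equal(sum(d1$y), sum(log10(mtcars$mpg)))

# Coordinate transformation: each y is a sum of raw mpg values
all.equal(sum(d2$y), sum(mtcars$mpg))
```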

@yutannihilation
Member

IIUC, the summary of #2804 is:

  • If you don't mind whether the lines curve, you can use coord_trans().
  • If you do want straight lines, you can use the "restore-summarise-retransform" strategy.

@mkoohafkan
Author

mkoohafkan commented Sep 5, 2018

Oh wow, it is in the docs after all... but hidden in an example. I do think this nuance regarding the order of operations (transform --> stat vs. stat --> transform) should be described in the Summary functions section.

Thanks for pointing out these two options for resolving the issue.

@yutannihilation
Member

Agreed that there should be a section for this in the docs of stat_summary() and stat_summary_bin().

But... transform --> stat vs. stat --> transform seems like a more general topic. I have no idea where this fits...

@mkoohafkan
Author

I think just a note about using stat_summary* with scale/coordinate transformations should suffice. The existing comments in the example explain it in necessary depth. Slightly reworded:

Note that scale transformations occur before statistics are computed, while coordinate system transformations occur after statistics are computed.

@teunbrand
Collaborator

> The existing comments in the example explain it in necessary depth.

Closing this issue for the reason above.
