table_or() doesn't seem to sort categorical predictors as expected #46

Open · craig-parylo opened this issue on Mar 15, 2025 · 0 comments

@craig-parylo (Owner)

I noticed that `plotor::table_or()` doesn't list the levels of multi-level factor predictors in the order defined by the factor.

```r
# data with separation for pred1
# (no seed is set, so the counts will differ from run to run)
df_separated <- tibble::tibble(
  outcome = sample(0:1, size = 1000, replace = TRUE, prob = c(0.2,0.8)) |>
    factor(levels = c(0,1), labels = c('Fail', 'Success')),
  pred1 = dplyr::if_else(
    condition = outcome == 'Fail',
    true = sample(0:2, size = 1000, replace = TRUE),
    false = sample(1:3, size = 1000, replace = TRUE)
  ) |>
    factor(levels = c(0, 1, 2, 3), labels = c('red', 'green', 'brown', 'blue')) |>
    # order the levels from most to least frequent
    forcats::fct_infreq(),
  pred2 = rpois(n = 1000, lambda = 10)
)

# see the separation
table(df_separated$outcome, df_separated$pred1)

# model this
lr_separated <- stats::glm(
  data = df_separated,
  formula = outcome ~ pred1 + pred2,
  family = 'binomial'
)

# run a {plotor} function
plotor::table_or(lr_separated)
```

Output:

```
# A tibble: 5 × 14
  label level  rows outcome outcome_rate class    estimate std.error statistic p.value   conf.low conf.high significance    comparator
  <fct> <chr> <int>   <int>        <dbl> <chr>       <dbl>     <dbl>     <dbl>   <dbl>      <dbl>     <dbl> <chr>                <dbl>
1 pred1 blue    275     275        1     factor   8.67e+ 7  648.        0.0282   0.977  1.28e+ 21  3.49e119 Significant             NA
2 pred1 brown   298     235        0.789 factor   1.02e+ 0    0.193     0.114    0.909  7.01e-  1  1.49e  0 Not significant         NA
3 pred1 green   348     273        0.784 factor  NA          NA        NA       NA     NA         NA        Comparator               1
4 pred1 red      79       0        0     factor   8.63e-10 1209.       -0.0173   0.986  3.87e-204  3.03e  9 Not significant         NA
5 pred2 pred2  1000     783        0.783 integer  1.03e+ 0    0.0307    0.921    0.357  9.69e-  1  1.09e  0 Not significant         NA
```

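The rows for `pred1` appear in alphabetical order (blue, brown, green, red) rather than in order of frequency, which is the order `forcats::fct_infreq()` sets in the data definition (green, brown, blue, red for this run).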
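To make the expected order concrete, here is a small, dependency-free sketch. It builds a toy factor with the `pred1` counts from the output above (illustrative only; the real counts vary per run) and reorders its levels by decreasing count. The `sort(table(...))` line is a base-R stand-in for `forcats::fct_infreq()`, used here only so the snippet runs without extra packages.

```r
# Toy factor reproducing the pred1 counts from the output above (illustrative only)
pred1 <- factor(rep(c("red", "green", "brown", "blue"), times = c(79, 348, 298, 275)))

# Default factor levels are sorted alphabetically
levels(pred1)
# -> "blue" "brown" "green" "red"

# Base-R stand-in for forcats::fct_infreq(): levels ordered by decreasing count
pred1_infreq <- factor(pred1, levels = names(sort(table(pred1), decreasing = TRUE)))
levels(pred1_infreq)
# -> "green" "brown" "blue" "red"
```

The second order is the one `table_or()` would be expected to follow; the first matches what it currently prints.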

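One possible cause is that the `level` column is a character vector (`<chr>`) rather than a factor, so sorting on it falls back to alphabetical order and loses the factor's level order. This needs further investigation.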
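A minimal sketch of that hypothesis, assuming `table_or()` sorts its rows on `label` and `level`; this is not plotor's actual code. `or_rows` is a hypothetical stand-in for the `pred1` rows of the output, and `model_levels` is hard-coded here, although for the model above the same order should be available from `lr_separated$xlevels$pred1` (`xlevels` is a standard component of fitted `glm` objects).

```r
# Hypothetical stand-in for the pred1 rows returned by table_or();
# 'level' is a character column, matching the <chr> type in the output
or_rows <- data.frame(
  label = "pred1",
  level = c("green", "brown", "blue", "red"),
  stringsAsFactors = FALSE
)

# Sorting on a character column gives alphabetical order
by_chr <- or_rows[order(or_rows$label, or_rows$level), ]
by_chr$level
# -> "blue" "brown" "green" "red"

# Assumption: the model's level order, e.g. lr_separated$xlevels$pred1,
# is hard-coded here so the sketch runs on its own
model_levels <- c("green", "brown", "blue", "red")

# Sorting on a factor follows its level order instead of the alphabet
or_rows$level <- factor(or_rows$level, levels = model_levels)
by_fct <- or_rows[order(or_rows$label, or_rows$level), ]
as.character(by_fct$level)
# -> "green" "brown" "blue" "red"
```

If this is the cause, a fix would need to convert `level` to a factor per predictor (numeric predictors such as `pred2` have no level order to preserve) before sorting.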

craig-parylo added the bug (Something isn't working) label on Mar 15, 2025
craig-parylo added this to the v0.6.0 milestone on Mar 15, 2025