Handle predictor variables which are ordered factors #49

Open
craig-parylo opened this issue Mar 15, 2025 · 0 comments

The app currently doesn't support models where a predictor variable is an ordered factor. This issue is to look at how we can visualise these types of models.

Here is an example:

# create a dataset with two predictors, one of which (`pred1`) is an ordered factor
df <- tibble::tibble(
  outcome = sample(0:1, size = 1000, replace = TRUE),
  pred1 = sample(0:2, size = 1000, replace = TRUE) |>
    factor(levels = c(0, 1, 2), labels = c('zero', 'one', 'two'), ordered = TRUE),
  pred2 = rpois(n = 1000, lambda = 5)
)

# we can produce a model from this
lr <- stats::glm(
  data = df,
  formula = outcome ~ pred1 + pred2,
  family = "binomial"
)

# however, the app doesn't handle these cases well
plotor::table_or(lr)

# A tibble: 4 × 14
  label level  rows outcome outcome_rate class          estimate std.error statistic p.value conf.low conf.high significance  comparator
  <fct> <chr> <int>   <int>        <dbl> <chr>             <dbl>     <dbl>     <dbl>   <dbl>    <dbl>     <dbl> <chr>              <dbl>
1 pred1 two     320       0            0 ordered factor    NA      NA          NA     NA       NA         NA    Comparator             1
2 pred1 zero    349       0            0 ordered factor    NA      NA          NA     NA       NA         NA    Comparator             1
3 pred1 one     331       0            0 ordered factor    NA      NA          NA     NA       NA         NA    Comparator             1
4 pred2 pred2  1000       0            0 integer            1.04    0.0283      1.32   0.186    0.982      1.10 Not signific…         NA

pred1 is reported with the class ordered factor, but it is handled like a regular factor: each level is listed on its own row, and every estimate for those rows comes back as NA.
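One way the app could spot these predictors before building the table is to inspect the fitted model itself. The snippet below is only a sketch, not how plotor currently works: it rebuilds a smaller version of the example data with a base `data.frame` (an assumption made to avoid extra dependencies), and `ordered_preds` is a hypothetical name used for illustration.

```r
# Sketch: detect ordered-factor predictors in a fitted glm (illustrative only)
set.seed(1)
df <- data.frame(
  outcome = sample(0:1, size = 200, replace = TRUE),
  pred1 = factor(
    sample(0:2, size = 200, replace = TRUE),
    levels = c(0, 1, 2), labels = c("zero", "one", "two"), ordered = TRUE
  ),
  pred2 = rpois(n = 200, lambda = 5)
)
lr <- stats::glm(outcome ~ pred1 + pred2, data = df, family = "binomial")

# the model frame keeps the original column classes,
# so ordered factors can be picked out by name
ordered_preds <- names(Filter(is.ordered, stats::model.frame(lr)))
ordered_preds

# the fitted model also records which contrast scheme was used for each factor
lr$contrasts
```

With R's default options, `lr$contrasts` should name `contr.poly` for `pred1`, which is what distinguishes it from an unordered factor in the model terms.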

Looking at the data from the model:

# what comes out of the model
lr |> broom::tidy(exponentiate = TRUE)

# A tibble: 4 × 5
  term        estimate std.error statistic p.value
  <chr>          <dbl>     <dbl>     <dbl>   <dbl>
1 (Intercept)    0.803    0.154     -1.42    0.156
2 pred1.L        1.17     0.110      1.47    0.143
3 pred1.Q        0.970    0.110     -0.278   0.781
4 pred2          1.04     0.0283     1.32    0.186

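In the model, pred1 is not split into one term per level. R encodes ordered factors with orthogonal polynomial contrasts by default, so the three levels are represented by two trend terms, as noted by the .L and .Q suffixes: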

  • .L refers to the linear contrast, which represents the linear trend or slope of the relationship between the ordered predictor variable and the outcome variable.
  • .Q refers to the quadratic contrast, which represents the quadratic or curved trend of the relationship between the ordered predictor variable and the outcome variable.
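To make the two contrasts concrete, the sketch below prints the contrast matrix R uses by default for a three-level ordered factor, then combines it with the rounded .L and .Q estimates from the output above to get odds ratios for each level relative to 'zero'. This is one possible way to get back to per-level values that the app could plot, not a settled design; `contrib` and `or_vs_zero` are hypothetical names.

```r
# contrast matrix for a three-level ordered factor (columns are .L and .Q)
C <- stats::contr.poly(3)
rownames(C) <- c("zero", "one", "two")
C

# coefficients on the log-odds scale, taken from the rounded
# odds ratios reported by broom::tidy() above
beta <- log(c(.L = 1.17, .Q = 0.970))

# contribution of pred1 to the log-odds at each level
contrib <- drop(C %*% beta)

# odds ratio of each level compared with the first level ('zero')
or_vs_zero <- exp(contrib - contrib["zero"])
or_vs_zero
```

An alternative that may be simpler is to refit the model with treatment contrasts, for example by passing `contrasts = list(pred1 = "contr.treatment")` to `stats::glm()`, which gives the per-level comparisons directly but discards the trend interpretation.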

References:
https://stats.stackexchange.com/questions/117593/using-ordered-factor-as-predictor-in-r
https://stats.stackexchange.com/questions/381877/whether-to-use-factors-in-r-and-when-ordered-factors

craig-parylo added the enhancement (New feature or request) label Mar 15, 2025
craig-parylo added this to the 0.7.0 milestone Mar 16, 2025