Handle predictor variables which are ordered factors #49

craig-parylo · 2025-03-15T14:51:02Z

The app currently doesn't support models where a predictor variable is an ordered factor. This issue is to look at how we can visualise these types of models.

Here is an example:

# create a dataset with two predictors, one of which `pred1` is an ordered factor
df <- tibble::tibble(
  outcome = sample(0:1, size = 1000, replace = TRUE),
  pred1 = sample(0:2, size = 1000, replace = TRUE) |>
    factor(levels = c(0, 1, 2), labels = c('zero', 'one', 'two'), ordered = TRUE),
  pred2 = rpois(n = 1000, lambda = 5)
)

# we can produce a model from this
lr <- stats::glm(
  data = df,
  formula = outcome ~ pred1 + pred2,
  family = "binomial"
)

# however, the app doesn't handle these cases well
plotor::table_or(lr)

# A tibble: 4 × 14
  label level  rows outcome outcome_rate class          estimate std.error statistic p.value conf.low conf.high significance  comparator
  <fct> <chr> <int>   <int>        <dbl> <chr>             <dbl>     <dbl>     <dbl>   <dbl>    <dbl>     <dbl> <chr>              <dbl>
1 pred1 two     320       0            0 ordered factor    NA      NA          NA     NA       NA         NA    Comparator             1
2 pred1 zero    349       0            0 ordered factor    NA      NA          NA     NA       NA         NA    Comparator             1
3 pred1 one     331       0            0 ordered factor    NA      NA          NA     NA       NA         NA    Comparator             1
4 pred2 pred2  1000       0            0 integer            1.04    0.0283      1.32   0.186    0.982      1.10 Not signific…         NA

pred1 is reported with the class "ordered factor", but it is treated as a regular factor: each level is listed on its own row, and none of those rows is matched to an estimate from the model (all NA).
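One possible way to spot these predictors up front is to read the variable classes that `glm()` records on the fitted model. The following is a minimal sketch in base R only, not how plotor currently works; the seed and the names `data_classes` and `ordered_predictors` are assumptions made for illustration.

```r
# Rebuild the example data with base R (the seed is arbitrary, for reproducibility)
set.seed(49)
df <- data.frame(
  outcome = sample(0:1, size = 1000, replace = TRUE),
  pred1 = factor(
    sample(0:2, size = 1000, replace = TRUE),
    levels = c(0, 1, 2),
    labels = c("zero", "one", "two"),
    ordered = TRUE
  ),
  pred2 = rpois(n = 1000, lambda = 5)
)

lr <- stats::glm(outcome ~ pred1 + pred2, data = df, family = "binomial")

# model.frame() stores the class of every model variable on the terms object;
# ordered factors are recorded as "ordered"
data_classes <- attr(stats::terms(lr), "dataClasses")
ordered_predictors <- names(data_classes)[data_classes == "ordered"]
print(ordered_predictors)

# The contrast scheme used for each factor is also kept on the fitted model
print(lr$contrasts)
```

A check like this could be used to route ordered factors to separate handling before the per-level rows are built.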

Looking at the data from the model:

# what comes out of the model
lr |> broom::tidy(exponentiate = TRUE)

# A tibble: 4 × 5
  term        estimate std.error statistic p.value
  <chr>          <dbl>     <dbl>     <dbl>   <dbl>
1 (Intercept)    0.803    0.154     -1.42    0.156
2 pred1.L        1.17     0.110      1.47    0.143
3 pred1.Q        0.970    0.110     -0.278   0.781
4 pred2          1.04     0.0283     1.32    0.186

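In the model, pred1 is not split into one term per level. R's default contrasts for ordered factors are orthogonal polynomial contrasts (contr.poly), so the three levels are represented by two trend terms, as noted by the .L and .Q suffixes: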

.L refers to the linear contrast, which represents the linear trend or slope of the relationship between the ordered predictor variable and the outcome variable.
.Q refers to the quadratic contrast, which represents the quadratic or curved trend of the relationship between the ordered predictor variable and the outcome variable.
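To make the two contrasts concrete, the sketch below prints the polynomial contrast matrix for a three-level ordered factor and shows how a pair of `.L` and `.Q` coefficients could be turned back into an odds ratio per level relative to the first level. The coefficient values in `beta` are made up for illustration and are not taken from the model above; `level_effect` and `or_vs_first` are hypothetical names.

```r
# Contrast matrix R uses by default for an ordered factor with three levels
cp <- stats::contr.poly(3)
rownames(cp) <- c("zero", "one", "two")
print(round(cp, 3))

# Illustrative log-odds coefficients for pred1.L and pred1.Q (assumed values)
beta <- c(.L = 0.16, .Q = -0.03)

# Contribution of pred1 to the linear predictor at each level
level_effect <- drop(cp %*% beta)

# Odds ratio of each level relative to the first level ("zero")
or_vs_first <- exp(level_effect - level_effect[["zero"]])
print(round(or_vs_first, 3))
```

Per-level odds ratios derived this way might be one option for plotting ordered factors alongside regular factors, although confidence intervals would need the coefficient covariance matrix as well.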

References:
https://stats.stackexchange.com/questions/117593/using-ordered-factor-as-predictor-in-r
https://stats.stackexchange.com/questions/381877/whether-to-use-factors-in-r-and-when-ordered-factors

craig-parylo added the enhancement New feature or request label Mar 15, 2025

craig-parylo added this to the 0.7.0 milestone Mar 16, 2025
