---
title: "Ordinal Regression"
execute:
  cache: true
  error: false
engine: julia
author:
  - Jose Storopoli
  - Vijay Ivaturi
  - Patrick Mogensen
bibliography: references.bib
format:
  html:
    include-in-header:
      text: |
        <style>
          .header-container {
            width: 100%;
          }
          .logo {
            display: block;
            width: 150px;
            margin-left: -2.5%;
          }
        </style>
        <link rel = "shortcut icon" href = "https://pumas.ai/favicon.ico" />
    include-before-body:
      text: |
        <div class="header-container">
          <img src="https://pumas-assets.s3.amazonaws.com/CompanyLogos/PumasAI/RGB/PNG/Pumas+AI+Primary%404x.png" alt="Pumas Logo" class="logo">
        </div>
    self-contained-math: true
    anchor-sections: true
    theme: default
    toc: true
    toc-depth: 4
    toc-expand: 2
    toc-location: left
    toc-title: Contents
    number-sections: true
    code-summary: Show/Hide Code
fig-format: svg
fig-width: 8
fig-height: 6
license: CC BY-SA 4.0
---

**Ordinal regression** is a regression model for **discrete data**,
and more specifically,
when the **dependent variable values have a "natural order"**.

For example,
opinion polls with their ubiquitous disagree-agree range of plausible values,
or a patient pain perception.

:::{.callout-note}
There are several resources available for a deep dive into ordinal regression.
Specifically, we recommend:

 1. [Richard McElreath - Statistical Rethinking, 2nd Edition, Section 12.3](https://xcelab.net/rm/statistical-rethinking/).
 2. Agresti, A. (2014). Categorical Data Analysis, Section 8.2.
 3. Bürkner, P.-C., & Vuorre, M. (2019). Ordinal Regression Models in Psychology: A Tutorial. Advances in Methods and Practices in Psychological Science, 77–101. [10.1177/2515245918823199](https://doi.org/10.1177/2515245918823199).

:::

## Why not just use Linear Regression?

The main reason for not using plain linear regression with ordinal outcomes is that the **response categories of an ordinal variable may not be equidistant**.
In linear regression,
the distance between 2 and 3 is assumed to be the same as between 1 and 2.
This assumption can be **violated with ordinal data**.

## Ordinal Regression: The Key Idea

The core idea behind ordinal regression is to model the **cumulative probabilities** of the ordered categories,
rather than the categories directly.

For a response variable $Y$ with $k$ ordered categories,
we based the model on the **cumulative distribution function** (CDF):

$$
\gamma_j = P(Y \leq j) = \sum_{i=1}^j P(Y = i)
$$

We then apply the **logit function** (the inverse of the logistic function) to the CDF:

$$
\alpha_j = \mathrm{logit}(\gamma_j) = \ln\left(\frac{\gamma_j}{1 - \gamma_j}\right)
$$

This gives us the **log-cumulative-odds** — a real-valued scale where we can add intercepts and covariate effects linearly.
Since the highest category always has cumulative probability 1 (and thus $\mathrm{logit}(1) = \infty$),
we only need $k-1$ intercepts $\alpha_j$, known as **cut points**.

:::{.callout-tip}
**Odds** provide a measure of the **likelihood of a particular outcome**.
They are calculated as the ratio of the number of events that produce that outcome to the number that do not:
$\mathrm{odds}(p) = \frac{p}{1 - p}$.
The higher the odds, the higher the probability.
:::

The $k-1$ cut points can be **non-equidistant**,
allowing the model to capture any probability distribution of the ordinal categories.

### Visualizing with Synthetic Data

Let's see this in action with a synthetic example.

```{julia}
using Pumas
using DataFrames
using DataFramesMeta
using CairoMakie
using AlgebraOfGraphics
using SummaryTables
using CategoricalArrays
```

```{julia}
#| include: false
using Pumas: Categorical
# Register Stem aesthetic mapping so it works with AlgebraOfGraphics
AlgebraOfGraphics.aesthetic_mapping(
    ::Type{Stem},
    n1::AlgebraOfGraphics.Continuous,
    n2::AlgebraOfGraphics.Continuous,
) = AlgebraOfGraphics.aesthetic_mapping(Scatter, n1, n2)
```

Here we have a 6-category ordinal variable with probabilities 10%, 15%, 33%, 25%, 10%, and 7%.
We specify this using `OrderedLogit`,
which takes a vector of $k-1 = 5$ logit transformed cumulative probabilities:

```{julia}
#| echo: false
probs = [0.10, 0.15, 0.33, 0.25, 0.10, 0.07]
ologits = logit.(cumsum(probs[begin:(end-1)]))
dist = OrderedLogit(ologits)
x = 1:length(probs)
x_pmf = pdf.(dist, x)
x_cdf = cdf.(dist, x)
df = DataFrame(; x, x_pmf, x_cdf, x_logodds = [ologits; missing])
fig = Figure()
plt1 = data(df) * mapping(:x, :x_pmf => "probability") * visual(Stem; trunkwidth = 0)
plt_cdf =
    data(df) *
    mapping(:x, :x_cdf => "probability") *
    (visual(Stairs; step = :post) + visual(Scatter))
plt_logodds = data(df) * mapping(:x, :x_logodds => "log-odds") * visual(Scatter)

draw!(fig[1, 2:3], plt1; axis = (; xticks = 1:6, title = "PMF"))
draw!(
    fig[2, 1:2],
    plt_cdf;
    axis = (; xticks = 1:6, limits = (nothing, (0, nothing)), title = "CDF"),
)
draw!(fig[2, 3:4], plt_logodds; axis = (; xticks = 1:6, title = "Log-cumulative-odds"))
fig
```

:::{.callout-tip}
We are using the convenient `logit` function from the [`LogExpFunctions.jl`](https://juliastats.org/LogExpFunctions.jl/stable/) package,
which Pumas reexports.
It is safer and more efficient than a user-defined function.
:::

The 5 log-cumulative-odds values (lower-right plot) are clearly **not equidistant** — this is the flexibility that ordinal regression provides.

### Adding Covariates

Covariate effects are incorporated as a common regression coefficient $\beta$ subtracted from each of cut point $\alpha_j$:

$$
\phi_j(x) = \log\frac{\gamma_j(x)}{1 - \gamma_j(x)} = \alpha_j - \beta^T x
$$

where $x$ is the vector of covariates and it must hold that $\alpha_1 \leq \dots \leq \alpha_k$.

This model was coined the proportional-odds model by @cullagh1980regression, because the
odds of two different covariate vectors $x_1$ and $x_2$ are proportional, i.e.
$$
\frac{\gamma_j(x_1)/(1 - \gamma_j(x_1))}{\gamma_j(x_2)/(1 - \gamma_j(x_2))} = \exp(\beta(x_1 - x_2))
$$

When $\beta^T x \approx \alpha_j$ the odds for $P(Y \leq j)$ vs $P(Y > j)$ are even so when
$\beta^T x$ is large the higher categories ($j$ values) will be more likely and vice versa.

:::{.callout-note}
@mandema1996population parameterizes the ordinal regression model in terms of $P(Y \geq j)$ instead
of the CDF used above. In addition, the cut points are defined as a cumulative sum
of parameters all required to be negative in instead being ordered as in the parameterization used
in this tutorial. However, the two parameterizations are equivalent, i.e. they define the
same probability model for the data.
:::

## Fitting an Ordinal Model in Pumas

### The `pk_painrelief` Dataset

Let's use the `pk_painrelief` dataset from `PharmaDatasets.jl`:

```{julia}
using PharmaDatasets
```

```{julia}
pk_painrelief = dataset("pk_painrelief")
first(pk_painrelief, 5)
```

Key variables:

  - **`:Subject`**: subject ID.
  - **`:Time`**: time in hours.
  - **`:Conc`**: concentration (mg/L).
  - **`:PainRelief`**: binary indicator of pain relief.
  - **`:PainScore`** (**dependent variable**): ordinal, ranging from 0 to 3.
  - **`:RemedStatus`**: binary indicator of remedication request.
  - **`:Dose`**: placebo, 5 mg, 20 mg, or 80 mg.

### Exploratory Data Analysis

Our dependent variable `:PainScore` takes values from 0 to 3.
Since ordinal distributions in Pumas expect values starting from 1,
we shift by adding 1:

```{julia}
pk_painrelief.PainScore |> unique |> sort
```

```{julia}
@rtransform! pk_painrelief :PainScore = :PainScore + 1
pk_painrelief.PainScore |> unique |> sort
```

To display the dose levels in the right order we first convert the column to a CategoricalVector

```{julia}
#| output: false
@transform! pk_painrelief :Dose =
    categorical(:Dose; levels = ["Placebo", "5 mg", "20 mg", "80 mg"])
```

Summary statistics stratified by dose:

```{julia}
summarytable(
    pk_painrelief,
    "PainScore";
    summary = [minimum, maximum, mean, median, std],
    cols = "Dose",
)
```

The distribution of the pain scores for each dose group

```{julia}
data(pk_painrelief) *
mapping(:Dose, stack = :PainScore => nonnumeric, color = :PainScore => nonnumeric) *
frequency() |> draw
```

Higher doses are associated with lower perceived pain scores.

### Data Wrangling

We convert `:Dose` to a binary treatment indicator:

```{julia}
#| output: false
@rtransform! pk_painrelief :isRx = Int(:Dose != "Placebo");
```

### Model Definition

In Pumas, the entire ordinal regression parameterization is captured by the `OrderedLogit` error model which takes vector of logits `ϕ = [ϕ₁,ϕ₂,ϕ₃]` as inputs `OrderedLogit(ϕ)`. The user supplies the vector of $k - 1$ log-cumulative-odds and `OrderedLogit` handles the logistic transform and probability computation internally.

Here is our ordinal regression model using `OrderedLogit`:

```{julia}
orderedlogit_model = @model begin
    @param begin
        # intercepts (k - 1 = 3 cut points)
        α ∈ VectorDomain(; init = [0.001, 0.002, 0.003])
        # treatment effect coefficient
        βRx ∈ RealDomain()
    end

    @covariates begin
        isRx
    end

    @pre begin
        # log-cumulative-odds vector
        lle = α .- isRx * βRx
    end

    @derived begin
        PainScore ~ @. OrderedLogit(lle)
    end
end
```

The model has four components:

  - **`@param`**: a vector of intercepts `α` (the $k - 1 = 3$ cut points for our 4-category outcome) and one treatment coefficient `βRx`.
    The intercept initial values must be **ordered** (here 0.001 < 0.002 < 0.003) to help convergence.
  - **`@covariates`**: the binary treatment indicator `isRx`.
  - **`@pre`**: constructs the log-cumulative-odds vector `lle` by adding the treatment effect to each intercept.
  - **`@derived`**: specifies `PainScore` as following an `OrderedLogit` distribution parameterized by `lle`.

:::{.callout-tip}
`OrderedLogit` automatically ensures that category probabilities are valid (non-negative and sum to 1).
See [Under the Hood](#under-the-hood-the-manual-categorical-approach) for the manual alternative using `Categorical`,
which requires you to compute probabilities yourself and guard against negative values.
:::

### Choosing Initial Values

Good initial values for the intercepts are critical for ordinal model convergence.
Since the intercepts represent the log-cumulative-odds at the reference condition (all covariates = 0),
we can derive them directly from the marginal distribution of the outcome.

**Step 1**: Calculate the marginal distribution of `PainScore`:

```{julia}
marginal = @chain pk_painrelief begin
    groupby(:PainScore)
    @combine :n = length(:PainScore)
    @rtransform :prop = :n / nrow(pk_painrelief)
end
simple_table(marginal)
```

**Step 2**: Calculate cumulative probabilities $P(Y \leq j)$ for $j = 1, \ldots, k - 1$:

```{julia}
cum_probs = cumsum(marginal.prop)[1:end-1]
```

**Step 3**: Convert to the logit scale — these are our initial intercept values:

```{julia}
init_intercepts = logit.(cum_probs)
```

:::{.callout-tip}
The resulting intercepts are properly ordered (increasing),
which ensures valid probabilities at the start of optimization.
For covariate effects like `βRx`,
starting near 0 is a safe default.
:::

### Creating a Population

```{julia}
pk_painrelief_pop = read_pumas(
    pk_painrelief;
    observations = [:PainScore],
    covariates = [:isRx],
    id = :Subject,
    time = :Time,
    event_data = false,
)
```

### Model Fitting

We use `NaivePooled()` since this model has no random effects:

```{julia}
pk_painrelief_fit = fit(
    orderedlogit_model,
    pk_painrelief_pop,
    (; init_params(orderedlogit_model)..., α = init_intercepts),
    NaivePooled(),
)
```

### Interpreting the Coefficients

```{julia}
coeftable(pk_painrelief_fit) |> simple_table
```

Interpreting the exponentiated coefficients is sometimes easier:

```{julia}
(@rtransform coeftable(pk_painrelief_fit) :estimate_exp = exp(:estimate)) |> simple_table
```

The intercepts `αⱼ` represent the baseline odds of $P(Y \leq j)$ against $P(Y > j)$.
For example,
`α₁` has exponentiated value approximately 0.05 — the odds of the lowest pain score category relative to any of the higher pain scores when untreated.

The negative of the treatment coefficient `βRx` has an exponentiated value of approximately 3.44,
meaning treatment increases the odds of higher pain relief by about 2.44x compared to placebo.

Standard errors and 95% confidence intervals:

```{julia}
infer(pk_painrelief_fit)
```

Other confidence levels (e.g., 90%) can be specified with `level`:

```{julia}
infer(pk_painrelief_fit; level = 0.90)
```

### Visualizing Predicted Probabilities

To see what the fitted model actually predicts,
we can compute category probabilities for each treatment group.
The `probstable()` function provides predicted category probabilities in a `DataFrame`.
The columns `:PainScore_iprob1` through `:PainScore_iprob4` give the probability of each outcome:

```{julia}
eachgroup_df = @combine groupby(pk_painrelief, "isRx") begin
    :id = string(first(:Subject)) # Since output DataFrames have an "id" column of strings
    :time = first(:Time)
end
pred_df = select(
    leftjoin(eachgroup_df, probstable(pk_painrelief_fit); on = ["id", "time"]),
    Not("id", "time"),
)
simple_table(pred_df)
```

```{julia}
data(stack(pred_df)) *
mapping(
    :isRx => (i -> ["Placebo", "Treatment"][i+1]) => "Group",
    :value => "Predicted Probability",
    stack = :variable => "Pain Score Category",
    color = :variable,
) *
visual(BarPlot) |> draw
```

Treatment shifts the probability mass toward lower pain scores (higher pain relief),
consistent with the positive `βRx` coefficient.

We can also visualize the cumulative probabilities $P(Y \leq k)$,
which are the quantities directly modeled by ordinal regression:

```{julia}
pred_cumulative_df = combine(
    groupby(stack(pred_df), "isRx"),
    "value" => cumsum => "Predicted Cumulative Probability",
    "variable",
)
data(pred_cumulative_df) *
mapping(
    :variable => "P(Y ≤ k)",
    "Predicted Cumulative Probability",
    dodge = :isRx => nonnumeric => "Group",
    color = :isRx => (i -> ["Placebo", "Treatment"][i+1]),
) *
visual(BarPlot) |> draw
```

The proportional odds assumption means the treatment effect shifts all cumulative probabilities uniformly on the log-odds scale.

Similarly, $P(Y \geq k)$ — the probability of being at or above each category:

```{julia}
pred_cumulative_df = combine(
    groupby(stack(pred_df), "isRx"),
    "value" => reverse ∘ cumsum ∘ reverse => "Predicted Cumulative Probability",
    "variable",
)
data(pred_cumulative_df) *
mapping(
    :variable => "P(Y ≥ k)",
    "Predicted Cumulative Probability",
    dodge = :isRx => nonnumeric => "Group",
    color = :isRx => (i -> ["Placebo", "Treatment"][i+1]),
) *
visual(BarPlot) |> draw
```

### Simulations

Random observations can be generated with `simobs()`:

```{julia}
#| output: false
simobs(orderedlogit_model, pk_painrelief_pop, coef(pk_painrelief_fit));
```

### Model Diagnostics

```{julia}
DataFrame(inspect(pk_painrelief_fit))
```

AIC:

```{julia}
aic(pk_painrelief_fit)
```

BIC:

```{julia}
bic(pk_painrelief_fit)
```

## Under the Hood: The Manual `Categorical` Approach

`OrderedLogit` encapsulates several transformation steps that you can also perform manually using the `Categorical` distribution.
Understanding these steps provides insight into what the model is actually doing.

### From Log-Cumulative-Odds to Probabilities

Given the $k - 1$ log-cumulative-odds $\phi\_1,\dots,\phi_{k-1}$,
the manual approach requires three steps:

**1. Apply the logistic function** to convert log-cumulative-odds to cumulative probabilities (cut points):

$$
\gamma_j = \mathrm{logistic}(\phi_j) = \frac{1}{1 + e^{-\phi_j}}
$$

**2. Subtract consecutive cut points** to recover individual category probabilities:

$$
\begin{aligned}
p_1 &= \gamma_1 \\
&\cdots \\
p_j &= \gamma_k - \gamma_{k-1} \quad \text{for } 1 < j < k \\
p_k &= 1 - \gamma_{k - 1}
\end{aligned}
$$

**3. Pass the probabilities** to a `Categorical` distribution.

We can see the equivalence with a simple example.
First, using `Categorical` with explicit probabilities:

```{julia}
p = [0.2, 0.3, 0.1, 0.4]
x = 1:length(p)
d = Categorical(p)
pdf_values = pdf.(d, x)
data((; x, pdf_values)) *
mapping(:x => "outcome", :pdf_values => "PMF") *
visual(BarPlot) |> draw
```

The same distribution via `OrderedLogit` — note we supply only $k - 1 = 3$ log-cumulative-odds:

```{julia}
lle = logit.(cumsum([0.2, 0.3, 0.1]))
x = 1:(length(lle)+1)
d = OrderedLogit(lle)
pdf_values = pdf.(d, x)
data((; x, pdf_values)) *
mapping(:x => "outcome", :pdf_values => "PMF") *
visual(BarPlot) |> draw
```

### The Manual Model

In the `@pre` block,
we manually compute each linear predictor,
apply the logistic function,
and calculate category probabilities by subtraction:

```julia
@pre begin
    ϕ = α .- isRx * βRx

    # cut points for >= 1, >=2 and >=3
    γ = logistic.(ϕ)

    # probabilities for PainScore =1, =2, =3 and =4
    p₄ = 1.0 - γ[3]
    p₃ = γ[3] - γ[2]
    p₂ = γ[2] - γ[1]
    p₁ = γ[1]
end
```

Compare this to the `OrderedLogit` version which collapses everything into a single line:

```julia
@pre begin
    lle = α - isRx * βRx
end
```

The `@derived` block uses `Categorical` with the explicit probability vector:

```julia
PainScore ~ @. Categorical(p₁, p₂, p₃, p₄)
```

versus:

```julia
PainScore ~ @. OrderedLogit(lle)
```

Here is the full manual model:

```{julia}
ordinal_model = @model begin
    @param begin
        α ∈ VectorDomain(; init = [0.001, 0.002, 0.003])
        βRx ∈ RealDomain()
    end

    @covariates begin
        isRx
    end

    @pre begin
        ϕ = α .- isRx * βRx

        # cut points for >= 1, >=2 and >=3
        γ = logistic.(ϕ)

        # probabilities for PainScore =1, =2, =3 and =4
        p₄ = 1.0 - γ[3]
        p₃ = γ[3] - γ[2]
        p₂ = γ[2] - γ[1]
        p₁ = γ[1]
    end

    @derived begin
        PainScore ~ @. Categorical(p₁, p₂, p₃, p₄)
    end
end
```

:::{.callout-caution}
With the manual `Categorical` approach,
you must ensure that all `pⱼ` values are non-negative.
If the cut points are not properly ordered,
subtraction can produce negative probabilities.
`OrderedLogit` handles this constraint automatically.
:::

### Comparing Results

Fitting the manual model produces identical estimates:

```{julia}
pk_painrelief_categorical_fit = fit(
    ordinal_model,
    pk_painrelief_pop,
    (; init_params(ordinal_model)..., α = init_intercepts),
    NaivePooled(),
)
```

:::{.callout-note}
Both models produce identical parameter estimates,
since the underlying statistical model is the same.
`OrderedLogit` simply provides a more concise and safer parameterization.
:::

## Conclusion

In this tutorial we covered ordinal regression — a model for discrete outcomes with a natural order.
Whenever your dependent variable is ordinal,
there's a great chance that it violates the equidistant assumption.
This is where ordinal regression comes to the rescue.

Pumas provides `OrderedLogit` as a streamlined way to specify ordinal regression models.
Instead of manually computing cut points, logistic transforms, and category probabilities,
you simply supply a vector of log-cumulative-odds and `OrderedLogit` handles the rest.
We recommend using `OrderedLogit` for all new ordinal regression models.
