using Pumas
using DataFrames
using DataFramesMeta
using CairoMakie
using AlgebraOfGraphics
using SummaryTables
using CategoricalArrays
Ordinal Regression
Ordinal regression is a regression model for discrete data, used when the dependent variable's values have a "natural order".
For example, opinion polls with their ubiquitous disagree-agree response scales, or a patient's pain perception.
There are several resources available for a deep dive into ordinal regression. Specifically, we recommend:
- Richard McElreath - Statistical Rethinking, 2nd Edition, Section 12.3.
- Agresti, A. (2014). Categorical Data Analysis, Section 8.2.
- Bürkner, P.-C., & Vuorre, M. (2019). Ordinal Regression Models in Psychology: A Tutorial. Advances in Methods and Practices in Psychological Science, 77–101. 10.1177/2515245918823199.
1 Why not just use Linear Regression?
The main reason for not using plain linear regression with ordinal outcomes is that the response categories of an ordinal variable may not be equidistant. In linear regression, the distance between 2 and 3 is assumed to be the same as between 1 and 2. This assumption can be violated with ordinal data.
2 Ordinal Regression: The Key Idea
The core idea behind ordinal regression is to model the cumulative probabilities of the ordered categories, rather than the categories directly.
For a response variable \(Y\) with \(k\) ordered categories, we base the model on the cumulative distribution function (CDF):
\[ \gamma_j = P(Y \leq j) = \sum_{i=1}^j P(Y = i) \]
We then apply the logit function (the inverse of the logistic function) to the CDF:
\[ \alpha_j = \mathrm{logit}(\gamma_j) = \ln\left(\frac{\gamma_j}{1 - \gamma_j}\right) \]
This gives us the log-cumulative-odds — a real-valued scale where we can add intercepts and covariate effects linearly. Since the highest category always has cumulative probability 1 (and thus \(\mathrm{logit}(1) = \infty\)), we only need \(k-1\) intercepts \(\alpha_j\), known as cut points.
Odds provide a measure of the likelihood of a particular outcome. They are calculated as the ratio of the number of events that produce that outcome to the number that do not: \(\mathrm{odds}(p) = \frac{p}{1 - p}\). The higher the odds, the higher the probability.
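To make the definition concrete, here is a quick numeric check. It is a plain-Python sketch of the arithmetic only (the tutorial's own code is Julia):

```python
def odds(p):
    # odds(p) = p / (1 - p): ratio of event probability to non-event probability
    return p / (1 - p)

# Even chances give odds of 1; higher probability gives higher odds.
print(odds(0.5))   # 1.0
print(odds(0.75))  # 3.0
```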
The \(k-1\) cut points can be non-equidistant, allowing the model to capture any probability distribution of the ordinal categories.
2.1 Visualizing with Synthetic Data
Let’s see this in action with a synthetic example.
Here we have a 6-category ordinal variable with probabilities 10%, 15%, 33%, 25%, 10%, and 7%. We specify this using OrderedLogit, which takes a vector of the \(k-1 = 5\) logit-transformed cumulative probabilities.
We are using the convenient logit function from the LogExpFunctions.jl package, which Pumas reexports. It is safer and more efficient than a user-defined function.
The 5 log-cumulative-odds values (lower-right plot) are clearly not equidistant — this is the flexibility that ordinal regression provides.
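The same computation can be checked outside of Pumas. Using the probabilities above, the 5 log-cumulative-odds and their unequal spacing fall out of a few lines of plain Python (the hand-rolled logit here is for illustration only; Pumas reexports a numerically careful logit from LogExpFunctions.jl):

```python
import math

def logit(p):
    # log-odds; a naive counterpart of LogExpFunctions.jl's logit
    return math.log(p / (1 - p))

probs = [0.10, 0.15, 0.33, 0.25, 0.10, 0.07]  # 6 ordered categories

# cumulative probabilities P(Y <= j) for j = 1, ..., k-1 (the last is always 1)
cum = [sum(probs[: j + 1]) for j in range(len(probs) - 1)]
print([round(c, 2) for c in cum])  # [0.1, 0.25, 0.58, 0.83, 0.93]

# the k - 1 = 5 log-cumulative-odds (cut points)
alphas = [logit(c) for c in cum]

# the gaps between consecutive cut points are clearly unequal
gaps = [b - a for a, b in zip(alphas, alphas[1:])]
```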
2.2 Adding Covariates
Covariate effects are incorporated as a common regression coefficient vector \(\beta\), whose contribution \(\beta^T x\) is subtracted from each cut point \(\alpha_j\):
\[ \phi_j(x) = \log\frac{\gamma_j(x)}{1 - \gamma_j(x)} = \alpha_j - \beta^T x \]
where \(x\) is the vector of covariates and it must hold that \(\alpha_1 \leq \dots \leq \alpha_{k-1}\).
This model was coined the proportional-odds model by McCullagh (1980), because the odds of two different covariate vectors \(x_1\) and \(x_2\) are proportional, i.e. \[ \frac{\gamma_j(x_1)/(1 - \gamma_j(x_1))}{\gamma_j(x_2)/(1 - \gamma_j(x_2))} = \exp(-\beta^T(x_1 - x_2)) \]
When \(\beta^T x \approx \alpha_j\), the odds of \(P(Y \leq j)\) vs \(P(Y > j)\) are even. When \(\beta^T x\) is large, the higher categories (larger \(j\) values) become more likely, and vice versa.
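The proportional-odds property is easy to verify numerically. The sketch below (plain Python, with hypothetical values for α and β) confirms that the cumulative-odds ratio between two covariate values is \(\exp(-\beta(x_1 - x_2))\) for every category \(j\):

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

alpha = [-2.0, 0.0, 1.9]  # hypothetical ordered cut points (k = 4 categories)
beta = 1.2                # hypothetical scalar covariate coefficient

def cumulative_odds(x, j):
    gamma = logistic(alpha[j] - beta * x)  # P(Y <= j+1 | x)
    return gamma / (1 - gamma)

x1, x2 = 0.5, 1.5
ratios = [cumulative_odds(x1, j) / cumulative_odds(x2, j) for j in range(3)]
# all three ratios equal exp(-beta * (x1 - x2)), independent of j
```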
Mandema and Stanski (1996) parameterize the ordinal regression model in terms of \(P(Y \geq j)\) instead of the CDF used above. In addition, the cut points are defined as a cumulative sum of parameters required to be negative, instead of being ordered as in the parameterization used in this tutorial. However, the two parameterizations are equivalent, i.e. they define the same probability model for the data.
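The equivalence can be sketched numerically. In the plain-Python check below (hypothetical α values), the category probabilities obtained by differencing the CDF \(P(Y \leq j)\) match those obtained on the complementary \(P(Y \geq j)\) scale, where the logits \(-\alpha_j\) arise as a cumulative sum of increments that are non-positive after the first whenever the \(\alpha_j\) are ordered:

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

alpha = [-2.0, 0.0, 1.9]  # hypothetical ordered cut points, logit P(Y <= j)

# parameterization 1: difference the CDF
gamma = [logistic(a) for a in alpha]                  # P(Y <= j)
p_cdf = [gamma[0], gamma[1] - gamma[0], gamma[2] - gamma[1], 1 - gamma[2]]

# parameterization 2: work on the P(Y >= j) scale
# logit P(Y >= j+1) = -alpha_j, since P(Y >= j+1) = 1 - P(Y <= j)
increments = [-alpha[0], alpha[0] - alpha[1], alpha[1] - alpha[2]]
cum = [sum(increments[: j + 1]) for j in range(3)]    # equals -alpha_j
surv = [1.0] + [logistic(c) for c in cum]             # P(Y >= 1), ..., P(Y >= 4)
p_surv = [surv[0] - surv[1], surv[1] - surv[2], surv[2] - surv[3], surv[3]]
```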
3 Fitting an Ordinal Model in Pumas
3.1 The pk_painrelief Dataset
Let’s use the pk_painrelief dataset from PharmaDatasets.jl:
using PharmaDatasets

pk_painrelief = dataset("pk_painrelief")
first(pk_painrelief, 5)

| Row | Subject | Time | Conc | PainRelief | PainScore | RemedStatus | Dose |
|---|---|---|---|---|---|---|---|
| | Int64 | Float64 | Float64 | Int64 | Int64 | Int64 | String7 |
| 1 | 1 | 0.0 | 0.0 | 0 | 3 | 1 | 20 mg |
| 2 | 1 | 0.5 | 1.15578 | 1 | 1 | 0 | 20 mg |
| 3 | 1 | 1.0 | 1.37211 | 1 | 0 | 0 | 20 mg |
| 4 | 1 | 1.5 | 1.30058 | 1 | 0 | 0 | 20 mg |
| 5 | 1 | 2.0 | 1.19195 | 1 | 1 | 0 | 20 mg |
Key variables:
- :Subject: subject ID.
- :Time: time in hours.
- :Conc: concentration (mg/L).
- :PainRelief: binary indicator of pain relief.
- :PainScore (dependent variable): ordinal, ranging from 0 to 3.
- :RemedStatus: binary indicator of remedication request.
- :Dose: placebo, 5 mg, 20 mg, or 80 mg.
3.2 Exploratory Data Analysis
Our dependent variable :PainScore takes values from 0 to 3. Since ordinal distributions in Pumas expect values starting from 1, we shift by adding 1:
pk_painrelief.PainScore |> unique |> sort

4-element Vector{Int64}:
0
1
2
3
@rtransform! pk_painrelief :PainScore = :PainScore + 1
pk_painrelief.PainScore |> unique |> sort

4-element Vector{Int64}:
1
2
3
4
To display the dose levels in the right order, we first convert the column to a CategoricalVector:

@transform! pk_painrelief :Dose =
    categorical(:Dose; levels = ["Placebo", "5 mg", "20 mg", "80 mg"])

Summary statistics stratified by dose:
summarytable(
pk_painrelief,
"PainScore";
summary = [minimum, maximum, mean, median, std],
cols = "Dose",
)

| PainScore | Placebo | 5 mg | 20 mg | 80 mg |
|---|---|---|---|---|
| minimum | 1 | 1 | 1 | 1 |
| maximum | 4 | 4 | 4 | 4 |
| mean | 2.93 | 2.59 | 2.33 | 2.16 |
| median | 3 | 3 | 2 | 2 |
| std | 0.772 | 0.855 | 0.804 | 0.823 |
The distribution of the pain scores for each dose group:
data(pk_painrelief) *
mapping(:Dose, stack = :PainScore => nonnumeric, color = :PainScore => nonnumeric) *
frequency() |> draw

Higher doses are associated with lower perceived pain scores.
3.3 Data Wrangling
We convert :Dose to a binary treatment indicator:
@rtransform! pk_painrelief :isRx = Int(:Dose != "Placebo");

3.4 Model Definition
In Pumas, the entire ordinal regression parameterization is captured by the OrderedLogit error model, which takes a vector of logits ϕ = [ϕ₁, ϕ₂, ϕ₃] as input: OrderedLogit(ϕ). The user supplies the vector of \(k - 1\) log-cumulative-odds, and OrderedLogit handles the logistic transform and probability computation internally.
Here is our ordinal regression model using OrderedLogit:
orderedlogit_model = @model begin
@param begin
# intercepts (k - 1 = 3 cut points)
α ∈ VectorDomain(; init = [0.001, 0.002, 0.003])
# treatment effect coefficient
βRx ∈ RealDomain()
end
@covariates begin
isRx
end
@pre begin
# log-cumulative-odds vector
lle = α .- isRx * βRx
end
@derived begin
PainScore ~ @. OrderedLogit(lle)
end
end

PumasModel
Parameters: α, βRx
Random effects:
Covariates: isRx
Dynamical system variables:
Dynamical system type: No dynamical model
Derived: PainScore
Observed: PainScore
The model has four components:
- @param: a vector of intercepts α (the \(k - 1 = 3\) cut points for our 4-category outcome) and one treatment coefficient βRx. The intercept initial values must be ordered (here 0.001 < 0.002 < 0.003) to help convergence.
- @covariates: the binary treatment indicator isRx.
- @pre: constructs the log-cumulative-odds vector lle by subtracting the treatment effect from each intercept.
- @derived: specifies PainScore as following an OrderedLogit distribution parameterized by lle.
OrderedLogit automatically ensures that category probabilities are valid (non-negative and sum to 1). See Under the Hood for the manual alternative using Categorical, which requires you to compute probabilities yourself and guard against negative values.
3.5 Choosing Initial Values
Good initial values for the intercepts are critical for ordinal model convergence. Since the intercepts represent the log-cumulative-odds at the reference condition (all covariates = 0), we can derive them directly from the marginal distribution of the outcome.
Step 1: Calculate the marginal distribution of PainScore:
marginal = @chain pk_painrelief begin
groupby(:PainScore)
@combine :n = length(:PainScore)
@rtransform :prop = :n / nrow(pk_painrelief)
end
simple_table(marginal)

| PainScore | n | prop |
|---|---|---|
| 1 | 228 | 0.119 |
| 2 | 748 | 0.39 |
| 3 | 696 | 0.362 |
| 4 | 248 | 0.129 |
Step 2: Calculate cumulative probabilities \(P(Y \leq j)\) for \(j = 1, \ldots, k - 1\):
cum_probs = cumsum(marginal.prop)[1:end-1]

3-element Vector{Float64}:
0.11875
0.5083333333333333
0.8708333333333333
Step 3: Convert to the logit scale — these are our initial intercept values:
init_intercepts = logit.(cum_probs)

3-element Vector{Float64}:
-2.0043209112117277
0.03333642026759171
1.9083470474796649
The resulting intercepts are properly ordered (increasing), which ensures valid probabilities at the start of optimization. For covariate effects like βRx, starting near 0 is a safe default.
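The three steps can be replicated with nothing but the category counts from the marginal table. Here is a plain-Python sketch of the same arithmetic the Julia code performs:

```python
import math

counts = [228, 748, 696, 248]  # PainScore = 1, 2, 3, 4 (from the marginal table)
total = sum(counts)            # 1920 observations

# Step 1: marginal proportions
props = [c / total for c in counts]

# Step 2: cumulative probabilities P(Y <= j), dropping the last (always 1)
cum_probs = [sum(props[: j + 1]) for j in range(len(props) - 1)]

# Step 3: logit transform gives the initial intercepts
init_intercepts = [math.log(p / (1 - p)) for p in cum_probs]
# approximately [-2.0043, 0.0333, 1.9083], matching the Julia output above
```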
3.6 Creating a Population
pk_painrelief_pop = read_pumas(
pk_painrelief;
observations = [:PainScore],
covariates = [:isRx],
id = :Subject,
time = :Time,
event_data = false,
)

Population
Subjects: 160
Covariates: isRx
Observations: PainScore
3.7 Model Fitting
We use NaivePooled() since this model has no random effects:
pk_painrelief_fit = fit(
orderedlogit_model,
pk_painrelief_pop,
(; init_params(orderedlogit_model)..., α = init_intercepts),
NaivePooled(),
)

[ Info: Checking the initial parameter values.
[ Info: The initial negative log likelihood and its gradient are finite. Check passed.
Iter     Function value   Gradient norm
0        2.404753e+03     1.283979e+02    * time: 0.03441596031188965
1        2.385524e+03     8.142892e+01    * time: 2.205214023590088
2        2.379769e+03     1.388532e+02    * time: 2.206387996673584
3        2.328261e+03     2.619273e+01    * time: 2.2074289321899414
4        2.327683e+03     2.703575e+01    * time: 2.2083399295806885
5        2.326153e+03     1.265745e+00    * time: 2.209211826324463
6        2.326148e+03     8.512436e-02    * time: 2.210052013397217
7        2.326148e+03     8.930993e-03    * time: 2.210886001586914
8        2.326148e+03     9.869284e-04    * time: 2.211709976196289
FittedPumasModel
Dynamical system type: No dynamical model
Number of subjects: 160
Observation records: Active Missing
PainScore: 1920 0
Total: 1920 0
Number of parameters: Constant Optimized
0 4
Likelihood approximation: NaivePooled
Likelihood optimizer: BFGS
Termination Reason: GradientNorm
Log-likelihood value: -2326.148
---------------
Estimate
---------------
α₁ -3.0216
α₂ -0.89654
α₃ 1.0973
βRx -1.2369
---------------
3.8 Interpreting the Coefficients
coeftable(pk_painrelief_fit) |> simple_table

| parameter | constant | estimate |
|---|---|---|
| α₁ | false | -3.02 |
| α₂ | false | -0.897 |
| α₃ | false | 1.1 |
| βRx | false | -1.24 |
Interpreting the exponentiated coefficients is sometimes easier:
(@rtransform coeftable(pk_painrelief_fit) :estimate_exp = exp(:estimate)) |> simple_table

| parameter | constant | estimate | estimate_exp |
|---|---|---|---|
| α₁ | false | -3.02 | 0.0487 |
| α₂ | false | -0.897 | 0.408 |
| α₃ | false | 1.1 | 3 |
| βRx | false | -1.24 | 0.29 |
The intercepts αⱼ represent the baseline log-odds of \(P(Y \leq j)\) against \(P(Y > j)\). For example, α₁ has an exponentiated value of approximately 0.05: the odds of the lowest pain score category relative to any of the higher pain scores when untreated.
Since βRx enters the linear predictor with a minus sign, the relevant exponentiated quantity is exp(-βRx) = exp(1.2369) ≈ 3.44: treatment multiplies the odds of being at or below any given pain score (i.e., of more pain relief) by about 3.44 compared to placebo.
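The arithmetic behind these statements can be checked directly (plain Python, using the estimates reported above):

```python
import math

# fitted estimates from the coefficient table above
alpha1, beta_rx = -3.0216, -1.2369

# exponentiated intercept: baseline odds of PainScore = 1 vs anything higher
print(round(math.exp(alpha1), 4))    # 0.0487

# treatment multiplies the cumulative odds by exp(-beta_rx), since the
# linear predictor is alpha_j - beta_rx * isRx
print(round(math.exp(-beta_rx), 2))  # 3.44
```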
Standard errors and 95% confidence intervals:
infer(pk_painrelief_fit)

[ Info: Calculating: variance-covariance matrix.
[ Info: Done.
Asymptotic inference results using sandwich estimator
Dynamical system type: No dynamical model
Number of subjects: 160
Observation records: Active Missing
PainScore: 1920 0
Total: 1920 0
Number of parameters: Constant Optimized
0 4
Likelihood approximation: NaivePooled
Likelihood optimizer: BFGS
Termination Reason: GradientNorm
Log-likelihood value: -2326.148
-------------------------------------------------
Estimate SE 95.0% C.I.
-------------------------------------------------
α₁ -3.0216 0.20597 [ -3.4253 ; -2.6179 ]
α₂ -0.89654 0.19315 [ -1.2751 ; -0.51798]
α₃ 1.0973 0.19705 [ 0.71104; 1.4835 ]
βRx -1.2369 0.21954 [ -1.6672 ; -0.80659]
-------------------------------------------------
Other confidence levels (e.g., 90%) can be specified with level:
infer(pk_painrelief_fit; level = 0.90)

[ Info: Calculating: variance-covariance matrix.
[ Info: Done.
Asymptotic inference results using sandwich estimator
Dynamical system type: No dynamical model
Number of subjects: 160
Observation records: Active Missing
PainScore: 1920 0
Total: 1920 0
Number of parameters: Constant Optimized
0 4
Likelihood approximation: NaivePooled
Likelihood optimizer: BFGS
Termination Reason: GradientNorm
Log-likelihood value: -2326.148
-------------------------------------------------
Estimate SE 90.0% C.I.
-------------------------------------------------
α₁ -3.0216 0.20597 [ -3.3604 ; -2.6828 ]
α₂ -0.89654 0.19315 [ -1.2142 ; -0.57884]
α₃ 1.0973 0.19705 [ 0.77314; 1.4214 ]
βRx -1.2369 0.21954 [ -1.598 ; -0.87577]
-------------------------------------------------
3.9 Visualizing Predicted Probabilities
To see what the fitted model actually predicts, we can compute category probabilities for each treatment group. The probstable() function provides predicted category probabilities in a DataFrame. The columns :PainScore_iprob1 through :PainScore_iprob4 give the probability of each outcome:
eachgroup_df = @combine groupby(pk_painrelief, "isRx") begin
:id = string(first(:Subject)) # Since output DataFrames have an "id" column of strings
:time = first(:Time)
end
pred_df = select(
leftjoin(eachgroup_df, probstable(pk_painrelief_fit); on = ["id", "time"]),
Not("id", "time"),
)
simple_table(pred_df)

| isRx | PainScore_iprob1 | PainScore_iprob2 | PainScore_iprob3 | PainScore_iprob4 |
|---|---|---|---|---|
| 1 | 0.144 | 0.441 | 0.327 | 0.0883 |
| 0 | 0.0465 | 0.243 | 0.46 | 0.25 |
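These probabilities follow directly from the fitted estimates. Here is a plain-Python sketch of the computation OrderedLogit performs internally:

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

alpha = [-3.0216, -0.89654, 1.0973]  # fitted cut points
beta_rx = -1.2369                    # fitted treatment coefficient

def category_probs(is_rx):
    phi = [a - beta_rx * is_rx for a in alpha]  # log-cumulative-odds
    gamma = [logistic(p) for p in phi]          # P(Y <= j)
    return [gamma[0], gamma[1] - gamma[0], gamma[2] - gamma[1], 1 - gamma[2]]

print([round(p, 3) for p in category_probs(1)])  # [0.144, 0.441, 0.327, 0.088]
print([round(p, 3) for p in category_probs(0)])  # [0.046, 0.243, 0.46, 0.25]
```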
data(stack(pred_df)) *
mapping(
:isRx => (i -> ["Placebo", "Treatment"][i+1]) => "Group",
:value => "Predicted Probability",
stack = :variable => "Pain Score Category",
color = :variable,
) *
visual(BarPlot) |> draw

Treatment shifts the probability mass toward lower pain scores (higher pain relief), consistent with the negative βRx estimate: because βRx is subtracted in the linear predictor, a negative value raises the log-cumulative-odds under treatment.
We can also visualize the cumulative probabilities \(P(Y \leq k)\), which are the quantities directly modeled by ordinal regression:
pred_cumulative_df = combine(
groupby(stack(pred_df), "isRx"),
"value" => cumsum => "Predicted Cumulative Probability",
"variable",
)
data(pred_cumulative_df) *
mapping(
:variable => "P(Y ≤ k)",
"Predicted Cumulative Probability",
dodge = :isRx => nonnumeric => "Group",
color = :isRx => (i -> ["Placebo", "Treatment"][i+1]),
) *
visual(BarPlot) |> draw

The proportional odds assumption means the treatment effect shifts all cumulative probabilities uniformly on the log-odds scale.
Similarly, \(P(Y \geq k)\) — the probability of being at or above each category:
pred_cumulative_df = combine(
groupby(stack(pred_df), "isRx"),
"value" => reverse ∘ cumsum ∘ reverse => "Predicted Cumulative Probability",
"variable",
)
data(pred_cumulative_df) *
mapping(
:variable => "P(Y ≥ k)",
"Predicted Cumulative Probability",
dodge = :isRx => nonnumeric => "Group",
color = :isRx => (i -> ["Placebo", "Treatment"][i+1]),
) *
visual(BarPlot) |> draw

3.10 Simulations
Random observations can be generated with simobs():
simobs(orderedlogit_model, pk_painrelief_pop, coef(pk_painrelief_fit));

3.11 Model Diagnostics
DataFrame(inspect(pk_painrelief_fit))

[ Info: Calculating predictions.
[ Info: Calculating empirical bayes.
[ Info: Evaluating dose control parameters.
[ Info: Evaluating individual parameters.
[ Info: Done.
| Row | id | time | evid | PainScore | isRx | PainScore_pred | PainScore_ipred | lle |
|---|---|---|---|---|---|---|---|---|
| | String | Float64 | Int8 | Float64? | Int64? | Missing | Missing | Array…? |
| 1 | 1 | 0.0 | 0 | 4.0 | 1 | missing | missing | [-1.78475, 0.340327, 2.33412] |
| 2 | 1 | 0.5 | 0 | 2.0 | 1 | missing | missing | [-1.78475, 0.340327, 2.33412] |
| 3 | 1 | 1.0 | 0 | 1.0 | 1 | missing | missing | [-1.78475, 0.340327, 2.33412] |
| 4 | 1 | 1.5 | 0 | 1.0 | 1 | missing | missing | [-1.78475, 0.340327, 2.33412] |
| 5 | 1 | 2.0 | 0 | 2.0 | 1 | missing | missing | [-1.78475, 0.340327, 2.33412] |
| 6 | 1 | 2.5 | 0 | 2.0 | 1 | missing | missing | [-1.78475, 0.340327, 2.33412] |
| 7 | 1 | 3.0 | 0 | 1.0 | 1 | missing | missing | [-1.78475, 0.340327, 2.33412] |
| 8 | 1 | 4.0 | 0 | 2.0 | 1 | missing | missing | [-1.78475, 0.340327, 2.33412] |
| 9 | 1 | 5.0 | 0 | 3.0 | 1 | missing | missing | [-1.78475, 0.340327, 2.33412] |
| 10 | 1 | 6.0 | 0 | 2.0 | 1 | missing | missing | [-1.78475, 0.340327, 2.33412] |
| 11 | 1 | 7.0 | 0 | 2.0 | 1 | missing | missing | [-1.78475, 0.340327, 2.33412] |
| 12 | 1 | 8.0 | 0 | 2.0 | 1 | missing | missing | [-1.78475, 0.340327, 2.33412] |
| 13 | 10 | 0.0 | 0 | 3.0 | 1 | missing | missing | [-1.78475, 0.340327, 2.33412] |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
| 1909 | 99 | 0.0 | 0 | 4.0 | 1 | missing | missing | [-1.78475, 0.340327, 2.33412] |
| 1910 | 99 | 0.5 | 0 | 2.0 | 1 | missing | missing | [-1.78475, 0.340327, 2.33412] |
| 1911 | 99 | 1.0 | 0 | 4.0 | 1 | missing | missing | [-1.78475, 0.340327, 2.33412] |
| 1912 | 99 | 1.5 | 0 | 3.0 | 1 | missing | missing | [-1.78475, 0.340327, 2.33412] |
| 1913 | 99 | 2.0 | 0 | 3.0 | 1 | missing | missing | [-1.78475, 0.340327, 2.33412] |
| 1914 | 99 | 2.5 | 0 | 4.0 | 1 | missing | missing | [-1.78475, 0.340327, 2.33412] |
| 1915 | 99 | 3.0 | 0 | 3.0 | 1 | missing | missing | [-1.78475, 0.340327, 2.33412] |
| 1916 | 99 | 4.0 | 0 | 2.0 | 1 | missing | missing | [-1.78475, 0.340327, 2.33412] |
| 1917 | 99 | 5.0 | 0 | 3.0 | 1 | missing | missing | [-1.78475, 0.340327, 2.33412] |
| 1918 | 99 | 6.0 | 0 | 2.0 | 1 | missing | missing | [-1.78475, 0.340327, 2.33412] |
| 1919 | 99 | 7.0 | 0 | 2.0 | 1 | missing | missing | [-1.78475, 0.340327, 2.33412] |
| 1920 | 99 | 8.0 | 0 | 3.0 | 1 | missing | missing | [-1.78475, 0.340327, 2.33412] |
AIC:
aic(pk_painrelief_fit)

4660.2960634261635
BIC:
bic(pk_painrelief_fit)

4682.536385286251
4 Under the Hood: The Manual Categorical Approach
OrderedLogit encapsulates several transformation steps that you can also perform manually using the Categorical distribution. Understanding these steps provides insight into what the model is actually doing.
4.1 From Log-Cumulative-Odds to Probabilities
Given the \(k - 1\) log-cumulative-odds \(\phi_1, \dots, \phi_{k-1}\), the manual approach requires three steps:
1. Apply the logistic function to convert the log-cumulative-odds to cumulative probabilities:
\[ \gamma_j = \mathrm{logistic}(\phi_j) = \frac{1}{1 + e^{-\phi_j}} \]
2. Subtract consecutive cut points to recover individual category probabilities:
\[ \begin{aligned} p_1 &= \gamma_1 \\ p_j &= \gamma_j - \gamma_{j-1} \quad \text{for } 1 < j < k \\ p_k &= 1 - \gamma_{k-1} \end{aligned} \]
3. Pass the probabilities to a Categorical distribution.
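The three steps are easy to verify in isolation. In this plain-Python sketch, logit and logistic are hand-rolled counterparts of the functions Pumas reexports; the round trip recovers the original category probabilities:

```python
import math

def logit(p):
    return math.log(p / (1 - p))

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

target = [0.2, 0.3, 0.1, 0.4]  # desired category probabilities (k = 4)

# forward: probabilities -> k - 1 log-cumulative-odds
cum = [sum(target[: j + 1]) for j in range(3)]  # [0.2, 0.5, 0.6]
phi = [logit(c) for c in cum]

# backward (the three manual steps): log-cumulative-odds -> probabilities
gamma = [logistic(x) for x in phi]  # step 1: cumulative probabilities
probs = [gamma[0], gamma[1] - gamma[0], gamma[2] - gamma[1], 1 - gamma[2]]  # step 2
# step 3 would pass `probs` to a Categorical distribution
```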
We can see the equivalence with a simple example. First, using Categorical with explicit probabilities:
p = [0.2, 0.3, 0.1, 0.4]
x = 1:length(p)
d = Categorical(p)
pdf_values = pdf.(d, x)
data((; x, pdf_values)) *
mapping(:x => "outcome", :pdf_values => "PMF") *
visual(BarPlot) |> draw

The same distribution via OrderedLogit — note we supply only \(k - 1 = 3\) log-cumulative-odds:
lle = logit.(cumsum([0.2, 0.3, 0.1]))
x = 1:(length(lle)+1)
d = OrderedLogit(lle)
pdf_values = pdf.(d, x)
data((; x, pdf_values)) *
mapping(:x => "outcome", :pdf_values => "PMF") *
visual(BarPlot) |> draw

4.2 The Manual Model
In the @pre block, we manually compute each linear predictor, apply the logistic function, and calculate category probabilities by subtraction:
@pre begin
ϕ = α .- isRx * βRx
# cumulative probabilities P(Y ≤ 1), P(Y ≤ 2), and P(Y ≤ 3)
γ = logistic.(ϕ)
# probabilities for PainScore =1, =2, =3 and =4
p₄ = 1.0 - γ[3]
p₃ = γ[3] - γ[2]
p₂ = γ[2] - γ[1]
p₁ = γ[1]
end

Compare this to the OrderedLogit version, which collapses everything into a single line:
@pre begin
lle = α - isRx * βRx
end

The @derived block uses Categorical with the explicit probability vector:

PainScore ~ @. Categorical(p₁, p₂, p₃, p₄)

versus:

PainScore ~ @. OrderedLogit(lle)

Here is the full manual model:
ordinal_model = @model begin
@param begin
α ∈ VectorDomain(; init = [0.001, 0.002, 0.003])
βRx ∈ RealDomain()
end
@covariates begin
isRx
end
@pre begin
ϕ = α .- isRx * βRx
# cumulative probabilities P(Y ≤ 1), P(Y ≤ 2), and P(Y ≤ 3)
γ = logistic.(ϕ)
# probabilities for PainScore =1, =2, =3 and =4
p₄ = 1.0 - γ[3]
p₃ = γ[3] - γ[2]
p₂ = γ[2] - γ[1]
p₁ = γ[1]
end
@derived begin
PainScore ~ @. Categorical(p₁, p₂, p₃, p₄)
end
end

PumasModel
Parameters: α, βRx
Random effects:
Covariates: isRx
Dynamical system variables:
Dynamical system type: No dynamical model
Derived: PainScore
Observed: PainScore
With the manual Categorical approach, you must ensure that all pⱼ values are non-negative. If the cut points are not properly ordered, subtraction can produce negative probabilities. OrderedLogit handles this constraint automatically.
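A quick plain-Python illustration of the failure mode, using hypothetical, deliberately mis-ordered cut points:

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

alpha = [0.5, -0.3, 1.0]  # NOT ordered: alpha[1] < alpha[0]
gamma = [logistic(a) for a in alpha]
p2 = gamma[1] - gamma[0]  # negative: an invalid "probability"
```

A Categorical distribution would reject such a probability vector, whereas OrderedLogit's parameterization never produces one.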
4.3 Comparing Results
Fitting the manual model produces identical estimates:
pk_painrelief_categorical_fit = fit(
ordinal_model,
pk_painrelief_pop,
(; init_params(ordinal_model)..., α = init_intercepts),
NaivePooled(),
)

[ Info: Checking the initial parameter values.
[ Info: The initial negative log likelihood and its gradient are finite. Check passed.
Iter     Function value   Gradient norm
0        2.404753e+03     1.283979e+02    * time: 1.811981201171875e-5
1        2.385524e+03     8.142892e+01    * time: 1.0987300872802734
2        2.379769e+03     1.388532e+02    * time: 1.1030330657958984
3        2.328261e+03     2.619273e+01    * time: 1.1072030067443848
4        2.327683e+03     2.703575e+01    * time: 1.1113150119781494
5        2.326153e+03     1.265745e+00    * time: 1.1154160499572754
6        2.326148e+03     8.512436e-02    * time: 1.1194651126861572
7        2.326148e+03     8.930993e-03    * time: 1.1235020160675049
8        2.326148e+03     9.869284e-04    * time: 1.1275930404663086
FittedPumasModel
Dynamical system type: No dynamical model
Number of subjects: 160
Observation records: Active Missing
PainScore: 1920 0
Total: 1920 0
Number of parameters: Constant Optimized
0 4
Likelihood approximation: NaivePooled
Likelihood optimizer: BFGS
Termination Reason: GradientNorm
Log-likelihood value: -2326.148
---------------
Estimate
---------------
α₁ -3.0216
α₂ -0.89654
α₃ 1.0973
βRx -1.2369
---------------
Both models produce identical parameter estimates, since the underlying statistical model is the same. OrderedLogit simply provides a more concise and safer parameterization.
5 Conclusion
In this tutorial we covered ordinal regression, a model for discrete outcomes with a natural order. Whenever your dependent variable is ordinal, there is a good chance that the equidistance assumption of linear regression is violated. This is where ordinal regression comes to the rescue.
Pumas provides OrderedLogit as a streamlined way to specify ordinal regression models. Instead of manually computing cut points, logistic transforms, and category probabilities, you simply supply a vector of log-cumulative-odds and OrderedLogit handles the rest. We recommend using OrderedLogit for all new ordinal regression models.