Ordinal Regression

Authors

Jose Storopoli

Vijay Ivaturi

Patrick Mogensen

Ordinal regression is a regression model for discrete data, more specifically for dependent variables whose values have a “natural order”.

For example, opinion polls with their ubiquitous disagree-agree range of plausible values, or a patient’s pain perception.

Note

There are several resources available for a deep dive into ordinal regression. Specifically, we recommend:

  1. Richard McElreath - Statistical Rethinking, 2nd Edition, Section 12.3.
  2. Agresti, A. (2014). Categorical Data Analysis, Section 8.2.
  3. Bürkner, P.-C., & Vuorre, M. (2019). Ordinal Regression Models in Psychology: A Tutorial. Advances in Methods and Practices in Psychological Science, 77–101. 10.1177/2515245918823199.

1 Why not just use Linear Regression?

The main reason for not using plain linear regression with ordinal outcomes is that the response categories of an ordinal variable may not be equidistant. In linear regression, the distance between 2 and 3 is assumed to be the same as between 1 and 2. This assumption can be violated with ordinal data.

2 Ordinal Regression: The Key Idea

The core idea behind ordinal regression is to model the cumulative probabilities of the ordered categories, rather than the categories directly.

For a response variable \(Y\) with \(k\) ordered categories, we base the model on the cumulative distribution function (CDF):

\[ \gamma_j = P(Y \leq j) = \sum_{i=1}^j P(Y = i) \]

We then apply the logit function (the inverse of the logistic function) to the CDF:

\[ \alpha_j = \mathrm{logit}(\gamma_j) = \ln\left(\frac{\gamma_j}{1 - \gamma_j}\right) \]

This gives us the log-cumulative-odds — a real-valued scale where we can add intercepts and covariate effects linearly. Since the highest category always has cumulative probability 1 (and thus \(\mathrm{logit}(1) = \infty\)), we only need \(k-1\) intercepts \(\alpha_j\), known as cut points.

Tip

Odds provide a measure of the likelihood of a particular outcome. They are calculated as the ratio of the number of events that produce that outcome to the number that do not: \(\mathrm{odds}(p) = \frac{p}{1 - p}\). The higher the odds, the higher the probability.

The \(k-1\) cut points can be non-equidistant, allowing the model to capture any probability distribution of the ordinal categories.

2.1 Visualizing with Synthetic Data

Let’s see this in action with a synthetic example.

using Pumas
using DataFrames
using DataFramesMeta
using CairoMakie
using AlgebraOfGraphics
using SummaryTables
using CategoricalArrays

Here we have a 6-category ordinal variable with probabilities 10%, 15%, 33%, 25%, 10%, and 7%. We specify this using OrderedLogit, which takes a vector of \(k-1 = 5\) logit-transformed cumulative probabilities:

Tip

We are using the convenient logit function from the LogExpFunctions.jl package, which Pumas reexports. It is safer and more efficient than a user-defined function.

The 5 log-cumulative-odds values (lower-right plot) are clearly not equidistant — this is the flexibility that ordinal regression provides.
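The arithmetic behind these cut points is easy to check by hand. Here is a small illustrative sketch (shown in Python so it stands alone, with logit defined inline; the same computation works in Julia with LogExpFunctions' logit) using the stated category probabilities:

```python
import math

def logit(p):
    return math.log(p / (1 - p))

# category probabilities of the 6-level synthetic example
p = [0.10, 0.15, 0.33, 0.25, 0.10, 0.07]

# cumulative probabilities P(Y <= j) for j = 1..5; P(Y <= 6) = 1 is dropped
cum = [sum(p[:j + 1]) for j in range(len(p) - 1)]

# the k - 1 = 5 log-cumulative-odds (cut points)
cuts = [logit(g) for g in cum]
print([round(a, 3) for a in cuts])   # [-2.197, -1.099, 0.323, 1.586, 2.587]
```

The gaps between consecutive cut points (roughly 1.10, 1.42, 1.26, and 1.00) are indeed unequal.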

2.2 Adding Covariates

Covariate effects are incorporated through a common regression coefficient vector \(\beta\), whose contribution is subtracted from each cut point \(\alpha_j\):

\[ \phi_j(x) = \log\frac{\gamma_j(x)}{1 - \gamma_j(x)} = \alpha_j - \beta^T x \]

where \(x\) is the vector of covariates and it must hold that \(\alpha_1 \leq \dots \leq \alpha_{k-1}\).

This model was coined the proportional-odds model by McCullagh (1980), because the odds for two different covariate vectors \(x_1\) and \(x_2\) are proportional, i.e. \[ \frac{\gamma_j(x_1)/(1 - \gamma_j(x_1))}{\gamma_j(x_2)/(1 - \gamma_j(x_2))} = \exp(\beta^T(x_2 - x_1)) \]

When \(\beta^T x \approx \alpha_j\), the odds of \(P(Y \leq j)\) versus \(P(Y > j)\) are even; when \(\beta^T x\) is large, the higher categories (larger \(j\)) become more likely, and vice versa.
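The proportionality of the odds is easy to verify numerically. A quick sketch (Python; the cut points α = [-1.0, 0.5, 2.0] and coefficient β = 0.8 are hypothetical values chosen purely for illustration):

```python
import math

def logistic(x):
    return 1 / (1 + math.exp(-x))

def odds(p):
    return p / (1 - p)

# hypothetical cut points and coefficient, chosen only for illustration
alpha = [-1.0, 0.5, 2.0]
beta = 0.8
x1, x2 = 1.0, 0.0

# gamma_j(x) = logistic(alpha_j - beta * x); take the odds ratio at x1 vs x2
ratios = [odds(logistic(a - beta * x1)) / odds(logistic(a - beta * x2)) for a in alpha]
print(ratios)                       # the same value for every cut point j
print(math.exp(beta * (x2 - x1)))   # equals exp(beta * (x2 - x1))
```

All three ratios coincide, regardless of which cut point \(j\) is used.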

Note

Mandema and Stanski (1996) parameterize the ordinal regression model in terms of \(P(Y \geq j)\) instead of the CDF used above. In addition, the cut points are defined as a cumulative sum of parameters that are all required to be negative, instead of being ordered as in the parameterization used in this tutorial. However, the two parameterizations are equivalent, i.e. they define the same probability model for the data.

3 Fitting an Ordinal Model in Pumas

3.1 The pk_painrelief Dataset

Let’s use the pk_painrelief dataset from PharmaDatasets.jl:

using PharmaDatasets
pk_painrelief = dataset("pk_painrelief")
first(pk_painrelief, 5)
5×7 DataFrame
Row Subject Time Conc PainRelief PainScore RemedStatus Dose
Int64 Float64 Float64 Int64 Int64 Int64 String7
1 1 0.0 0.0 0 3 1 20 mg
2 1 0.5 1.15578 1 1 0 20 mg
3 1 1.0 1.37211 1 0 0 20 mg
4 1 1.5 1.30058 1 0 0 20 mg
5 1 2.0 1.19195 1 1 0 20 mg

Key variables:

  • :Subject: subject ID.
  • :Time: time in hours.
  • :Conc: concentration (mg/L).
  • :PainRelief: binary indicator of pain relief.
  • :PainScore (dependent variable): ordinal, ranging from 0 to 3.
  • :RemedStatus: binary indicator of remedication request.
  • :Dose: placebo, 5 mg, 20 mg, or 80 mg.

3.2 Exploratory Data Analysis

Our dependent variable :PainScore takes values from 0 to 3. Since ordinal distributions in Pumas expect values starting from 1, we shift by adding 1:

pk_painrelief.PainScore |> unique |> sort
4-element Vector{Int64}:
 0
 1
 2
 3
@rtransform! pk_painrelief :PainScore = :PainScore + 1
pk_painrelief.PainScore |> unique |> sort
4-element Vector{Int64}:
 1
 2
 3
 4

To display the dose levels in the right order, we first convert the column to a CategoricalVector:

@transform! pk_painrelief :Dose =
    categorical(:Dose; levels = ["Placebo", "5 mg", "20 mg", "80 mg"])

Summary statistics stratified by dose:

summarytable(
    pk_painrelief,
    "PainScore";
    summary = [minimum, maximum, mean, median, std],
    cols = "Dose",
)
Dose
Placebo 5 mg 20 mg 80 mg
PainScore
minimum 1 1 1 1
maximum 4 4 4 4
mean 2.93 2.59 2.33 2.16
median 3 3 2 2
std 0.772 0.855 0.804 0.823

The distribution of the pain scores for each dose group:

data(pk_painrelief) *
mapping(:Dose, stack = :PainScore => nonnumeric, color = :PainScore => nonnumeric) *
frequency() |> draw

Higher doses are associated with lower perceived pain scores.

3.3 Data Wrangling

We convert :Dose to a binary treatment indicator:

@rtransform! pk_painrelief :isRx = Int(:Dose != "Placebo");

3.4 Model Definition

In Pumas, the entire ordinal regression parameterization is captured by the OrderedLogit error model, which takes a vector of logits ϕ = [ϕ₁, ϕ₂, ϕ₃] as input: OrderedLogit(ϕ). The user supplies the vector of \(k - 1\) log-cumulative-odds and OrderedLogit handles the logistic transform and probability computation internally.

Here is our ordinal regression model using OrderedLogit:

orderedlogit_model = @model begin
    @param begin
        # intercepts (k - 1 = 3 cut points)
        α ∈ VectorDomain(; init = [0.001, 0.002, 0.003])
        # treatment effect coefficient
        βRx ∈ RealDomain()
    end

    @covariates begin
        isRx
    end

    @pre begin
        # log-cumulative-odds vector
        lle = α .- isRx * βRx
    end

    @derived begin
        PainScore ~ @. OrderedLogit(lle)
    end
end
PumasModel
  Parameters: α, βRx
  Random effects:
  Covariates: isRx
  Dynamical system variables:
  Dynamical system type: No dynamical model
  Derived: PainScore
  Observed: PainScore

The model has four components:

  • @param: a vector of intercepts α (the \(k - 1 = 3\) cut points for our 4-category outcome) and one treatment coefficient βRx. The intercept initial values must be ordered (here 0.001 < 0.002 < 0.003) to help convergence.
  • @covariates: the binary treatment indicator isRx.
  • @pre: constructs the log-cumulative-odds vector lle by subtracting the treatment effect from each intercept.
  • @derived: specifies PainScore as following an OrderedLogit distribution parameterized by lle.
Tip

OrderedLogit automatically ensures that category probabilities are valid (non-negative and sum to 1). See Under the Hood for the manual alternative using Categorical, which requires you to compute probabilities yourself and guard against negative values.

3.5 Choosing Initial Values

Good initial values for the intercepts are critical for ordinal model convergence. Since the intercepts represent the log-cumulative-odds at the reference condition (all covariates = 0), we can derive them directly from the marginal distribution of the outcome.

Step 1: Calculate the marginal distribution of PainScore:

marginal = @chain pk_painrelief begin
    groupby(:PainScore)
    @combine :n = length(:PainScore)
    @rtransform :prop = :n / nrow(pk_painrelief)
end
simple_table(marginal)
PainScore n prop
1 228 0.119
2 748 0.39
3 696 0.362
4 248 0.129

Step 2: Calculate cumulative probabilities \(P(Y \leq j)\) for \(j = 1, \ldots, k - 1\):

cum_probs = cumsum(marginal.prop)[1:end-1]
3-element Vector{Float64}:
 0.11875
 0.5083333333333333
 0.8708333333333333

Step 3: Convert to the logit scale — these are our initial intercept values:

init_intercepts = logit.(cum_probs)
3-element Vector{Float64}:
 -2.0043209112117277
  0.03333642026759171
  1.9083470474796649
Tip

The resulting intercepts are properly ordered (increasing), which ensures valid probabilities at the start of optimization. For covariate effects like βRx, starting near 0 is a safe default.
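The three steps condense to a few lines. A sketch (in Python for standalone illustration, with logit defined inline) using the counts from the marginal table above:

```python
import math

def logit(p):
    return math.log(p / (1 - p))

# observed counts per (shifted) pain score category, from the marginal table
n = [228, 748, 696, 248]
total = sum(n)  # 1920 observations

# cumulative proportions P(Y <= j) for j = 1..3, then their logits
cum_probs = [sum(n[:j + 1]) / total for j in range(len(n) - 1)]
init_intercepts = [logit(g) for g in cum_probs]
print(init_intercepts)   # approximately [-2.0043, 0.0333, 1.9083]
```

These reproduce the init_intercepts values computed with Julia above.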

3.6 Creating a Population

pk_painrelief_pop = read_pumas(
    pk_painrelief;
    observations = [:PainScore],
    covariates = [:isRx],
    id = :Subject,
    time = :Time,
    event_data = false,
)
Population
  Subjects: 160
  Covariates: isRx
  Observations: PainScore

3.7 Model Fitting

We use NaivePooled() since this model has no random effects:

pk_painrelief_fit = fit(
    orderedlogit_model,
    pk_painrelief_pop,
    (; init_params(orderedlogit_model)..., α = init_intercepts),
    NaivePooled(),
)
[ Info: Checking the initial parameter values.
[ Info: The initial negative log likelihood and its gradient are finite. Check passed.
Iter     Function value   Gradient norm 
     0     2.404753e+03     1.283979e+02
 * time: 0.03441596031188965
     1     2.385524e+03     8.142892e+01
 * time: 2.205214023590088
     2     2.379769e+03     1.388532e+02
 * time: 2.206387996673584
     3     2.328261e+03     2.619273e+01
 * time: 2.2074289321899414
     4     2.327683e+03     2.703575e+01
 * time: 2.2083399295806885
     5     2.326153e+03     1.265745e+00
 * time: 2.209211826324463
     6     2.326148e+03     8.512436e-02
 * time: 2.210052013397217
     7     2.326148e+03     8.930993e-03
 * time: 2.210886001586914
     8     2.326148e+03     9.869284e-04
 * time: 2.211709976196289
FittedPumasModel

Dynamical system type:          No dynamical model

Number of subjects:                            160

Observation records:         Active        Missing
    PainScore:                 1920              0
    Total:                     1920              0

Number of parameters:      Constant      Optimized
                                  0              4

Likelihood approximation:              NaivePooled
Likelihood optimizer:                         BFGS

Termination Reason:                   GradientNorm
Log-likelihood value:                    -2326.148

---------------
      Estimate
---------------
α₁    -3.0216
α₂    -0.89654
α₃     1.0973
βRx   -1.2369
---------------

3.8 Interpreting the Coefficients

coeftable(pk_painrelief_fit) |> simple_table
parameter constant estimate
α₁ false -3.02
α₂ false -0.897
α₃ false 1.1
βRx false -1.24

Interpreting the exponentiated coefficients is sometimes easier:

(@rtransform coeftable(pk_painrelief_fit) :estimate_exp = exp(:estimate)) |> simple_table
parameter constant estimate estimate_exp
α₁ false -3.02 0.0487
α₂ false -0.897 0.408
α₃ false 1.1 3
βRx false -1.24 0.29

The intercepts αⱼ represent the baseline odds of \(P(Y \leq j)\) against \(P(Y > j)\). For example, α₁ has exponentiated value approximately 0.05 — the odds of the lowest pain score category relative to any of the higher pain scores when untreated.

The negative of the treatment coefficient βRx has an exponentiated value of approximately 3.44, meaning treatment multiplies the odds of being at or below any given pain score — the odds of greater pain relief — by about 3.44 compared to placebo.
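These numbers can be sanity-checked by plugging the fitted estimates back into the model by hand. A sketch (Python, for standalone illustration):

```python
import math

def logistic(x):
    return 1 / (1 + math.exp(-x))

# fitted estimates from the table above
alpha = [-3.0216, -0.89654, 1.0973]
beta_rx = -1.2369

def category_probs(is_rx):
    # log-cumulative-odds phi_j = alpha_j - beta * x, then successive differences
    gamma = [logistic(a - beta_rx * is_rx) for a in alpha]
    return [gamma[0], gamma[1] - gamma[0], gamma[2] - gamma[1], 1 - gamma[2]]

print(round(math.exp(-beta_rx), 2))              # 3.44: treatment odds multiplier
print([round(p, 3) for p in category_probs(0)])  # placebo: [0.046, 0.243, 0.46, 0.25]
print([round(p, 3) for p in category_probs(1)])  # treated: [0.144, 0.441, 0.327, 0.088]
```

Up to rounding, these reproduce the per-group probabilities that probstable reports later in this tutorial.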

Standard errors and 95% confidence intervals:

infer(pk_painrelief_fit)
[ Info: Calculating: variance-covariance matrix.
[ Info: Done.
Asymptotic inference results using sandwich estimator

Dynamical system type:          No dynamical model

Number of subjects:                            160

Observation records:         Active        Missing
    PainScore:                 1920              0
    Total:                     1920              0

Number of parameters:      Constant      Optimized
                                  0              4

Likelihood approximation:              NaivePooled
Likelihood optimizer:                         BFGS

Termination Reason:                   GradientNorm
Log-likelihood value:                    -2326.148

-------------------------------------------------
      Estimate   SE        95.0% C.I.
-------------------------------------------------
α₁    -3.0216    0.20597   [ -3.4253 ; -2.6179 ]
α₂    -0.89654   0.19315   [ -1.2751 ; -0.51798]
α₃     1.0973    0.19705   [  0.71104;  1.4835 ]
βRx   -1.2369    0.21954   [ -1.6672 ; -0.80659]
-------------------------------------------------

Other confidence levels (e.g., 90%) can be specified with level:

infer(pk_painrelief_fit; level = 0.90)
[ Info: Calculating: variance-covariance matrix.
[ Info: Done.
Asymptotic inference results using sandwich estimator

Dynamical system type:          No dynamical model

Number of subjects:                            160

Observation records:         Active        Missing
    PainScore:                 1920              0
    Total:                     1920              0

Number of parameters:      Constant      Optimized
                                  0              4

Likelihood approximation:              NaivePooled
Likelihood optimizer:                         BFGS

Termination Reason:                   GradientNorm
Log-likelihood value:                    -2326.148

-------------------------------------------------
      Estimate   SE        90.0% C.I.
-------------------------------------------------
α₁    -3.0216    0.20597   [ -3.3604 ; -2.6828 ]
α₂    -0.89654   0.19315   [ -1.2142 ; -0.57884]
α₃     1.0973    0.19705   [  0.77314;  1.4214 ]
βRx   -1.2369    0.21954   [ -1.598  ; -0.87577]
-------------------------------------------------

3.9 Visualizing Predicted Probabilities

To see what the fitted model actually predicts, we can compute category probabilities for each treatment group. The probstable() function provides predicted category probabilities in a DataFrame. The columns :PainScore_iprob1 through :PainScore_iprob4 give the probability of each outcome:

eachgroup_df = @combine groupby(pk_painrelief, "isRx") begin
    :id = string(first(:Subject)) # Since output DataFrames have an "id" column of strings
    :time = first(:Time)
end
pred_df = select(
    leftjoin(eachgroup_df, probstable(pk_painrelief_fit); on = ["id", "time"]),
    Not("id", "time"),
)
simple_table(pred_df)
isRx PainScore_iprob1 PainScore_iprob2 PainScore_iprob3 PainScore_iprob4
1 0.144 0.441 0.327 0.0883
0 0.0465 0.243 0.46 0.25
data(stack(pred_df)) *
mapping(
    :isRx => (i -> ["Placebo", "Treatment"][i+1]) => "Group",
    :value => "Predicted Probability",
    stack = :variable => "Pain Score Category",
    color = :variable,
) *
visual(BarPlot) |> draw

Treatment shifts the probability mass toward lower pain scores (higher pain relief), consistent with the negative βRx estimate (recall that \(\beta^T x\) is subtracted from the cut points).

We can also visualize the cumulative probabilities \(P(Y \leq k)\), which are the quantities directly modeled by ordinal regression:

pred_cumulative_df = combine(
    groupby(stack(pred_df), "isRx"),
    "value" => cumsum => "Predicted Cumulative Probability",
    "variable",
)
data(pred_cumulative_df) *
mapping(
    :variable => "P(Y ≤ k)",
    "Predicted Cumulative Probability",
    dodge = :isRx => nonnumeric => "Group",
    color = :isRx => (i -> ["Placebo", "Treatment"][i+1]),
) *
visual(BarPlot) |> draw

The proportional odds assumption means the treatment effect shifts all cumulative probabilities uniformly on the log-odds scale.

Similarly, \(P(Y \geq k)\) — the probability of being at or above each category:

pred_cumulative_df = combine(
    groupby(stack(pred_df), "isRx"),
    "value" => reverse ∘ cumsum ∘ reverse => "Predicted Cumulative Probability",
    "variable",
)
data(pred_cumulative_df) *
mapping(
    :variable => "P(Y ≥ k)",
    "Predicted Cumulative Probability",
    dodge = :isRx => nonnumeric => "Group",
    color = :isRx => (i -> ["Placebo", "Treatment"][i+1]),
) *
visual(BarPlot) |> draw

3.10 Simulations

Random observations can be generated with simobs():

simobs(orderedlogit_model, pk_painrelief_pop, coef(pk_painrelief_fit));

3.11 Model Diagnostics

DataFrame(inspect(pk_painrelief_fit))
[ Info: Calculating predictions.
[ Info: Calculating empirical bayes.
[ Info: Evaluating dose control parameters.
[ Info: Evaluating individual parameters.
[ Info: Done.
1920×8 DataFrame
1895 rows omitted
Row id time evid PainScore isRx PainScore_pred PainScore_ipred lle
String Float64 Int8 Float64? Int64? Missing Missing Array…?
1 1 0.0 0 4.0 1 missing missing [-1.78475, 0.340327, 2.33412]
2 1 0.5 0 2.0 1 missing missing [-1.78475, 0.340327, 2.33412]
3 1 1.0 0 1.0 1 missing missing [-1.78475, 0.340327, 2.33412]
4 1 1.5 0 1.0 1 missing missing [-1.78475, 0.340327, 2.33412]
5 1 2.0 0 2.0 1 missing missing [-1.78475, 0.340327, 2.33412]
6 1 2.5 0 2.0 1 missing missing [-1.78475, 0.340327, 2.33412]
7 1 3.0 0 1.0 1 missing missing [-1.78475, 0.340327, 2.33412]
8 1 4.0 0 2.0 1 missing missing [-1.78475, 0.340327, 2.33412]
9 1 5.0 0 3.0 1 missing missing [-1.78475, 0.340327, 2.33412]
10 1 6.0 0 2.0 1 missing missing [-1.78475, 0.340327, 2.33412]
11 1 7.0 0 2.0 1 missing missing [-1.78475, 0.340327, 2.33412]
12 1 8.0 0 2.0 1 missing missing [-1.78475, 0.340327, 2.33412]
13 10 0.0 0 3.0 1 missing missing [-1.78475, 0.340327, 2.33412]
1909 99 0.0 0 4.0 1 missing missing [-1.78475, 0.340327, 2.33412]
1910 99 0.5 0 2.0 1 missing missing [-1.78475, 0.340327, 2.33412]
1911 99 1.0 0 4.0 1 missing missing [-1.78475, 0.340327, 2.33412]
1912 99 1.5 0 3.0 1 missing missing [-1.78475, 0.340327, 2.33412]
1913 99 2.0 0 3.0 1 missing missing [-1.78475, 0.340327, 2.33412]
1914 99 2.5 0 4.0 1 missing missing [-1.78475, 0.340327, 2.33412]
1915 99 3.0 0 3.0 1 missing missing [-1.78475, 0.340327, 2.33412]
1916 99 4.0 0 2.0 1 missing missing [-1.78475, 0.340327, 2.33412]
1917 99 5.0 0 3.0 1 missing missing [-1.78475, 0.340327, 2.33412]
1918 99 6.0 0 2.0 1 missing missing [-1.78475, 0.340327, 2.33412]
1919 99 7.0 0 2.0 1 missing missing [-1.78475, 0.340327, 2.33412]
1920 99 8.0 0 3.0 1 missing missing [-1.78475, 0.340327, 2.33412]

AIC:

aic(pk_painrelief_fit)
4660.2960634261635

BIC:

bic(pk_painrelief_fit)
4682.536385286251

4 Under the Hood: The Manual Categorical Approach

OrderedLogit encapsulates several transformation steps that you can also perform manually using the Categorical distribution. Understanding these steps provides insight into what the model is actually doing.

4.1 From Log-Cumulative-Odds to Probabilities

Given the \(k - 1\) log-cumulative-odds \(\phi_1, \dots, \phi_{k-1}\), the manual approach requires three steps:

1. Apply the logistic function to convert the log-cumulative-odds into cumulative probabilities:

\[ \gamma_j = \mathrm{logistic}(\phi_j) = \frac{1}{1 + e^{-\phi_j}} \]

2. Subtract consecutive cut points to recover individual category probabilities:

\[ \begin{aligned} p_1 &= \gamma_1 \\ p_j &= \gamma_j - \gamma_{j-1} \quad \text{for } 1 < j < k \\ p_k &= 1 - \gamma_{k-1} \end{aligned} \]

3. Pass the probabilities to a Categorical distribution.
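The three steps above can be sketched directly (Python for standalone illustration, with logit and logistic defined inline):

```python
import math

def logit(p):
    return math.log(p / (1 - p))

def logistic(x):
    return 1 / (1 + math.exp(-x))

p = [0.2, 0.3, 0.1, 0.4]   # target category probabilities

# the k - 1 = 3 log-cumulative-odds that encode this distribution
phi = [logit(sum(p[:j + 1])) for j in range(len(p) - 1)]

# step 1: logistic transform recovers the cumulative probabilities
gamma = [logistic(v) for v in phi]

# step 2: successive differences recover the category probabilities
probs = [gamma[0]] + [gamma[j] - gamma[j - 1] for j in range(1, len(gamma))] + [1 - gamma[-1]]
print([round(q, 3) for q in probs])   # [0.2, 0.3, 0.1, 0.4]
```

Step 3 would then pass probs to a Categorical distribution; the round trip recovers the original probabilities exactly.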

We can see the equivalence with a simple example. First, using Categorical with explicit probabilities:

p = [0.2, 0.3, 0.1, 0.4]
x = 1:length(p)
d = Categorical(p)
pdf_values = pdf.(d, x)
data((; x, pdf_values)) *
mapping(:x => "outcome", :pdf_values => "PMF") *
visual(BarPlot) |> draw

The same distribution via OrderedLogit — note we supply only \(k - 1 = 3\) log-cumulative-odds:

lle = logit.(cumsum([0.2, 0.3, 0.1]))
x = 1:(length(lle)+1)
d = OrderedLogit(lle)
pdf_values = pdf.(d, x)
data((; x, pdf_values)) *
mapping(:x => "outcome", :pdf_values => "PMF") *
visual(BarPlot) |> draw

4.2 The Manual Model

In the @pre block, we manually compute each linear predictor, apply the logistic function, and calculate category probabilities by subtraction:

@pre begin
    ϕ = α .- isRx * βRx

    # cumulative probabilities P(Y ≤ 1), P(Y ≤ 2), P(Y ≤ 3)
    γ = logistic.(ϕ)

    # probabilities for PainScore = 1, 2, 3, and 4
    p₄ = 1.0 - γ[3]
    p₃ = γ[3] - γ[2]
    p₂ = γ[2] - γ[1]
    p₁ = γ[1]
end

Compare this to the OrderedLogit version which collapses everything into a single line:

@pre begin
    lle = α .- isRx * βRx
end

The @derived block uses Categorical with the explicit probability vector:

PainScore ~ @. Categorical(p₁, p₂, p₃, p₄)

versus:

PainScore ~ @. OrderedLogit(lle)

Here is the full manual model:

ordinal_model = @model begin
    @param begin
        α ∈ VectorDomain(; init = [0.001, 0.002, 0.003])
        βRx ∈ RealDomain()
    end

    @covariates begin
        isRx
    end

    @pre begin
        ϕ = α .- isRx * βRx

        # cumulative probabilities P(Y ≤ 1), P(Y ≤ 2), P(Y ≤ 3)
        γ = logistic.(ϕ)

        # probabilities for PainScore = 1, 2, 3, and 4
        p₄ = 1.0 - γ[3]
        p₃ = γ[3] - γ[2]
        p₂ = γ[2] - γ[1]
        p₁ = γ[1]
    end

    @derived begin
        PainScore ~ @. Categorical(p₁, p₂, p₃, p₄)
    end
end
PumasModel
  Parameters: α, βRx
  Random effects:
  Covariates: isRx
  Dynamical system variables:
  Dynamical system type: No dynamical model
  Derived: PainScore
  Observed: PainScore
Caution

With the manual Categorical approach, you must ensure that all pⱼ values are non-negative. If the cut points are not properly ordered, subtraction can produce negative probabilities. OrderedLogit handles this constraint automatically.
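The failure mode described in the caution is easy to demonstrate. A sketch (Python, with deliberately mis-ordered cut points chosen for illustration):

```python
import math

def logistic(x):
    return 1 / (1 + math.exp(-x))

# mis-ordered log-cumulative-odds: the first exceeds the second
phi = [0.5, -0.5, 1.0]
gamma = [logistic(v) for v in phi]

# subtraction then yields a negative "probability" for category 2
p2 = gamma[1] - gamma[0]
print(round(p2, 3))   # -0.245
```

Passing such a value to Categorical would fail, which is why the ordering constraint on the cut points matters in the manual approach.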

4.3 Comparing Results

Fitting the manual model produces identical estimates:

pk_painrelief_categorical_fit = fit(
    ordinal_model,
    pk_painrelief_pop,
    (; init_params(ordinal_model)..., α = init_intercepts),
    NaivePooled(),
)
[ Info: Checking the initial parameter values.
[ Info: The initial negative log likelihood and its gradient are finite. Check passed.
Iter     Function value   Gradient norm 
     0     2.404753e+03     1.283979e+02
 * time: 1.811981201171875e-5
     1     2.385524e+03     8.142892e+01
 * time: 1.0987300872802734
     2     2.379769e+03     1.388532e+02
 * time: 1.1030330657958984
     3     2.328261e+03     2.619273e+01
 * time: 1.1072030067443848
     4     2.327683e+03     2.703575e+01
 * time: 1.1113150119781494
     5     2.326153e+03     1.265745e+00
 * time: 1.1154160499572754
     6     2.326148e+03     8.512436e-02
 * time: 1.1194651126861572
     7     2.326148e+03     8.930993e-03
 * time: 1.1235020160675049
     8     2.326148e+03     9.869284e-04
 * time: 1.1275930404663086
FittedPumasModel

Dynamical system type:          No dynamical model

Number of subjects:                            160

Observation records:         Active        Missing
    PainScore:                 1920              0
    Total:                     1920              0

Number of parameters:      Constant      Optimized
                                  0              4

Likelihood approximation:              NaivePooled
Likelihood optimizer:                         BFGS

Termination Reason:                   GradientNorm
Log-likelihood value:                    -2326.148

---------------
      Estimate
---------------
α₁    -3.0216
α₂    -0.89654
α₃     1.0973
βRx   -1.2369
---------------
Note

Both models produce identical parameter estimates, since the underlying statistical model is the same. OrderedLogit simply provides a more concise and safer parameterization.

5 Conclusion

In this tutorial we covered ordinal regression — a model for discrete outcomes with a natural order. Whenever your dependent variable is ordinal, there’s a great chance that it violates the equidistant assumption. This is where ordinal regression comes to the rescue.

Pumas provides OrderedLogit as a streamlined way to specify ordinal regression models. Instead of manually computing cut points, logistic transforms, and category probabilities, you simply supply a vector of log-cumulative-odds and OrderedLogit handles the rest. We recommend using OrderedLogit for all new ordinal regression models.

References

McCullagh, P. 1980. “Regression Models for Ordinal Data (with Discussion).” Journal of the Royal Statistical Society, Series B 42: 109–42.
Mandema, Jaap W, and Donald R Stanski. 1996. “Population Pharmacodynamic Model for Ketorolac Analgesia.” Clinical Pharmacology & Therapeutics 60 (6): 619–35.
