Ordinal Regression

Author

Jose Storopoli

Ordinal regression is a regression model for discrete data, and more specifically, when the dependent variable values have a “natural order”.

For example, opinion pools with their ubiquously disagree-agree range of plausible values, or a patient pain perception.

Note

There are several resources available for a deep dive into ordinal regression. Specifically, we recommend:

  1. Richard McElreath - Statistical Rethinking, 2nd Edition, Section 12.3.
  2. Agresti, A. (2014). Categorical Data Analysis, Section 8.2.
  3. Bürkner, P.-C., & Vuorre, M. (2019). Ordinal Regression Models in Psychology: A Tutorial. Advances in Methods and Practices in Psychological Science, 77–101. 10.1177/2515245918823199.

1 Why not just use Linear Regression?

The main reason for not using plain linear regression with ordinal outcomes (dependent variable) is that the response categories of an ordinal variable may not be equidistant. This is an assumption in linear regression (and in almost all models that uses metric dependent variables): the distance between, for example, 2 and 3 is not the same distance between 1 and 2. This assumption can be violated in an ordinal regression.

2 How to deal with Ordered Discrete Dependent Variable?

So, how we deal with ordered discrete responses in our dependent variable? This is similar with the previous logistic regression approach. We have to do a non-linear transformation of the dependent variable.

2.1 Cumulative Distribution Function (CDF)

In the case of ordinal regression, we need to first transform the dependent variable into a cumulative scale. We need to calculate the cumulative distribution function (CDF) of our dependent variable:

\[P(Y \leq y) = \sum^y_{i=y_{\text{min}}} P(Y = i)\]

The CDF is a monotonic increasing function that depicts the probability of our random variable \(Y\) taking values less than a certain value. In our case, the discrete ordinal case, these values can be represented as positive integers ranging from 1 to the length of possible values. For instance, a 6-categorical ordered response variable will have 6 possible values, and all their CDFs will lie between 0 and 1. Furthermore, their sum will be 1; since it represents the total probability of the variable taking any of the possible values, i.e. 100%.

2.2 Log-cumulative-odds

That is still not enough, we need to apply the logit function to the CDF:

\[\mathrm{logit}(x) = \mathrm{logistic}^{-1}(x) = \ln\left(\frac{x}{1 -x}\right)\]

where \(\ln\) is the natural logarithm function.

The logit is the inverse of the logistic transformation, it takes as a input any number between 0 and 1 (where a probability is the perfect candidate) and outputs a real number, which we call the log-odds.

Tip

Odds provide a measure of the likelihood of a particular outcome. They are calculated as the ratio of the number of events that produce that outcome to the number that do not. To convert probability \(p\) to \(\mathrm{odds}\) you can use the formula:

\[\mathrm{odds}(p) = \frac{p}{1-p}\]

The higher the odds, the higher the probability.

Since we are taking the log-odds of the CDF, we can call this complete transformation as log-odds of the CDF, or log-cumulative-odds.

2.3 \(K-1\) Intercepts

Now, the next question is: what do we do with the log-cumulative-odds?

We need the log-cumulative-odds because it allows us to construct different intercepts for the possible values our ordinal dependent variable. We create an unique intercept for each possible outcome \(k \in K\).

Note

Notice that the highest probable value of \(Y\) will always have a log-cumulative-odds of \(\infty\), since for \(p=1\):

\[\ln \frac{p}{1-p} = \ln \frac{1}{1-1} = \ln 0 = \infty\]

Thus, we only need \(K-1\) intercepts for a \(K\) possible dependent variables’ response values.

These are known as cut points.

Each intercept implies a CDF for each value \(K\). This allows us to violate the equidistant assumption absent in most ordinal variables.

Each intercept implies a log-cumulative-odds for each \(k \in K\). We also need to undo the cumulative nature of the \(K-1\) intercepts. We can accomplish this by first converting a log-cumulative-odds back to a cumulative probability. This is done by undoing the logit transformation and applying the logistic function:

\[\mathrm{logit}^{-1}(x) = \mathrm{logistic}(x) = \frac{1}{1 + e^{-x}}\]

Then, finally, we can remove the cumulative from “cumulative probability” by subtraction of each of the \(k\) cut points by their previous \(k-1\) cut point:

\[P(Y=k) = P(Y \leq k) - P(Y \leq k-1)\]

where \(Y\) is the dependent variable and \(k \in K\) are the cut points for each intercept.

2.3.1 Example with Synthetic Data

Let us show an example with some synthetic data.

First, we’ll load some packages:

using Pumas
using Distributions
using DataFrames
using DataFramesMeta
using CairoMakie
using AlgebraOfGraphics

Here we have a discrete variable x with 6 possible ordered values as response. The values range from 1 to 6 having probability, respectively: 10%, 15%, 33%, 25%, 10%, and 7%; represented with the probs vector. The underlying distribution is represented by a Categorical distribution from Distributions.jl, which takes a vector of probabilities as parameters (probs).

For each value we are calculating:

  1. Probability Mass Function with the pdf function
  2. Cumulative Distribution Function with the cdf function
  3. Log-cumulative-odds with the logit transformation of the CDF

In the plot below there are 3 subplots:

  • Upper corner: histogram of x
  • Left lower corner: CDF of x
  • Right lower corner: log-cumulative-odds of x
probs = [0.10, 0.15, 0.33, 0.25, 0.10, 0.07]
dist = Categorical(probs)
x = 1:length(probs)
x_pmf = pdf.(dist, x)
x_cdf = cdf.(dist, x)
x_logodds_cdf = logit.(x_cdf)
df = DataFrame(; x, x_pmf, x_cdf, x_logodds_cdf)
labels = ["CDF", "Log-cumulative-odds"]
fig = Figure()
plt1 = data(df) * mapping(:x, :x_pmf) * visual(BarPlot)
plt2 =
    data(df) *
    mapping(:x, [:x_cdf, :x_logodds_cdf]; col = dims(1) => renamer(labels)) *
    visual(ScatterLines)
axis = (; xticks = 1:6)
draw!(fig[1, 2:3], plt1; axis)
draw!(fig[2, 1:4], plt2; axis, facet = (; linkyaxes = :none))
fig

Tip

We are using the convenient logit function from StatsFuns.jl package. It is safer and more efficient than a simple user-defined function.

As we can see, we have \(K-1\) (in our case \(6-1=5\)) intercept values in log-cumulative-odds. You can carly see that these values they violate the equidistant assumption for metric response values. The spacing between the cut points are not the same, they vary.

Note

We generally assign \(\alpha\) to intercept in statistical models. Since we have \(K-1\) intercepts, we need a vector of intercepts. Thus, our single parameter \(\alpha\) turns into a vector parameter with a bold \(\boldsymbol{\alpha}\) to denote it is a vector.

3 Adding Coefficients \(\boldsymbol{\beta}\)

Ok, the \(K-1\) intercepts \(\boldsymbol{\alpha}\) are done. Now let’s add coefficients to act as covariate effects in our ordinal regression model.

We transformed everything into log-odds scale so that we can add effects (coefficients multiplying a covariate) or basal rates (intercepts) together. For each \(k \in K-1\), we calculate:

\[\phi_k = \alpha_k + \beta_i x_i\]

where \(\alpha_k\) is the log-cumulative-odds for the \(k \in K-1\) intercepts, \(\beta_i\) is the coefficient for the \(i\)th covariate \(x\). Finally, \(\phi_k\) represents the linear predictor for the \(k\)th intercept.

Note

Observe that the coefficient \(\beta\) is being added to a log-cumulative-odds, such that, it will be expressed in a log-cumulative-odds also.

We can express it in matrix form:

\[\boldsymbol{\phi} = \boldsymbol{\alpha} + \mathbf{X} \cdot \boldsymbol{\beta}\]

where \(\boldsymbol{\phi}\), \(\boldsymbol{\alpha}\) and \(\boldsymbol{\beta}\) are vectors and \(\mathbf{X}\) is the data matrix where each row is an observation and each column a covariate.

This still obeys the ordered constraint on the dependent variable possible values.

3.0.1 How to Interpret Coefficient \(\boldsymbol{\beta}\)?

Now, suppose we have found our ideal value for our \(\boldsymbol{\beta}\). How we would interpret our \(\boldsymbol{\beta}\) estimated values?

First, to recap, \(\boldsymbol{\beta}\) measures the strength of association between the covariate \(\mathbf{x}\) and dependent variable \(\mathbf{y}\). But, \(\boldsymbol{\beta}\) is nested inside a non-linear transformation called logistic function:

\[\mathrm{logistic}(\boldsymbol{\beta}) = \frac{1}{1 + e^{-\boldsymbol{\beta}}}\]

So, our first step is to undo the logistic function. There is a function that is called logit function that is the inverse of the logistic function:

\[\mathrm{logistic}^{-1}(x) = \mathrm{logit}(x) = \ln\left(\frac{x}{1 -x}\right)\]

where \(\ln\) is the natural logarithm function.

If we analyze closely the logit function we can find that inside the \(\ln\) there is a disguised odds in the \(\frac{x}{1 -x}\). Hence, our \(\boldsymbol{\beta}\) can be interpreted as the logarithm of the odds, or in short form: the log-odds.

Tip

Odds provide a measure of the likelihood of a particular outcome. They are calculated as the ratio of the number of events that produce that outcome to the number that do not. To convert probability \(p\) to \(\mathrm{odds}\) you can use the formula:

\[\mathrm{odds}(p) = \frac{p}{1-p}\]

The higher the odds, the higher the probability.

Since odds take values between \((0, \infty)\), and 1 is interpreted as neutral odds (like probability 0.5); the log-odds have a nice intuition that any negative value, that is the \(\ln\) of something between \((0, 1)\), enhances the probability of \(y\) taking lower values, while any positive value, that is the \(\ln\) of something between \((1, \infty)\), enhances the probability of \(y\) taking higher values. Log-odds equal to 0 is exactly \(\ln(1)\), i.e. odds as 1:1; which means neutral effect on probability of \(y\) taking lower or higher values.

The log-odds are the key to interpret coefficient \(\boldsymbol{\beta}\). Any positive value of \(\beta\) means that there exists a positive association between \(x\) and \(y\), while any negative values indicate a negative association between \(x\) and \(y\). Values close to 0 demonstrates the lack of association between \(x\) and \(y\).

4 Likelihood

We have almost everything we need for our ordinal regression. We are only missing a final touch. Currently our logistic function outputs a vector of probabilities that sums to 1.

All of the intercepts \(\alpha_k\) along with the coefficients \(\beta_i\) are in log-cumulative-odds scale. If we apply the logistic function to the linear predictors \(\phi_k\) we get \(K-1\) probabilities: one for each \(\phi_k\)

We need a likelihood that can handle a vector of probabilities and outputs a single discrete value. The categorical distribution is the perfect candidate.

Tip

Categorical distribution is also called generalized Bernoulli distribution or multinoulli distribution.

A categorical distribution is a discrete probability distribution that describes the possible results of a random variable that can take on one of \(K\) possible categories, with the probability of each category separately specified. It is **parameterized by a vector of probabilities \(\mathbf{p}\) where each \(p_1, \dots, p_k\) is a valid probability and sums to 1:

\[p_k \geq 0, \sum^K_{k=1} p_k = 1\]

In Pumas (and, in general, in Julia) you can specify a categorical distribution using the Categorical distribution type. By default, it takes a sequence of arguments as probabilities with the constraint that they must sum up to 1. Very similar to the “math” version.

For example here it is a categorical distribution with 4 possible discrete values and for each value probability 0.2, 0.3, 0.1, 0.4; respectively:

p = [0.2, 0.3, 0.1, 0.4]
x = 1:length(p)
d = Categorical(p)
pdf_values = map(x -> pdf(d, x), x)
data((; x, pdf_values)) *
mapping(:x => "outcome", :pdf_values => "PMF") *
visual(BarPlot) |> draw

5 Full Statistical Model

So the full statistical model with all the transformations and parametrizations we exposed so far becomes:

\[\begin{aligned} \mathbf{y} &\sim \mathrm{Categorical}(\mathbf{p}) \\ \mathbf{p} &= \mathrm{logistic}(\boldsymbol{\phi}) \\ \boldsymbol{\phi} &= \boldsymbol{\alpha} + \mathbf{X} \cdot \boldsymbol{\beta} \\ \alpha_1 &= \mathrm{CDF}(y_1) \\ \alpha_k &= \mathrm{CDF}(y_k) - \mathrm{CDF}(y_{k-1}) \quad \mathrm{for} \quad 1 < k < K-1 \\ \alpha_{K-1} &= 1 - \mathrm{CDF}(y_{K-1}) \end{aligned}\]

where:

  • \(\mathbf{y}\): ordered discrete dependent variable
  • \(\mathbf{p}\): probability vector of length \(K\)
  • \(K\): number of possible values \(\mathbf{y}\) can take, i.e. number of ordered discrete values
  • \(\boldsymbol{\phi}\): log-cumulative-odds, i.e. cut points considering the intercepts and covariates effect
  • \(\alpha_k\): intercept in log-cumulative-odds for each \(k \in K-1\)
  • \(\mathbf{X}\): covariate data matrix
  • \(\boldsymbol{\beta}\): coefficient vector of the same length as the number of columns in \(\mathbf{X}\)
  • \(\mathrm{logistic}\): logistic function
  • \(\mathrm{CDF}\): cumulative distribution function
Note

Please note that the first cut off is only \(\alpha_1\) and the last cut point is \(1 - \alpha_{\mathrm{K-1}}\).

6 pk_painrelief Dataset

Let’s use the pk_painrelief dataset from PharmaDatasets.jl to run our categorical regression example in this lesson:

using PharmaDatasets
pk_painrelief = dataset("pk_painrelief")
first(pk_painrelief, 5)
5×7 DataFrame
Row Subject Time Conc PainRelief PainScore RemedStatus Dose
Int64 Float64 Float64 Int64 Int64 Int64 String7
1 1 0.0 0.0 0 3 1 20 mg
2 1 0.5 1.15578 1 1 0 20 mg
3 1 1.0 1.37211 1 0 0 20 mg
4 1 1.5 1.30058 1 0 0 20 mg
5 1 2.0 1.19195 1 1 0 20 mg

We can see that we have 1920 observations for 160 different subjects and also some key variables:

  • :Subject: discrete variable indicating subject ID.
  • :Time: continuous variable indicating time in hours.
  • :Conc: concentration (mg/L).
  • :PainRelief: discrete binary variable indicating if the subject experienced pain relief.
  • :PainScore (dependent variable): discrete ordinal ranging from 0 to 3.
  • :RemedStatus: discrete binary variable indicating if the subject has requested remedication.
  • :Dose: dosage of the pain medication. Either placebo, 5mg, 20mg or 80mg.

6.1 Exploratory Data Analysis

As we can see, the column :PainScore, our dependent variable, takes values from 0 to 3. However, since the categorical distribution expects discrete values that ranges from 1 to \(\infty\), we need add 1 to every value of :PainScore. This shifts its values, changing the range of possible values to become from 1 to 4:

pk_painrelief.PainScore |> unique |> sort
4-element Vector{Int64}:
 0
 1
 2
 3
@rtransform! pk_painrelief :PainScore = :PainScore + 1
pk_painrelief.PainScore |> unique |> sort
4-element Vector{Int64}:
 1
 2
 3
 4

Let’s explore our data with some summaries. First, we would like to know the mean and median of :PainRelief and :PainScore stratified by :Dose.

We can accomplish this with a groupby() inside @chain macro:

Note

Notice that we are using [:col1, :col2, ...] .=> [fun1 fun2 ...] inside a escaped $(...) call to the @combine macro. This will create all the possible combinations of cols and funs as summary functions to apply to our groups.

@chain pk_painrelief begin
    groupby(:Dose)
    @combine $([:PainRelief, :PainScore] .=> [mean median std])
    @orderby :PainRelief_mean
end
4×7 DataFrame
Row Dose PainRelief_mean PainScore_mean PainRelief_median PainScore_median PainRelief_std PainScore_std
String7 Float64 Float64 Float64 Float64 Float64 Float64
1 Placebo 0.283333 2.92917 0.0 3.0 0.451087 0.772156
2 5 mg 0.445833 2.58958 0.0 3.0 0.497576 0.854984
3 20 mg 0.595833 2.32917 1.0 2.0 0.491242 0.804465
4 80 mg 0.664583 2.16042 1.0 2.0 0.472628 0.823263

As we can see there is a significant correlation between :Dose, :PainRelief and :PainScore.

6.2 Visualizations

Let’s do some plotting. First, we’ll explore our dependent variable: :PainScore:

data(pk_painrelief) *
mapping(:Dose => sorter(["Placebo", "5 mg", "20 mg", "80 mg"]), :PainScore) *
AlgebraOfGraphics.expectation() |> draw

As the dose get’s higher, the perceived pain score gets lower.

Let’s talke a look at the relationship between concentration and pain relief stratified by dosage.

We need a dataset without the placebo:

pk_painrelief_drop_placebo = @rsubset pk_painrelief :Dose != "Placebo";
data(pk_painrelief_drop_placebo) *
mapping(:Conc, :PainScore; layout = :Dose => sorter(["5 mg", "20 mg", "80 mg"])) *
visual(; linewidth = 3) *
AlgebraOfGraphics.linear() |> (x -> draw(x; facet = (; linkxaxes = :none)))

We can clearily see a negative relationship between concentration and perceived pain score.

7 Pumas modeling

Finally, we’ll create a Pumas model for ordinal regression.

7.1 Pumas’ Workflow

  1. Define a model. It can be a PumasModel or a PumasEMModel.
  2. Define a Subject and Population.
  3. Fit your model.
  4. Perform inference on the model’s population-level parameters (also known as fixed effects).
  5. Predict from a fitted model using the empirical Bayes estimate or Simulate random observations from a fitted model using the sampling distribution (also known as likelihood).
  6. Model diagnostics and visual predictive checks.
Note

Pumas modeling is a highly iterative process. Eventually you’ll probably go back and forth in the stages of workflow. This is natural and expected in Pumas’ modeling. Just remember to have a good code organization and version control for all of the workflow iterations.

But first, notice that :Dose is a string with either 5mg, 20mg, 80mg, or placebo. We need to convert the :Dose column to a dummy variable (placebo versus treatment).

pk_painrelief.Dose |> unique
4-element Vector{InlineStrings.String7}:
 "20 mg"
 "80 mg"
 "Placebo"
 "5 mg"
@rtransform! pk_painrelief :isRx = Int(:Dose != "Placebo");

7.1.1 Defining a model in Pumas

To define a model in Pumas, you can use either the macros or the function interfaces. We will cover the macros interface in this lesson.

To define a model using the macro interface, you start with a begin .. end block of code with @model:

my_model = @model begin end
PumasModel
  Parameters: 
  Random effects: 
  Covariates: 
  Dynamical system variables: 
  Dynamical system type: No dynamical model
  Derived: 
  Observed: 

This creates an empty model. Now we need to populate it with model components. These are additional macros that we can include inside the @model definition. We won’t be covering all the possible macros that we can use in this lesson, but here is the full list:

  • @param, fixed effects specifications.
  • @random, random effects specifications.
  • @covariates, covariate names.
  • @pre, pre-processing variables for the dynamic system and statistical specification.
  • @dosecontrol, specification of any dose control parameters present in the model.
  • @vars, shorthand notation.
  • @init, initial conditions for the dynamic system.
  • @dynamics, dynamics of the model.
  • @derived, statistical modeling of dependent variables.
  • @observed, model information to be stored in the model solution.
7.1.1.1 Model parameter with @param

Ok, let’s start with our ordinal_model using the macro @model interface.

First, we’ll define our parameters with the @parameters macro:

@param begin
    ParameterName  Domain(...)
    ...
end
Tip

To use the “in” () operator in Julia, you can either replace for in or type \in <TAB> for the Unicode character.

By using the begin ... end block we can specify one parameter per line. We will just name the parameters the same name as it’s respective covariate, while also adding the β prefix to make clear that this is a coefficient.

Regarding the Domain(...), Pumas has several types of domains for you to specify in your @param block. Here is a list:

  • RealDomain for scalar parameters
  • VectorDomain for vectors
  • PDiagDomain for positive definite matrices with diagonal structure
  • PSDDomain for general positive semi-definite matrices

We will only use scalar parameters in our logistic regression model, so we just need the RealDomain.

Tip

If you don’t specify any arguments inside the domain constructor it will either error (for some domains that have required arguments) or will use the defaults. In the case of the RealDomain() without arguments it just uses the following arguments:

RealDomain(; lower = -∞, upper = ∞, init = 0)
@param begin
    # intercepts
    α₁  RealDomain(; init = 0.001)
    α₂  RealDomain(; init = 0.002)
    α₃  RealDomain(; init = 0.003)
    # coefficients
    βRx  RealDomain()
end
7.1.1.1.1 Ordered Intercepts αₖs

As you can see we opted for initial values very close to zero, but also they have another property: the initial values are ordered.

This makes it easier for the model to converge, and in our case not having these initial parameter values would not make the model converge at all.

7.1.1.2 Model covariates with @covariates

We also have to specify our model covariates using the macro @covariates. This can be done using a one-liner syntax, without begin ... end block:

@covariates Dose

Or using the begin ... end block and specifying one covariate per line. This is what we will do, since our model has several covariates, which makes easy to visualize them and also to comment out a covariate to perform iterative model exploration:

@covariates begin
    Dose
end
7.1.1.3 Pre-processing variables with @pre

We can specify all the necessary variable and statistical pre-processing with the @pre macro.

Note

The @pre block is traditionally used to specify the inputs of the Ordinary D ifferential Equations (ODE) system used for non-linear mixed-effects PK/PD models (NLME).

But we will use @pre to specify our variable transformations instead.

Here, we are defining all the coefficients-covariates multiplication operation here. We are also defining our linear predictors ϕₖs that should be added to the intercept and passed to the non-linear logistic function transformation to become cut points cutₖ.

Finally, we need our logistic function. It could be tricky to use our own user-defined logistic function because it would not be numerical stable. For this, we can use the logistic function from StatsFuns.jl package, which Pumas reexports to us.

Tip

To use the logistic function from StatsFuns.jl, you just need to load Pumas:

using Pumas
# now we should have logistic available
logistic(x)

So the @pre block becomes:

@pre begin
    # linear predictors in log-cumulative-odds
    isRx = (Dose != "Placebo") * βRx

    ϕ₁ = α₁ + isRx
    ϕ₂ = α₂ + isRx
    ϕ₃ = α₃ + isRx

    # cut points for >= 1, >=2 and >=3
    cut₁ = logistic(ϕ₁)
    cut₂ = logistic(ϕ₂)
    cut₃ = logistic(ϕ₃)

    # probabilities for PainRelief (Y) =1, =2, =3 and =4
    p₄ = 1.0 - cut₃
    p₃ = cut₃ - cut₂
    p₂ = cut₂ - cut₁
    p₁ = cut₁
end
7.1.1.4 Statistical modeling of dependent variables with @derived

Our final block, **@derived, is used to specify all the assumed distributions of observed variables that are derived from the blocks above. This is where we include our dependent variable: PainScore**.

Note that PainScore is being declared as following a Categorical distribution. This is why we use the tilde notation ~. It means (much like the mathematical model notation) that PainScore follows a Categorical distribution. Since PainScore is a vector of values, we need to broadcast, i.e. vectorize, the operation with the dot . operator:

PainScore ~ Categorical.(p₁, ..., pₖ)

where p₁, ..., pₖ are the probabilities of PainScore taking values 1, ..., k by subtracting the cut points. This is how we parametrize a Categorical distribution. With @. vectorization this becomes:

PainScore ~ @. Categorical(p₁, p₂, p₃, p₄)
Note

We are using the @. macro which tells Julia to vectorize (add the “dot syntax”) to all operations and function calls to the right of it.

Tip

In this block we can use all variables defined in the previous blocks, in our example the @param, @covariates and @pre blocks.

Here is our full model, once we combined all the macro blocks we just covered:

ordinal_model = @model begin
    @param begin
        # intercepts
        α₁  RealDomain(; init = 0.001)
        α₂  RealDomain(; init = 0.002)
        α₃  RealDomain(; init = 0.003)
        # coefficients
        βRx  RealDomain()
    end

    @covariates begin
        isRx
    end

    @pre begin
        # linear predictors in log-cumulative-odds
        _isRx = isRx * βRx

        ϕ₁ = α₁ + _isRx
        ϕ₂ = α₂ + _isRx
        ϕ₃ = α₃ + _isRx

        # cut points for >= 1, >=2 and >=3
        cut₁ = logistic(ϕ₁)
        cut₂ = logistic(ϕ₂)
        cut₃ = logistic(ϕ₃)

        # probabilities for PainRelief (Y) =1, =2, =3 and =4
        p₄ = 1.0 - cut₃
        p₃ = cut₃ - cut₂
        p₂ = cut₂ - cut₁
        p₁ = cut₁
    end

    @derived begin
        PainScore ~ @. Categorical(p₁, p₂, p₃, p₄)
    end
end
PumasModel
  Parameters: α₁, α₂, α₃, βRx
  Random effects: 
  Covariates: isRx
  Dynamical system variables: 
  Dynamical system type: No dynamical model
  Derived: PainScore
  Observed: PainScore
Caution

In this model example we have simplified some things for the sake of the tutorial. If you want a fail proof model, you’ll probably would want to add positive constraints on the pₖs to make sure that they are a valid probability (non-negative).

7.1.2 Define a Subject and Population

Once we have our model defined we have to specify a Subject and a Population.

In Pumas, subjects (i.e. observations) are represented by the Subject type and collections of subjects are represented as vectors of Subjects are defined as Population.

Subjects can be constructed with the Subject() constructor, for example:

Subject()
Subject
  ID: 1

We just constructed a Subject that has ID equal to 1 and no extra information. In order to provide extra information, you would need to pass keyword arguments to the Subject() constructor.

You can use the following keyword arguments inside Subject():

  • id: identifier.
  • observations: a named tuple of the dependent variables.
  • covariates: a named tuple containing the covariates, or nothing.
  • events: a vector of Events.
  • time: a vector of time stamps for the observations.

For this lesson, we will just use the covariates keyword argument.

This is an example of a Subject with ID equal to 42 and with some covariates specified as a named tuple:

Subject(id = 42, covariates = (; PainScore = 2, isRx = 0))
Subject
  ID: 42
  Covariates: PainScore, isRx

Since a Population is just a vector of Subjects, we can use a simple map() function over the IDs present at our nausea dataset:

pop = map(
    i -> Subject(
        id = i,
        covariates = (; PainScore = pk_painrelief.PainScore, isRx = pk_painrelief.isRx),
    ),
    1:nrow(pk_painrelief),
)
Population
  Subjects: 1920
  Covariates: PainScore, isRx
  Observations: 
7.1.2.1 Reading Subjects directly from a DataFrame

Another option is to use the read_pumas() function to read Subjects (and Population) directly from a DataFrame.

This is more convenient than the map() with the Subject() constructor inside.

The read_pumas() function accepts as first argument a DataFrame followed by the following keyword options (which are very similar to the Subject() constructor):

  • observations: dependent variables specified by a vector of column names, i.e. [:PainScore].
  • covariates: covariates specified by a vector of column names, i.e. [:Dose_5mg, :Dose_20mg, ...].
  • id: specifies the ID column of the DataFrame.
  • time: specifies the time column of the DataFrame.
  • event_data: toggles assertions applicable to event data, either true or `false.

Our pk_painrelief dataset do not have any event data, so we should set event_data=false.

Now, we can create our population with read_pumas():

pk_painrelief_pop = read_pumas(
    pk_painrelief;
    observations = [:PainScore],
    covariates = [:isRx],
    id = :Subject,
    time = :Time,
    event_data = false,
)
Population
  Subjects: 160
  Covariates: isRx
  Observations: PainScore

7.1.3 Fit your model

Now we are ready to fit our model! We already have a model specified, ordinal_model, along with a Population: nausea_pop. We can proceed with model fitting.

Model fiting in Pumas has the purpose of estimate parameters and is done by calling the fit() function with the following positional arguments:

  1. Pumas model.
  2. a Population.
  3. a named tuple of the initial parameter’s values.
  4. an inference algorithm.
7.1.3.1 Initial Parameter’s Values

We already covered model and Population, now let’s talk about initial parameter’s values. It is the 3rd positional argument inside fit().

You can specify you initial parameters as a named tuple. For instance, if you want to have a certain parameter, βRx, as having an initial value as 3.14, you can do so by passing it inside a named tuple in the 3rd positional argument of fit():

fit(model, population, (; βRx = 3.14))

You can also use the helper function init_params() which will recover all the initial parameters we specified inside the model’s @param block.

7.1.3.2 Inference Algorithms

Finally, our last (4th) positional argument is the choice of inference algorithm.

Pumas has the following available inference algorithms:

  • Marginal Likelihood Estimation:

    • NaivePooled(): first order approximation without random-effects.
    • FO(): first-order approximation.
    • FOCE(): first-order conditional estimation with automatic interaction detection.
    • LaplaceI(): second-order Laplace approximation.
  • Bayesian with Markov Chain Monte Carlo (MCMC):

    • BayesMCMC(): MCMC using No-U-Turn Sampler (NUTS).
Note

We can also use a Maximum A Posteriori (MAP) estimation procedure for any marginal likelihood estimation algorithm. You just need to call the MAP() constructor with the desired marginal likelihood algorithm inside, for instance:

fit(model, population, init_params(model), MAP(FOCE()))

Ok, we are ready to fit our model. Let’s use the NaivePooled() since we are not using random-effects in this lesson:

pk_painrelief_fit =
    fit(ordinal_model, pk_painrelief_pop, init_params(ordinal_model), Pumas.NaivePooled())
[ Info: Checking the initial parameter values.
[ Info: The initial negative log likelihood and its gradient are finite. Check passed.
Iter     Function value   Gradient norm 
     0     1.230681e+04     7.478863e+05
 * time: 0.030219078063964844
     1     2.774236e+03     5.272884e+02
 * time: 0.6300721168518066
     2     2.773594e+03     5.263792e+02
 * time: 0.6343851089477539
     3     2.490586e+03     2.354869e+02
 * time: 0.6392760276794434
     4     2.427836e+03     1.586869e+02
 * time: 0.6441562175750732
     5     2.407966e+03     1.142587e+02
 * time: 0.6488771438598633
     6     2.406268e+03     1.174064e+02
 * time: 0.6535840034484863
     7     2.406173e+03     1.177969e+02
 * time: 0.6580641269683838
     8     2.406095e+03     1.177489e+02
 * time: 0.6624281406402588
     9     2.405790e+03     1.167671e+02
 * time: 0.7057530879974365
    10     2.405120e+03     1.134714e+02
 * time: 0.7098541259765625
    11     2.403414e+03     1.034002e+02
 * time: 0.7139830589294434
    12     2.399807e+03     8.350230e+01
 * time: 0.7184200286865234
    13     2.393612e+03     8.371963e+01
 * time: 0.7226791381835938
    14     2.387107e+03     6.956438e+01
 * time: 0.7265701293945312
    15     2.384192e+03     5.506164e+01
 * time: 0.7305381298065186
    16     2.383684e+03     5.470704e+01
 * time: 0.7348771095275879
    17     2.383628e+03     4.934203e+01
 * time: 0.7390711307525635
    18     2.383620e+03     4.820876e+01
 * time: 0.7440061569213867
    19     2.383580e+03     4.460237e+01
 * time: 0.7479650974273682
    20     2.383495e+03     3.989466e+01
 * time: 0.7524080276489258
    21     2.383255e+03     3.945416e+01
 * time: 0.7568280696868896
    22     2.382653e+03     4.049661e+01
 * time: 0.7610960006713867
    23     2.381116e+03     4.113760e+01
 * time: 0.7661290168762207
    24     2.377501e+03     6.052250e+01
 * time: 0.8039321899414062
    25     2.370218e+03     9.150939e+01
 * time: 0.8081130981445312
    26     2.360024e+03     1.181281e+02
 * time: 0.8125371932983398
    27     2.352292e+03     1.177431e+02
 * time: 0.8165960311889648
    28     2.349688e+03     9.926825e+01
 * time: 0.820594072341919
    29     2.349433e+03     8.973818e+01
 * time: 0.8245620727539062
    30     2.349410e+03     8.801720e+01
 * time: 0.828747034072876
    31     2.349370e+03     8.603872e+01
 * time: 0.8328931331634521
    32     2.349276e+03     8.310529e+01
 * time: 0.8370840549468994
    33     2.349025e+03     7.793090e+01
 * time: 0.8412411212921143
    34     2.348392e+03     6.897156e+01
 * time: 0.8453681468963623
    35     2.346829e+03     5.322713e+01
 * time: 0.8492710590362549
    36     2.343358e+03     5.614960e+01
 * time: 0.8532280921936035
    37     2.337197e+03     5.563204e+01
 * time: 0.8576991558074951
    38     2.330256e+03     3.881416e+01
 * time: 0.8853480815887451
    39     2.326721e+03     1.838918e+01
 * time: 0.8896830081939697
    40     2.326173e+03     4.369402e+00
 * time: 0.8940510749816895
    41     2.326149e+03     3.957062e-01
 * time: 0.8984510898590088
    42     2.326148e+03     3.741394e-02
 * time: 0.902705192565918
    43     2.326148e+03     1.798433e-03
 * time: 0.906663179397583
    44     2.326148e+03     5.160531e-05
 * time: 0.9107761383056641
FittedPumasModel

Successful minimization:                      true

Likelihood approximation:              NaivePooled
Likelihood Optimizer:                         BFGS
Dynamical system type:          No dynamical model

Log-likelihood value:                    -2326.148
Number of subjects:                            160
Number of parameters:         Fixed      Optimized
                                  0              4
Observation records:         Active        Missing
    PainScore:                 1920              0
    Total:                     1920              0

-----------------
        Estimate
-----------------
α₁      -3.0216
α₂      -0.89655
α₃       1.0973
βRx      1.2369
-----------------

7.1.4 Perform inference on the model’s population-level parameters

Once the model is fitted, we can analyze our inference and estimates. But first, let’s discuss what the is the meaning of the estimated coefficients in a ordinal regression model.

7.1.4.1 Interpreting the Coefficients

We can see that after the model is fitted, it prints a result with a summary and a table of the parameter estimates.

We can also recover the estimates as a named tuple with coef():

coef(pk_painrelief_fit)
(α₁ = -3.021616028925512,
 α₂ = -0.8965453588452162,
 α₃ = 1.0972549105426705,
 βRx = 1.2368716303859093,)

Or as a DataFrame with coeftable():

coeftable(pk_painrelief_fit)
4×2 DataFrame
Row parameter estimate
String Float64
1 α₁ -3.02162
2 α₂ -0.896545
3 α₃ 1.09725
4 βRx 1.23687

Remember that the ordinal regression’s coefficients are the log-cumulative-odds of the cut points, i.e. the natural logarithm of the odds of \(y\) taking values less or equal to \(k\).

Log-cumulative-odds provide the facility that anything that is negative pulls down the odds (and, consequently, the probability) of \(y\) taking higher values. Anything positive pulls it up; and zero is a neutral effect. But since it is in a log-scale it is not easily interpretable. This is why, in order for interpret it, often we undo the log transformation by exponentiating the log-cumulative-odds, i.e. exponentiating the coefficients.

We can do that easily with the DataFrame returned by coeftable():

@rtransform coeftable(pk_painrelief_fit) :estimate_exp = exp(:estimate)
4×3 DataFrame
Row parameter estimate estimate_exp
String Float64 Float64
1 α₁ -3.02162 0.0487224
2 α₂ -0.896545 0.407977
3 α₃ 1.09725 2.99593
4 βRx 1.23687 3.44482

Now let us interpret the :estimate_exp column.

Our basal rates αₖs are log-cumulative-odds of the \(k\) cut points, i.e. the odds of \(y\) taking value less or equal than \(k\). For example α₁ has odds approximate to 0.05. That means that the odds of \(y \leq 1 = 0.05\).

Note

Ordinal regression, like linear and logistic regression, has the assumption that the effects are a linear combination. That means that a prediction can be made by having the basal rates αₖs and keep adding covariates values multiplied their β coefficients. This assumption also holds for parameter/model inference. Hence, all of our coefficients represent compounding effects into our dependent variable \(y\).

What we accomplish with the basal rates αₖs is just that we can violate the equidistant assumption.

As an example for the interpretation of coefficients, let us interpret the βRx: the covariate isRx’s treatment (non-placebo) coefficient. It’s value is approximate 3.44, which means that for treatment we can expect an increase in 2.44x in the odds of having higher values of perceived pain relief against placebo.

Tip

If you ever heard of odds ratio or relative risk, this is exactly what we are measuring here with our exponentiated log-cumulative-odds. It measures the odds ratio between having a dependent variable higher values over having lower values.

We can also inspect our estimated coefficients’ standard errors (SE) along with the 95% confidence intervals with the infer() function:

infer(pk_painrelief_fit)
[ Info: Calculating: variance-covariance matrix.
[ Info: Done.
Asymptotic inference results using sandwich estimator

Successful minimization:                      true

Likelihood approximation:              NaivePooled
Likelihood Optimizer:                         BFGS
Dynamical system type:          No dynamical model

Log-likelihood value:                    -2326.148
Number of subjects:                            160
Number of parameters:         Fixed      Optimized
                                  0              4
Observation records:         Active        Missing
    PainScore:                 1920              0
    Total:                     1920              0

----------------------------------------------------------
       Estimate         SE                  95.0% C.I.
----------------------------------------------------------
α₁     -3.0216        0.20597        [-3.4253 ; -2.6179 ]
α₂     -0.89655       0.19315        [-1.2751 ; -0.51798]
α₃      1.0973        0.19705        [ 0.71104;  1.4835 ]
βRx     1.2369        0.21954        [ 0.80659;  1.6672 ]
----------------------------------------------------------

Also if you prefer other confidence interval band, you can choose with the keyword argument level inside infer().

For instance, one common band for the confidence intervals are 89%:

infer(pk_painrelief_fit; level = 0.89)
[ Info: Calculating: variance-covariance matrix.
[ Info: Done.
Asymptotic inference results using sandwich estimator

Successful minimization:                      true

Likelihood approximation:              NaivePooled
Likelihood Optimizer:                         BFGS
Dynamical system type:          No dynamical model

Log-likelihood value:                    -2326.148
Number of subjects:                            160
Number of parameters:         Fixed      Optimized
                                  0              4
Observation records:         Active        Missing
    PainScore:                 1920              0
    Total:                     1920              0

----------------------------------------------------------
       Estimate         SE                  89.0% C.I.
----------------------------------------------------------
α₁     -3.0216        0.20597        [-3.3508 ; -2.6924 ]
α₂     -0.89655       0.19315        [-1.2052 ; -0.58785]
α₃      1.0973        0.19705        [ 0.78233;  1.4122 ]
βRx     1.2369        0.21954        [ 0.88601;  1.5877 ]
----------------------------------------------------------

7.1.5 Predict from a fitted model or Simulate random observations from a fitted model

Now that we understand our model estimates, we can use it to make predictions or simulate random observations.

7.1.5.1 Predictions with predict()

In order to calculate prediction from a fitted model, we need to use the predict() function and passing two positional arguments:

  1. a fitted Pumas model.
  2. either a single Subject or a collection of them with a Population.

For example to make predictions onto a single Subject, we can:

my_subject = Subject(id = 42, covariates = (; PainScore = 2, isRx = 0))
Subject
  ID: 42
  Covariates: PainScore, isRx
Caution

predict() also needs a keyword argument obstimes which represents the observation times that we want to predict the dependent variable. The value must be a vector. Since we used a dummy 0.0 value as :time variable in our read_pumas function, we can pass the same 0.0 but as a vector to the predict() function.

predict(pk_painrelief_fit, my_subject; obstimes = [0.0])
SubjectPrediction
  ID: 42
  Predictions: PainScore: (n=1)
  Covariates: PainScore, isRx
  Time: [0.0]

We can also generate prediction for a Population:

predict(pk_painrelief_fit, pk_painrelief_pop; obstimes = [0.0])
Prediction
  Subjects: 160
  Predictions: PainScore
  Covariates: isRx
Tip

We can also use the function probstable() for discrete models. It provides an easy and elegant way of outputting outcome probabilities of all discrete dependent variables in a handy DataFrame format.

See below, that we have the :PainScore_iprob1 up to :PainScore_iprob4 columns. These columns take the following formatting: :DEPVAR_iprobVALUE. Since PainScore can take four discrete outcomes, i.e. it can take values from 1 to 4, :PainScore_iprob1 maps to PainScore taking value 1 and so on.

probs_table = probstable(pk_painrelief_fit)
first(probs_table, 5)
5×6 DataFrame
Row id time PainScore_iprob1 PainScore_iprob2 PainScore_iprob3 PainScore_iprob4
String Float64 Float64 Float64 Float64 Float64
1 1 0.0 0.143718 0.440551 0.327394 0.0883358
2 1 0.5 0.143718 0.440551 0.327394 0.0883358
3 1 1.0 0.143718 0.440551 0.327394 0.0883358
4 1 1.5 0.143718 0.440551 0.327394 0.0883358
5 1 2.0 0.143718 0.440551 0.327394 0.0883358

7.1.6 Simulations with simobs()

We can generate random observations from our fitted Pumas model with the simobs() function. You need to supply at least three positional arguments:

  1. a non-fitted Pumas model, in our case ordinal_model.
  2. either a single Subject or a collection of them with a Population.
  3. a named tuple of parameter values.

The 1st and 2nd positional arguments we already covered in our previous examples. Let’s focus on the 3rd positional argument: param. It can be a named tuple with the parameter values. For instance, we can use our estimated parameters from the fitted model with coef(nausea_fit):

coef(pk_painrelief_fit)
(α₁ = -3.021616028925512,
 α₂ = -0.8965453588452162,
 α₃ = 1.0972549105426705,
 βRx = 1.2368716303859093,)
simobs(ordinal_model, pk_painrelief_pop, coef(pk_painrelief_fit));
Note

You can also pass a single parameter as a Pumas fitted model to simobs().

pk_painrelief_sim = simobs(pk_painrelief_fit);

7.1.7 Model diagnostics

Finally, our last step is to assess model diagnostics.

7.1.7.1 Assessing model diagnostics

To assess model diagnostics we can use the inspect() function in our fitted Pumas model:

inspect(pk_painrelief_fit)
[ Info: Calculating predictions.
[ Info: Calculating empirical bayes.
[ Info: Evaluating individual parameters.
[ Info: Evaluating dose control parameters.
[ Info: Done.
FittedPumasModelInspection

Fitting was successful: true
7.1.7.1.1 AIC

We can also check for Akaike Information Criterion (AIC):

aic(pk_painrelief_fit)
4660.296063423193
7.1.7.1.2 BIC

Besides AIC, there is also the Bayesian Information Criterion (BIC):

bic(pk_painrelief_fit)
4682.536385283281

7.2 Visual predictive checks

To conclude, we can inspect visual predictive checks with the vpc_plot() function. But first, we need to generate a VPC object with the vpc() function.

By default, vpc will use :time as the independent variable for VPC. We need to change that, because or :time variable is just a vector of 0s in the ordinal regression.

Let’s use :isRx:

pk_painrelief_vpc = vpc(pk_painrelief_fit; covariates = [:isRx]);

Now, we need to use the vpc_plot function into our newly created VPC object. For this, we need to import the package PumasUtilities:

using PumasUtilities
plots = vpc_plot(pk_painrelief_vpc; paginate = true, separate = true);
plots[1]

plots[4]

plots[2]

plots[3]

8 Conclusion

I hope you enjoyed this tutorial on ordinal regression. Whenever your dependent variable is ordinal, there’s a great chance that it violates the equidistant assumption. This is where ordinal regression comes to rescue. Keep it close by in your arsenal of discrete statistical modeling techniques.