Ordinal Regression

Author

Jose Storopoli

Ordinal regression is a regression model for discrete data, and more specifically, when the dependent variable values have a “natural order”.

For example, opinion pools with their ubiquously disagree-agree range of plausible values, or a patient pain perception.

Note

There are several resources available for a deep dive into ordinal regression. Specifically, we recommend:

Richard McElreath - Statistical Rethinking, 2nd Edition, Section 12.3.
Agresti, A. (2014). Categorical Data Analysis, Section 8.2.
Bürkner, P.-C., & Vuorre, M. (2019). Ordinal Regression Models in Psychology: A Tutorial. Advances in Methods and Practices in Psychological Science, 77–101. 10.1177/2515245918823199.

1 Why not just use Linear Regression?

The main reason for not using plain linear regression with ordinal outcomes (dependent variable) is that the response categories of an ordinal variable may not be equidistant. This is an assumption in linear regression (and in almost all models that uses metric dependent variables): the distance between, for example, 2 and 3 is not the same distance between 1 and 2. This assumption can be violated in an ordinal regression.

2 How to deal with Ordered Discrete Dependent Variable?

So, how we deal with ordered discrete responses in our dependent variable? This is similar with the previous logistic regression approach. We have to do a non-linear transformation of the dependent variable.

2.1 Cumulative Distribution Function (CDF)

In the case of ordinal regression, we need to first transform the dependent variable into a cumulative scale. We need to calculate the cumulative distribution function (CDF) of our dependent variable:

\[P(Y \leq y) = \sum^y_{i=y_{\text{min}}} P(Y = i)\]

The CDF is a monotonic increasing function that depicts the probability of our random variable $Y$ taking values less than a certain value. In our case, the discrete ordinal case, these values can be represented as positive integers ranging from 1 to the length of possible values. For instance, a 6-categorical ordered response variable will have 6 possible values, and all their CDFs will lie between 0 and 1. Furthermore, their sum will be 1; since it represents the total probability of the variable taking any of the possible values, i.e. 100%.

2.2 Log-cumulative-odds

That is still not enough, we need to apply the logit function to the CDF:

\[\mathrm{logit}(x) = \mathrm{logistic}^{-1}(x) = \ln\left(\frac{x}{1 -x}\right)\]

where $\ln$ is the natural logarithm function.

The logit is the inverse of the logistic transformation, it takes as a input any number between 0 and 1 (where a probability is the perfect candidate) and outputs a real number, which we call the log-odds.

Tip

Odds provide a measure of the likelihood of a particular outcome. They are calculated as the ratio of the number of events that produce that outcome to the number that do not. To convert probability $p$ to $\mathrm{odds}$ you can use the formula:

\[\mathrm{odds}(p) = \frac{p}{1-p}\]

The higher the odds, the higher the probability.

Since we are taking the log-odds of the CDF, we can call this complete transformation as log-odds of the CDF, or log-cumulative-odds.

2.3 $K-1$ Intercepts

Now, the next question is: what do we do with the log-cumulative-odds?

We need the log-cumulative-odds because it allows us to construct different intercepts for the possible values our ordinal dependent variable. We create an unique intercept for each possible outcome $k \in K$.

Note

Notice that the highest probable value of $Y$ will always have a log-cumulative-odds of $\infty$, since for $p=1$:

\[\ln \frac{p}{1-p} = \ln \frac{1}{1-1} = \ln 0 = \infty\]

Thus, we only need $K-1$ intercepts for a $K$ possible dependent variables’ response values.

These are known as cut points.

Each intercept implies a CDF for each value $K$. This allows us to violate the equidistant assumption absent in most ordinal variables.

Each intercept implies a log-cumulative-odds for each $k \in K$. We also need to undo the cumulative nature of the $K-1$ intercepts. We can accomplish this by first converting a log-cumulative-odds back to a cumulative probability. This is done by undoing the logit transformation and applying the logistic function:

\[\mathrm{logit}^{-1}(x) = \mathrm{logistic}(x) = \frac{1}{1 + e^{-x}}\]

Then, finally, we can remove the cumulative from “cumulative probability” by subtraction of each of the $k$ cut points by their previous $k-1$ cut point:

\[P(Y=k) = P(Y \leq k) - P(Y \leq k-1)\]

where $Y$ is the dependent variable and $k \in K$ are the cut points for each intercept.

2.3.1 Example with Synthetic Data

Let us show an example with some synthetic data.

First, we’ll load some packages:

using Pumas
using DataFrames
using DataFramesMeta
using CairoMakie
using AlgebraOfGraphics

Here we have a discrete variable x with 6 possible ordered values as response. The values range from 1 to 6 having probability, respectively: 10%, 15%, 33%, 25%, 10%, and 7%; represented with the probs vector. The underlying distribution is represented by a Categorical distribution from Distributions.jl, which takes a vector of probabilities as parameters (probs).

For each value we are calculating:

Probability Mass Function with the pdf function
Cumulative Distribution Function with the cdf function
Log-cumulative-odds with the logit transformation of the CDF

In the plot below there are 3 subplots:

Upper corner: histogram of x
Left lower corner: CDF of x
Right lower corner: log-cumulative-odds of x

probs = [0.10, 0.15, 0.33, 0.25, 0.10, 0.07]
dist = Categorical(probs)
x = 1:length(probs)
x_pmf = pdf.(dist, x)
x_cdf = cdf.(dist, x)
x_logodds_cdf = logit.(x_cdf)
df = DataFrame(; x, x_pmf, x_cdf, x_logodds_cdf)
labels = ["CDF", "Log-cumulative-odds"]
fig = Figure()
plt1 = data(df) * mapping(:x, :x_pmf) * visual(BarPlot)
plt2 =
    data(df) *
    mapping(:x, [:x_cdf, :x_logodds_cdf]; col = dims(1) => renamer(labels)) *
    visual(ScatterLines)
axis = (; xticks = 1:6)
draw!(fig[1, 2:3], plt1; axis)
draw!(fig[2, 1:4], plt2; axis, facet = (; linkyaxes = :none))
fig

Tip

We are using the convenient logit function from StatsFuns.jl package. It is safer and more efficient than a simple user-defined function.

As we can see, we have $K-1$ (in our case $6-1=5$) intercept values in log-cumulative-odds. You can carly see that these values they violate the equidistant assumption for metric response values. The spacing between the cut points are not the same, they vary.

Note

We generally assign $\alpha$ to intercept in statistical models. Since we have $K-1$ intercepts, we need a vector of intercepts. Thus, our single parameter $\alpha$ turns into a vector parameter with a bold $\boldsymbol{\alpha}$ to denote it is a vector.

3 Adding Coefficients $\boldsymbol{\beta}$

Ok, the $K-1$ intercepts $\boldsymbol{\alpha}$ are done. Now let’s add coefficients to act as covariate effects in our ordinal regression model.

We transformed everything into log-odds scale so that we can add effects (coefficients multiplying a covariate) or basal rates (intercepts) together. For each $k \in K-1$, we calculate:

\[\phi_k = \alpha_k + \beta_i x_i\]

where $\alpha_k$ is the log-cumulative-odds for the $k \in K-1$ intercepts, $\beta_i$ is the coefficient for the $i$th covariate $x$. Finally, $\phi_k$ represents the linear predictor for the $k$th intercept.

Note

Observe that the coefficient $\beta$ is being added to a log-cumulative-odds, such that, it will be expressed in a log-cumulative-odds also.

We can express it in matrix form:

\[\boldsymbol{\phi} = \boldsymbol{\alpha} + \mathbf{X} \cdot \boldsymbol{\beta}\]

where $\boldsymbol{\phi}$, $\boldsymbol{\alpha}$ and $\boldsymbol{\beta}$ are vectors and $\mathbf{X}$ is the data matrix where each row is an observation and each column a covariate.

This still obeys the ordered constraint on the dependent variable possible values.

3.0.1 How to Interpret Coefficient $\boldsymbol{\beta}$?

Now, suppose we have found our ideal value for our $\boldsymbol{\beta}$. How we would interpret our $\boldsymbol{\beta}$ estimated values?

First, to recap, $\boldsymbol{\beta}$ measures the strength of association between the covariate $\mathbf{x}$ and dependent variable $\mathbf{y}$. But, $\boldsymbol{\beta}$ is nested inside a non-linear transformation called logistic function:

\[\mathrm{logistic}(\boldsymbol{\beta}) = \frac{1}{1 + e^{-\boldsymbol{\beta}}}\]

So, our first step is to undo the logistic function. There is a function that is called logit function that is the inverse of the logistic function:

\[\mathrm{logistic}^{-1}(x) = \mathrm{logit}(x) = \ln\left(\frac{x}{1 -x}\right)\]

where $\ln$ is the natural logarithm function.

If we analyze closely the logit function we can find that inside the $\ln$ there is a disguised odds in the $\frac{x}{1 -x}$. Hence, our $\boldsymbol{\beta}$ can be interpreted as the logarithm of the odds, or in short form: the log-odds.

Tip

\[\mathrm{odds}(p) = \frac{p}{1-p}\]

The higher the odds, the higher the probability.

Since odds take values between $(0, \infty)$, and 1 is interpreted as neutral odds (like probability 0.5); the log-odds have a nice intuition that any negative value, that is the $\ln$ of something between $(0, 1)$, enhances the probability of $y$ taking lower values, while any positive value, that is the $\ln$ of something between $(1, \infty)$, enhances the probability of $y$ taking higher values. Log-odds equal to 0 is exactly $\ln(1)$, i.e. odds as 1:1; which means neutral effect on probability of $y$ taking lower or higher values.

The log-odds are the key to interpret coefficient $\boldsymbol{\beta}$. Any positive value of $\beta$ means that there exists a positive association between $x$ and $y$, while any negative values indicate a negative association between $x$ and $y$. Values close to 0 demonstrates the lack of association between $x$ and $y$.

4 Likelihood

We have almost everything we need for our ordinal regression. We are only missing a final touch. Currently our logistic function outputs a vector of probabilities that sums to 1.

All of the intercepts $\alpha_k$ along with the coefficients $\beta_i$ are in log-cumulative-odds scale. If we apply the logistic function to the linear predictors $\phi_k$ we get $K-1$ probabilities: one for each $\phi_k$

We need a likelihood that can handle a vector of probabilities and outputs a single discrete value. The categorical distribution is the perfect candidate.

Tip

Categorical distribution is also called generalized Bernoulli distribution or multinoulli distribution.

A categorical distribution is a discrete probability distribution that describes the possible results of a random variable that can take on one of $K$ possible categories, with the probability of each category separately specified. It is **parameterized by a vector of probabilities $\mathbf{p}$ where each $p_1, \dots, p_k$ is a valid probability and sums to 1:

\[p_k \geq 0, \sum^K_{k=1} p_k = 1\]

In Pumas (and, in general, in Julia) you can specify a categorical distribution using the Categorical distribution type. By default, it takes a sequence of arguments as probabilities with the constraint that they must sum up to 1. Very similar to the “math” version.

For example here it is a categorical distribution with 4 possible discrete values and for each value probability 0.2, 0.3, 0.1, 0.4; respectively:

p = [0.2, 0.3, 0.1, 0.4]
x = 1:length(p)
d = Categorical(p)
pdf_values = map(x -> pdf(d, x), x)
data((; x, pdf_values)) *
mapping(:x => "outcome", :pdf_values => "PMF") *
visual(BarPlot) |> draw

5 Full Statistical Model

So the full statistical model with all the transformations and parametrizations we exposed so far becomes:

\[\begin{aligned} \mathbf{y} &\sim \mathrm{Categorical}(\mathbf{p}) \\ \mathbf{p} &= \mathrm{logistic}(\boldsymbol{\phi}) \\ \boldsymbol{\phi} &= \boldsymbol{\alpha} + \mathbf{X} \cdot \boldsymbol{\beta} \\ \alpha_1 &= \mathrm{CDF}(y_1) \\ \alpha_k &= \mathrm{CDF}(y_k) - \mathrm{CDF}(y_{k-1}) \quad \mathrm{for} \quad 1 < k < K-1 \\ \alpha_{K-1} &= 1 - \mathrm{CDF}(y_{K-1}) \end{aligned}\]

where:

$\mathbf{y}$: ordered discrete dependent variable
$\mathbf{p}$: probability vector of length $K$
$K$: number of possible values $\mathbf{y}$ can take, i.e. number of ordered discrete values
$\boldsymbol{\phi}$: log-cumulative-odds, i.e. cut points considering the intercepts and covariates effect
$\alpha_k$: intercept in log-cumulative-odds for each $k \in K-1$
$\mathbf{X}$: covariate data matrix
$\boldsymbol{\beta}$: coefficient vector of the same length as the number of columns in $\mathbf{X}$
$\mathrm{logistic}$: logistic function
$\mathrm{CDF}$: cumulative distribution function

Note

Please note that the first cut off is only $\alpha_1$ and the last cut point is $1 - \alpha_{\mathrm{K-1}}$.

6 `pk_painrelief` Dataset

Let’s use the pk_painrelief dataset from PharmaDatasets.jl to run our categorical regression example in this lesson:

using PharmaDatasets

pk_painrelief = dataset("pk_painrelief")
first(pk_painrelief, 5)

5×7 DataFrame

Row	Subject	Time	Conc	PainRelief	PainScore	RemedStatus	Dose
	Int64	Float64	Float64	Int64	Int64	Int64	String7
1	1	0.0	0.0	0	3	1	20 mg
2	1	0.5	1.15578	1	1	0	20 mg
3	1	1.0	1.37211	1	0	0	20 mg
4	1	1.5	1.30058	1	0	0	20 mg
5	1	2.0	1.19195	1	1	0	20 mg

We can see that we have 1920 observations for 160 different subjects and also some key variables:

:Subject: discrete variable indicating subject ID.
:Time: continuous variable indicating time in hours.
:Conc: concentration (mg/L).
:PainRelief: discrete binary variable indicating if the subject experienced pain relief.
:PainScore (dependent variable): discrete ordinal ranging from 0 to 3.
:RemedStatus: discrete binary variable indicating if the subject has requested remedication.
:Dose: dosage of the pain medication. Either placebo, 5mg, 20mg or 80mg.

6.1 Exploratory Data Analysis

As we can see, the column :PainScore, our dependent variable, takes values from 0 to 3. However, since the categorical distribution expects discrete values that ranges from 1 to $\infty$, we need add 1 to every value of :PainScore. This shifts its values, changing the range of possible values to become from 1 to 4:

pk_painrelief.PainScore |> unique |> sort

4-element Vector{Int64}:
 0
 1
 2
 3

@rtransform! pk_painrelief :PainScore = :PainScore + 1
pk_painrelief.PainScore |> unique |> sort

4-element Vector{Int64}:
 1
 2
 3
 4

Let’s explore our data with some summaries. First, we would like to know the mean and median of :PainRelief and :PainScore stratified by :Dose.

We can accomplish this with a groupby() inside @chain macro:

Note

Notice that we are using [:col1, :col2, ...] .=> [fun1 fun2 ...] inside a escaped $(...) call to the @combine macro. This will create all the possible combinations of cols and funs as summary functions to apply to our groups.

@chain pk_painrelief begin
    groupby(:Dose)
    @combine $([:PainRelief, :PainScore] .=> [mean median std])
    @orderby :PainRelief_mean
end

4×7 DataFrame

Row	Dose	PainRelief_mean	PainScore_mean	PainRelief_median	PainScore_median	PainRelief_std	PainScore_std
	String7	Float64	Float64	Float64	Float64	Float64	Float64
1	Placebo	0.283333	2.92917	0.0	3.0	0.451087	0.772156
2	5 mg	0.445833	2.58958	0.0	3.0	0.497576	0.854984
3	20 mg	0.595833	2.32917	1.0	2.0	0.491242	0.804465
4	80 mg	0.664583	2.16042	1.0	2.0	0.472628	0.823263

As we can see there is a significant correlation between :Dose, :PainRelief and :PainScore.

6.2 Visualizations

Let’s do some plotting. First, we’ll explore our dependent variable: :PainScore:

data(pk_painrelief) *
mapping(:Dose => sorter(["Placebo", "5 mg", "20 mg", "80 mg"]), :PainScore) *
AlgebraOfGraphics.expectation() |> draw

As the dose get’s higher, the perceived pain score gets lower.

Let’s talke a look at the relationship between concentration and pain relief stratified by dosage.

We need a dataset without the placebo:

pk_painrelief_drop_placebo = @rsubset pk_painrelief :Dose != "Placebo";

data(pk_painrelief_drop_placebo) *
mapping(:Conc, :PainScore; layout = :Dose => sorter(["5 mg", "20 mg", "80 mg"])) *
visual(; linewidth = 3) *
AlgebraOfGraphics.linear() |> (x -> draw(x; facet = (; linkxaxes = :none)))

We can clearily see a negative relationship between concentration and perceived pain score.

7 Pumas modeling

Finally, we’ll create a Pumas model for ordinal regression.

7.1 Pumas’ Workflow

Define a model. It can be a PumasModel or a PumasEMModel.
Define a Subject and Population.
Fit your model.
Perform inference on the model’s population-level parameters (also known as fixed effects).
Predict from a fitted model using the empirical Bayes estimate or Simulate random observations from a fitted model using the sampling distribution (also known as likelihood).
Model diagnostics and visual predictive checks.

Note

Pumas modeling is a highly iterative process. Eventually you’ll probably go back and forth in the stages of workflow. This is natural and expected in Pumas’ modeling. Just remember to have a good code organization and version control for all of the workflow iterations.

But first, notice that :Dose is a string with either 5mg, 20mg, 80mg, or placebo. We need to convert the :Dose column to a dummy variable (placebo versus treatment).

pk_painrelief.Dose |> unique

4-element Vector{InlineStrings.String7}:
 "20 mg"
 "80 mg"
 "Placebo"
 "5 mg"

@rtransform! pk_painrelief :isRx = Int(:Dose != "Placebo");

7.1.1 Defining a model in Pumas

To define a model in Pumas, you can use either the macros or the function interfaces. We will cover the macros interface in this lesson.

To define a model using the macro interface, you start with a begin .. end block of code with @model:

my_model = @model begin end

PumasModel
  Parameters:
  Random effects:
  Covariates:
  Dynamical system variables:
  Dynamical system type: No dynamical model
  Derived:
  Observed:

This creates an empty model. Now we need to populate it with model components. These are additional macros that we can include inside the @model definition. We won’t be covering all the possible macros that we can use in this lesson, but here is the full list:

@param, fixed effects specifications.
@random, random effects specifications.
@covariates, covariate names.
@pre, pre-processing variables for the dynamic system and statistical specification.
@dosecontrol, specification of any dose control parameters present in the model.
@vars, shorthand notation.
@init, initial conditions for the dynamic system.
@dynamics, dynamics of the model.
@derived, statistical modeling of dependent variables.
@observed, model information to be stored in the model solution.

7.1.1.1 Model parameter with `@param`

Ok, let’s start with our ordinal_model using the macro @model interface.

First, we’ll define our parameters with the @parameters macro:

@param begin
    ParameterName ∈ Domain(...)
    ...
end

Tip

To use the “in” (∈) operator in Julia, you can either replace ∈ for in or type \in <TAB> for the Unicode character.

By using the begin ... end block we can specify one parameter per line. We will just name the parameters the same name as it’s respective covariate, while also adding the β prefix to make clear that this is a coefficient.

Regarding the Domain(...), Pumas has several types of domains for you to specify in your @param block. Here is a list:

RealDomain for scalar parameters
VectorDomain for vectors
PDiagDomain for positive definite matrices with diagonal structure
PSDDomain for general positive semi-definite matrices

We will only use scalar parameters in our logistic regression model, so we just need the RealDomain.

Tip

If you don’t specify any arguments inside the domain constructor it will either error (for some domains that have required arguments) or will use the defaults. In the case of the RealDomain() without arguments it just uses the following arguments:

RealDomain(; lower = -∞, upper = ∞, init = 0)

@param begin
    # intercepts
    α₁ ∈ RealDomain(; init = 0.001)
    α₂ ∈ RealDomain(; init = 0.002)
    α₃ ∈ RealDomain(; init = 0.003)
    # coefficients
    βRx ∈ RealDomain()
end

7.1.1.1.1 Ordered Intercepts `αₖ`s

As you can see we opted for initial values very close to zero, but also they have another property: the initial values are ordered.

This makes it easier for the model to converge, and in our case not having these initial parameter values would not make the model converge at all.

7.1.1.2 Model covariates with @covariates

We also have to specify our model covariates using the macro @covariates. This can be done using a one-liner syntax, without begin ... end block:

@covariates Dose

Or using the begin ... end block and specifying one covariate per line. This is what we will do, since our model has several covariates, which makes easy to visualize them and also to comment out a covariate to perform iterative model exploration:

@covariates begin
    Dose
end

7.1.1.3 Pre-processing variables with `@pre`

We can specify all the necessary variable and statistical pre-processing with the @pre macro.

Note

The @pre block is traditionally used to specify the inputs of the Ordinary D ifferential Equations (ODE) system used for non-linear mixed-effects PK/PD models (NLME).

But we will use @pre to specify our variable transformations instead.

Here, we are defining all the coefficients-covariates multiplication operation here. We are also defining our linear predictors ϕₖs that should be added to the intercept and passed to the non-linear logistic function transformation to become cut points cutₖ.

Finally, we need our logistic function. It could be tricky to use our own user-defined logistic function because it would not be numerical stable. For this, we can use the logistic function from StatsFuns.jl package, which Pumas reexports to us.

Tip

To use the logistic function from StatsFuns.jl, you just need to load Pumas:

using Pumas
# now we should have logistic available
logistic(x)

So the @pre block becomes:

@pre begin
    # linear predictors in log-cumulative-odds
    isRx = (Dose != "Placebo") * βRx

    ϕ₁ = α₁ + isRx
    ϕ₂ = α₂ + isRx
    ϕ₃ = α₃ + isRx

    # cut points for >= 1, >=2 and >=3
    cut₁ = logistic(ϕ₁)
    cut₂ = logistic(ϕ₂)
    cut₃ = logistic(ϕ₃)

    # probabilities for PainRelief (Y) =1, =2, =3 and =4
    p₄ = 1.0 - cut₃
    p₃ = cut₃ - cut₂
    p₂ = cut₂ - cut₁
    p₁ = cut₁
end

7.1.1.4 Statistical modeling of dependent variables with `@derived`

Our final block, **@derived, is used to specify all the assumed distributions of observed variables that are derived from the blocks above. This is where we include our dependent variable: PainScore**.

Note that PainScore is being declared as following a Categorical distribution. This is why we use the tilde notation ~. It means (much like the mathematical model notation) that PainScore follows a Categorical distribution. Since PainScore is a vector of values, we need to broadcast, i.e. vectorize, the operation with the dot . operator:

PainScore ~ Categorical.(p₁, ..., pₖ)

where p₁, ..., pₖ are the probabilities of PainScore taking values 1, ..., k by subtracting the cut points. This is how we parametrize a Categorical distribution. With @. vectorization this becomes:

PainScore ~ @. Categorical(p₁, p₂, p₃, p₄)

Note

We are using the @. macro which tells Julia to vectorize (add the “dot syntax”) to all operations and function calls to the right of it.

Tip

In this block we can use all variables defined in the previous blocks, in our example the @param, @covariates and @pre blocks.

Here is our full model, once we combined all the macro blocks we just covered:

ordinal_model = @model begin
    @param begin
        # intercepts
        α₁ ∈ RealDomain(; init = 0.001)
        α₂ ∈ RealDomain(; init = 0.002)
        α₃ ∈ RealDomain(; init = 0.003)
        # coefficients
        βRx ∈ RealDomain()
    end

    @covariates begin
        isRx
    end

    @pre begin
        # linear predictors in log-cumulative-odds
        _isRx = isRx * βRx

        ϕ₁ = α₁ + _isRx
        ϕ₂ = α₂ + _isRx
        ϕ₃ = α₃ + _isRx

        # cut points for >= 1, >=2 and >=3
        cut₁ = logistic(ϕ₁)
        cut₂ = logistic(ϕ₂)
        cut₃ = logistic(ϕ₃)

        # probabilities for PainRelief (Y) =1, =2, =3 and =4
        p₄ = 1.0 - cut₃
        p₃ = cut₃ - cut₂
        p₂ = cut₂ - cut₁
        p₁ = cut₁
    end

    @derived begin
        PainScore ~ @. Categorical(p₁, p₂, p₃, p₄)
    end
end

PumasModel
  Parameters: α₁, α₂, α₃, βRx
  Random effects:
  Covariates: isRx
  Dynamical system variables:
  Dynamical system type: No dynamical model
  Derived: PainScore
  Observed: PainScore

Caution

In this model example we have simplified some things for the sake of the tutorial. If you want a fail proof model, you’ll probably would want to add positive constraints on the pₖs to make sure that they are a valid probability (non-negative).

7.1.2 Define a `Subject` and `Population`

Once we have our model defined we have to specify a Subject and a Population.

In Pumas, subjects (i.e. observations) are represented by the Subject type and collections of subjects are represented as vectors of Subjects are defined as Population.

Subjects can be constructed with the Subject() constructor, for example:

Subject()

Subject
  ID: 1

We just constructed a Subject that has ID equal to 1 and no extra information. In order to provide extra information, you would need to pass keyword arguments to the Subject() constructor.

You can use the following keyword arguments inside Subject():

id: identifier.
observations: a named tuple of the dependent variables.
covariates: a named tuple containing the covariates, or nothing.
events: a vector of Events.
time: a vector of time stamps for the observations.

For this lesson, we will just use the covariates keyword argument.

This is an example of a Subject with ID equal to 42 and with some covariates specified as a named tuple:

Subject(id = 42, covariates = (; PainScore = 2, isRx = 0))

Subject
  ID: 42
  Covariates: PainScore, isRx

Since a Population is just a vector of Subjects, we can use a simple map() function over the IDs present at our nausea dataset:

pop = map(
    i -> Subject(
        id = i,
        covariates = (; PainScore = pk_painrelief.PainScore, isRx = pk_painrelief.isRx),
    ),
    1:nrow(pk_painrelief),
)

Population
  Subjects: 1920
  Covariates: PainScore, isRx
  Observations:

7.1.2.1 Reading `Subject`s directly from a `DataFrame`

Another option is to use the read_pumas() function to read Subjects (and Population) directly from a DataFrame.

This is more convenient than the map() with the Subject() constructor inside.

The read_pumas() function accepts as first argument a DataFrame followed by the following keyword options (which are very similar to the Subject() constructor):

observations: dependent variables specified by a vector of column names, i.e. [:PainScore].
covariates: covariates specified by a vector of column names, i.e. [:Dose_5mg, :Dose_20mg, ...].
id: specifies the ID column of the DataFrame.
time: specifies the time column of the DataFrame.
event_data: toggles assertions applicable to event data, either true or `false.

Our pk_painrelief dataset do not have any event data, so we should set event_data=false.

Now, we can create our population with read_pumas():

pk_painrelief_pop = read_pumas(
    pk_painrelief;
    observations = [:PainScore],
    covariates = [:isRx],
    id = :Subject,
    time = :Time,
    event_data = false,
)

Population
  Subjects: 160
  Covariates: isRx
  Observations: PainScore

7.1.3 Fit your model

Now we are ready to fit our model! We already have a model specified, ordinal_model, along with a Population: nausea_pop. We can proceed with model fitting.

Model fiting in Pumas has the purpose of estimate parameters and is done by calling the fit() function with the following positional arguments:

Pumas model.
a Population.
a named tuple of the initial parameter’s values.
an inference algorithm.

7.1.3.1 Initial Parameter’s Values

We already covered model and Population, now let’s talk about initial parameter’s values. It is the 3rd positional argument inside fit().

You can specify you initial parameters as a named tuple. For instance, if you want to have a certain parameter, βRx, as having an initial value as 3.14, you can do so by passing it inside a named tuple in the 3rd positional argument of fit():

fit(model, population, (; βRx = 3.14))

You can also use the helper function init_params() which will recover all the initial parameters we specified inside the model’s @param block.

7.1.3.2 Inference Algorithms

Finally, our last (4th) positional argument is the choice of inference algorithm.

Pumas has the following available inference algorithms:

Marginal Likelihood Estimation:
- NaivePooled(): first order approximation without random-effects.
- FO(): first-order approximation.
- FOCE(): first-order conditional estimation with automatic interaction detection.
- LaplaceI(): second-order Laplace approximation.
Bayesian with Markov Chain Monte Carlo (MCMC):
- BayesMCMC(): MCMC using No-U-Turn Sampler (NUTS).

Note

We can also use a Maximum A Posteriori (MAP) estimation procedure for any marginal likelihood estimation algorithm. You just need to call the MAP() constructor with the desired marginal likelihood algorithm inside, for instance:

fit(model, population, init_params(model), MAP(FOCE()))

Ok, we are ready to fit our model. Let’s use the NaivePooled() since we are not using random-effects in this lesson:

pk_painrelief_fit =
    fit(ordinal_model, pk_painrelief_pop, init_params(ordinal_model), Pumas.NaivePooled())

[ Info: Checking the initial parameter values.
[ Info: The initial negative log likelihood and its gradient are finite. Check passed.
Iter     Function value   Gradient norm 
     0     1.230681e+04     7.478863e+05
 * time: 0.04258298873901367
     1     2.774236e+03     5.272884e+02
 * time: 1.644953966140747
     2     2.773594e+03     5.263792e+02
 * time: 1.6495308876037598
     3     2.490586e+03     2.354869e+02
 * time: 1.6539108753204346
     4     2.427836e+03     1.586869e+02
 * time: 1.6582388877868652
     5     2.407966e+03     1.142587e+02
 * time: 1.6625559329986572
     6     2.406268e+03     1.174064e+02
 * time: 1.6669738292694092
     7     2.406173e+03     1.177969e+02
 * time: 1.6713788509368896
     8     2.406095e+03     1.177489e+02
 * time: 1.6758089065551758
     9     2.405790e+03     1.167671e+02
 * time: 1.6802430152893066
    10     2.405120e+03     1.134714e+02
 * time: 1.6847479343414307
    11     2.403414e+03     1.034002e+02
 * time: 1.688459873199463
    12     2.399807e+03     8.350230e+01
 * time: 1.6919159889221191
    13     2.393612e+03     8.371963e+01
 * time: 1.6954119205474854
    14     2.387107e+03     6.956438e+01
 * time: 1.6988279819488525
    15     2.384192e+03     5.506164e+01
 * time: 1.7026519775390625
    16     2.383684e+03     5.470704e+01
 * time: 1.7060580253601074
    17     2.383628e+03     4.934203e+01
 * time: 1.7102458477020264
    18     2.383620e+03     4.820876e+01
 * time: 1.7136590480804443
    19     2.383580e+03     4.460237e+01
 * time: 1.7170979976654053
    20     2.383495e+03     3.989466e+01
 * time: 1.7205069065093994
    21     2.383255e+03     3.945416e+01
 * time: 1.7239048480987549
    22     2.382653e+03     4.049661e+01
 * time: 1.727301836013794
    23     2.381116e+03     4.113760e+01
 * time: 1.731097936630249
    24     2.377501e+03     6.052250e+01
 * time: 1.7344889640808105
    25     2.370218e+03     9.150939e+01
 * time: 1.7396998405456543
    26     2.360024e+03     1.181281e+02
 * time: 1.7445290088653564
    27     2.352292e+03     1.177431e+02
 * time: 1.8848538398742676
    28     2.349688e+03     9.926825e+01
 * time: 1.8897879123687744
    29     2.349433e+03     8.973818e+01
 * time: 1.892976999282837
    30     2.349410e+03     8.801720e+01
 * time: 1.8961210250854492
    31     2.349370e+03     8.603872e+01
 * time: 1.8993020057678223
    32     2.349276e+03     8.310529e+01
 * time: 1.9024698734283447
    33     2.349025e+03     7.793090e+01
 * time: 1.905622959136963
    34     2.348392e+03     6.897156e+01
 * time: 1.9088549613952637
    35     2.346829e+03     5.322713e+01
 * time: 1.9120519161224365
    36     2.343358e+03     5.614960e+01
 * time: 1.9153048992156982
    37     2.337197e+03     5.563204e+01
 * time: 1.918630838394165
    38     2.330256e+03     3.881416e+01
 * time: 1.9219720363616943
    39     2.326721e+03     1.838918e+01
 * time: 1.9252240657806396
    40     2.326173e+03     4.369402e+00
 * time: 1.9280798435211182
    41     2.326149e+03     3.957062e-01
 * time: 1.9309420585632324
    42     2.326148e+03     3.741394e-02
 * time: 1.9339768886566162
    43     2.326148e+03     1.798433e-03
 * time: 1.9371168613433838
    44     2.326148e+03     5.160531e-05
 * time: 1.94028902053833

FittedPumasModel

Dynamical system type:          No dynamical model

Number of subjects:                            160

Observation records:         Active        Missing
    PainScore:                 1920              0
    Total:                     1920              0

Number of parameters:      Constant      Optimized
                                  0              4

Likelihood approximation:              NaivePooled
Likelihood optimizer:                         BFGS

Termination Reason:                   GradientNorm
Log-likelihood value:                    -2326.148

---------------
      Estimate
---------------
α₁    -3.0216
α₂    -0.89655
α₃     1.0973
βRx    1.2369
---------------

7.1.4 Perform inference on the model’s population-level parameters

Once the model is fitted, we can analyze our inference and estimates. But first, let’s discuss what the is the meaning of the estimated coefficients in a ordinal regression model.

7.1.4.1 Interpreting the Coefficients

We can see that after the model is fitted, it prints a result with a summary and a table of the parameter estimates.

We can also recover the estimates as a named tuple with coef():

coef(pk_painrelief_fit)

(α₁ = -3.021616028925513,
 α₂ = -0.8965453588452168,
 α₃ = 1.0972549105426699,
 βRx = 1.23687163038591,)

Or as a DataFrame with coeftable():

coeftable(pk_painrelief_fit)

4×3 DataFrame

Row	parameter	constant	estimate
	String	Bool	Float64
1	α₁	false	-3.02162
2	α₂	false	-0.896545
3	α₃	false	1.09725
4	βRx	false	1.23687

Remember that the ordinal regression’s coefficients are the log-cumulative-odds of the cut points, i.e. the natural logarithm of the odds of $y$ taking values less or equal to $k$.

Log-cumulative-odds provide the facility that anything that is negative pulls down the odds (and, consequently, the probability) of $y$ taking higher values. Anything positive pulls it up; and zero is a neutral effect. But since it is in a log-scale it is not easily interpretable. This is why, in order for interpret it, often we undo the log transformation by exponentiating the log-cumulative-odds, i.e. exponentiating the coefficients.

We can do that easily with the DataFrame returned by coeftable():

@rtransform coeftable(pk_painrelief_fit) :estimate_exp = exp(:estimate)

4×4 DataFrame

Row	parameter	constant	estimate	estimate_exp
	String	Bool	Float64	Float64
1	α₁	false	-3.02162	0.0487224
2	α₂	false	-0.896545	0.407977
3	α₃	false	1.09725	2.99593
4	βRx	false	1.23687	3.44482

Now let us interpret the :estimate_exp column.

Our basal rates αₖs are log-cumulative-odds of the $k$ cut points, i.e. the odds of $y$ taking value less or equal than $k$. For example α₁ has odds approximate to 0.05. That means that the odds of $y \leq 1 = 0.05$.

Note

Ordinal regression, like linear and logistic regression, has the assumption that the effects are a linear combination. That means that a prediction can be made by having the basal rates αₖs and keep adding covariates values multiplied their β coefficients. This assumption also holds for parameter/model inference. Hence, all of our coefficients represent compounding effects into our dependent variable $y$.

What we accomplish with the basal rates αₖs is just that we can violate the equidistant assumption.

As an example for the interpretation of coefficients, let us interpret the βRx: the covariate isRx’s treatment (non-placebo) coefficient. It’s value is approximate 3.44, which means that for treatment we can expect an increase in 2.44x in the odds of having higher values of perceived pain relief against placebo.

Tip

If you ever heard of odds ratio or relative risk, this is exactly what we are measuring here with our exponentiated log-cumulative-odds. It measures the odds ratio between having a dependent variable higher values over having lower values.

We can also inspect our estimated coefficients’ standard errors (SE) along with the 95% confidence intervals with the infer() function:

infer(pk_painrelief_fit)

[ Info: Calculating: variance-covariance matrix.
[ Info: Done.

Asymptotic inference results using sandwich estimator

Dynamical system type:          No dynamical model

Number of subjects:                            160

Observation records:         Active        Missing
    PainScore:                 1920              0
    Total:                     1920              0

Number of parameters:      Constant      Optimized
                                  0              4

Likelihood approximation:              NaivePooled
Likelihood optimizer:                         BFGS

Termination Reason:                   GradientNorm
Log-likelihood value:                    -2326.148

-------------------------------------------------
      Estimate   SE        95.0% C.I.
-------------------------------------------------
α₁    -3.0216    0.20597   [ -3.4253 ; -2.6179 ]
α₂    -0.89655   0.19315   [ -1.2751 ; -0.51798]
α₃     1.0973    0.19705   [  0.71104;  1.4835 ]
βRx    1.2369    0.21954   [  0.80659;  1.6672 ]
-------------------------------------------------

Also if you prefer other confidence interval band, you can choose with the keyword argument level inside infer().

For instance, one common band for the confidence intervals are 89%:

infer(pk_painrelief_fit; level = 0.89)

[ Info: Calculating: variance-covariance matrix.
[ Info: Done.

Asymptotic inference results using sandwich estimator

Dynamical system type:          No dynamical model

Number of subjects:                            160

Observation records:         Active        Missing
    PainScore:                 1920              0
    Total:                     1920              0

Number of parameters:      Constant      Optimized
                                  0              4

Likelihood approximation:              NaivePooled
Likelihood optimizer:                         BFGS

Termination Reason:                   GradientNorm
Log-likelihood value:                    -2326.148

-------------------------------------------------
      Estimate   SE        89.0% C.I.
-------------------------------------------------
α₁    -3.0216    0.20597   [ -3.3508 ; -2.6924 ]
α₂    -0.89655   0.19315   [ -1.2052 ; -0.58785]
α₃     1.0973    0.19705   [  0.78233;  1.4122 ]
βRx    1.2369    0.21954   [  0.88601;  1.5877 ]
-------------------------------------------------

7.1.5 Predict from a fitted model or Simulate random observations from a fitted model

Now that we understand our model estimates, we can use it to make predictions or simulate random observations.

7.1.5.1 Predictions with `predict()`

In order to calculate prediction from a fitted model, we need to use the predict() function and passing two positional arguments:

a fitted Pumas model.
either a single Subject or a collection of them with a Population.

For example to make predictions onto a single Subject, we can:

my_subject = Subject(id = 42, covariates = (; PainScore = 2, isRx = 0))

Subject
  ID: 42
  Covariates: PainScore, isRx

Caution

predict() also needs a keyword argument obstimes which represents the observation times that we want to predict the dependent variable. The value must be a vector. Since we used a dummy 0.0 value as :time variable in our read_pumas function, we can pass the same 0.0 but as a vector to the predict() function.

predict(pk_painrelief_fit, my_subject; obstimes = [0.0])

SubjectPrediction
  ID: 42
  Predictions: PainScore: (n=1)
  Covariates: PainScore, isRx
  Time: [0.0]

We can also generate prediction for a Population:

predict(pk_painrelief_fit, pk_painrelief_pop; obstimes = [0.0])

Prediction
  Subjects: 160
  Predictions: PainScore
  Covariates: isRx

Tip

We can also use the function probstable() for discrete models. It provides an easy and elegant way of outputting outcome probabilities of all discrete dependent variables in a handy DataFrame format.

See below, that we have the :PainScore_iprob1 up to :PainScore_iprob4 columns. These columns take the following formatting: :DEPVAR_iprobVALUE. Since PainScore can take four discrete outcomes, i.e. it can take values from 1 to 4, :PainScore_iprob1 maps to PainScore taking value 1 and so on.

probs_table = probstable(pk_painrelief_fit)
first(probs_table, 5)

5×6 DataFrame

Row	id	time	PainScore_iprob1	PainScore_iprob2	PainScore_iprob3	PainScore_iprob4
	String	Float64	Float64	Float64	Float64	Float64
1	1	0.0	0.143718	0.440551	0.327394	0.0883358
2	1	0.5	0.143718	0.440551	0.327394	0.0883358
3	1	1.0	0.143718	0.440551	0.327394	0.0883358
4	1	1.5	0.143718	0.440551	0.327394	0.0883358
5	1	2.0	0.143718	0.440551	0.327394	0.0883358

7.1.6 Simulations with `simobs()`

We can generate random observations from our fitted Pumas model with the simobs() function. You need to supply at least three positional arguments:

a non-fitted Pumas model, in our case ordinal_model.
either a single Subject or a collection of them with a Population.
a named tuple of parameter values.

The 1st and 2nd positional arguments we already covered in our previous examples. Let’s focus on the 3rd positional argument: param. It can be a named tuple with the parameter values. For instance, we can use our estimated parameters from the fitted model with coef(nausea_fit):

coef(pk_painrelief_fit)

(α₁ = -3.021616028925513,
 α₂ = -0.8965453588452168,
 α₃ = 1.0972549105426699,
 βRx = 1.23687163038591,)

simobs(ordinal_model, pk_painrelief_pop, coef(pk_painrelief_fit));

Note

You can also pass a single parameter as a Pumas fitted model to simobs().

pk_painrelief_sim = simobs(pk_painrelief_fit);

7.1.7 Model diagnostics

Finally, our last step is to assess model diagnostics.

7.1.7.1 Assessing model diagnostics

To assess model diagnostics we can use the inspect() function in our fitted Pumas model:

inspect(pk_painrelief_fit)

[ Info: Calculating predictions.
[ Info: Calculating empirical bayes.
[ Info: Evaluating dose control parameters.
[ Info: Evaluating individual parameters.
[ Info: Done.

FittedPumasModelInspection

7.1.7.1.1 AIC

We can also check for Akaike Information Criterion (AIC):

aic(pk_painrelief_fit)

4660.296063423194

7.1.7.1.2 BIC

Besides AIC, there is also the Bayesian Information Criterion (BIC):

bic(pk_painrelief_fit)

4682.536385283282

7.2 Visual predictive checks

To conclude, we can inspect visual predictive checks with the vpc_plot() function. But first, we need to generate a VPC object with the vpc() function.

By default, vpc will use :time as the independent variable for VPC. We need to change that, because or :time variable is just a vector of 0s in the ordinal regression.

Let’s use :isRx:

pk_painrelief_vpc = vpc(pk_painrelief_fit; covariates = [:isRx]);

Now, we need to use the vpc_plot function into our newly created VPC object. For this, we need to import the package PumasUtilities:

using PumasUtilities

plots = vpc_plot(pk_painrelief_vpc; paginate = true, separate = true);

plots[1]

plots[4]

plots[2]

plots[3]

8 Conclusion

I hope you enjoyed this tutorial on ordinal regression. Whenever your dependent variable is ordinal, there’s a great chance that it violates the equidistant assumption. This is where ordinal regression comes to rescue. Keep it close by in your arsenal of discrete statistical modeling techniques.

Reuse

CC BY-SA 4.0

Ordinal Regression

1 Why not just use Linear Regression?

2 How to deal with Ordered Discrete Dependent Variable?

2.1 Cumulative Distribution Function (CDF)

2.2 Log-cumulative-odds

2.3 \(K-1\) Intercepts

2.3.1 Example with Synthetic Data

3 Adding Coefficients \(\boldsymbol{\beta}\)

3.0.1 How to Interpret Coefficient \(\boldsymbol{\beta}\)?

4 Likelihood

5 Full Statistical Model

6 `pk_painrelief` Dataset

6.1 Exploratory Data Analysis

6.2 Visualizations

7 Pumas modeling

7.1 Pumas’ Workflow

7.1.1 Defining a model in Pumas

7.1.1.1 Model parameter with `@param`

7.1.1.1.1 Ordered Intercepts `αₖ`s

7.1.1.2 Model covariates with @covariates

7.1.1.3 Pre-processing variables with `@pre`

7.1.1.4 Statistical modeling of dependent variables with `@derived`

7.1.2 Define a `Subject` and `Population`

7.1.2.1 Reading `Subject`s directly from a `DataFrame`

7.1.3 Fit your model

7.1.3.1 Initial Parameter’s Values

7.1.3.2 Inference Algorithms

7.1.4 Perform inference on the model’s population-level parameters

7.1.4.1 Interpreting the Coefficients

7.1.5 Predict from a fitted model or Simulate random observations from a fitted model

7.1.5.1 Predictions with `predict()`

7.1.6 Simulations with `simobs()`

7.1.7 Model diagnostics

7.1.7.1 Assessing model diagnostics

7.1.7.1.1 AIC

7.1.7.1.2 BIC

7.2 Visual predictive checks

8 Conclusion

Reuse

1 Why not just use Linear Regression?

2 How to deal with Ordered Discrete Dependent Variable?

2.1 Cumulative Distribution Function (CDF)

2.2 Log-cumulative-odds

2.3 \(K-1\) Intercepts

2.3.1 Example with Synthetic Data

3 Adding Coefficients \(\boldsymbol{\beta}\)

3.0.1 How to Interpret Coefficient \(\boldsymbol{\beta}\)?

4 Likelihood

5 Full Statistical Model

6 pk_painrelief Dataset

6.1 Exploratory Data Analysis

6.2 Visualizations

7 Pumas modeling

7.1 Pumas’ Workflow

7.1.1 Defining a model in Pumas

7.1.1.1 Model parameter with @param

7.1.1.1.1 Ordered Intercepts αₖs

7.1.1.2 Model covariates with @covariates

7.1.1.3 Pre-processing variables with @pre

7.1.1.4 Statistical modeling of dependent variables with @derived

7.1.2 Define a Subject and Population

7.1.2.1 Reading Subjects directly from a DataFrame

7.1.3 Fit your model

7.1.3.1 Initial Parameter’s Values

7.1.3.2 Inference Algorithms

7.1.4 Perform inference on the model’s population-level parameters

7.1.4.1 Interpreting the Coefficients

7.1.5 Predict from a fitted model or Simulate random observations from a fitted model

7.1.5.1 Predictions with predict()

7.1.6 Simulations with simobs()

7.1.7 Model diagnostics

7.1.7.1 Assessing model diagnostics

7.1.7.1.1 AIC

7.1.7.1.2 BIC

7.2 Visual predictive checks

8 Conclusion

Reuse

6 `pk_painrelief` Dataset

7.1.1.1 Model parameter with `@param`

7.1.1.1.1 Ordered Intercepts `αₖ`s

7.1.1.3 Pre-processing variables with `@pre`

7.1.1.4 Statistical modeling of dependent variables with `@derived`

7.1.2 Define a `Subject` and `Population`

7.1.2.1 Reading `Subject`s directly from a `DataFrame`

7.1.5.1 Predictions with `predict()`

7.1.6 Simulations with `simobs()`