using Pumas
using Pumas.Experimental: FiniteMarkov
using PumasUtilities
using PharmaDatasets
using DataFramesMeta
using CairoMakie
using AlgebraOfGraphics
using GLM: softmax
using SummaryTables
using Printf
using PumasReports
Discrete-Time Markov Models
FiniteMarkov is an experimental feature introduced in Pumas 2.8. The API may change in future releases without deprecation.
1 Introduction
In clinical trials and observational studies, we frequently encounter categorical outcomes that evolve over time. Consider the following scenarios:
- A patient’s disease severity recorded weekly as “none”, “mild”, “moderate”, or “severe”
- Treatment response assessed at each visit as “no response”, “partial response”, or “full response”
- An adverse event grade tracked daily from “absent” through “grade 1” to “grade 3”
- Sleep stages cycling through “awake”, “light sleep”, “deep sleep”, and “REM”
A common feature of these data is that the current observation depends heavily on the previous one. A patient who was in “mild” disease at the last visit is far more likely to remain in “mild” at the next visit than to suddenly jump to “severe”. This dependence between consecutive observations is called the Markov property, and models designed to capture it are called Markov models.
In this tutorial, we will learn how to build and fit discrete-time Markov models (DTMMs) in Pumas using the FiniteMarkov distribution. We will work through a complete clinical example — from recognizing Markovian features in data to fitting a PK-driven Markov model, interpreting the results, and evaluating the model.
2 What Makes Data “Markovian”?
Before building a Markov model, it helps to understand what Markovian data looks like and why standard approaches fall short.
2.1 Recognizing Markovian Features
Data with Markovian features are characterized by three signatures (Ooi, Plan & Bergstrand, CPT:PSP 2025):
- Frequent observations relative to changes — observations are collected more often than the outcome is expected to change. This leads to long runs of the same category.
- Many consecutive same-category observations — at the individual level, you see strings of identical values (e.g., “mild, mild, mild, moderate, moderate, moderate”).
- Positive correlation between current and previous values — a scatter plot of \(Y_t\) versus \(Y_{t-1}\) shows a clear positive trend.
2.2 Why Not Use Standard Models?
Standard categorical models (e.g., proportional odds, logistic regression) assume that observations are independent. When the data have Markovian features, ignoring this dependence can cause:
- Inflated Type I error in hypothesis tests — you may falsely identify covariate effects (e.g., a placebo effect) that do not exist
- Overestimated inter-individual variability (IIV) — the model absorbs the serial correlation into IIV
- Unrealistic simulations — simulated data will show too many state changes, inflated extreme-value occurrences, and under-predicted time spent in the same state
A dedicated Markov model solves these problems by explicitly conditioning each observation on the previous state.
3 The Markov Chain: Core Concepts
3.1 States and Transitions
A discrete-time Markov chain is a stochastic process where a subject occupies one of \(K\) discrete states at each time point. The probability of being in any state at time \(t\) depends only on the state at time \(t - 1\) — not on any earlier history. This is called the first-order Markov property:
\[ P(Y_t = k_t \mid Y_0, Y_1, \ldots, Y_{t-1}) = P(Y_t = k_t \mid Y_{t-1}) \]
“First-order” refers to memory depth, not step size. It means the next state depends only on the current state — not on the full history of previous states. A subject can jump from any state to any other state in a single time step; the transition probabilities determine how likely each jump is. If you want to enforce that only adjacent transitions are possible (e.g., a patient cannot skip from “none” to “severe”), you would explicitly set those transition probabilities to zero by fixing the corresponding logits to \(-\infty\). That is a modeling choice, not a property of first-order Markov chains.
Higher-order Markov models exist (where the predicted value depends on two or more previous states), but they require many more parameters and their results are harder to communicate. In pharmacometric practice, first-order models are almost always sufficient.
3.2 The Transition Probability Matrix
The transition probabilities are collected in a transition probability matrix \(P\). For \(K\) states, \(P\) is a \(K \times K\) matrix where each entry \(P_{ij}\) is the probability of moving from source state \(i\) to destination state \(j\):
\[ P = \begin{bmatrix} P_{11} & P_{12} & \cdots & P_{1K} \\ P_{21} & P_{22} & \cdots & P_{2K} \\ \vdots & \vdots & \ddots & \vdots \\ P_{K1} & P_{K2} & \cdots & P_{KK} \end{bmatrix} \]
Each row represents a complete probability distribution: “Given that a subject is currently in state \(i\), what is the probability of transitioning to each possible state?” Because the subject must end up somewhere (including staying in the same state), each row sums to 1.
Let’s consider a concrete example with three disease states (no response, partial response, full response). A plausible transition matrix might look like:
| From To | No Response | Partial | Full |
|---|---|---|---|
| No Response | 0.85 | 0.12 | 0.03 |
| Partial | 0.05 | 0.80 | 0.15 |
| Full | 0.02 | 0.08 | 0.90 |
Read by row — each row is a self-contained probability distribution for a subject currently in that state:
- Row 1 (No Response): A patient currently in “No Response” has an 85% chance of staying there, a 12% chance of improving to “Partial”, and a 3% chance of jumping directly to “Full Response”. These three probabilities sum to 1.
- Row 2 (Partial): A different patient currently in “Partial” has a 5% chance of worsening to “No Response”, an 80% chance of staying in “Partial”, and a 15% chance of improving to “Full Response”.
Do not read down columns. The 0.12 in Row 1 and the 0.05 in Row 2 of the “No Response” column are about different patients in different current states. Each row is independent — you pick the row that matches the subject’s current state, then read across to find the probabilities for their next state.
The strong diagonal dominance (high same-state probabilities) is typical of Markovian data — subjects tend to stay in their current state from one observation to the next.
3.3 Absorbing States
An absorbing state is one that a subject can never leave. Common examples include dropout, death, or cure. Once reached, \(P(\text{absorbing} \mid \text{absorbing}) = 1\) and all other transitions from that state are zero.
3.4 Equidistant Time Assumption
A key assumption of the standard DTMM is that the effect of time since the last observation on transition probabilities is negligible. This works well when observations are equidistant (e.g., daily, weekly). When time intervals vary substantially, a continuous-time Markov model (CTMM) should be considered instead.
Use a DTMM when:
- Observations are at regular, equidistant intervals
- You want a straightforward, easy-to-communicate model
- The number of states is moderate (2–5)
Consider a CTMM when:
- Time intervals between observations vary
- You want to simulate at different time intervals than observed
- Time-varying covariates change between observation times
- The variable has many ordered categories (CTMM is more parsimonious)
4 Parameterizing Transitions: The Logit Approach
4.1 Why Not Estimate Probabilities Directly?
Transition probabilities must satisfy two constraints:
- Each probability is between 0 and 1
- Each row sums to 1
Pumas uses a logit parameterization that maps unconstrained real numbers to valid probabilities.
4.2 The Logistic Transformation
Consider a subject currently in source state \(i\). They can transition to any of \(K\) destination states. We need \(K\) probabilities that are each between 0 and 1 and sum to 1.
The trick is to estimate only \(K - 1\) unconstrained parameters and derive the \(K\)-th probability from the others. We pick one destination state as the reference category — by convention, the last state (state \(K\)). The \(K - 1\) estimated parameters \(\tau_{i1}, \ldots, \tau_{i(K-1)}\) are called logits and can take any value on the real line (\(-\infty\) to \(+\infty\)).
The logistic transformation converts these logits into probabilities via a shared denominator that guarantees the row sums to 1:
\[ D_i = 1 + \sum_{m=1}^{K-1} \exp(\tau_{im}) \]
where \(m\) ranges over the \(K - 1\) non-reference destination states. This denominator is the same for every destination state in the row.
Probability of transitioning to a non-reference state (\(j < K\)):
\[ P(i \to j) = \frac{\exp(\tau_{ij})}{D_i} \]
Each non-reference destination state \(j\) contributes \(\exp(\tau_{ij})\) in the numerator. A larger logit \(\tau_{ij}\) means a larger \(\exp(\tau_{ij})\) and therefore a higher probability of transitioning to state \(j\).
Probability of transitioning to the reference state (\(j = K\)):
\[ P(i \to K) = \frac{1}{D_i} \]
The reference state has no estimated logit parameter — its logit is implicitly fixed at zero, so \(\exp(0) = 1\) appears in the numerator. Its probability is whatever is “left over” after the other states claim their share.
4.3 A Worked Example
Let’s make this concrete. Suppose we have \(K = 3\) destination states, so we estimate \(K - 1 = 2\) logit parameters for this row: \(\tau_1 = 2.0\) and \(\tau_2 = -1.0\). State 3 is the reference (implicit logit = 0).
Step 1 — Compute the shared denominator:
\[ D = 1 + \exp(2.0) + \exp(-1.0) = 1 + 7.389 + 0.368 = 8.757 \]
The leading 1 comes from the reference state: \(\exp(0) = 1\).
Step 2 — Divide each \(\exp(\text{logit})\) by the denominator:
| Destination | Logit | \(\exp(\text{logit})\) | Probability = \(\exp / D\) |
|---|---|---|---|
| State 1 | 2.0 (estimated) | 7.389 | 7.389 / 8.757 = 0.844 |
| State 2 | -1.0 (estimated) | 0.368 | 0.368 / 8.757 = 0.042 |
| State 3 | 0 (reference, fixed) | 1.000 | 1.000 / 8.757 = 0.114 |
| Total | 1.000 |
The probabilities sum to 1, as required. And critically, any real values of \(\tau_1\) and \(\tau_2\) will produce valid probabilities — the optimizer is free to explore the entire real line without ever violating the constraints.
4.4 This Is the Softmax (Multinomial Logistic) Function
The transformation we just performed has a well-known name: softmax (also called the multinomial logistic function). You may already know the binary logistic function, which converts a single logit to a probability:
\[ \text{logistic}(\tau) = \frac{\exp(\tau)}{1 + \exp(\tau)} \]
The softmax is its generalization to \(K\) categories. Given a vector of logits \([\tau_1, \tau_2, \ldots, \tau_K]\), softmax returns a vector of probabilities:
\[ \text{softmax}(\boldsymbol{\tau})_j = \frac{\exp(\tau_j)}{\sum_{m=1}^{K} \exp(\tau_m)} \]
In our parameterization, we estimate \(K - 1\) logits and fix the reference state logit to 0. So the full logit vector is \([\tau_1, \ldots, \tau_{K-1}, 0]\), and softmax produces exactly the probabilities we computed by hand above.
Let’s verify with code:
# The same worked example: τ₁ = 2.0, τ₂ = -1.0, reference logit = 0
softmax([2.0, -1.0, 0.0])3-element Vector{Float64}:
0.8437947344813396
0.042010066134066056
0.1141951993845945
This matches our manual calculation (0.844, 0.042, 0.114).
When \(K = 2\) (two states), the softmax with logits \([\tau, 0]\) reduces to:
\[ P(\text{State 1}) = \frac{\exp(\tau)}{\exp(\tau) + 1} = \text{logistic}(\tau) \]
So the binary logistic function is just softmax with two categories. The multinomial logistic (softmax) generalizes this to any number of categories.
4.5 Why This Parameterization Is Useful for Drug Effects
When a drug effect is additive on the logit scale, the model is:
\[ \tau_{ij}^{\text{drug}} = \tau_{ij} + \text{slope} \times \text{concentration} \]
A positive slope increases the logit for that transition, which increases the corresponding probability and proportionally decreases all other probabilities from that state. This is both mathematically clean and clinically interpretable.
5 Loading and Exploring the Dataset
Now let’s work through a complete example. We’ll use a clinical trial dataset that records disease states over time alongside PK information about drug exposure.
df = dataset("psptutorials/psp413278/markov-discrete-time")
first(df, 10)| Row | ID | TIME | AMT | II | ADDL | DV | EVID | TREA | ETA1 | ETA2 |
|---|---|---|---|---|---|---|---|---|---|---|
| Int64 | Int64 | Int64 | Int64 | Int64 | Int64 | Int64 | Int64 | Float64 | Float64 | |
| 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | -0.0754288 | -0.761771 |
| 2 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | -0.0754288 | -0.761771 |
| 3 | 1 | 2 | 0 | 0 | 0 | 1 | 0 | 0 | -0.0754288 | -0.761771 |
| 4 | 1 | 3 | 0 | 0 | 0 | 1 | 0 | 0 | -0.0754288 | -0.761771 |
| 5 | 1 | 4 | 0 | 0 | 0 | 1 | 0 | 0 | -0.0754288 | -0.761771 |
| 6 | 1 | 5 | 0 | 0 | 0 | 1 | 0 | 0 | -0.0754288 | -0.761771 |
| 7 | 1 | 6 | 0 | 0 | 0 | 1 | 0 | 0 | -0.0754288 | -0.761771 |
| 8 | 1 | 7 | 0 | 0 | 0 | 1 | 0 | 0 | -0.0754288 | -0.761771 |
| 9 | 1 | 8 | 0 | 0 | 0 | 1 | 0 | 0 | -0.0754288 | -0.761771 |
| 10 | 1 | 9 | 0 | 0 | 0 | 1 | 0 | 0 | -0.0754288 | -0.761771 |
overview_table(df)| No | Variable | Stats / Values | Freqs (% of Valid) | Graph | Valid | Missing |
| 1 | ID [Int64] |
Mean (sd): 50.5 (28.9) min ≤ med ≤ max: 1 ≤ 51 ≤ 100 IQR (CV): 50 (0.571) |
100 distinct values | 36650 (100%) |
0 (0%) |
|
| 2 | TIME [Int64] |
Mean (sd): 182 (106) min ≤ med ≤ max: 0 ≤ 182 ≤ 365 IQR (CV): 183 (0.581) |
366 distinct values | 36650 (100%) |
0 (0%) |
|
| 3 | AMT [Int64] |
Mean (sd): 0.136 (3.69) min ≤ med ≤ max: 0 ≤ 0 ≤ 100 IQR (CV): 0 (27.1) |
2 distinct values | 36650 (100%) |
0 (0%) |
|
| 4 | II [Int64] |
Mean (sd): 0.00136 (0.0369) min ≤ med ≤ max: 0 ≤ 0 ≤ 1 IQR (CV): 0 (27.1) |
2 distinct values | 36650 (100%) |
0 (0%) |
|
| 5 | ADDL [Int64] |
Mean (sd): 0.497 (13.4) min ≤ med ≤ max: 0 ≤ 0 ≤ 364 IQR (CV): 0 (27.1) |
2 distinct values | 36650 (100%) |
0 (0%) |
|
| 6 | DV [Int64] |
Mean (sd): 1.83 (0.926) min ≤ med ≤ max: 0 ≤ 2 ≤ 4 IQR (CV): 2 (0.506) |
5 distinct values | 36650 (100%) |
0 (0%) |
|
| 7 | EVID [Int64] |
Mean (sd): 0.00136 (0.0369) min ≤ med ≤ max: 0 ≤ 0 ≤ 1 IQR (CV): 0 (27.1) |
2 distinct values | 36650 (100%) |
0 (0%) |
|
| 8 | TREA [Int64] |
Mean (sd): 0.501 (0.5) min ≤ med ≤ max: 0 ≤ 1 ≤ 1 IQR (CV): 1 (0.999) |
2 distinct values | 36650 (100%) |
0 (0%) |
|
| 9 | ETA1 [Float64] |
Mean (sd): -0.0409 (0.293) min ≤ med ≤ max: -0.742 ≤ -0.0754 ≤ 0.598 IQR (CV): 0.437 (-7.17) |
100 distinct values | 36650 (100%) |
0 (0%) |
|
| 10 | ETA2 [Float64] |
Mean (sd): 0.0188 (0.352) min ≤ med ≤ max: -0.762 ≤ 0.0396 ≤ 0.756 IQR (CV): 0.479 (18.7) |
100 distinct values | 36650 (100%) |
0 (0%) |
|
| Dimensions: 36650 x 10 Duplicate rows: 0 | ||||||
The dataset contains:
ID: Subject identifierTIME: Observation timeEVID: Event identifier (1 = dosing event, 0 = observation)AMT,ADDL,II: Dosing information (amount, additional doses, dosing interval)DV: Disease state observation (integer 1, 2, 3, or 4)ETA1,ETA2: Pre-computed individual PK random effects
This dataset includes ETA1 and ETA2 columns representing pre-computed individual PK random effects from a prior PK analysis. They are used as covariates to account for individual PK variability without estimating random effects on PK parameters within the Markov model. This sequential approach is common when the focus is on the transition model rather than PK.
5.1 Data Preparation
Before fitting, we need two data wrangling steps:
- Add a
CMT(compartment) column set to1— required byread_pumas - Set
DVtomissingfor dosing events (EVID == 1) — dosing rows carry no disease state observation
@rtransform!(df, :CMT = 1);
@rtransform!(df, :DV = :EVID == 1 ? missing : :DV);FiniteMarkov does not support missing observations in the disease state sequence. The missing values we just created for dosing events are handled correctly by read_pumas (dosing events are not part of the observation vector). However, if your dataset has missing observations at non-dosing timepoints, you must impute them before fitting.
Now we read the data into a Pumas Population:
pop = read_pumas(
df;
id = :ID,
time = :TIME,
evid = :EVID,
cmt = :CMT,
amt = :AMT,
addl = :ADDL,
ii = :II,
observations = [:DV],
covariates = [:ETA1, :ETA2],
)Population
Subjects: 100
Covariates: ETA1, ETA2 (heterogenous)
Observations: DV (heterogenous)
5.2 Exploratory Data Analysis
Good EDA is essential before fitting a Markov model. Following the recommendations of Ooi et al. (2025), we examine three aspects:
- The longitudinal profile of state values per subject
- The distribution of states over time across the population
- The correlation between current and previous states (Markov signature)
5.2.1 Longitudinal Profiles
Let’s first look at the raw observations.
obs_df = @chain df begin
@rsubset !ismissing(:DV)
@rtransform :State = Int(:DV)
@rtransform :StateStr = string(Int(:DV))
end
simple_table(first(obs_df, 5))| ID | TIME | AMT | II | ADDL | DV | EVID | TREA | ETA1 | ETA2 | CMT | State | StateStr |
| 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | -0.0754 | -0.762 | 1 | 1 | 1 |
| 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | -0.0754 | -0.762 | 1 | 1 | 1 |
| 1 | 2 | 0 | 0 | 0 | 1 | 0 | 0 | -0.0754 | -0.762 | 1 | 1 | 1 |
| 1 | 3 | 0 | 0 | 0 | 1 | 0 | 0 | -0.0754 | -0.762 | 1 | 1 | 1 |
| 1 | 4 | 0 | 0 | 0 | 1 | 0 | 0 | -0.0754 | -0.762 | 1 | 1 | 1 |
# Longitudinal state profiles for a few subjects
selected_ids = unique(obs_df.ID)[1:min(6, length(unique(obs_df.ID)))]
sub_df = @rsubset(obs_df, :ID in selected_ids)
@rtransform!(sub_df, :IDStr = string(:ID));
plt_long =
data(sub_df) *
mapping(:TIME => "Time", :State => "State"; layout = :IDStr => "Subject") *
visual(Stairs; step = :post)
draw(plt_long; axis = (; yticks = 1:4))Notice the long runs of the same state — this is the hallmark of Markovian data. Subjects tend to stay in their current state for multiple consecutive observations before transitioning.
5.2.2 Distribution of States Over Time
Next, let’s look at how the population distributes across states at each time point.
state_counts = @chain obs_df begin
groupby([:TIME, :StateStr])
@combine :Count = length(:ID)
end
plt_states =
data(state_counts) *
mapping(
:TIME => "Time",
:Count => "Number of Subjects";
color = :StateStr => "State",
layout = :StateStr,
) *
visual(BarPlot; alpha = 0.7)
draw(plt_states)5.2.3 Transition Frequency Table
One of the most informative diagnostics is a transition frequency table — a count of how often each pair of transitions occurs in the data. This directly reveals the Markov structure:
# Create current-vs-previous state pairs
obs_by_id = groupby(obs_df, :ID)
transitions = DataFrame(ID = Int[], PrevState = Int[], CurrState = Int[])
for group in obs_by_id
states = group.State
ids = group.ID
for i = 2:length(states)
push!(transitions, (ID = ids[i], PrevState = states[i-1], CurrState = states[i]))
end
end
simple_table(first(transitions, 5))| ID | PrevState | CurrState |
| 1 | 1 | 1 |
| 1 | 1 | 1 |
| 1 | 1 | 1 |
| 1 | 1 | 1 |
| 1 | 1 | 1 |
# Transition frequency matrix
trans_counts = @chain transitions begin
groupby([:PrevState, :CurrState])
@combine :N = length(:ID)
end
# Pivot to matrix form
trans_wide = unstack(trans_counts, :PrevState, :CurrState, :N)
simple_table(trans_wide)| PrevState | 1 | 2 | 3 | 4 |
| 1 | 17432 | 231 | 2 | 5 |
| 2 | 183 | 8430 | 128 | 1 |
| 3 | 1 | 109 | 8502 | 1 |
| 4 | missing | missing | missing | 1475 |
Each row represents the previous state and each column represents the current state. Look for:
- Large diagonal values → strong Markov property (subjects tend to stay)
- Asymmetry → preferred directions of transition (e.g., more 1→2 than 2→1 may indicate disease progression)
- Zero or rare transitions → these can be fixed or constrained in the model
5.2.4 Computing Initial Logit Estimates from the Data
Good initial values help the optimizer converge faster and avoid local minima. We can derive them directly from the observed transition frequencies.
Step 1 — Convert counts to row-wise probabilities. For each source state (row), divide each transition count by the row total:
# Compute observed transition probabilities from the frequency table
trans_probs = @chain transitions begin
groupby([:PrevState, :CurrState])
@combine :N = length(:ID)
end
# Add row totals
row_totals = @chain trans_probs begin
groupby(:PrevState)
@combine :RowTotal = sum(:N)
end
trans_probs = leftjoin(trans_probs, row_totals; on = :PrevState)
@rtransform!(trans_probs, :Prob = :N / :RowTotal);
# Display as a pivot table
prob_wide = unstack(trans_probs, :PrevState, :CurrState, :Prob)
simple_table(prob_wide)| PrevState | 1 | 2 | 3 | 4 |
| 1 | 0.987 | 0.0131 | 0.000113 | 0.000283 |
| 2 | 0.0209 | 0.964 | 0.0146 | 0.000114 |
| 3 | 0.000116 | 0.0127 | 0.987 | 0.000116 |
| 4 | missing | missing | missing | 1 |
Each row sums to 1 — these are the empirical transition probabilities.
Step 2 — Convert probabilities to logits. The logit for transitioning from state \(i\) to state \(j\) (relative to the reference state \(K\)) is:
\[ \tau_{ij} = \log\left(\frac{P_{ij}}{P_{iK}}\right) \]
In our 4-state model, state 4 is the reference category:
K = 4
# For each source state (row), compute logits relative to state K
init_logits = Matrix{Float64}(undef, K - 1, K - 1)
for src = 1:(K-1)
row_probs = @rsubset(trans_probs, :PrevState == src)
p_ref = only(@rsubset(row_probs, :CurrState == K).Prob)
for dst = 1:(K-1)
p = only(@rsubset(row_probs, :CurrState == dst).Prob)
init_logits[src, dst] = log(p / p_ref)
end
@printf(
"State %d → logits [τ_%d1, τ_%d2, τ_%d3] = [%8.3f, %8.3f, %8.3f]\n",
src,
src,
src,
src,
init_logits[src, 1],
init_logits[src, 2],
init_logits[src, 3]
)
endState 1 → logits [τ_11, τ_12, τ_13] = [ 8.157, 3.833, -0.916]
State 2 → logits [τ_21, τ_22, τ_23] = [ 5.209, 9.040, 4.852]
State 3 → logits [τ_31, τ_32, τ_33] = [ 0.000, 4.691, 9.048]
These data-derived logits are baseline estimates (no drug effect). They make excellent starting values for the θ parameters in the model. The drug effect slopes (slope₁₂, slope₂₃) can be initialized at zero, since the baseline logits already capture the average transition pattern.
Note that state 4 (the absorbing state) has very few transitions from it, so its row is fixed to all \(-\infty\) in the model rather than estimated.
6 Building the Markov Chain Model
Now that we’ve confirmed the Markovian nature of the data, let’s build a model. Our model has three interconnected components:
- PK sub-model: A one-compartment model that tracks drug concentration over time
- Drug effect: Concentration-dependent shifts on specific transition probabilities
- Markov chain: A 4-state discrete-time Markov model (3 active states + 1 absorbing)
6.1 The Full Model
markov_model = @model begin
@param begin
"""
Clearance (L/d)
"""
θCL ∈ RealDomain(lower = 0)
"""
Volume (L)
"""
θV ∈ RealDomain(lower = 0)
"""
Transition logits from State 1
"""
θ₁₁ ∈ RealDomain()
θ₁₂ ∈ RealDomain()
θ₁₃ ∈ RealDomain()
"""
Transition logits from State 2
"""
θ₂₁ ∈ RealDomain()
θ₂₂ ∈ RealDomain()
θ₂₃ ∈ RealDomain()
"""
Transition logits from State 3
"""
θ₃₁ ∈ RealDomain()
θ₃₂ ∈ RealDomain()
θ₃₃ ∈ RealDomain()
"""
Drug effect slope on 1→2 transition
"""
slope₁₂ ∈ RealDomain()
"""
Drug effect slope on 2→3 transition
"""
slope₂₃ ∈ RealDomain()
end
@covariates ETA1 ETA2
@pre begin
CL = θCL * exp(ETA1)
V = θV * exp(ETA2)
end
@dynamics begin
Central' = -CL / V * Central
end
@vars begin
conc = Central / V
# Drug effects on transition probabilities (logit scale)
drugep₁₂ = slope₁₂ * conc
drugep₂₃ = slope₂₃ * conc
# Transition logits with drug effect added
θ₁₂de = θ₁₂ + drugep₁₂
θ₂₃de = θ₂₃ + drugep₂₃
end
@derived begin
# Time-varying transition matrices (logit parameterization)
# Each matrix is k × (k-1) where k = 4 states
# Row 4 is the absorbing state: -Inf means zero probability of leaving
Pθ := [
[
θ₁₁ _θ₁₂de θ₁₃
θ₂₁ θ₂₂ _θ₂₃de
θ₃₁ θ₃₂ θ₃₃
-Inf -Inf -Inf
] for (_θ₁₂de, _θ₂₃de) in zip(θ₁₂de, θ₂₃de)
]
DV ~ FiniteMarkov(Pθ, [θ₁₁, θ₁₂, θ₁₃])
end
endPumasModel
Parameters: θCL, θV, θ₁₁, θ₁₂, θ₁₃, θ₂₁, θ₂₂, θ₂₃, θ₃₁, θ₃₂, θ₃₃, slope₁₂, slope₂₃
Random effects:
Covariates: ETA1, ETA2
Dynamical system variables: Central
Dynamical system type: Matrix exponential
Derived: DV, conc, drugep₁₂, drugep₂₃, θ₁₂de, θ₂₃de
Observed: DV, conc, drugep₁₂, drugep₂₃, θ₁₂de, θ₂₃de
6.2 Step-by-Step Model Walkthrough
Let’s break this model down piece by piece.
6.2.1 The @param Block: What We’re Estimating
The model has 13 parameters:
| Parameter | Role | Domain |
|---|---|---|
θCL, θV |
PK clearance and volume | Fixed (from prior PK analysis) |
θ₁₁, θ₁₂, θ₁₃ |
Logits for transitions from State 1 to States 1, 2, 3 | Unconstrained |
θ₂₁, θ₂₂, θ₂₃ |
Logits for transitions from State 2 to States 1, 2, 3 | Unconstrained |
θ₃₁, θ₃₂, θ₃₃ |
Logits for transitions from State 3 to States 1, 2, 3 | Unconstrained |
slope₁₂ |
Drug effect on 1→2 transition | Unconstrained |
slope₂₃ |
Drug effect on 2→3 transition | Unconstrained |
The nine θ parameters define a \(3 \times 3\) baseline transition matrix (on the logit scale). Each row also has an implicit fourth entry (logit = 0) for State 4 (the absorbing state). So we actually model a \(4 \times 3\) logit matrix — 3 logits per row for 4 states.
For \(K\) states, a DTMM needs \(K \times (K-1)\) transition logit parameters (one row per source state, \(K-1\) logits per row). With \(K = 4\): \(4 \times 3 = 12\) parameters. But the absorbing state row is all \(-\infty\) (fixed), so we estimate \(3 \times 3 = 9\) logits, plus 2 drug effect slopes = 11 free parameters.
6.2.2 The @covariates and @pre Blocks: Individual PK
ETA1 and ETA2 are pre-computed individual PK random effects that are read as covariates from the dataset. The @pre block uses these to compute individual clearance and volume:
CL = θCL * exp(ETA1) # Individual clearance
V = θV * exp(ETA2) # Individual volume
6.2.3 The @dynamics Block: PK Concentration Over Time
A standard one-compartment oral PK model tracks the amount of drug in the central compartment:
Central' = -CL / V * Central
The concentration at each time point is then conc = Central / V.
6.2.4 The @vars Block: Drug Effects on Transitions
This is where pharmacology meets the Markov model. The drug concentration modifies two specific transition logits:
drugep₁₂ = slope₁₂ * conc # Drug effect on State 1 → State 2
drugep₂₃ = slope₂₃ * conc # Drug effect on State 2 → State 3
θ₁₂de = θ₁₂ + drugep₁₂ # Modified logit
θ₂₃de = θ₂₃ + drugep₂₃ # Modified logit
The drug effect is additive on the logit scale and linear in concentration (a slope model). This is the most common parameterization for drug effects in DTMMs.
Drug effects need not be placed on all transitions. The choice should be guided by:
- Clinical hypothesis — which transitions does the drug affect?
- Data support — rare transitions may not support drug effect estimation
- Parsimony — each drug effect adds a parameter
In our model, the drug affects transitions toward worse states (1→2 and 2→3), which would be typical for a model where higher drug exposure exacerbates disease progression. Alternatively, in a beneficial drug model, you might place effects on transitions toward better states.
6.2.5 The @derived Block: Assembling the FiniteMarkov Distribution
The transition matrices are assembled as a vector of matrices, one per observation time. This is because the drug concentration (and therefore the transition probabilities) changes over time:
Pθ := [
[
θ₁₁ _θ₁₂de θ₁₃ # Row 1: transitions from State 1
θ₂₁ θ₂₂ _θ₂₃de # Row 2: transitions from State 2
θ₃₁ θ₃₂ θ₃₃ # Row 3: transitions from State 3
-Inf -Inf -Inf # Row 4: absorbing state (can't leave)
] for (_θ₁₂de, _θ₂₃de) in zip(θ₁₂de, θ₂₃de)
]Each matrix in Pθ is \(4 \times 3\) — 4 rows (one per source state) and 3 columns (logits for transition to States 1, 2, and 3). The probability of transitioning to State 4 (the absorbing state) is implicit (the reference category).
The FiniteMarkov call:
DV ~ FiniteMarkov(Pθ, [θ₁₁, θ₁₂, θ₁₃])takes two arguments:
Pθ: The vector of transition matrices (one per observation time)[θ₁₁, θ₁₂, θ₁₃]: The initial state distribution (logits for States 1, 2, 3; State 4 logit is implicitly 0)
The initial state vector [θ₁₁, θ₁₂, θ₁₃] determines the probability of each state at the first observation. The first transition matrix in Pθ is not referenced — only the initial distribution is used for time 1. From time 2 onwards, each observation uses the corresponding transition matrix from Pθ.
6.3 The Transition Matrix Laid Out
To make the model concrete, here is the matrix of logits that the model assembles at a given time. This is exactly what goes into Pθ, with the implicit reference column for State 4 (logit \(= 0\)) shown for completeness:
\[ \begin{array}{c|cccc} \text{From} \,\backslash\, \text{To} & \text{State 1} & \text{State 2} & \text{State 3} & \text{State 4 (ref)} \\ \hline \text{State 1} & \theta_{11} & \theta_{12} + \text{slope}_{12}\, c & \theta_{13} & 0 \\ \text{State 2} & \theta_{21} & \theta_{22} & \theta_{23} + \text{slope}_{23}\, c & 0 \\ \text{State 3} & \theta_{31} & \theta_{32} & \theta_{33} & 0 \\ \text{State 4} & -\infty & -\infty & -\infty & 0 \end{array} \]
where \(c\) is the drug concentration at that time. The transition probabilities are obtained by applying the softmax to each row of this logit matrix. For source state \(i\), the probability of moving to state \(j\) is:
\[ P_{ij} = \frac{\exp(\tau_{ij})}{\sum_{l=1}^{4} \exp(\tau_{il})}, \]
where \(\tau_{ij}\) is the logit in row \(i\), column \(j\) of the array above (and \(\tau_{i4} = 0\) for every active row). For the absorbing State 4, the \(-\infty\) logits give \(P_{4j} = 0\) for \(j \neq 4\) and \(P_{44} = 1\).
The two drug effects (slope₁₂ and slope₂₃) cause the transition probabilities to be time-varying — they shift as drug concentration rises and falls after each dose.
7 Fitting the Model
We fit the model using NaivePooled estimation. Since individual PK variability is already handled via the ETA1/ETA2 covariates, no additional random effects are needed. We hold the PK parameters constant at their known values.
The initial parameter estimates combine the PK values from a prior analysis with the data-derived logits we computed earlier:
params = (
θCL = 4.8,
θV = 10.0,
θ₁₁ = init_logits[1, 1],
θ₁₂ = init_logits[1, 2],
θ₁₃ = init_logits[1, 3],
θ₂₁ = init_logits[2, 1],
θ₂₂ = init_logits[2, 2],
θ₂₃ = init_logits[2, 3],
θ₃₁ = init_logits[3, 1],
θ₃₂ = init_logits[3, 2],
θ₃₃ = init_logits[3, 3],
slope₁₂ = 0.0,
slope₂₃ = 0.0,
)(θCL = 4.8, θV = 10.0, θ₁₁ = 8.156624964190355, θ₁₂ = 3.832979798087693, θ₁₃ = -0.916290731874155, θ₂₁ = 5.209486152841421, θ₂₂ = 9.039552050995901, θ₂₃ = 4.852030263919617, θ₃₁ = 0.0, θ₃₂ = 4.6913478822291435, θ₃₃ = 9.048056708918736, slope₁₂ = 0.0, slope₂₃ = 0.0)
fpm = fit(
markov_model,
pop,
params,
NaivePooled();
constantcoef = (:θCL, :θV),
optim_options = (; show_trace = true, show_every = 20),
)[ Info: Checking the initial parameter values. [ Info: The initial negative log likelihood and its gradient are finite. Check passed. Iter Function value Gradient norm 0 3.467011e+03 1.628651e+03 * time: 0.09011411666870117 20 3.419425e+03 1.886967e+01 * time: 16.273836135864258 40 3.419316e+03 4.406543e-02 * time: 25.863924026489258
FittedPumasModel
Dynamical system type: Matrix exponential
Number of subjects: 100
Observation records: Active Missing
DV: 36600 0
Total: 36600 0
Number of parameters: Constant Optimized
2 11
Likelihood approximation: NaivePooled
Likelihood optimizer: BFGS
Termination Reason: GradientNorm
Log-likelihood value: -3419.3066
----------------------
Estimate
----------------------
† θCL 4.8
† θV 10.0
θ₁₁ 8.1627
θ₁₂ 3.4056
θ₁₃ -0.91548
θ₂₁ 5.2082
θ₂₂ 9.0382
θ₂₃ 4.1391
θ₃₁ -6.37e-13
θ₃₂ 4.6913
θ₃₃ 9.0481
slope₁₂ 0.037161
slope₂₃ 0.03561
----------------------
† indicates constant parameters
NaivePooled?
NaivePooled treats each subject’s observation sequence as independent conditional on the model parameters and their covariates. The Markov structure already captures within-subject dependence through the conditional transition probabilities. This is the standard and most common estimation approach for DTMMs.
If you wanted to estimate IIV on the transition parameters, you would need LaplaceI instead. Estimation of Markov models with random effects can sometimes be difficult, and this tutorial follows the published tutorial of Ooi et al. (2025) in not estimating with random effects.
The transition logit parameters default to 0 (equal probability for all states). If you have prior estimates (e.g., from observed transition frequencies), use them as initial values to speed up convergence. You can compute initial logits from observed transition probabilities as: \(\tau_{ij} = \log(P_{ij} / P_{iK})\) where \(P_{iK}\) is the probability for the reference state.
7.1 Examining the Estimates
simple_table(coeftable(fpm))| parameter | constant | estimate |
| θCL | true | 4.8 |
| θV | true | 10 |
| θ₁₁ | false | 8.16 |
| θ₁₂ | false | 3.41 |
| θ₁₃ | false | -0.915 |
| θ₂₁ | false | 5.21 |
| θ₂₂ | false | 9.04 |
| θ₂₃ | false | 4.14 |
| θ₃₁ | false | -6.37e-13 |
| θ₃₂ | false | 4.69 |
| θ₃₃ | false | 9.05 |
| slope₁₂ | false | 0.0372 |
| slope₂₃ | false | 0.0356 |
The logit-scale parameters are not directly interpretable. Let’s convert them to probabilities.
8 Interpreting the Results
8.1 Converting Logits to Transition Probabilities
We recover the actual transition probabilities at baseline (zero drug concentration) by assembling the logit matrix and applying the softmax to each row:
pars = coef(fpm)
# Baseline 4×4 logit matrix (zero drug concentration).
# The last column is the reference state (logit 0); the last row is the
# absorbing state (−Inf logits, so it can never leave State 4).
logit_matrix = [
pars.θ₁₁ pars.θ₁₂ pars.θ₁₃ 0.0
pars.θ₂₁ pars.θ₂₂ pars.θ₂₃ 0.0
pars.θ₃₁ pars.θ₃₂ pars.θ₃₃ 0.0
-Inf -Inf -Inf 0.0
]
# Convert each row of logits to transition probabilities with the softmax
P = mapslices(softmax, logit_matrix; dims = 2)4×4 Matrix{Float64}:
0.99109 0.00851461 0.000113116 0.00028256
0.0210906 0.971553 0.00724125 0.0001154
0.000116104 0.0126553 0.987113 0.000116104
0.0 0.0 0.0 1.0
The baseline transition matrix can be displayed as a formatted table:
labels = ["State 1", "State 2", "State 3", "State 4"]
P_table = DataFrame(round.(P; digits = 6), labels)
insertcols!(P_table, 1, "From \\ To" => labels)
simple_table(P_table)| From \\ To | State 1 | State 2 | State 3 | State 4 |
| State 1 | 0.991 | 0.00852 | 0.000113 | 0.000283 |
| State 2 | 0.0211 | 0.972 | 0.00724 | 0.000115 |
| State 3 | 0.000116 | 0.0127 | 0.987 | 0.000116 |
| State 4 | 0 | 0 | 0 | 1 |
- Large diagonal values indicate strong Markov behavior (subjects tend to stay in their current state)
- The magnitude of State 4 probabilities shows how quickly subjects drop out from each state
- Asymmetry between \(P_{12}\) and \(P_{21}\) reveals preferred directions of transitions
8.2 State Occupation Probabilities with probstable
Pumas provides the probstable function to compute how the probability of occupying each state evolves over time, accounting for the full sequence of time-varying transition matrices. This is one of the most useful diagnostic tools for Markov models:
pt = probstable(fpm)
simple_table(first(pt, 10))| id | time | DV_iprob1 | DV_iprob2 | DV_iprob3 | DV_iprob4 |
| 1 | 0 | 0.991 | 0.00851 | 0.000113 | 0.000283 |
| 1 | 1 | 0.982 | 0.0167 | 0.000285 | 0.000564 |
| 1 | 2 | 0.974 | 0.0246 | 0.000514 | 0.000843 |
| 1 | 3 | 0.966 | 0.0322 | 0.000796 | 0.00112 |
| 1 | 4 | 0.958 | 0.0395 | 0.00113 | 0.0014 |
| 1 | 5 | 0.95 | 0.0466 | 0.00151 | 0.00167 |
| 1 | 6 | 0.943 | 0.0534 | 0.00193 | 0.00195 |
| 1 | 7 | 0.935 | 0.0599 | 0.0024 | 0.00222 |
| 1 | 8 | 0.928 | 0.0662 | 0.00291 | 0.00249 |
| 1 | 9 | 0.922 | 0.0722 | 0.00346 | 0.00276 |
The columns DV_iprob1 through DV_iprob4 give the probability of being in each state at each time point. These always sum to 1:
# Verify probabilities sum to 1
prob_cols = select(pt, :DV_iprob1, :DV_iprob2, :DV_iprob3, :DV_iprob4)
all_sums = [sum(row) for row in eachrow(prob_cols)]
println("All probability sums ≈ 1.0: ", all(x -> isapprox(x, 1.0), all_sums))All probability sums ≈ 1.0: true
Let’s visualize the state occupation probabilities for a representative subject:
pt_sub1 = @rsubset(pt, :id == first(pt.id))
pt_long = stack(
pt_sub1,
[:DV_iprob1, :DV_iprob2, :DV_iprob3, :DV_iprob4];
variable_name = :State,
value_name = :Probability,
)
@rtransform!(
pt_long,
:State = replace(
string(:State),
"DV_iprob1" => "State 1",
"DV_iprob2" => "State 2",
"DV_iprob3" => "State 3",
"DV_iprob4" => "State 4 (Absorbing)",
)
);
plt_prob =
data(pt_long) *
mapping(:time => "Time", :Probability => "State Probability"; color = :State) *
visual(Lines; linewidth = 2)
draw(
plt_prob;
axis = (; title = "State Occupation Probabilities — Subject $(first(pt.id))",),
legend = (position = :bottom, title = "State", titlealignment = :left),
)These probabilities show the model’s prediction of the chance of being in each state at each time. As time progresses:
- The absorbing state probability typically increases (more subjects have dropped out)
- Active state probabilities may shift as drug concentrations change
- The rate of change reflects both the baseline transition rates and the drug effect
8.3 Understanding the Drug Effect
The drug effects in our model operate on the logit scale. Let’s visualize how drug concentration changes the transition probabilities.
A positive slope₁₂ means that higher drug concentration increases the logit for the 1→2 transition, shifting probability toward State 2. Let’s compute this across a range of concentrations:
concentrations = 0.0:0.5:10.0
probs_12 = map(concentrations) do c
softmax([pars.θ₁₁, pars.θ₁₂ + pars.slope₁₂ * c, pars.θ₁₃, 0.0])[2]
end
probs_23 = map(concentrations) do c
softmax([pars.θ₂₁, pars.θ₂₂, pars.θ₂₃ + pars.slope₂₃ * c, 0.0])[3]
end
drug_df = DataFrame(
Concentration = repeat(collect(concentrations), 2),
Probability = vcat(probs_12, probs_23),
Transition = vcat(
fill("P(State 1 → State 2)", length(concentrations)),
fill("P(State 2 → State 3)", length(concentrations)),
),
)
plt_drug =
data(drug_df) *
mapping(
:Concentration => "Drug Concentration",
:Probability => "Transition Probability";
color = :Transition,
) *
visual(Lines; linewidth = 2)
draw(
plt_drug;
axis = (; title = "Drug Effect on Transition Probabilities",),
legend = (position = :bottom, title = "Transition", titlealignment = :left),
)Because the drug effect is on the logit scale, the relationship between concentration and probability is sigmoidal (S-shaped), not linear. At low concentrations, the effect is approximately linear. At high concentrations, the probability saturates (it can never exceed 1). This built-in saturation is a natural advantage of the logit parameterization.
9 Inference and Model Evaluation
9.1 Parameter Uncertainty
FiniteMarkov models fitted with NaivePooled support standard error computation:
fpmi = infer(fpm)[ Info: Calculating: variance-covariance matrix. [ Info: Done.
Asymptotic inference results using sandwich estimator
Dynamical system type: Matrix exponential
Number of subjects: 100
Observation records: Active Missing
DV: 36600 0
Total: 36600 0
Number of parameters: Constant Optimized
2 11
Likelihood approximation: NaivePooled
Likelihood optimizer: BFGS
Termination Reason: GradientNorm
Log-likelihood value: -3419.3066
------------------------------------------------------------
Estimate SE 95.0% C.I.
------------------------------------------------------------
† θCL 4.8 NaN [ NaN ; NaN ]
† θV 10.0 NaN [ NaN ; NaN ]
θ₁₁ 8.1627 0.45064 [ 7.2794 ; 9.0459 ]
θ₁₂ 3.4056 0.46148 [ 2.5012 ; 4.3101 ]
θ₁₃ -0.91548 0.83588 [ -2.5538 ; 0.72281 ]
θ₂₁ 5.2082 0.99104 [ 3.2658 ; 7.1506 ]
θ₂₂ 9.0382 0.99691 [ 7.0843 ; 10.992 ]
θ₂₃ 4.1391 1.0318 [ 2.1169 ; 6.1614 ]
θ₃₁ -6.37e-13 1.4186 [ -2.7805 ; 2.7805 ]
θ₃₂ 4.6913 1.0175 [ 2.6971 ; 6.6856 ]
θ₃₃ 9.0481 0.99835 [ 7.0913 ; 11.005 ]
slope₁₂ 0.037161 0.0044194 [ 0.028499; 0.045823]
slope₂₃ 0.03561 0.0061912 [ 0.023475; 0.047744]
------------------------------------------------------------
† indicates constant parameters
This provides standard errors, confidence intervals, and p-values for all estimated parameters (on the logit scale).
The confidence intervals from infer are on the logit scale. While you can interpret whether individual logit parameters are significantly different from zero, translating these to the probability scale requires care. The logistic is a multivariate transformation — confidence intervals on individual logits do not directly translate to confidence intervals on probabilities. For probability-scale uncertainty, simulation-based approaches (e.g., bootstrap) are recommended.
9.2 Log-Likelihood
The model log-likelihood can be used for model comparison:
ll = loglikelihood(fpm)
println("Log-likelihood: ", ll)
println("-2LL: ", -2 * ll)Log-likelihood: -3419.3065604734097
-2LL: 6838.613120946819
The \(-2 \text{LL}\) value can be compared between nested models using the likelihood ratio test (the difference is approximately \(\chi^2\) distributed with degrees of freedom equal to the number of extra parameters).
9.3 Simulation and Model Checking
Markov models support simulation via simobs. This is the foundation for model evaluation: if the model is adequate, simulated data should reproduce the key features of the observed data.
sims = simobs(markov_model, pop, coef(fpm))Simulated population (Vector{<:Subject})
Simulated subjects: 100
Simulated variables: DV, conc, drugep₁₂, drugep₂₃, θ₁₂de, θ₂₃de
9.4 Simulation Roundtripping
We can verify model identifiability by simulating data from the fitted model and re-estimating the parameters. If the model is well-identified, the re-estimated parameters should be close to the original:
sim_pop = Subject.(sims)
refit_fpm = fit(
markov_model,
sim_pop,
coef(fpm),
NaivePooled();
constantcoef = (:θCL, :θV),
optim_options = (; show_trace = true, show_every = 20),
)[ Info: Checking the initial parameter values. [ Info: The initial negative log likelihood and its gradient are finite. Check passed. Iter Function value Gradient norm 0 3.496158e+03 6.586136e+02 * time: 3.0040740966796875e-5 20 3.489921e+03 1.445022e+01 * time: 11.985846996307373 40 3.488356e+03 8.734019e-01 * time: 21.271727085113525 60 3.488335e+03 2.436686e-02 * time: 31.257397174835205
FittedPumasModel
Dynamical system type: Matrix exponential
Number of subjects: 100
Observation records: Active Missing
DV: 36600 0
Total: 36600 0
Number of parameters: Constant Optimized
2 11
Likelihood approximation: NaivePooled
Likelihood optimizer: BFGS
Termination Reason: GradientNorm
Log-likelihood value: -3488.3351
-----------------------
Estimate
-----------------------
† θCL 4.8
† θV 10.0
θ₁₁ 7.3316
θ₁₂ 2.6501
θ₁₃ -1.2993
θ₂₁ 5.2781
θ₂₂ 9.12
θ₂₃ 4.2492
θ₃₁ -12.291
θ₃₂ 19.979
θ₃₃ 24.288
slope₁₂ 0.037031
slope₂₃ 0.027865
-----------------------
† indicates constant parameters
# Compare original and re-estimated parameters
simple_table(compare_estimates(; original = fpm, refit = refit_fpm))| parameter | original | refit |
| θCL | 4.8 | 4.8 |
| θV | 10 | 10 |
| θ₁₁ | 8.16 | 7.33 |
| θ₁₂ | 3.41 | 2.65 |
| θ₁₃ | -0.915 | -1.3 |
| θ₂₁ | 5.21 | 5.28 |
| θ₂₂ | 9.04 | 9.12 |
| θ₂₃ | 4.14 | 4.25 |
| θ₃₁ | -6.37e-13 | -12.3 |
| θ₃₂ | 4.69 | 20 |
| θ₃₃ | 9.05 | 24.3 |
| slope₁₂ | 0.0372 | 0.037 |
| slope₂₃ | 0.0356 | 0.0279 |
10 References
- Ooi QX, Plan E, Bergstrand M. A tutorial on pharmacometric Markov models. CPT: Pharmacometrics & Systems Pharmacology. 2025;14(2):197-216.
- Girard P, Blaschke TF, Kastrissios H, Sheiner LB. A Markov mixed effect regression model for drug compliance. Statistics in Medicine. 1998;17(20):2313-2333.
- Karlsson MO, Schoemaker RC, Kemp B, et al. A pharmacodynamic Markov mixed-effects model for the effect of temazepam on sleep. Clinical Pharmacology & Therapeutics. 2000;68(2):175-188.
- Lacroix BD, Lovern MR, Stockis A, et al. A pharmacodynamic Markov mixed-effects model for determining the effect of exposure to certolizumab pegol on the ACR20 score in patients with rheumatoid arthritis. Clinical Pharmacology & Therapeutics. 2009;86(4):387-395.