Discrete-Time Markov Models

1 Introduction

In clinical trials and observational studies, we frequently encounter categorical outcomes that evolve over time. Consider the following scenarios:

A patient’s disease severity recorded weekly as “none”, “mild”, “moderate”, or “severe”
Treatment response assessed at each visit as “no response”, “partial response”, or “full response”
An adverse event grade tracked daily from “absent” through “grade 1” to “grade 3”
Sleep stages cycling through “awake”, “light sleep”, “deep sleep”, and “REM”

A common feature of these data is that the current observation depends heavily on the previous one. A patient who was in “mild” disease at the last visit is far more likely to remain in “mild” at the next visit than to suddenly jump to “severe”. This dependence between consecutive observations is called the Markov property, and models designed to capture it are called Markov models.

This tutorial is derived work based on the pharmacometric Markov model tutorial of Ooi, Plan & Bergstrand (CPT: Pharmacometrics & Systems Pharmacology, 2025; see the references at the end of this document). We follow their worked example, recommendations, and dataset closely, adapting the workflow to Pumas and the FiniteMarkov distribution.

In this tutorial, we will learn how to build and fit discrete-time Markov models (DTMMs) in Pumas using the FiniteMarkov distribution. We will work through a complete clinical example — from recognizing serial dependence in data to fitting a PK-driven Markov model, interpreting the results, and evaluating the model.

using Pumas
using Pumas.Experimental: FiniteMarkov
using PumasUtilities
using PharmaDatasets
using DataFramesMeta
using CairoMakie
using AlgebraOfGraphics
using GLM: softmax
using SummaryTables
using PumasReports

2 Recognizing Serial Dependence

Before building a Markov model, it helps to understand what serially dependent data look like and why standard approaches fall short. The categorical outcomes above are examples of serially dependent finite states, which can often be well modeled by a Markov chain.

2.1 Signatures of Serial Dependence

Such data are characterized by three signatures (Ooi, Plan & Bergstrand, CPT:PSP 2025):

Frequent observations relative to changes — observations are collected more often than the outcome is expected to change. This leads to long runs of the same category.
Many consecutive same-category observations — at the individual level, you see strings of identical values (e.g., “mild, mild, mild, moderate, moderate, moderate”).
Positive correlation between current and previous values — a scatter plot of \(Y_t\) versus \(Y_{t-1}\) shows a clear positive trend.

2.2 Why Not Use Standard Models?

Standard categorical models (e.g., proportional odds, logistic regression) assume that observations are independent. When the data are serially dependent, ignoring this dependence can cause:

Inflated Type I error in hypothesis tests — you may falsely identify covariate effects (e.g., a placebo effect) that do not exist
Overestimated inter-individual variability (IIV) — the model absorbs the serial correlation into IIV
Unrealistic simulations — simulated data will show too many state changes, inflated extreme-value occurrences, and under-predicted time spent in the same state

A dedicated Markov model solves these problems by explicitly conditioning each observation on the previous state.

3 The Markov Chain: Core Concepts

3.1 States and Transitions

A discrete-time Markov chain is a stochastic process where a subject occupies one of \(K\) discrete states at each time point. The probability of being in any state at time \(t\) depends only on the state at time \(t - 1\) — not on any earlier history. This is called the first-order Markov property:

\[ P(Y_t = k_t \mid Y_0, Y_1, \ldots, Y_{t-1}) = P(Y_t = k_t \mid Y_{t-1}) \]

First-Order Means Short Memory, Not Small Steps

“First-order” refers to memory depth, not step size. It means the next state depends only on the current state — not on the full history of previous states. A subject can jump from any state to any other state in a single time step; the transition probabilities determine how likely each jump is. If you want to enforce that only adjacent transitions are possible (e.g., a patient cannot skip from “none” to “severe”), you would explicitly set those transition probabilities to zero by fixing the corresponding logits to \(-\infty\). That is a modeling choice, not a property of first-order Markov chains.

Higher-order Markov models exist (where the predicted value depends on two or more previous states), but they require many more parameters and their results are harder to communicate. In pharmacometric practice, first-order models are almost always sufficient.

3.2 The Transition Probability Matrix

The transition probabilities are collected in a transition probability matrix \(P\). For \(K\) states, \(P\) is a \(K \times K\) matrix where each entry \(P_{ij}\) is the probability of moving from source state \(i\) to destination state \(j\):

\[ P = \begin{bmatrix} P_{11} & P_{12} & \cdots & P_{1K} \\ P_{21} & P_{22} & \cdots & P_{2K} \\ \vdots & \vdots & \ddots & \vdots \\ P_{K1} & P_{K2} & \cdots & P_{KK} \end{bmatrix} \]

Each row represents a complete probability distribution: “Given that a subject is currently in state \(i\), what is the probability of transitioning to each possible state?” Because the subject must end up somewhere (including staying in the same state), each row sums to 1.

Let’s consider a concrete example with three disease states (no response, partial response, full response). A plausible transition matrix might look like:

From To	No Response	Partial	Full
No Response	0.85	0.12	0.03
Partial	0.05	0.80	0.15
Full	0.02	0.08	0.90

How to Read the Transition Matrix

Read by row — each row is a self-contained probability distribution for a subject currently in that state:

Row 1 (No Response): A patient currently in “No Response” has an 85% chance of staying there, a 12% chance of improving to “Partial”, and a 3% chance of jumping directly to “Full Response”. These three probabilities sum to 1.
Row 2 (Partial): A different patient currently in “Partial” has a 5% chance of worsening to “No Response”, an 80% chance of staying in “Partial”, and a 15% chance of improving to “Full Response”.

Do not read down columns. The 0.12 in Row 1 and the 0.05 in Row 2 of the “No Response” column are about different patients in different current states. Each row is independent — you pick the row that matches the subject’s current state, then read across to find the probabilities for their next state.

The strong diagonal dominance (high same-state probabilities) is typical of serially dependent data — subjects tend to stay in their current state from one observation to the next.

3.3 Absorbing States

An absorbing state is one that a subject can never leave. Common examples include dropout, death, or cure. Once reached, \(P(\text{absorbing} \mid \text{absorbing}) = 1\) and all other transitions from that state are zero.

3.4 Equidistant Time Assumption

A key assumption of the standard DTMM is that the effect of time since the last observation on transition probabilities is negligible. This works well when observations are equidistant (e.g., daily, weekly). When time intervals vary substantially, a continuous-time Markov model (CTMM) should be considered instead.

DTMM vs. CTMM: Quick Guide

Use a DTMM when:

Observations are at regular, equidistant intervals
You want a straightforward, easy-to-communicate model
The number of states is moderate (2–5)

Consider a CTMM when:

Time intervals between observations vary
You want to simulate at different time intervals than observed
Time-varying covariates change between observation times
The variable has many ordered categories (CTMM is more parsimonious)

4 Loading and Exploring the Dataset

Now let’s work through a complete example. We’ll use a clinical trial dataset that records disease states over time alongside PK information about drug exposure.

df = dataset("psptutorials/psp413278/markov-discrete-time")
first(df, 10)

10×10 DataFrame

Row	ID	TIME	AMT	II	ADDL	DV	EVID	TREA	ETA1	ETA2
	Int64	Int64	Int64	Int64	Int64	Int64	Int64	Int64	Float64	Float64
1	1	0	0	0	0	1	0	0	-0.0754288	-0.761771
2	1	1	0	0	0	1	0	0	-0.0754288	-0.761771
3	1	2	0	0	0	1	0	0	-0.0754288	-0.761771
4	1	3	0	0	0	1	0	0	-0.0754288	-0.761771
5	1	4	0	0	0	1	0	0	-0.0754288	-0.761771
6	1	5	0	0	0	1	0	0	-0.0754288	-0.761771
7	1	6	0	0	0	1	0	0	-0.0754288	-0.761771
8	1	7	0	0	0	1	0	0	-0.0754288	-0.761771
9	1	8	0	0	0	1	0	0	-0.0754288	-0.761771
10	1	9	0	0	0	1	0	0	-0.0754288	-0.761771

overview_table(df)


No	Variable	Stats / Values	Freqs (% of Valid)	Graph	Valid	Missing

1	ID [Int64]	Mean (sd): 50.5 (28.9) min ≤ med ≤ max: 1 ≤ 51 ≤ 100 IQR (CV): 50 (0.571)	100 distinct values		36650 (100%)	0 (0%)
2	TIME [Int64]	Mean (sd): 182 (106) min ≤ med ≤ max: 0 ≤ 182 ≤ 365 IQR (CV): 183 (0.581)	366 distinct values		36650 (100%)	0 (0%)
3	AMT [Int64]	Mean (sd): 0.136 (3.69) min ≤ med ≤ max: 0 ≤ 0 ≤ 100 IQR (CV): 0 (27.1)	2 distinct values		36650 (100%)	0 (0%)
4	II [Int64]	Mean (sd): 0.00136 (0.0369) min ≤ med ≤ max: 0 ≤ 0 ≤ 1 IQR (CV): 0 (27.1)	2 distinct values		36650 (100%)	0 (0%)
5	ADDL [Int64]	Mean (sd): 0.497 (13.4) min ≤ med ≤ max: 0 ≤ 0 ≤ 364 IQR (CV): 0 (27.1)	2 distinct values		36650 (100%)	0 (0%)
6	DV [Int64]	Mean (sd): 1.83 (0.926) min ≤ med ≤ max: 0 ≤ 2 ≤ 4 IQR (CV): 2 (0.506)	5 distinct values		36650 (100%)	0 (0%)
7	EVID [Int64]	Mean (sd): 0.00136 (0.0369) min ≤ med ≤ max: 0 ≤ 0 ≤ 1 IQR (CV): 0 (27.1)	2 distinct values		36650 (100%)	0 (0%)
8	TREA [Int64]	Mean (sd): 0.501 (0.5) min ≤ med ≤ max: 0 ≤ 1 ≤ 1 IQR (CV): 1 (0.999)	2 distinct values		36650 (100%)	0 (0%)
9	ETA1 [Float64]	Mean (sd): -0.0409 (0.293) min ≤ med ≤ max: -0.742 ≤ -0.0754 ≤ 0.598 IQR (CV): 0.437 (-7.17)	100 distinct values		36650 (100%)	0 (0%)
10	ETA2 [Float64]	Mean (sd): 0.0188 (0.352) min ≤ med ≤ max: -0.762 ≤ 0.0396 ≤ 0.756 IQR (CV): 0.479 (18.7)	100 distinct values		36650 (100%)	0 (0%)

Dimensions: 36650 x 10 Duplicate rows: 0

The dataset contains:

ID: Subject identifier
TIME: Observation time
EVID: Event identifier (1 = dosing event, 0 = observation)
AMT, ADDL, II: Dosing information (amount, additional doses, dosing interval)
DV: Disease state observation (integer 1, 2, 3, or 4)
ETA1, ETA2: Pre-computed individual PK random effects

Pre-Computed Random Effects

This dataset includes ETA1 and ETA2 columns representing pre-computed individual PK random effects from a prior PK analysis. They are used as covariates to account for individual PK variability without estimating random effects on PK parameters within the Markov model. This sequential approach is common when the focus is on the transition model rather than PK.

4.1 Data Preparation

Before fitting, we need two data wrangling steps:

Add a CMT (compartment) column set to 1 — required by read_pumas
Set DV to missing for dosing events (EVID == 1) — dosing rows carry no disease state observation

@rtransform!(df, :CMT = 1);
@rtransform!(df, :DV = :EVID == 1 ? missing : :DV);

Missing Values in FiniteMarkov

FiniteMarkov does not support missing observations in the disease state sequence. The missing values we just created for dosing events are handled correctly by read_pumas (dosing events are not part of the observation vector). However, if your dataset has missing observations at non-dosing timepoints, you must impute them before fitting.

Now we read the data into a Pumas Population:

pop = read_pumas(
    df;
    id = :ID,
    time = :TIME,
    evid = :EVID,
    cmt = :CMT,
    amt = :AMT,
    addl = :ADDL,
    ii = :II,
    observations = [:DV],
    covariates = [:ETA1, :ETA2],
)

Population
  Subjects: 100
  Covariates: ETA1, ETA2 (heterogenous)
  Observations: DV (heterogenous)

4.2 Exploratory Data Analysis

Good EDA is essential before fitting a Markov model. Following the recommendations of Ooi et al. (2025), we examine three aspects:

The longitudinal profile of state values per subject
The distribution of states over time across the population
The correlation between current and previous states (Markov signature)

4.2.1 Longitudinal Profiles

Let’s first look at the raw observations.

obs_df = @chain df begin
    @rsubset !ismissing(:DV)
    @rtransform :State = Int(:DV)
    @rtransform :StateStr = string(Int(:DV))
end
simple_table(first(obs_df, 5))


ID	TIME	AMT	II	ADDL	DV	EVID	TREA	ETA1	ETA2	CMT	State	StateStr

1	0	0	0	0	1	0	0	-0.0754	-0.762	1	1	1
1	1	0	0	0	1	0	0	-0.0754	-0.762	1	1	1
1	2	0	0	0	1	0	0	-0.0754	-0.762	1	1	1
1	3	0	0	0	1	0	0	-0.0754	-0.762	1	1	1
1	4	0	0	0	1	0	0	-0.0754	-0.762	1	1	1

# Longitudinal state profiles for a few subjects
selected_ids = unique(obs_df.ID)[1:min(6, length(unique(obs_df.ID)))]
sub_df = @rsubset(obs_df, :ID in selected_ids)
@rtransform!(sub_df, :IDStr = string(:ID));
plt_long =
    data(sub_df) *
    mapping(:TIME => "Time", :State => "State"; layout = :IDStr => "Subject") *
    visual(Stairs; step = :post)
draw(plt_long; axis = (; yticks = 1:4))

Notice the long runs of the same state — this is the hallmark of serial dependence. Subjects tend to stay in their current state for multiple consecutive observations before transitioning.

4.2.2 Distribution of States Over Time

Next, let’s look at how the population distributes across states at each time point.

state_counts = @chain obs_df begin
    groupby([:TIME, :StateStr])
    @combine :Count = length(:ID)
end
plt_states =
    data(state_counts) *
    mapping(
        :TIME => "Time",
        :Count => "Number of Subjects";
        color = :StateStr => "State",
        layout = :StateStr,
    ) *
    visual(BarPlot; alpha = 0.7)
draw(plt_states)

4.2.3 Transition Frequency Table

One of the most informative diagnostics is a transition frequency table — a count of how often each pair of transitions occurs in the data. This directly reveals the Markov structure:

# Create current-vs-previous state pairs
obs_by_id = groupby(obs_df, :ID)
transitions = DataFrame(ID = Int[], PrevState = Int[], CurrState = Int[])
for group in obs_by_id
    states = group.State
    ids = group.ID
    for i = 2:length(states)
        push!(transitions, (ID = ids[i], PrevState = states[i-1], CurrState = states[i]))
    end
end
simple_table(first(transitions, 5))


ID	PrevState	CurrState

1	1	1
1	1	1
1	1	1
1	1	1
1	1	1

# Tally each (previous, current) transition into a K×K matrix. Cells for
# transitions that never occur stay at their initial zero.
K = 4
state_labels = ["State $i" for i = 1:K]
trans_counts = @chain transitions begin
    groupby([:PrevState, :CurrState])
    @combine :N = length(:ID)
end
trans_mat = zeros(Int, K, K)
for r in eachrow(trans_counts)
    trans_mat[r.PrevState, r.CurrState] = r.N
end
freq_table = DataFrame(trans_mat, state_labels)
insertcols!(freq_table, 1, "From \\ To" => state_labels)
simple_table(freq_table)


From \\ To	State 1	State 2	State 3	State 4

State 1	17432	231	2	5
State 2	183	8430	128	1
State 3	1	109	8502	1
State 4	0	0	0	1475

Reading the Transition Table

Each row represents the previous state and each column represents the current state. Look for:

Large diagonal values → strong Markov property (subjects tend to stay)
Asymmetry → preferred directions of transition (e.g., more 1→2 than 2→1 may indicate disease progression)
Zero or rare transitions → these can be fixed or constrained in the model

5 Parameterizing Transitions: The Logit Approach

The exploratory analysis gave us empirical transition probabilities. To estimate them in a model — and to let drug exposure shift them — we need a parameterization that keeps every probability valid. This section introduces the logit approach Pumas uses, and at the end we apply it to the data to derive initial estimates.

5.1 Why Not Estimate Probabilities Directly?

Transition probabilities must satisfy two constraints:

Each probability is between 0 and 1
Each row sums to 1

Pumas uses a logit parameterization that maps unconstrained real numbers to valid probabilities.

5.2 The Logistic Transformation

Consider a subject currently in source state \(i\). They can transition to any of \(K\) destination states. We need \(K\) probabilities that are each between 0 and 1 and sum to 1.

The trick is to estimate only \(K - 1\) unconstrained parameters and derive the \(K\)-th probability from the others. We pick one destination state as the reference category — by convention, the last state (state \(K\)). The \(K - 1\) estimated parameters \(\tau_{i1}, \ldots, \tau_{i(K-1)}\) are called logits and can take any value on the real line (\(-\infty\) to \(+\infty\)).

The logistic transformation converts these logits into probabilities via a shared denominator that guarantees the row sums to 1:

\[ D_i = 1 + \sum_{m=1}^{K-1} \exp(\tau_{im}) \]

where \(m\) ranges over the \(K - 1\) non-reference destination states. This denominator is the same for every destination state in the row.

Probability of transitioning to a non-reference state (\(j < K\)):

\[ P(i \to j) = \frac{\exp(\tau_{ij})}{D_i} \]

Each non-reference destination state \(j\) contributes \(\exp(\tau_{ij})\) in the numerator. A larger logit \(\tau_{ij}\) means a larger \(\exp(\tau_{ij})\) and therefore a higher probability of transitioning to state \(j\).

Probability of transitioning to the reference state (\(j = K\)):

\[ P(i \to K) = \frac{1}{D_i} \]

The reference state has no estimated logit parameter — its logit is implicitly fixed at zero, so \(\exp(0) = 1\) appears in the numerator. Its probability is whatever is “left over” after the other states claim their share.

5.3 A Worked Example

Let’s make this concrete. Suppose we have \(K = 3\) destination states, so we estimate \(K - 1 = 2\) logit parameters for this row: \(\tau_1 = 2.0\) and \(\tau_2 = -1.0\). State 3 is the reference (implicit logit = 0).

Step 1 — Compute the shared denominator:

\[ D = 1 + \exp(2.0) + \exp(-1.0) = 1 + 7.389 + 0.368 = 8.757 \]

The leading 1 comes from the reference state: \(\exp(0) = 1\).

Step 2 — Divide each \(\exp(\text{logit})\) by the denominator:

Destination	Logit	\(\exp(\text{logit})\)	Probability = \(\exp / D\)
State 1	2.0 (estimated)	7.389	7.389 / 8.757 = 0.844
State 2	-1.0 (estimated)	0.368	0.368 / 8.757 = 0.042
State 3	0 (reference, fixed)	1.000	1.000 / 8.757 = 0.114
		Total	1.000

The probabilities sum to 1, as required. And critically, any real values of \(\tau_1\) and \(\tau_2\) will produce valid probabilities — the optimizer is free to explore the entire real line without ever violating the constraints.

5.4 This Is the Softmax (Multinomial Logistic) Function

The transformation we just performed has a well-known name: softmax (also called the multinomial logistic function). You may already know the binary logistic function, which converts a single logit to a probability:

\[ \text{logistic}(\tau) = \frac{\exp(\tau)}{1 + \exp(\tau)} \]

The softmax is its generalization to \(K\) categories. Given a vector of logits \([\tau_1, \tau_2, \ldots, \tau_K]\), softmax returns a vector of probabilities:

\[ \text{softmax}(\boldsymbol{\tau})_j = \frac{\exp(\tau_j)}{\sum_{m=1}^{K} \exp(\tau_m)} \]

In our parameterization, we estimate \(K - 1\) logits and fix the reference state logit to 0. So the full logit vector is \([\tau_1, \ldots, \tau_{K-1}, 0]\), and softmax produces exactly the probabilities we computed by hand above.

Let’s verify with code:

# The same worked example: τ₁ = 2.0, τ₂ = -1.0, reference logit = 0
softmax([2.0, -1.0, 0.0])

3-element Vector{Float64}:
 0.8437947344813396
 0.042010066134066056
 0.1141951993845945

This matches our manual calculation (0.844, 0.042, 0.114).

Binary Logistic Is a Special Case

When \(K = 2\) (two states), the softmax with logits \([\tau, 0]\) reduces to:

\[ P(\text{State 1}) = \frac{\exp(\tau)}{\exp(\tau) + 1} = \text{logistic}(\tau) \]

So the binary logistic function is just softmax with two categories. The multinomial logistic (softmax) generalizes this to any number of categories.

5.5 Why This Parameterization Is Useful for Drug Effects

When a drug effect is additive on the logit scale, the model is:

\[ \tau_{ij}^{\text{drug}} = \tau_{ij} + \text{slope} \times \text{concentration} \]

A positive slope increases the logit for that transition, which increases the corresponding probability and proportionally decreases all other probabilities from that state. This is both mathematically clean and clinically interpretable.

5.6 Computing Initial Logit Estimates from the Data

Good initial values help the optimizer converge faster and avoid local minima. We can derive them directly from the observed transition frequencies.

Step 1 — Convert counts to row-wise probabilities. For each source state (row), divide each transition count by the row total:

# The probability table is the frequency matrix with each row scaled to sum to
# 1, so the zeros carry straight over — nothing to refill.
prob_mat = trans_mat ./ sum(trans_mat; dims = 2)
prob_table = DataFrame(prob_mat, state_labels)
insertcols!(prob_table, 1, "From \\ To" => state_labels)
simple_table(prob_table)


From \\ To	State 1	State 2	State 3	State 4

State 1	0.987	0.0131	0.000113	0.000283
State 2	0.0209	0.964	0.0146	0.000114
State 3	0.000116	0.0127	0.987	0.000116
State 4	0	0	0	1

Each row sums to 1 — these are the empirical transition probabilities.

Step 2 — Convert probabilities to logits. The logit for transitioning from state \(i\) to state \(j\) (relative to the reference state \(K\)) is:

\[ \tau_{ij} = \log\left(\frac{P_{ij}}{P_{iK}}\right) \]

In our 4-state model, state 4 is the reference category:

# Logits for the active source states (1:K-1), each relative to reference state K
init_logits = log.(prob_mat[1:(K-1), 1:(K-1)] ./ prob_mat[1:(K-1), K])

3×3 Matrix{Float64}:
 8.15662  3.83298  -0.916291
 5.20949  9.03955   4.85203
 0.0      4.69135   9.04806

Using These as Initial Values

These data-derived logits are baseline estimates (no drug effect). They make excellent starting values for the θ parameters in the model. The drug effect slopes (slope₁₂, slope₂₃) can be initialized at zero, since the baseline logits already capture the average transition pattern.

Note that state 4 is absorbing: its only observed transition is 4 → 4, so its probability row is all zeros except a 1 on the diagonal. That is why we compute baseline logits only for the active source states (1–3), and why the model fixes state 4’s row to all \(-\infty\) rather than estimating it.

6 Building the Markov Chain Model

Now that we’ve confirmed the serial dependence in the data, let’s build a model. Our model has three interconnected components:

PK sub-model: A one-compartment model that tracks drug concentration over time
Drug effect: Concentration-dependent shifts on specific transition probabilities
Markov chain: A 4-state discrete-time Markov model (3 active states + 1 absorbing)

6.1 The Full Model

markov_model = @model begin
    @param begin
        """
        Clearance (L/d)
        """
        θCL ∈ RealDomain(lower = 0)
        """
        Volume (L)
        """
        θV ∈ RealDomain(lower = 0)
        """
        Transition logits from State 1
        """
        θ₁₁ ∈ RealDomain()
        θ₁₂ ∈ RealDomain()
        θ₁₃ ∈ RealDomain()
        """
        Transition logits from State 2
        """
        θ₂₁ ∈ RealDomain()
        θ₂₂ ∈ RealDomain()
        θ₂₃ ∈ RealDomain()
        """
        Transition logits from State 3
        """
        θ₃₁ ∈ RealDomain()
        θ₃₂ ∈ RealDomain()
        θ₃₃ ∈ RealDomain()
        """
        Drug effect slope on 1→2 transition
        """
        slope₁₂ ∈ RealDomain()
        """
        Drug effect slope on 2→3 transition
        """
        slope₂₃ ∈ RealDomain()
    end

    @covariates ETA1 ETA2

    @pre begin
        CL = θCL * exp(ETA1)
        V = θV * exp(ETA2)
    end

    @dynamics begin
        Central' = -CL / V * Central
    end

    @vars begin
        conc = Central / V

        # Drug effects on transition probabilities (logit scale)
        drugep₁₂ = slope₁₂ * conc
        drugep₂₃ = slope₂₃ * conc

        # Transition logits with drug effect added
        θ₁₂de = θ₁₂ + drugep₁₂
        θ₂₃de = θ₂₃ + drugep₂₃
    end

    @derived begin
        # Time-varying transition matrices (logit parameterization)
        # Each matrix is k × (k-1) where k = 4 states
        # Row 4 is the absorbing state: -Inf means zero probability of leaving
        Pθ := [
            [
                θ₁₁ _θ₁₂de θ₁₃
                θ₂₁ θ₂₂ _θ₂₃de
                θ₃₁ θ₃₂ θ₃₃
                -Inf -Inf -Inf
            ] for (_θ₁₂de, _θ₂₃de) in zip(θ₁₂de, θ₂₃de)
        ]

        DV ~ FiniteMarkov(Pθ, [θ₁₁, θ₁₂, θ₁₃])
    end
end

PumasModel
  Parameters: θCL, θV, θ₁₁, θ₁₂, θ₁₃, θ₂₁, θ₂₂, θ₂₃, θ₃₁, θ₃₂, θ₃₃, slope₁₂, slope₂₃
  Random effects:
  Covariates: ETA1, ETA2
  Dynamical system variables: Central
  Dynamical system type: Matrix exponential
  Derived: DV, conc, drugep₁₂, drugep₂₃, θ₁₂de, θ₂₃de
  Observed: DV, conc, drugep₁₂, drugep₂₃, θ₁₂de, θ₂₃de

6.2 Step-by-Step Model Walkthrough

Let’s break this model down piece by piece.

6.2.1 The `@param` Block: What We’re Estimating

The model has 13 parameters:

Parameter	Role	Domain
`θCL`, `θV`	PK clearance and volume	Fixed (from prior PK analysis)
`θ₁₁`, `θ₁₂`, `θ₁₃`	Logits for transitions from State 1 to States 1, 2, 3	Unconstrained
`θ₂₁`, `θ₂₂`, `θ₂₃`	Logits for transitions from State 2 to States 1, 2, 3	Unconstrained
`θ₃₁`, `θ₃₂`, `θ₃₃`	Logits for transitions from State 3 to States 1, 2, 3	Unconstrained
`slope₁₂`	Drug effect on 1→2 transition	Unconstrained
`slope₂₃`	Drug effect on 2→3 transition	Unconstrained

The nine θ parameters define a \(3 \times 3\) baseline transition matrix (on the logit scale). Each row also has an implicit fourth entry (logit = 0) for State 4 (the absorbing state). So we actually model a \(4 \times 3\) logit matrix — 3 logits per row for 4 states.

Parameter Count

For \(K\) states, a DTMM needs \(K \times (K-1)\) transition logit parameters (one row per source state, \(K-1\) logits per row). With \(K = 4\): \(4 \times 3 = 12\) parameters. But the absorbing state row is all \(-\infty\) (fixed), so we estimate \(3 \times 3 = 9\) logits, plus 2 drug effect slopes = 11 free parameters.

6.2.2 The `@covariates` and `@pre` Blocks: Individual PK

ETA1 and ETA2 are pre-computed individual PK random effects that are read as covariates from the dataset. The @pre block uses these to compute individual clearance and volume:

CL = θCL * exp(ETA1)     # Individual clearance
V  = θV  * exp(ETA2)     # Individual volume

6.2.3 The `@dynamics` Block: PK Concentration Over Time

A standard one-compartment oral PK model tracks the amount of drug in the central compartment:

Central' = -CL / V * Central

The concentration at each time point is then conc = Central / V.

6.2.4 The `@vars` Block: Drug Effects on Transitions

This is where pharmacology meets the Markov model. The drug concentration modifies two specific transition logits:

drugep₁₂ = slope₁₂ * conc    # Drug effect on State 1 → State 2
drugep₂₃ = slope₂₃ * conc    # Drug effect on State 2 → State 3

θ₁₂de = θ₁₂ + drugep₁₂      # Modified logit
θ₂₃de = θ₂₃ + drugep₂₃      # Modified logit

The drug effect is additive on the logit scale and linear in concentration (a slope model). This is the most common parameterization for drug effects in DTMMs.

Choosing Which Transitions to Modify

Drug effects need not be placed on all transitions. The choice should be guided by:

Clinical hypothesis — which transitions does the drug affect?
Data support — rare transitions may not support drug effect estimation
Parsimony — each drug effect adds a parameter

In our model, the drug affects transitions toward worse states (1→2 and 2→3), which would be typical for a model where higher drug exposure exacerbates disease progression. Alternatively, in a beneficial drug model, you might place effects on transitions toward better states.

6.2.5 The `@derived` Block: Assembling the FiniteMarkov Distribution

The transition matrices are assembled as a vector of matrices, one per observation time. This is because the drug concentration (and therefore the transition probabilities) changes over time:

Pθ := [
    [
        θ₁₁ _θ₁₂de θ₁₃        # Row 1: transitions from State 1
        θ₂₁ θ₂₂ _θ₂₃de      # Row 2: transitions from State 2
        θ₃₁ θ₃₂ θ₃₃         # Row 3: transitions from State 3
        -Inf -Inf -Inf          # Row 4: absorbing state (can't leave)
    ] for (_θ₁₂de, _θ₂₃de) in zip(θ₁₂de, θ₂₃de)
]

Each matrix in Pθ is \(4 \times 3\) — 4 rows (one per source state) and 3 columns (logits for transition to States 1, 2, and 3). The probability of transitioning to State 4 (the absorbing state) is implicit (the reference category).

The FiniteMarkov call:

DV ~ FiniteMarkov(Pθ, [θ₁₁, θ₁₂, θ₁₃])

takes two arguments:

Pθ: The vector of transition matrices (one per observation time)
[θ₁₁, θ₁₂, θ₁₃]: The initial state distribution (logits for States 1, 2, 3; State 4 logit is implicitly 0)

Starting the Markov Chain

The initial state vector [θ₁₁, θ₁₂, θ₁₃] determines the probability of each state at the first observation. The first transition matrix in Pθ is not referenced — only the initial distribution is used for time 1. From time 2 onwards, each observation uses the corresponding transition matrix from Pθ.

6.3 The Transition Matrix Laid Out

To make the model concrete, here is the matrix of logits that the model assembles at a given time. This is exactly what goes into Pθ, with the implicit reference column for State 4 (logit \(= 0\)) shown for completeness:

\[ \begin{array}{c|cccc} \text{From} \,\backslash\, \text{To} & \text{State 1} & \text{State 2} & \text{State 3} & \text{State 4 (ref)} \\ \hline \text{State 1} & \theta_{11} & \theta_{12} + \text{slope}_{12}\, c & \theta_{13} & 0 \\ \text{State 2} & \theta_{21} & \theta_{22} & \theta_{23} + \text{slope}_{23}\, c & 0 \\ \text{State 3} & \theta_{31} & \theta_{32} & \theta_{33} & 0 \\ \text{State 4} & -\infty & -\infty & -\infty & 0 \end{array} \]

where \(c\) is the drug concentration at that time. The transition probabilities are obtained by applying the softmax to each row of this logit matrix. For source state \(i\), the probability of moving to state \(j\) is:

\[ P_{ij} = \frac{\exp(\tau_{ij})}{\sum_{l=1}^{4} \exp(\tau_{il})}, \]

where \(\tau_{ij}\) is the logit in row \(i\), column \(j\) of the array above (and \(\tau_{i4} = 0\) for every active row). For the absorbing State 4, the \(-\infty\) logits give \(P_{4j} = 0\) for \(j \neq 4\) and \(P_{44} = 1\).

The two drug effects (slope₁₂ and slope₂₃) cause the transition probabilities to be time-varying — they shift as drug concentration rises and falls after each dose.

7 Fitting the Model

We fit the model using NaivePooled estimation. Since individual PK variability is already handled via the ETA1/ETA2 covariates, no additional random effects are needed. We hold the PK parameters constant at their known values.

The initial parameter estimates combine the PK values from a prior analysis with the data-derived logits we computed earlier:

params = (
    θCL = 4.8,
    θV = 10.0,
    θ₁₁ = init_logits[1, 1],
    θ₁₂ = init_logits[1, 2],
    θ₁₃ = init_logits[1, 3],
    θ₂₁ = init_logits[2, 1],
    θ₂₂ = init_logits[2, 2],
    θ₂₃ = init_logits[2, 3],
    θ₃₁ = init_logits[3, 1],
    θ₃₂ = init_logits[3, 2],
    θ₃₃ = init_logits[3, 3],
    slope₁₂ = 0.0,
    slope₂₃ = 0.0,
)

(θCL = 4.8, θV = 10.0, θ₁₁ = 8.156624964190355, θ₁₂ = 3.832979798087693, θ₁₃ = -0.916290731874155, θ₂₁ = 5.209486152841421, θ₂₂ = 9.039552050995901, θ₂₃ = 4.852030263919617, θ₃₁ = 0.0, θ₃₂ = 4.6913478822291435, θ₃₃ = 9.048056708918736, slope₁₂ = 0.0, slope₂₃ = 0.0)

fpm = fit(
    markov_model,
    pop,
    params,
    NaivePooled();
    constantcoef = (:θCL, :θV),
    optim_options = (; show_trace = true, show_every = 20),
)

[ Info: Checking the initial parameter values.
[ Info: The initial negative log likelihood and its gradient are finite. Check passed.
Iter     Function value   Gradient norm 
     0     3.467011e+03     1.628651e+03
 * time: 0.03258514404296875
    20     3.419425e+03     1.886967e+01
 * time: 11.487014055252075
    40     3.419316e+03     4.406543e-02
 * time: 18.467947006225586

FittedPumasModel

Dynamical system type:          Matrix exponential

Number of subjects:                            100

Observation records:         Active        Missing
    DV:                       36600              0
    Total:                    36600              0

Number of parameters:      Constant      Optimized
                                  2             11

Likelihood approximation:              NaivePooled
Likelihood optimizer:                         BFGS

Termination Reason:                   GradientNorm
Log-likelihood value:                   -3419.3066

----------------------
            Estimate
----------------------
† θCL        4.8
† θV        10.0
  θ₁₁        8.1627
  θ₁₂        3.4056
  θ₁₃       -0.91548
  θ₂₁        5.2082
  θ₂₂        9.0382
  θ₂₃        4.1391
  θ₃₁       -6.37e-13
  θ₃₂        4.6913
  θ₃₃        9.0481
  slope₁₂    0.037161
  slope₂₃    0.03561
----------------------
† indicates constant parameters

Why NaivePooled?

NaivePooled treats each subject’s observation sequence as independent conditional on the model parameters and their covariates. The Markov structure already captures within-subject dependence through the conditional transition probabilities. This is the standard and most common estimation approach for DTMMs.

If you wanted to estimate IIV on the transition parameters, you would need LaplaceI instead. Estimation of Markov models with random effects can sometimes be difficult, and this tutorial follows the published tutorial of Ooi et al. (2025) in not estimating with random effects.

Setting Initial Values

The transition logit parameters default to 0 (equal probability for all states). If you have prior estimates (e.g., from observed transition frequencies), use them as initial values to speed up convergence. You can compute initial logits from observed transition probabilities as: \(\tau_{ij} = \log(P_{ij} / P_{iK})\) where \(P_{iK}\) is the probability for the reference state.

7.1 Examining the Estimates

simple_table(coeftable(fpm))


parameter	constant	estimate

θCL	true	4.8
θV	true	10
θ₁₁	false	8.16
θ₁₂	false	3.41
θ₁₃	false	-0.915
θ₂₁	false	5.21
θ₂₂	false	9.04
θ₂₃	false	4.14
θ₃₁	false	-6.37e-13
θ₃₂	false	4.69
θ₃₃	false	9.05
slope₁₂	false	0.0372
slope₂₃	false	0.0356

The logit-scale parameters are not directly interpretable. Let’s convert them to probabilities.

8 Interpreting the Results

8.1 Converting Logits to Transition Probabilities

We recover the actual transition probabilities at baseline (zero drug concentration) by assembling the logit matrix and applying the softmax to each row:

pars = coef(fpm)

# Baseline 4×4 logit matrix (zero drug concentration).
# The last column is the reference state (logit 0); the last row is the
# absorbing state (−Inf logits, so it can never leave State 4).
logit_matrix = [
    pars.θ₁₁ pars.θ₁₂ pars.θ₁₃ 0.0
    pars.θ₂₁ pars.θ₂₂ pars.θ₂₃ 0.0
    pars.θ₃₁ pars.θ₃₂ pars.θ₃₃ 0.0
    -Inf -Inf -Inf 0.0
]

# Convert each row of logits to transition probabilities with the softmax
P = mapslices(softmax, logit_matrix; dims = 2)

4×4 Matrix{Float64}:
 0.99109      0.00851461  0.000113116  0.00028256
 0.0210906    0.971553    0.00724125   0.0001154
 0.000116104  0.0126553   0.987113     0.000116104
 0.0          0.0         0.0          1.0

The baseline transition matrix can be displayed as a formatted table:

labels = ["State 1", "State 2", "State 3", "State 4"]
P_table = DataFrame(round.(P; digits = 6), labels)
insertcols!(P_table, 1, "From \\ To" => labels)
simple_table(P_table)


From \\ To	State 1	State 2	State 3	State 4

State 1	0.991	0.00852	0.000113	0.000283
State 2	0.0211	0.972	0.00724	0.000115
State 3	0.000116	0.0127	0.987	0.000116
State 4	0	0	0	1

What to Look For

Large diagonal values indicate strong Markov behavior (subjects tend to stay in their current state)
The magnitude of State 4 probabilities shows how quickly subjects drop out from each state
Asymmetry between \(P_{12}\) and \(P_{21}\) reveals preferred directions of transitions

8.2 State Occupation Probabilities with `probstable`

Pumas provides the probstable function to compute how the probability of occupying each state evolves over time, accounting for the full sequence of time-varying transition matrices. This is one of the most useful diagnostic tools for Markov models:

pt = probstable(fpm)
simple_table(first(pt, 10))


id	time	DV_iprob1	DV_iprob2	DV_iprob3	DV_iprob4

1	0	0.991	0.00851	0.000113	0.000283
1	1	0.982	0.0167	0.000285	0.000564
1	2	0.974	0.0246	0.000514	0.000843
1	3	0.966	0.0322	0.000796	0.00112
1	4	0.958	0.0395	0.00113	0.0014
1	5	0.95	0.0466	0.00151	0.00167
1	6	0.943	0.0534	0.00193	0.00195
1	7	0.935	0.0599	0.0024	0.00222
1	8	0.928	0.0662	0.00291	0.00249
1	9	0.922	0.0722	0.00346	0.00276

The columns DV_iprob1 through DV_iprob4 give the probability of being in each state at each time point. These always sum to 1:

# Verify probabilities sum to 1
prob_cols = select(pt, :DV_iprob1, :DV_iprob2, :DV_iprob3, :DV_iprob4)
all_sums = [sum(row) for row in eachrow(prob_cols)]
println("All probability sums ≈ 1.0: ", all(x -> isapprox(x, 1.0), all_sums))

All probability sums ≈ 1.0: true

Let’s visualize the state occupation probabilities for a representative subject:

pt_sub1 = @rsubset(pt, :id == first(pt.id))
pt_long = stack(
    pt_sub1,
    [:DV_iprob1, :DV_iprob2, :DV_iprob3, :DV_iprob4];
    variable_name = :State,
    value_name = :Probability,
)
@rtransform!(
    pt_long,
    :State = replace(
        string(:State),
        "DV_iprob1" => "State 1",
        "DV_iprob2" => "State 2",
        "DV_iprob3" => "State 3",
        "DV_iprob4" => "State 4 (Absorbing)",
    )
);
plt_prob =
    data(pt_long) *
    mapping(:time => "Time", :Probability => "State Probability"; color = :State) *
    visual(Lines; linewidth = 2)
draw(
    plt_prob;
    axis = (; title = "State Occupation Probabilities — Subject $(first(pt.id))",),
    legend = (position = :bottom, title = "State", titlealignment = :left),
)

Interpreting State Occupation Probabilities

These probabilities show the model’s prediction of the chance of being in each state at each time. As time progresses:

The absorbing state probability typically increases (more subjects have dropped out)
Active state probabilities may shift as drug concentrations change
The rate of change reflects both the baseline transition rates and the drug effect

8.3 Understanding the Drug Effect

The drug effects in our model operate on the logit scale. Let’s visualize how drug concentration changes the transition probabilities.

A positive slope₁₂ means that higher drug concentration increases the logit for the 1→2 transition, shifting probability toward State 2. Let’s compute this across a range of concentrations:

concentrations = 0.0:0.5:10.0

probs_12 = map(concentrations) do c
    softmax([pars.θ₁₁, pars.θ₁₂ + pars.slope₁₂ * c, pars.θ₁₃, 0.0])[2]
end

probs_23 = map(concentrations) do c
    softmax([pars.θ₂₁, pars.θ₂₂, pars.θ₂₃ + pars.slope₂₃ * c, 0.0])[3]
end

drug_df = DataFrame(
    Concentration = repeat(collect(concentrations), 2),
    Probability = vcat(probs_12, probs_23),
    Transition = vcat(
        fill("P(State 1 → State 2)", length(concentrations)),
        fill("P(State 2 → State 3)", length(concentrations)),
    ),
)
plt_drug =
    data(drug_df) *
    mapping(
        :Concentration => "Drug Concentration",
        :Probability => "Transition Probability";
        color = :Transition,
    ) *
    visual(Lines; linewidth = 2)
draw(
    plt_drug;
    axis = (; title = "Drug Effect on Transition Probabilities",),
    legend = (position = :bottom, title = "Transition", titlealignment = :left),
)

Drug Effect Interpretation

Because the drug effect is on the logit scale, the relationship between concentration and probability is sigmoidal (S-shaped), not linear. At low concentrations, the effect is approximately linear. At high concentrations, the probability saturates (it can never exceed 1). This built-in saturation is a natural advantage of the logit parameterization.

9 Inference and Model Evaluation

9.1 Parameter Uncertainty

FiniteMarkov models fitted with NaivePooled support standard error computation:

fpmi = infer(fpm)

[ Info: Calculating: variance-covariance matrix.
[ Info: Done.

Asymptotic inference results using sandwich estimator

Dynamical system type:          Matrix exponential

Number of subjects:                            100

Observation records:         Active        Missing
    DV:                       36600              0
    Total:                    36600              0

Number of parameters:      Constant      Optimized
                                  2             11

Likelihood approximation:              NaivePooled
Likelihood optimizer:                         BFGS

Termination Reason:                   GradientNorm
Log-likelihood value:                   -3419.3066

------------------------------------------------------------
            Estimate    SE          95.0% C.I.
------------------------------------------------------------
† θCL        4.8        NaN         [  NaN     ;  NaN     ]
† θV        10.0        NaN         [  NaN     ;  NaN     ]
  θ₁₁        8.1627     0.45064     [  7.2794  ;  9.0459  ]
  θ₁₂        3.4056     0.46148     [  2.5012  ;  4.3101  ]
  θ₁₃       -0.91548    0.83588     [ -2.5538  ;  0.72281 ]
  θ₂₁        5.2082     0.99104     [  3.2658  ;  7.1506  ]
  θ₂₂        9.0382     0.99691     [  7.0843  ; 10.992   ]
  θ₂₃        4.1391     1.0318      [  2.1169  ;  6.1614  ]
  θ₃₁       -6.37e-13   1.4186      [ -2.7805  ;  2.7805  ]
  θ₃₂        4.6913     1.0175      [  2.6971  ;  6.6856  ]
  θ₃₃        9.0481     0.99835     [  7.0913  ; 11.005   ]
  slope₁₂    0.037161   0.0044194   [  0.028499;  0.045823]
  slope₂₃    0.03561    0.0061912   [  0.023475;  0.047744]
------------------------------------------------------------
† indicates constant parameters

This provides standard errors, confidence intervals, and p-values for all estimated parameters (on the logit scale).

Inference on the Logit Scale

The confidence intervals from infer are on the logit scale. While you can interpret whether individual logit parameters are significantly different from zero, translating these to the probability scale requires care. The logistic is a multivariate transformation — confidence intervals on individual logits do not directly translate to confidence intervals on probabilities. For probability-scale uncertainty, simulation-based approaches (e.g., bootstrap) are recommended.

9.2 Log-Likelihood

The model log-likelihood can be used for model comparison:

ll = loglikelihood(fpm)
println("Log-likelihood: ", ll)
println("-2LL: ", -2 * ll)

Log-likelihood: -3419.3065604734097
-2LL: 6838.613120946819

The \(-2 \text{LL}\) value can be compared between nested models using the likelihood ratio test (the difference is approximately \(\chi^2\) distributed with degrees of freedom equal to the number of extra parameters).

9.3 Simulation and Model Checking

Markov models support simulation via simobs. This is the foundation for model evaluation: if the model is adequate, simulated data should reproduce the key features of the observed data.

sims = simobs(markov_model, pop, coef(fpm))

Simulated population (Vector{<:Subject})
  Simulated subjects: 100
  Simulated variables: DV, conc, drugep₁₂, drugep₂₃, θ₁₂de, θ₂₃de

9.4 Simulation Roundtripping

We can verify model identifiability by simulating data from the fitted model and re-estimating the parameters. If the model is well-identified, the re-estimated parameters should be close to the original:

sim_pop = Subject.(sims)
refit_fpm = fit(
    markov_model,
    sim_pop,
    coef(fpm),
    NaivePooled();
    constantcoef = (:θCL, :θV),
    optim_options = (; show_trace = true, show_every = 20),
)

[ Info: Checking the initial parameter values.
[ Info: The initial negative log likelihood and its gradient are finite. Check passed.
Iter     Function value   Gradient norm 
     0     3.549067e+03     2.117447e+02
 * time: 1.2159347534179688e-5
    20     3.546349e+03     5.177422e-01
 * time: 11.075675964355469
    40     3.546262e+03     5.269960e-03
 * time: 18.13558316230774

FittedPumasModel

Dynamical system type:          Matrix exponential

Number of subjects:                            100

Observation records:         Active        Missing
    DV:                       36600              0
    Total:                    36600              0

Number of parameters:      Constant      Optimized
                                  2             11

Likelihood approximation:              NaivePooled
Likelihood optimizer:                         BFGS

Termination Reason:                   GradientNorm
Log-likelihood value:                   -3546.2616

----------------------
            Estimate
----------------------
† θCL        4.8
† θV        10.0
  θ₁₁        9.0637
  θ₁₂        4.3274
  θ₁₃        0.69333
  θ₂₁        5.2207
  θ₂₂        9.0802
  θ₂₃        4.2675
  θ₃₁       -0.69295
  θ₃₂        4.0344
  θ₃₃        8.3976
  slope₁₂    0.037454
  slope₂₃    0.035061
----------------------
† indicates constant parameters

# Compare original and re-estimated parameters
simple_table(compare_estimates(; original = fpm, refit = refit_fpm))


parameter	original	refit

θCL	4.8	4.8
θV	10	10
θ₁₁	8.16	9.06
θ₁₂	3.41	4.33
θ₁₃	-0.915	0.693
θ₂₁	5.21	5.22
θ₂₂	9.04	9.08
θ₂₃	4.14	4.27
θ₃₁	-6.37e-13	-0.693
θ₃₂	4.69	4.03
θ₃₃	9.05	8.4
slope₁₂	0.0372	0.0375
slope₂₃	0.0356	0.0351

10 References

Ooi QX, Plan E, Bergstrand M. A tutorial on pharmacometric Markov models. CPT: Pharmacometrics & Systems Pharmacology. 2025;14(2):197-216.
Girard P, Blaschke TF, Kastrissios H, Sheiner LB. A Markov mixed effect regression model for drug compliance. Statistics in Medicine. 1998;17(20):2313-2333.
Karlsson MO, Schoemaker RC, Kemp B, et al. A pharmacodynamic Markov mixed-effects model for the effect of temazepam on sleep. Clinical Pharmacology & Therapeutics. 2000;68(2):175-188.
Lacroix BD, Lovern MR, Stockis A, et al. A pharmacodynamic Markov mixed-effects model for determining the effect of exposure to certolizumab pegol on the ACR20 score in patients with rheumatoid arthritis. Clinical Pharmacology & Therapeutics. 2009;86(4):387-395.

1 Introduction

2 Recognizing Serial Dependence

2.1 Signatures of Serial Dependence

2.2 Why Not Use Standard Models?

3 The Markov Chain: Core Concepts

3.1 States and Transitions

3.2 The Transition Probability Matrix

3.3 Absorbing States

3.4 Equidistant Time Assumption

4 Loading and Exploring the Dataset

4.1 Data Preparation

4.2 Exploratory Data Analysis

4.2.1 Longitudinal Profiles

4.2.2 Distribution of States Over Time

4.2.3 Transition Frequency Table

5 Parameterizing Transitions: The Logit Approach

5.1 Why Not Estimate Probabilities Directly?

5.2 The Logistic Transformation

5.3 A Worked Example

5.4 This Is the Softmax (Multinomial Logistic) Function

5.5 Why This Parameterization Is Useful for Drug Effects

5.6 Computing Initial Logit Estimates from the Data

6 Building the Markov Chain Model

6.1 The Full Model

6.2 Step-by-Step Model Walkthrough

6.2.1 The @param Block: What We’re Estimating

6.2.2 The @covariates and @pre Blocks: Individual PK

6.2.3 The @dynamics Block: PK Concentration Over Time

6.2.4 The @vars Block: Drug Effects on Transitions

6.2.5 The @derived Block: Assembling the FiniteMarkov Distribution

6.3 The Transition Matrix Laid Out

7 Fitting the Model

7.1 Examining the Estimates

8 Interpreting the Results

8.1 Converting Logits to Transition Probabilities

8.2 State Occupation Probabilities with probstable

8.3 Understanding the Drug Effect

9 Inference and Model Evaluation

9.1 Parameter Uncertainty

9.2 Log-Likelihood

9.3 Simulation and Model Checking

9.4 Simulation Roundtripping

10 References

Reuse

6.2.1 The `@param` Block: What We’re Estimating

6.2.2 The `@covariates` and `@pre` Blocks: Individual PK

6.2.3 The `@dynamics` Block: PK Concentration Over Time

6.2.4 The `@vars` Block: Drug Effects on Transitions

6.2.5 The `@derived` Block: Assembling the FiniteMarkov Distribution

8.2 State Occupation Probabilities with `probstable`