# Read in as a Pumas dataset
= read_pumas(examp_df; observations = [:pkconc]) examp_df_pumas
Module 9: Covariate Modeling
1 Module Introduction and Objectives
The target audience are pharmacometricians with experience in NONMEM or other equivalent non-linear modeling software commonly used in pharmacometrics. The Module builds on concepts described in Learning Path 2: Introduction to Pumas and other Modules pertaining to Learning Path 3.
1.1 Objectives
The objectives of Module 9: Covariate Modeling are:
- Re-enforce how covariates need to be represented in a Pumas dataset through
read_pumas
- Demonstrate how covariates are read into the
PumasModel
using@covariates
- Explain special considerations required for time-dependent covariates
- Present examples for coding continuous and categorical covariates proportional to the population-typical value
- Explain the concept of nested models
- Explain the concept of the likelihood ratio test and how to interpret the results when comparing 2 nested models
- Provide the Pumas code for performing the likelihood ratio test on 2
FittedPumasModels
Covariate modeling workflows (including stepwise covariate modeling) will be addressed in later Modules. However, additional information is available at tutorials.pumas.ai.
In this Module, it is already assumed that a base model has been fitted (mod_fit
) to the warfarin concentrations over time where the analysis dataset was named, examp_df_pumas
, and the dependent variable was named, pkconc
.
The model is a 1-compartment model with linear elimination and first-order absorption, log-normally distributed inter-individual variability on clearance (CL) and volume of distribution of the central compartment (VC), and a proportional residual error model.
# Define the Pumas model
= @model begin
mod_code @param begin
# Definition of fixed effect parameters
∈ RealDomain(; lower = 0.0)
θCL ∈ RealDomain(; lower = 0.0)
θVC ∈ RealDomain(; lower = 0.0)
θKA # Random effect parameters
# Variance-covariance matrix for inter-individual variability
∈ PSDDomain(2)
Ω # Residual unexplained variability
∈ RealDomain(; lower = 0.0)
σpro end
@random begin
# Sampling random effect parameters
~ MvNormal(Ω)
η end
@pre begin
# Derived variables
# Covariates
# None
# Individual PK parameters
= θCL * exp(η[1])
CL = θVC * exp(η[2])
VC = θKA
KA end
@init begin
# Define initial conditions
= 0.0
Depot = 0.0
Central end
@vars begin
# Concentrations in compartments
:= Central / VC
centc end
@dynamics begin
# Differential equations
' = -KA * Depot
Depot' = KA * Depot - CL * centc
Centralend
@derived begin
# Definition of derived variables
# Individual-predicted concentration
:= @.(Central / VC)
ipre # Dependent variable
"""
Warfarin Concentration (mg/L)
"""
~ @.Normal(ipre, sqrt((ipre * σpro)^2))
pkconc end
end
# Define the initial estimates
= (θCL = 1, θVC = 10, θKA = 1, Ω = [
init_params 0.09 0.01
0.01 0.09
= 0.3) ], σpro
FittedPumasModel
Successful minimization: true
Likelihood approximation: FOCE
Likelihood Optimizer: BFGS
Dynamical system type: Matrix exponential
Log-likelihood value: -433.53018
Number of subjects: 31
Number of parameters: Fixed Optimized
0 7
Observation records: Active Missing
pkconc: 240 47
Total: 240 47
-------------------
Estimate
-------------------
θCL 0.13199
θVC 8.1121
θKA 0.72068
Ω₁,₁ 0.052772
Ω₂,₁ 0.024541
Ω₂,₂ 0.036845
σpro 0.22063
-------------------
2 Defining Covariates
Variables intended to be evaluated as covariates in Pumas require definition from both the analysis dataset and the Pumas model perspectives.
2.1 Analysis Dataset Specifications
When the input Population
is constructed using the read_pumas
function, covariate variables need to be defined by passing a vector of column names to the keyword argument, covariates
.
In our example, the covariate variables are weight, age, and sex. Additionally, these variables are considered to be subject-level where each individual has a single value for all dosing and observation records.
# Read in as a Pumas dataset
=
examp_df_pumas read_pumas(examp_df; observations = [:pkconc], covariates = [:WEIGHT, :AGE, :SEX])
id | time | evid | pkconc | amt | cmt | rate | duration | ss | ii | route | WEIGHT | AGE | SEX | tad | dosenum |
1 | 0 | 0 | missing | 0 | missing | 0 | 0 | 0 | 0 | NCA.NullRoute | 66.7 | 50 | Male | 0 | 1 |
1 | 0 | 1 | missing | 100 | 1 | 0 | 0 | 0 | 0 | NCA.NullRoute | 66.7 | 50 | Male | 0 | 1 |
1 | 24 | 0 | 9.2 | 0 | missing | 0 | 0 | 0 | 0 | NCA.NullRoute | 66.7 | 50 | Male | 24 | 1 |
1 | 36 | 0 | 8.5 | 0 | missing | 0 | 0 | 0 | 0 | NCA.NullRoute | 66.7 | 50 | Male | 36 | 1 |
1 | 48 | 0 | 6.4 | 0 | missing | 0 | 0 | 0 | 0 | NCA.NullRoute | 66.7 | 50 | Male | 48 | 1 |
1 | 72 | 0 | 4.8 | 0 | missing | 0 | 0 | 0 | 0 | NCA.NullRoute | 66.7 | 50 | Male | 72 | 1 |
1 | 96 | 0 | 3.1 | 0 | missing | 0 | 0 | 0 | 0 | NCA.NullRoute | 66.7 | 50 | Male | 96 | 1 |
1 | 120 | 0 | 2.5 | 0 | missing | 0 | 0 | 0 | 0 | NCA.NullRoute | 66.7 | 50 | Male | 120 | 1 |
Pumas can handle categorical variables with their descriptive labels. There is no need to convert numerical formats for categorical variables.
2.1.1 Time-Dependent Covariates
Additional considerations are required for time-dependent covariates:
- Values of time-dependent covariates do not need to be time-matched with existing dosing and dependent variable observation records. Consider the following DataFrame where some
dv
values aremissing
at times whenweight
has values, and vice versa:
id | time | evid | dv | mdv | weight |
1 | 0 | 0 | missing | 1 | 70 |
1 | 1 | 0 | 0.599 | 0 | missing |
1 | 2 | 0 | 0.652 | 0 | missing |
1 | 3 | 0 | 0.955 | 0 | missing |
1 | 4 | 0 | 0.0654 | 0 | missing |
1 | 5 | 0 | missing | 1 | 80 |
1 | 6 | 0 | 0.895 | 0 | 85 |
1 | 7 | 0 | 0.259 | 0 | missing |
1 | 8 | 0 | 0.698 | 0 | 70 |
- Missing values for time-dependent covariates are interpolated when the
Population
is constructed viaread_pumas
. An option of:left
(last observation carried forward, default) or:right
(next observation carried backward) is passed to the keyword argument,covariates_direction
:
# Convert to a Pumas Population
= read_pumas(
timecov_pumas_locf
timecov_df;= [:dv],
observations = :mdv,
mdv = [:weight],
covariates = :left,
covariates_direction = false,
event_data )
id | time | evid | dv | weight |
1 | 0 | 0 | missing | 70 |
1 | 1 | 0 | 0.599 | 70 |
1 | 2 | 0 | 0.652 | 70 |
1 | 3 | 0 | 0.955 | 70 |
1 | 4 | 0 | 0.0654 | 70 |
1 | 5 | 0 | missing | 80 |
1 | 6 | 0 | 0.895 | 85 |
1 | 7 | 0 | 0.259 | 85 |
1 | 8 | 0 | 0.698 | 70 |
# Convert to a Pumas Population
= read_pumas(
timecov_pumas_nocb
timecov_df;= [:dv],
observations = :mdv,
mdv = [:weight],
covariates = :right,
covariates_direction = false,
event_data )
id | time | evid | dv | weight |
1 | 0 | 0 | missing | 70 |
1 | 1 | 0 | 0.599 | 80 |
1 | 2 | 0 | 0.652 | 80 |
1 | 3 | 0 | 0.955 | 80 |
1 | 4 | 0 | 0.0654 | 80 |
1 | 5 | 0 | missing | 80 |
1 | 6 | 0 | 0.895 | 85 |
1 | 7 | 0 | 0.259 | 70 |
1 | 8 | 0 | 0.698 | 70 |
2.1.2 Missing Covariate Values
Missing values for covariates are not handled by the PumasModel
. As shown in Section 2.1.1, missing values are interpolated during construction of the Population
with read_pumas
and use last observation carried forward or next observation carried backward methods.
If it is intended that missing
values for covariate variables are to be imputed with the median or mode for the individual or the analysis population then this needs to be handled at the analysis dataset construction stage (i.e., prior to constructing the Population
with read_pumas
).
2.2 Model Specification
From the PumasModel
perspective, covariate variables need to be specified in the @covariates
block. Quote blocks can be used to assign labels to the covariates which can be leveraged for axis labels in subsequent plots based on the FittedPumasModel
.
Note: Not every variable defined as a covariate in the Population
needs to be defined in @covariates
as there are other utilities beyond the PumasModel
that may use the population’s covariates. However, every variable used as a covariate in the model needs to be defined in both @covariates
in the PumasModel
and passed to the covariates
keyword argument of read_pumas
.
# Defining the covariates block
@covariates begin
"""
Body Weight (kg)
"""
WEIGHT"""
Age (years)
"""
AGE"""
SEX
"""
Sexend
The definition of fixed effect parameters quantifying the effect of a covariate on a typical parameter and the definition of covariate model equations are described in Section 3.
3 Covariate Model Parameterization
Covariate models typically require:
A model structure describing the relationship between a covariate value and the population-typical parameter
- This can be defined in the
@pre
block for covariate effects on parameters associated with rate or magnitude of changes in responses (i.e., clearance and central volume of distribution for PK models or baseline, maximum drug effect, half-maximal concentration of effect for PK/PD models) - This can also be defined in the
@dosecontrol
block for covariate effects on parameters associated modifications to the dosing input such as bioavailability, input rate or duration, or lags
- This can be defined in the
An estimated fixed effect parameter that quantifies the magnitude and direction of the relationship
- This is defined in the
@param
block
- This is defined in the
3.1 Categorical Covariates
For the examples outlined below, it should be noted that Pumas does not check if the covariate category defined as part of the covariate model is a value that is available for the corresponding variable in the source data. Ensure that you check the spelling of defined categories in your model or for safety, implement error messages when none of the defined conditions are met (as shown in the examples below).
Consider the following form for the effect of female sex on clearance:
\text{COVSEXCL} = \begin{cases} 1 & \text{if SEX = Male}\\ 1 + \theta_{SEXCL} & \text{if SEX = Female} \end{cases} CL_{i} = \theta_{CL}\cdot\text{COVSEXCL}
Categorical covariates can be defined using either an if-else
statement or the ternary operator
:
# Define the Pumas model
= @model begin
mod_code_sex @param begin
# Definition of fixed effect parameters
∈ RealDomain(; lower = 0.0)
θCL ∈ RealDomain(; lower = 0.0)
θVC ∈ RealDomain(; lower = 0.0)
θKA ∈ RealDomain(; lower = -0.999)
θSEXCLF # Random effect parameters
# Variance-covariance matrix for inter-individual variability
∈ PSDDomain(2)
Ω # Residual unexplained variability
∈ RealDomain(; lower = 0.0)
σpro end
@random begin
# Sampling random effect parameters
~ MvNormal(Ω)
η end
@covariates begin
WEIGHT
AGE
SEXend
@pre begin
# Derived variables
# Covariates
# Effect of female sex on CL
= if SEX == "Female"
COVSEXCL # For SEX == "Female", estimate the effect
1 + θSEXCLF
elseif SEX == "Male"
# For SEX == "Male", set to the reference value
1
else
# Return error message if specified conditions are not met
error("Expected SEX to be either \"Female\" or \"Male\" but the value was: $SEX")
end
# Individual PK parameters
= θCL * exp(η[1]) * COVSEXCL
CL = θVC * exp(η[2])
VC = θKA
KA end
@init begin
# Define initial conditions
= 0.0
Depot = 0.0
Central end
@vars begin
# Concentrations in compartments
:= Central / VC
centc end
@dynamics begin
# Differential equations
' = -KA * Depot
Depot' = KA * Depot - CL * centc
Centralend
@derived begin
# Definition of derived variables
# Individual-predicted concentration
:= @.(Central / VC)
ipre # Dependent variable
"""
Warfarin Concentration (mg/L)
"""
~ @.Normal(ipre, sqrt((ipre * σpro)^2))
pkconc end
end
# Define the initial estimates
= (θCL = 1, θVC = 10, θKA = 1, θSEXCLF = 0.01, Ω = [
init_params_sex 0.09 0.01
0.01 0.09
= 0.3) ], σpro
# Define the Pumas model
= @model begin
mod_code_sex @param begin
# Definition of fixed effect parameters
∈ RealDomain(; lower = 0.0)
θCL ∈ RealDomain(; lower = 0.0)
θVC ∈ RealDomain(; lower = 0.0)
θKA ∈ RealDomain(; lower = -0.999)
θSEXCLF # Random effect parameters
# Variance-covariance matrix for inter-individual variability
∈ PSDDomain(2)
Ω # Residual unexplained variability
∈ RealDomain(; lower = 0.0)
σpro end
@covariates begin
WEIGHT
AGE
SEXend
@random begin
# Sampling random effect parameters
~ MvNormal(Ω)
η end
@pre begin
# Derived variables
# Covariates
# Effect of female sex on CL
=
COVSEXCL == "Female" ? 1 + θSEXCLF :
SEX
(== "Male" ? 1 :
SEX error(
"Expected SEX to be either \"Female\" or \"Male\" but the value was: $SEX",
)
)
# Individual PK parameters
= θCL * exp(η[1]) * COVSEXCL
CL = θVC * exp(η[2])
VC = θKA
KA end
@init begin
# Define initial conditions
= 0.0
Depot = 0.0
Central end
@vars begin
# Concentrations in compartments
:= Central / VC
centc end
@dynamics begin
# Differential equations
' = -KA * Depot
Depot' = KA * Depot - CL * centc
Centralend
@derived begin
# Definition of derived variables
# Individual-predicted concentration
:= @.(Central / VC)
ipre # Dependent variable
"""
Warfarin Concentration (mg/L)
"""
~ @.Normal(ipre, sqrt((ipre * σpro)^2))
pkconc end
end
# Define the initial estimates
= (θCL = 1, θVC = 10, θKA = 1, θSEXCLF = 0.01, Ω = [
init_params_sex 0.09 0.01
0.01 0.09
= 0.3) ], σpro
FittedPumasModel
Successful minimization: true
Likelihood approximation: FOCE
Likelihood Optimizer: BFGS
Dynamical system type: Matrix exponential
Log-likelihood value: -431.859
Number of subjects: 31
Number of parameters: Fixed Optimized
0 8
Observation records: Active Missing
pkconc: 240 47
Total: 240 47
----------------------
Estimate
----------------------
θCL 0.12692
θVC 8.0882
θKA 0.72761
θSEXCLF 0.27971
Ω₁,₁ 0.06673
Ω₂,₁ 0.036912
Ω₂,₂ 0.033654
σpro 0.22089
----------------------
3.2 Continuous Covariates
Similarly, continuous covariates are also defined in the @pre
(or @dosecontrol
) blocks. In this example, the allometric scaling exponents for the effect of body weight on PK parameters such as CL and VC are being estimated to demonstrate the definition and estimation of continuous covariates in a Pumas model.
Note: In other Modules for Population PK and PK/PD Modeling with Pumas, the normalization of body weight to 70 kg and the implementation of fixed allometric scaling exponents (0.75 for CL, and 1.0 for VC) was incorporated as part of the analysis dataset rather than defined within the Pumas model (Data Representation in Pumas). There may be computational efficiencies for cases like these where additional parameters are not being estimated.
Consider the following form for the effect of body weight on a population pharmacokinetic parameter (such as clearance or central volume of distribution):
P_{i} = \theta_{P}\cdot\left(\frac{WT_{i}}{WT_{ref}}\right)^{\theta_{WTP}}
# Define the Pumas model
= @model begin
mod_code_wt @param begin
# Definition of fixed effect parameters
∈ RealDomain(; lower = 0.0)
θCL ∈ RealDomain(; lower = 0.0)
θVC ∈ RealDomain(; lower = 0.0)
θKA ∈ RealDomain()
θWTCL ∈ RealDomain()
θWTVC # Random effect parameters
# Variance-covariance matrix for inter-individual variability
∈ PSDDomain(2)
Ω # Residual unexplained variability
∈ RealDomain(; lower = 0.0)
σpro end
@covariates begin
WEIGHT
AGE
SEXend
@random begin
# Sampling random effect parameters
~ MvNormal(Ω)
η end
@pre begin
# Derived variables
# Covariates
# Effect of body weight on CL
= (WEIGHT / 70)^θWTCL
COVWTCL
# Effect of body weight on VC
= (WEIGHT / 70)^θWTVC
COVWTVC
# Individual PK parameters
= θCL * exp(η[1]) * COVWTCL
CL = θVC * exp(η[2]) * COVWTVC
VC = θKA
KA end
@init begin
# Define initial conditions
= 0.0
Depot = 0.0
Central end
@vars begin
# Concentrations in compartments
:= Central / VC
centc end
@dynamics begin
# Differential equations
' = -KA * Depot
Depot' = KA * Depot - CL * centc
Centralend
@derived begin
# Definition of derived variables
# Individual-predicted concentration
:= @.(Central / VC)
ipre # Dependent variable
"""
Warfarin Concentration (mg/L)
"""
~ @.Normal(ipre, sqrt((ipre * σpro)^2))
pkconc end
end
# Define the initial estimates
= (
init_params_wt = 1,
θCL = 10,
θVC = 1,
θKA = 0.75,
θWTCL = 1.0,
θWTVC = [
Ω 0.09 0.01
0.01 0.09
],= 0.3,
σpro )
FittedPumasModel
Successful minimization: false
Likelihood approximation: FOCE
Likelihood Optimizer: BFGS
Dynamical system type: Matrix exponential
Log-likelihood value: -420.74875
Number of subjects: 31
Number of parameters: Fixed Optimized
0 9
Observation records: Active Missing
pkconc: 240 47
Total: 240 47
---------------------
Estimate
---------------------
θCL 0.13261
θVC 8.1619
θKA 0.75481
θWTCL 0.58518
θWTVC 0.79722
Ω₁,₁ 0.037722
Ω₂,₁ 0.0025223
Ω₂,₂ 0.0050672
σpro 0.22096
---------------------
4 Comparing Nested Models with the Likelihood Ratio Test
Nested models are compared using the likelihood-ratio test (LRT).
A LRT is a statistical hypothesis test used in the field of statistics and probability theory to compare two statistical models and determine which one provides a better fit to a given set of observed data. It is particularly useful in the context of maximum likelihood estimation (MLE) and is commonly used for hypothesis testing in parametric statistical modeling.
The likelihood ratio test compares the likelihoods of 2 competing models:
Null Hypothesis (H_0): This is the model that you want to test against, typically the “base” model. It represents a specific set of parameter values or restrictions on the model.
Alternative Hypothesis (H_a): This is the alternative model, often a more complex one or the one you want to support, typically the “covariate” model.
The test statistic is calculated as the ratio of the likelihood under the alternative model (H_a) to the likelihood under the null model (H_0). Mathematically, it can be expressed as:
\operatorname{LRT} = - 2 \log \left( \frac{\mathcal{L}(H_0)}{\mathcal{L}(H_a)} \right)
where:
- \operatorname{LRT}: likelihood ratio test statistic
- \mathcal{L}(H_0): likelihood under H_0, the likelihood of the data under the null hypothesis
- \mathcal{L}(H_a): likelihood under H_a, the likelihood of the data under the alternative hypothesis
The LRT statistic follows a \chi^2 (chi-squared) distribution with degrees of freedom equal to the difference in the number of parameters between the two models (i.e., the degrees of freedom is the number of additional estimable parameters in the alternative model).
In practice, you compare the LRT statistic to \chi^2 distribution to determine whether the alternative model is a significantly better fit to the data than the null model.
If the p-value derived from the LRT statistic is lower than your desired \alpha (the type-1 error rate, commonly set to 0.05), you reject the null hypothesis in favor of the alternative hypothesis, indicating that the alternative model provides a better fit to the data.
The likelihood-ratio test requires that the models be nested, i.e. the more complex model can be transformed into the simpler model by imposing constraints on the former’s parameters.
This is generally the case when performing LRT in a covariate selection context. However, be mindful of not violating this assumption when performing LRT.
4.1 Performing Likelihood Ratio Test in Pumas
The lrtest
function is used to perform the LRT in Pumas. It takes 2 positional arguments as competing models:
- Model under H_0 (i.e. the model with less parameters)
- Model under H_a (i.e. the model with more parameters)
For comparing the base model to the model with the effect of female sex on clearance:
# Perform likelihood ratio test between base model and covariate model
= lrtest(mod_fit, mod_fit_sex) lrt
Statistic: 3.34
Degrees of freedom: 1
P-value: 0.068
In the output from lrtest
, the:
Statistic
is the difference in the -2\cdotloglikelihood between the null and alternative modelsDegrees of freedom
is the difference in the number of estimable parameters between the null and alternative modelsP-value
is the p-value for the test statistic given the degrees of freedom.
The p-value and the degrees of freedom can also be extracted from the output of lrtest
as below:
# Extract the p-value from the lrtest output
= pvalue(lrt)
lrt_pvalue
# Extract the degrees of freedom from the lrtest output
= lrt.Δdf lrt_df
The effect of female sex on clearance did not statistically improve the fit
to the observed data compared to the base model (p-value = 0.0675 with 1
degree of freedom).
5 Summary
Considerations for covariate modeling in Pumas is required as the analysis dataset preparation stage as handling of missing values is required prior to constructing the Population
object (Pumas interpolates over the missing values using last observation carried forward as default). Categorical variables with descriptive values can be used directly within a Pumas model with no need to convert to numeric values for the purposes of defining covariate models. The likelihood ratio test can be simply performed on 2 competing models using the lrtest
function.