Module 9: Covariate Modeling

1 Module Introduction and Objectives

The target audience are pharmacometricians with experience in NONMEM or other equivalent non-linear modeling software commonly used in pharmacometrics. The Module builds on concepts described in Learning Path 2: Introduction to Pumas and other Modules pertaining to Learning Path 3.

1.1 Objectives

The objectives of Module 9: Covariate Modeling are:

Re-enforce how covariates need to be represented in a Pumas dataset through read_pumas
Demonstrate how covariates are read into the PumasModel using @covariates
Explain special considerations required for time-dependent covariates
Present examples for coding continuous and categorical covariates proportional to the population-typical value
Explain the concept of nested models
Explain the concept of the likelihood ratio test and how to interpret the results when comparing 2 nested models
Provide the Pumas code for performing the likelihood ratio test on 2 FittedPumasModels

Covariate modeling workflows (including stepwise covariate modeling) will be addressed in later Modules. However, additional information is available at tutorials.pumas.ai.

Additional Information for Covariate Modeling

In this Module, it is already assumed that a base model has been fitted (mod_fit) to the warfarin concentrations over time where the analysis dataset was named, examp_df_pumas, and the dependent variable was named, conc.

Warfarin Population Pharmacokinetic Model

The model is a 1-compartment model with linear elimination and first-order absorption, log-normally distributed inter-individual variability on clearance (CL) and volume of distribution of the central compartment (VC), and a proportional residual error model.

# Read in as a Pumas dataset
examp_df_pumas = read_pumas(examp_df; observations = [:conc])

# Define the Pumas model
mod_code = @model begin
    @param begin
        # Definition of fixed effect parameters
        θCL ∈ RealDomain(; lower = 0.0)
        θVC ∈ RealDomain(; lower = 0.0)
        θKA ∈ RealDomain(; lower = 0.0)
        # Random effect parameters
        # Variance-covariance matrix for inter-individual variability
        Ω ∈ PSDDomain(2)
        # Residual unexplained variability
        σpro ∈ RealDomain(; lower = 0.0)
    end
    @random begin
        # Sampling random effect parameters
        η ~ MvNormal(Ω)
    end
    @pre begin
        # Derived variables
        # Covariates
        # None

        # Individual PK parameters
        CL = θCL * exp(η[1])
        VC = θVC * exp(η[2])
        KA = θKA
    end
    @init begin
        # Define initial conditions
        Depot = 0.0
        Central = 0.0
    end
    @vars begin
        # Concentrations in compartments
        centc := Central / VC
    end
    @dynamics begin
        # Differential equations
        Depot' = -KA * Depot
        Central' = KA * Depot - CL * centc
    end
    @derived begin
        # Definition of derived variables
        # Individual-predicted concentration
        ipre := @.(Central / VC)
        # Dependent variable
        """
        Warfarin Concentration (mg/L)
        """
        conc ~ @.Normal(ipre, sqrt((ipre * σpro)^2))
    end
end

# Define the initial estimates
params0 = (θCL = 1, θVC = 10, θKA = 1, Ω = [
    0.09 0.01
    0.01 0.09
], σpro = 0.3)

FittedPumasModel

Dynamical system type:          Matrix exponential

Number of subjects:                             32

Observation records:         Active        Missing
    conc:                       251             47
    Total:                      251             47

Number of parameters:      Constant      Optimized
                                  0              7

Likelihood approximation:                     FOCE
Likelihood optimizer:                         BFGS

Termination Reason:                   GradientNorm
Log-likelihood value:                    -468.8711

----------------
       Estimate
----------------
θCL    0.1361
θVC    8.2234
θKA    0.62015
Ω₁,₁   0.066661
Ω₂,₁   0.031618
Ω₂,₂   0.044949
σpro   0.23737
----------------

2 Defining Covariates

Variables intended to be evaluated as covariates in Pumas require definition from both the analysis dataset and the Pumas model perspectives.

2.1 Analysis Dataset Specifications

When the input Population is constructed using the read_pumas function, covariate variables need to be defined by passing a vector of column names to the keyword argument, covariates.

In our example, the covariate variables are weight, age, and sex. Additionally, these variables are considered to be subject-level where each individual has a single value for all dosing and observation records.

# Read in as a Pumas dataset
examp_df_pumas =
    read_pumas(examp_df; observations = [:conc], covariates = [:wtbl, :age, :sex])


id	time	evid	conc	amt	cmt	rate	duration	ss	ii	route	wtbl	age	sex	tad	dosenum

1	0	1	missing	100	1	0	0	0	0	NCA.NullRoute	66.7	50	M	0	1
1	0.5	0	0	0	missing	0	0	0	0	missing	66.7	50	M	0.5	1
1	1	0	1.9	0	missing	0	0	0	0	missing	66.7	50	M	1	1
1	2	0	3.3	0	missing	0	0	0	0	missing	66.7	50	M	2	1
1	3	0	6.6	0	missing	0	0	0	0	missing	66.7	50	M	3	1
1	6	0	9.1	0	missing	0	0	0	0	missing	66.7	50	M	6	1
1	9	0	10.8	0	missing	0	0	0	0	missing	66.7	50	M	9	1
1	12	0	8.6	0	missing	0	0	0	0	missing	66.7	50	M	12	1
1	24	0	5.6	0	missing	0	0	0	0	missing	66.7	50	M	24	1
1	36	0	4	0	missing	0	0	0	0	missing	66.7	50	M	36	1
1	48	0	2.7	0	missing	0	0	0	0	missing	66.7	50	M	48	1
1	72	0	0.8	0	missing	0	0	0	0	missing	66.7	50	M	72	1
1	96	0	missing	0	missing	0	0	0	0	missing	66.7	50	M	96	1
1	120	0	missing	0	missing	0	0	0	0	missing	66.7	50	M	120	1
1	144	0	missing	0	missing	0	0	0	0	missing	66.7	50	M	144	1

Categorical Variables

Pumas can handle categorical variables with their descriptive labels. There is no need to convert numerical formats for categorical variables.

2.1.1 Time-Dependent Covariates

Additional considerations are required for time-dependent covariates:

Values of time-dependent covariates do not need to be time-matched with existing dosing and dependent variable observation records. Consider the following DataFrame where some dv values are missing at times when weight has values, and vice versa:


id	time	evid	dv	mdv	weight

1	0	0	missing	1	70
1	1	0	0.532	0	missing
1	2	0	0.593	0	missing
1	3	0	0.726	0	missing
1	4	0	0.617	0	missing
1	5	0	missing	1	80
1	6	0	0.202	0	85
1	7	0	0.503	0	missing
1	8	0	0.243	0	70

Missing values for time-dependent covariates are interpolated when the Population is constructed via read_pumas. An option of :left (last observation carried forward, default) or :right (next observation carried backward) is passed to the keyword argument, covariates_direction:

# Convert to a Pumas Population
timecov_pumas_locf = read_pumas(
    timecov_df;
    observations = [:dv],
    mdv = :mdv,
    covariates = [:weight],
    covariates_direction = :left,
    event_data = false,
)


id	time	evid	dv	weight

1	0	0	missing	70
1	1	0	0.532	70
1	2	0	0.593	70
1	3	0	0.726	70
1	4	0	0.617	70
1	5	0	missing	80
1	6	0	0.202	85
1	7	0	0.503	85
1	8	0	0.243	70

# Convert to a Pumas Population
timecov_pumas_nocb = read_pumas(
    timecov_df;
    observations = [:dv],
    mdv = :mdv,
    covariates = [:weight],
    covariates_direction = :right,
    event_data = false,
)


id	time	evid	dv	weight

1	0	0	missing	70
1	1	0	0.532	80
1	2	0	0.593	80
1	3	0	0.726	80
1	4	0	0.617	80
1	5	0	missing	80
1	6	0	0.202	85
1	7	0	0.503	70
1	8	0	0.243	70

2.1.2 Missing Covariate Values

Missing values for covariates are not handled by the PumasModel. As shown in Section 2.1.1, missing values are interpolated during construction of the Population with read_pumas and use last observation carried forward or next observation carried backward methods.

If it is intended that missing values for covariate variables are to be imputed with the median or mode for the individual or the analysis population then this needs to be handled at the analysis dataset construction stage (i.e., prior to constructing the Population with read_pumas).

2.2 Model Specification

From the PumasModel perspective, covariate variables need to be specified in the @covariates block. Quote blocks can be used to assign labels to the covariates which can be leveraged for axis labels in subsequent plots based on the FittedPumasModel.

Note: Not every variable defined as a covariate in the Population needs to be defined in @covariates as there are other utilities beyond the PumasModel that may use the population’s covariates. However, every variable used as a covariate in the model needs to be defined in both @covariates in the PumasModel and passed to the covariates keyword argument of read_pumas.

# Defining the covariates block
@covariates begin
    """
    Body Weight (kg)
    """
    wtbl
    """
    Age (years)
    """
    age
    """
    sex
    """
    Sex
end

The definition of fixed effect parameters quantifying the effect of a covariate on a typical parameter and the definition of covariate model equations are described in Section 3.

3 Covariate Model Parameterization

Covariate models typically require:

A model structure describing the relationship between a covariate value and the population-typical parameter
- This can be defined in the @pre block for covariate effects on parameters associated with rate or magnitude of changes in responses (i.e., clearance and central volume of distribution for PK models or baseline, maximum drug effect, half-maximal concentration of effect for PK/PD models)
- This can also be defined in the @dosecontrol block for covariate effects on parameters associated modifications to the dosing input such as bioavailability, input rate or duration, or lags
An estimated fixed effect parameter that quantifies the magnitude and direction of the relationship
- This is defined in the @param block

3.1 Categorical Covariates

Pumas Does Not Check if Categories are Available in Source Data

For the examples outlined below, it should be noted that Pumas does not check if the covariate category defined as part of the covariate model is a value that is available for the corresponding variable in the source data. Ensure that you check the spelling of defined categories in your model or for safety, implement error messages when none of the defined conditions are met (as shown in the examples below).

Consider the following form for the effect of female sex on clearance:

\text{COVsexCL} = \begin{cases} 1 & \text{if sex = M}\\ 1 + \theta_{sexCL} & \text{if sex = F} \end{cases} CL_{i} = \theta_{CL}\cdot\text{COVsexCL}

Categorical covariates can be defined using either an if-else statement or the ternary operator:

# Define the Pumas model
mod_code_sex = @model begin
    @param begin
        # Definition of fixed effect parameters
        θCL ∈ RealDomain(; lower = 0.0)
        θVC ∈ RealDomain(; lower = 0.0)
        θKA ∈ RealDomain(; lower = 0.0)
        θsexCLF ∈ RealDomain(; lower = -0.999)
        # Random effect parameters
        # Variance-covariance matrix for inter-individual variability
        Ω ∈ PSDDomain(2)
        # Residual unexplained variability
        σpro ∈ RealDomain(; lower = 0.0)
    end
    @random begin
        # Sampling random effect parameters
        η ~ MvNormal(Ω)
    end
    @covariates begin
        wtbl
        age
        sex
    end
    @pre begin
        # Derived variables
        # Covariates
        # Effect of female sex on CL
        COVsexCL = if sex == "F"
            # For sex == "F", estimate the effect
            1 + θsexCLF
        elseif sex == "M"
            # For sex == "M", set to the reference value
            1
        else
            # Return error message if specified conditions are not met
            error("Expected sex to be either \"F\" or \"M\" but the value was: $sex")
        end

        # Individual PK parameters
        CL = θCL * exp(η[1]) * COVsexCL
        VC = θVC * exp(η[2])
        KA = θKA
    end
    @init begin
        # Define initial conditions
        Depot = 0.0
        Central = 0.0
    end
    @vars begin
        # Concentrations in compartments
        centc := Central / VC
    end
    @dynamics begin
        # Differential equations
        Depot' = -KA * Depot
        Central' = KA * Depot - CL * centc
    end
    @derived begin
        # Definition of derived variables
        # Individual-predicted concentration
        ipre := @.(Central / VC)
        # Dependent variable
        """
        Warfarin Concentration (mg/L)
        """
        conc ~ @.Normal(ipre, sqrt((ipre * σpro)^2))
    end
end

# Define the initial estimates
params0_sex = (θCL = 1, θVC = 10, θKA = 1, θsexCLF = 0.01, Ω = [
    0.09 0.01
    0.01 0.09
], σpro = 0.3)

# Define the Pumas model
mod_code_sex = @model begin
    @param begin
        # Definition of fixed effect parameters
        θCL ∈ RealDomain(; lower = 0.0)
        θVC ∈ RealDomain(; lower = 0.0)
        θKA ∈ RealDomain(; lower = 0.0)
        θsexCLF ∈ RealDomain(; lower = -0.999)
        # Random effect parameters
        # Variance-covariance matrix for inter-individual variability
        Ω ∈ PSDDomain(2)
        # Residual unexplained variability
        σpro ∈ RealDomain(; lower = 0.0)
    end
    @covariates begin
        wtbl
        age
        sex
    end
    @random begin
        # Sampling random effect parameters
        η ~ MvNormal(Ω)
    end
    @pre begin
        # Derived variables
        # Covariates
        # Effect of female sex on CL
        COVsexCL =
            sex == "F" ? 1 + θsexCLF :
            (
                sex == "M" ? 1 :
                error("Expected sex to be either \"F\" or \"M\" but the value was: $sex")
            )

        # Individual PK parameters
        CL = θCL * exp(η[1]) * COVsexCL
        VC = θVC * exp(η[2])
        KA = θKA
    end
    @init begin
        # Define initial conditions
        Depot = 0.0
        Central = 0.0
    end
    @vars begin
        # Concentrations in compartments
        centc := Central / VC
    end
    @dynamics begin
        # Differential equations
        Depot' = -KA * Depot
        Central' = KA * Depot - CL * centc
    end
    @derived begin
        # Definition of derived variables
        # Individual-predicted concentration
        ipre := @.(Central / VC)
        # Dependent variable
        """
        Warfarin Concentration (mg/L)
        """
        conc ~ @.Normal(ipre, sqrt((ipre * σpro)^2))
    end
end

# Define the initial estimates
params0_sex = (θCL = 1, θVC = 10, θKA = 1, θsexCLF = 0.01, Ω = [
    0.09 0.01
    0.01 0.09
], σpro = 0.3)

FittedPumasModel

Dynamical system type:          Matrix exponential

Number of subjects:                             32

Observation records:         Active        Missing
    conc:                       251             47
    Total:                      251             47

Number of parameters:      Constant      Optimized
                                  0              8

Likelihood approximation:                     FOCE
Likelihood optimizer:                         BFGS

Termination Reason:                   GradientNorm
Log-likelihood value:                   -467.21084

-------------------
          Estimate
-------------------
θCL       0.13049
θVC       8.2105
θKA       0.62563
θsexCLF   0.30716
Ω₁,₁      0.082764
Ω₂,₁      0.046443
Ω₂,₂      0.042338
σpro      0.23787
-------------------

3.2 Continuous Covariates

Similarly, continuous covariates are also defined in the @pre (or @dosecontrol) blocks. In this example, the allometric scaling exponents for the effect of body weight on PK parameters such as CL and VC are being estimated to demonstrate the definition and estimation of continuous covariates in a Pumas model.

Note: In other Modules for Population PK and PK/PD Modeling with Pumas, the normalization of body weight to 70 kg and the implementation of fixed allometric scaling exponents (0.75 for CL, and 1.0 for VC) was incorporated as part of the analysis dataset rather than defined within the Pumas model (Data Representation in Pumas). There may be computational efficiencies for cases like these where additional parameters are not being estimated.

Consider the following form for the effect of body weight on a population pharmacokinetic parameter (such as clearance or central volume of distribution):

P_{i} = \theta_{P}\cdot\left(\frac{WT_{i}}{WT_{ref}}\right)^{\theta_{WTP}}

# Define the Pumas model
mod_code_wt = @model begin
    @param begin
        # Definition of fixed effect parameters
        θCL ∈ RealDomain(; lower = 0.0)
        θVC ∈ RealDomain(; lower = 0.0)
        θKA ∈ RealDomain(; lower = 0.0)
        θWTCL ∈ RealDomain()
        θWTVC ∈ RealDomain()
        # Random effect parameters
        # Variance-covariance matrix for inter-individual variability
        Ω ∈ PSDDomain(2)
        # Residual unexplained variability
        σpro ∈ RealDomain(; lower = 0.0)
    end
    @covariates begin
        wtbl
        age
        sex
    end
    @random begin
        # Sampling random effect parameters
        η ~ MvNormal(Ω)
    end
    @pre begin
        # Derived variables
        # Covariates
        # Effect of body weight on CL
        COVWTCL = (wtbl / 70)^θWTCL

        # Effect of body weight on VC
        COVWTVC = (wtbl / 70)^θWTVC

        # Individual PK parameters
        CL = θCL * exp(η[1]) * COVWTCL
        VC = θVC * exp(η[2]) * COVWTVC
        KA = θKA
    end
    @init begin
        # Define initial conditions
        Depot = 0.0
        Central = 0.0
    end
    @vars begin
        # Concentrations in compartments
        centc := Central / VC
    end
    @dynamics begin
        # Differential equations
        Depot' = -KA * Depot
        Central' = KA * Depot - CL * centc
    end
    @derived begin
        # Definition of derived variables
        # Individual-predicted concentration
        ipre := @.(Central / VC)
        # Dependent variable
        """
        Warfarin Concentration (mg/L)
        """
        conc ~ @.Normal(ipre, sqrt((ipre * σpro)^2))
    end
end

# Define the initial estimates
params0_wt = (
    θCL = 1,
    θVC = 10,
    θKA = 1,
    θWTCL = 0.75,
    θWTVC = 1.0,
    Ω = [
        0.09 0.01
        0.01 0.09
    ],
    σpro = 0.3,
)

FittedPumasModel

Dynamical system type:          Matrix exponential

Number of subjects:                             32

Observation records:         Active        Missing
    conc:                       251             47
    Total:                      251             47

Number of parameters:      Constant      Optimized
                                  0              9

Likelihood approximation:                     FOCE
Likelihood optimizer:                         BFGS

Termination Reason:                   GradientNorm
Log-likelihood value:                   -458.44628

-----------------
        Estimate
-----------------
θCL     0.13755
θVC     8.3605
θKA     0.63708
θWTCL   0.58107
θWTVC   0.88447
Ω₁,₁    0.057566
Ω₂,₁    0.011589
Ω₂,₂    0.01475
σpro    0.23505
-----------------

4 Comparing Nested Models with the Likelihood Ratio Test

Nested models are compared using the likelihood-ratio test (LRT).

A LRT is a statistical hypothesis test used in the field of statistics and probability theory to compare two statistical models and determine which one provides a better fit to a given set of observed data. It is particularly useful in the context of maximum likelihood estimation (MLE) and is commonly used for hypothesis testing in parametric statistical modeling.

The likelihood ratio test compares the likelihoods of 2 competing models:

Null Hypothesis (H_0): This is the model that you want to test against, typically the “base” model. It represents a specific set of parameter values or restrictions on the model.
Alternative Hypothesis (H_a): This is the alternative model, often a more complex one or the one you want to support, typically the “covariate” model.

The test statistic is calculated as the ratio of the likelihood under the alternative model (H_a) to the likelihood under the null model (H_0). Mathematically, it can be expressed as:

\operatorname{LRT} = - 2 \log \left( \frac{\mathcal{L}(H_0)}{\mathcal{L}(H_a)} \right)

where:

\operatorname{LRT}: likelihood ratio test statistic
\mathcal{L}(H_0): likelihood under H_0, the likelihood of the data under the null hypothesis
\mathcal{L}(H_a): likelihood under H_a, the likelihood of the data under the alternative hypothesis

The LRT statistic follows a \chi^2 (chi-squared) distribution with degrees of freedom equal to the difference in the number of parameters between the two models (i.e., the degrees of freedom is the number of additional estimable parameters in the alternative model).

In practice, you compare the LRT statistic to \chi^2 distribution to determine whether the alternative model is a significantly better fit to the data than the null model.

If the p-value derived from the LRT statistic is lower than your desired \alpha (the type-1 error rate, commonly set to 0.05), you reject the null hypothesis in favor of the alternative hypothesis, indicating that the alternative model provides a better fit to the data.

Note

The likelihood-ratio test requires that the models be nested, i.e. the more complex model can be transformed into the simpler model by imposing constraints on the former’s parameters.

This is generally the case when performing LRT in a covariate selection context. However, be mindful of not violating this assumption when performing LRT.

4.1 Performing Likelihood Ratio Test in Pumas

The lrtest function is used to perform the LRT in Pumas. It takes 2 positional arguments as competing models:

Model under H_0 (i.e. the model with less parameters)
Model under H_a (i.e. the model with more parameters)

For comparing the base model to the model with the effect of female sex on clearance:

# Perform likelihood ratio test between base model and covariate model
lrt = lrtest(mod_fit, mod_fit_sex)

Statistic:            3.32
Degrees of freedom:      1
P-value:             0.068

In the output from lrtest, the:

Statistic is the difference in the -2\cdotloglikelihood between the null and alternative models
Degrees of freedom is the difference in the number of estimable parameters between the null and alternative models
P-value is the p-value for the test statistic given the degrees of freedom.

The p-value and the degrees of freedom can also be extracted from the output of lrtest as below:

# Extract the p-value from the lrtest output
lrt_pvalue = pvalue(lrt)

# Extract the degrees of freedom from the lrtest output
lrt_df = lrt.Δdf

The effect of female sex on clearance did not statistically improve the fit
to the observed data compared to the base model (p-value = 0.0684 with 1
degree of freedom).

5 Summary

Considerations for covariate modeling in Pumas is required as the analysis dataset preparation stage as handling of missing values is required prior to constructing the Population object (Pumas interpolates over the missing values using last observation carried forward as default). Categorical variables with descriptive values can be used directly within a Pumas model with no need to convert to numeric values for the purposes of defining covariate models. The likelihood ratio test can be simply performed on 2 competing models using the lrtest function.