NONMEM style BLQ models (M1-M4) in Pumas

Author

Patrick Kofod Mogensen

Before we begin we need to load the necessary packages and set a seed for reproducibility.

using Pumas
using DataFramesMeta
using PumasUtilities
using Random
Webinar

This tutorial is about handling the missing data problem that arises when observations fall outside the limits of quantification. In 2022, we held a webinar about the theory behind this kind of statistical modeling; it was recorded and is available on YouTube.

1 Handling Lower Limit of Quantification

In the real world, assays often come with limitations such as lower or upper limits of quantification. When pharmacometricians receive data, it has often been processed by the lab or by statisticians in other departments. The data can carry various levels of information about whether a measured concentration was at or below the Lower Limit of Quantification (LLOQ). Unfortunately, such measurements are often simply labeled "BQL" in the dependent variable column to indicate that the measurement was taken but was Below the Quantification Limit (BQL). If the value were provided, it could be modeled more explicitly, but with only BQL indicators, we need to treat it as a missing data problem.

In pharmacometrics, the main references on how to handle this kind of data are Beal (2001) for a conceptual description and Bergstrand and Karlsson (2009) for a simulation exercise. These papers use the M1-M7 nomenclature of the former. In Pumas, we try to always refer to things according to their actual distributional assumptions or data transformation steps. In this tutorial, we will use both nomenclatures to ease the transition from a NONMEM workflow to a Pumas workflow. The discussion around the M* methods assumes familiarity with the nomenclature described in the papers above. If you are not transitioning from NONMEM, see our other tutorials for BQL introductions that do not refer directly to NONMEM methods, as the discussion in terms of M* methods is mostly kept for legacy reasons.

2 Definitions

Before we start, we need to define some terms:

Truncation is a statistical term that refers to the removal of data points that fall outside a certain range. In the context of pharmacometrics, truncation is used to remove data points that fall below the lower limit of quantification (LLOQ). Mathematically, say that \(X\) is our dependent variable, i.e. the quantity we want to measure:

\[X \sim N(\mu, \sigma^2)\]

\(X\) is generally measured with some instrument. To account for the fact that the instrument might have some limitations, we generally observe in our dataset either the truncated or censored values of \(X\). We denote \(Y^t\) as the truncated value of \(X\) and \(Y^c\) as the censored value of \(X\).

In truncation, we observe the value of \(X\) only if it lies between the lower and upper thresholds, \(a\) and \(b\), respectively. For some reason, the data processing step has removed the data points that fall outside this range. We can represent the truncated variable \(Y^t\) as:

\[Y^t = X \mid a \leq X \leq b\]

The censoring case is different. In censoring, we observe the value of \(X\) itself only when it lies between the thresholds \(a\) and \(b\); values below \(a\) are recorded as \(a\), and values above \(b\) are recorded as \(b\). The censored variable \(Y^c\) is then:

\[Y^c = \min(\max(X, a), b)\]

Here are examples of truncation and censoring:

  • Truncation (missing data points): A device measures a plasma concentration, but we only ever see the data points that fall within the truncation limits; the rest are absent from the dataset entirely.
  • Censoring (partial information): A device measures a plasma concentration as before. However, below some measurement value (LLOQ) the number cannot be trusted, so we only know that the measurement is at or below that value.
Data Truncation versus Truncated Distribution

There’s a difference between data truncation and a truncated distribution. Data truncation refers to the removal of data points that fall outside a certain range, whereas a truncated distribution is a probability distribution restricted to a certain range. The latter assumes that the Data Generating Process (DGP) itself is truncated, in the sense that the data is generated from a truncated distribution.

For example, consider an assay that cannot return negative values, or a measurement device that cannot measure values above a certain threshold. In these cases, the Normal distribution, which has support on the entire real line, is truncated to a certain range.
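The two concepts can be illustrated directly with Distributions.jl, which Pumas builds on. This is a minimal sketch, not part of the tutorial's workflow, assuming a latent \(N(2, 1)\) variable and a lower limit of 0.9 standing in for an LLOQ:

```julia
using Distributions

X = Normal(2.0, 1.0)          # the latent, unrestricted variable
a = 0.9                       # lower limit, playing the role of the LLOQ

Yt = truncated(X; lower = a)  # truncated: values below a never appear
Yc = censored(X; lower = a)   # censored: values below a are recorded as a

# The truncated density is the original density rescaled to integrate to
# one over the observable range above a:
pdf(Yt, 1.5) ≈ pdf(X, 1.5) / ccdf(X, a)

# The censored distribution instead places a point mass at a with
# probability P(X ≤ a):
pdf(Yc, a) ≈ cdf(X, a)
```

These are the same constructors used in the models later in this tutorial.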

3 Generating data

To study a specific statistical phenomenon and properties of any estimator it is often important to be able to separately specify:

  1. the Data Generating Process (DGP); and
  2. the estimation procedure or estimator.

The data in this tutorial will come from a simple, simulated dataset with 90 subjects that receive an injection of 200 units at time 0. The assay used to measure the concentration of the drug in plasma has a Lower Limit of Quantification (LLOQ) of 0.9 concentration units. By the simple properties of a two-compartment model with linear elimination, all subjects will eventually have concentrations low enough that samples have a chance of being Below the Quantification Limit (BQL). The error model is a combined (additive and proportional) error model. With such a distribution we can have observations that are not BQL even at high t (model time), but the proportion of BQL values should increase over time.
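Written out, the combined error model used here is:

\[y_{ij} = c_{ij} + \sqrt{\sigma_{\text{add}}^2 + (\sigma_{\text{prop}} \lvert c_{ij} \rvert)^2}\,\varepsilon_{ij}, \qquad \varepsilon_{ij} \sim N(0, 1)\]

where \(c_{ij}\) is the model-predicted concentration for subject \(i\) at sampling time \(j\). This is the standard deviation that appears in the @derived block of the model code below.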

Notice that the above type of BQL observation is only one of many. Here, the proportion of BQL values is simply a function of time. In some cases, the probability of an observation being BQL may be a function of covariates as well; some patients may have a higher volume of distribution, for example. This does not change how we handle the issue statistically, but it may affect how we decide whether to ignore or explicitly handle the BQL issue. You could argue that you should always explicitly handle the BQL values, but the choice may not be so simple. The methods that explicitly model the BQL values in a statistical manner require the LaplaceI method if we are using first-order approximation methods. This method is known to be slower and sometimes less numerically robust than FOCE. For this reason, an analyst may choose to use BQL methods only if the bias from simply ignoring the problematic observations is expected to be high. An alternative is to use Expectation-Maximization (EM) methods such as SAEM; that approach is treated in a separate tutorial.

4 Data Generating Process

The Data Generating Process (DGP) consists of the specification of:

  1. the population
  2. the doses
  3. the sampling times
  4. the model
  5. the parameters

4.1 The population, doses, and sampling times

As mentioned previously, each subject is dosed 200 units of the drug at time zero. There are 90 subjects in total, and we sample their plasma every other hour from one to eleven hours after dosing.

pop = [
    Subject(
        id = string(i),
        observations = (CONC = nothing,),
        events = DosageRegimen(200),
        time = [1.0, 3.0, 5.0, 7.0, 9.0, 11.0],
    ) for i = 1:90
]

This population is empty in the sense that it has no simulated observations yet. To do that, we need to define a model and the parameters. The following model can be seen as a model for the distribution and elimination of the drug (the dynamical model) as well as the values from the assay (the derived block) without any masking of BQL values.

iv2cmt_dgp = @model begin
    @param begin
        θcl ∈ RealDomain(lower = 0.1)
        θvc ∈ RealDomain(lower = 1.0)
        θq ∈ RealDomain(lower = 0.1)
        θvp ∈ RealDomain(lower = 1.1)
        Ω ∈ PDiagDomain(2)
        σ_add ∈ RealDomain(lower = 0.0)
        σ_prop ∈ RealDomain(lower = 0.0)
    end

    @random begin
        η ~ MvNormal(Ω)
    end

    @pre begin
        CL = θcl * exp(η[1])
        Vc = θvc * exp(η[2])
        Q = θq
        Vp = θvp
    end

    @dynamics Central1Periph1

    @derived begin
        conc_model := @. Central / Vc
        CONC ~ @. Normal(conc_model, sqrt(σ_add^2 + (abs(conc_model) * σ_prop)^2))
    end
end
PumasModel
  Parameters: θcl, θvc, θq, θvp, Ω, σ_add, σ_prop
  Random effects: η
  Covariates: 
  Dynamical variables: Central, Peripheral
  Derived: CONC
  Observed: CONC

We can also choose some parameters to simulate from:

param0 = (;
    θcl = 2.9,
    θvc = 7.0,
    θq = 1.8,
    θvp = 9.0,
    Ω = Diagonal([0.09, 0.09]),
    σ_add = sqrt(0.03),
    σ_prop = sqrt(0.03),
)
(θcl = 2.9,
 θvc = 7.0,
 θq = 1.8,
 θvp = 9.0,
 Ω = [0.09 0.0; 0.0 0.09],
 σ_add = 0.17320508075688773,
 σ_prop = 0.17320508075688773,)

4.2 A simulated population

To simulate the population, we simply use simobs and the objects defined above.

# Set a seed for reproducibility
rng = Random.seed!(2142)
simpop = simobs(iv2cmt_dgp, pop, param0; rng)
DataFrame(simpop)

simpop carries the simulated information about the assay, but does not yet apply the BQL masking. This is done in the next step.

4.3 Lower Limit Of Quantification (LLOQ)

In this example, we will assume that our assay has a lower limit of quantification at 0.9. We can convert our simulated population to a tabular format (DataFrame) and calculate which rows are BQL. This information can be used to produce plots showing the censoring and to create our masked dependent variable.

LLOQ = 0.9
pkdata = DataFrame(simpop)
pkdata.BQL = pkdata.CONC .<= LLOQ
pkdata_obs = filter(x -> x.evid == 0, pkdata)

4.3.1 Calculating proportions BQL by time

To investigate whether censoring of the dependent variable is an issue or not, we can calculate the proportion of BQL at each observation time.

pkdata_obs_gdf = groupby(pkdata_obs, :time)
pkdata_obs_gdf_comb = combine(pkdata_obs_gdf, :BQL => mean => "BQL_proportion")
6×2 DataFrame
 Row │ time     BQL_proportion
     │ Float64  Float64
─────┼─────────────────────────
   1 │     1.0       0.0
   2 │     3.0       0.0
   3 │     5.0       0.0444444
   4 │     7.0       0.0666667
   5 │     9.0       0.177778
   6 │    11.0       0.244444

We see that the value of the BQL_proportion column is increasing, and towards the end we do have a fair amount of BQL values. Another way to investigate this number is in a plot.

4.3.2 Visualizing BQL proportions

Let us visualize the effect of the censoring in the following plot.

using AlgebraOfGraphics

f = AlgebraOfGraphics.Figure()
f1 = draw!(
    f[1:2, 1],
    data(pkdata_obs) *
    mapping(:time, :CONC => "CONC", color = :BQL) *
    visual(AlgebraOfGraphics.Scatter) +
    mapping([LLOQ], [0.0]) * visual(AlgebraOfGraphics.ABLines; color = :orange);
    axis = (; xlabel = "",),
    palettes = (; color = fcolor(i) = (:black, :orange)[i+1]),
)
f2 = draw!(
    f[3, 1],
    data(pkdata_obs_gdf_comb) *
    mapping(:time, :BQL_proportion => "BLQ proportion") *
    visual(AlgebraOfGraphics.Scatter; color = :orange);
    axis = (limits = (nothing, nothing, 0, 1),),
)
legend!(f[1:2, 2], f1)
f

In the bottom panel, we see that the proportion of BQL values increases over time. The censoring itself is a bit hard to see here. In the next plot, the effect is easier to see because the y-axis shows the log-transformed variable.

using AlgebraOfGraphics

f = AlgebraOfGraphics.Figure()
f1 = draw!(
    f[1:2, 1],
    data(pkdata_obs) *
    mapping(:time, :CONC => log => "log(CONC)", color = :BQL) *
    visual(AlgebraOfGraphics.Scatter) +
    mapping([log(LLOQ)], [0.0]) * visual(AlgebraOfGraphics.ABLines; color = :orange);
    axis = (; xlabel = ""),
    palettes = (; color = fcolor(i) = (:black, :orange)[i+1]),
)
f2 = draw!(
    f[3, 1],
    data(pkdata_obs_gdf_comb) *
    mapping(:time, :BQL_proportion => "BLQ proportion") *
    visual(AlgebraOfGraphics.Scatter; color = :orange);
    axis = (; limits = (nothing, nothing, 0, 1)),
)
legend!(f[1:2, 2], f1)
f

4.4 Masking the data

As mentioned in the introduction, real world data often comes without the assay values if they are BQL.

pkdata_censored = copy(pkdata)
pkdata_censored.CONC .= max.(LLOQ, pkdata_censored.CONC)
first(pkdata_censored, 20)
20×20 DataFrame
Row id time CONC evid amt cmt rate duration ss ii route η_1 η_2 Central Peripheral CL Vc Q Vp BQL
String? Float64 Float64? Int64? Float64? Symbol? Float64? Float64? Int8? Float64? NCA.Route? Float64? Float64? Float64? Float64? Float64? Float64? Float64? Float64? Bool?
1 1 0.0 missing 1 200.0 Central 0.0 0.0 0 0.0 NullRoute -0.205373 -0.122347 missing missing 2.3616 6.19389 1.8 9.0 missing
2 1 1.0 9.71942 0 missing missing missing missing missing missing missing -0.205373 -0.122347 105.662 38.3016 2.3616 6.19389 1.8 9.0 false
3 1 3.0 5.33975 0 missing missing missing missing missing missing missing -0.205373 -0.122347 39.0625 55.6195 2.3616 6.19389 1.8 9.0 false
4 1 5.0 3.72554 0 missing missing missing missing missing missing missing -0.205373 -0.122347 21.914 50.8124 2.3616 6.19389 1.8 9.0 false
5 1 7.0 1.82011 0 missing missing missing missing missing missing missing -0.205373 -0.122347 15.8283 42.8137 2.3616 6.19389 1.8 9.0 false
6 1 9.0 2.26987 0 missing missing missing missing missing missing missing -0.205373 -0.122347 12.5344 35.3837 2.3616 6.19389 1.8 9.0 false
7 1 11.0 1.31686 0 missing missing missing missing missing missing missing -0.205373 -0.122347 10.1926 29.0999 2.3616 6.19389 1.8 9.0 false
8 2 0.0 missing 1 200.0 Central 0.0 0.0 0 0.0 NullRoute -0.461936 0.160303 missing missing 1.82718 8.21707 1.8 9.0 missing
9 2 1.0 16.1205 0 missing missing missing missing missing missing missing -0.461936 0.160303 131.695 32.1012 1.82718 8.21707 1.8 9.0 false
10 2 3.0 8.51043 0 missing missing missing missing missing missing missing -0.461936 0.160303 67.2258 54.7327 1.82718 8.21707 1.8 9.0 false
11 2 5.0 5.3066 0 missing missing missing missing missing missing missing -0.461936 0.160303 42.711 55.6004 1.82718 8.21707 1.8 9.0 false
12 2 7.0 3.00945 0 missing missing missing missing missing missing missing -0.461936 0.160303 31.6951 50.3485 1.82718 8.21707 1.8 9.0 false
13 2 9.0 2.95567 0 missing missing missing missing missing missing missing -0.461936 0.160303 25.5099 43.9187 1.82718 8.21707 1.8 9.0 false
14 2 11.0 3.32116 0 missing missing missing missing missing missing missing -0.461936 0.160303 21.2751 37.7975 1.82718 8.21707 1.8 9.0 false
15 3 0.0 missing 1 200.0 Central 0.0 0.0 0 0.0 NullRoute 0.15667 -0.226875 missing missing 3.39187 5.57914 1.8 9.0 missing
16 3 1.0 19.9397 0 missing missing missing missing missing missing missing 0.15667 -0.226875 82.1791 37.885 3.39187 5.57914 1.8 9.0 false
17 3 3.0 4.29876 0 missing missing missing missing missing missing missing 0.15667 -0.226875 21.2668 47.0343 3.39187 5.57914 1.8 9.0 false
18 3 5.0 1.96556 0 missing missing missing missing missing missing missing 0.15667 -0.226875 10.9715 39.2597 3.39187 5.57914 1.8 9.0 false
19 3 7.0 1.26342 0 missing missing missing missing missing missing missing 0.15667 -0.226875 7.86242 31.1639 3.39187 5.57914 1.8 9.0 false
20 3 9.0 0.9 0 missing missing missing missing missing missing missing 0.15667 -0.226875 6.07933 24.5369 3.39187 5.57914 1.8 9.0 true

Here LLOQ is the variable defined above, with the value 0.9. This type of dataset has the dependent variable column set to the actual LLOQ for the BQL rows. Sometimes the data instead has the string "BQL" or a missing value in the dependent variable column and a BQL flag in a separate column.

pkdata_string_bql = copy(pkdata)
@rtransform! pkdata_string_bql :CONC = !ismissing(:CONC) && LLOQ >= :CONC ? "BQL" : :CONC
first(pkdata_string_bql, 20)
20×20 DataFrame
Row id time CONC evid amt cmt rate duration ss ii route η_1 η_2 Central Peripheral CL Vc Q Vp BQL
String? Float64 Any Int64? Float64? Symbol? Float64? Float64? Int8? Float64? NCA.Route? Float64? Float64? Float64? Float64? Float64? Float64? Float64? Float64? Bool?
1 1 0.0 missing 1 200.0 Central 0.0 0.0 0 0.0 NullRoute -0.205373 -0.122347 missing missing 2.3616 6.19389 1.8 9.0 missing
2 1 1.0 9.71942 0 missing missing missing missing missing missing missing -0.205373 -0.122347 105.662 38.3016 2.3616 6.19389 1.8 9.0 false
3 1 3.0 5.33975 0 missing missing missing missing missing missing missing -0.205373 -0.122347 39.0625 55.6195 2.3616 6.19389 1.8 9.0 false
4 1 5.0 3.72554 0 missing missing missing missing missing missing missing -0.205373 -0.122347 21.914 50.8124 2.3616 6.19389 1.8 9.0 false
5 1 7.0 1.82011 0 missing missing missing missing missing missing missing -0.205373 -0.122347 15.8283 42.8137 2.3616 6.19389 1.8 9.0 false
6 1 9.0 2.26987 0 missing missing missing missing missing missing missing -0.205373 -0.122347 12.5344 35.3837 2.3616 6.19389 1.8 9.0 false
7 1 11.0 1.31686 0 missing missing missing missing missing missing missing -0.205373 -0.122347 10.1926 29.0999 2.3616 6.19389 1.8 9.0 false
8 2 0.0 missing 1 200.0 Central 0.0 0.0 0 0.0 NullRoute -0.461936 0.160303 missing missing 1.82718 8.21707 1.8 9.0 missing
9 2 1.0 16.1205 0 missing missing missing missing missing missing missing -0.461936 0.160303 131.695 32.1012 1.82718 8.21707 1.8 9.0 false
10 2 3.0 8.51043 0 missing missing missing missing missing missing missing -0.461936 0.160303 67.2258 54.7327 1.82718 8.21707 1.8 9.0 false
11 2 5.0 5.3066 0 missing missing missing missing missing missing missing -0.461936 0.160303 42.711 55.6004 1.82718 8.21707 1.8 9.0 false
12 2 7.0 3.00945 0 missing missing missing missing missing missing missing -0.461936 0.160303 31.6951 50.3485 1.82718 8.21707 1.8 9.0 false
13 2 9.0 2.95567 0 missing missing missing missing missing missing missing -0.461936 0.160303 25.5099 43.9187 1.82718 8.21707 1.8 9.0 false
14 2 11.0 3.32116 0 missing missing missing missing missing missing missing -0.461936 0.160303 21.2751 37.7975 1.82718 8.21707 1.8 9.0 false
15 3 0.0 missing 1 200.0 Central 0.0 0.0 0 0.0 NullRoute 0.15667 -0.226875 missing missing 3.39187 5.57914 1.8 9.0 missing
16 3 1.0 19.9397 0 missing missing missing missing missing missing missing 0.15667 -0.226875 82.1791 37.885 3.39187 5.57914 1.8 9.0 false
17 3 3.0 4.29876 0 missing missing missing missing missing missing missing 0.15667 -0.226875 21.2668 47.0343 3.39187 5.57914 1.8 9.0 false
18 3 5.0 1.96556 0 missing missing missing missing missing missing missing 0.15667 -0.226875 10.9715 39.2597 3.39187 5.57914 1.8 9.0 false
19 3 7.0 1.26342 0 missing missing missing missing missing missing missing 0.15667 -0.226875 7.86242 31.1639 3.39187 5.57914 1.8 9.0 false
20 3 9.0 BQL 0 missing missing missing missing missing missing missing 0.15667 -0.226875 6.07933 24.5369 3.39187 5.57914 1.8 9.0 true

The string version can be handled with an @rtransform! in the opposite direction that changes the "BQL" value to LLOQ. If the dependent variable is missing and a BQL flag is present, something like the following can be used instead.

@rtransform! pkdata_bql_missing :CONC = ismissing(:CONC) && :BQL == true ? LLOQ : :CONC
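For the string-coded version, the opposite transformation could look like the following sketch. Note that after replacing the strings, the column must also be converted back to a numeric (Float64-or-missing) element type before it can be read with read_pumas:

```julia
# Replace the "BQL" strings with the numeric LLOQ value; isequal is used
# because :CONC also contains missing values at the dosing rows
@rtransform! pkdata_string_bql :CONC = isequal(:CONC, "BQL") ? LLOQ : :CONC
# Restore a numeric element type for the column
pkdata_string_bql.CONC = map(x -> ismissing(x) ? missing : Float64(x), pkdata_string_bql.CONC)
```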

In any case, it is important to note that the format Pumas expects is that the value of the dependent variable is set to the LLOQ or a value below it. This means that setting the BQL rows to any number at or below the LLOQ (0.9 in this case) is fine, including the value 0.

5 Estimating a model with BQL values

We will now estimate some models:

  1. M1: remove the data
  2. M2: truncated observations
  3. M3: censored observations
  4. M4: censored and positive

5.1 M1: remove the data

Now we are finally ready to estimate a model based on our censored data. The simplest approach is to discard the BQL data altogether. Beal (2001) calls this approach “M1”. To perform such an analysis, we can filter out the BQL values and fit the model using iv2cmt_dgp.

pkdata_m1 = copy(pkdata_censored)
@rtransform! pkdata_m1 :CONC = :evid == 0 && :BQL == 0 ? :CONC : missing
first(pkdata_m1, 20)

We can now fit the model.

pop_m1 = read_pumas(pkdata_m1; observations = [:CONC])
res_m1 = fit(iv2cmt_dgp, pop_m1, param0, LaplaceI(); optim_options = (; show_trace = false))
[ Info: Checking the initial parameter values.
[ Info: The initial negative log likelihood and its gradient are finite. Check passed.
FittedPumasModel

Successful minimization:                      true

Likelihood approximation:                 LaplaceI
Log-likelihood value:                   -633.05437
Number of subjects:                             90
Number of parameters:         Fixed      Optimized
                                  0              8
Observation records:         Active        Missing
    CONC:                       492             48
    Total:                      492             48

---------------------
           Estimate
---------------------
θcl         2.6745
θvc         6.5711
θq          1.9778
θvp        12.206
Ω₁,₁        0.077309
Ω₂,₂        0.11843
σ_add       0.053029
σ_prop      0.17819
---------------------

5.2 M2: Truncated Observations

Another method is the so-called M2 method. M1 and M2 both discard the BQL observations; sometimes this is the only option because the BQL observations were already dropped from the dataset before it reached the analyst. For an application of M2, we approach the data as if we know the LLOQ value, but any observation that was BQL has been dropped from the data. Then, the appropriate distributional assumption for the observed values remaining in the dataset is a truncated combined normal, given that we believe the original data was indeed a combined normal.
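In likelihood terms, M2 replaces the normal density with a truncated density: each retained observation \(y\) (necessarily above the LLOQ) contributes

\[\log L = \log f(y) - \log\left(1 - F(\mathrm{LLOQ})\right)\]

where \(f\) and \(F\) denote the density and cumulative distribution function of the non-truncated combined normal, conditional on the random effects.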

iv2cmt_m2 = @model begin
    @param begin
        θcl ∈ RealDomain(lower = 0.1)
        θvc ∈ RealDomain(lower = 1.0)
        θq ∈ RealDomain(lower = 0.1)
        θvp ∈ RealDomain(lower = 1.1)
        Ω ∈ PDiagDomain(2)
        σ_add ∈ RealDomain(lower = 0.0)
        σ_prop ∈ RealDomain(lower = 0.0)
    end

    @random begin
        η ~ MvNormal(Ω)
    end

    @pre begin
        CL = θcl * exp(η[1])
        Vc = θvc * exp(η[2])
        Q = θq
        Vp = θvp
    end

    @dynamics Central1Periph1

    @derived begin
        conc_model := @. Central / Vc
        CONC_normal := @. Normal(conc_model, sqrt(σ_add^2 + (conc_model * σ_prop)^2))
        CONC ~ @. truncated(CONC_normal, lower = LLOQ)
    end
end
PumasModel
  Parameters: θcl, θvc, θq, θvp, Ω, σ_add, σ_prop
  Random effects: η
  Covariates: 
  Dynamical variables: Central, Peripheral
  Derived: CONC
  Observed: CONC
pop_m2 = read_pumas(pkdata_m1; observations = [:CONC])
res_m2 = fit(iv2cmt_m2, pop_m2, param0, LaplaceI(); optim_options = (; show_trace = false))
[ Info: Checking the initial parameter values.
[ Info: The initial negative log likelihood and its gradient are finite. Check passed.
FittedPumasModel

Successful minimization:                      true

Likelihood approximation:                 LaplaceI
Log-likelihood value:                   -633.05437
Number of subjects:                             90
Number of parameters:         Fixed      Optimized
                                  0              8
Observation records:         Active        Missing
    CONC:                       492             48
    Total:                      492             48

---------------------
           Estimate
---------------------
θcl         2.6745
θvc         6.5711
θq          1.9778
θvp        12.206
Ω₁,₁        0.077309
Ω₂,₂        0.11843
σ_add       0.053029
σ_prop      0.17819
---------------------

5.3 M3: Censored Observations

If we do have the observations flagged as BQL, but not their actual values, we can use a censored distribution. That is, we either observe the data directly, or we observe that it was at or below the LLOQ. This is exactly what a censored distribution allows us to model, because it places all the probability of observations below the LLOQ in a point mass at the LLOQ. This is still useful information and allows us to estimate the parameters of the underlying distribution (the latent, non-censored distribution) from our assumed data generating process (DGP).
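In likelihood terms, the censored distribution gives each record the contribution

\[\log L = \begin{cases} \log f(y) & \text{if } y > \mathrm{LLOQ} \\ \log F(\mathrm{LLOQ}) & \text{if } y \text{ is BQL} \end{cases}\]

with \(f\) and \(F\) the density and cumulative distribution function of the latent combined normal, conditional on the random effects.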

iv2cmt_m3 = @model begin
    @param begin
        θcl ∈ RealDomain(lower = 0.1)
        θvc ∈ RealDomain(lower = 1.0)
        θq ∈ RealDomain(lower = 0.1)
        θvp ∈ RealDomain(lower = 1.1)
        Ω ∈ PDiagDomain(2)
        σ_add ∈ RealDomain(lower = 0.0)
        σ_prop ∈ RealDomain(lower = 0.0)
    end

    @random begin
        η ~ MvNormal(Ω)
    end

    @pre begin
        CL = θcl * exp(η[1])
        Vc = θvc * exp(η[2])
        Q = θq
        Vp = θvp
    end

    @dynamics Central1Periph1

    @derived begin
        conc_model := @. Central / Vc
        CONC_normal := @. Normal(conc_model, sqrt(σ_add^2 + (conc_model * σ_prop)^2))
        CONC ~ @. Censored(CONC_normal, LLOQ, Inf)
    end
end
PumasModel
  Parameters: θcl, θvc, θq, θvp, Ω, σ_add, σ_prop
  Random effects: η
  Covariates: 
  Dynamical variables: Central, Peripheral
  Derived: CONC
  Observed: CONC
pop_m3 = read_pumas(pkdata; observations = [:CONC])
res_m3 = fit(iv2cmt_m3, pop_m3, param0, LaplaceI(); optim_options = (show_trace = false,))
[ Info: Checking the initial parameter values.
[ Info: The initial negative log likelihood and its gradient are finite. Check passed.
FittedPumasModel

Successful minimization:                      true

Likelihood approximation:                 LaplaceI
Log-likelihood value:                   -667.50211
Number of subjects:                             90
Number of parameters:         Fixed      Optimized
                                  0              8
Observation records:         Active        Missing
    CONC:                       540              0
    Total:                      540              0

---------------------
           Estimate
---------------------
θcl         2.821
θvc         6.6741
θq          1.8474
θvp        10.165
Ω₁,₁        0.087478
Ω₂,₂        0.14104
σ_add       0.15322
σ_prop      0.16691
---------------------

5.4 M4: Censored and positive

The last possibility we will consider is the M4 method. This method represents BQL values by the probability of falling in the left tail of the combined normal distribution given all parameters, but adds the constraint that, for concentration data, the observations must also have been positive. In our humble opinion, this approach is slightly strange given the assumed data generating process. Instead, we find it more useful to just think in terms of the DGP + censoring. Then, M4 is simply M3 with the assumption that the original data was also naturally truncated at zero. In other words, if M3 is “add censoring”, M4 is not really a separate statistical handling of BQL; it is simply a different assumption on the DGP.
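In likelihood terms, M4 is the M3 contribution with every term additionally conditioned on the latent variable being positive:

\[\log L = \begin{cases} \log f(y) - \log\left(1 - F(0)\right) & \text{if } y > \mathrm{LLOQ} \\ \log\left(F(\mathrm{LLOQ}) - F(0)\right) - \log\left(1 - F(0)\right) & \text{if } y \text{ is BQL} \end{cases}\]

where \(f\) and \(F\) are again the density and cumulative distribution function of the combined normal, conditional on the random effects.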

iv2cmt_m4 = @model begin
    @param begin
        θcl ∈ RealDomain(lower = 0.1)
        θvc ∈ RealDomain(lower = 1.0)
        θq ∈ RealDomain(lower = 0.1)
        θvp ∈ RealDomain(lower = 1.1)
        Ω ∈ PDiagDomain(2)
        σ_add ∈ RealDomain(lower = 0.0)
        σ_prop ∈ RealDomain(lower = 0.0)
    end

    @random begin
        η ~ MvNormal(Ω)
    end

    @pre begin
        CL = θcl * exp(η[1])
        Vc = θvc * exp(η[2])
        Q = θq
        Vp = θvp
    end

    @dynamics Central1Periph1

    @derived begin
        conc_model := @. Central / Vc
        CONC_normal := @. Normal(conc_model, sqrt(σ_add^2 + (conc_model * σ_prop)^2))
        CONC_trunc ~ @. truncated(CONC_normal, lower = 0)
        CONC ~ @. Censored(CONC_trunc, LLOQ, Inf)
    end
end
PumasModel
  Parameters: θcl, θvc, θq, θvp, Ω, σ_add, σ_prop
  Random effects: η
  Covariates: 
  Dynamical variables: Central, Peripheral
  Derived: CONC_trunc, CONC
  Observed: CONC_trunc, CONC
res_m4 = fit(iv2cmt_m4, pop_m3, param0, LaplaceI(); optim_options = (show_trace = false,))
[ Info: Checking the initial parameter values.
[ Info: The initial negative log likelihood and its gradient are finite. Check passed.
FittedPumasModel

Successful minimization:                      true

Likelihood approximation:                 LaplaceI
Log-likelihood value:                   -667.50251
Number of subjects:                             90
Number of parameters:         Fixed      Optimized
                                  0              8
Observation records:         Active        Missing
    CONC:                       540              0
    Total:                      540              0

---------------------
           Estimate
---------------------
θcl         2.821
θvc         6.6741
θq          1.8474
θvp        10.165
Ω₁,₁        0.087478
Ω₂,₂        0.14104
σ_add       0.15318
σ_prop      0.16691
---------------------

The DGP for such data could have been represented with the following model.

iv2cmt_dgp_trunc = @model begin
    @param begin
        θcl ∈ RealDomain(lower = 0.1)
        θvc ∈ RealDomain(lower = 1.0)
        θq ∈ RealDomain(lower = 0.1)
        θvp ∈ RealDomain(lower = 1.1)
        Ω ∈ PDiagDomain(2)
        σ_add ∈ RealDomain(lower = 0.0)
        σ_prop ∈ RealDomain(lower = 0.0)
    end

    @random begin
        η ~ MvNormal(Ω)
    end

    @pre begin
        CL = θcl * exp(η[1])
        Vc = θvc * exp(η[2])
        Q = θq
        Vp = θvp
    end

    @dynamics Central1Periph1

    @derived begin
        conc_model := @. Central / Vc
        CONC_normal := @. Normal(conc_model, sqrt(σ_add^2 + (conc_model * σ_prop)^2))
        CONC ~ @. truncated(CONC_normal, lower = 0)
    end
end
PumasModel
  Parameters: θcl, θvc, θq, θvp, Ω, σ_add, σ_prop
  Random effects: η
  Covariates: 
  Dynamical variables: Central, Peripheral
  Derived: CONC
  Observed: CONC

6 Subject-dependent LLOQ

It is possible that some subjects have different LLOQ values because they come from different studies, their plasma samples were sent to different labs, and so on. In this case, the dependent variable should still be set to the subject-specific LLOQ value for the censored rows, and the specific limit can then be supplied as a covariate. For example, the following population has two different LLOQs depending on the study.

pop1 = [
    Subject(
        id = string(i),
        observations = (CONC = nothing,),
        events = DosageRegimen(200),
        time = [1.0, 3.0, 5.0, 7.0, 9.0, 11.0],
        covariates = (STUDY = 001, LLOQ_cov = 0.9),
    ) for i = 1:90
]
pop2 = [
    Subject(
        id = string(i),
        observations = (CONC = nothing,),
        events = DosageRegimen(200),
        time = [1.0, 3.0, 5.0, 7.0, 9.0, 11.0],
        covariates = (STUDY = 002, LLOQ_cov = 0.09),
    ) for i = 1:10
]
pop = vcat(pop1, pop2)
Population
  Subjects: 100
  Covariates: STUDY, LLOQ_cov
  Observations: CONC
iv2cmt_m4 = @model begin
    @param begin
        θcl ∈ RealDomain(lower = 0.1)
        θvc ∈ RealDomain(lower = 1.0)
        θq ∈ RealDomain(lower = 0.1)
        θvp ∈ RealDomain(lower = 1.1)
        Ω ∈ PDiagDomain(2)
        σ_add ∈ RealDomain(lower = 0.0)
        σ_prop ∈ RealDomain(lower = 0.0)
    end

    @covariates STUDY LLOQ_cov

    @random begin
        η ~ MvNormal(Ω)
    end

    @pre begin
        CL = θcl * exp(η[1])
        Vc = θvc * exp(η[2])
        Q = θq
        Vp = θvp
    end

    @dynamics Central1Periph1

    @derived begin
        conc_model := @. Central / Vc
        CONC_normal := @. Normal(conc_model, sqrt(σ_add^2 + (conc_model * σ_prop)^2))
        CONC_trunc ~ @. truncated(CONC_normal; lower = 0)
        CONC ~ @. Censored(CONC_trunc, LLOQ_cov, Inf)
    end
end
PumasModel
  Parameters: θcl, θvc, θq, θvp, Ω, σ_add, σ_prop
  Random effects: η
  Covariates: STUDY, LLOQ_cov
  Dynamical variables: Central, Peripheral
  Derived: CONC_trunc, CONC
  Observed: CONC_trunc, CONC

7 Conclusion

In this tutorial, we showed how to use filtering and the Censored distribution to handle missing data problems for models fitted with linearized approximations to the likelihood function of a non-linear mixed effects model. Whether or not to include the BQL data rows in the dataset depends on the expected bias from ignoring the missing data problem. If the bias is expected to be significant, this tutorial showed how to write a model that takes the missing data into consideration and estimates parameters more accurately.

8 References

Beal, Stuart L. 2001. “Ways to Fit a PK Model with Some Data Below the Quantification Limit.” Journal of Pharmacokinetics and Pharmacodynamics 28: 481–504.
Bergstrand, Martin, and Mats O Karlsson. 2009. “Handling Data Below the Limit of Quantification in Mixed Effect Models.” The AAPS Journal 11: 371–80.
