using Pumas
using DataFramesMeta
using PumasUtilities
using Random
NONMEM style BQL models (M1-M4) models in Pumas
This tutorial is about handling the missing data problem related to observations being outside the limits of quantification. In 2022, we held a webinar about the theory of such statistical modeling. It was was recorded and is available on YouTube. The video is embedded below.
Before we begin we need to load the necessary packages and set a seed for reproducibility.
1 Handling Lower Limit of Quantification
In the real world, assays often come with limitations such as lower or upper limits of quantification. When pharmacometricians receive data it has often been processed by the lab or by statisticians in other departments. The data can have various levels of information about the measured concentration if it was at or below the Lower Limit of Quantification (LLOQ). Unfortunately, such measurements are often simply labeled "BQL"
in the dependent variable column to indicate that the measurement was taken but it was Below Quantification Limit (BQL). If the value was provided it could be modelled in a more explicit manner, but with just BQL
indications, we need to treat it as a missing data problem.
In pharmacometrics, the main references on how to handle this kind of data are Beal (2001) for a conceptual description and Bergstrand and Karlsson (2009) for a simulation exercise. These papers use the M1-M7 nomenclature of the former. In Pumas, we try to always refer to observations according to their actual distributional assumptions or data transformation steps. In this tutorial, we will try to use both to ease the transition for a NONMEM workflow to a Pumas workflow. The discussions around M* methods assumes familiarity with the nomenclature described in the above papers. See our other tutorials for BQL introductions that do not directly refer to NONMEM methods if you are not transitioning from NONMEM to Pumas, as the discussion in terms of M* methods is mostly done for legacy reasons.
2 Definitions
Before we start, we need to define some terms:
Truncation is a statistical term that refers to the removal of data points that fall outside a certain range. In the context of pharmacometrics, truncation is used to remove data points that fall below the lower limit of quantification (LLOQ). Mathematically, say that \(X\) is our dependent variable, i.e. the quantity we want to measure:
\[X \sim N(\mu, \sigma^2)\]
\(X\) is generally measured with some instrument. To account for the fact that the instrument might have some limitations, we generally observe in our dataset either the truncated or censored values of \(X\). We denote \(Y^t\) as the truncated value of \(X\) and \(Y^c\) as the censored value of \(X\).
In truncation, we observe the value of \(X\) if it is between lower and higher thresholds, \(a\) and \(b\), respectively. For some reason, the data processing step has removed the data points that fall outside this range. We can represent a truncated variable \(Y^t\) as:
\[Y^t = X \mid a \leq X \leq b\]
The censoring case is different. In censoring, we observe the value of \(X\) if it is above a certain threshold, \(a\); or below a certain threshold, \(b\). The censored variable \(Y^c\) is then:
\[Y^c = \min(\max(X, a), b)\]
Here are examples of truncation and censoring:
- Truncation (missing information): A device measures a plasma concentration. We only see data points if they are within the truncation limits.
- Censoring (missing information): A device measures a plasma concentration as before. However, below some measurement value (LLOQ) the number cannot be trusted so we only know that our measurement is below that value.
There’s a difference between data truncation and a truncated distribution. Data truncation refers to the removal of data points that fall outside a certain range, whereas a truncated distribution is a probability distribution that is restricted to a certain range. It assumes that the Data Generating Process (DGP) is truncated. In the sense that the data is generated from a distribution that is truncated.
For example an assay that cannot return negative values, or a measurement device that cannot measure values above a certain threshold. In these cases the Normal distribution, which originally has support on the entire real line, is truncated to a certain range.
3 Generating data
To study a specific statistical phenomenon and properties of any estimator it is often important to be able to separately specify:
- the Data Generating Process (DGP); and
- the estimation procedure or estimator.
The data in this tutorial will come from a simple, simulated data set with 90 subjects that receive an injection of 200 units at time 0. The assay used to measure the concentration of the drug in plasma has a Lower Limit of Quantification (LLOQ) of 0.9 units of the concentration. By the simple properties of a two compartment model with linear elimination rate, all subjects will eventually have concentrations that are sufficiently low that samples will have a chance to be Below the Quantification Limit (BQL). The error model is a combined error model. With such a distribution we can have observations that are not BQL even at high t
(model time), but the proportion of BQL values should be increasing over time.
Notice, that the above type of BQL observations is only one type of many. The proportion of BQL is simply a function of time. In some cases, the probability of a being BQL may be a function of covariates as well. Some patients may have a higher volume of distribution for example. This does not change how we handle the issue statistically, but it may affect how we decide whether or not to ignore or explicitly handle the BQL issue. You could argue that you should always explicitly handle the BQL values, but the choice may not be so simple. The methods that explicitly model the BQL values in a statistical manner require the use of the LaplaceI
method if we are using first order approximation methods. This method is known to be slower and sometimes less numerically robust than FOCE
. For this reason, an analyst may choose to only use BQL methods if the bias from simply ignoring the problematic observations is expected to be high. An alternative is to use Expectation-Maximization (EM) methods such as SAEM
. The treatment of such an approach is done in a separate tutorial.
4 Data Generating Process
The Data Generating Process (DGP) consists of the specification of:
- the population
- the doses
- the sampling times
- the model
- the parameters
4.1 The population, doses, and sampling times
As mentioned previously, each subject is dosed 200 units of the drug at time zero. There are 90 subjects in total and we sample their plasma at the time of dosing and each hours thereafter for twelve hours.
= [
pop Subject(
= string(i),
id = (CONC = nothing,),
observations = DosageRegimen(200),
events = [1.0, 3.0, 5.0, 7.0, 9.0, 11.0],
time = 1:90
) for i ]
This population is empty in the sense that it has no simulated observations yet. To do that, we need to define a model and the parameters. The following model can be seen as a model for the distribution and elimination of the drug (the dynamical model) as well as the values from the assay (the derived block) without any masking of BQL values.
= @model begin
iv2cmt_dgp @param begin
∈ RealDomain(lower = 0.1)
θcl ∈ RealDomain(lower = 1.0)
θvc ∈ RealDomain(lower = 0.1)
θq ∈ RealDomain(lower = 1.1)
θvp ∈ PDiagDomain(2)
Ω ∈ RealDomain(lower = 0.0)
σ_add ∈ RealDomain(lower = 0.0)
σ_prop end
@random begin
~ MvNormal(Ω)
η end
@pre begin
= θcl * exp(η[1])
CL = θvc * exp(η[2])
Vc = θq
Q = θvp
Vp end
@dynamics Central1Periph1
@derived begin
:= @. Central / Vc
conc_model ~ @. Normal(conc_model, sqrt(σ_add^2 + (abs(conc_model) * σ_prop)^2))
CONC end
end
PumasModel
Parameters: θcl, θvc, θq, θvp, Ω, σ_add, σ_prop
Random effects: η
Covariates:
Dynamical system variables: Central, Peripheral
Dynamical system type: Closed form
Derived: CONC
Observed: CONC
We can also choose some parameters to simulate from:
= (;
param0 = 2.9,
θcl = 7.0,
θvc = 1.8,
θq = 9.0,
θvp = Diagonal([0.09, 0.09]),
Ω = sqrt(0.03),
σ_add = sqrt(0.03),
σ_prop )
(θcl = 2.9,
θvc = 7.0,
θq = 1.8,
θvp = 9.0,
Ω = [0.09 0.0; 0.0 0.09],
σ_add = 0.17320508075688773,
σ_prop = 0.17320508075688773,)
4.2 A simulated population
To simulate the population, we simple use simobs
and the objects defined above.
# Set a seed for reproducibility
= Random.seed!(2142)
rng = simobs(iv2cmt_dgp, pop, param0; rng)
simpop DataFrame(simpop)
simpop
carries the simulated information about the assay, but does not yet apply the BQL masking. This is done in the next step.
4.3 Lower Limit Of Quantification (LLOQ)
In this example, we will assume that our assay has a lower limit of quantification at 0.9
. We can convert our simulated population to a tabular format (DataFrame
) and calculate what rows are BQL. This information can be used to produce plots showing the censoring, and to create our masked dependent variable.
= 0.9
LLOQ = DataFrame(simpop)
pkdata = pkdata.CONC .<= LLOQ
pkdata.BQL = filter(x -> x.evid == 0, pkdata) pkdata_obs
4.3.1 Calculating proportions BQL by time
To investigate whether censoring of the dependent variable is an issue or not, we can calculate the proportion of BQL at each observation time.
= groupby(pkdata_obs, :time)
pkdata_obs_gdf = combine(pkdata_obs_gdf, :BQL => mean => "BQL_proportion") pkdata_obs_gdf_comb
Row | time | BQL_proportion |
---|---|---|
Float64 | Float64 | |
1 | 1.0 | 0.0 |
2 | 3.0 | 0.0 |
3 | 5.0 | 0.0111111 |
4 | 7.0 | 0.0888889 |
5 | 9.0 | 0.2 |
6 | 11.0 | 0.311111 |
We see that the value of the BQL_proportion
column is increasing, and towards the end we do have a fair amount of BQL values. Another way to investigate this number is in a plot.
4.3.2 Visualizing BQL proportions
Let us visualize the effect of the censoring in the following plot.
Show/Hide Code
using AlgebraOfGraphics
= AlgebraOfGraphics.Figure()
f = draw!(
f1 1:2, 1],
f[data(pkdata_obs) *
mapping(:time, :CONC => "CONC", color = :BQL) *
visual(AlgebraOfGraphics.Scatter) +
mapping([LLOQ], [0.0]) * visual(AlgebraOfGraphics.ABLines; color = :orange),
scales(Color = (; palette = [:black, :orange]));
= (; xlabel = "",),
axis
)= draw!(
f2 3, 1],
f[data(pkdata_obs_gdf_comb) *
mapping(:time, :BQL_proportion => "BQL proportion") *
visual(AlgebraOfGraphics.Scatter; color = :orange);
= (limits = (nothing, nothing, 0, 1),),
axis
)legend!(f[1:2, 2], f1)
f
In the bottom panel, we see that the proportion of BQL values increases over time. The censoring is a little bit hard to see. In the next plot, it is easier to see the effect when we plot the y-axis as the log-transformed variable.
Show/Hide Code
using AlgebraOfGraphics
= AlgebraOfGraphics.Figure()
f = draw!(
f1 1:2, 1],
f[data(pkdata_obs) *
mapping(:time, :CONC => log => "log(CONC)", color = :BQL) *
visual(AlgebraOfGraphics.Scatter) +
mapping([log(LLOQ)], [0.0]) * visual(AlgebraOfGraphics.ABLines; color = :orange),
scales(Color = (; palette = [:black, :orange]));
= (; xlabel = ""),
axis
)= draw!(
f2 3, 1],
f[data(pkdata_obs_gdf_comb) *
mapping(:time, :BQL_proportion => "BQL proportion") *
visual(AlgebraOfGraphics.Scatter; color = :orange);
= (; limits = (nothing, nothing, 0, 1)),
axis
)legend!(f[1:2, 2], f1)
f
4.4 Masking the data
As mentioned in the introduction, real world data often comes without the assay values if they are BQL.
= copy(pkdata)
pkdata_censored .= max.(LLOQ, pkdata_censored.CONC)
pkdata_censored.CONC first(pkdata_censored, 20)
Row | id | time | CONC | evid | amt | cmt | rate | duration | ss | ii | route | Central | Peripheral | CL | Vc | Q | Vp | η_1 | η_2 | BQL |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
String | Float64 | Float64? | Int64 | Float64? | Symbol? | Float64? | Float64? | Int8? | Float64? | NCA.Route? | Float64? | Float64? | Float64? | Float64? | Float64? | Float64? | Float64 | Float64 | Bool? | |
1 | 1 | 0.0 | missing | 1 | 200.0 | Central | 0.0 | 0.0 | 0 | 0.0 | NullRoute | missing | missing | 2.5274 | 2.95 | 1.8 | 9.0 | -0.13752 | -0.864106 | missing |
2 | 1 | 1.0 | 17.9539 | 0 | missing | missing | missing | missing | missing | missing | missing | 50.6759 | 57.7767 | 2.5274 | 2.95 | 1.8 | 9.0 | -0.13752 | -0.864106 | false |
3 | 1 | 3.0 | 3.59012 | 0 | missing | missing | missing | missing | missing | missing | missing | 10.6936 | 59.8385 | 2.5274 | 2.95 | 1.8 | 9.0 | -0.13752 | -0.864106 | false |
4 | 1 | 5.0 | 1.89758 | 0 | missing | missing | missing | missing | missing | missing | missing | 7.24901 | 48.6145 | 2.5274 | 2.95 | 1.8 | 9.0 | -0.13752 | -0.864106 | false |
5 | 1 | 7.0 | 1.99616 | 0 | missing | missing | missing | missing | missing | missing | missing | 5.75758 | 39.0359 | 2.5274 | 2.95 | 1.8 | 9.0 | -0.13752 | -0.864106 | false |
6 | 1 | 9.0 | 1.62626 | 0 | missing | missing | missing | missing | missing | missing | missing | 4.61737 | 31.3244 | 2.5274 | 2.95 | 1.8 | 9.0 | -0.13752 | -0.864106 | false |
7 | 1 | 11.0 | 0.9 | 0 | missing | missing | missing | missing | missing | missing | missing | 3.70496 | 25.1354 | 2.5274 | 2.95 | 1.8 | 9.0 | -0.13752 | -0.864106 | true |
8 | 2 | 0.0 | missing | 1 | 200.0 | Central | 0.0 | 0.0 | 0 | 0.0 | NullRoute | missing | missing | 1.70968 | 4.89822 | 1.8 | 9.0 | -0.528407 | -0.357039 | missing |
9 | 2 | 1.0 | 12.8523 | 0 | missing | missing | missing | missing | missing | missing | missing | 102.014 | 47.5713 | 1.70968 | 4.89822 | 1.8 | 9.0 | -0.528407 | -0.357039 | false |
10 | 2 | 3.0 | 8.46052 | 0 | missing | missing | missing | missing | missing | missing | missing | 38.0232 | 68.2487 | 1.70968 | 4.89822 | 1.8 | 9.0 | -0.528407 | -0.357039 | false |
11 | 2 | 5.0 | 4.2123 | 0 | missing | missing | missing | missing | missing | missing | missing | 23.0299 | 63.0127 | 1.70968 | 4.89822 | 1.8 | 9.0 | -0.528407 | -0.357039 | false |
12 | 2 | 7.0 | 3.32903 | 0 | missing | missing | missing | missing | missing | missing | missing | 17.7494 | 54.2878 | 1.70968 | 4.89822 | 1.8 | 9.0 | -0.528407 | -0.357039 | false |
13 | 2 | 9.0 | 3.54978 | 0 | missing | missing | missing | missing | missing | missing | missing | 14.6831 | 46.0971 | 1.70968 | 4.89822 | 1.8 | 9.0 | -0.528407 | -0.357039 | false |
14 | 2 | 11.0 | 2.82836 | 0 | missing | missing | missing | missing | missing | missing | missing | 12.3548 | 39.0171 | 1.70968 | 4.89822 | 1.8 | 9.0 | -0.528407 | -0.357039 | false |
15 | 3 | 0.0 | missing | 1 | 200.0 | Central | 0.0 | 0.0 | 0 | 0.0 | NullRoute | missing | missing | 2.1089 | 4.09481 | 1.8 | 9.0 | -0.318543 | -0.536189 | missing |
16 | 3 | 1.0 | 21.4362 | 0 | missing | missing | missing | missing | missing | missing | missing | 81.4514 | 51.2734 | 2.1089 | 4.09481 | 1.8 | 9.0 | -0.318543 | -0.536189 | false |
17 | 3 | 3.0 | 6.00094 | 0 | missing | missing | missing | missing | missing | missing | missing | 23.4483 | 64.5451 | 2.1089 | 4.09481 | 1.8 | 9.0 | -0.318543 | -0.536189 | false |
18 | 3 | 5.0 | 3.0174 | 0 | missing | missing | missing | missing | missing | missing | missing | 14.0357 | 55.8149 | 2.1089 | 4.09481 | 1.8 | 9.0 | -0.318543 | -0.536189 | false |
19 | 3 | 7.0 | 2.37679 | 0 | missing | missing | missing | missing | missing | missing | missing | 10.9155 | 46.2582 | 2.1089 | 4.09481 | 1.8 | 9.0 | -0.318543 | -0.536189 | false |
20 | 3 | 9.0 | 2.04447 | 0 | missing | missing | missing | missing | missing | missing | missing | 8.9062 | 38.1072 | 2.1089 | 4.09481 | 1.8 | 9.0 | -0.318543 | -0.536189 | false |
Here LLOQ
is the variable defined above, and has the value of 0.9
. This type of dataset has the dependent variable column set to the actual LLOQ. The data sometimes has the string "BQL"
or a missing value and the BQL
in a separate column.
= copy(pkdata)
pkdata_string_bql @rtransform! pkdata_string_bql :CONC = !ismissing(:CONC) && LLOQ >= :CONC ? "BQL" : :CONC
first(pkdata_string_bql, 20)
Row | id | time | CONC | evid | amt | cmt | rate | duration | ss | ii | route | Central | Peripheral | CL | Vc | Q | Vp | η_1 | η_2 | BQL |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
String | Float64 | Any | Int64 | Float64? | Symbol? | Float64? | Float64? | Int8? | Float64? | NCA.Route? | Float64? | Float64? | Float64? | Float64? | Float64? | Float64? | Float64 | Float64 | Bool? | |
1 | 1 | 0.0 | missing | 1 | 200.0 | Central | 0.0 | 0.0 | 0 | 0.0 | NullRoute | missing | missing | 2.5274 | 2.95 | 1.8 | 9.0 | -0.13752 | -0.864106 | missing |
2 | 1 | 1.0 | 17.9539 | 0 | missing | missing | missing | missing | missing | missing | missing | 50.6759 | 57.7767 | 2.5274 | 2.95 | 1.8 | 9.0 | -0.13752 | -0.864106 | false |
3 | 1 | 3.0 | 3.59012 | 0 | missing | missing | missing | missing | missing | missing | missing | 10.6936 | 59.8385 | 2.5274 | 2.95 | 1.8 | 9.0 | -0.13752 | -0.864106 | false |
4 | 1 | 5.0 | 1.89758 | 0 | missing | missing | missing | missing | missing | missing | missing | 7.24901 | 48.6145 | 2.5274 | 2.95 | 1.8 | 9.0 | -0.13752 | -0.864106 | false |
5 | 1 | 7.0 | 1.99616 | 0 | missing | missing | missing | missing | missing | missing | missing | 5.75758 | 39.0359 | 2.5274 | 2.95 | 1.8 | 9.0 | -0.13752 | -0.864106 | false |
6 | 1 | 9.0 | 1.62626 | 0 | missing | missing | missing | missing | missing | missing | missing | 4.61737 | 31.3244 | 2.5274 | 2.95 | 1.8 | 9.0 | -0.13752 | -0.864106 | false |
7 | 1 | 11.0 | BQL | 0 | missing | missing | missing | missing | missing | missing | missing | 3.70496 | 25.1354 | 2.5274 | 2.95 | 1.8 | 9.0 | -0.13752 | -0.864106 | true |
8 | 2 | 0.0 | missing | 1 | 200.0 | Central | 0.0 | 0.0 | 0 | 0.0 | NullRoute | missing | missing | 1.70968 | 4.89822 | 1.8 | 9.0 | -0.528407 | -0.357039 | missing |
9 | 2 | 1.0 | 12.8523 | 0 | missing | missing | missing | missing | missing | missing | missing | 102.014 | 47.5713 | 1.70968 | 4.89822 | 1.8 | 9.0 | -0.528407 | -0.357039 | false |
10 | 2 | 3.0 | 8.46052 | 0 | missing | missing | missing | missing | missing | missing | missing | 38.0232 | 68.2487 | 1.70968 | 4.89822 | 1.8 | 9.0 | -0.528407 | -0.357039 | false |
11 | 2 | 5.0 | 4.2123 | 0 | missing | missing | missing | missing | missing | missing | missing | 23.0299 | 63.0127 | 1.70968 | 4.89822 | 1.8 | 9.0 | -0.528407 | -0.357039 | false |
12 | 2 | 7.0 | 3.32903 | 0 | missing | missing | missing | missing | missing | missing | missing | 17.7494 | 54.2878 | 1.70968 | 4.89822 | 1.8 | 9.0 | -0.528407 | -0.357039 | false |
13 | 2 | 9.0 | 3.54978 | 0 | missing | missing | missing | missing | missing | missing | missing | 14.6831 | 46.0971 | 1.70968 | 4.89822 | 1.8 | 9.0 | -0.528407 | -0.357039 | false |
14 | 2 | 11.0 | 2.82836 | 0 | missing | missing | missing | missing | missing | missing | missing | 12.3548 | 39.0171 | 1.70968 | 4.89822 | 1.8 | 9.0 | -0.528407 | -0.357039 | false |
15 | 3 | 0.0 | missing | 1 | 200.0 | Central | 0.0 | 0.0 | 0 | 0.0 | NullRoute | missing | missing | 2.1089 | 4.09481 | 1.8 | 9.0 | -0.318543 | -0.536189 | missing |
16 | 3 | 1.0 | 21.4362 | 0 | missing | missing | missing | missing | missing | missing | missing | 81.4514 | 51.2734 | 2.1089 | 4.09481 | 1.8 | 9.0 | -0.318543 | -0.536189 | false |
17 | 3 | 3.0 | 6.00094 | 0 | missing | missing | missing | missing | missing | missing | missing | 23.4483 | 64.5451 | 2.1089 | 4.09481 | 1.8 | 9.0 | -0.318543 | -0.536189 | false |
18 | 3 | 5.0 | 3.0174 | 0 | missing | missing | missing | missing | missing | missing | missing | 14.0357 | 55.8149 | 2.1089 | 4.09481 | 1.8 | 9.0 | -0.318543 | -0.536189 | false |
19 | 3 | 7.0 | 2.37679 | 0 | missing | missing | missing | missing | missing | missing | missing | 10.9155 | 46.2582 | 2.1089 | 4.09481 | 1.8 | 9.0 | -0.318543 | -0.536189 | false |
20 | 3 | 9.0 | 2.04447 | 0 | missing | missing | missing | missing | missing | missing | missing | 8.9062 | 38.1072 | 2.1089 | 4.09481 | 1.8 | 9.0 | -0.318543 | -0.536189 | false |
The string version can be handled with the inverse @rtransform!
that changes the "BQL"
value to LLOQ
. If the dependent variable is missing
and a BQL
flag is present, something like the following can be used instead.
@rtransform! pkdata_bql_missing :CONC = ismissing(:CONC) && :BQL == true ? LLOQ : :CONC
In any case, it is important to note, that the format we expect in Pumas is, that the value of the dependent variable is set to the LLOQ or a value below. This means that setting the BQL rows to any number at or below LLOQ
(0.9 in this case) is fine, including the value 0.
5 Estimating a model with BQL values
We will now estimate some models:
- Remove the data (M1)
- Truncated observations (M2)
- Censored observations (M3)
- Censored and positive (M4)
5.1 Remove the data (M1)
Now, we are finally ready to estimate a model based on our censored data. The most simple approach is to simply discard the data. Beal (2001) calls this approach “M1”. To perform such an analysis, we can filter away the BQL values and fit the model using iv2cmt_dgp
.
= copy(pkdata_censored)
pkdata_m1 @rtransform! pkdata_m1 :CONC = :evid == 0 && :BQL == 0 ? :CONC : missing
first(pkdata_m1, 20)
We can now fit the model.
= read_pumas(pkdata_m1; observations = [:CONC])
pop_m1 = fit(iv2cmt_dgp, pop_m1, param0, LaplaceI(); optim_options = (; show_trace = false)) res_m1
[ Info: Checking the initial parameter values.
[ Info: The initial negative log likelihood and its gradient are finite. Check passed.
FittedPumasModel
Successful minimization: true
Likelihood approximation: LaplaceI
Likelihood Optimizer: BFGS
Dynamical system type: Closed form
Log-likelihood value: -669.29258
Number of subjects: 90
Number of parameters: Fixed Optimized
0 8
Observation records: Active Missing
CONC: 485 55
Total: 485 55
---------------------
Estimate
---------------------
θcl 2.636
θvc 6.7184
θq 1.8271
θvp 10.38
Ω₁,₁ 0.077215
Ω₂,₂ 0.11287
σ_add 0.079008
σ_prop 0.18028
---------------------
5.2 Truncated observations (M2)
Another method to use is the so-called M2 method. M1 and M2 both discard BQL observations, or alternatively they are the only option because the BQL observations were already dropped from the dataset by someone else! For an application of M2, we approach the data as if we know the LLOQ value, but if any observations was indeed BQL it was dropped from the data. Then, the appropriate distributional assumption on the observed values remaining in the dataset is to use a truncated combined normal if we believe the original data was indeed a combined normal.
= @model begin
iv2cmt_m2 @param begin
∈ RealDomain(lower = 0.1)
θcl ∈ RealDomain(lower = 1.0)
θvc ∈ RealDomain(lower = 0.1)
θq ∈ RealDomain(lower = 1.1)
θvp ∈ PDiagDomain(2)
Ω ∈ RealDomain(lower = 0.0)
σ_add ∈ RealDomain(lower = 0.0)
σ_prop end
@random begin
~ MvNormal(Ω)
η end
@pre begin
= θcl * exp(η[1])
CL = θvc * exp(η[2])
Vc = θq
Q = θvp
Vp end
@dynamics Central1Periph1
@derived begin
:= @. Central / Vc
conc_model := @. Normal(conc_model, sqrt(σ_add^2 + (conc_model * σ_prop)^2))
CONC_normal ~ @. truncated_latent(CONC_normal, lower = LLOQ)
CONC end
end
PumasModel
Parameters: θcl, θvc, θq, θvp, Ω, σ_add, σ_prop
Random effects: η
Covariates:
Dynamical system variables: Central, Peripheral
Dynamical system type: Closed form
Derived: CONC
Observed: CONC
= read_pumas(pkdata_m1; observations = [:CONC])
pop_m2 = fit(iv2cmt_dgp, pop_m2, param0, LaplaceI(); optim_options = (; show_trace = false)) res_m2
[ Info: Checking the initial parameter values.
[ Info: The initial negative log likelihood and its gradient are finite. Check passed.
FittedPumasModel
Successful minimization: true
Likelihood approximation: LaplaceI
Likelihood Optimizer: BFGS
Dynamical system type: Closed form
Log-likelihood value: -669.29258
Number of subjects: 90
Number of parameters: Fixed Optimized
0 8
Observation records: Active Missing
CONC: 485 55
Total: 485 55
---------------------
Estimate
---------------------
θcl 2.636
θvc 6.7184
θq 1.8271
θvp 10.38
Ω₁,₁ 0.077215
Ω₂,₂ 0.11287
σ_add 0.079008
σ_prop 0.18028
---------------------
5.3 Censored observations (M3)
If we do have the observations flagged BQL, but not the actual values, we can also use a censored distribution. That is, we either observe the data directly or we observe that it was at LLOQ or below. This is exactly what a censored distribution allows us to model, because it places all the observations below LLOQ in a point mass at the LLOQ. This is still useful information, and allows us to estimate parameters in the underlying distribution (the latent variable or non-censored distribution) from our assumed data generating process (DGP).
= @model begin
iv2cmt_m3 @param begin
∈ RealDomain(lower = 0.1)
θcl ∈ RealDomain(lower = 1.0)
θvc ∈ RealDomain(lower = 0.1)
θq ∈ RealDomain(lower = 1.1)
θvp ∈ PDiagDomain(2)
Ω ∈ RealDomain(lower = 0.0)
σ_add ∈ RealDomain(lower = 0.0)
σ_prop end
@random begin
~ MvNormal(Ω)
η end
@pre begin
= θcl * exp(η[1])
CL = θvc * exp(η[2])
Vc = θq
Q = θvp
Vp end
@dynamics Central1Periph1
@derived begin
:= @. Central / Vc
conc_model := @. Normal(conc_model, sqrt(σ_add^2 + (conc_model * σ_prop)^2))
CONC_normal ~ @. censored_latent(CONC_normal, LLOQ, Inf)
CONC end
end
PumasModel
Parameters: θcl, θvc, θq, θvp, Ω, σ_add, σ_prop
Random effects: η
Covariates:
Dynamical system variables: Central, Peripheral
Dynamical system type: Closed form
Derived: CONC
Observed: CONC
= read_pumas(pkdata_censored; observations = [:CONC])
pop_m3 = fit(iv2cmt_m3, pop_m3, param0, LaplaceI(); optim_options = (show_trace = false,)) res_m3
[ Info: Checking the initial parameter values.
[ Info: The initial negative log likelihood and its gradient are finite. Check passed.
FittedPumasModel
Successful minimization: true
Likelihood approximation: LaplaceI
Likelihood Optimizer: BFGS
Dynamical system type: Closed form
Log-likelihood value: -713.28729
Number of subjects: 90
Number of parameters: Fixed Optimized
0 8
Observation records: Active Missing
CONC: 540 0
Total: 540 0
---------------------
Estimate
---------------------
θcl 2.8251
θvc 6.8525
θq 1.7065
θvp 8.0187
Ω₁,₁ 0.088402
Ω₂,₂ 0.12025
σ_add 0.20152
σ_prop 0.1683
---------------------
5.4 Censored and positive (M4)
The last possibility we will consider is the M4 method. This method takes the approach that BQL values are represented with the probability of being in the left tail of the combined normal distribution given all parameters, but adds the fact that the observations must have been positive as well for concentration data. In our humble opinion, this approach is slightly strange given the assumed data generating process. Instead, we find it more useful to just think in terms of the DGP + Censoring. Then, M4 is simply M3 with the assumption that the original data was also naturally truncated. In other words, if M3 is “add censoring”, M4 is not really a separate statistical handling of BQL. It is simply a different assumption on the DGP.
= @model begin
iv2cmt_m4 @param begin
∈ RealDomain(lower = 0.1)
θcl ∈ RealDomain(lower = 1.0)
θvc ∈ RealDomain(lower = 0.1)
θq ∈ RealDomain(lower = 1.1)
θvp ∈ PDiagDomain(2)
Ω ∈ RealDomain(lower = 0.0)
σ_add ∈ RealDomain(lower = 0.0)
σ_prop end
@random begin
~ MvNormal(Ω)
η end
@pre begin
= θcl * exp(η[1])
CL = θvc * exp(η[2])
Vc = θq
Q = θvp
Vp end
@dynamics Central1Periph1
@derived begin
:= @. Central / Vc
conc_model := @. Normal(conc_model, sqrt(σ_add^2 + (conc_model * σ_prop)^2))
CONC_normal ~ @. truncated(CONC_normal, lower = 0)
CONC_trunc ~ @. Censored(CONC_trunc, LLOQ, Inf)
CONC end
end
PumasModel
Parameters: θcl, θvc, θq, θvp, Ω, σ_add, σ_prop
Random effects: η
Covariates:
Dynamical system variables: Central, Peripheral
Dynamical system type: Closed form
Derived: CONC_trunc, CONC
Observed: CONC_trunc, CONC
= fit(iv2cmt_m4, pop_m3, param0, LaplaceI(); optim_options = (show_trace = false,)) res_m4
[ Info: Checking the initial parameter values.
[ Info: The initial negative log likelihood and its gradient are finite. Check passed.
FittedPumasModel
Successful minimization: true
Likelihood approximation: LaplaceI
Likelihood Optimizer: BFGS
Dynamical system type: Closed form
Log-likelihood value: -713.2916
Number of subjects: 90
Number of parameters: Fixed Optimized
0 8
Observation records: Active Missing
CONC: 540 0
Total: 540 0
---------------------
Estimate
---------------------
θcl 2.8251
θvc 6.8525
θq 1.7066
θvp 8.019
Ω₁,₁ 0.088404
Ω₂,₂ 0.12024
σ_add 0.20126
σ_prop 0.16833
---------------------
The DGP for such data could have been represented with the following model.
= @model begin
iv2cmt_dgp_trunc @param begin
∈ RealDomain(lower = 0.1)
θcl ∈ RealDomain(lower = 1.0)
θvc ∈ RealDomain(lower = 0.1)
θq ∈ RealDomain(lower = 1.1)
θvp ∈ PDiagDomain(2)
Ω ∈ RealDomain(lower = 0.0)
σ_add ∈ RealDomain(lower = 0.0)
σ_prop end
@random begin
~ MvNormal(Ω)
η end
@pre begin
= θcl * exp(η[1])
CL = θvc * exp(η[2])
Vc = θq
Q = θvp
Vp end
@dynamics Central1Periph1
@derived begin
:= @. Central / Vc
conc_model := @. Normal(conc_model, sqrt(σ_add^2 + (conc_model * σ_prop)^2))
CONC_normal ~ @. truncated(CONC_normal, lower = 0)
CONC end
end
PumasModel
Parameters: θcl, θvc, θq, θvp, Ω, σ_add, σ_prop
Random effects: η
Covariates:
Dynamical system variables: Central, Peripheral
Dynamical system type: Closed form
Derived: CONC
Observed: CONC
6 Subject-dependent LLOQ
It is possible that some subjects have different LLOQ values because they come from different studies, and, for example, their plasma samples were sent to different labs. In this case, the LLOQ value should still be set as the value of the CONC
at the time of censoring, and the specific censoring can then be retrieved as a covariate. For example, the following population has two different LLOQs depending on the study.
= [
pop1 Subject(
= string(i),
id = (CONC = nothing,),
observations = DosageRegimen(200),
events = [1.0, 3.0, 5.0, 7.0, 9.0, 11.0],
time = (STUDY = 001, LLOQ_cov = 0.9),
covariates = 1:90
) for i
]= [
pop2 Subject(
= string(i),
id = (CONC = nothing,),
observations = DosageRegimen(200),
events = [1.0, 3.0, 5.0, 7.0, 9.0, 11.0],
time = (STUDY = 002, LLOQ_cov = 0.09),
covariates = 1:10
) for i
]= vcat(pop1, pop2) pop
Population
Subjects: 100
Covariates: STUDY, LLOQ_cov
Observations: CONC
= @model begin
iv2cmt_m4 @param begin
∈ RealDomain(lower = 0.1)
θcl ∈ RealDomain(lower = 1.0)
θvc ∈ RealDomain(lower = 0.1)
θq ∈ RealDomain(lower = 1.1)
θvp ∈ PDiagDomain(2)
Ω ∈ RealDomain(lower = 0.0)
σ_add ∈ RealDomain(lower = 0.0)
σ_prop end
@covariates STUDY LLOQ_cov
@random begin
~ MvNormal(Ω)
η end
@pre begin
= θcl * exp(η[1])
CL = θvc * exp(η[2])
Vc = θq
Q = θvp
Vp end
@dynamics Central1Periph1
@derived begin
:= @. Central / Vc
conc_model := @. Normal(conc_model, sqrt(σ_add^2 + (conc_model * σ_prop)^2))
CONC_normal ~ @. truncated(CONC_normal; lower = 0)
CONC_trunc ~ @. Censored(CONC_trunc, LLOQ_cov, Inf)
CONC end
end
┌ Warning: Covariate STUDY is not used in the model.
└ @ Pumas ~/_work/PumasTutorials.jl/PumasTutorials.jl/custom_julia_depot/packages/Pumas/aZRyj/src/dsl/model_macro.jl:2856
PumasModel
Parameters: θcl, θvc, θq, θvp, Ω, σ_add, σ_prop
Random effects: η
Covariates: STUDY, LLOQ_cov
Dynamical system variables: Central, Peripheral
Dynamical system type: Closed form
Derived: CONC_trunc, CONC
Observed: CONC_trunc, CONC
7 Conclusion
In this tutorial, we showed how to use filtering, truncation, and censoring to handle missing data problems for models fitted with linearized approximations to the likelihood function of a non-linear mixed effects model. Whether or not to include the BQL data rows in the data set depends on the expectations about any bias that may be introduced from ignoring the missing data problem. If the bias is expected to be significant, this tutorial showed how to write up a model that takes the missing data into consideration and estimate parameters more accurately.