using Pumas
using PharmaDatasets
Model Selection Methods - Mixed Routines
As stated in Introduction to Covariate Selection Methods, broadly, there are two main types of Stepwise Covariate Model (SCM):
- Forward Selection (FS)
- Backward Elimination (BE)
In this tutorial we’ll explore a mixed routine based on both Forward Selection (FS) and Backward Elimination (BE).
1 Mixed Routines
One common mixed routine is to first perform a FS, followed by a BE. In all of these SCM routines, we’ll use the following criterion:
\[\operatorname{criterion}(f_p, f_b) = \operatorname{LRT}(f_p, f_b, \alpha)\]
where:
- \(f_p\) is the “proposed” model fit at the current iteration
- \(f_b\) is the model fit at the previous iteration (i.e. the current “best model”)
- \(\operatorname{LRT}\) is the likelihood-ratio test between models \(f_p\) and \(f_m\) using the \(\alpha\) value as the significance threshold
We’ll use different \(\alpha\) values depending on stage of the analysis. Common values are:
- FS: \(\alpha = 0.01\)
- BE: \(\alpha = 0.001\)
The rationale is that a less strict criterion value is used when adding variables during FS, and a more stringent criterion is used for allowing covariates to remain in the model during BE.
2 Example in Pumas
Let’s showcase an example that builds from the model we used in Covariate Selection Methods - Introduction.
First, let’s import the following packages:
We are going to use the po_sad_1
dataset from PharmaDatasets
:
= dataset("po_sad_1")
df first(df, 5)
Row | id | time | dv | amt | evid | cmt | rate | age | wt | doselevel | isPM | isfed | sex | route |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Int64 | Float64 | Float64? | Float64? | Int64 | Int64? | Float64 | Int64 | Int64 | Int64 | String3 | String3 | String7 | String3 | |
1 | 1 | 0.0 | missing | 30.0 | 1 | 1 | 0.0 | 51 | 74 | 30 | no | yes | male | ev |
2 | 1 | 0.25 | 35.7636 | missing | 0 | missing | 0.0 | 51 | 74 | 30 | no | yes | male | ev |
3 | 1 | 0.5 | 71.9551 | missing | 0 | missing | 0.0 | 51 | 74 | 30 | no | yes | male | ev |
4 | 1 | 0.75 | 97.3356 | missing | 0 | missing | 0.0 | 51 | 74 | 30 | no | yes | male | ev |
5 | 1 | 1.0 | 128.919 | missing | 0 | missing | 0.0 | 51 | 74 | 30 | no | yes | male | ev |
This is an oral dosing (route = "ev"
) NMTRAN-formatted dataset. It has 18 subjects, each with 1 dosing event (evid = 1
) and 18 measurement events (evid = 0
); and the following covariates:
age
: age in years (continuous)wt
: weight in kg (continuous)doselevel
: dosing amount, either30
,60
or90
milligrams (categorical)isPM
: subject is a poor metabolizer (binary)isfed
: subject is fed (binary)sex
: subject sex (binary)
=
population read_pumas(df; observations = [:dv], covariates = [:wt, :isPM, :isfed], route = :route)
Population
Subjects: 18
Covariates: wt, isPM, isfed
Observations: dv
We’ll use the same 2-compartment oral absorption model as in Covariate Selection Methods - Introduction, but now adding additional covariates and covariate effects:
= @model begin
model @metadata begin
= "full covariate model"
desc = u"hr"
timeu end
@param begin
"""
Clearance (L/hr)
"""
∈ RealDomain(; lower = 0)
tvcl """
Central Volume (L)
"""
∈ RealDomain(; lower = 0)
tvvc """
Peripheral Volume (L)
"""
∈ RealDomain(; lower = 0)
tvvp """
Distributional Clearance (L/hr)
"""
∈ RealDomain(; lower = 0)
tvq """
Fed - Absorption rate constant (h-1)
"""
∈ RealDomain(; lower = 0)
tvkafed """
Fasted - Absorption rate constant (h-1)
"""
∈ RealDomain(; lower = 0)
tvkafasted """
Power exponent on weight for Clearance
"""
∈ RealDomain()
dwtcl """
Proportional change in CL due to PM
"""
∈ RealDomain(; lower = -1, upper = 1)
ispmoncl """
- ΩCL
- ΩVc
- ΩKa
- ΩVp
- ΩQ
"""
∈ PDiagDomain(5)
Ω """
Proportional RUV (SD scale)
"""
∈ RealDomain(; lower = 0)
σₚ end
@random begin
~ MvNormal(Ω)
η end
@covariates begin
"""
Poor Metabolizer
"""
isPM"""
Fed status
"""
isfed"""
Weight (kg)
"""
wtend
@pre begin
= tvcl * (wt / 70)^dwtcl * (1 + ispmoncl * (isPM == "yes")) * exp(η[1])
CL = tvvc * (wt / 70) * exp(η[2])
Vc = (tvkafed + (isfed == "no") * (tvkafasted)) * exp(η[3])
Ka = tvq * (wt / 70)^0.75 * exp(η[4])
Q = tvvp * (wt / 70) * exp(η[5])
Vp end
@dynamics Depots1Central1Periph1
@derived begin
:= @. 1_000 * (Central / Vc)
cp """
DrugY Concentration (ng/mL)
"""
~ @. Normal(cp, cp * σₚ)
dv end
end
PumasModel
Parameters: tvcl, tvvc, tvvp, tvq, tvkafed, tvkafasted, dwtcl, ispmoncl, Ω, σₚ
Random effects: η
Covariates: isPM, isfed, wt
Dynamical system variables: Depot, Central, Peripheral
Dynamical system type: Closed form
Derived: dv
Observed: dv
With the model done, now we define our initial parameters:
= (;
iparams = 0.4,
tvkafed = 1.5,
tvkafasted = 4.0,
tvcl = 70.0,
tvvc = 4.0,
tvq = 50.0,
tvvp = 0.75,
dwtcl = -0.7,
ispmoncl = Diagonal(fill(0.04, 5)),
Ω = 0.1,
σₚ )
(tvkafed = 0.4,
tvkafasted = 1.5,
tvcl = 4.0,
tvvc = 70.0,
tvq = 4.0,
tvvp = 50.0,
dwtcl = 0.75,
ispmoncl = -0.7,
Ω = [0.04 0.0 … 0.0 0.0; 0.0 0.04 … 0.0 0.0; … ; 0.0 0.0 … 0.04 0.0; 0.0 0.0 … 0.0 0.04],
σₚ = 0.1,)
Now we create our criterion:
lrt_criterion(f; α = 0.01) = dof(f) * cquantile(Chisq(1), α) - (2 * loglikelihood(f))
lrt_criterion (generic function with 1 method)
lrt_criterion
takes a model fit f
and a desired \(\alpha\) value α
. It calculates the LRT by first multiplying the model’s degree-of-freedom (dof
) by the complementary quantile (cquantile
) of the \(\alpha\) value from a \(\chi^2\) distribution with one degree of freedom. Then, it subtracts from this value the model fit -2LL (minus 2 times log-likelihood) with constant. We use one degree-of-freedom because our SCM methods as currently implemented for both FS and BE, always add or remove a single covariate effect in each model evaluation.
First, we’ll run covariate_select
with lrt_criterion
as a criterion and the FS method. Finally, we run covariate_select
with lrt_criterion
as a criterion and the BE method.
Our α = 0.01
, so we use the default value defined in lrt_criterion
function.
= covariate_select(
covar_result_fs
model,
population,
iparams,FOCE();
= (:dwtcl, :tvkafasted, :ispmoncl),
control_param = lrt_criterion,
criterion = CovariateSelection.Forward,
method )
[ Info: fitting baseline model with none of the control_param parameters being estimated
[ Info: criterion: 2765.9438260321062
[ Info: fitting model with no restrictions on: dwtcl
[ Info: criterion: 2768.8716816089895
[ Info: fitting model with no restrictions on: tvkafasted
[ Info: criterion: 2735.691364008045
[ Info: fitting model with no restrictions on: ispmoncl
[ Info: criterion: 2758.210017916407
[ Info: current best model has no restrictions on tvkafasted
[ Info: criterion improved, continuing!
[ Info: fitting model with no restrictions on: dwtcl, tvkafasted
[ Info: criterion: 2738.5517536233997
[ Info: fitting model with no restrictions on: tvkafasted, ispmoncl
[ Info: criterion: 2727.974107213667
[ Info: current best model has no restrictions on tvkafasted, ispmoncl
[ Info: criterion improved, continuing!
[ Info: fitting model with no restrictions on: dwtcl, tvkafasted, ispmoncl
[ Info: criterion: 2706.474498734029
[ Info: current best model has no restrictions on dwtcl, tvkafasted, ispmoncl
[ Info: criterion improved, continuing!
┌ Info: final best model summary
│ best_criterion = 2706.474498734029
└ best_zero_restrictions = ()
Pumas.CovariateSelectionResults
Criterion: lrt_criterion
Method: Forward
Best model:
Unrestricted parameters: dwtcl, tvkafasted, ispmoncl
Zero restricted parameters:
Criterion value: 2706.47
Fitted model summary:
7×4 DataFrame
Row │ estimated_parameters restricted_parameters fit ⋯
│ Tuple… Tuple… Fit ⋯
─────┼──────────────────────────────────────────────────────────────────────────
1 │ () (:dwtcl, :tvkafasted, :ispmoncl) Fit ⋯
2 │ (:dwtcl,) (:tvkafasted, :ispmoncl) Fit
3 │ (:tvkafasted,) (:dwtcl, :ispmoncl) Fit
4 │ (:ispmoncl,) (:dwtcl, :tvkafasted) Fit
5 │ (:dwtcl, :tvkafasted) (:ispmoncl,) Fit ⋯
6 │ (:tvkafasted, :ispmoncl) (:dwtcl,) Fit
7 │ (:dwtcl, :tvkafasted, :ispmoncl) () Fit
2 columns omitted
Now we extract the best model from the covar_result
by accessing the best_model
field. That gives us a model fit, which we can use to extract the best model:
= covar_result_fs.best_model.model best_model_fs
PumasModel
Parameters: tvcl, tvvc, tvvp, tvq, tvkafed, tvkafasted, dwtcl, ispmoncl, Ω, σₚ
Random effects: η
Covariates: isPM, isfed, wt
Dynamical system variables: Depot, Central, Peripheral
Dynamical system type: Closed form
Derived: dv
Observed: dv
Our α = 0.001
, so need to override the default value defined in lrt_criterion
function.
= covariate_select(
covar_result_bs
best_model_fs,
population,
iparams,FOCE();
= (:dwtcl, :tvkafasted, :ispmoncl),
control_param = x -> lrt_criterion(x; α = 0.001),
criterion = CovariateSelection.Backward,
method )
[ Info: fitting baseline model with all of the control_param parameters being estimated
[ Info: criterion: 2765.17187270901
[ Info: fitting model with restrictions on: dwtcl
[ Info: criterion: 2782.478811619007
[ Info: fitting model with restrictions on: tvkafasted
[ Info: criterion: 2792.02605330052
[ Info: fitting model with restrictions on: ispmoncl
[ Info: criterion: 2793.0564580287396
[ Info: criterion did not improve, stopping!
┌ Info: final best model summary
│ best_criterion = 2765.17187270901
└ best_zero_restrictions = ()
Pumas.CovariateSelectionResults
Criterion: #16
Method: Backward
Best model:
Unrestricted parameters: dwtcl, tvkafasted, ispmoncl
Zero restricted parameters:
Criterion value: 2765.17
Fitted model summary:
4×4 DataFrame
Row │ estimated_parameters restricted_parameters fitted_model ⋯
│ Tuple… Tuple{Vararg{Symbol}} FittedPumas… ⋯
─────┼──────────────────────────────────────────────────────────────────────────
1 │ (:dwtcl, :tvkafasted, :ispmoncl) () FittedPumasMod ⋯
2 │ (:tvkafasted, :ispmoncl) (:dwtcl,) FittedPumasMod
3 │ (:dwtcl, :ispmoncl) (:tvkafasted,) FittedPumasMod
4 │ (:dwtcl, :tvkafasted) (:ispmoncl,) FittedPumasMod
2 columns omitted
x -> fun(x; ...)
is an anonymous function. Since we need to override the default keyword argument α
in lrt_criterion
for the backward elimination pass, we cannot use:
= lrt_criterion criterion
We need to use:
= x -> lrt_criterion(x; α = 0.001) criterion
As you can see, the results of this mixed routine: FS followed by BE process; results in the same final model as the FS and BE processes used in the Covariate Selection Methods - Forward Selection and Covariate Selection Methods - Backward Elimination.
covar_result_bs.best_model
FittedPumasModel
Successful minimization: true
Likelihood approximation: FOCE
Likelihood Optimizer: BFGS
Dynamical system type: Closed form
Log-likelihood value: -1306.793
Number of subjects: 18
Number of parameters: Fixed Optimized
0 14
Observation records: Active Missing
dv: 270 0
Total: 270 0
-------------------------
Estimate
-------------------------
tvcl 3.6888
tvvc 70.044
tvvp 48.706
tvq 4.3506
tvkafed 0.41114
tvkafasted 1.0894
dwtcl 0.81856
ispmoncl -0.62382
Ω₁,₁ 0.022271
Ω₂,₂ 0.050192
Ω₃,₃ 0.031194
Ω₄,₄ 0.10595
Ω₅,₅ 0.035072
σₚ 0.098043
-------------------------