# Model Selection Methods - Mixed Routines

Authors

Jose Storopoli

Andreas Noack

Joel Owen

As stated in Introduction to Covariate Selection Methods, broadly, there are two main types of Stepwise Covariate Model (SCM):

1. Forward Selection (FS)
2. Backward Elimination (BE)

In this tutorial we’ll explore a mixed routine based on both Forward Selection (FS) and Backward Elimination (BE).

## 1 Mixed Routines

One common mixed routine is to first perform a FS, followed by a BE. In all of these SCM routines, we’ll use the following criterion:

$\operatorname{criterion}(f_p, f_b) = \operatorname{LRT}(f_p, f_b, \alpha)$

where:

• $$f_p$$ is the “proposed” model fit at the current iteration
• $$f_b$$ is the model fit at the previous iteration (i.e. the current “best model”)
• $$\operatorname{LRT}$$ is the likelihood-ratio test between models $$f_p$$ and $$f_m$$ using the $$\alpha$$ value as the significance threshold

We’ll use different $$\alpha$$ values depending on stage of the analysis. Common values are:

1. FS: $$\alpha = 0.01$$
2. BE: $$\alpha = 0.001$$

The rationale is that a less strict criterion value is used when adding variables during FS, and a more stringent criterion is used for allowing covariates to remain in the model during BE.

## 2 Example in Pumas

Let’s showcase an example that builds from the model we used in Covariate Selection Methods - Introduction.

First, let’s import the following packages:

using Pumas
using PharmaDatasets

We are going to use the po_sad_1 dataset from PharmaDatasets:

df = dataset("po_sad_1")
first(df, 5)
5×14 DataFrame
Row id time dv amt evid cmt rate age wt doselevel isPM isfed sex route
Int64 Float64 Float64? Float64? Int64 Int64? Float64 Int64 Int64 Int64 String3 String3 String7 String3
1 1 0.0 missing 30.0 1 1 0.0 51 74 30 no yes male ev
2 1 0.25 35.7636 missing 0 missing 0.0 51 74 30 no yes male ev
3 1 0.5 71.9551 missing 0 missing 0.0 51 74 30 no yes male ev
4 1 0.75 97.3356 missing 0 missing 0.0 51 74 30 no yes male ev
5 1 1.0 128.919 missing 0 missing 0.0 51 74 30 no yes male ev

This is an oral dosing (route = "ev") NMTRAN-formatted dataset. It has 18 subjects, each with 1 dosing event (evid = 1) and 18 measurement events (evid = 0); and the following covariates:

• age: age in years (continuous)
• wt: weight in kg (continuous)
• doselevel: dosing amount, either 30, 60 or 90 milligrams (categorical)
• isPM: subject is a poor metabolizer (binary)
• isfed: subject is fed (binary)
• sex: subject sex (binary)
population =
read_pumas(df; observations = [:dv], covariates = [:wt, :isPM, :isfed], route = :route)
Population
Subjects: 18
Covariates: wt, isPM, isfed
Observations: dv

We’ll use the same 2-compartment oral absorption model as in Covariate Selection Methods - Introduction, but now adding additional covariates and covariate effects:

model = @model begin
desc = "full covariate model"
timeu = u"hr"
end

@param begin
"""
Clearance (L/hr)
"""
tvcl ∈ RealDomain(; lower = 0)
"""
Central Volume (L)
"""
tvvc ∈ RealDomain(; lower = 0)
"""
Peripheral Volume (L)
"""
tvvp ∈ RealDomain(; lower = 0)
"""
Distributional Clearance (L/hr)
"""
tvq ∈ RealDomain(; lower = 0)
"""
Fed - Absorption rate constant (h-1)
"""
tvkafed ∈ RealDomain(; lower = 0)
"""
Fasted - Absorption rate constant (h-1)
"""
tvkafasted ∈ RealDomain(; lower = 0)
"""
Power exponent on weight for Clearance
"""
dwtcl ∈ RealDomain()
"""
Proportional change in CL due to PM
"""
ispmoncl ∈ RealDomain(; lower = -1, upper = 1)
"""
- ΩCL
- ΩVc
- ΩKa
- ΩVp
- ΩQ
"""
Ω ∈ PDiagDomain(5)
"""
Proportional RUV (SD scale)
"""
σₚ ∈ RealDomain(; lower = 0)
end

@random begin
η ~ MvNormal(Ω)
end

@covariates begin
"""
Poor Metabolizer
"""
isPM
"""
Fed status
"""
isfed
"""
Weight (kg)
"""
wt
end

@pre begin
CL = tvcl * (wt / 70)^dwtcl * (1 + ispmoncl * (isPM == "yes")) * exp(η[1])
Vc = tvvc * (wt / 70) * exp(η[2])
Ka = (tvkafed + (isfed == "no") * (tvkafasted)) * exp(η[3])
Q = tvq * (wt / 70)^0.75 * exp(η[4])
Vp = tvvp * (wt / 70) * exp(η[5])
end

@dynamics Depots1Central1Periph1

@derived begin
cp := @. 1_000 * (Central / Vc)
"""
DrugY Concentration (ng/mL)
"""
dv ~ @. Normal(cp, cp * σₚ)
end
end
PumasModel
Parameters: tvcl, tvvc, tvvp, tvq, tvkafed, tvkafasted, dwtcl, ispmoncl, Ω, σₚ
Random effects: η
Covariates: isPM, isfed, wt
Dynamical system variables: Depot, Central, Peripheral
Dynamical system type: Closed form
Derived: dv
Observed: dv

With the model done, now we define our initial parameters:

iparams = (;
tvkafed = 0.4,
tvkafasted = 1.5,
tvcl = 4.0,
tvvc = 70.0,
tvq = 4.0,
tvvp = 50.0,
dwtcl = 0.75,
ispmoncl = -0.7,
Ω = Diagonal(fill(0.04, 5)),
σₚ = 0.1,
)
(tvkafed = 0.4,
tvkafasted = 1.5,
tvcl = 4.0,
tvvc = 70.0,
tvq = 4.0,
tvvp = 50.0,
dwtcl = 0.75,
ispmoncl = -0.7,
Ω = [0.04 0.0 … 0.0 0.0; 0.0 0.04 … 0.0 0.0; … ; 0.0 0.0 … 0.04 0.0; 0.0 0.0 … 0.0 0.04],
σₚ = 0.1,)

Now we create our criterion:

lrt_criterion(f; α = 0.01) = dof(f) * cquantile(Chisq(1), α) - (2 * loglikelihood(f))
lrt_criterion (generic function with 1 method)

lrt_criterion takes a model fit f and a desired $$\alpha$$ value α. It calculates the LRT by first multiplying the model’s degree-of-freedom (dof) by the complementary quantile (cquantile) of the $$\alpha$$ value from a $$\chi^2$$ distribution with one degree of freedom. Then, it subtracts from this value the model fit -2LL (minus 2 times log-likelihood) with constant. We use one degree-of-freedom because our SCM methods as currently implemented for both FS and BE, always add or remove a single covariate effect in each model evaluation.

First, we’ll run covariate_select with lrt_criterion as a criterion and the FS method. Finally, we run covariate_select with lrt_criterion as a criterion and the BE method.

Note

Our α = 0.01, so we use the default value defined in lrt_criterion function.

covar_result_fs = covariate_select(
model,
population,
iparams,
FOCE();
control_param = (:dwtcl, :tvkafasted, :ispmoncl),
criterion = lrt_criterion,
method = CovariateSelection.Forward,
)
[ Info: fitting model with zero restrictions on: dwtcl, tvkafasted, ispmoncl
[ Info: criterion: 2765.9438260321062
[ Info: fitting model with zero restrictions on: tvkafasted, ispmoncl
[ Info: criterion: 2768.87168160899
[ Info: fitting model with zero restrictions on: dwtcl, ispmoncl
[ Info: criterion: 2735.691364008045
[ Info: fitting model with zero restrictions on: dwtcl, tvkafasted
[ Info: criterion: 2758.2100188121717
[ Info: current best model has zero restrictions on dwtcl, ispmoncl
[ Info: criterion improved, continuing!
[ Info: fitting model with zero restrictions on: ispmoncl
[ Info: criterion: 2738.5517536233997
[ Info: fitting model with zero restrictions on: dwtcl
[ Info: criterion: 2727.974107213668
[ Info: current best model has zero restrictions on dwtcl
[ Info: criterion improved, continuing!
[ Info: fitting model with zero restrictions on:
[ Info: criterion: 2706.4744987331405
[ Info: current best model has zero restrictions on
[ Info: criterion improved, continuing!
┌ Info: final best model summary
│   best_criterion = 2706.4744987331405
└   best_zero_restrictions = ()
Pumas.CovariateSelectionResults
Criterion:                                     lrt_criterion
Method:                                              Forward

Best model:
Unrestricted parameters:       dwtcl, tvkafasted, ispmoncl
Zero restricted parameters:
Criterion value:                                   2706.47

Fitted model summary:
7×4 DataFrame
Row │ free_parameters                   zero_parameters                   fit ⋯
│ Tuple…                            Tuple…                            Fit ⋯
─────┼──────────────────────────────────────────────────────────────────────────
1 │ ()                                (:dwtcl, :tvkafasted, :ispmoncl)  Fit ⋯
2 │ (:dwtcl,)                         (:tvkafasted, :ispmoncl)          Fit
3 │ (:tvkafasted,)                    (:dwtcl, :ispmoncl)               Fit
4 │ (:ispmoncl,)                      (:dwtcl, :tvkafasted)             Fit
5 │ (:dwtcl, :tvkafasted)             (:ispmoncl,)                      Fit ⋯
6 │ (:tvkafasted, :ispmoncl)          (:dwtcl,)                         Fit
7 │ (:dwtcl, :tvkafasted, :ispmoncl)  ()                                Fit
2 columns omitted

Now we extract the best model from the covar_result by accessing the best_model field. That gives us a model fit, which we can use to extract the best model:

best_model_fs = covar_result_fs.best_model.model
PumasModel
Parameters: tvcl, tvvc, tvvp, tvq, tvkafed, tvkafasted, dwtcl, ispmoncl, Ω, σₚ
Random effects: η
Covariates: isPM, isfed, wt
Dynamical system variables: Depot, Central, Peripheral
Dynamical system type: Closed form
Derived: dv
Observed: dv
Note

Our α = 0.001, so need to override the default value defined in lrt_criterion function.

covar_result_bs = covariate_select(
best_model_fs,
population,
iparams,
FOCE();
control_param = (:dwtcl, :tvkafasted, :ispmoncl),
criterion = x -> lrt_criterion(x; α = 0.001),
method = CovariateSelection.Backward,
)
[ Info: fitting model with zero restrictions on:
[ Info: criterion: 2765.1718727081216
[ Info: fitting model with zero restrictions on: dwtcl
[ Info: criterion: 2782.478811619008
[ Info: fitting model with zero restrictions on: tvkafasted
[ Info: criterion: 2792.026053317039
[ Info: fitting model with zero restrictions on: ispmoncl
[ Info: criterion: 2793.0564580287396
[ Info: current best model has zero restrictions on dwtcl
[ Info: criterion did not improve, stopping!
┌ Info: final best model summary
│   best_criterion = 2765.1718727081216
└   best_zero_restrictions = ()
Pumas.CovariateSelectionResults
Criterion:                                               #16
Method:                                             Backward

Best model:
Unrestricted parameters:       dwtcl, tvkafasted, ispmoncl
Zero restricted parameters:
Criterion value:                                   2765.17

Fitted model summary:
4×4 DataFrame
Row │ free_parameters                   zero_parameters  fitted_model         ⋯
│ Tuple…                            Tuple…           FittedPumas…         ⋯
─────┼──────────────────────────────────────────────────────────────────────────
1 │ (:dwtcl, :tvkafasted, :ispmoncl)  ()               FittedPumasModel(... ⋯
2 │ (:tvkafasted, :ispmoncl)          (:dwtcl,)        FittedPumasModel(...
3 │ (:dwtcl, :ispmoncl)               (:tvkafasted,)   FittedPumasModel(...
4 │ (:dwtcl, :tvkafasted)             (:ispmoncl,)     FittedPumasModel(...
2 columns omitted
Note

x -> fun(x; ...) is an anonymous function. Since we need to override the default keyword argument α in lrt_criterion for the backward elimination pass, we cannot use:

criterion = lrt_criterion

We need to use:

criterion = x -> lrt_criterion(x; α = 0.001)

As you can see, the results of this mixed routine: FS followed by BE process; results in the same final model as the FS and BE processes used in the Covariate Selection Methods - Forward Selection and Covariate Selection Methods - Backward Elimination.

covar_result_bs.best_model
FittedPumasModel

Successful minimization:                      true

Likelihood approximation:                     FOCE
Likelihood Optimizer:                         BFGS
Dynamical system type:                 Closed form

Log-likelihood value:                    -1306.793
Number of subjects:                             18
Number of parameters:         Fixed      Optimized
0             14
Observation records:         Active        Missing
dv:                         270              0
Total:                      270              0

-------------------------
Estimate
-------------------------
tvcl            3.6888
tvvc           70.044
tvvp           48.706
tvq             4.3506
tvkafed         0.41114
tvkafasted      1.0894
dwtcl           0.81856
ispmoncl       -0.62382
Ω₁,₁            0.022271
Ω₂,₂            0.050192
Ω₃,₃            0.031194
Ω₄,₄            0.10595
Ω₅,₅            0.035072
σₚ              0.098043
-------------------------