using Pumas
using PharmaDatasets
using DataFramesMeta
using CairoMakie
using AlgebraOfGraphics
using PumasUtilities
using Markdown
Interpreting Results from Pumas Fits
1 Introduction
After following a typical workflow for fitting a Pumas model it is important to assess and interpret the results. This includes steps such as:
- Looking at log-likelihood related quantities such as loglikelihood, aic, and bic to compare the fit to other existing fits.
- Assessing convergence of the maximum likelihood optimization.
- Calculating the condition number of the correlation matrix implied by the estimated variance-covariance matrix.
- Calculating shrinkage for random effects and the dependent variable.
The following sections will walk through these steps using a one-compartment PK model for Warfarin, focusing on the PK aspects only.
2 Model and Data
2.1 Model Definition
Below is the PK model, named warfarin_pk_model, defined in Pumas. This model contains:
- Fixed effects (population parameters): pop_CL, pop_Vc, pop_tabs, pop_lag
- Inter-individual variability (IIV) components: pk_Ω
- Residual error model parameters: σ_prop, σ_add
- Covariates for scaling: FSZCL and FSZV
- Differential equations describing the PK behavior in the compartments.
warfarin_pk_model = @model begin
    @metadata begin
        desc = "Warfarin 1-compartment PK model (PD removed)"
        timeu = u"hr"
    end

    @param begin
        # PK parameters
        """
        Clearance (L/hr)
        """
        pop_CL ∈ RealDomain(lower = 0.0, init = 0.134)
        """
        Central volume (L)
        """
        pop_Vc ∈ RealDomain(lower = 0.0, init = 8.11)
        """
        Absorption lag time (hr)
        """
        pop_tabs ∈ RealDomain(lower = 0.0, init = 0.523)
        """
        Lag time (hr)
        """
        pop_lag ∈ RealDomain(lower = 0.0, init = 0.1)

        # Inter-individual variability
        """
        - ΩCL: Clearance
        - ΩVc: Central volume
        - Ωtabs: Absorption lag time
        """
        pk_Ω ∈ PDiagDomain([0.01, 0.01, 0.01])

        # Residual variability
        """
        σ_prop: Proportional error
        """
        σ_prop ∈ RealDomain(lower = 0.0, init = 0.00752)
        """
        σ_add: Additive error
        """
        σ_add ∈ RealDomain(lower = 0.0, init = 0.0661)
    end

    @random begin
        pk_η ~ MvNormal(pk_Ω)  # mean = 0, covariance = pk_Ω
    end

    @covariates begin
        """
        FSZCL: Clearance scaling factor
        """
        FSZCL
        """
        FSZV: Volume scaling factor
        """
        FSZV
    end

    @pre begin
        CL = FSZCL * pop_CL * exp(pk_η[1])
        Vc = FSZV * pop_Vc * exp(pk_η[2])
        tabs = pop_tabs * exp(pk_η[3])
        Ka = log(2) / tabs
    end

    @dosecontrol begin
        lags = (Depot = pop_lag,)
    end

    @vars begin
        cp := Central / Vc
    end

    @dynamics begin
        Depot' = -Ka * Depot
        Central' = Ka * Depot - (CL / Vc) * Central
    end

    @derived begin
        """
        Concentration (ng/mL)
        """
        conc ~ @. Normal(cp, sqrt((σ_prop * cp)^2 + σ_add^2))
    end
end
PumasModel
Parameters: pop_CL, pop_Vc, pop_tabs, pop_lag, pk_Ω, σ_prop, σ_add
Random effects: pk_η
Covariates: FSZCL, FSZV
Dynamical system variables: Depot, Central
Dynamical system type: Matrix exponential
Derived: conc
Observed: conc
2.2 Data Preparation
The Warfarin data used in this tutorial is pulled from PharmaDatasets
for demonstration purposes. Note how the code reshapes and prepares the data in “wide” format for reading into Pumas. Only the conc
column is treated as observations for the PK model.
warfarin_data = dataset("pumas/warfarin_nonmem")
# Step 2: Fix Duplicate Time Points
# -------------------------------
# Some subjects have duplicate time points for dvid = 1.
# For this dataset, the triple (id, time, dvid) should define
# a row uniquely, but the two counts below differ:
nrow(warfarin_data)
nrow(unique(warfarin_data, ["id", "time", "dvid"]))
# We can identify the problematic rows by grouping on the index variables
@chain warfarin_data begin
@groupby :id :time :dvid
@transform :tmp = length(:id)
@rsubset :tmp > 1
end
# Transform the data in a single chain of operations
warfarin_data_wide = @chain warfarin_data begin
    @rtransform begin
        # Scaling factors
        :FSZV = :wtbl / 70          # volume scaling
        :FSZCL = (:wtbl / 70)^0.75  # clearance scaling (allometric)
        # Column name for the dv
        :dvname = "dv$(:dvid)"
        # Dosing indicator columns
        :cmt = ismissing(:amt) ? missing : 1
        :evid = iszero(:amt) ? 0 : 1
    end
    unstack(Not([:dvid, :dvname, :dv]), :dvname, :dv)
    rename!(:dv1 => :conc, :dv2 => :pca)
    select!(Not([:dv0]))
end
first(warfarin_data_wide, 10)
Row | id | time | evid | amt | wtbl | age | sex | FSZV | FSZCL | cmt | conc | pca |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Int64 | Float64 | Int64 | Float64 | Float64 | Int64 | String1 | Float64 | Float64 | Int64 | Float64? | Float64? | |
1 | 1 | 0.0 | 1 | 100.0 | 66.7 | 50 | M | 0.952857 | 0.96443 | 1 | missing | missing |
2 | 1 | 0.5 | 0 | 0.0 | 66.7 | 50 | M | 0.952857 | 0.96443 | 1 | 0.0 | missing |
3 | 1 | 1.0 | 0 | 0.0 | 66.7 | 50 | M | 0.952857 | 0.96443 | 1 | 1.9 | missing |
4 | 1 | 2.0 | 0 | 0.0 | 66.7 | 50 | M | 0.952857 | 0.96443 | 1 | 3.3 | missing |
5 | 1 | 3.0 | 0 | 0.0 | 66.7 | 50 | M | 0.952857 | 0.96443 | 1 | 6.6 | missing |
6 | 1 | 6.0 | 0 | 0.0 | 66.7 | 50 | M | 0.952857 | 0.96443 | 1 | 9.1 | missing |
7 | 1 | 9.0 | 0 | 0.0 | 66.7 | 50 | M | 0.952857 | 0.96443 | 1 | 10.8 | missing |
8 | 1 | 12.0 | 0 | 0.0 | 66.7 | 50 | M | 0.952857 | 0.96443 | 1 | 8.6 | missing |
9 | 1 | 24.0 | 0 | 0.0 | 66.7 | 50 | M | 0.952857 | 0.96443 | 1 | 5.6 | 44.0 |
10 | 1 | 36.0 | 0 | 0.0 | 66.7 | 50 | M | 0.952857 | 0.96443 | 1 | 4.0 | 27.0 |
2.3 Creating a Pumas Population
Below is the creation of a population object in Pumas using read_pumas
. Only the conc
data are treated as the observation variable:
pop = read_pumas(
    warfarin_data_wide;
    id = :id,
    time = :time,
    amt = :amt,
    cmt = :cmt,
    evid = :evid,
    covariates = [:sex, :wtbl, :FSZV, :FSZCL],
    observations = [:conc],
)
Population
Subjects: 32
Covariates: sex, wtbl, FSZV, FSZCL
Observations: conc
# A named tuple of parameter values
param_vals = (
    pop_CL = 0.134,
    pop_Vc = 8.11,
    pop_tabs = 0.523,
    pop_lag = 0.1,
    pk_Ω = Diagonal([0.01, 0.01, 0.01]),
    σ_prop = 0.00752,
    σ_add = 0.0661,
)
foce_fit = fit(warfarin_pk_model, pop, param_vals, FOCE())
[ Info: Checking the initial parameter values.
[ Info: The initial negative log likelihood and its gradient are finite. Check passed.
Iter     Function value   Gradient norm
     0     1.209064e+04     1.489225e+04
 * time: 0.03635907173156738
     1     2.643772e+03     3.167516e+03
 * time: 1.5721640586853027
     2     1.836601e+03     2.118430e+03
 * time: 1.6450309753417969
⋮
    90     3.504963e+02     6.289982e-02
 * time: 7.975205183029175
    91     3.504963e+02     3.116578e-03
 * time: 8.041146039962769
    92     3.504963e+02     5.625186e-04
 * time: 8.105208158493042
FittedPumasModel
Dynamical system type: Matrix exponential
Number of subjects: 32
Observation records: Active Missing
conc: 251 47
Total: 251 47
Number of parameters: Constant Optimized
0 9
Likelihood approximation: FOCE
Likelihood optimizer: BFGS
Termination Reason: GradientNorm
Log-likelihood value: -350.49625
--------------------
Estimate
--------------------
pop_CL 0.13465
pop_Vc 8.0535
pop_tabs 0.55061
pop_lag 0.87158
pk_Ω₁,₁ 0.070642
pk_Ω₂,₂ 0.018302
pk_Ω₃,₃ 0.91326
σ_prop 0.090096
σ_add 0.39115
--------------------
2.4 Evaluating the termination of the fit
Fitting non-linear mixed effects models is not always a simple task. The model has to be appropriate for the data, the data must be accurately sampled and recorded, no information should be truncated, and so on, for the fits to run reliably. Even with perfect data collection and modeling there can still be numerical challenges: evaluating various expressions, calculating empirical Bayes estimates in an inner loop, handling vanishing variances, local optima, and more.
In Pumas we use Optim.jl (Mogensen and Riseth 2018) to perform the minimization for FO, FOCE, and LaplaceI fits, with or without MAP for maximum a posteriori fitting. The returned FittedPumasModel contains the optimization results structure from Optim.jl in the optim field. This can be useful when assessing convergence.
Convergence in non-linear optimization is not always a simple task to assess:
- We can look at the gradient element sizes; to this end we often look at the infinity norm of the gradient, which picks out the largest absolute value of the elements. By default we require this to be less than 1e-3 to declare convergence, but some problems are more or less numerically stable, so the appropriate threshold may be larger or smaller depending on the circumstances.
- Gradient Elements
- The gradient is a vector that tells us which direction to move to increase a function’s value the fastest
- Each element in this vector corresponds to how much the function changes when we adjust one parameter
- “Gradient element sizes” refers to the magnitude (absolute value) of each of these directional changes
To illustrate with a simple example, consider a model with 3 parameters (like CL, V, and Ka), where the gradient might look like:
gradient = [0.0001, -0.002, 0.0005]
3-element Vector{Float64}:
0.0001
-0.002
0.0005
Here:
- 0.0001 tells us how much the function would change if we adjust the first parameter (CL)
- -0.002 for the second parameter (V)
- 0.0005 for the third parameter (Ka)
- Infinity Norm of the Gradient
- The infinity norm is simply the largest absolute value among all gradient elements
- In our example above:
|0.0001| = 0.0001
|-0.002| = 0.002 # <- This is the largest!
|0.0005| = 0.0005
Therefore, infinity norm = 0.002
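In Julia this is a one-liner via norm from the LinearAlgebra standard library:
using LinearAlgebra

gradient = [0.0001, -0.002, 0.0005]
norm(gradient, Inf)  # 0.002: the largest absolute element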
Why This Matters for Convergence:
- When optimizing a model (finding the best parameters), we want all gradient elements to be close to zero
- A large gradient element means we can still improve our solution by moving in that direction
- A small gradient (all elements close to zero) suggests we’re at or near the optimal solution
- By convention in Pumas, if the infinity norm is less than 0.001 (1e-3), we consider the optimization “converged”
Think of it like hiking to find the highest point:
- The gradient tells you which direction to walk
- The size of each gradient element tells you how steep it is in each direction
- When you’re at the top, it should be flat in all directions (all gradient elements close to zero)
- The infinity norm tells you about the steepest direction you could still go
- If the steepest direction is very gentle (< 0.001), you’re probably at the peak!
- We do not stop according to small changes in the parameters or objective function value by default. This is because such behavior is typically not a direct sign of convergence but rather that progress has stalled. That said, many researchers set a relative change tolerance on the parameters in the same way as is done in NONMEM, because you will often see many final iterations with next to no improvement. This can be done by setting optim_options = (x_reltol = 1e-5,) in fit to match NONMEM, for example, as sketched below.
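A minimal sketch, reusing warfarin_pk_model, pop, and param_vals from above (the result name nonmem_like_fit is ours, and show_trace = false merely silences the iteration log):
nonmem_like_fit = fit(
    warfarin_pk_model,
    pop,
    param_vals,
    FOCE();
    optim_options = (x_reltol = 1e-5, show_trace = false),
)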
Let us consider a number of different cases and how you can find and interpret the information in the optim field. This looks something like the following.
julia> foce_fit.optim
L1 * Status: success
L2
L3 * Candidate solution
L4 Final objective value: 3.199344e+02
L5
L6 * Found with
L7 Algorithm: BFGS
L8
L9 * Convergence measures
L10 |x - x'| = 4.36e-04 ≰ 0.0e+00
L11 |x - x'|/|x'| = 1.10e-04 ≰ 0.0e+00
L12 |f(x) - f(x')| = 3.57e-06 ≰ 0.0e+00
L13 |f(x) - f(x')|/|f(x')| = 1.12e-08 ≰ 0.0e+00
L14 |g(x)| = 5.62e-04 ≤ 1.0e-03
L15
L16 * Work counters
L17 Seconds run: 1 (vs limit Inf)
L18 Iterations: 95
L19 f(x) calls: 101
L20 ∇f(x) calls: 97
In the cases below we will use the line numbers to the left (L1 to L20) to refer to the relevant parts of the output. While interpreting the output it may be worth remembering that the optimization is cast as the problem of minimizing the negative log of the marginal likelihood.
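These checks can also be scripted. A sketch using Optim.jl's accessor functions, which we assume are reachable via the Pumas.Optim module used later in this tutorial:
res = foce_fit.optim
Pumas.Optim.converged(res)   # overall success flag (cf. L1)
Pumas.Optim.g_residual(res)  # infinity norm of the gradient (cf. L14)
Pumas.Optim.iterations(res)  # number of iterations (cf. L18)
Pumas.Optim.f_calls(res)     # number of objective evaluations (cf. L19)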
2.4.1 Case 1: gradient convergence
Since a (local) minimum of a smooth objective function satisfies that the partial derivatives are zero, we often look at the norm of the gradient to declare convergence. The default value for the tolerance, set via the g_tol option in optim_options in Pumas, is 1e-3. The final state of this measure is found in L14 of the output above, where it is seen that the tolerance was met (5.62e-04 ≤ 1.0e-03). This tolerance is set to match what is often achievable in well-specified models, but smaller is always better. To derive a theoretically well-reasoned value we would need to know the Hessian at the optimum, but this cannot be known before running a fit and is very expensive to calculate exactly after a fit. Several factors affect the level that can be reached in practice, including the accuracy of the solution of the embedded dynamical system and the curvature of the objective function surface. If the gradient norm is lower than 1e-3 we will in general accept the fit as having converged, but "near convergence" is often accepted as well if there are no other indications of issues with the model.
2.4.2 Case 2: no change in parameters
If x_abstol or x_reltol was not manually set by the user, we only terminate the optimization procedure if no change in parameters could be made at all from one iteration to the next. The final states of these termination tests are found in L10 and L11 respectively. This essentially indicates that the procedure that scales the search direction to find the next value of the parameters failed. If this occurs with a relatively small gradient norm we may accept the solution, though the termination reason cannot be used to conclude convergence in isolation. The reasons why no step led to an improvement of the objective function value can be many. Some include:
- The objective function is not smooth due to numerical noise from calculations in the objective function, including model solutions.
- The inverse Hessian approximation in the BFGS algorithm is far from the true inverse Hessian of the objective function. Restarting the fit may help.
As mentioned above, NONMEM sets x_reltol = 1e-5 by default but it is 0 in Pumas.
2.4.3 Case 3: small change in parameters
As mentioned above, Pumas by default only terminates on parameter changes when the change from one iteration to the next is exactly zero. If the relative change tolerance x_reltol is set to a number such as 1e-5 to match other software such as NONMEM, then the user may well accept the solution, since they deliberately used this criterion. In general, we advise using this setting with caution as it can terminate with quite a large gradient norm.
Again, this information is seen in L10 and L11, but the limit after the inequality will change to the number provided.
2.4.4 Case 4: no change in objective function
This case is related to Case 2. We do not allow termination due to the absence of improvement in the objective function, so in general this should be seen as a failure. The rest of the comments about reasons for stopping, acceptance based on gradient norms, etc. apply here as well.
The information about these termination tests can be found in L12 and L13.
2.4.5 Case 5: small change in objective function
Just as Case 4 is related to Case 2, this case is related to Case 3. A very small improvement is often not a valid reason to accept the returned solution as the maximum likelihood estimates. If the user had good reason to actively set one of the options f_abstol or f_reltol, the result may be accepted.
As in Case 4 the information can be found in L12 and L13, but the right-hand sides of the inequalities are updated as in Case 3.
2.4.6 Case 6: “maximum number of iterations reached”
The pre-specified limit on the number of iterations was reached. This can be increased in optim_options by setting the iterations keyword, as sketched below. It defaults to 1000, which should generally be plenty, so termination according to this criterion is not a sign of convergence. The output may be accepted if the solution seems to cycle between parameter values with a gradient norm close to the prescribed tolerance.
The number of iterations can be found in L18.
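A minimal sketch of raising the cap (2000 is an arbitrary example value):
fit(warfarin_pk_model, pop, param_vals, FOCE();
    optim_options = (iterations = 2000, show_trace = false))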
2.4.7 Case 7: line search failed
This case is related to Case 2, 3, 4, and 5. Given the search direction it was not possible to find a step that sufficiently decreases the objective function value. The advice found in Case 2 applies here as well.
In the output this will be indicated in L1.
2.4.8 Case 8: “user provided callback termination”
This is an advanced use case where a user-provided callback function returned the value true to indicate that the optimization should terminate. It can only happen if the user supplied a custom callback function as part of optim_options.
In the output this will be indicated in L1.
2.4.9 Case 9: “time limit reached”
The optimization routine allows for a soft time limit that is checked after every iteration. The default allows for infinite run-time, but if the time_limit option in optim_options is set, this termination reason may occur; a minimal sketch follows below.
In the output this will be indicated in L17.
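A minimal sketch of setting a one-hour soft limit (the value, in seconds, is arbitrary):
fit(warfarin_pk_model, pop, param_vals, FOCE();
    optim_options = (time_limit = 3600, show_trace = false))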
2.4.10 Case 10: “objective evaluation limit reached”
The optimization routine allows for a limit to be placed on the total number of objective function evaluations through the f_calls_limit option in optim_options. It is not set by default in Pumas. The interpretation is similar to Case 6.
In the output this will be indicated in L19.
2.4.11 Case 11: “gradient evaluation limit reached”
The optimization routine allows for a limit to be placed on the total number of gradient evaluations through the g_calls_limit option in optim_options. It is not set by default in Pumas. The interpretation is similar to Case 6.
In the output this will be indicated in L20.
2.4.12 Case 12: “hessian evaluation limit reached”
The optimization routine allows for a limit to be placed on the total number of Hessian evaluations through the h_calls_limit option in optim_options. It is not set by default in Pumas. The interpretation is similar to Case 6.
In the output this would be placed after L20 if it were present, but it is not, because the BFGS algorithm only maintains an approximation to the inverse Hessian; it does not actually compute and use the true Hessian.
2.4.13 Case 13: “objective function increased”
This can happen if the allow_f_increases option is set to false in optim_options. The default is true. The causes are the same as in Case 2.
This would be indicated in L1
if it failed.
2.5 Optimization didn’t succeed - then what?
If the optimization did not succeed according to the description above, it is important to think about what can be done to improve the situation. If the model or the initial parameters are grossly wrong, there may not be much to do about the failed optimization. It is always a good idea to try to reduce the complexity of the model by removing random effects that cannot be supported by the data, or by fixing parameters that are known with high certainty from the relevant literature.
Before changing the model too much it may be worth trying a few different things.
2.5.1 Try different starting parameters
Given the complexity and high number of parameters in many NLME models, it is always important to initialize the fit from several sets of starting parameters instead of only one parameter NamedTuple. It is quite likely that the optimization ends up in a local minimum that is not the global minimum, so we need to check the robustness of our maximum likelihood estimate candidates. Performing this robustness step is explained in more detail in a different tutorial, but just to show the effect we can try a different initial fixed effect guess for the volume of distribution.
foce_fit2 = fit(
    warfarin_pk_model,
    pop,
    merge(param_vals, (; pop_Vc = 12.0)),
    FOCE();
    optim_options = (show_trace = false,),
)
[ Info: Checking the initial parameter values.
[ Info: The initial negative log likelihood and its gradient are finite. Check passed.
FittedPumasModel
Dynamical system type: Matrix exponential
Number of subjects: 32
Observation records: Active Missing
conc: 251 47
Total: 251 47
Number of parameters: Constant Optimized
0 9
Likelihood approximation: FOCE
Likelihood optimizer: BFGS
Termination Reason: NoXChange
Log-likelihood value: -350.49625
--------------------
Estimate
--------------------
pop_CL 0.13465
pop_Vc 8.0534
pop_tabs 0.55063
pop_lag 0.8716
pk_Ω₁,₁ 0.070656
pk_Ω₂,₂ 0.018293
pk_Ω₃,₃ 0.91324
σ_prop 0.090094
σ_add 0.39116
--------------------
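To make this robustness check slightly more systematic, a minimal multi-start sketch (the starting volumes below are arbitrary choices of ours) could loop over several initial values and compare the resulting log-likelihoods:
vc_inits = [5.0, 8.11, 12.0, 20.0]  # arbitrary example starting values
fits = map(vc_inits) do vc
    fit(
        warfarin_pk_model,
        pop,
        merge(param_vals, (; pop_Vc = vc)),
        FOCE();
        optim_options = (show_trace = false,),
    )
end
loglikelihood.(fits)  # similar values suggest the fits reach the same optimum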
2.5.2 Resetting the optimization procedure
Whenever a model fit concludes, we end up with a final set of fixed effect estimates (population level parameters) as well as a set of empirical Bayes estimates, and a final inverse Hessian approximation. There are ways to restart the optimization at the final fixed effect estimates while resetting the optimization procedure, and optionally also resetting the random effects to the central value of the prior.
To restart the optimization procedure while reusing the final fixed effects and empirical Bayes estimates, use the following.
foce_fit3 = fit(foce_fit2; reset_optim = true)
[ Info: Checking the initial parameter values.
[ Info: The initial negative log likelihood and its gradient are finite. Check passed.
FittedPumasModel
Dynamical system type: Matrix exponential
Number of subjects: 32
Observation records: Active Missing
conc: 251 47
Total: 251 47
Number of parameters: Constant Optimized
0 9
Likelihood approximation: FOCE
Likelihood optimizer: BFGS
Termination Reason: NoXChange
Log-likelihood value: -350.49625
--------------------
Estimate
--------------------
pop_CL 0.13465
pop_Vc 8.0534
pop_tabs 0.55063
pop_lag 0.8716
pk_Ω₁,₁ 0.070656
pk_Ω₂,₂ 0.018293
pk_Ω₃,₃ 0.91324
σ_prop 0.090094
σ_add 0.39116
--------------------
2.5.3 Resetting the optimization procedure and the empirical Bayes estimates
To restart both, we can simply invoke fit
as normal with coef(foce_fit3)
as the initial fixed effects.
foce_fit4 = fit(
    warfarin_pk_model,
    pop,
    coef(foce_fit3),
    FOCE();
    optim_options = (show_trace = false,),
)
[ Info: Checking the initial parameter values.
[ Info: The initial negative log likelihood and its gradient are finite. Check passed.
FittedPumasModel
Dynamical system type: Matrix exponential
Number of subjects: 32
Observation records: Active Missing
conc: 251 47
Total: 251 47
Number of parameters: Constant Optimized
0 9
Likelihood approximation: FOCE
Likelihood optimizer: BFGS
Termination Reason: NoXChange
Log-likelihood value: -350.49625
--------------------
Estimate
--------------------
pop_CL 0.13465
pop_Vc 8.0534
pop_tabs 0.55063
pop_lag 0.8716
pk_Ω₁,₁ 0.070656
pk_Ω₂,₂ 0.018293
pk_Ω₃,₃ 0.91324
σ_prop 0.090094
σ_add 0.39116
--------------------
Restarting the optimization is often a good way to avoid problems with gradient convergence.
2.5.4 A computationally expensive alternative
If gradient convergence below a certain threshold is very important, it can often be achieved with GradientDescent. Gradient descent has very good asymptotic properties (as the number of iterations goes to infinity) and avoids potential numerical issues with the inverse Hessian approximation in situations with numerical noise in the objective, which could for example come from numerical integration of the dynamical system. However, since it does not use (an approximation to) second order information, it may take quite large steps that lead to extreme parameter values, causing warnings from the ODE solvers to appear on screen, and it may well use a lot of iterations to reach the prescribed tolerance. Consider the following.
foce_fit5 = fit(
    warfarin_pk_model,
    pop,
    coef(foce_fit3),
    FOCE();
    optim_alg = Pumas.Optim.GradientDescent(),
    optim_options = (g_tol = 1e-5, show_trace = false),
)
foce_fit5.optim
[ Info: Checking the initial parameter values.
[ Info: The initial negative log likelihood and its gradient are finite. Check passed.
* Status: success
* Candidate solution
Final objective value: 3.504963e+02
* Found with
Algorithm: Gradient Descent
* Convergence measures
|x - x'| = 1.84e-08 ≰ 0.0e+00
|x - x'|/|x'| = 4.60e-09 ≰ 0.0e+00
|f(x) - f(x')| = 2.90e-07 ≰ 0.0e+00
|f(x) - f(x')|/|f(x')| = 8.27e-10 ≰ 0.0e+00
|g(x)| = 9.86e-06 ≤ 1.0e-05
* Work counters
Seconds run: 73 (vs limit Inf)
Iterations: 458
f(x) calls: 1144
∇f(x) calls: 1144
We were indeed able to reach a very low gradient norm, but notice the many iterations used! This may simply not be feasible for very large data sets or computationally expensive models.
3 Calculating Some Key Diagnostics
3.1 Model Fit and Comparison
When we specify FOCE(), LaplaceI(), or FO() we are doing maximum likelihood estimation. This also means that the most natural number we have to gauge and compare model fit is the log-likelihood. To extract the log-likelihood at the maximum likelihood estimates we can simply use the loglikelihood function, which has a method for FittedPumasModel.
loglikelihood(foce_fit5)
-350.49625325074004
We see that this is the same number as in the printed output, although it was rounded there. We can then compare this number to the log-likelihood of another fitted model, and if the models are nested we can even do a log-likelihood ratio test with lrtest. Please consult the docstring (?lrtest) for how to use this function.
Other, more general approaches to model comparison involve information criteria such as aic, aicc, bic, and more. These all have methods for FittedPumasModel, so to calculate them simply use them as follows:
aic_fit, bic_fit = aic(foce_fit5), bic(foce_fit5)
(718.9925065014801, 750.7215829536661)
3.2 Computing Parameter Precision with infer
The infer
function in Pumas estimates the uncertainty (precision) of parameter estimates. Depending on the chosen method, infer
can provide standard errors, confidence intervals, correlation matrices, and condition numbers.
An example usage:
inference_results = infer(foce_fit4; level = 0.95)
[ Info: Calculating: variance-covariance matrix.
[ Info: Done.
Asymptotic inference results using sandwich estimator
Dynamical system type: Matrix exponential
Number of subjects: 32
Observation records: Active Missing
conc: 251 47
Total: 251 47
Number of parameters: Constant Optimized
0 9
Likelihood approximation: FOCE
Likelihood optimizer: BFGS
Termination Reason: NoXChange
Log-likelihood value: -350.49625
---------------------------------------------------------
Estimate SE 95.0% C.I.
---------------------------------------------------------
pop_CL 0.13465 0.0066548 [ 0.12161 ; 0.1477 ]
pop_Vc 8.0534 0.22109 [ 7.6201 ; 8.4867 ]
pop_tabs 0.55063 0.18705 [ 0.18402 ; 0.91723 ]
pop_lag 0.8716 0.056682 [ 0.7605 ; 0.98269 ]
pk_Ω₁,₁ 0.070656 0.024587 [ 0.022467 ; 0.11885 ]
pk_Ω₂,₂ 0.018293 0.0051507 [ 0.0081975; 0.028388]
pk_Ω₃,₃ 0.91324 0.40643 [ 0.11665 ; 1.7098 ]
σ_prop 0.090094 0.014521 [ 0.061634 ; 0.11855 ]
σ_add 0.39116 0.065402 [ 0.26298 ; 0.51935 ]
---------------------------------------------------------
This augments the printed table from above with the uncertainty information. A useful function to know is coefficients_table from PumasUtilities. It takes the fitted model and inference objects as inputs and returns a DataFrame with model parameter names, descriptions from the model annotations, estimates, standard errors, relative standard errors, and confidence interval limits.
coefficients_table(foce_fit5, inference_results)
Row | Parameter | Description | Constant | Estimate | Standard Error | Relative_Se | CI Lower | CI Upper |
---|---|---|---|---|---|---|---|---|
String | SubStrin… | Bool | Float64 | Float64 | Float64 | Float64 | Float64 | |
1 | pop_CL | Clearance (L/hr) | false | 0.135 | 0.007 | 0.049 | 0.122 | 0.148 |
2 | pop_Vc | Central volume (L) | false | 8.053 | 0.221 | 0.027 | 7.62 | 8.487 |
3 | pop_tabs | Absorption lag time (hr) | false | 0.551 | 0.187 | 0.34 | 0.184 | 0.917 |
4 | pop_lag | Lag time (hr) | false | 0.872 | 0.057 | 0.065 | 0.761 | 0.983 |
5 | pk_Ω₁,₁ | ΩCL: Clearance | false | 0.071 | 0.025 | 0.348 | 0.022 | 0.119 |
6 | pk_Ω₂,₂ | ΩVc: Central volume | false | 0.018 | 0.005 | 0.282 | 0.008 | 0.028 |
7 | pk_Ω₃,₃ | Ωtabs: Absorption lag time | false | 0.913 | 0.406 | 0.445 | 0.117 | 1.71 |
8 | σ_prop | σ_prop: Proportional error | false | 0.09 | 0.015 | 0.161 | 0.062 | 0.119 |
9 | σ_add | σ_add: Additive error | false | 0.391 | 0.065 | 0.167 | 0.263 | 0.519 |
In another tutorial we will look closer at different ways of calculating uncertainty for the maximum likelihood estimates.
3.3 Calculate the Condition Number
Some researchers like to include the condition number of the correlation matrix implied by the variance-covariance matrix that was estimated by infer. This is simply calculated by using the cond function on the output of infer above.
cond(inference_results)
50.12806079924137
The researcher may set a limit as to how large this number can be. Models with large condition numbers are sometimes labeled “unstable” and models with small condition numbers may be labeled “stable”. This categorization is contested and discussed among practitioners and the condition number should be reported and interpreted with care outside of the linear regression context where the interpretation was originally proposed.
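For reference, a sketch of the same quantity from first principles, assuming vcov extracts the estimated variance-covariance matrix from the inference results (the exact convention used by cond on inference results may differ slightly):
using LinearAlgebra

V = Matrix(vcov(inference_results))  # estimated variance-covariance matrix
D = sqrt.(Diagonal(V))               # standard errors on the diagonal
C = inv(D) * V * inv(D)              # implied correlation matrix
cond(C)                              # its condition number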
3.4 Shrinkage
A non-linear mixed effects model includes random effects to represent individual-specific variability as well as sampling noise. In cases where the model is over-parameterized compared to the true model, or compared to the available data, we may observe what has been called shrinkage in the literature. The thing that is shrinking is the distribution of empirical Bayes estimates (η-shrinkage) or of the individual weighted residuals (ϵ-shrinkage). If there is insufficient data or the model is over-parameterized, it will be possible to find exact or almost exact fits for each individual. There is not enough data to move the parameters away from the prior on the random effects, and the individual distributions collapse into the population distribution.
3.4.1 Random Effect (eta) shrinkage
η-shrinkage is defined as \[ \eta_{sh} = 1 - \frac{\mathrm{sd}(\eta_i)}{\omega_\eta} \] such that if there is agreement between the prior standard deviation (\(\omega_\eta\)) and the standard deviation of the \(\eta_i\), we get no shrinkage, i.e. a shrinkage of 0. As \(\mathrm{sd}(\eta_i)\) shrinks to 0, the shrinkage gets closer and closer to 1, indicating an over-parameterization or a lack of individual data.
In Pumas, this number is simply calculated by invoking ηshrinkage
on the fitted model.
ηshrinkage(foce_fit)
(pk_η = [0.0072149466850324195, 0.1356596215313841, 0.4123026162888239],)
This returns a NamedTuple with one element per ~ in the @random block.
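The definition can also be checked by hand. A sketch, assuming empirical_bayes returns the per-subject η estimates as named tuples, that the diagonal of the estimated pk_Ω is accessible via its diag field, and glossing over the exact standard deviation convention used by ηshrinkage:
using Statistics

etas = empirical_bayes(foce_fit)            # per-subject empirical Bayes estimates
eta1 = [e.pk_η[1] for e in etas]            # first η across subjects
omega1 = sqrt(coef(foce_fit).pk_Ω.diag[1])  # estimated prior SD of η₁
1 - std(eta1) / omega1                      # ≈ ηshrinkage(foce_fit).pk_η[1]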
3.4.2 Error Model (sigma) shrinkage
Sometimes a number called ϵ-shrinkage is reported as well. The definition is analogous to the above: a standard deviation is divided by another standard deviation, and the ratio is subtracted from one to gauge how well they match. The complication for the derived variables is that we often have heteroscedasticity built into our models. This means that the standard deviation is a function of the model prediction, as in the proportional or combined Gaussian error models, so there is not a single standard deviation to compare to. For this reason, we compute the individual weighted residuals (IWRES) and calculate the standard deviation of those. Since IWRES is already weighted, we simply subtract its standard deviation from 1: \[ \epsilon_{sh} = 1 - \mathrm{sd}(\mathrm{IWRES}_i) \] The interpretation is as above. We can calculate this using ϵshrinkage in Pumas.
ϵshrinkage(foce_fit)
(conc = 0.13818444464688695,)
This returns a NamedTuple with one element per ~ that defines a Normal distribution in the @derived block.
3.5 Reporting Metrics
PumasUtilities
contains a convenience function called metrics_table
that can be used to report a number of metrics in a compact format.
metrics_table(foce_fit)
Row | Metric | Value |
---|---|---|
Any | Any | |
1 | Successful | true |
2 | Estimation Time | 8.105 |
3 | Subjects | 32 |
4 | Fixed Parameters | 0 |
5 | Optimized Parameters | 9 |
6 | conc Active Observations | 251 |
7 | conc Missing Observations | 47 |
8 | Total Active Observations | 251 |
9 | Total Missing Observations | 47 |
10 | Likelihood Approximation | Pumas.FOCE{Optim.NewtonTrustRegion{Float64}, Optim.Options{Float64, Nothing}} |
11 | LogLikelihood (LL) | -350.496 |
12 | -2LL | 700.993 |
13 | AIC | 718.993 |
14 | BIC | 750.722 |
15 | (η-shrinkage) pk_η₁ | 0.007 |
16 | (η-shrinkage) pk_η₂ | 0.136 |
17 | (η-shrinkage) pk_η₃ | 0.412 |
18 | (ϵ-shrinkage) conc | 0.138 |
This will return a DataFrame
with the metrics.
4 Concluding Remarks
This tutorial showcased a subset of the steps users want to perform after a model fit:
- Calculate uncertainty of the estimated parameters
- Calculate log-likelihood and information criteria for fit assessment and model comparison purposes
- Calculate shrinkage numbers to assess over-parameterization or excessive sparseness of data
Other tutorials will cover many more diagnostic steps to assess the quality of the fitted model.