using Pumas
using PharmaDatasets
using DataFramesMeta
using CairoMakie
using AlgebraOfGraphics
using PumasUtilities
using Markdown
Interpreting Results from Pumas Fits
1 Introduction
After following a typical workflow for fitting a Pumas model it is important to assess and interpret the results. This includes steps such as
- Looking at log-likelihood related quantities such as loglikelihood, aic, and bic to compare the fit to other existing fits.
- Assessing convergence of the maximum likelihood optimization.
- Calculating the condition number of the correlation matrix implied by the estimated variance-covariance matrix.
- Calculating shrinkage for random effects and the dependent variable.
The following sections will walk through these steps using a one-compartment PK model for Warfarin, focusing on the PK aspects only.
2 Model and Data
2.1 Model Definition
Below is the PK model, named warfarin_pk_model, defined in Pumas. This model contains:
- Fixed effects (population parameters): pop_CL, pop_Vc, pop_tabs, pop_lag
- Inter-individual variability (IIV) components: pk_Ω
- Residual error model parameters: σ_prop, σ_add
- Covariates for scaling: FSZCL and FSZV
- Differential equations describing the PK behavior in the compartments.
warfarin_pk_model = @model begin
    @metadata begin
        desc = "Warfarin 2-compartment PK model (PD removed)"
        timeu = u"hr"
    end

    @param begin
        # PK parameters
        """
        Clearance (L/hr)
        """
        pop_CL ∈ RealDomain(lower = 0.0, init = 0.134)
        """
        Central volume (L)
        """
        pop_Vc ∈ RealDomain(lower = 0.0, init = 8.11)
        """
        Absorption lag time (hr)
        """
        pop_tabs ∈ RealDomain(lower = 0.0, init = 0.523)
        """
        Lag time (hr)
        """
        pop_lag ∈ RealDomain(lower = 0.0, init = 0.1)
        # Inter-individual variability
        """
        - ΩCL: Clearance
        - ΩVc: Central volume
        - Ωtabs: Absorption lag time
        """
        pk_Ω ∈ PDiagDomain([0.01, 0.01, 0.01])
        # Residual variability
        """
        σ_prop: Proportional error
        """
        σ_prop ∈ RealDomain(lower = 0.0, init = 0.00752)
        """
        σ_add: Additive error
        """
        σ_add ∈ RealDomain(lower = 0.0, init = 0.0661)
    end

    @random begin
        pk_η ~ MvNormal(pk_Ω) # mean = 0, covariance = pk_Ω
    end

    @covariates begin
        """
        FSZCL: Clearance scaling factor
        """
        FSZCL
        """
        FSZV: Volume scaling factor
        """
        FSZV
    end

    @pre begin
        CL = FSZCL * pop_CL * exp(pk_η[1])
        Vc = FSZV * pop_Vc * exp(pk_η[2])
        tabs = pop_tabs * exp(pk_η[3])
        Ka = log(2) / tabs
    end

    @dosecontrol begin
        lags = (Depot = pop_lag,)
    end

    @vars begin
        cp := Central / Vc
    end

    @dynamics begin
        Depot' = -Ka * Depot
        Central' = Ka * Depot - (CL / Vc) * Central
    end

    @derived begin
        """
        Concentration (ng/mL)
        """
        conc ~ @. Normal(cp, sqrt((σ_prop * cp)^2 + σ_add^2))
    end
end
PumasModel
Parameters: pop_CL, pop_Vc, pop_tabs, pop_lag, pk_Ω, σ_prop, σ_add
Random effects: pk_η
Covariates: FSZCL, FSZV
Dynamical system variables: Depot, Central
Dynamical system type: Matrix exponential
Derived: conc
Observed: conc
2.2 Data Preparation
The Warfarin data used in this tutorial is pulled from PharmaDatasets
for demonstration purposes. Note how the code reshapes and prepares the data in “wide” format for reading into Pumas. Only the conc
column is treated as observations for the PK model.
warfarin_data = dataset("paganz2024/warfarin_long")
# Step 1: Fix Duplicate Time Points
# -------------------------------
# Some subjects have duplicate time points for DVID = 1
# For this dataset, the triple (ID, TIME, DVID) should define
# a row uniquely, but
nrow(warfarin_data)
nrow(unique(warfarin_data, ["ID", "TIME", "DVID"]))
# We can identify the problematic rows by grouping on the index variables
@chain warfarin_data begin
    @groupby :ID :TIME :DVID
    @transform :tmp = length(:ID)
    @rsubset :tmp > 1
end
# It is important to understand the reason for the duplicate values.
# Sometimes the duplication is caused by recording errors, sometimes
# it is a data processing error, e.g. when joining tables, or it can
# be genuine records, e.g. when samples have been analyzed in multiple
# labs. The next step depends on which of the causes are behind the
# duplications.
#
# In this case, we will assume that both values are informative and
# we will therefore just adjust the time stamp a bit for the second
# observation.
warfarin_data_processed = @chain warfarin_data begin
    @groupby :ID :TIME :DVID
    @transform :tmp = 1:length(:ID)
    @rtransform :TIME = :tmp == 1 ? :TIME : :TIME + 1e-6
    @select Not(:tmp)
end
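To confirm that the adjustment worked, we can repeat the row-count comparison from above; the two counts should now agree. This quick sanity check reuses only functions already shown:
# After the adjustment, (ID, TIME, DVID) should identify each row uniquely
nrow(warfarin_data_processed) == nrow(unique(warfarin_data_processed, ["ID", "TIME", "DVID"]))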
# Transform the data in a single chain of operations
warfarin_data_wide = @chain warfarin_data_processed begin
    @rsubset !contains(:ID, "#")
    @rtransform begin
        # Scaling factors
        :FSZV = :WEIGHT / 70 # volume scaling
        :FSZCL = (:WEIGHT / 70)^0.75 # clearance scaling (allometric)
        # Column name for the DV
        :DVNAME = "DV$(:DVID)"
        # Dosing indicator columns
        :CMT = ismissing(:AMOUNT) ? missing : 1
        :EVID = ismissing(:AMOUNT) ? 0 : 1
    end
    unstack(Not([:DVID, :DVNAME, :DV]), :DVNAME, :DV)
    rename!(:DV1 => :conc, :DV2 => :pca)
end
first(warfarin_data_wide, 10)
Row | ID | TIME | WEIGHT | AGE | SEX | AMOUNT | FSZV | FSZCL | CMT | EVID | DV0 | pca | conc |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
 | String3 | Float64 | Float64 | Int64 | Int64 | Float64? | Float64 | Float64 | Int64? | Int64 | Float64? | Float64? | Float64? |
1 | 1 | 0.0 | 66.7 | 50 | 1 | 100.0 | 0.952857 | 0.96443 | 1 | 1 | missing | missing | missing |
2 | 1 | 0.0 | 66.7 | 50 | 1 | missing | 0.952857 | 0.96443 | missing | 0 | missing | 100.0 | missing |
3 | 1 | 24.0 | 66.7 | 50 | 1 | missing | 0.952857 | 0.96443 | missing | 0 | missing | 49.0 | 9.2 |
4 | 1 | 36.0 | 66.7 | 50 | 1 | missing | 0.952857 | 0.96443 | missing | 0 | missing | 32.0 | 8.5 |
5 | 1 | 48.0 | 66.7 | 50 | 1 | missing | 0.952857 | 0.96443 | missing | 0 | missing | 26.0 | 6.4 |
6 | 1 | 72.0 | 66.7 | 50 | 1 | missing | 0.952857 | 0.96443 | missing | 0 | missing | 22.0 | 4.8 |
7 | 1 | 96.0 | 66.7 | 50 | 1 | missing | 0.952857 | 0.96443 | missing | 0 | missing | 28.0 | 3.1 |
8 | 1 | 120.0 | 66.7 | 50 | 1 | missing | 0.952857 | 0.96443 | missing | 0 | missing | 33.0 | 2.5 |
9 | 2 | 0.0 | 66.7 | 31 | 1 | 100.0 | 0.952857 | 0.96443 | 1 | 1 | missing | missing | missing |
10 | 2 | 0.0 | 66.7 | 31 | 1 | missing | 0.952857 | 0.96443 | missing | 0 | missing | 100.0 | missing |
2.3 Creating a Pumas Population
Below is the creation of a population object in Pumas using read_pumas. Only the conc data are treated as the observation variable:
pop = read_pumas(
    warfarin_data_wide;
    id = :ID,
    time = :TIME,
    amt = :AMOUNT,
    cmt = :CMT,
    evid = :EVID,
    covariates = [:SEX, :WEIGHT, :FSZV, :FSZCL],
    observations = [:conc],
)
Population
Subjects: 31
Covariates: SEX, WEIGHT, FSZV, FSZCL
Observations: conc
# A named tuple of parameter values
param_vals = (
    pop_CL = 0.134,
    pop_Vc = 8.11,
    pop_tabs = 0.523,
    pop_lag = 0.1,
    pk_Ω = Diagonal([0.01, 0.01, 0.01]),
    σ_prop = 0.00752,
    σ_add = 0.0661,
)

foce_fit = fit(warfarin_pk_model, pop, param_vals, FOCE())
[ Info: Checking the initial parameter values.
[ Info: The initial negative log likelihood and its gradient are finite. Check passed.
Iter Function value Gradient norm
0 1.162661e+04 1.466365e+04
* time: 0.025365114212036133
1 2.406460e+03 3.085514e+03
* time: 1.021204948425293
2 1.659276e+03 2.059323e+03
* time: 1.0773899555206299
3 8.461002e+02 8.448119e+02
* time: 1.1318740844726562
4 5.854346e+02 4.065946e+02
* time: 1.1869699954986572
5 4.690447e+02 1.615353e+02
* time: 1.2430610656738281
6 4.366893e+02 5.711013e+01
* time: 1.2941889762878418
7 4.311441e+02 3.977658e+01
* time: 1.345263957977295
8 4.306433e+02 3.487033e+01
* time: 1.396347999572754
9 4.304605e+02 3.296424e+01
* time: 1.447127103805542
10 4.299081e+02 2.832467e+01
* time: 1.5609760284423828
11 4.290418e+02 2.641483e+01
* time: 1.6108300685882568
12 4.279296e+02 3.246969e+01
* time: 1.6602370738983154
13 4.271390e+02 4.261715e+01
* time: 1.7093570232391357
14 4.265564e+02 4.580960e+01
* time: 1.758587121963501
15 4.259866e+02 3.939591e+01
* time: 1.8074491024017334
16 4.253361e+02 2.541924e+01
* time: 1.8563799858093262
17 4.246790e+02 1.065400e+01
* time: 1.905911922454834
18 4.243666e+02 1.008258e+01
* time: 1.9565610885620117
19 4.242953e+02 9.698523e+00
* time: 2.0095551013946533
20 4.242637e+02 9.555638e+00
* time: 2.062530040740967
21 4.241737e+02 9.311705e+00
* time: 2.116131067276001
22 4.239779e+02 1.039981e+01
* time: 2.169063091278076
23 4.235278e+02 2.112614e+01
* time: 2.2540299892425537
24 4.226852e+02 4.178215e+01
* time: 2.3052849769592285
25 4.214158e+02 6.509012e+01
* time: 2.357930898666382
26 4.199242e+02 8.198864e+01
* time: 2.4097890853881836
27 4.180994e+02 8.928912e+01
* time: 2.462177038192749
28 4.147253e+02 1.054369e+02
* time: 2.516242027282715
29 4.034609e+02 9.933613e+01
* time: 2.5748190879821777
30 3.980708e+02 8.697802e+01
* time: 2.661442995071411
31 3.955519e+02 8.275864e+01
* time: 2.738002061843872
32 3.873347e+02 6.262355e+01
* time: 2.7946860790252686
33 3.832793e+02 7.480027e+01
* time: 2.851979970932007
34 3.752425e+02 6.106840e+01
* time: 2.9374709129333496
35 3.747485e+02 5.955920e+01
* time: 3.0083720684051514
36 3.733720e+02 5.190330e+01
* time: 3.0635058879852295
37 3.716220e+02 4.957865e+01
* time: 3.118527889251709
38 3.676007e+02 3.354968e+01
* time: 3.1739320755004883
39 3.652893e+02 2.435479e+01
* time: 3.2284951210021973
40 3.633350e+02 2.229678e+01
* time: 3.285543918609619
41 3.623167e+02 1.560549e+01
* time: 3.346513032913208
42 3.618529e+02 1.072487e+01
* time: 3.40645694732666
43 3.617373e+02 1.102560e+01
* time: 3.46566104888916
44 3.616625e+02 1.117617e+01
* time: 3.526158094406128
45 3.615728e+02 1.129104e+01
* time: 3.5859251022338867
46 3.614525e+02 1.142667e+01
* time: 3.664763927459717
47 3.612791e+02 1.160763e+01
* time: 3.7193970680236816
48 3.609742e+02 1.204366e+01
* time: 3.773339033126831
49 3.603227e+02 1.748493e+01
* time: 3.8282411098480225
50 3.588011e+02 2.344365e+01
* time: 3.883455991744995
51 3.553125e+02 4.035077e+01
* time: 3.9389710426330566
52 3.490364e+02 6.686821e+01
* time: 3.9982779026031494
53 3.444280e+02 3.741419e+01
* time: 4.060928106307983
54 3.426003e+02 3.103704e+01
* time: 4.1237170696258545
55 3.425553e+02 4.206850e+01
* time: 4.187546968460083
56 3.406514e+02 1.515976e+01
* time: 4.249005079269409
57 3.404596e+02 4.108500e+01
* time: 4.311809062957764
58 3.402361e+02 2.379647e+01
* time: 4.3919970989227295
59 3.400932e+02 2.407116e+01
* time: 4.445549011230469
60 3.396735e+02 2.432991e+01
* time: 4.499437093734741
61 3.389708e+02 2.664139e+01
* time: 4.553574085235596
62 3.366166e+02 3.272679e+01
* time: 4.608957052230835
63 3.311055e+02 6.090289e+01
* time: 4.667623996734619
64 3.282486e+02 4.297799e+01
* time: 4.727225065231323
65 3.252658e+02 2.043436e+01
* time: 4.787807941436768
66 3.228006e+02 1.400875e+01
* time: 4.849309921264648
67 3.206502e+02 1.022665e+01
* time: 4.909164905548096
68 3.203177e+02 5.080303e+00
* time: 4.96812891960144
69 3.200898e+02 3.214824e+00
* time: 5.051217079162598
70 3.200164e+02 5.239993e+00
* time: 5.104501962661743
71 3.199774e+02 3.229390e+00
* time: 5.157756090164185
72 3.199714e+02 1.117530e+00
* time: 5.210805892944336
73 3.199711e+02 7.270447e-01
* time: 5.2631330490112305
74 3.199710e+02 7.165060e-01
* time: 5.312026023864746
75 3.199707e+02 7.049190e-01
* time: 5.36190390586853
76 3.199702e+02 6.943763e-01
* time: 5.414783000946045
77 3.199687e+02 1.317285e+00
* time: 5.470673084259033
78 3.199653e+02 2.280205e+00
* time: 5.527374982833862
79 3.199582e+02 3.290792e+00
* time: 5.583611011505127
80 3.199475e+02 3.541242e+00
* time: 5.6400721073150635
81 3.199386e+02 2.314603e+00
* time: 5.697320938110352
82 3.199355e+02 6.986412e-01
* time: 5.777375936508179
83 3.199351e+02 9.990660e-02
* time: 5.8290510177612305
84 3.199351e+02 9.912811e-02
* time: 5.878021955490112
85 3.199351e+02 9.910770e-02
* time: 5.925197124481201
FittedPumasModel
Successful minimization: false
Likelihood approximation: FOCE
Likelihood Optimizer: BFGS
Dynamical system type: Matrix exponential
Log-likelihood value: -319.93509
Number of subjects: 31
Number of parameters: Fixed Optimized
0 9
Observation records: Active Missing
conc: 239 47
Total: 239 47
-----------------------
Estimate
-----------------------
pop_CL 0.13065
pop_Vc 8.0708
pop_tabs 0.37156
pop_lag 0.9261
pk_Ω₁,₁ 0.051091
pk_Ω₂,₂ 0.018918
pk_Ω₃,₃ 1.1401
σ_prop 0.090635
σ_add 0.3336
-----------------------
2.4 Evaluating the termination of the fit
Fitting non-linear mixed effects models is not always a simple task. For the fit to run reliably, the model has to be appropriate for the data, the data must be accurately sampled and recorded, no information should be truncated, and so on. Even with perfect data collection and modeling there can still be numerical challenges in evaluating various expressions, calculating empirical Bayes estimates in an inner loop, handling vanishing variances, local optima, and more.
In Pumas we use Optim.jl (Mogensen and Riseth 2018) to perform the minimization in any FO, FOCE, or LaplaceI fit, with or without MAP for maximum a posteriori fitting. The returned FittedPumasModel contains the optimization results structure from Optim.jl in the optim field. This can be useful when assessing convergence.
Convergence in non-linear optimization is not always a simple thing to assess:
- We can look at the gradient element sizes, and we often look at the infinity norm of the gradient to this end. The infinity norm of a vector picks out the largest absolute value of its elements. By default we require this to be less than 1e-3 to declare convergence, but some problems are more or less numerically stable, so the achievable value may be larger or smaller depending on the circumstances.
- Gradient Elements
- The gradient is a vector that tells us which direction to move to increase a function’s value the fastest
- Each element in this vector corresponds to how much the function changes when we adjust one parameter
- “Gradient element sizes” refers to the magnitude (absolute value) of each of these directional changes
To illustrate with a simple example, consider a model with 3 parameters (like CL, V, and Ka); a gradient might look like:
gradient = [0.0001, -0.002, 0.0005]
3-element Vector{Float64}:
  0.0001
 -0.002
  0.0005
Here:
- 0.0001 tells us how much the function would change if we adjust the first parameter (CL)
- -0.002 for the second parameter (V)
- 0.0005 for the third parameter (Ka)
- Infinity Norm of the Gradient
- The infinity norm is simply the largest absolute value among all gradient elements
- In our example above:
|0.0001| = 0.0001
|-0.002| = 0.002 # <- This is the largest!
|0.0005| = 0.0005
Therefore, infinity norm = 0.002
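The same number can be computed directly in Julia; a minimal sketch using the norm function from the standard LinearAlgebra library (passing Inf selects the infinity norm):
using LinearAlgebra
norm(gradient, Inf) # 0.002, the largest absolute element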
Why This Matters for Convergence:
- When optimizing a model (finding the best parameters), we want all gradient elements to be close to zero
- A large gradient element means we can still improve our solution by moving in that direction
- A small gradient (all elements close to zero) suggests we’re at or near the optimal solution
- By convention in Pumas, if the infinity norm is less than 0.001 (1e-3), we consider the optimization “converged”
Think of it like hiking to find the highest point:
- The gradient tells you which direction to walk
- The size of each gradient element tells you how steep it is in each direction
- When you’re at the top, it should be flat in all directions (all gradient elements close to zero)
- The infinity norm tells you about the steepest direction you could still go
- If the steepest direction is very gentle (< 0.001), you’re probably at the peak!
- We do not stop according to small changes in the parameters or objective function value by default. This is because such small changes are typically not a direct sign of convergence but rather a sign that progress has stalled. That said, many researchers set a relative change tolerance on the parameters in the same way as is done in NONMEM, because you will often see many final iterations with next to no improvement. This can be done by setting optim_options = (x_reltol = 1e-5,) in fit to match NONMEM, for example.
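For example, a call matching the NONMEM-style criterion might look like the following sketch (nonmem_like_fit is just an illustrative name); note the trailing comma, which is required to make (x_reltol = 1e-5,) a NamedTuple rather than a plain assignment:
nonmem_like_fit = fit(
    warfarin_pk_model,
    pop,
    param_vals,
    FOCE();
    optim_options = (x_reltol = 1e-5,),
)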
Let us consider a number of different cases and how you can find and interpret the information in the optim field. The output looks something like the following.
julia> foce_fit.optim
L1 * Status: success
L2
L3 * Candidate solution
L4 Final objective value: 3.199344e+02
L5
L6 * Found with
L7 Algorithm: BFGS
L8
L9 * Convergence measures
L10 |x - x'| = 4.36e-04 ≰ 0.0e+00
L11 |x - x'|/|x'| = 1.10e-04 ≰ 0.0e+00
L12 |f(x) - f(x')| = 3.57e-06 ≰ 0.0e+00
L13 |f(x) - f(x')|/|f(x')| = 1.12e-08 ≰ 0.0e+00
L14 |g(x)| = 5.62e-04 ≤ 1.0e-03
L15
L16 * Work counters
L17 Seconds run: 1 (vs limit Inf)
L18 Iterations: 95
L19 f(x) calls: 101
L20 ∇f(x) calls: 97
We will use the line numbers at the left of this output to refer to the relevant parts below. While interpreting the output it may be worth remembering that the optimization is cast as the problem of minimizing the negative log of the marginal likelihood.
2.4.1 Case 1: gradient convergence
Since a (local) minimum of a smooth objective function satisfies that the partial derivatives are zero, we often look at the norm of the gradient to declare convergence. The default value for the tolerance, set via the g_tol option in optim_options in Pumas, is 1e-3. The final state of this measure is found in L14 of the output above, where it is seen that the tolerance was met. This tolerance is set to match what is often achievable in well-specified models, but smaller is always better. To derive a theoretically well-reasoned value we would need to know the Hessian at the optimum; this is not possible to know before running a fit and very expensive to calculate exactly after a fit. Several factors affect the level that can be reached in practice, including the accuracy of the solution of the embedded dynamical system and the curvature of the objective function surface. If the gradient norm is lower than 1e-3 we will in general accept the fit as having converged, but "near convergence" is often accepted as well if there are no other indications of issues with the model.
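The termination flags can also be queried programmatically from the optim field. A minimal sketch, assuming the standard Optim.jl accessor functions are reachable through the Pumas.Optim module (the module appears later in this tutorial; the accessors are Optim.jl API, not Pumas-specific):
res = foce_fit.optim
Pumas.Optim.converged(res)   # overall status, cf. L1
Pumas.Optim.g_converged(res) # whether |g(x)| met g_tol, cf. L14
Pumas.Optim.iterations(res)  # iteration count, cf. L18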
2.4.2 Case 2: no change in parameters
If x_abstol or x_reltol was not manually set by the user, we only terminate the optimization procedure if no change in the parameters could be made at all from one iteration to the next. The final states of these termination tests are found in L10 and L11, respectively. Essentially, this indicates that the procedure that scales the search direction to find the next value of the parameters failed. If this occurs with a relatively small gradient norm we may accept the solution, though the termination reason cannot be used to conclude convergence in isolation. The reasons why no step led to an improvement of the objective function value can be many. Some reasons include:
- The objective function is not smooth due to numerical noise from calculations in the objective function including model solutions.
- The inverse Hessian approximation in the BFGS algorithm is far from the true inverse Hessian of the objective function. Restarting the fit may help.
As mentioned above, NONMEM sets x_reltol=1e-5
by default but it is 0 in Pumas.
2.4.3 Case 3: small change in parameters
As mentioned above, Pumas by default only terminates on parameter changes when the change from one iteration to the next is exactly zero. If the relative change tolerance x_reltol is set to a number such as 1e-5 to match other software such as NONMEM, then the user may well accept the solution, since they deliberately used this criterion. In general, we advise using this setting with caution as it can terminate with a quite large gradient norm.
Again, this information is seen in L10 and L11, but the limit after the inequality will change to the number provided.
2.4.4 Case 4: no change in objective function
This case is related to Case 2. We do not allow for termination due to the absence of improvement in the objective function, so in general this should be seen as a failure. The rest of the comments about reasons for stopping, acceptance based on gradient norms, etc. apply here as well.
The information about these termination tests can be found in L12 and L13.
2.4.5 Case 5: small change in objective function
Just as Case 4 is related to Case 2, this case is related to Case 3. A very small improvement is often not a valid reason to accept the returned solution as the maximum likelihood estimates. If the user has good reason to actively set one of the options f_abstol or f_reltol, the result may be accepted.
As in Case 4, the information can be found in L12 and L13, but the right-hand side of the inequalities is updated as in Case 3.
2.4.6 Case 6: “maximum number of iterations reached”
The pre-specified limit on the number of iterations was reached. This can be increased in optim_options by setting the iterations keyword. It defaults to 1000, which should generally be plenty, so termination according to this criterion is not a sign of convergence. The output may be accepted if the solution seems to cycle between parameter values with a gradient norm close to the prescribed tolerance.
The number of iterations can be found in L18.
2.4.7 Case 7: line search failed
This case is related to Case 2, 3, 4, and 5. Given the search direction it was not possible to find a step that sufficiently decreases the objective function value. The advice found in Case 2 applies here as well.
In the output this will be indicated in L1.
2.4.8 Case 8: “user provided callback termination”
This is an advanced use-case where a user-provided callback function returned the value true to indicate that the optimization should terminate. This can only happen if the user supplied a custom callback function as part of optim_options.
In the output this will be indicated in L1.
2.4.9 Case 9: “time limit reached”
The optimization routine allows for a soft time limit that is checked after every iteration. The default allows for infinite run-time, but if the time_limit
option in optim_options
is set this termination reason may occur.
In the output this will be indicated in L17.
2.4.10 Case 10: "objective evaluation limit reached"
The optimization routine allows for a limit to be placed on the total number of objective function evaluations through the f_calls_limit option in optim_options. It is not set by default in Pumas. The interpretation is similar to Case 6.
In the output this will be indicated in L19.
2.4.11 Case 11: "gradient evaluation limit reached"
The optimization routine allows for a limit to be placed on the total number of gradient evaluations through the g_calls_limit option in optim_options. It is not set by default in Pumas. The interpretation is similar to Case 6.
In the output this will be indicated in L20.
2.4.12 Case 12: "hessian evaluation limit reached"
The optimization routine allows for a limit to be placed on the total number of Hessian evaluations through the h_calls_limit option in optim_options. It is not set by default in Pumas. The interpretation is similar to Case 6.
In the output this would be placed after L20 if it were present, but it is not, because the BFGS algorithm only maintains an approximation to the inverse Hessian; it does not actually compute and use the real Hessian.
2.4.13 Case 13: “objective function increased”
This can happen if the allow_f_increases option is set to false in optim_options. The default is true. The causes are the same as in Case 2.
This would be indicated in L1 if it failed.
2.5 Optimization didn’t succeed - then what?
If the optimization did not succeed according to the description above, it is important to think about what can be done to improve the situation. If the model or the initial parameters are grossly wrong, there may not be much to do about the failed optimization. It is often a good idea to reduce the complexity of the model by removing random effects that cannot be supported by the data or by fixing parameters that may be known with high certainty from the relevant literature.
Before changing the model too much it may be worth trying a few different things.
2.5.1 Try different starting parameters
Given the complexity and high number of parameters in many NLME models, it is always important to initialize the fit from several sets of starting parameters instead of only one parameter NamedTuple. It is quite likely that the optimization ends up in a local minimum that is not the global minimum, so we need to check the robustness of our maximum likelihood estimate candidates. Performing this robustness step is explained in more detail in a different tutorial, but just to show the effect we can try a different initial fixed effect guess for the volume of distribution.
foce_fit2 = fit(
    warfarin_pk_model,
    pop,
    merge(param_vals, (; pop_Vc = 12.0)),
    FOCE();
    optim_options = (show_trace = false,),
)
[ Info: Checking the initial parameter values.
[ Info: The initial negative log likelihood and its gradient are finite. Check passed.
FittedPumasModel
Successful minimization: true
Likelihood approximation: FOCE
Likelihood Optimizer: BFGS
Dynamical system type: Matrix exponential
Log-likelihood value: -319.93473
Number of subjects: 31
Number of parameters: Fixed Optimized
0 9
Observation records: Active Missing
conc: 239 47
Total: 239 47
-----------------------
Estimate
-----------------------
pop_CL 0.13064
pop_Vc 8.0709
pop_tabs 0.37094
pop_lag 0.92625
pk_Ω₁,₁ 0.05099
pk_Ω₂,₂ 0.019014
pk_Ω₃,₃ 1.1482
σ_prop 0.090596
σ_add 0.3337
-----------------------
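To probe robustness more systematically, one can fit from several starting values and compare the resulting log-likelihoods. A minimal sketch; the particular pop_Vc values are arbitrary choices for illustration:
fits = [
    fit(
        warfarin_pk_model,
        pop,
        merge(param_vals, (; pop_Vc = v)),
        FOCE();
        optim_options = (show_trace = false,),
    ) for v in (6.0, 10.0, 14.0)
]
loglikelihood.(fits) # agreement across starts supports the same optimum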
2.5.2 Resetting the optimization procedure
Whenever a model fit concludes, we have ended up with a final set of fixed effect estimates (population level parameters) as well as a set of empirical Bayes estimates, and also a final inverse Hessian approximation. There are ways to restart the optimization at the final fixed effect parameters, to reset the random effects to the central value of their prior, and to reset the optimization procedure itself.
To reset the optimization procedure while reusing the final fixed effects and empirical Bayes estimates, use the following.
foce_fit3 = fit(foce_fit2; reset_optim = true)
[ Info: Checking the initial parameter values.
[ Info: The initial negative log likelihood and its gradient are finite. Check passed.
FittedPumasModel
Successful minimization: true
Likelihood approximation: FOCE
Likelihood Optimizer: BFGS
Dynamical system type: Matrix exponential
Log-likelihood value: -319.93444
Number of subjects: 31
Number of parameters: Fixed Optimized
0 9
Observation records: Active Missing
conc: 239 47
Total: 239 47
-----------------------
Estimate
-----------------------
pop_CL 0.13064
pop_Vc 8.0712
pop_tabs 0.37132
pop_lag 0.92611
pk_Ω₁,₁ 0.050905
pk_Ω₂,₂ 0.01913
pk_Ω₃,₃ 1.1399
σ_prop 0.090621
σ_add 0.33356
-----------------------
2.5.3 Resetting the optimization procedure and the empirical Bayes estimates
To restart both, we can simply invoke fit as normal with coef(foce_fit3) as the initial fixed effects.
foce_fit4 = fit(
    warfarin_pk_model,
    pop,
    coef(foce_fit3),
    FOCE();
    optim_options = (show_trace = false,),
)
[ Info: Checking the initial parameter values.
[ Info: The initial negative log likelihood and its gradient are finite. Check passed.
FittedPumasModel
Successful minimization: true
Likelihood approximation: FOCE
Likelihood Optimizer: BFGS
Dynamical system type: Matrix exponential
Log-likelihood value: -319.93444
Number of subjects: 31
Number of parameters: Fixed Optimized
0 9
Observation records: Active Missing
conc: 239 47
Total: 239 47
-----------------------
Estimate
-----------------------
pop_CL 0.13064
pop_Vc 8.0712
pop_tabs 0.37132
pop_lag 0.92611
pk_Ω₁,₁ 0.050905
pk_Ω₂,₂ 0.01913
pk_Ω₃,₃ 1.1399
σ_prop 0.090621
σ_add 0.33356
-----------------------
Restarting the optimization is often a good way to avoid problems with gradient convergence.
2.5.4 A computationally expensive alternative
If gradient convergence below a certain threshold is very important it can often be achieved with GradientDescent
. Gradient Descent has very good asymptotic (for iterations going to infinity) properties and avoids potential numerical issues with the inverse Hessian approximation in situations with numerical noise in the objective that could for example come from numerical integration of the dynamical system. However, since it is not using (an approximation to the) second order information, it may also take quite large steps that lead to extreme values of the parameters causing warnings to appear on screen from the ODE model solvers and it may well use a lot of iterations to reach the prescribed tolerance. Consider the following.
foce_fit5 = fit(
    warfarin_pk_model,
    pop,
    coef(foce_fit3),
    FOCE();
    optim_alg = Pumas.Optim.GradientDescent(),
    optim_options = (g_tol = 1e-5, show_trace = false),
)
foce_fit5.optim
[ Info: Checking the initial parameter values.
[ Info: The initial negative log likelihood and its gradient are finite. Check passed.
* Status: success
* Candidate solution
Final objective value: 3.199344e+02
* Found with
Algorithm: Gradient Descent
* Convergence measures
|x - x'| = 2.73e-08 ≰ 0.0e+00
|x - x'|/|x'| = 6.90e-09 ≰ 0.0e+00
|f(x) - f(x')| = 4.65e-06 ≰ 0.0e+00
|f(x) - f(x')|/|f(x')| = 1.45e-08 ≰ 0.0e+00
|g(x)| = 8.79e-06 ≤ 1.0e-05
* Work counters
Seconds run: 50 (vs limit Inf)
Iterations: 395
f(x) calls: 989
∇f(x) calls: 989
We were indeed able to reach a very small gradient norm, but notice the many iterations used! This may simply not be feasible for very large data sets or computationally expensive models.
3 Calculating Some Key Diagnostics
3.1 Model Fit and Comparison
When we specify FOCE(), LaplaceI(), or FO() we are doing maximum likelihood estimation. This also means that the most natural number for gauging and comparing model fit is the log-likelihood. If we want to extract the log-likelihood at the maximum likelihood estimates, we can simply use the loglikelihood function, which has a method for a FittedPumasModel.
loglikelihood(foce_fit5)
-319.9344482247824
We see that this is the same number as in the printed output, although it was rounded there. We can then compare this number to the log-likelihood of another fitted model, and if the models are nested we can even do a log-likelihood ratio test with lrtest. Please consult the docstring (?lrtest) for how to use this function.
Other, more general approaches to model comparison involve information criteria such as aic, aicc, bic, and more. These all have methods for FittedPumasModel, so to calculate them simply use them as follows:
aic_fit, bic_fit = aic(foce_fit5), bic(foce_fit5)
(657.8688964495648, 689.1570684169484)
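As a quick sanity check of what these numbers mean, we can reproduce them from their definitions, taking k as the number of optimized parameters and n as the number of active observations (the convention that matches the output above):
k, n = 9, 239
ll = loglikelihood(foce_fit5)
2k - 2ll         # AIC ≈ 657.87
k * log(n) - 2ll # BIC ≈ 689.16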
3.2 Computing Parameter Precision with infer
The infer
function in Pumas estimates the uncertainty (precision) of parameter estimates. Depending on the chosen method, infer
can provide standard errors, confidence intervals, correlation matrices, and condition numbers.
An example usage:
inference_results = infer(foce_fit4; level = 0.95)
[ Info: Calculating: variance-covariance matrix.
[ Info: Done.
Asymptotic inference results using sandwich estimator
Successful minimization: true
Likelihood approximation: FOCE
Likelihood Optimizer: BFGS
Dynamical system type: Matrix exponential
Log-likelihood value: -319.93444
Number of subjects: 31
Number of parameters: Fixed Optimized
0 9
Observation records: Active Missing
conc: 239 47
Total: 239 47
---------------------------------------------------------------------
Estimate SE 95.0% C.I.
---------------------------------------------------------------------
pop_CL 0.13064 0.0054839 [ 0.1199 ; 0.14139 ]
pop_Vc 8.0712 0.23799 [ 7.6048 ; 8.5377 ]
pop_tabs 0.37132 0.17068 [ 0.036798 ; 0.70584 ]
pop_lag 0.92611 0.04848 [ 0.83109 ; 1.0211 ]
pk_Ω₁,₁ 0.050905 0.017225 [ 0.017143 ; 0.084666]
pk_Ω₂,₂ 0.01913 0.0056755 [ 0.0080065; 0.030254]
pk_Ω₃,₃ 1.1399 0.70819 [-0.24809 ; 2.528 ]
σ_prop 0.090621 0.014443 [ 0.062313 ; 0.11893 ]
σ_add 0.33356 0.08694 [ 0.16316 ; 0.50396 ]
---------------------------------------------------------------------
This augments the printed table from above with the uncertainty information. A useful function to know is coefficients_table from PumasUtilities. It takes the fitted model and inference objects as inputs and returns a DataFrame with model parameter names, descriptions from the model annotations, estimates, standard errors, relative standard errors, and confidence interval limits.
coefficients_table(foce_fit5, inference_results)
Row | Parameter | Description | Estimate | Standard Error | Relative_Se | CI Lower | CI Upper |
---|---|---|---|---|---|---|---|
 | String | Abstract… | Float64 | Float64 | Float64 | Float64 | Float64 |
1 | pop_CL | Clearance (L/hr) | 0.131 | 0.005 | 0.042 | 0.12 | 0.141 |
2 | pop_Vc | Central volume (L) | 8.071 | 0.238 | 0.029 | 7.605 | 8.538 |
3 | pop_tabs | Absorption lag time (hr) | 0.371 | 0.171 | 0.46 | 0.037 | 0.706 |
4 | pop_lag | Lag time (hr) | 0.926 | 0.048 | 0.052 | 0.831 | 1.021 |
5 | pk_Ω₁,₁ | ΩCL: Clearance | 0.051 | 0.017 | 0.338 | 0.017 | 0.085 |
6 | pk_Ω₂,₂ | ΩVc: Central volume | 0.019 | 0.006 | 0.297 | 0.008 | 0.03 |
7 | pk_Ω₃,₃ | Ωtabs: Absorption lag time | 1.14 | 0.708 | 0.621 | -0.248 | 2.528 |
8 | σ_prop | σ_prop: Proportional error | 0.091 | 0.014 | 0.159 | 0.062 | 0.119 |
9 | σ_add | σ_add: Additive error | 0.334 | 0.087 | 0.261 | 0.163 | 0.504 |
In another tutorial we will look closer at different ways of calculating uncertainty for the maximum likelihood estimates.
3.3 Calculate the Condition Number
Some researchers like to include the condition number of the correlation matrix implied by the variance-covariance matrix that was estimated by infer. It is calculated by applying the cond function to the output of infer above.
cond(inference_results)
77.88261093269588
The researcher may set a limit as to how large this number can be. Models with large condition numbers are sometimes labeled “unstable” and models with small condition numbers may be labeled “stable”. This categorization is contested and discussed among practitioners and the condition number should be reported and interpreted with care outside of the linear regression context where the interpretation was originally proposed.
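To make the definition concrete, the number can be reproduced from first principles. A sketch, assuming vcov returns the estimated variance-covariance matrix for the inference result:
using LinearAlgebra
V = vcov(inference_results)
s = sqrt.(diag(V))
C = V ./ (s * s') # the implied correlation matrix
cond(C)           # ≈ 77.88, matching cond(inference_results)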
3.4 Shrinkage
A non-linear mixed effects model includes random effects to represent individual specific variability as well as sampling noise. In cases where the model is over-parameterized compared to the true model or to the available data, we may observe what has been called shrinkage in the literature. The thing that is shrinking is the distribution of the empirical Bayes estimates (η-shrinkage) or of the individual weighted residuals (ϵ-shrinkage). If there is insufficient data or the model is over-parameterized, it will be possible to find exact or almost exact fits for each individual. There is not enough data to move the parameters away from the prior on the random effects, and the individual distributions collapse into the population distribution.
3.4.1 Random Effect (eta) shrinkage
η-shrinkage is defined as \[ \eta_{sh} = 1 - \frac{\mathrm{sd}(\eta_i)}{\omega_\eta} \] such that if the prior standard deviation (\(\omega_\eta\)) and the standard deviation of the \(\eta_i\) agree, we get no shrinkage, i.e. a shrinkage of 0. As \(\mathrm{sd}(\eta_i)\) shrinks towards 0, the shrinkage gets closer and closer to 1, indicating an over-parameterization or a lack of individual data.
In Pumas, this number is simply calculated by invoking ηshrinkage
on the fitted model.
ηshrinkage(foce_fit)
(pk_η = [0.009845537426213302, 0.12463273678558184, 0.45335870080737173],)
This returns a NamedTuple with one element per ~ in the @random block.
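For intuition, the definition can be reproduced by hand. A minimal sketch, assuming empirical_bayes(foce_fit) returns one NamedTuple of EBEs per subject and that diag works on the fitted pk_Ω (assumptions about return types, not guaranteed API):
using LinearAlgebra, Statistics
# Stack the per-subject EBE vectors into a 3 × n_subjects matrix
ebe_mat = reduce(hcat, [ebe.pk_η for ebe in empirical_bayes(foce_fit)])
ω = sqrt.(diag(coef(foce_fit).pk_Ω))  # prior SDs from the fitted Ω diagonal
1 .- vec(std(ebe_mat; dims = 2)) ./ ω # compare with ηshrinkage(foce_fit)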
3.4.2 Error Model (sigma) shrinkage
Sometimes a number called ϵ-shrinkage is reported as well. The definition is analogous to the above: it divides a standard deviation by another standard deviation and subtracts the ratio from one to gauge how well they match. The problem for the derived variables is that we often have heteroscedasticity built into our models. This means that the standard deviation is a function of the model prediction, such as in the proportional or combined Gaussian error models, so there is not a single standard deviation to compare to. For this reason, we compute the individual weighted residuals (IWRES) and calculate the standard deviation of those. Since IWRES is already weighted, we simply subtract its standard deviation from 1: \[ \epsilon_{sh} = 1 - \mathrm{sd}(\mathrm{IWRES}_i) \] The interpretation is as above. We can calculate this using the ϵshrinkage function in Pumas.
ϵshrinkage(foce_fit)
(conc = 0.13757846337857282,)
This returns a NamedTuple with one element per ~ that defines a Normal distribution in the @derived block.
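The corresponding by-hand computation needs the IWRES values. A hedged sketch, assuming the inspect output, converted to a DataFrame, exposes them in a column named conc_iwres (the column name is an assumption):
using DataFrames, Statistics
df = DataFrame(inspect(foce_fit))
1 - std(skipmissing(df.conc_iwres)) # compare with ϵshrinkage(foce_fit)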
3.5 Reporting Metrics
PumasUtilities
contains a convenience function called metrics_table
that can be used to report a number of metrics in a compact format.
metrics_table(foce_fit)
Row | Metric | Value |
---|---|---|
 | String | Any |
1 | Successful | false |
2 | Estimation Time | 5.925 |
3 | Subjects | 31 |
4 | Fixed Parameters | 0 |
5 | Optimized Parameters | 9 |
6 | conc Active Observations | 239 |
7 | conc Missing Observations | 47 |
8 | Total Active Observations | 239 |
9 | Total Missing Observations | 47 |
10 | Likelihood Approximation | Pumas.FOCE{Optim.NewtonTrustRegion{Float64}, Optim.Options{Float64, Nothing}} |
11 | LogLikelihood (LL) | -319.935 |
12 | -2LL | 639.87 |
13 | AIC | 657.87 |
14 | BIC | 689.158 |
15 | (η-shrinkage) pk_η₁ | 0.01 |
16 | (η-shrinkage) pk_η₂ | 0.125 |
17 | (η-shrinkage) pk_η₃ | 0.453 |
18 | (ϵ-shrinkage) conc | 0.138 |
This returns a DataFrame with the metrics.
4 Concluding Remarks
This tutorial showcased a subset of the steps users want to perform after a model fit:
- Calculate uncertainty of the estimated parameters
- Calculate log-likelihood and information criteria for fit assessment and model comparison purposes
- Calculate shrinkage numbers to assess over-parametrization or excessive sparseness of data
Other tutorials will cover many more diagnostic steps that can be used to assess the quality of the fitted model.