using Pumas
using PharmaDatasets
using DataFramesMeta
using PumasUtilities
using StatsBase # for cov2cor
Calculating Parameter Uncertainty
1 Introduction
A typical workflow for fitting a Pumas model and deriving parameter precision involves:
- Preparing the data and the model.
- Checking model-data compatibility.
- Obtaining initial parameter estimates.
- Fitting the model via a chosen estimation method.
- Interpreting the fit results.
- Computing parameter precision.
- (Optionally) proceeding with more advanced techniques like bootstrapping or SIR for robust uncertainty quantification.
The following sections will walk through these steps using a two-compartment PK model for Warfarin, focusing on the PK aspects only. Exploratory data analysis (EDA), although extremely important, is out of scope here. Readers interested in EDA are encouraged to consult other tutorials.
2 Model and Data
2.1 Model Definition
Below is the PK model, named warfarin_pk_model, defined in Pumas. This model contains:
- Fixed effects (population parameters): pop_CL, pop_Vc, pop_Q, pop_Vp, pop_tabs, pop_lag
- Inter-individual variability (IIV) components: pk_Ω, lag_ω
- Residual error model parameters: σ_prop, σ_add
- Covariates for scaling: FSZCL and FSZV
- Differential equations describing the PK behavior in the compartments.
warfarin_pk_model = @model begin
    @metadata begin
        desc = "Warfarin 2-compartment PK model (PD removed)"
        timeu = u"hr"
    end

    @param begin
        # PK parameters
        """
        Clearance (L/hr)
        """
        pop_CL ∈ RealDomain(lower = 0.0, upper = 13.4, init = 0.134)
        """
        Central volume (L)
        """
        pop_Vc ∈ RealDomain(lower = 0.0, upper = 81.1, init = 8.11)
        """
        Inter-compartmental clearance (L/hr)
        """
        pop_Q ∈ RealDomain(lower = 0.0, upper = 5.0, init = 0.5)
        """
        Peripheral volume (L)
        """
        pop_Vp ∈ RealDomain(lower = 0.0, upper = 200.0, init = 20.0)
        """
        Absorption lag time (hr)
        """
        pop_tabs ∈ RealDomain(lower = 0.0, upper = 5.23, init = 0.523)
        """
        Lag time (hr)
        """
        pop_lag ∈ RealDomain(lower = 0.0, upper = 5.0, init = 0.1)

        # Inter-individual variability
        """
        - ΩCL: Clearance
        """
        pk_Ω ∈ PDiagDomain([0.01])
        """
        lag_ω: Lag time
        """
        lag_ω ∈ RealDomain(lower = 0.0, upper = 2.0, init = 0.1)

        # Residual variability
        """
        σ_prop: Proportional error
        """
        σ_prop ∈ RealDomain(lower = 0.0, init = 0.00752)
        """
        σ_add: Additive error
        """
        σ_add ∈ RealDomain(lower = 0.0, init = 0.0661)
    end

    @random begin
        pk_η ~ MvNormal(pk_Ω)       # mean = 0, covariance = pk_Ω
        lag_η ~ Normal(0.0, lag_ω)
    end

    @covariates begin
        """
        FSZCL: Clearance scaling factor
        """
        FSZCL
        """
        FSZV: Volume scaling factor
        """
        FSZV
    end

    @pre begin
        CL = FSZCL * pop_CL
        Vc = FSZV * pop_Vc * exp(pk_η[1])
        Q = FSZCL * pop_Q
        Vp = FSZV * pop_Vp

        tabs = pop_tabs
        Ka = log(2) / tabs
    end

    @dosecontrol begin
        lags = (Depot = pop_lag * exp(lag_η),)
    end

    @vars begin
        cp := Central / Vc
    end

    @dynamics begin
        Depot' = -Ka * Depot
        Central' = Ka * Depot - (CL / Vc) * Central - (Q / Vc) * Central + (Q / Vp) * Peripheral
        Peripheral' = (Q / Vc) * Central - (Q / Vp) * Peripheral
    end

    @derived begin
        """
        Concentration (ng/mL)
        """
        conc ~ @. Normal(cp, sqrt((σ_prop * cp)^2 + σ_add^2))
    end
end
PumasModel
Parameters: pop_CL, pop_Vc, pop_Q, pop_Vp, pop_tabs, pop_lag, pk_Ω, lag_ω, σ_prop, σ_add
Random effects: pk_η, lag_η
Covariates: FSZCL, FSZV
Dynamical system variables: Depot, Central, Peripheral
Dynamical system type: Matrix exponential
Derived: conc
Observed: conc
2.2 Data Preparation
The Warfarin data used in this tutorial is pulled from PharmaDatasets for demonstration purposes. Note how the code reshapes and prepares the data in “wide” format for reading into Pumas. Only the conc column is treated as observations for the PK model.
warfarin_data = dataset("paganz2024/warfarin_long")

# Minor numeric adjustment to avoid duplicate time points
@. warfarin_data[[133, 135, 137, 139], :TIME] += 1e-6

# Transform the data in a single chain of operations
warfarin_data_wide = @chain warfarin_data begin
    @rsubset !contains(:ID, "#")
    @rtransform begin
        # Scaling factors
        :FSZV = :WEIGHT / 70          # volume scaling
        :FSZCL = (:WEIGHT / 70)^0.75  # clearance scaling (allometric)
        # Column name for the DV
        :DVNAME = "DV$(:DVID)"
        # Dosing indicator columns
        :CMT = ismissing(:AMOUNT) ? missing : 1
        :EVID = ismissing(:AMOUNT) ? 0 : 1
    end
    unstack(Not([:DVID, :DVNAME, :DV]), :DVNAME, :DV)
    rename!(:DV1 => :conc, :DV2 => :pca)
end
Row | ID | TIME | WEIGHT | AGE | SEX | AMOUNT | FSZV | FSZCL | CMT | EVID | DV0 | pca | conc |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
String3 | Float64 | Float64 | Int64 | Int64 | Float64? | Float64 | Float64 | Int64? | Int64 | Float64? | Float64? | Float64? | |
1 | 1 | 0.0 | 66.7 | 50 | 1 | 100.0 | 0.952857 | 0.96443 | 1 | 1 | missing | missing | missing |
2 | 1 | 0.0 | 66.7 | 50 | 1 | missing | 0.952857 | 0.96443 | missing | 0 | missing | 100.0 | missing |
3 | 1 | 24.0 | 66.7 | 50 | 1 | missing | 0.952857 | 0.96443 | missing | 0 | missing | 49.0 | 9.2 |
4 | 1 | 36.0 | 66.7 | 50 | 1 | missing | 0.952857 | 0.96443 | missing | 0 | missing | 32.0 | 8.5 |
5 | 1 | 48.0 | 66.7 | 50 | 1 | missing | 0.952857 | 0.96443 | missing | 0 | missing | 26.0 | 6.4 |
6 | 1 | 72.0 | 66.7 | 50 | 1 | missing | 0.952857 | 0.96443 | missing | 0 | missing | 22.0 | 4.8 |
7 | 1 | 96.0 | 66.7 | 50 | 1 | missing | 0.952857 | 0.96443 | missing | 0 | missing | 28.0 | 3.1 |
8 | 1 | 120.0 | 66.7 | 50 | 1 | missing | 0.952857 | 0.96443 | missing | 0 | missing | 33.0 | 2.5 |
9 | 2 | 0.0 | 66.7 | 31 | 1 | 100.0 | 0.952857 | 0.96443 | 1 | 1 | missing | missing | missing |
10 | 2 | 0.0 | 66.7 | 31 | 1 | missing | 0.952857 | 0.96443 | missing | 0 | missing | 100.0 | missing |
11 | 2 | 0.5 | 66.7 | 31 | 1 | missing | 0.952857 | 0.96443 | missing | 0 | missing | missing | 0.0 |
12 | 2 | 2.0 | 66.7 | 31 | 1 | missing | 0.952857 | 0.96443 | missing | 0 | missing | missing | 8.4 |
13 | 2 | 3.0 | 66.7 | 31 | 1 | missing | 0.952857 | 0.96443 | missing | 0 | missing | missing | 9.7 |
⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
306 | 31 | 48.0 | 83.3 | 24 | 1 | missing | 1.19 | 1.13936 | missing | 0 | missing | 24.0 | 6.4 |
307 | 31 | 72.0 | 83.3 | 24 | 1 | missing | 1.19 | 1.13936 | missing | 0 | missing | 22.0 | 4.5 |
308 | 31 | 96.0 | 83.3 | 24 | 1 | missing | 1.19 | 1.13936 | missing | 0 | missing | 28.0 | 3.4 |
309 | 31 | 120.0 | 83.3 | 24 | 1 | missing | 1.19 | 1.13936 | missing | 0 | missing | 42.0 | 2.5 |
310 | 32 | 0.0 | 62.0 | 21 | 1 | 93.0 | 0.885714 | 0.912999 | 1 | 1 | missing | missing | missing |
311 | 32 | 0.0 | 62.0 | 21 | 1 | missing | 0.885714 | 0.912999 | missing | 0 | missing | 100.0 | missing |
312 | 32 | 24.0 | 62.0 | 21 | 1 | missing | 0.885714 | 0.912999 | missing | 0 | missing | 36.0 | 8.9 |
313 | 32 | 36.0 | 62.0 | 21 | 1 | missing | 0.885714 | 0.912999 | missing | 0 | missing | 27.0 | 7.7 |
314 | 32 | 48.0 | 62.0 | 21 | 1 | missing | 0.885714 | 0.912999 | missing | 0 | missing | 24.0 | 6.9 |
315 | 32 | 72.0 | 62.0 | 21 | 1 | missing | 0.885714 | 0.912999 | missing | 0 | missing | 23.0 | 4.4 |
316 | 32 | 96.0 | 62.0 | 21 | 1 | missing | 0.885714 | 0.912999 | missing | 0 | missing | 20.0 | 3.5 |
317 | 32 | 120.0 | 62.0 | 21 | 1 | missing | 0.885714 | 0.912999 | missing | 0 | missing | 22.0 | 2.5 |
2.3 Creating a Pumas Population
Below is the creation of a population object in Pumas using read_pumas. Only the conc data are treated as the observation variable:
pop = read_pumas(
    warfarin_data_wide;
    id = :ID,
    time = :TIME,
    amt = :AMOUNT,
    cmt = :CMT,
    evid = :EVID,
    covariates = [:SEX, :WEIGHT, :FSZV, :FSZCL],
    observations = [:conc],
)
Population
Subjects: 31
Covariates: SEX, WEIGHT, FSZV, FSZCL
Observations: conc
The same data can contain multiple endpoints or PD observations. In this tutorial, the focus is solely on PK fitting; PKPD modeling on this warfarin dataset will be introduced later.
Also note that parameter inference can be computationally expensive; for that reason, we simplified the model for this tutorial to decrease overall runtime.
2.4 Checking Model-Data Compatibility
Before performing any fit, it is recommended to verify whether the defined model can handle the provided dataset. Pumas offers functions such as loglikelihood and findinfluential for these checks.
2.4.1 The loglikelihood Function
The loglikelihood function computes the log-likelihood of the model given data and parameters. In Pumas, the signature typically looks like:
loglikelihood(model, population, params, approx)
where:
- model: The Pumas model definition (e.g., warfarin_pk_model).
- population: A Pumas population object (e.g., pop).
- params: A named tuple or dictionary containing parameter values.
- approx: The approximation method to use. Common options include FOCE(), FO(), LaplaceI(), etc.
For example, one might write:
# A named tuple of parameter values
param_vals = (
    pop_CL = 0.12,
    pop_Vc = 7.3,
    pop_Q = 0.2,
    pop_Vp = 2.0,
    pop_tabs = 0.523,
    pop_lag = 0.5,
    pk_Ω = Diagonal([0.01]),
    lag_ω = 0.1,
    σ_prop = 0.00752,
    σ_add = 0.0661,
)
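With these values in hand, the log-likelihood can be evaluated directly as a quick compatibility check before fitting. A minimal sketch using the signature shown above (the returned value depends on the data and parameters, so none is shown here):

# Evaluate the marginal log-likelihood at the initial estimates under FOCE
loglikelihood(warfarin_pk_model, pop, param_vals, FOCE())

A finite result confirms that the model can evaluate the data at the initial estimates, so fitting can proceed: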
foce_fit = fit(warfarin_pk_model, pop, param_vals, FOCE())
[ Info: Checking the initial parameter values.
[ Info: The initial negative log likelihood and its gradient are finite. Check passed.
Iter Function value Gradient norm
0 1.835895e+04 2.290908e+04
* time: 0.03062605857849121
1 7.292509e+03 1.690878e+04
* time: 1.1808969974517822
2 5.965156e+02 7.949923e+02
* time: 1.2769811153411865
3 5.688454e+02 6.737595e+02
* time: 1.366847038269043
4 4.573190e+02 1.371446e+02
* time: 1.4520111083984375
5 4.373891e+02 7.452681e+01
* time: 1.5820810794830322
6 4.314284e+02 6.260497e+01
* time: 1.663438081741333
7 4.300889e+02 7.624158e+01
* time: 1.7445361614227295
8 4.291775e+02 7.965160e+01
* time: 1.8287160396575928
9 4.275923e+02 7.228227e+01
* time: 1.9110829830169678
10 4.253032e+02 4.773878e+01
* time: 1.9934639930725098
11 4.228584e+02 4.532856e+01
* time: 2.082880973815918
12 4.215298e+02 3.914030e+01
* time: 2.1735661029815674
13 4.210396e+02 4.799680e+01
* time: 2.291038990020752
14 4.202779e+02 5.954420e+01
* time: 2.3759500980377197
15 4.187278e+02 6.710929e+01
* time: 2.4654059410095215
16 4.159568e+02 5.919902e+01
* time: 2.5511491298675537
17 4.112522e+02 2.431511e+01
* time: 2.6377949714660645
18 4.099062e+02 3.585627e+01
* time: 2.725059986114502
19 4.079242e+02 2.140592e+01
* time: 2.8105101585388184
20 4.077880e+02 2.086277e+01
* time: 2.8931641578674316
21 4.077130e+02 2.092335e+01
* time: 2.9746530055999756
22 4.075991e+02 2.099161e+01
* time: 3.0756869316101074
23 4.072771e+02 2.083122e+01
* time: 3.1630699634552
24 4.065853e+02 1.984999e+01
* time: 3.251995086669922
25 4.052522e+02 3.122539e+01
* time: 3.339354991912842
26 4.035823e+02 5.304491e+01
* time: 3.4269299507141113
27 4.023986e+02 6.508012e+01
* time: 3.511726140975952
28 4.017056e+02 6.447037e+01
* time: 3.59666109085083
29 4.011417e+02 5.749992e+01
* time: 3.682171106338501
30 4.004386e+02 5.186503e+01
* time: 3.780456066131592
31 3.987434e+02 3.579828e+01
* time: 3.8686089515686035
32 3.966281e+02 1.612106e+01
* time: 3.9572060108184814
33 3.951052e+02 8.313099e+00
* time: 4.042083978652954
34 3.945559e+02 7.029589e+00
* time: 4.128461122512817
35 3.942975e+02 5.889635e+00
* time: 4.217126131057739
36 3.941937e+02 6.115154e+00
* time: 4.30329704284668
37 3.941777e+02 6.315510e+00
* time: 4.385453939437866
38 3.941592e+02 6.540235e+00
* time: 4.471365928649902
39 3.941426e+02 6.668052e+00
* time: 4.576354026794434
40 3.940799e+02 6.964695e+00
* time: 4.672075033187866
41 3.939458e+02 7.408186e+00
* time: 4.769268035888672
42 3.936058e+02 1.169435e+01
* time: 4.867789030075073
43 3.929128e+02 1.570074e+01
* time: 4.962630987167358
44 3.917898e+02 1.648672e+01
* time: 5.055016040802002
45 3.906803e+02 1.212164e+01
* time: 5.146291017532349
46 3.902337e+02 5.709740e+00
* time: 5.2394421100616455
47 3.901807e+02 3.329299e+00
* time: 5.348740100860596
48 3.901766e+02 3.392606e+00
* time: 5.449527978897095
49 3.901747e+02 3.407238e+00
* time: 5.548814058303833
50 3.901711e+02 3.410919e+00
* time: 5.657371997833252
51 3.901627e+02 3.399260e+00
* time: 5.762876033782959
52 3.901408e+02 3.409710e+00
* time: 5.867242097854614
53 3.900855e+02 5.867207e+00
* time: 5.971941947937012
54 3.899510e+02 9.320886e+00
* time: 6.068196058273315
55 3.896752e+02 1.274900e+01
* time: 6.163855075836182
56 3.892579e+02 1.341768e+01
* time: 6.293164014816284
57 3.888174e+02 9.998268e+00
* time: 6.412065029144287
58 3.886853e+02 4.883105e+00
* time: 6.536194086074829
59 3.886619e+02 2.105634e+00
* time: 6.65932297706604
60 3.886589e+02 2.104546e+00
* time: 6.782073974609375
61 3.886585e+02 2.105201e+00
* time: 6.893738031387329
62 3.886576e+02 2.108794e+00
* time: 7.002038955688477
63 3.886560e+02 2.116212e+00
* time: 7.10846209526062
64 3.886517e+02 2.136333e+00
* time: 7.214749097824097
65 3.886411e+02 2.181259e+00
* time: 7.348499059677124
66 3.886122e+02 2.291304e+00
* time: 7.4788219928741455
67 3.885314e+02 2.731616e+00
* time: 7.626352071762085
68 3.882546e+02 5.313368e+00
* time: 7.780339956283569
69 3.880721e+02 6.435869e+00
* time: 7.953803062438965
70 3.877819e+02 8.003764e+00
* time: 8.10882306098938
71 3.875324e+02 8.835166e+00
* time: 8.211177110671997
72 3.867128e+02 8.354478e+00
* time: 8.328505992889404
73 3.859294e+02 5.009199e+00
* time: 8.458390951156616
74 3.856436e+02 1.046433e+01
* time: 8.559417009353638
75 3.853968e+02 6.781825e+00
* time: 8.66032099723816
76 3.853443e+02 1.126947e+00
* time: 8.758721113204956
77 3.853408e+02 1.024868e+00
* time: 8.855348110198975
78 3.853370e+02 1.022619e+00
* time: 8.953561067581177
79 3.853362e+02 1.007107e+00
* time: 9.050931930541992
80 3.853360e+02 9.949462e-01
* time: 9.16311502456665
81 3.853358e+02 9.900341e-01
* time: 9.259371042251587
82 3.853353e+02 9.834347e-01
* time: 9.357239007949829
83 3.853343e+02 9.749404e-01
* time: 9.453855037689209
84 3.853316e+02 1.280659e+00
* time: 9.551794052124023
85 3.853251e+02 2.177866e+00
* time: 9.648829936981201
86 3.853098e+02 3.380715e+00
* time: 9.745789051055908
87 3.852786e+02 4.629237e+00
* time: 9.844831943511963
88 3.852296e+02 5.201303e+00
* time: 9.961122035980225
89 3.851770e+02 4.139482e+00
* time: 10.057606935501099
90 3.851499e+02 1.363359e+00
* time: 10.15296196937561
91 3.851470e+02 9.175379e-02
* time: 10.243948936462402
92 3.851468e+02 5.049011e-02
* time: 10.333117961883545
93 3.851468e+02 5.279968e-02
* time: 10.416093111038208
94 3.851468e+02 5.294356e-02
* time: 10.497369050979614
95 3.851468e+02 5.294356e-02
* time: 10.620511054992676
96 3.851468e+02 5.291805e-02
* time: 10.746169090270996
97 3.851468e+02 5.288013e-02
* time: 10.84384799003601
98 3.851468e+02 5.264154e-02
* time: 10.931471109390259
99 3.851468e+02 5.264154e-02
* time: 11.068500995635986
100 3.851468e+02 5.174985e-02
* time: 11.173434972763062
101 3.851468e+02 5.183662e-02
* time: 11.257236957550049
102 3.851464e+02 2.351912e-01
* time: 11.348469018936157
103 3.851464e+02 2.719795e-01
* time: 11.437340021133423
104 3.851462e+02 3.202132e-01
* time: 11.540651082992554
105 3.851459e+02 3.357046e-01
* time: 11.62959599494934
106 3.851454e+02 2.922910e-01
* time: 11.721426010131836
107 3.851450e+02 2.316805e-01
* time: 11.812660932540894
108 3.851449e+02 2.848428e-01
* time: 11.904603958129883
109 3.851448e+02 2.793247e-01
* time: 11.993396997451782
110 3.851448e+02 2.687629e-01
* time: 12.081670999526978
111 3.851448e+02 2.437732e-01
* time: 12.168030023574829
112 3.851447e+02 1.976272e-01
* time: 12.258527040481567
113 3.851446e+02 1.155016e-01
* time: 12.364496946334839
114 3.851444e+02 4.201230e-02
* time: 12.455963134765625
115 3.851443e+02 3.192048e-02
* time: 12.545577049255371
116 3.851443e+02 1.834850e-02
* time: 12.632020950317383
117 3.851443e+02 3.016608e-03
* time: 12.716598987579346
118 3.851443e+02 2.730012e-03
* time: 12.802331924438477
119 3.851443e+02 2.730012e-03
* time: 12.9360990524292
120 3.851443e+02 2.730012e-03
* time: 13.074465990066528
121 3.851443e+02 2.730012e-03
* time: 13.225244998931885
122 3.851443e+02 2.730012e-03
* time: 13.39491605758667
123 3.851443e+02 2.730012e-03
* time: 13.535709142684937
124 3.851443e+02 2.730012e-03
* time: 13.616513967514038
FittedPumasModel
Successful minimization: true
Likelihood approximation: FOCE
Likelihood Optimizer: BFGS
Dynamical system type: Matrix exponential
Log-likelihood value: -385.14428
Number of subjects: 31
Number of parameters: Fixed Optimized
0 10
Observation records: Active Missing
conc: 239 47
Total: 239 47
-----------------------
Estimate
-----------------------
pop_CL 0.12366
pop_Vc 7.3031
pop_Q 0.12275
pop_Vp 1.0977
pop_tabs 0.53254
pop_lag 0.9608
pk_Ω₁,₁ 0.028906
lag_ω 0.41871
σ_prop 0.10049
σ_add 0.83524
-----------------------
2.5 Computing Parameter Precision with infer
The infer function in Pumas estimates the uncertainty (precision) of parameter estimates. Depending on the chosen method, infer can provide standard errors, confidence intervals, and correlation matrices.
The signature for infer often looks like:
infer(
    fpm::FittedPumasModel;
    level = 0.95,
    rethrow_error::Bool = false,
    sandwich_estimator::Bool = true,
)
where:
- fpm::FittedPumasModel: The result of fit (e.g., foce_fit).
- level: The confidence interval level (e.g., 0.95). The confidence intervals are calculated as the (1-level)/2 and (1+level)/2 quantiles of the estimated parameters.
- rethrow_error: If rethrow_error is false (the default value), no error will be thrown if the variance-covariance matrix estimator fails. If it is true, an error will be thrown if the estimator fails.
- sandwich_estimator: Whether to use the sandwich estimator. If set to true (the default value), the sandwich estimator will be used. If set to false, the standard error will be calculated using the inverse of the Hessian matrix.
An example usage:
inference_results = infer(foce_fit; level = 0.95)
[ Info: Calculating: variance-covariance matrix.
[ Info: Done.
Asymptotic inference results using sandwich estimator
Successful minimization: true
Likelihood approximation: FOCE
Likelihood Optimizer: BFGS
Dynamical system type: Matrix exponential
Log-likelihood value: -385.14428
Number of subjects: 31
Number of parameters: Fixed Optimized
0 10
Observation records: Active Missing
conc: 239 47
Total: 239 47
---------------------------------------------------------------------
Estimate SE 95.0% C.I.
---------------------------------------------------------------------
pop_CL 0.12366 0.0048503 [ 0.11415 ; 0.13316 ]
pop_Vc 7.3031 0.54808 [ 6.2289 ; 8.3773 ]
pop_Q 0.12275 0.067804 [-0.010138 ; 0.25565 ]
pop_Vp 1.0977 0.44969 [ 0.21629 ; 1.979 ]
pop_tabs 0.53254 0.10221 [ 0.33222 ; 0.73287 ]
pop_lag 0.9608 0.17529 [ 0.61725 ; 1.3044 ]
pk_Ω₁,₁ 0.028906 0.015478 [-0.0014308; 0.059243]
lag_ω 0.41871 0.099704 [ 0.22329 ; 0.61412 ]
σ_prop 0.10049 0.021933 [ 0.057507 ; 0.14348 ]
σ_add 0.83524 0.1227 [ 0.59475 ; 1.0757 ]
---------------------------------------------------------------------
This is the usual asymptotic variance-covariance estimator, which we already saw in previous tutorials.
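As a sanity check, the reported intervals are ordinary Wald intervals, estimate ± z × SE, where z is the standard normal quantile for the chosen level. A minimal sketch for pop_CL, assuming the Distributions functionality re-exported by Pumas is available:

z = quantile(Normal(), 0.975)    # ≈ 1.96 for a 95% interval
0.12366 - z * 0.0048503          # lower bound, ≈ 0.11415
0.12366 + z * 0.0048503          # upper bound, ≈ 0.13316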
To get a matrix representation of this, use vcov:
vcov(inference_results)
10×10 Symmetric{Float64, Matrix{Float64}}:
2.35251e-5 -0.000102312 0.000100713 … 5.45947e-5 0.000209148
-0.000102312 0.300394 -0.0296822 -0.00550472 -0.00691567
0.000100713 -0.0296822 0.00459734 0.000836642 0.000296097
0.000361104 -0.218684 0.024185 0.00365746 0.0173442
6.23038e-5 -0.0140864 0.00208861 0.000920447 -0.00103078
0.000195498 0.0200596 -0.000272529 … 0.000427188 -0.00241991
-3.33791e-5 0.00126169 -0.000153034 -0.000220081 -0.000118974
1.95124e-6 -0.00706462 0.00086209 0.000508345 -0.00134152
5.45947e-5 -0.00550472 0.000836642 0.000481042 -1.90123e-5
0.000209148 -0.00691567 0.000296097 -1.90123e-5 0.0150553
and to get the condition number of the correlation matrix implied by vcov, use:
cond(inference_results)
79.08840258329506
Some users request the condition number of the covariance matrix itself; even though its use is often misguided, it can be calculated as well.
cond(inference_results; correlation = false)
51438.04487227657
It is also possible to calculate the correlation matrix from the vcov output using the cov2cor function from StatsBase:
cor_from_cov = cov2cor(Matrix(vcov(inference_results)))
10×10 Matrix{Float64}:
1.0 -0.038487 0.306244 0.16556 … 0.513208 0.351433
-0.038487 1.0 -0.798725 -0.88728 -0.457929 -0.102836
0.306244 -0.798725 1.0 0.7932 0.562593 0.0355906
0.16556 -0.88728 0.7932 1.0 0.370832 0.31434
0.125679 -0.251459 0.301382 0.163542 0.410601 -0.0821928
0.229949 0.2088 -0.0229304 -0.239941 … 0.111117 -0.112514
-0.444614 0.148724 -0.145817 -0.020102 -0.648285 -0.0626442
0.00403489 -0.129279 0.127522 0.0252609 0.232462 -0.109657
0.513208 -0.457929 0.562593 0.370832 1.0 -0.00706475
0.351433 -0.102836 0.0355906 0.31434 -0.00706475 1.0
And we see that the cond call above matches the condition number of the correlation matrix:
cond(cor_from_cov)
79.08840258329506
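For intuition, cov2cor simply rescales the covariance matrix by the standard errors. A minimal manual equivalent (a sketch assuming the LinearAlgebra standard library is loaded):

using LinearAlgebra
V = Matrix(vcov(inference_results))
s = sqrt.(diag(V))            # standard errors of the parameters
cor_manual = V ./ (s * s')    # elementwise scaling; matches cov2cor(V)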
2.5.1 Failure of the asymptotic variance-covariance matrix
It is well-known that the asymptotic variance-covariance matrix can sometimes fail to compute. This can happen for a variety of reasons including:
- There are parameters very close to a bound (often 0)
- The parameter vector does not represent a local minimum (optimization failed)
- The parameter vector does represent a local minimum but it’s not the global solution
The first one is often easy to check. The problematic parameters are typically ones that have lower or upper bounds set. Often this will be a variance or standard deviation that has moved very close to the lower boundary. Consider removing the associated random effect if the problematic parameter is a variance in its specification, or removing an error model component if a combined additive and proportional error model is used and one of the standard deviations is close to zero.
It is also possible that the parameters do not represent a local minimum; in other words, they come from a failed fit. It can then be hard to perform the associated calculations in a stable way, but most importantly, the results would not be interpretable even if they could be calculated. The formulas are only valid for parameters that represent the actual (maximum likelihood) estimates. Please try to restart the optimization at different starting points in this case.
If you have reason to believe that the model should indeed use a combined error model, or that the random effect should be present, it is also possible that the model converged to a local minimum that is not the global one. If the optimization happened to move into a region of the parameter space that is hard to get out of, you will often have to restart the fit at different parameter values. It is not possible in general to verify that a minimum is global, but it is always advised to try more than one set of initial parameters when fitting models.
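A pragmatic way to probe for local minima is to refit from perturbed starting values and compare the resulting log-likelihoods. A minimal sketch (the 20% perturbation is an arbitrary illustrative choice, applied only to the scalar parameters):

# Perturb all scalar parameters by 20%, leaving pk_Ω untouched
param_vals_perturbed = map(p -> p isa Number ? 1.2 * p : p, param_vals)
foce_refit = fit(warfarin_pk_model, pop, param_vals_perturbed, FOCE())
loglikelihood(foce_refit) - loglikelihood(foce_fit)  # should be ≈ 0 if both runs reach the same optimum

Similar log-likelihoods and estimates across restarts increase confidence that the fit did not stall in a spurious local minimum.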
2.5.2 Bootstrap
Sometimes it is appropriate to use a different method to estimate the uncertainty of the estimated parameters. Bootstrapping is a very popular approach that is simple but often comes at a quite significant computational cost. Researchers often perform a bootstrapping step if their computational budget allows it or if the asymptotic variance-covariance estimator fails. Bootstrapping is advantageous because it does not rely on invertibility of any matrices, and it cannot produce negative-variance confidence intervals because the resampled estimator respects the bounds of the model.
The signature for bootstrapping in infer looks as follows:
infer(fpm::FittedPumasModel, bts::Bootstrap; level = 0.95)
This does not help much without also looking at the interface of Bootstrap itself:
Bootstrap(;
    rng = Random.default_rng,
    samples = 200,
    stratify_by = nothing,
    ensemblealg = EnsembleThreads(),
)
Bootstrap accepts a random number generator rng, the number of resampled datasets to produce (samples), whether sampling should be stratified according to the covariates in stratify_by, and finally the ensemble algorithm to control parallelization across fits. On the JuliaHub platform, this can be used together with distributed computing to perform many resampled estimations in a short time.
bootstrap_results = infer(foce_fit, Bootstrap(samples = 50); level = 0.95)
┌ Warning: Terminated early due to NaN in gradient.
└ @ Optim ~/run/_work/PumasTutorials.jl/PumasTutorials.jl/custom_julia_depot/packages/Optim/ZhuZN/src/multivariate/optimize/optimize.jl:98
┌ Info: Bootstrap inference finished.
│ Total resampled fits = 50
│ Success rate = 1.0
└ Unique resampled populations = 50
Bootstrap inference results
Successful minimization: true
Likelihood approximation: FOCE
Likelihood Optimizer: BFGS
Dynamical system type: Matrix exponential
Log-likelihood value: -385.14428
Number of subjects: 31
Number of parameters: Fixed Optimized
0 10
Observation records: Active Missing
conc: 239 47
Total: 239 47
------------------------------------------------------------------
Estimate SE 95.0% C.I.
------------------------------------------------------------------
pop_CL 0.12366 0.010125 [0.11106 ; 0.13797]
pop_Vc 7.3031 2.8853 [0.73301 ; 8.5659 ]
pop_Q 0.12275 0.30849 [0.049315 ; 0.89045]
pop_Vp 1.0977 6.8475 [0.4313 ; 7.0928 ]
pop_tabs 0.53254 1.9309 [0.047358 ; 5.23 ]
pop_lag 0.9608 0.37605 [0.3939 ; 1.6459 ]
pk_Ω₁,₁ 0.028906 0.31858 [0.00091873; 0.60072]
lag_ω 0.41871 0.47598 [7.4566e-8 ; 1.9434 ]
σ_prop 0.10049 0.045366 [0.046031 ; 0.21184]
σ_add 0.83524 0.15362 [0.44253 ; 1.0066 ]
------------------------------------------------------------------
Successful fits: 50 out of 50
Unique resampled populations: 50
No stratification.
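Had stratified resampling been desired, for example by sex, the covariate could be passed via stratify_by. A hedged sketch (assuming :SEX is accepted as a stratification covariate here):

bootstrap_strat = infer(foce_fit, Bootstrap(samples = 50, stratify_by = :SEX); level = 0.95)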
Again, we can calculate a covariance matrix based on the samples with vcov:
vcov(bootstrap_results)
10×10 Matrix{Float64}:
0.000102507 -0.00586239 0.00077559 … 0.000138881 -0.000342387
-0.00586239 8.32485 -0.780348 -0.073393 0.205427
0.00077559 -0.780348 0.0951677 0.00853684 -0.0223975
-0.0534076 -3.56144 0.369428 0.00458453 -0.0593177
0.00441233 -5.10107 0.467024 0.0361843 -0.125019
0.000679858 0.0268326 -0.0033733 … -0.00293692 -0.00621323
0.000101106 -0.445253 0.0398041 0.000352758 -0.00750646
0.00113881 -0.199771 0.0292119 0.00384677 -0.0310601
0.000138881 -0.073393 0.00853684 0.00205807 -0.00407079
-0.000342387 0.205427 -0.0223975 -0.00407079 0.0235992
and we can even get a DataFrame that includes all the estimated parameters from the sampled population fits:
DataFrame(bootstrap_results.vcov)
Row | pop_CL | pop_Vc | pop_Q | pop_Vp | pop_tabs | pop_lag | pk_Ω₁,₁ | lag_ω | σ_prop | σ_add |
---|---|---|---|---|---|---|---|---|---|---|
Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | |
1 | 0.120557 | 6.4577 | 0.199234 | 2.04539 | 0.444819 | 0.865639 | 0.0641857 | 0.269349 | 0.0639825 | 1.07727 |
2 | 0.127326 | 7.61269 | 0.0765175 | 0.828385 | 0.413194 | 1.02171 | 0.0319049 | 0.327721 | 0.0780306 | 0.894551 |
3 | 0.125148 | 5.79 | 0.380711 | 2.62787 | 1.22524 | 0.381494 | 0.0282837 | 0.798105 | 0.0958695 | 0.867472 |
4 | 0.127667 | 6.91563 | 0.397515 | 1.73248 | 0.728582 | 0.80891 | 5.49156e-9 | 3.87514e-14 | 0.194506 | 0.711823 |
5 | 0.139487 | 1.26349 | 0.661383 | 6.27044 | 5.22998 | 1.69134 | 0.298052 | 2.0 | 0.119511 | 0.483218 |
6 | 0.121249 | 7.37552 | 0.0943698 | 1.34029 | 0.692588 | 0.675815 | 0.0228526 | 0.487139 | 0.0798571 | 0.897624 |
7 | 0.127719 | 0.780702 | 1.20801 | 7.22525 | 3.58378 | 1.02986 | 0.166685 | 0.308838 | 0.216874 | 0.60563 |
8 | 0.124109 | 7.29044 | 0.220874 | 1.03724 | 0.944841 | 0.792992 | 0.0447704 | 0.500921 | 0.0955452 | 0.752949 |
9 | 0.124383 | 7.43103 | 0.16777 | 1.04022 | 0.512051 | 0.747488 | 0.0111888 | 0.350478 | 0.150554 | 0.660052 |
10 | 0.118758 | 7.52689 | 0.0988427 | 0.760195 | 0.932482 | 0.567801 | 0.049224 | 0.609705 | 0.0359675 | 0.659254 |
11 | 0.119892 | 6.64343 | 0.196923 | 1.52487 | 0.649178 | 0.68864 | 0.022323 | 0.260617 | 0.115618 | 0.653642 |
12 | 0.122103 | 1.23247 | 0.892998 | 6.44023 | 4.39628 | 0.728559 | 0.262875 | 0.355931 | 0.111655 | 0.670742 |
13 | 0.138626 | 7.55712 | 0.218458 | 1.3808 | 0.60626 | 2.58697 | 0.0283555 | 1.74832 | 0.0917372 | 0.440418 |
⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
39 | 0.119194 | 8.44752 | 0.0194031 | 0.451408 | 0.515703 | 1.25297 | 0.0657994 | 0.329208 | 0.0556347 | 0.795263 |
40 | 0.120983 | 6.98201 | 0.105433 | 1.75105 | 0.0293232 | 1.45884 | 0.0522959 | 0.000117064 | 0.0925209 | 1.00712 |
41 | 0.116736 | 6.69018 | 0.186505 | 1.63966 | 0.354256 | 0.834854 | 0.048532 | 0.25561 | 0.0502336 | 0.867861 |
42 | 0.118616 | 7.21822 | 0.0758776 | 1.36281 | 0.402106 | 1.02282 | 0.0449987 | 0.246397 | 0.0803229 | 0.892164 |
43 | 0.121406 | 0.454008 | 0.754203 | 6.2481 | 5.23 | 1.48934 | 2.17527 | 3.05243e-21 | 0.0994137 | 0.66036 |
44 | 0.118802 | 7.79741 | 0.0517954 | 0.90519 | 0.343702 | 1.13328 | 0.0362455 | 0.241086 | 0.0448612 | 0.869297 |
45 | 0.116858 | 6.56498 | 0.213017 | 1.58344 | 1.02857 | 0.537274 | 0.0343308 | 0.521422 | 0.0736202 | 0.814461 |
46 | 0.121946 | 7.6343 | 0.0762105 | 1.19186 | 0.489409 | 1.0806 | 0.0260378 | 0.299237 | 0.0765756 | 1.005 |
47 | 0.125514 | 7.41819 | 0.0708068 | 1.00399 | 0.109477 | 1.47399 | 0.0215761 | 0.0128917 | 0.0929907 | 0.885367 |
48 | 0.135721 | 5.50171 | 0.713154 | 2.90404 | 1.51585 | 0.474578 | 0.0295195 | 1.47744 | 0.160802 | 0.703552 |
49 | 0.122884 | 1.09562 | 0.69433 | 6.30673 | 5.23 | 0.847306 | 0.0927949 | 0.363307 | 0.146633 | 0.521977 |
50 | 0.063161 | 8.78094 | 0.0647714 | 48.2674 | 0.491849 | 0.97931 | 0.0462727 | 0.183935 | 0.0542663 | 0.838545 |
This is very useful for histogram plotting of parameter distributions.
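For instance, the bootstrap distribution of a parameter can be summarized with quantiles or plotted directly. A minimal sketch, assuming a Makie backend such as CairoMakie is available for plotting:

bts_params = DataFrame(bootstrap_results.vcov)
quantile(bts_params.pop_CL, [0.025, 0.5, 0.975])  # bootstrap percentile CI and median

using CairoMakie
hist(bts_params.pop_CL; bins = 20, axis = (xlabel = "pop_CL", ylabel = "count"))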
2.5.3 Sampling Importance Re-sampling
Pumas has support for inference through Sampling Importance Re-sampling via the SIR() input to infer. The signature for SIR in infer looks as follows:
infer(fpm::FittedPumasModel, sir::SIR; level = 0.95, ensemblealg = EnsembleThreads())
This performs sampling importance re-sampling for the population in fpm. The confidence intervals are calculated as the (1-level)/2 and (1+level)/2 quantiles of the sampled parameters. ensemblealg can be EnsembleThreads() (the default value) to use multi-threading, or EnsembleSerial() to use a single thread.
The signature for the SIR specification is:
SIR(; rng, samples, resamples)
SIR accepts a random number generator rng, the number of samples drawn from the proposal distribution (samples), and, to complete the specification, the number of resamples (resamples). In practice, it is suggested that samples is at least 5 times larger than resamples to have sufficient samples to resample from.
sir_results = infer(foce_fit, SIR(samples = 1000, resamples = 200); level = 0.95)
[ Info: Calculating: variance-covariance matrix.
[ Info: Done.
[ Info: Running SIR.
[ Info: Resampling.
Simulated inference results
Successful minimization: true
Likelihood approximation: FOCE
Likelihood Optimizer: BFGS
Dynamical system type: Matrix exponential
Log-likelihood value: -385.14428
Number of subjects: 31
Number of parameters: Fixed Optimized
0 10
Observation records: Active Missing
conc: 239 47
Total: 239 47
--------------------------------------------------------------------
Estimate SE 95.0% C.I.
--------------------------------------------------------------------
pop_CL 0.12366 0.0035902 [0.11734 ; 0.13051 ]
pop_Vc 7.3031 0.50312 [6.2812 ; 8.1268 ]
pop_Q 0.12275 0.064973 [0.014028; 0.26124 ]
pop_Vp 1.0977 0.3639 [0.4781 ; 1.935 ]
pop_tabs 0.53254 0.091747 [0.42915 ; 0.79164 ]
pop_lag 0.9608 0.13207 [0.49744 ; 0.99674 ]
pk_Ω₁,₁ 0.028906 0.013871 [0.010166; 0.060285]
lag_ω 0.41871 0.081167 [0.37941 ; 0.69921 ]
σ_prop 0.10049 0.022577 [0.055662; 0.14483 ]
σ_add 0.83524 0.10222 [0.65035 ; 1.0776 ]
--------------------------------------------------------------------
Notice that SIR draws its initial samples from a truncated multivariate normal distribution whose mean is the maximum likelihood estimate of the population-level parameters and whose covariance matrix is the asymptotic matrix calculated by infer(fpm). This means that to use SIR, the matrix in question has to be successfully calculated by infer(fpm) under the hood.
The methods for vcov and DataFrame(sir_results.vcov) that we saw for Bootstrap also apply here:
vcov(sir_results)
10×10 Matrix{Float64}:
1.28894e-5 2.70649e-5 5.274e-5 … 3.911e-5 -1.87843e-6
2.70649e-5 0.253131 -0.0257315 -0.00588294 0.0141594
5.274e-5 -0.0257315 0.00422149 0.000855764 -0.00233357
-3.6031e-5 -0.154767 0.0167583 0.00259599 -0.00147255
4.4129e-5 -0.0190394 0.00254131 0.000995957 -0.00318513
8.53863e-5 -0.00304593 0.00137696 … 0.000718823 -0.00246933
-2.19133e-5 0.00162398 -0.000182412 -0.000200852 0.000279956
2.28063e-5 -0.00493316 0.000828298 0.000514035 -0.0023004
3.911e-5 -0.00588294 0.000855764 0.000509709 -0.00112062
-1.87843e-6 0.0141594 -0.00233357 -0.00112062 0.0104489
and
DataFrame(sir_results.vcov)
Row | pop_CL | pop_Vc | pop_Q | pop_Vp | pop_tabs | pop_lag | pk_Ω₁,₁ | lag_ω | σ_prop | σ_add |
---|---|---|---|---|---|---|---|---|---|---|
Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | |
1 | 0.120452 | 6.33817 | 0.227315 | 1.84366 | 0.738502 | 0.969038 | 0.041874 | 0.565502 | 0.120138 | 0.724461 |
2 | 0.121319 | 7.61521 | 0.13369 | 1.11958 | 0.59516 | 0.819195 | 0.0383821 | 0.486078 | 0.0621475 | 0.904747 |
3 | 0.128086 | 7.27365 | 0.164885 | 1.00112 | 0.773617 | 1.01215 | 0.0320492 | 0.559006 | 0.133827 | 0.652045 |
4 | 0.126595 | 7.703 | 0.0923832 | 1.14081 | 0.801104 | 0.636096 | 0.0420994 | 0.661075 | 0.0966072 | 0.938117 |
5 | 0.128139 | 7.72191 | 0.196377 | 0.811411 | 0.519029 | 0.792641 | 0.0625774 | 0.491349 | 0.0789868 | 0.793253 |
6 | 0.123952 | 8.08135 | 0.0430609 | 0.821958 | 0.569438 | 0.755458 | 0.0428529 | 0.55705 | 0.105087 | 0.878993 |
7 | 0.117459 | 7.3162 | 0.0499608 | 1.30251 | 0.708196 | 0.656217 | 0.0437035 | 0.598285 | 0.0913149 | 0.889262 |
8 | 0.121096 | 7.50486 | 0.137965 | 1.28249 | 0.553584 | 0.769124 | 0.0789792 | 0.445278 | 0.0573354 | 0.951855 |
9 | 0.126205 | 6.7427 | 0.198909 | 1.48821 | 0.675411 | 0.497277 | 0.0347115 | 0.590175 | 0.0973765 | 0.843432 |
10 | 0.124098 | 7.89136 | 0.0558241 | 0.700769 | 0.559757 | 0.758432 | 0.0469582 | 0.531335 | 0.0971841 | 0.889568 |
11 | 0.121515 | 7.87714 | 0.0554836 | 0.479318 | 0.578338 | 0.61916 | 0.0392379 | 0.664918 | 0.0989668 | 0.808713 |
12 | 0.122873 | 7.27312 | 0.120638 | 1.00513 | 0.671161 | 0.421269 | 0.0373674 | 0.703267 | 0.1059 | 0.938679 |
13 | 0.126574 | 7.99423 | 0.091001 | 0.78432 | 0.493283 | 0.558988 | 0.0399548 | 0.60235 | 0.0850221 | 0.894139 |
⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
189 | 0.125874 | 7.25386 | 0.155533 | 1.11484 | 0.528108 | 0.992596 | 0.0217301 | 0.397432 | 0.124037 | 0.86935 |
190 | 0.128931 | 7.31949 | 0.0994734 | 1.06307 | 0.646142 | 0.876359 | 0.0356783 | 0.490678 | 0.0928399 | 0.949515 |
191 | 0.11799 | 7.51179 | 0.0489818 | 0.868392 | 0.634647 | 0.807152 | 0.0231789 | 0.555591 | 0.094862 | 0.78834 |
192 | 0.123349 | 7.68835 | 0.130598 | 0.706719 | 0.543319 | 0.947635 | 0.0473085 | 0.416655 | 0.08845 | 0.851074 |
193 | 0.117229 | 6.96453 | 0.109829 | 1.30848 | 0.653853 | 0.726296 | 0.0460554 | 0.489996 | 0.0859431 | 0.935173 |
194 | 0.12112 | 7.24597 | 0.115413 | 1.26242 | 0.621571 | 0.853633 | 0.0399529 | 0.484128 | 0.0961005 | 0.822067 |
195 | 0.122174 | 6.83457 | 0.159878 | 1.5827 | 0.576497 | 0.969794 | 0.0295114 | 0.401337 | 0.109391 | 0.901286 |
196 | 0.129375 | 7.75966 | 0.132249 | 0.920036 | 0.643227 | 0.973947 | 0.0204307 | 0.492483 | 0.119423 | 0.770768 |
197 | 0.129705 | 6.37076 | 0.257739 | 1.84336 | 0.632963 | 0.993233 | 0.00919332 | 0.440514 | 0.154944 | 0.752252 |
198 | 0.122227 | 6.5618 | 0.170741 | 1.48927 | 0.550102 | 0.961237 | 0.0225875 | 0.358677 | 0.0923567 | 0.950669 |
199 | 0.121774 | 8.12433 | 0.0342786 | 0.879075 | 0.54661 | 0.903724 | 0.0548346 | 0.384889 | 0.0680357 | 1.07067 |
200 | 0.130318 | 7.12203 | 0.116892 | 1.08476 | 0.507377 | 0.725607 | 0.0302364 | 0.560484 | 0.123345 | 0.758802 |
2.5.4 Marginal MCMC
An alternative to Bootstrap and SIR is to use the MarginalMCMC sampler, a Hamiltonian Monte Carlo (HMC) No-U-Turn Sampler (NUTS) that samples from the marginal loglikelihood. This means that the individual effects are marginalized out and the population-level parameters are sampled. This does not resample populations like Bootstrap, so inference may be more stable if many resampled populations lead to extreme estimates, and it differs from SIR in that it does not need the asymptotic covariance matrix to be calculated and sampled from.
This method requires slightly more understanding from the user when setting its options, which can be found in the docstring of MarginalMCMC. Some knowledge of Bayesian inference is advised.
inference_results = infer(foce_fit, MarginalMCMC(); level = 0.95)
As sampling-based inference can be computationally intensive, we exclude the actual invocation of this method from this tutorial.
3 Concluding Remarks
This tutorial showcased a typical workflow for parameter inference in Pumas models and reviewed the different methods Pumas supports for calculating parameter uncertainty.