Fitting Non-Identifiable and Poorly Identifiable Models using Bayesian Inference

Author

Mohamed Tarek

In this tutorial, we will see how to use Bayesian inference in Pumas to fit a non-identifiable (or a poorly identifiable) model by sampling from the full posterior. Unlike maximum likelihood estimation methods which only find point estimates for the model parameters, Bayesian methods can be used to sample from the full posterior of the model parameters making it more robust to identifiability issues.

1 Model Identifiability

One of the goals of statistical learning is to identify the underlying parameter values in a parametric model that best fit the observed data. In many practical scenarios, some parameters in a model may not be identifiable due to one of the following problems:

  1. The model is over-parameterized with redundant parameters, where a continuum of parameter values, e.g. \(0 \leq \theta \leq 1\), would all give identical model predictions. This tends to happen when a dynamical model has many compartments and associated parameters but only one or a few compartments are observed. Another case where this happens is when a typo in the model causes a parameter not to be used everywhere it should be.
  2. The model has symmetries such that a discrete set of parameter values, e.g. \(\theta = -1\) or \(\theta = 1\), give identical model predictions. This tends to happen with inappropriate use of the “absolute value” function (abs) on a quantity that can be either negative or positive, potentially creating 2 separate modes (a short sketch after the next paragraph illustrates this). This problem can sometimes be fixed by using appropriate parameter bounds, e.g. \(\theta \geq 0\), in the model.
  3. The data is insufficient to learn some parameters’ values but more data would have been sufficient.

The first 2 problems are problems in the model, describing structural non-identifiability. The third problem is a problem in the data, or a data-model mismatch, describing practical non-identifiability (sometimes called non-estimability).
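To make problem 2 concrete, here is a minimal sketch in plain Julia of a hypothetical one-parameter model in which the parameter enters the predictions only through abs, so \(\theta = 1\) and \(\theta = -1\) produce identical predictions:

t = 0.0:0.5:5.0
# hypothetical prediction function: the parameter θ only enters through abs(θ),
# so its sign can never be recovered from data
predict_conc(θ) = @. exp(-abs(θ) * t)
predict_conc(1.0) == predict_conc(-1.0)   # true: two disconnected parameter values, same predictions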

Type of Identifiability | Description
Globally Structurally Identifiable | Every set of parameter values \(\theta\) makes a unique model prediction \(\mu(\theta)\).
Globally Practically Identifiable | Globally structurally identifiable and there is enough data to estimate the data-generating parameter values.
Locally Structurally Identifiable | Only a discrete set of disconnected parameter values \(\{\theta_1, \theta_2, \dots, \theta_m\}\) can result in the same model prediction \(\mu\), i.e. \(\mu(\theta_1) = \mu(\theta_2) = \dots = \mu(\theta_m)\). However, in a neighborhood \(N(\theta)\) of each such \(\theta\), every \(\theta' \in N(\theta)\) results in a unique model prediction \(\mu(\theta')\).
Locally Practically Identifiable | Locally structurally identifiable and there is enough data to estimate the (potentially non-unique but disconnected) data-generating parameter values.
Key Points
  • Local (or global) practical identifiability is what we usually want in analyses.
  • Practical identifiability (estimability) implies structural identifiability.
Poor Identifiability

When a model is identifiable in exact arithmetic but its identifiability is sensitive to numerical errors in the computations, we call it poorly identifiable. When the term “poorly identifiable models” is used in the rest of this tutorial, it also includes truly non-identifiable models.

2 Example

Let’s consider an example of fitting a poorly identifiable model.

2.1 Loading Pumas

First, let’s load Pumas:

using Pumas

2.2 Two Compartment Model

Now let’s define a single-subject, 2-compartment model with depot, central, and peripheral compartments and a proportional error model.

model = @model begin
    @param begin
        θ ∈ VectorDomain(lower = zeros(5))
        σ ∈ RealDomain(lower = 0.0)
    end
    @pre begin
        CL = θ[1]
        Vc = θ[2]
        Ka = θ[3]
        Vp = θ[4]
        Q = θ[5]
    end
    @dynamics begin
        Depot' = -Ka * Depot
        Central' = Ka * Depot - (CL + Q) / Vc * Central + Q / Vp * Peripheral
        Peripheral' = Q / Vc * Central - Q / Vp * Peripheral
    end
    @derived begin
        cp := @. Central / Vc
        dv ~ @. Normal(cp, abs(cp) * σ + 1e-6)
    end
end
PumasModel
  Parameters: θ, σ
  Random effects: 
  Covariates: 
  Dynamical system variables: Depot, Central, Peripheral
  Dynamical system type: Matrix exponential
  Derived: dv
  Observed: dv
Subject Model vs Population Model

Note that studying the identifiability of subject models is still relevant when fitting a population version of the model with typical values and random effects. The reason is that when evaluating the log likelihood using the Laplace method or first-order conditional estimation (FOCE), one of the steps involved is finding the empirical Bayes estimates (EBEs), which essentially amounts to fitting a subject-level version of the model: the population parameters are fixed and the random effects are estimated. Non-identifiability at this level can cause the EBE estimation to fail or to be so numerically unstable that the population fit itself fails, because the marginal likelihood (and its gradient) computed with Laplace/FOCE relies on those incorrect or numerically unstable EBEs.
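For illustration, the following is a minimal sketch (not used in the rest of this tutorial) of what a population version of this model could look like, assuming log-normal random effects on CL and Vc; the choice of random effects and the diagonal Ω domain are illustrative assumptions only. Fitting such a model with FOCE() or LaplaceI() repeatedly solves a subject-level estimation problem very similar to the one analyzed here.

poppk_model = @model begin
    @param begin
        θ ∈ VectorDomain(lower = zeros(5))
        Ω ∈ PDiagDomain(2)
        σ ∈ RealDomain(lower = 0.0)
    end
    @random begin
        # subject-level random effects; EBE estimation finds their mode per subject
        η ~ MvNormal(Ω)
    end
    @pre begin
        CL = θ[1] * exp(η[1])
        Vc = θ[2] * exp(η[2])
        Ka = θ[3]
        Vp = θ[4]
        Q = θ[5]
    end
    @dynamics begin
        Depot' = -Ka * Depot
        Central' = Ka * Depot - (CL + Q) / Vc * Central + Q / Vp * Peripheral
        Peripheral' = Q / Vc * Central - Q / Vp * Peripheral
    end
    @derived begin
        cp := @. Central / Vc
        dv ~ @. Normal(cp, abs(cp) * σ + 1e-6)
    end
end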

2.3 Parameter Values

Let’s define some parameter values to use for simulation.

params = (; θ = [35, 100, 0.5, 210, 30], σ = 0.1)
(θ = [35.0, 100.0, 0.5, 210.0, 30.0],
 σ = 0.1,)

2.4 Subject Definition

Next we define a subject skeleton with a single bolus dose and no observations.

skeleton = Subject(
    id = 1,
    time = 0.0:0.5:30.0,
    events = DosageRegimen(3000, time = 0.0, cmt = 1),
    observations = (; dv = nothing),
)
Subject
  ID: 1
  Events: 1
  Observations: dv: (n=61)

2.5 Fisher Information Matrix

The expected Fisher Information Matrix (FIM) is an important diagnostic which can be used to detect local practical identifiability (LPI) given the model and the experiment design [1]. The first order approximation of the expected FIM [2,3] has also been used successfully to analyze the LPI of pharmacometric nonlinear mixed effect (NLME) models [4,5,6].

The positive definiteness (non-singularity) of the expected FIM \(F(\theta)\) at parameters \(\theta\) is a sufficient condition for LPI at \(\theta\). Under stricter assumptions that are more difficult to verify, the positive definiteness of \(F(\theta)\) is even a necessary condition for LPI [1].
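For reference, the expected FIM is the expected outer product of the score (the gradient of the log likelihood), with the expectation taken over data simulated from the model at \(\theta\) (this is the standard definition, stated here only for context):

\[ F(\theta) = \mathbb{E}_{y \sim p(y \mid \theta)} \Big[ \nabla_\theta \log L(\theta; y) \, \nabla_\theta \log L(\theta; y)^\top \Big] \]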

To compute a first-order approximation of the expected FIM, we will use the OptimalDesign package:

using OptimalDesign

times = OptimalDesign.ObsTimes(skeleton.time)
F = OptimalDesign.fim(model, [skeleton], params, [times], FO())
6×6 Symmetric{Float64, Matrix{Float64}}:
  1.29489    -0.0538762     13.6934     0.0254882   -0.218694      -2.68028
 -0.0538762   0.0188821     -4.38015   -0.00168545   0.0296863     -0.325336
 13.6934     -4.38015     1054.71       0.635262    -7.13309       66.7482
  0.0254882  -0.00168545     0.635262   0.00215196  -0.00487912    -0.0895271
 -0.218694    0.0296863     -7.13309   -0.00487912   0.144604      -0.720992
 -2.68028    -0.325336      66.7482    -0.0895271   -0.720992    1667.74

2.6 Procedure for Detecting Practical Non-Identifiability

Next, let’s do an eigenvalue decomposition of F to find the smallest eigenvalue. The smallest eigenvalue is always the first one because the eigenvalues are returned in ascending order.

E = eigen(F)
E.values[1]
9.646842560443467e-5

It is close to 0! This implies that the matrix is very close to being singular. While this is not strictly a proof of local non-identifiability, it is one step towards detecting non-identifiability.
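A related quick check, using the decomposition we already computed, is the ratio of the largest to the smallest eigenvalue (the condition number of F); a very large value indicates that F is nearly singular:

# ratio of the largest to the smallest eigenvalue of F;
# a very large value indicates near-singularity
E.values[end] / E.values[1]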

Tip

The non-singularity of the expected FIM is generally only a sufficient (not necessary) condition for local identifiability. So there are some locally identifiable models that have a singular or undefined expected FIM. However, there is a subclass of models satisfying strict assumptions for which the non-singularity of the expected FIM is a necessary condition for local identifiability [1]. In practice, these assumptions are difficult to verify for a general model but if our model happens to satisfy these assumptions, then a singular expected FIM would imply local non-identifiability. In that case, the eigenvector(s) corresponding to the 0 eigenvalue would be useful diagnostics as they point in the directions along which changes in the parameters will have no effect on the log likelihood.

To summarize:

  • Being “Non-Singular” Isn’t Always a Must: A non-singular expected FIM always signals that a model is “locally identifiable” (meaning you can find a locally unique solution for its parameters). However, some locally identifiable models can still have a singular or undefined expected FIM.
  • Sometimes, It’s Absolutely Necessary: There is a special class of models for which a non-singular expected FIM is required for local identifiability. It is hard to know whether your model falls into this class.
  • Practical Use: Even if your expected FIM is singular, it can be a helpful diagnostic. The eigenvectors of the FIM with 0 eigenvalues show you directions along which changing the model’s parameters may not affect the model’s predictions – this can help you pinpoint where the model is fuzzy.

To prove that a model is practically non-identifiable, it suffices to find a single set \(\Theta\) such that for all parameters \(\theta \in \Theta\), the log likelihood is unchanged. To find \(\Theta\) numerically, one can

  1. Assume parameter values \(\theta_0\),
  2. Define a criterion for changing the parameters \(\theta\) from their current values \(\theta_0\), and use it to create a candidate set of parameters \(\Theta_c\),
  3. Simulate synthetic data using the parameters \(\theta_0\),
  4. Evaluate the log likelihood \(L(\theta)\) for all \(\theta \in \Theta_c\),
  5. Evaluate the sensitivity of the log likelihood to local changes within \(\Theta_c\).

One way to construct a candidate \(\Theta_c\) is as the set \(\{\theta_0 + \alpha \cdot d : \alpha \in [-\epsilon, \epsilon] \}\) for a small \(\epsilon > 0\), where \(d\) is an eigenvector corresponding to the smallest eigenvalue of the expected FIM, \(F\). The sensitivity of the log likelihood to changes within \(\Theta_c\) can then be quantified as the average value of:

\[ \Bigg( \frac{L(\theta_0 + \alpha \cdot d) - L(\theta_0)}{\alpha} \Bigg)^2 \]

for all \(\alpha \in [-\epsilon, \epsilon]\), \(\alpha \neq 0\). Let’s follow this procedure for the above model assuming \(\theta_0\) is params.

2.7 Simulating Data

Here we simulate a synthetic subject using the skeleton subject we have. We fix the seed of the pseudo-random number generator for reproducibility.

using Random

rng = Random.default_rng()
Random.seed!(rng, 12345)
pop = [Subject(simobs(model, skeleton, params; rng))]
Population
  Subjects: 1
  Observations: dv

To evaluate the log likelihood of params given pop, we can use the loglikelihood function:

ll0 = loglikelihood(model, pop, params, NaivePooled())
32.82461166030997

Since there are no random effects in this model, we use the NaivePooled() algorithm in Pumas.

2.8 Local Sensitivity Analysis of Log Likelihood

The eigenvectors corresponding to the smallest eigenvalue of F give us the directions that are likely to have the largest standard error in the maximum likelihood estimates. These are the most promising directions to test when constructing candidate sets \(\Theta_c\) for identifiability analysis. To get the eigenvector corresponding to the smallest eigenvalue, you can run:

d = E.vectors[:, 1]
6-element Vector{Float64}:
  0.006717009021476735
  0.8382731175155917
  0.0037560578470219614
 -0.5451738773535086
  0.004939403956445427
 -3.137352344503738e-6

The order of parameters is the same as the order of definition in the model: θ and then σ.

Note that the potential non-identifiability seems to be mostly in the second and fourth parameters, Vc and Vp. More precisely, the above eigenvector implies that simultaneously increasing Vc and decreasing Vp (or vice versa) by the ratios given in d may have little to no effect on the log likelihood.

Let’s try to add α * d[1:end-1] to params.θ (i.e. move params.θ in the direction d[1:end-1]) and evaluate the log likelihood for different step sizes α. To do that, we will first define a function that moves params.θ and call it on different step values α. These choices of α values correspond to a discrete candidate set \(\Theta_c\).

function moveθ(α)
    # unpacking the fields of params to variables with the same names
    (; θ, σ) = params
    # move θ by α * d[1:end-1]
    return (; θ = θ + α * d[1:end-1], σ)
end
# move by both negative and positive steps (excluding α = 0)
αs = vcat(-1e-3 .* (1:10), 1e-3 .* (1:10))
20-element Vector{Float64}:
 -0.001
 -0.002
 -0.003
 -0.004
 -0.005
 -0.006
 -0.007
 -0.008
 -0.009
 -0.01
  0.001
  0.002
  0.003
  0.004
  0.005
  0.006
  0.007
  0.008
  0.009
  0.01
newparams = moveθ.(αs)
20-element Vector{NamedTuple{(:θ, :σ), Tuple{Vector{Float64}, Float64}}}:
 (θ = [34.99999328299098, 99.99916172688249, 0.499996243942153, 210.00054517387736, 29.999995060596042], σ = 0.1)
 (θ = [34.999986565981956, 99.99832345376497, 0.49999248788430595, 210.0010903477547, 29.999990121192088], σ = 0.1)
 (θ = [34.999979848972934, 99.99748518064746, 0.4999887318264589, 210.00163552163207, 29.99998518178813], σ = 0.1)
 (θ = [34.99997313196391, 99.99664690752994, 0.4999849757686119, 210.00218069550942, 29.999980242384176], σ = 0.1)
 (θ = [34.99996641495489, 99.99580863441243, 0.49998121971076487, 210.00272586938678, 29.999975302980218], σ = 0.1)
 (θ = [34.99995969794587, 99.9949703612949, 0.49997746365291784, 210.00327104326414, 29.99997036357626], σ = 0.1)
 (θ = [34.99995298093685, 99.9941320881774, 0.49997370759507087, 210.00381621714146, 29.999965424172306], σ = 0.1)
 (θ = [34.999946263927825, 99.99329381505987, 0.49996995153722384, 210.00436139101882, 29.999960484768348], σ = 0.1)
 (θ = [34.99993954691881, 99.99245554194236, 0.4999661954793768, 210.00490656489617, 29.999955545364394], σ = 0.1)
 (θ = [34.99993282990979, 99.99161726882484, 0.4999624394215298, 210.00545173877353, 29.999950605960436], σ = 0.1)
 (θ = [35.00000671700902, 100.00083827311751, 0.500003756057847, 209.99945482612264, 30.000004939403958], σ = 0.1)
 (θ = [35.000013434018044, 100.00167654623503, 0.500007512115694, 209.9989096522453, 30.000009878807912], σ = 0.1)
 (θ = [35.000020151027066, 100.00251481935254, 0.500011268173541, 209.99836447836793, 30.00001481821187], σ = 0.1)
 (θ = [35.00002686803609, 100.00335309247006, 0.5000150242313881, 209.99781930449058, 30.000019757615824], σ = 0.1)
 (θ = [35.00003358504511, 100.00419136558757, 0.5000187802892351, 209.99727413061322, 30.000024697019782], σ = 0.1)
 (θ = [35.00004030205413, 100.0050296387051, 0.5000225363470822, 209.99672895673586, 30.00002963642374], σ = 0.1)
 (θ = [35.00004701906315, 100.0058679118226, 0.5000262924049291, 209.99618378285854, 30.000034575827694], σ = 0.1)
 (θ = [35.000053736072175, 100.00670618494013, 0.5000300484627762, 209.99563860898118, 30.000039515231652], σ = 0.1)
 (θ = [35.00006045308119, 100.00754445805764, 0.5000338045206232, 209.99509343510383, 30.000044454635606], σ = 0.1)
 (θ = [35.00006717009021, 100.00838273117516, 0.5000375605784703, 209.99454826122647, 30.000049394039564], σ = 0.1)

Now let’s evaluate the log likelihoods of all these parameter sets.

lls = map(newparams) do p
    loglikelihood(model, pop, p, NaivePooled())
end
20-element Vector{Float64}:
 32.82463258809266
 32.824653516776316
 32.82467444599981
 32.824695375686
 32.8247163049834
 32.82473723581962
 32.82475816602141
 32.82477909749674
 32.82480002929137
 32.824820961111435
 32.82459073300738
 32.82456980632097
 32.8245488798121
 32.82452795380517
 32.82450702699767
 32.824486102815484
 32.824465177567234
 32.82444425406181
 32.824423330072776
 32.824402406539846

To compute the average sensitivity within \(\Theta_c\), we then call:

sens = abs2.((lls .- ll0) ./ αs)
mean(sens)
0.0004379688924138595

Not very sensitive! Therefore, we can conclude that the log likelihood is almost the same inside \(\Theta_c\). This is a strong sign of non-identifiability, or at least poor identifiability.

Let’s contrast this with a random direction d2:

d2 = normalize(rand(6))
function moveθ2(α)
    # unpacking the fields of params to variables with the same names
    (; θ, σ) = params
    # move θ by α * d2[1:end-1]
    return (; θ = θ + α * d2[1:end-1], σ)
end
newparams2 = moveθ2.(αs)
lls2 = map(newparams2) do p
    loglikelihood(model, pop, p, NaivePooled())
end
sens2 = abs2.((lls2 .- ll0) ./ αs)
mean(sens2)
2303.7233253332915

Notice how much larger the sensitivity is along a random direction compared to the direction d.

Note

In practice, it can be difficult to be definitive about practical non-identifiability with numerical tests due to the nature of computation in floating point numbers where numerical errors can accumulate and either:

  1. Mask a truly singular matrix by reporting its smallest eigenvalue as very close to 0 but not exactly 0 in floating point numbers, or
  2. Make a truly non-singular matrix appear singular because its smallest eigenvalue was close to 0 in exact arithmetic but was computed as exactly 0 in floating point arithmetic.

A small enough average local sensitivity can therefore be taken as numerically equivalent to 0, i.e. local practical non-identifiability. To be more general, we will sometimes use the term poor identifiability to refer to the case when the model is approximately non-identifiable.

3 Fitting a Poorly Identifiable Model

3.1 Maximum Likelihood Estimation

When using maximum likelihood (ML) estimation to fit a poorly identifiable model, the parameter values you get can be dependent on arbitrary factors such as:

  • The initial parameter estimates. Optimization algorithms will typically converge to values close to the initial values, which are arbitrary.
  • Level of noise in the data. Different levels of noise in the data can cause the optimization algorithm to take different trajectories reaching different optimal parameter values at the end.
  • The implementation details of the optimization algorithm. For example, some optimization algorithms implicitly favor parameter values with the smallest norm.

Most of these factors have no statistical meaning and can be considered arbitrary in any analysis. Therefore, any insights drawn from poorly identifiable parameter values fitted with ML estimation may be flawed. Luckily, standard error estimation, if done right, will often reveal signs of poor identifiability. However, common techniques for estimating standard errors can often break down when the model is poorly identifiable. For instance,

  1. Asymptotic estimates of standard errors require the model to be locally identifiable,
  2. Bootstrap relies on the same arbitrary optimization algorithm for fitting the data resamples, so its estimates are as unreliable as the ML estimates. In addition, since all the re-fits in bootstrapping are typically initialized from the ML estimates, there is a high probability that the optimization algorithm will only converge to nearby values, significantly under-estimating the variance in the ML estimates.
  3. Sampling importance resampling (SIR) may be able to handle non-identifiable models but it requires a good proposal and its results are sensitive to the proposal used.
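For reference, these standard error workflows are typically invoked in Pumas roughly as sketched below. This is only a sketch: the Bootstrap and SIR option names and keyword arguments are assumptions based on recent Pumas versions and should be checked against your installed version. For a poorly identifiable model, the asymptotic route may warn about or fail on a near-singular information matrix.

# sketch of common standard error workflows (option names assumed; check your Pumas version)
fpm = fit(model, pop, params, NaivePooled())
inf_asymptotic = infer(fpm)                                        # asymptotic variance-covariance
inf_bootstrap = infer(fpm, Pumas.Bootstrap(samples = 200))         # bootstrap re-fits
inf_sir = infer(fpm, Pumas.SIR(samples = 1000, resamples = 200))   # sampling importance resampling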

Example: Sensitivity to Initial Estimates

Let’s fit the model to the same subject but with 2 different sets of initial estimates:

params1 = (; θ = [35, 100, 0.5, 210, 30], σ = 0.1)
fpm1 = fit(model, pop, params1, NaivePooled())
[ Info: Checking the initial parameter values.
[ Info: The initial negative log likelihood and its gradient are finite. Check passed.
Iter     Function value   Gradient norm 
     0    -3.282461e+01     5.896801e+01
 * time: 0.017616987228393555
     1    -3.299541e+01     8.371540e+01
 * time: 0.38783979415893555
     2    -3.440867e+01     3.937303e+01
 * time: 0.3882269859313965
     3    -3.550285e+01     3.429323e+01
 * time: 0.38855981826782227
     4    -3.585802e+01     2.078146e+01
 * time: 0.38888096809387207
     5    -3.605801e+01     4.208231e+00
 * time: 0.38919687271118164
     6    -3.606867e+01     5.500193e+00
 * time: 0.389462947845459
     7    -3.607820e+01     5.143560e-01
 * time: 0.3897368907928467
     8    -3.607823e+01     1.670014e-01
 * time: 0.39000582695007324
     9    -3.607823e+01     6.594682e-03
 * time: 0.3902769088745117
    10    -3.607823e+01     3.931879e-03
 * time: 0.3905508518218994
    11    -3.607823e+01     5.365673e-02
 * time: 0.39081692695617676
    12    -3.607823e+01     2.875459e-02
 * time: 0.3910808563232422
    13    -3.607823e+01     3.163451e-03
 * time: 0.39134693145751953
    14    -3.607823e+01     1.189333e-04
 * time: 0.39162182807922363
FittedPumasModel

Successful minimization:                      true

Likelihood approximation:              NaivePooled
Likelihood Optimizer:                         BFGS
Dynamical system type:          Matrix exponential

Log-likelihood value:                    36.078234
Number of subjects:                              1
Number of parameters:         Fixed      Optimized
                                  0              6
Observation records:         Active        Missing
    dv:                          61              0
    Total:                       61              0

-----------------
       Estimate
-----------------
θ₁     34.397
θ₂    103.17
θ₃      0.46566
θ₄    204.44
θ₅     26.234
σ       0.092273
-----------------
params2 = (; θ = [10.0, 300, 1.0, 10.0, 5], σ = 0.2)
fpm2 = fit(model, pop, params2, NaivePooled())
[ Info: Checking the initial parameter values.
[ Info: The initial negative log likelihood and its gradient are finite. Check passed.
Iter     Function value   Gradient norm 
     0     4.245135e+02     6.802168e+02
 * time: 1.6927719116210938e-5
     1     1.499850e+02     4.520555e+01
 * time: 0.0003380775451660156
     2     1.436201e+02     4.485781e+01
 * time: 0.0006229877471923828
     3     1.108786e+02     4.698780e+01
 * time: 0.0009000301361083984
     4     9.146731e+01     4.889574e+01
 * time: 0.0011830329895019531
     5     8.587677e+01     4.879782e+01
 * time: 0.0015139579772949219
     6     8.353852e+01     5.068832e+01
 * time: 0.0018360614776611328
     7     8.061237e+01     4.922992e+01
 * time: 0.0021741390228271484
     8     7.723482e+01     3.939427e+01
 * time: 0.0025119781494140625
     9     7.347395e+01     6.732180e+01
 * time: 0.0027921199798583984
    10     6.467070e+01     2.181937e+01
 * time: 0.003072023391723633
    11     5.389941e+01     4.260986e+01
 * time: 0.0034050941467285156
    12     4.899559e+01     4.746853e+01
 * time: 0.003734111785888672
    13     4.636064e+01     4.389414e+01
 * time: 0.0040130615234375
    14     4.311120e+01     3.667380e+01
 * time: 0.004302024841308594
    15     3.452695e+01     4.210878e+01
 * time: 0.0046350955963134766
    16     2.786722e+01     4.181505e+01
 * time: 0.004975080490112305
    17     2.167865e+01     2.502622e+01
 * time: 0.0053179264068603516
    18     2.102586e+01     3.073394e+01
 * time: 0.005608081817626953
    19     2.012361e+01     9.155807e+00
 * time: 0.0058879852294921875
    20     1.993999e+01     8.931312e+00
 * time: 0.0061740875244140625
    21     1.899136e+01     3.290293e+01
 * time: 0.006461143493652344
    22     1.831295e+01     1.555682e+01
 * time: 0.0067441463470458984
    23     1.764099e+01     4.156266e+00
 * time: 0.007030963897705078
    24     1.734278e+01     4.764555e+00
 * time: 0.0073299407958984375
    25     1.692204e+01     5.545118e+00
 * time: 0.007612943649291992
    26     1.576408e+01     6.312614e+00
 * time: 0.007970094680786133
    27     1.433865e+01     8.298700e+00
 * time: 0.008343935012817383
    28     1.402269e+01     1.935485e+01
 * time: 0.008701086044311523
    29     1.388981e+01     5.313929e+00
 * time: 0.009005069732666016
    30     1.380201e+01     6.598755e+00
 * time: 0.009318113327026367
    31     1.361376e+01     1.819850e+01
 * time: 0.009626150131225586
    32     1.307080e+01     3.715701e+01
 * time: 0.00993204116821289
    33     1.084752e+01     4.989789e+01
 * time: 0.010254144668579102
    34     9.107761e+00     3.568940e+01
 * time: 0.010848045349121094
    35     8.799303e+00     1.470967e+02
 * time: 0.011271953582763672
    36     6.685368e+00     4.816807e+01
 * time: 0.011603116989135742
    37     4.048434e+00     5.757686e+01
 * time: 0.011934041976928711
    38    -4.570968e+00     4.010359e+01
 * time: 0.020991086959838867
    39    -6.951249e+00     9.481085e+01
 * time: 0.02141714096069336
    40    -8.241067e+00     2.924279e+01
 * time: 0.021824121475219727
    41    -9.267379e+00     1.592312e+01
 * time: 0.022236108779907227
    42    -9.300539e+00     6.235892e+00
 * time: 0.0225830078125
    43    -9.408955e+00     4.087560e+00
 * time: 0.0229189395904541
    44    -9.420626e+00     5.730458e+00
 * time: 0.02326202392578125
    45    -9.435635e+00     8.693778e+00
 * time: 0.023600101470947266
    46    -9.439794e+00     8.483929e+00
 * time: 0.023941993713378906
    47    -9.443618e+00     7.807829e+00
 * time: 0.02428603172302246
    48    -9.453743e+00     6.452476e+00
 * time: 0.024625062942504883
    49    -9.479753e+00     3.781804e+00
 * time: 0.02496504783630371
    50    -9.545875e+00     5.189641e+00
 * time: 0.02530503273010254
    51    -9.621161e+00     2.998542e+01
 * time: 0.025640010833740234
    52    -9.788354e+00     2.377897e+01
 * time: 0.025974035263061523
    53    -9.957783e+00     2.804857e+01
 * time: 0.02637505531311035
    54    -1.016832e+01     2.937980e+01
 * time: 0.02676701545715332
    55    -1.046535e+01     3.264929e+01
 * time: 0.027166128158569336
    56    -1.071184e+01     7.422024e+00
 * time: 0.02748703956604004
    57    -1.074836e+01     1.458278e+01
 * time: 0.027866125106811523
    58    -1.088324e+01     1.552006e+01
 * time: 0.028258085250854492
    59    -1.094332e+01     1.034929e+01
 * time: 0.0285799503326416
    60    -1.100763e+01     9.505009e+00
 * time: 0.02889704704284668
    61    -1.101755e+01     1.637700e+00
 * time: 0.029226064682006836
    62    -1.101926e+01     9.140574e-01
 * time: 0.029551029205322266
    63    -1.101993e+01     3.661381e-01
 * time: 0.02987194061279297
    64    -1.102009e+01     1.784483e-01
 * time: 0.030199050903320312
    65    -1.102013e+01     2.946407e-02
 * time: 0.03052210807800293
    66    -1.102013e+01     3.293414e-03
 * time: 0.030843019485473633
    67    -1.102013e+01     5.169943e-04
 * time: 0.031167030334472656
FittedPumasModel

Successful minimization:                      true

Likelihood approximation:              NaivePooled
Likelihood Optimizer:                         BFGS
Dynamical system type:          Matrix exponential

Log-likelihood value:                    11.020132
Number of subjects:                              1
Number of parameters:         Fixed      Optimized
                                  0              6
Observation records:         Active        Missing
    dv:                          61              0
    Total:                       61              0

----------------
       Estimate
----------------
θ₁     30.963
θ₂    214.26
θ₃     55.46
θ₄    169.43
θ₅     12.071
σ       0.13904
----------------

Now let’s display the estimates of θ side by side:

hcat(coef(fpm1).θ, coef(fpm2).θ)
5×2 Matrix{Float64}:
  34.3968     30.9631
 103.168     214.264
   0.465659   55.4603
 204.442     169.429
  26.2337     12.0709

Notice the significant difference in the final estimates when changing nothing but the initial estimates. Also note that the 2 sets of coefficients are not in the same neighbourhood and do not have similar log likelihoods. This is indicative of the existence of multiple disconnected local optima.
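As a quick check, the log likelihoods of the two fitted results can be compared directly; calling loglikelihood on a fitted result returns the value reported in the fit summaries above:

# log likelihoods of the two fits (reported as roughly 36.08 and 11.02 in the summaries above)
loglikelihood(fpm1), loglikelihood(fpm2)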

Example: Sensitivity to Noise Level

To demonstrate the sensitivity to noise level, we will re-simulate the synthetic subject from the same pseudo-random number generator and seed but using a higher σ.

rng = Random.default_rng()
Random.seed!(rng, 12345)
newparams = (; θ = params.θ, σ = 0.2)
newpop = [Subject(simobs(model, skeleton, newparams; rng))]
Population
  Subjects: 1
  Observations: dv

Now let’s do the fit once with pop and once with newpop:

fpm1 = fit(model, pop, params, NaivePooled())
[ Info: Checking the initial parameter values.
[ Info: The initial negative log likelihood and its gradient are finite. Check passed.
Iter     Function value   Gradient norm 
     0    -3.282461e+01     5.896801e+01
 * time: 1.5020370483398438e-5
     1    -3.299541e+01     8.371540e+01
 * time: 0.00043201446533203125
     2    -3.440867e+01     3.937303e+01
 * time: 0.0007669925689697266
     3    -3.550285e+01     3.429323e+01
 * time: 0.0010879039764404297
     4    -3.585802e+01     2.078146e+01
 * time: 0.0014069080352783203
     5    -3.605801e+01     4.208231e+00
 * time: 0.0017299652099609375
     6    -3.606867e+01     5.500193e+00
 * time: 0.0019979476928710938
     7    -3.607820e+01     5.143560e-01
 * time: 0.0022640228271484375
     8    -3.607823e+01     1.670014e-01
 * time: 0.0025348663330078125
     9    -3.607823e+01     6.594682e-03
 * time: 0.0028090476989746094
    10    -3.607823e+01     3.931879e-03
 * time: 0.0030748844146728516
    11    -3.607823e+01     5.365673e-02
 * time: 0.0033409595489501953
    12    -3.607823e+01     2.875459e-02
 * time: 0.003612041473388672
    13    -3.607823e+01     3.163451e-03
 * time: 0.003882884979248047
    14    -3.607823e+01     1.189333e-04
 * time: 0.004149913787841797
FittedPumasModel

Successful minimization:                      true

Likelihood approximation:              NaivePooled
Likelihood Optimizer:                         BFGS
Dynamical system type:          Matrix exponential

Log-likelihood value:                    36.078234
Number of subjects:                              1
Number of parameters:         Fixed      Optimized
                                  0              6
Observation records:         Active        Missing
    dv:                          61              0
    Total:                       61              0

-----------------
       Estimate
-----------------
θ₁     34.397
θ₂    103.17
θ₃      0.46566
θ₄    204.44
θ₅     26.234
σ       0.092273
-----------------
fpm2 = fit(model, newpop, newparams, NaivePooled())
[ Info: Checking the initial parameter values.
[ Info: The initial negative log likelihood and its gradient are finite. Check passed.
Iter     Function value   Gradient norm 
     0     8.763976e+00     2.735808e+01
 * time: 1.2874603271484375e-5
     1     8.405325e+00     2.531358e+01
 * time: 0.00041604042053222656
     2     6.620382e+00     1.286061e+01
 * time: 0.0007419586181640625
     3     5.297438e+00     8.779519e+00
 * time: 0.0010590553283691406
     4     4.963913e+00     3.840318e+00
 * time: 0.0013730525970458984
     5     4.930293e+00     1.184668e+01
 * time: 0.0016579627990722656
     6     4.873975e+00     3.753008e+00
 * time: 0.0019199848175048828
     7     4.858307e+00     2.308575e+00
 * time: 0.002234935760498047
     8     4.854947e+00     8.434789e-02
 * time: 0.0025060176849365234
     9     4.854713e+00     7.173484e-02
 * time: 0.0027718544006347656
    10     4.853126e+00     8.197275e-01
 * time: 0.0030350685119628906
    11     4.851996e+00     1.269989e+00
 * time: 0.0032989978790283203
    12     4.850709e+00     1.179594e+00
 * time: 0.003571033477783203
    13     4.849908e+00     1.616557e-01
 * time: 0.003846883773803711
    14     4.849724e+00     8.747458e-01
 * time: 0.004114866256713867
    15     4.849534e+00     1.703991e-01
 * time: 0.0043909549713134766
    16     4.849431e+00     1.385412e-01
 * time: 0.004658937454223633
    17     4.849400e+00     1.192473e-01
 * time: 0.00492405891418457
    18     4.849395e+00     8.727150e-03
 * time: 0.005188941955566406
    19     4.849394e+00     2.408209e-02
 * time: 0.005465030670166016
    20     4.849394e+00     1.491013e-03
 * time: 0.005732059478759766
    21     4.849394e+00     1.658892e-04
 * time: 0.006000041961669922
FittedPumasModel

Successful minimization:                      true

Likelihood approximation:              NaivePooled
Likelihood Optimizer:                         BFGS
Dynamical system type:          Matrix exponential

Log-likelihood value:                   -4.8493937
Number of subjects:                              1
Number of parameters:         Fixed      Optimized
                                  0              6
Observation records:         Active        Missing
    dv:                          61              0
    Total:                       61              0

----------------
       Estimate
----------------
θ₁     33.85
θ₂    125.52
θ₃      0.50012
θ₄    192.8
θ₅     22.25
σ       0.18103
----------------

Now let’s display the estimates of θ side by side:

hcat(coef(fpm1).θ, coef(fpm2).θ)
5×2 Matrix{Float64}:
  34.3968     33.8499
 103.168     125.517
   0.465659    0.500121
 204.442     192.8
  26.2337     22.2497

Given that we have many observations per subject, this level of fluctuation due to higher noise is a symptom of non-identifiability. Also note the big difference compared to the true data-generating parameter values!

params.θ
5-element Vector{Float64}:
  35.0
 100.0
   0.5
 210.0
  30.0
Summary

In this section, we have demonstrated that ML estimation workflows can be unreliable when fitting poorly identifiable models, potentially leading to erroneous conclusions in a study. So if we cannot rely on ML estimation for fitting poorly identifiable models, what can we use? The answer is Bayesian inference.

3.2 Bayesian Inference

Using Bayesian methods to sample from the posterior of the model parameters is a mathematically sound way to fit non-identifiable models because even non-identifiable models have a well-defined posterior distribution when their parameters are assigned prior distributions. If multiple parameter values all fit the data well, then all such values will be plausible samples from the posterior distribution, assuming reasonable priors were used.

3.2.1 Model Definition

Let’s see how to minimally change our model above to make it Bayesian, using weakly informative priors that are roughly in the right ballpark:

bayes_model = @model begin
    @param begin
        θ ~ MvLogNormal([log(50), log(150), log(1.0), log(100.0), log(10.0)], I(5))
        σ ~ Uniform(0.0, 1.0)
    end
    @pre begin
        CL = θ[1]
        Vc = θ[2]
        Ka = θ[3]
        Vp = θ[4]
        Q = θ[5]
    end
    @dynamics begin
        Depot' = -Ka * Depot
        Central' = Ka * Depot - (CL + Q) / Vc * Central + Q / Vp * Peripheral
        Peripheral' = Q / Vc * Central - Q / Vp * Peripheral
    end
    @derived begin
        cp := @. Central / Vc
        dv ~ @. Normal(cp, abs(cp) * σ + 1e-6)
    end
end
PumasModel
  Parameters: θ, σ
  Random effects: 
  Covariates: 
  Dynamical system variables: Depot, Central, Peripheral
  Dynamical system type: Matrix exponential
  Derived: dv
  Observed: dv

3.2.2 Sampling from the Posterior

Now to fit it, we need to pass an instance of BayesMCMC as the algorithm in fit. In this case, we use 4 chains for sampling with 3000 samples per chain, of which the first 1500 samples are used to adapt the mass matrix and step size of the No-U-Turn sampler (NUTS) used in Pumas. All the chains are parallelized using multi-threading.

# Setting the pseudo-random number generator's seed for reproducibility
rng = Pumas.default_rng()
Random.seed!(54321)
bayes_alg = BayesMCMC(;
    nsamples = 3000,
    nadapts = 1500,
    nchains = 4,
    ensemblealg = EnsembleThreads(),
    rng,
)
bayes_fpm = fit(bayes_model, pop, params, bayes_alg)
[ Info: Checking the initial parameter values.
[ Info: Checking the initial parameter values.
[ Info: Checking the initial parameter values.
[ Info: Checking the initial parameter values.
[ Info: The initial log probability and its gradient are finite. Check passed.
[ Info: The initial log probability and its gradient are finite. Check passed.
[ Info: The initial log probability and its gradient are finite. Check passed.
[ Info: The initial log probability and its gradient are finite. Check passed.
Chains MCMC chain (3000×6×4 Array{Float64, 3}):

Iterations        = 1:1:3000
Number of chains  = 4
Samples per chain = 3000
Wall duration     = 33.16 seconds
Compute duration  = 129.51 seconds
parameters        = θ₁, θ₂, θ₃, θ₄, θ₅, σ

Summary Statistics
  parameters       mean       std      mcse    ess_bulk    ess_tail      rhat  ⋯
      Symbol    Float64   Float64   Float64     Float64     Float64   Float64  ⋯

          θ₁    34.2579    0.6523    0.0092   5497.7880   4897.9866    1.0002  ⋯
          θ₂   139.3610   34.9101    0.7427   2271.2729   3797.8170    1.0030  ⋯
          θ₃     0.6537    0.1843    0.0038   2259.2549   3751.1489    1.0032  ⋯
          θ₄   190.1282   18.9562    0.3812   2672.7263   4119.4005    1.0037  ⋯
          θ₅    23.2124    3.0041    0.0575   2761.8348   4790.9687    1.0010  ⋯
           σ     0.0983    0.0096    0.0001   5778.0016   5790.5379    1.0001  ⋯
                                                                1 column omitted

Quantiles
  parameters       2.5%      25.0%      50.0%      75.0%      97.5%
      Symbol    Float64    Float64    Float64    Float64    Float64

          θ₁    32.9733    33.8287    34.2707    34.6971    35.4815
          θ₂    71.4814   112.2618   143.0335   166.5933   197.8175
          θ₃     0.3497     0.5059     0.6507     0.7827     1.0227
          θ₄   158.9771   176.7059   187.9091   201.6531   231.7550
          θ₅    17.3559    21.0279    23.3891    25.4448    28.5847
           σ     0.0818     0.0915     0.0976     0.1043     0.1189

Now let’s discard the NUTS warmup samples as burn-in:

bayes_fpm_samples = Pumas.discard(bayes_fpm, burnin = 1500)
Chains MCMC chain (1500×6×4 Array{Float64, 3}):

Iterations        = 1:1:1500
Number of chains  = 4
Samples per chain = 1500
Wall duration     = 33.16 seconds
Compute duration  = 129.51 seconds
parameters        = θ₁, θ₂, θ₃, θ₄, θ₅, σ

Summary Statistics
  parameters       mean       std      mcse    ess_bulk    ess_tail      rhat  ⋯
      Symbol    Float64   Float64   Float64     Float64     Float64   Float64  ⋯

          θ₁    34.2560    0.6637    0.0136   2689.5162   2177.6258    1.0013  ⋯
          θ₂   138.3873   35.5413    1.0641   1137.7098   2112.8005    1.0054  ⋯
          θ₃     0.6496    0.1877    0.0056   1128.0835   1998.2873    1.0055  ⋯
          θ₄   190.6384   19.5473    0.5618   1342.1071   2191.5933    1.0064  ⋯
          θ₅    23.2342    3.0175    0.0841   1306.4230   2292.9183    1.0021  ⋯
           σ     0.0981    0.0095    0.0002   3175.2050   3222.1150    1.0007  ⋯
                                                                1 column omitted

Quantiles
  parameters       2.5%      25.0%      50.0%      75.0%      97.5%
      Symbol    Float64    Float64    Float64    Float64    Float64

          θ₁    32.9492    33.8325    34.2669    34.6935    35.5236
          θ₂    70.6965   110.2724   142.5154   166.1878   198.3443
          θ₃     0.3477     0.4948     0.6441     0.7822     1.0232
          θ₄   159.0135   176.6449   188.4302   202.5296   232.9439
          θ₅    17.3025    21.0489    23.4438    25.4931    28.5847
           σ     0.0817     0.0914     0.0975     0.1039     0.1188

Notice how Bayesian inference was able to quantify the uncertainty in the poorly identifiable parameters (mostly θ₂, which is Vc, and θ₄, which is Vp), reflected in the large standard deviation (the std column) of the marginal posterior of these parameters relative to their mean values (the mean column). This is consistent with the eigenvector d we used earlier to show that the model is poorly identifiable, which also indicated that the poor identifiability was mostly in these 2 parameters.

d
6-element Vector{Float64}:
  0.006717009021476735
  0.8382731175155917
  0.0037560578470219614
 -0.5451738773535086
  0.004939403956445427
 -3.137352344503738e-6
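To see this more directly, we can estimate the posterior correlation between Vc and Vp from the samples. This is a small sketch assuming that mean(f, bayes_fpm_samples) averages a function of the parameters over the posterior samples (the same do-block pattern used later in this tutorial); a correlation close to -1 is consistent with the direction d.

# posterior means, variances and covariance of θ[2] (Vc) and θ[4] (Vp)
m_vc = mean(p -> p.θ[2], bayes_fpm_samples)
m_vp = mean(p -> p.θ[4], bayes_fpm_samples)
v_vc = mean(p -> (p.θ[2] - m_vc)^2, bayes_fpm_samples)
v_vp = mean(p -> (p.θ[4] - m_vp)^2, bayes_fpm_samples)
c_vcvp = mean(p -> (p.θ[2] - m_vc) * (p.θ[4] - m_vp), bayes_fpm_samples)
# posterior correlation; a value close to -1 indicates a ridge along which
# increasing Vc while decreasing Vp leaves the fit almost unchanged
c_vcvp / sqrt(v_vc * v_vp)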

3.2.3 Convergence Diagnostics

By default, we print summary statistics and a few convergence diagnostics: the effective sample size (ess_bulk) and \(\hat{R}\) (rhat). In this case, the diagnostics look reasonable. The rhat is close to 1 and the minimum ess_bulk is around 1000.

Next, we show the trace plot of all the parameters:

using PumasPlots

trace_plot(bayes_fpm_samples; linkyaxes = false)

It can be seen that the chains are mostly well mixed. Occasional jumps seem to happen which may indicate the presence of another mode in the posterior with a relatively small probability mass.

Now let’s look at the auto-correlation plot:

autocor_plot(bayes_fpm_samples; linkyaxes = false)

Some auto-correlation seems to persist in some chains so let’s try some thinning, by keeping only one out of every 5 samples.

thin_bayes_fpm_samples = Pumas.discard(bayes_fpm, burnin = 1500, ratio = 0.2)

autocor_plot(thin_bayes_fpm_samples; linkyaxes = false)

Looks better!

To Thin Or Not To Thin

In general, thinning is not recommended unless there are extremely high levels of auto-correlation in the samples. This is because the thinned samples will always have less information than the full set of samples before thinning. However, we perform thinning in this tutorial for demonstration purposes.

3.2.4 Posterior Predictive Check

Now let’s do a posterior predictive check by first simulating 1000 scenarios from the posterior predictive distribution of the response, including residual noise.

obstimes = 0.0:0.5:30.0
sims = simobs(bayes_fpm_samples; samples = 1000, simulate_error = true, obstimes)
[ Info: Sampling 1000 sample(s) from the posterior predictive distribution of each subject.
Simulated population (Vector{<:Subject})
  Simulated subjects: 1000
  Simulated variables: dv

Then we can create a visual predictive check (VPC) plot using the simulations:

vpc_res = vpc(sims; observations = [:dv], ensemblealg = EnsembleThreads())

vpc_plt = vpc_plot(
    vpc_res;
    simquantile_medians = true,
    observations = true,
    axis = (xlabel = "Time (h)", ylabel = "Concentration", xticks = 0:2:30),
)
[ Info: Detected 1000 scenarios and 1 subjects in the input simulations. Running VPC.
[ Info: Continuous VPC

Summary

With very few changes in the model and a few lines of code, we were able to obtain samples from the full posterior of the parameters of our poorly identifiable model.

4 Uncertainty Propagation, Queries and Decision Making

Just because a model is non-identifiable does not mean that the model is useless or less correct. In fact, more correct models that incorporate more biological processes tend to be non-identifiable because we can only observe/measure very few variables in the model, while simplified models are more likely to be identifiable. Given samples from the posterior of a non-identifiable model, one can do the following:

  • Propagate the uncertainty forward to the predictions to get samples from the posterior predictive distribution, instead of relying on a single prediction using the ML estimates.
Value of Uncertainty Quantification

Parameter uncertainty due to structural non-identifiability will by definition have no effect on the model’s predictions when predicting the observed response. However, uncertainty due to practical non-identifiability, or otherwise insufficient data, can have an impact on the model’s predictions. In this case, basing decisions on the full posterior predictive distribution instead of a single prediction from the ML estimates will make the decisions more robust to parameter uncertainty due to insufficient observations and model misspecification.

  • Ask probabilistic questions given your data. For example, what’s the probability that the drug effect is \(> 0\)? Or what’s the probability that the new drug A is better than the control drug B after only 3 months of data? Or what’s the probability of satisfying a therapeutic criterion for efficacy and safety given the current dose?
  • What-if analysis (aka counter-factual simulation) and dose optimization. For example, you can make predictions assuming the subject is a pediatric patient, using the model parameters’ posterior inferred from an adult’s data. Or you can test different dose levels to select the best dose according to some therapeutic criterion. This can also be done in the non-Bayesian setting.

4.1 Probabilistic Questions

In this section, we show how to

  1. Estimate the probability that a parameter is more than a specific value, and
  2. Estimate the probability that the subject satisfies a desired therapeutic criterion. From there, one can simulate multiple doses and choose the dose that maximizes this probability.

To estimate the probability that CL > 35, we can run:

mean(bayes_fpm_samples) do p
    p.θ[1] > 35
end
0.12133333333333333

To estimate the probability that a subject satisfies a desired therapeutic criterion, we first simulate from the posterior predictive distribution (without residual error):

ipreds = simobs(bayes_fpm_samples; samples = 1000, simulate_error = false, obstimes)
[ Info: Sampling 1000 sample(s) from the posterior predictive distribution of each subject.
Simulated population (Vector{<:Subject})
  Simulated subjects: 1000
  Simulated variables: dv

Next, we can estimate the area-under-curve (auc) and maximum drug concentration (cmax) given the different posterior samples.

using NCA

nca_params = postprocess(ipreds) do gen, obs
    pk_auc = NCA.auc(gen.dv, obstimes)
    pk_cmax = NCA.cmax(gen.dv, obstimes)
    (; pk_auc, pk_cmax)
end
1000-element Vector{NamedTuple{(:pk_auc, :pk_cmax), Tuple{Float64, Float64}}}:
 (pk_auc = 86.91497803014423, pk_cmax = 9.940504338194009)
 (pk_auc = 88.64642467147539, pk_cmax = 9.782946880264959)
 (pk_auc = 83.90202718874055, pk_cmax = 9.34376917423757)
 (pk_auc = 87.39731696675011, pk_cmax = 9.862236694254756)
 (pk_auc = 87.80512090747456, pk_cmax = 9.702278331110739)
 (pk_auc = 89.07773261903851, pk_cmax = 9.758991972882868)
 (pk_auc = 85.1196956238667, pk_cmax = 9.393419839752713)
 (pk_auc = 86.2925531460696, pk_cmax = 9.44919795743169)
 (pk_auc = 89.51877457126756, pk_cmax = 10.28950792797656)
 (pk_auc = 86.59751342516361, pk_cmax = 9.609515382898971)
 ⋮
 (pk_auc = 86.97655904209671, pk_cmax = 9.570386363345873)
 (pk_auc = 86.56471656049247, pk_cmax = 10.07082553150403)
 (pk_auc = 89.764365956303, pk_cmax = 10.211256720983231)
 (pk_auc = 87.62536399390372, pk_cmax = 9.745427734741108)
 (pk_auc = 86.5102194874643, pk_cmax = 9.580550086555732)
 (pk_auc = 87.44356449826826, pk_cmax = 9.519826983486444)
 (pk_auc = 88.3568507805343, pk_cmax = 9.824435830371694)
 (pk_auc = 86.11728861680128, pk_cmax = 9.950544536927223)
 (pk_auc = 87.56879318211763, pk_cmax = 10.182058272520834)

Finally, we can estimate the probability of satisfying a therapeutic criterion.

prob = mean(nca_params) do p
    p.pk_auc > 90 && p.pk_cmax < 15.0
end
0.032

To compute the probability of efficacy and safety separately, one can instead run:

prob1 = mean(nca_params) do p
    p.pk_auc > 90
end
prob2 = mean(nca_params) do p
    p.pk_cmax < 15.0
end
prob1, prob2
(0.032, 1.0)

4.2 Counter-factual Analysis and Dose Optimization

After developing a model, one may be interested in simulating scenarios, e.g. different covariates or doses, that have not been observed in the data while reusing the same posterior distribution of the parameters learnt from the data. This can be used to select a new dose that maximizes the probabilities of efficacy and safety simultaneously given the previously observed data.

In Pumas, you can do this by defining a new subject that includes the new covariates or dose and then passing that to simobs. First, let’s define a new skeleton subject that represents our counter-factual scenario where a dose of 3200 was administered instead of the 3000 used in the observed case:

cf_skeleton = Subject(
    id = 1,
    time = 0.0:0.5:30.0,
    events = DosageRegimen(3200, time = 0.0, cmt = 1),
    observations = (; dv = nothing),
)
Subject
  ID: 1
  Events: 1
  Observations: dv: (n=61)

To simulate from the posterior predictive distribution of this new subject using the posterior of the parameters, you can run:

cf_ipreds = simobs(
    thin_bayes_fpm_samples,
    cf_skeleton;
    samples = 1000,
    simulate_error = false,
    obstimes,
    subject = 1,
)
[ Info: Simulating 1000 sample(s) from the posterior predictive distribution of subject 1 using the dose and covariates in the input subject.
Simulated population (Vector{<:Subject})
  Simulated subjects: 1000
  Simulated variables: dv

We can then re-evaluate the probability of satisfying the therapeutic criterion:

cf_nca_params = postprocess(cf_ipreds) do gen, obs
    pk_auc = NCA.auc(gen.dv, obstimes)
    pk_cmax = NCA.cmax(gen.dv, obstimes)
    (; pk_auc, pk_cmax)
end
cf_prob = mean(cf_nca_params) do p
    p.pk_auc > 90 && p.pk_cmax < 15.0
end
0.965

We can see that the probability increased. To understand why, let’s look at the probabilities of the auc and cmax criteria separately:

cf_prob1 = mean(cf_nca_params) do p
    p.pk_auc > 90
end
cf_prob2 = mean(cf_nca_params) do p
    p.pk_cmax < 15.0
end
cf_prob1, cf_prob2
(0.965, 1.0)

Contrast this to the old dose’s probabilities:

prob1, prob2
(0.032, 1.0)

Note that this example was not particularly interesting because, despite the dense sampling, the uncertainty in the parameters was still high. This implies that the uncertainty was largely due to structural identifiability issues in the model. Since uncertainty in parameters due to structural non-identifiability does not affect the model predictions, the posterior predictive distribution was much more concentrated than the parameters’ posterior.

To estimate the mean and standard deviation of the predictions at each point in time we can run:

μs = mean(ipreds) do gen, obs
    gen.dv
end
σs = std(ipreds) do gen, obs
    gen.dv
end
61-element Vector{Float64}:
 0.0
 0.31204904032886366
 0.3799550865760596
 0.3644464596157579
 0.33363781812047966
 0.30700666982138863
 0.2841593270949681
 0.2614537628189557
 0.23744203478523063
 0.2126816421727952
 ⋮
 0.014684017487732333
 0.015010149018483639
 0.015378215927157844
 0.015774690756976652
 0.016187984950170937
 0.016608413616467893
 0.01702804625673846
 0.01744050415259986
 0.017840743757490264

To get the average relative standard deviation (ignoring the first prediction, which is 0), we run:

mean(σs[2:end] ./ μs[2:end])
0.026013102070374487

So the predictions are not very sensitive to the parameter uncertainty.

5 Summary

In this tutorial, we have seen how to test for model non-identifiability using the Fisher information matrix and sensitivity analysis. We have shown that maximum likelihood estimation is unreliable when fitting poorly identifiable models. And we have seen how one can use Bayesian inference to: 1) fit non-identifiable or poorly identifiable models to data, 2) ask probabilistic questions of the model, and 3) simulate counter-factual scenarios.

6 References

  1. Thomas J. Rothenberg. Identification in parametric models. Econometrica, 1971.
  2. F. Mentré, A. Mallet, and D. Baccar. Optimal design in random-effects regression models. Biometrika, 1997.
  3. S. Retout and F. Mentré. Further development of the Fisher information matrix in nonlinear mixed-effects models with evaluation in population pharmacokinetics. Journal of Biopharmaceutical Statistics, 2003.
  4. V. Shivva, K. Korell, I. Tucker, and S. Duffull. An approach for identifiability of population pharmacokinetic-pharmacodynamic models. CPT: Pharmacometrics & Systems Pharmacology, 2013.
  5. Stephen Duffull. A workflow for resolving model internal consistency in use-reuse settings (aka repairing unstable models). PAGANZ, 2024.
  6. Dan Wright. The identifiability of a turnover model for allopurinol urate-lowering effect. PAGANZ, 2024.