Interpreting Results from Pumas Fits

Author

Patrick Kofod Mogensen

1 Introduction

After following a typical workflow for fitting a Pumas model it is important to assess and interpret the results. This includes steps such as

  1. Looking at log-likelihood related quantities such as loglikelihood, aic and bic to compare the fit to other existing fits.
  2. Assessing convergence of the maximum likelihood optimization.
  3. Calculating the condition number of the correlation matrix implied by the estimated variance-covariance matrix.
  4. Calculating shrinkage for random effects and the dependent variable.

The following sections will walk through these steps using a one-compartment PK model for Warfarin, focusing on the PK aspects only.

2 Model and Data

2.1 Model Definition

Below is the PK model, named warfarin_pk_model, defined in Pumas. This model contains:

  • Fixed effects (population parameters): pop_CL, pop_Vc, pop_tabs, pop_lag
  • Inter-individual variability (IIV) components: pk_Ω
  • Residual error model parameters: σ_prop, σ_add
  • Covariates for scaling: FSZCL and FSZV
  • Differential equations describing the PK behavior in the compartments.
using Pumas
using PharmaDatasets
using DataFramesMeta
using CairoMakie
using AlgebraOfGraphics
using PumasUtilities
using Markdown
warfarin_pk_model = @model begin

    @metadata begin
        desc = "Warfarin 1-compartment PK model (PD removed)"
        timeu = u"hr"
    end

    @param begin
        # PK parameters
        """
        Clearance (L/hr)
        """
        pop_CL ∈ RealDomain(lower = 0.0, init = 0.134)
        """
        Central volume (L)
        """
        pop_Vc ∈ RealDomain(lower = 0.0, init = 8.11)
        """
        Absorption half-life (hr)
        """
        pop_tabs ∈ RealDomain(lower = 0.0, init = 0.523)
        """
        Lag time (hr)
        """
        pop_lag ∈ RealDomain(lower = 0.0, init = 0.1)

        # Inter-individual variability
        """
          - ΩCL: Clearance
          - ΩVc: Central volume
          - Ωtabs: Absorption half-life
        """
        pk_Ω ∈ PDiagDomain([0.01, 0.01, 0.01])

        # Residual variability
        """
        σ_prop: Proportional error
        """
        σ_prop ∈ RealDomain(lower = 0.0, init = 0.00752)
        """
        σ_add: Additive error
        """
        σ_add ∈ RealDomain(lower = 0.0, init = 0.0661)
    end

    @random begin
        pk_η ~ MvNormal(pk_Ω)    # mean = 0, covariance = pk_Ω
    end

    @covariates begin
        """
        FSZCL: Clearance scaling factor
        """
        FSZCL
        """
        FSZV: Volume scaling factor
        """
        FSZV
    end

    @pre begin
        CL = FSZCL * pop_CL * exp(pk_η[1])
        Vc = FSZV * pop_Vc * exp(pk_η[2])

        tabs = pop_tabs * exp(pk_η[3])
        Ka = log(2) / tabs
    end

    @dosecontrol begin
        lags = (Depot = pop_lag,)
    end

    @vars begin
        cp := Central / Vc
    end

    @dynamics begin
        Depot' = -Ka * Depot
        Central' = Ka * Depot - (CL / Vc) * Central
    end

    @derived begin
        """
        Concentration (ng/mL)
        """
        conc ~ @. Normal(cp, sqrt((σ_prop * cp)^2 + σ_add^2))
    end
end
PumasModel
  Parameters: pop_CL, pop_Vc, pop_tabs, pop_lag, pk_Ω, σ_prop, σ_add
  Random effects: pk_η
  Covariates: FSZCL, FSZV
  Dynamical system variables: Depot, Central
  Dynamical system type: Matrix exponential
  Derived: conc
  Observed: conc

2.2 Data Preparation

The Warfarin data used in this tutorial is pulled from PharmaDatasets for demonstration purposes. Note how the code reshapes and prepares the data in “wide” format for reading into Pumas. Only the conc column is treated as observations for the PK model.

warfarin_data = dataset("paganz2024/warfarin_long")

# Step 1: Fix Duplicate Time Points
# -------------------------------
# Some subjects have duplicate time points for DVID = 1.
# For this dataset, the triple (ID, TIME, DVID) should define
# a row uniquely, but the row counts below show that it does not:
nrow(warfarin_data)
nrow(unique(warfarin_data, ["ID", "TIME", "DVID"]))

# We can identify the problematic rows by grouping on the index variables
@chain warfarin_data begin
    @groupby :ID :TIME :DVID
    @transform :tmp = length(:ID)
    @rsubset :tmp > 1
end

# It is important to understand the reason for the duplicate values.
# Sometimes the duplication is caused by recording errors, sometimes
# it is a data processing error, e.g. when joining tables, or it can
# be genuine records, e.g. when samples have been analyzed in multiple
# labs. The next step depends on which of the causes are behind the
# duplications.
#
# In this case, we will assume that both values are informative and
# we will therefore just adjust the time stamp a bit for the second
# observation.
warfarin_data_processed = @chain warfarin_data begin
    @groupby :ID :TIME :DVID
    @transform :tmp = 1:length(:ID)
    @rtransform :TIME = :tmp == 1 ? :TIME : :TIME + 1e-6
    @select Not(:tmp)
end

# Transform the data in a single chain of operations
warfarin_data_wide = @chain warfarin_data_processed begin
    @rsubset !contains(:ID, "#")
    @rtransform begin
        # Scaling factors
        :FSZV = :WEIGHT / 70            # volume scaling
        :FSZCL = (:WEIGHT / 70)^0.75     # clearance scaling (allometric)
        # Column name for the DV
        :DVNAME = "DV$(:DVID)"
        # Dosing indicator columns
        :CMT = ismissing(:AMOUNT) ? missing : 1
        :EVID = ismissing(:AMOUNT) ? 0 : 1
    end
    unstack(Not([:DVID, :DVNAME, :DV]), :DVNAME, :DV)
    rename!(:DV1 => :conc, :DV2 => :pca)
end
first(warfarin_data_wide, 10)
10×13 DataFrame
 Row │ ID       TIME     WEIGHT   AGE    SEX    AMOUNT    FSZV      FSZCL    CMT      EVID   DV0       pca       conc
     │ String3  Float64  Float64  Int64  Int64  Float64?  Float64   Float64  Int64?   Int64  Float64?  Float64?  Float64?
─────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ 1            0.0     66.7     50      1     100.0  0.952857  0.96443        1      1  missing   missing   missing
   2 │ 1            0.0     66.7     50      1  missing   0.952857  0.96443  missing      0  missing      100.0  missing
   3 │ 1           24.0     66.7     50      1  missing   0.952857  0.96443  missing      0  missing       49.0      9.2
   4 │ 1           36.0     66.7     50      1  missing   0.952857  0.96443  missing      0  missing       32.0      8.5
   5 │ 1           48.0     66.7     50      1  missing   0.952857  0.96443  missing      0  missing       26.0      6.4
   6 │ 1           72.0     66.7     50      1  missing   0.952857  0.96443  missing      0  missing       22.0      4.8
   7 │ 1           96.0     66.7     50      1  missing   0.952857  0.96443  missing      0  missing       28.0      3.1
   8 │ 1          120.0     66.7     50      1  missing   0.952857  0.96443  missing      0  missing       33.0      2.5
   9 │ 2            0.0     66.7     31      1     100.0  0.952857  0.96443        1      1  missing   missing   missing
  10 │ 2            0.0     66.7     31      1  missing   0.952857  0.96443  missing      0  missing      100.0  missing

2.3 Creating a Pumas Population

Below is the creation of a population object in Pumas using read_pumas. Only the conc data are treated as the observation variable:

pop = read_pumas(
    warfarin_data_wide;
    id = :ID,
    time = :TIME,
    amt = :AMOUNT,
    cmt = :CMT,
    evid = :EVID,
    covariates = [:SEX, :WEIGHT, :FSZV, :FSZCL],
    observations = [:conc],
)
Population
  Subjects: 31
  Covariates: SEX, WEIGHT, FSZV, FSZCL
  Observations: conc
# A named tuple of parameter values
param_vals = (
    pop_CL = 0.134,
    pop_Vc = 8.11,
    pop_tabs = 0.523,
    pop_lag = 0.1,
    pk_Ω = Diagonal([0.01, 0.01, 0.01]),
    σ_prop = 0.00752,
    σ_add = 0.0661,
)
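As an aside, the @param block above already declares init values, so the same NamedTuple can presumably be generated from the model itself. A small sketch, assuming the init_params accessor:

# Sketch: build the initial parameter NamedTuple from the inits
# declared in the @param block (assumed accessor)
param_vals_from_model = init_params(warfarin_pk_model)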

foce_fit = fit(warfarin_pk_model, pop, param_vals, FOCE())
[ Info: Checking the initial parameter values.
[ Info: The initial negative log likelihood and its gradient are finite. Check passed.
Iter     Function value   Gradient norm 
     0     1.162661e+04     1.466365e+04
 * time: 0.025365114212036133
     1     2.406460e+03     3.085514e+03
 * time: 1.021204948425293
     2     1.659276e+03     2.059323e+03
 * time: 1.0773899555206299
     3     8.461002e+02     8.448119e+02
 * time: 1.1318740844726562
     4     5.854346e+02     4.065946e+02
 * time: 1.1869699954986572
     5     4.690447e+02     1.615353e+02
 * time: 1.2430610656738281
     6     4.366893e+02     5.711013e+01
 * time: 1.2941889762878418
     7     4.311441e+02     3.977658e+01
 * time: 1.345263957977295
     8     4.306433e+02     3.487033e+01
 * time: 1.396347999572754
     9     4.304605e+02     3.296424e+01
 * time: 1.447127103805542
    10     4.299081e+02     2.832467e+01
 * time: 1.5609760284423828
    11     4.290418e+02     2.641483e+01
 * time: 1.6108300685882568
    12     4.279296e+02     3.246969e+01
 * time: 1.6602370738983154
    13     4.271390e+02     4.261715e+01
 * time: 1.7093570232391357
    14     4.265564e+02     4.580960e+01
 * time: 1.758587121963501
    15     4.259866e+02     3.939591e+01
 * time: 1.8074491024017334
    16     4.253361e+02     2.541924e+01
 * time: 1.8563799858093262
    17     4.246790e+02     1.065400e+01
 * time: 1.905911922454834
    18     4.243666e+02     1.008258e+01
 * time: 1.9565610885620117
    19     4.242953e+02     9.698523e+00
 * time: 2.0095551013946533
    20     4.242637e+02     9.555638e+00
 * time: 2.062530040740967
    21     4.241737e+02     9.311705e+00
 * time: 2.116131067276001
    22     4.239779e+02     1.039981e+01
 * time: 2.169063091278076
    23     4.235278e+02     2.112614e+01
 * time: 2.2540299892425537
    24     4.226852e+02     4.178215e+01
 * time: 2.3052849769592285
    25     4.214158e+02     6.509012e+01
 * time: 2.357930898666382
    26     4.199242e+02     8.198864e+01
 * time: 2.4097890853881836
    27     4.180994e+02     8.928912e+01
 * time: 2.462177038192749
    28     4.147253e+02     1.054369e+02
 * time: 2.516242027282715
    29     4.034609e+02     9.933613e+01
 * time: 2.5748190879821777
    30     3.980708e+02     8.697802e+01
 * time: 2.661442995071411
    31     3.955519e+02     8.275864e+01
 * time: 2.738002061843872
    32     3.873347e+02     6.262355e+01
 * time: 2.7946860790252686
    33     3.832793e+02     7.480027e+01
 * time: 2.851979970932007
    34     3.752425e+02     6.106840e+01
 * time: 2.9374709129333496
    35     3.747485e+02     5.955920e+01
 * time: 3.0083720684051514
    36     3.733720e+02     5.190330e+01
 * time: 3.0635058879852295
    37     3.716220e+02     4.957865e+01
 * time: 3.118527889251709
    38     3.676007e+02     3.354968e+01
 * time: 3.1739320755004883
    39     3.652893e+02     2.435479e+01
 * time: 3.2284951210021973
    40     3.633350e+02     2.229678e+01
 * time: 3.285543918609619
    41     3.623167e+02     1.560549e+01
 * time: 3.346513032913208
    42     3.618529e+02     1.072487e+01
 * time: 3.40645694732666
    43     3.617373e+02     1.102560e+01
 * time: 3.46566104888916
    44     3.616625e+02     1.117617e+01
 * time: 3.526158094406128
    45     3.615728e+02     1.129104e+01
 * time: 3.5859251022338867
    46     3.614525e+02     1.142667e+01
 * time: 3.664763927459717
    47     3.612791e+02     1.160763e+01
 * time: 3.7193970680236816
    48     3.609742e+02     1.204366e+01
 * time: 3.773339033126831
    49     3.603227e+02     1.748493e+01
 * time: 3.8282411098480225
    50     3.588011e+02     2.344365e+01
 * time: 3.883455991744995
    51     3.553125e+02     4.035077e+01
 * time: 3.9389710426330566
    52     3.490364e+02     6.686821e+01
 * time: 3.9982779026031494
    53     3.444280e+02     3.741419e+01
 * time: 4.060928106307983
    54     3.426003e+02     3.103704e+01
 * time: 4.1237170696258545
    55     3.425553e+02     4.206850e+01
 * time: 4.187546968460083
    56     3.406514e+02     1.515976e+01
 * time: 4.249005079269409
    57     3.404596e+02     4.108500e+01
 * time: 4.311809062957764
    58     3.402361e+02     2.379647e+01
 * time: 4.3919970989227295
    59     3.400932e+02     2.407116e+01
 * time: 4.445549011230469
    60     3.396735e+02     2.432991e+01
 * time: 4.499437093734741
    61     3.389708e+02     2.664139e+01
 * time: 4.553574085235596
    62     3.366166e+02     3.272679e+01
 * time: 4.608957052230835
    63     3.311055e+02     6.090289e+01
 * time: 4.667623996734619
    64     3.282486e+02     4.297799e+01
 * time: 4.727225065231323
    65     3.252658e+02     2.043436e+01
 * time: 4.787807941436768
    66     3.228006e+02     1.400875e+01
 * time: 4.849309921264648
    67     3.206502e+02     1.022665e+01
 * time: 4.909164905548096
    68     3.203177e+02     5.080303e+00
 * time: 4.96812891960144
    69     3.200898e+02     3.214824e+00
 * time: 5.051217079162598
    70     3.200164e+02     5.239993e+00
 * time: 5.104501962661743
    71     3.199774e+02     3.229390e+00
 * time: 5.157756090164185
    72     3.199714e+02     1.117530e+00
 * time: 5.210805892944336
    73     3.199711e+02     7.270447e-01
 * time: 5.2631330490112305
    74     3.199710e+02     7.165060e-01
 * time: 5.312026023864746
    75     3.199707e+02     7.049190e-01
 * time: 5.36190390586853
    76     3.199702e+02     6.943763e-01
 * time: 5.414783000946045
    77     3.199687e+02     1.317285e+00
 * time: 5.470673084259033
    78     3.199653e+02     2.280205e+00
 * time: 5.527374982833862
    79     3.199582e+02     3.290792e+00
 * time: 5.583611011505127
    80     3.199475e+02     3.541242e+00
 * time: 5.6400721073150635
    81     3.199386e+02     2.314603e+00
 * time: 5.697320938110352
    82     3.199355e+02     6.986412e-01
 * time: 5.777375936508179
    83     3.199351e+02     9.990660e-02
 * time: 5.8290510177612305
    84     3.199351e+02     9.912811e-02
 * time: 5.878021955490112
    85     3.199351e+02     9.910770e-02
 * time: 5.925197124481201
FittedPumasModel

Successful minimization:                     false

Likelihood approximation:                     FOCE
Likelihood Optimizer:                         BFGS
Dynamical system type:          Matrix exponential

Log-likelihood value:                   -319.93509
Number of subjects:                             31
Number of parameters:         Fixed      Optimized
                                  0              9
Observation records:         Active        Missing
    conc:                       239             47
    Total:                      239             47

-----------------------
             Estimate
-----------------------
pop_CL        0.13065
pop_Vc        8.0708
pop_tabs      0.37156
pop_lag       0.9261
pk_Ω₁,₁       0.051091
pk_Ω₂,₂       0.018918
pk_Ω₃,₃       1.1401
σ_prop        0.090635
σ_add         0.3336
-----------------------

2.4 Evaluating the termination of the fit

Fitting non-linear mixed effects models is not always a simple task. For the fits to run reliably, the model has to be appropriate for the data, the data must be accurately sampled and recorded, no information should be truncated, and so on. Even with the most perfect data collection and modeling there can still be numerical challenges: evaluating various expressions, calculating empirical Bayes estimates in an inner loop, handling vanishing variances, local optima, and more.

In Pumas we use Optim.jl (Mogensen and Riseth 2018) to perform the minimization of any FO, FOCE, or LaplaceI fits, with or without MAP for maximum a posteriori fitting. The returned FittedPumasModel contains the optimization results structure from Optim.jl in the optim field. This can be useful when assessing convergence.

Convergence in non-linear optimization is not always a simple thing to assess:

  • We can look at the gradient element sizes, and we often look at the infinity norm of the gradient to this end. The infinity norm of a vector picks out the largest absolute value of its elements. By default we require this to be less than 1e-3 to declare convergence, but some problems are more or less numerically stable than others, so the appropriate threshold may be larger or smaller depending on the circumstances.
Gradient Elements and Infinity Norm of the Gradient
  1. Gradient Elements
  • The gradient is a vector that tells us which direction to move to increase a function’s value the fastest
  • Each element in this vector corresponds to how much the function changes when we adjust one parameter
  • “Gradient element sizes” refers to the magnitude (absolute value) of each of these directional changes

To illustrate with a simple example, consider a model with 3 parameters (like CL, V, and Ka); the gradient might look like:

gradient = [0.0001, -0.002, 0.0005]
3-element Vector{Float64}:
  0.0001
 -0.002
  0.0005

Here:

  • 0.0001 tells us how much the function would change if we adjust the first parameter (CL)
  • -0.002 for the second parameter (V)
  • 0.0005 for the third parameter (Ka)
  2. Infinity Norm of the Gradient
  • The infinity norm is simply the largest absolute value among all gradient elements
  • In our example above:
|0.0001| = 0.0001
|-0.002| = 0.002 #  <- This is the largest!
|0.0005| = 0.0005

Therefore, infinity norm = 0.002
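In Julia, this quantity can be computed directly; a minimal illustration using the example gradient from above:

using LinearAlgebra

gradient = [0.0001, -0.002, 0.0005]

# norm(v, Inf) returns the largest absolute element, i.e. the infinity norm
norm(gradient, Inf)      # 0.002
maximum(abs, gradient)   # equivalent formulation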

Why This Matters for Convergence:

  • When optimizing a model (finding the best parameters), we want all gradient elements to be close to zero
  • A large gradient element means we can still improve our solution by moving in that direction
  • A small gradient (all elements close to zero) suggests we’re at or near the optimal solution
  • By convention in Pumas, if the infinity norm is less than 0.001 (1e-3), we consider the optimization “converged”

Think of it like hiking to find the highest point:

  • The gradient tells you which direction to walk
  • The size of each gradient element tells you how steep it is in each direction
  • When you’re at the top, it should be flat in all directions (all gradient elements close to zero)
  • The infinity norm tells you about the steepest direction you could still go
  • If the steepest direction is very gentle (< 0.001), you’re probably at the peak!
  • We do not stop according to small changes in the parameters or objective function value by default. This is because such small changes are typically not a direct sign of convergence but rather a sign that progress has stalled. That said, many researchers set a relative change tolerance on the parameters in the same way as is done in NONMEM, because you will often see many final iterations with next to no improvement. This can be done by setting optim_options = (x_reltol = 1e-5,) in fit to match NONMEM, for example, as sketched below.
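A hypothetical refit using this option could look as follows; this is a sketch, not part of the fit above. Note the trailing comma, which is required for a one-element NamedTuple in Julia:

# Sketch: match NONMEM's relative parameter-change tolerance.
# warfarin_pk_model, pop, and param_vals are defined above.
fit_xreltol = fit(
    warfarin_pk_model,
    pop,
    param_vals,
    FOCE();
    optim_options = (x_reltol = 1e-5,),
)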

Let us consider a number of different cases and how you can find and interpret the information in the optim field. The output looks something like the following.

julia> foce_fit.optim
L1 * Status: success
L2
L3 * Candidate solution
L4    Final objective value:     3.199344e+02
L5
L6 * Found with
L7    Algorithm:     BFGS
L8
L9 * Convergence measures
L10   |x - x'|               = 4.36e-04 ≰ 0.0e+00
L11   |x - x'|/|x'|          = 1.10e-04 ≰ 0.0e+00
L12   |f(x) - f(x')|         = 3.57e-06 ≰ 0.0e+00
L13   |f(x) - f(x')|/|f(x')| = 1.12e-08 ≰ 0.0e+00
L14   |g(x)|                 = 5.62e-04 ≤ 1.0e-03
L15
L16 * Work counters
L17   Seconds run:   1  (vs limit Inf)
L18   Iterations:    95
L19   f(x) calls:    101
L20   ∇f(x) calls:   97

We will use the line numbers to the left in the output above to refer to the relevant parts of the output. While interpreting the output it may be worth remembering that the optimization is cast as the problem of minimizing the negative log of the marginal likelihood.

2.4.1 Case 1: gradient convergence

Since a (local) minimum of a smooth objective function satisfies that the partial derivatives are zero, we will often look at the norm of the gradient to declare convergence. The default value for the tolerance, set as the g_tol option in optim_options in Pumas, is 1e-3. The final state of this measure is found in L14 in the output above, where it is seen that the tolerance was met. This tolerance is set to match what is often possible to achieve in well-specified models, but smaller is always better. To derive a theoretically well-reasoned value we would need to know the Hessian at the optimum, but this is not possible to know before running a fit and very expensive to calculate exactly after a fit. Several factors affect the level that can be reached in practice, including the accuracy of the solution of the embedded dynamical system and the curvature of the objective function surface. If the gradient norm is lower than 1e-3 we will in general accept the fit as having converged, but “near convergence” is often accepted as well if there are no other indications of issues with the model.
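The same information can be queried programmatically from the optim field. A small sketch, assuming the standard Optim.jl accessor functions apply to the stored result object:

res = foce_fit.optim

# Assumed Optim.jl accessors (see the Optim.jl documentation)
Pumas.Optim.converged(res)    # overall success flag (L1)
Pumas.Optim.g_converged(res)  # whether |g(x)| ≤ g_tol was met (L14)
Pumas.Optim.iterations(res)   # number of iterations run (L18)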

2.4.2 Case 2: no change in parameters

If x_abstol or x_reltol was not manually set by the user, we only terminate the optimization procedure if no change in the parameters could be made at all from one iteration to the next. The final states of these termination tests are found in L10 and L11, respectively. Essentially, this indicates that the procedure that scales the search direction to find the next set of parameters failed. If this occurs with a relatively small gradient norm we may accept the solution, though the termination reason cannot be used to conclude convergence in isolation. The reasons why no step led to an improvement of the objective function value can be many. Some include:

  1. The objective function is not smooth due to numerical noise from calculations in the objective function including model solutions.
  2. The inverse Hessian approximation in the BFGS algorithm is far from the true inverse Hessian of the objective function. Restarting the fit may help.

As mentioned above, NONMEM sets x_reltol=1e-5 by default but it is 0 in Pumas.

2.4.3 Case 3: small change in parameters

As mentioned above, Pumas defaults to terminating on the change in parameters only when the change from one iteration to the next is exactly zero. If the relative change tolerance x_reltol is set to a number such as 1e-5 to match other software such as NONMEM, then the user may well accept the solution since they deliberately used this criterion. In general, we advise using this setting with caution as it can terminate with a quite large gradient norm.

Again, this information is seen in L10 and L11 but the limit after the inequality will change to the number provided.

2.4.4 Case 4: no change in objective function

This case is related to Case 2. By default we do not allow termination due to the absence of improvement in the objective function, so in general this termination reason should be seen as a failure. The rest of the comments about reasons for stopping, acceptance based on gradient norms, etc. apply here as well.

The information about these termination tests can be found in L12 and L13.

2.4.5 Case 5: small change in objective function

Just as Case 4 is related to Case 2, this case is related to Case 3. A very small improvement is often not a valid reason to accept the returned solution as the maximum likelihood estimates. If the user had good reason to actively set one of the options f_abstol or f_reltol, the result may be accepted.

As in Case 4 the information can be found in L12 and L13 but the right-hand side of the inequalities are updated as in Case 3.

2.4.6 Case 6: “maximum number of iterations reached”

The pre-specified limit on the number of iterations was reached. This can be increased by setting the iterations keyword in optim_options. It defaults to 1000, which should generally be plenty, so termination according to this criterion is not a sign of convergence. The output may be accepted if the solution seems to cycle between parameter values with a gradient norm close to the prescribed tolerance. A hypothetical refit with a higher limit is sketched below.

The number of iterations can be found in L18.
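A sketch of such a refit, reusing the model, population, and initial parameters from above:

# Sketch: allow up to 2000 iterations instead of the default 1000
fit_more_iter = fit(
    warfarin_pk_model,
    pop,
    param_vals,
    FOCE();
    optim_options = (iterations = 2000,),
)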

2.4.7 Case 7: line search failed

This case is related to Case 2, 3, 4, and 5. Given the search direction it was not possible to find a step that sufficiently decreases the objective function value. The advice found in Case 2 applies here as well.

In the output this will be indicated in L1.

2.4.8 Case 8: “user provided callback termination”

This is an advanced use case where a user-provided callback function returned the value true to indicate that the optimization should terminate. It can only happen if the user supplied a custom callback function as part of optim_options.

In the output this will be indicated in L1.

2.4.9 Case 9: “time limit reached”

The optimization routine allows for a soft time limit that is checked after every iteration. The default allows for infinite run-time, but if the time_limit option in optim_options is set this termination reason may occur.

In the output this will be indicated in L17.

2.4.10 Case 10: “objective evaluation limit reached”

The optimization routine allows for a limit to be placed on the total number of objective function evaluations through the f_calls_limit option in optim_options. It is not set by default in Pumas. The interpretation is similar to Case 6.

In the output this will be indicated in L19.

2.4.11 Case 11: “gradient evaluation limit reached”

The optimization routine allows for a limit to be placed on the total number of gradient function evaluations through the g_calls_limit option in optim_options. It is not set by default in Pumas. The interpretation is similar to Case 6.

In the output this will be indicated in L20.

2.4.12 Case 12: “hessian evaluation limit reached”

The optimization routine allows for a limit to be placed on the total number of hessian function evaluations through the h_calls_limit option in optim_options. It is not set by default in Pumas. The interpretation is similar to Case 6.

In the output this would be placed after L20 if it were present, but it is not, because the BFGS algorithm only maintains an approximation to the inverse Hessian; it doesn’t actually compute and utilize the real Hessian.

2.4.13 Case 13: “objective function increased”

This can happen if the allow_f_increases option is set to false in optim_options. The default is true. The causes are the same as in Case 2.

This would be indicated in L1 if it failed.

2.5 Optimization didn’t succeed - then what?

If the optimization did not succeed according to the description above, it is always important to think about what can be done to improve the situation. If the model or the initial parameters are grossly wrong, there may not be much to do about the failed optimization. It is often a good idea to reduce the complexity of the model by removing random effects that cannot be supported by the data or by fixing parameters that are known with high certainty from the relevant literature.

Before changing the model too much it may be worth trying a few different things.

2.5.1 Try different starting parameters

Given the complexity and high number of parameters in many NLME models, it is always important to initialize the fit from several sets of starting parameters instead of only one parameter NamedTuple. It is quite likely that the optimization ends up in a local minimum that is not the global minimum, so we need to check the robustness of our maximum likelihood estimate candidates. Performing this robustness step is explained in more detail in a different tutorial, but just to show the effect we can try a different initial fixed effect guess for the volume of distribution.

foce_fit2 = fit(
    warfarin_pk_model,
    pop,
    merge(param_vals, (; pop_Vc = 12.0)),
    FOCE();
    optim_options = (show_trace = false,),
)
[ Info: Checking the initial parameter values.
[ Info: The initial negative log likelihood and its gradient are finite. Check passed.
FittedPumasModel

Successful minimization:                      true

Likelihood approximation:                     FOCE
Likelihood Optimizer:                         BFGS
Dynamical system type:          Matrix exponential

Log-likelihood value:                   -319.93473
Number of subjects:                             31
Number of parameters:         Fixed      Optimized
                                  0              9
Observation records:         Active        Missing
    conc:                       239             47
    Total:                      239             47

-----------------------
             Estimate
-----------------------
pop_CL        0.13064
pop_Vc        8.0709
pop_tabs      0.37094
pop_lag       0.92625
pk_Ω₁,₁       0.05099
pk_Ω₂,₂       0.019014
pk_Ω₃,₃       1.1482
σ_prop        0.090596
σ_add         0.3337
-----------------------
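A minimal multi-start sketch along these lines could loop over several initial values and compare the resulting log-likelihoods; the grid of volumes below is purely for illustration, and the robustness tutorial referenced above covers this properly:

# Sketch: refit from several initial central volumes and compare
# the resulting maximum log-likelihoods
vc_inits = [4.0, 8.11, 12.0, 20.0]  # hypothetical grid
fits = map(vc_inits) do vc
    fit(
        warfarin_pk_model,
        pop,
        merge(param_vals, (; pop_Vc = vc)),
        FOCE();
        optim_options = (show_trace = false,),
    )
end
loglikelihood.(fits)  # similar values across starts indicate a robust optimum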

2.5.2 Resetting the optimization procedure

Whenever a model fit concludes, we end up with a final set of fixed effect estimates (population level parameters) as well as a set of empirical Bayes estimates. We also end up with a final inverse Hessian approximation. There are ways to restart the optimization at the final fixed effect parameters while resetting the random effects to the central value of their prior, and to reset the optimization procedure itself.

To restart the optimization procedure while reusing the final fixed effects and empirical Bayes estimates, use the following.

foce_fit3 = fit(foce_fit2; reset_optim = true)
[ Info: Checking the initial parameter values.
[ Info: The initial negative log likelihood and its gradient are finite. Check passed.
FittedPumasModel

Successful minimization:                      true

Likelihood approximation:                     FOCE
Likelihood Optimizer:                         BFGS
Dynamical system type:          Matrix exponential

Log-likelihood value:                   -319.93444
Number of subjects:                             31
Number of parameters:         Fixed      Optimized
                                  0              9
Observation records:         Active        Missing
    conc:                       239             47
    Total:                      239             47

-----------------------
             Estimate
-----------------------
pop_CL        0.13064
pop_Vc        8.0712
pop_tabs      0.37132
pop_lag       0.92611
pk_Ω₁,₁       0.050905
pk_Ω₂,₂       0.01913
pk_Ω₃,₃       1.1399
σ_prop        0.090621
σ_add         0.33356
-----------------------

2.5.3 Resetting the optimization procedure and the empirical Bayes estimates

To restart both, we can simply invoke fit as normal with coef(foce_fit3) as the initial fixed effects.

foce_fit4 = fit(
    warfarin_pk_model,
    pop,
    coef(foce_fit3),
    FOCE();
    optim_options = (show_trace = false,),
)
[ Info: Checking the initial parameter values.
[ Info: The initial negative log likelihood and its gradient are finite. Check passed.
FittedPumasModel

Successful minimization:                      true

Likelihood approximation:                     FOCE
Likelihood Optimizer:                         BFGS
Dynamical system type:          Matrix exponential

Log-likelihood value:                   -319.93444
Number of subjects:                             31
Number of parameters:         Fixed      Optimized
                                  0              9
Observation records:         Active        Missing
    conc:                       239             47
    Total:                      239             47

-----------------------
             Estimate
-----------------------
pop_CL        0.13064
pop_Vc        8.0712
pop_tabs      0.37132
pop_lag       0.92611
pk_Ω₁,₁       0.050905
pk_Ω₂,₂       0.01913
pk_Ω₃,₃       1.1399
σ_prop        0.090621
σ_add         0.33356
-----------------------

Restarting the optimization is often a good way to get past problems with gradient convergence.

2.5.4 A computationally expensive alternative

If gradient convergence below a certain threshold is very important, it can often be achieved with GradientDescent. Gradient descent has very good asymptotic properties (for the number of iterations going to infinity) and avoids potential numerical issues with the inverse Hessian approximation in situations with numerical noise in the objective, which could for example come from the numerical integration of the dynamical system. However, since it does not use (an approximation to) second-order information, it may take quite large steps that lead to extreme parameter values, causing warnings from the ODE solvers to appear on screen, and it may well use a lot of iterations to reach the prescribed tolerance. Consider the following.

foce_fit5 = fit(
    warfarin_pk_model,
    pop,
    coef(foce_fit3),
    FOCE();
    optim_alg = Pumas.Optim.GradientDescent(),
    optim_options = (g_tol = 1e-5, show_trace = false),
)
foce_fit5.optim
[ Info: Checking the initial parameter values.
[ Info: The initial negative log likelihood and its gradient are finite. Check passed.
 * Status: success

 * Candidate solution
    Final objective value:     3.199344e+02

 * Found with
    Algorithm:     Gradient Descent

 * Convergence measures
    |x - x'|               = 2.73e-08 ≰ 0.0e+00
    |x - x'|/|x'|          = 6.90e-09 ≰ 0.0e+00
    |f(x) - f(x')|         = 4.65e-06 ≰ 0.0e+00
    |f(x) - f(x')|/|f(x')| = 1.45e-08 ≰ 0.0e+00
    |g(x)|                 = 8.79e-06 ≤ 1.0e-05

 * Work counters
    Seconds run:   50  (vs limit Inf)
    Iterations:    395
    f(x) calls:    989
    ∇f(x) calls:   989

We were indeed able to reach a very low gradient norm, but notice the many iterations used! This may simply not be feasible for very large data sets or computationally expensive models.

3 Calculating Some Key Diagnostics

3.1 Model Fit and Comparison

When we specify FOCE(), LaplaceI(), or FO() we are doing maximum likelihood estimation. This also means that the most natural quantity for gauging and comparing model fit is the log-likelihood. To extract the log-likelihood at the maximum likelihood estimates we can simply use the loglikelihood function, which has a method for FittedPumasModel.

loglikelihood(foce_fit5)
-319.9344482247824

We see that this is the same number as in the printed output, although it was rounded there. We can then compare this number to the log-likelihood of another fitted model, and if the models are nested we can even perform a likelihood ratio test with lrtest. Please consult the docstring (?lrtest) for how to use this function.

Other, more general approaches to model comparison involve information criteria such as aic, aicc, and bic. These all have methods for FittedPumasModel, so to calculate them simply use them as follows:

aic_fit, bic_fit = aic(foce_fit5), bic(foce_fit5)
(657.8688964495648, 689.1570684169484)
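These values follow directly from the log-likelihood. A quick consistency check using the identities AIC = -2·LL + 2k and BIC = -2·LL + k·log(n), where k = 9 optimized parameters and n = 239 active observations (both reported in the fit output):

ll = loglikelihood(foce_fit5)
k, n = 9, 239

# Reproduces (aic_fit, bic_fit) up to floating point rounding
(-2ll + 2k, -2ll + k * log(n))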

3.2 Computing Parameter Precision with infer

The infer function in Pumas estimates the uncertainty (precision) of parameter estimates. Depending on the chosen method, infer can provide standard errors, confidence intervals, correlation matrices, and condition numbers.

An example usage:

inference_results = infer(foce_fit4; level = 0.95)
[ Info: Calculating: variance-covariance matrix.
[ Info: Done.
Asymptotic inference results using sandwich estimator

Successful minimization:                      true

Likelihood approximation:                     FOCE
Likelihood Optimizer:                         BFGS
Dynamical system type:          Matrix exponential

Log-likelihood value:                   -319.93444
Number of subjects:                             31
Number of parameters:         Fixed      Optimized
                                  0              9
Observation records:         Active        Missing
    conc:                       239             47
    Total:                      239             47

---------------------------------------------------------------------
            Estimate           SE                      95.0% C.I.
---------------------------------------------------------------------
pop_CL       0.13064         0.0054839        [ 0.1199   ; 0.14139 ]
pop_Vc       8.0712          0.23799          [ 7.6048   ; 8.5377  ]
pop_tabs     0.37132         0.17068          [ 0.036798 ; 0.70584 ]
pop_lag      0.92611         0.04848          [ 0.83109  ; 1.0211  ]
pk_Ω₁,₁      0.050905        0.017225         [ 0.017143 ; 0.084666]
pk_Ω₂,₂      0.01913         0.0056755        [ 0.0080065; 0.030254]
pk_Ω₃,₃      1.1399          0.70819          [-0.24809  ; 2.528   ]
σ_prop       0.090621        0.014443         [ 0.062313 ; 0.11893 ]
σ_add        0.33356         0.08694          [ 0.16316  ; 0.50396 ]
---------------------------------------------------------------------

This augments the printed table from above with the uncertainty information. A useful function to know is coefficients_table from PumasUtilities. It takes the fitted model and the inference object as inputs and returns a DataFrame with model parameter names, descriptions from the model annotations, estimates, standard errors, relative standard errors, and confidence interval limits.

coefficients_table(foce_fit5, inference_results)
9×7 DataFrame
 Row │ Parameter  Description                    Estimate  Standard Error  Relative_Se  CI Lower  CI Upper
     │ String     Abstract…                      Float64   Float64         Float64      Float64   Float64
─────┼─────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ pop_CL     Clearance (L/hr)\n                0.131           0.005        0.042     0.12      0.141
   2 │ pop_Vc     Central volume (L)\n              8.071           0.238        0.029     7.605     8.538
   3 │ pop_tabs   Absorption half-life (hr)\n       0.371           0.171        0.46      0.037     0.706
   4 │ pop_lag    Lag time (hr)\n                   0.926           0.048        0.052     0.831     1.021
   5 │ pk_Ω₁,₁    ΩCL: Clearance                    0.051           0.017        0.338     0.017     0.085
   6 │ pk_Ω₂,₂    ΩVc: Central volume               0.019           0.006        0.297     0.008     0.03
   7 │ pk_Ω₃,₃    Ωtabs: Absorption half-life       1.14            0.708        0.621    -0.248     2.528
   8 │ σ_prop     σ_prop: Proportional error\n      0.091           0.014        0.159     0.062     0.119
   9 │ σ_add      σ_add: Additive error\n           0.334           0.087        0.261     0.163     0.504

In another tutorial we will look closer at different ways of calculating uncertainty for the maximum likelihood estimates.
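For downstream processing it can be handy to pull these quantities out programmatically. A small sketch, assuming the usual stderror and vcov accessors apply to the inference object:

# Assumed accessors for the inference object (see the Pumas docs)
se = stderror(inference_results)  # standard errors per parameter
Σ = vcov(inference_results)       # estimated variance-covariance matrix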

3.3 Calculate the Condition Number

Some researchers like to include the condition number of the correlation matrix implied by the variance-covariance matrix that was estimated by infer. It is calculated simply by applying the cond function to the output of infer above.

cond(inference_results)
77.88261093269588

The researcher may set a limit as to how large this number can be. Models with large condition numbers are sometimes labeled “unstable” and models with small condition numbers may be labeled “stable”. This categorization is contested and discussed among practitioners and the condition number should be reported and interpreted with care outside of the linear regression context where the interpretation was originally proposed.
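For intuition, here is a sketch of the computation described above: scale the variance-covariance matrix to the implied correlation matrix and take its condition number. This assumes the vcov accessor and that cond(inference_results) is indeed defined this way:

using LinearAlgebra

# Scale the variance-covariance matrix to a correlation matrix and
# compute its 2-norm condition number (ratio of extreme singular values)
Σ = vcov(inference_results)
D = Diagonal(1 ./ sqrt.(diag(Σ)))
cond(D * Σ * D)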

3.4 Shrinkage

A non-linear mixed effects model includes random effects to represent individual-specific variability as well as sampling noise. In cases where the model is over-parameterized compared to the true model or to the available data, we may observe what has been called shrinkage in the literature. The thing that is shrinking is the distribution of empirical Bayes estimates (η-shrinkage) or the distribution of individual weighted residuals (ϵ-shrinkage). If there is insufficient data or the model is over-parameterized, it will be possible to find exact or almost exact fits for each individual. There is not enough data to move the parameters away from the prior on the random effects, and the individual distributions collapse into the population distribution.

3.4.1 Random Effect (eta) shrinkage

η-shrinkage is defined as \[ \eta_{sh} = 1 - \frac{\mathrm{sd}(\eta_i)}{\omega_\eta} \] such that if there is agreement between the prior standard deviation (\(\omega_\eta\)) and the standard deviation of the \(\eta_i\)s, we get no shrinkage, i.e. a shrinkage of 0. As \(\mathrm{sd}(\eta_i)\) shrinks towards 0, the shrinkage gets closer and closer to 1, indicating over-parameterization or a lack of individual data.

In Pumas, this number is simply calculated by invoking ηshrinkage on the fitted model.

ηshrinkage(foce_fit)
(pk_η = [0.009845537426213302, 0.12463273678558184, 0.45335870080737173],)

This returns a NamedTuple with one element per ~ in the @random-block.
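For intuition, the η-shrinkage formula can also be evaluated by hand for the first random effect. This is a sketch assuming the empirical_bayes accessor; the variance estimator used internally may differ slightly, so expect small numerical differences from ηshrinkage:

using Statistics

# Sketch of ηsh = 1 - sd(ηᵢ)/ω for the first element of pk_η
ebes = empirical_bayes(foce_fit)   # one NamedTuple per subject
η1 = [ebe.pk_η[1] for ebe in ebes]
ω1 = sqrt(coef(foce_fit).pk_Ω[1, 1])
1 - std(η1) / ω1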

3.4.2 Error Model (sigma) shrinkage

Sometimes a number called ϵ-shrinkage is reported as well. The definition is analogous to the above: a realized standard deviation is divided by the standard deviation implied by the model to gauge how well they match. The problem for the derived variables is that we often have heteroscedasticity built into our models. This means that the standard deviation is a function of the model prediction, as in the proportional or combined Gaussian error models, so there is not a single standard deviation to compare against. For this reason, we compute the individual weighted residuals (IWRES) and calculate the standard deviation of those. Since IWRES is already weighted to have unit standard deviation under the model, we simply subtract its standard deviation from 1: \[ \epsilon_{sh} = 1 - \mathrm{sd}(\mathrm{IWRES}_i) \] The interpretation is as above. We can calculate this using the ϵshrinkage function in Pumas.

ϵshrinkage(foce_fit)
(conc = 0.13757846337857282,)

This returns a NamedTuple with one element per ~ that defines a Normal distribution in the @derived-block.

3.5 Reporting Metrics

PumasUtilities contains a convenience function called metrics_table that can be used to report a number of metrics in a compact format.

metrics_table(foce_fit)
18×2 DataFrame
 Row │ Metric                      Value
     │ String                      Any
─────┼─────────────────────────────────────
   1 │ Successful                  false
   2 │ Estimation Time             5.925
   3 │ Subjects                    31
   4 │ Fixed Parameters            0
   5 │ Optimized Parameters        9
   6 │ conc Active Observations    239
   7 │ conc Missing Observations   47
   8 │ Total Active Observations   239
   9 │ Total Missing Observations  47
  10 │ Likelihood Approximation    Pumas.FOCE{Optim.NewtonTrustRegion{Float64}, Optim.Options{Float64, Nothing}}
  11 │ LogLikelihood (LL)          -319.935
  12 │ -2LL                        639.87
  13 │ AIC                         657.87
  14 │ BIC                         689.158
  15 │ (η-shrinkage) pk_η₁         0.01
  16 │ (η-shrinkage) pk_η₂         0.125
  17 │ (η-shrinkage) pk_η₃         0.453
  18 │ (ϵ-shrinkage) conc          0.138

This will return a DataFrame with the metrics.

4 Concluding Remarks

This tutorial showcased a subset of the steps users want to perform after a model fit:

  1. Calculate uncertainty of the estimated parameters
  2. Calculate log-likelihood and information criteria for fit assessment and model comparison purposes
  3. Calculate shrinkage numbers to assess over-parameterization or excessive sparseness of data

Other tutorials will cover many more diagnostic steps to assess the quality of the fitted model.

References

Mogensen, P, and A Riseth. 2018. “Optim: A Mathematical Optimization Package for Julia.” Journal of Open Source Software 3 (24).
