Interpreting Results from Pumas Fits

Author

Patrick Kofod Mogensen

1 Introduction

After following a typical workflow for fitting a Pumas model it is important to assess and interpret the results. This includes steps such as

  1. Looking at log-likelihood related quantities such as loglikelihood, aic and bic to compare the fit to other existing fits.
  2. Assessing convergence of the maximum likelihood optimization.
  3. Calculating the condition number of the correlation matrix implied by the estimated variance-covariance matrix.
  4. Calculating shrinkage for random effects and the dependent variable.

The following sections will walk through these steps using a one-compartment PK model for Warfarin, focusing on the PK aspects only.

2 Model and Data

2.1 Model Definition

Below is the PK model, named warfarin_pk_model, defined in Pumas. This model contains:

  • Fixed effects (population parameters): pop_CL, pop_Vc, pop_tabs, pop_lag
  • Inter-individual variability (IIV) components: pk_Ω
  • Residual error model parameters: σ_prop, σ_add
  • Covariates for scaling: FSZCL and FSZV
  • Differential equations describing the PK behavior in the compartments.
using Pumas
using PharmaDatasets
using DataFramesMeta
using CairoMakie
using AlgebraOfGraphics
using PumasUtilities
using Markdown
warfarin_pk_model = @model begin

    @metadata begin
        desc = "Warfarin 1-compartment PK model (PD removed)"
        timeu = u"hr"
    end

    @param begin
        # PK parameters
        """
        Clearance (L/hr)
        """
        pop_CL ∈ RealDomain(lower = 0.0, init = 0.134)
        """
        Central volume (L)
        """
        pop_Vc ∈ RealDomain(lower = 0.0, init = 8.11)
        """
        Absorption half-life (hr)
        """
        pop_tabs ∈ RealDomain(lower = 0.0, init = 0.523)
        """
        Lag time (hr)
        """
        pop_lag ∈ RealDomain(lower = 0.0, init = 0.1)

        # Inter-individual variability
        """
          - ΩCL: Clearance
          - ΩVc: Central volume
          - Ωtabs: Absorption half-life
        """
        pk_Ω ∈ PDiagDomain([0.01, 0.01, 0.01])

        # Residual variability
        """
        σ_prop: Proportional error
        """
        σ_prop ∈ RealDomain(lower = 0.0, init = 0.00752)
        """
        σ_add: Additive error
        """
        σ_add ∈ RealDomain(lower = 0.0, init = 0.0661)
    end

    @random begin
        pk_η ~ MvNormal(pk_Ω)    # mean = 0, covariance = pk_Ω
    end

    @covariates begin
        """
        FSZCL: Clearance scaling factor
        """
        FSZCL
        """
        FSZV: Volume scaling factor
        """
        FSZV
    end

    @pre begin
        CL = FSZCL * pop_CL * exp(pk_η[1])
        Vc = FSZV * pop_Vc * exp(pk_η[2])

        tabs = pop_tabs * exp(pk_η[3])
        Ka = log(2) / tabs
    end

    @dosecontrol begin
        lags = (Depot = pop_lag,)
    end

    @vars begin
        cp := Central / Vc
    end

    @dynamics begin
        Depot' = -Ka * Depot
        Central' = Ka * Depot - (CL / Vc) * Central
    end

    @derived begin
        """
        Concentration (ng/mL)
        """
        conc ~ @. Normal(cp, sqrt((σ_prop * cp)^2 + σ_add^2))
    end
end
PumasModel
  Parameters: pop_CL, pop_Vc, pop_tabs, pop_lag, pk_Ω, σ_prop, σ_add
  Random effects: pk_η
  Covariates: FSZCL, FSZV
  Dynamical system variables: Depot, Central
  Dynamical system type: Matrix exponential
  Derived: conc
  Observed: conc

2.2 Data Preparation

The Warfarin data used in this tutorial is pulled from PharmaDatasets for demonstration purposes. Note how the code reshapes and prepares the data in “wide” format for reading into Pumas. Only the conc column is treated as observations for the PK model.

warfarin_data = dataset("paganz2024/warfarin_long")

# Step 1: Fix Duplicate Time Points
# -------------------------------
# Some subjects have duplicate time points for DVID = 1.
# For this dataset, the triple (ID, TIME, DVID) should define
# a row uniquely, but the row counts below show that it does not:
nrow(warfarin_data)
nrow(unique(warfarin_data, ["ID", "TIME", "DVID"]))

# We can identify the problematic rows by grouping on the index variables
@chain warfarin_data begin
    @groupby :ID :TIME :DVID
    @transform :tmp = length(:ID)
    @rsubset :tmp > 1
end

# It is important to understand the reason for the duplicate values.
# Sometimes the duplication is caused by recording errors, sometimes
# it is a data processing error, e.g. when joining tables, or it can
# be genuine records, e.g. when samples have been analyzed in multiple
# labs. The next step depends on which of the causes are behind the
# duplications.
#
# In this case, we will assume that both values are informative and
# we will therefore just adjust the time stamp a bit for the second
# observation.
warfarin_data_processed = @chain warfarin_data begin
    @groupby :ID :TIME :DVID
    @transform :tmp = 1:length(:ID)
    @rtransform :TIME = :tmp == 1 ? :TIME : :TIME + 1e-6
    @select Not(:tmp)
end

# Transform the data in a single chain of operations
warfarin_data_wide = @chain warfarin_data_processed begin
    @rsubset !contains(:ID, "#")
    @rtransform begin
        # Scaling factors
        :FSZV = :WEIGHT / 70            # volume scaling
        :FSZCL = (:WEIGHT / 70)^0.75     # clearance scaling (allometric)
        # Column name for the DV
        :DVNAME = "DV$(:DVID)"
        # Dosing indicator columns
        :CMT = ismissing(:AMOUNT) ? missing : 1
        :EVID = ismissing(:AMOUNT) ? 0 : 1
    end
    unstack(Not([:DVID, :DVNAME, :DV]), :DVNAME, :DV)
    rename!(:DV1 => :conc, :DV2 => :pca)
end
first(warfarin_data_wide, 10)
10×13 DataFrame
 Row │ ID       TIME     WEIGHT   AGE    SEX    AMOUNT    FSZV      FSZCL    CMT      EVID   DV0       pca       conc
     │ String3  Float64  Float64  Int64  Int64  Float64?  Float64   Float64  Int64?   Int64  Float64?  Float64?  Float64?
─────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ 1            0.0     66.7     50      1     100.0  0.952857  0.96443        1      1  missing   missing   missing
   2 │ 1            0.0     66.7     50      1  missing   0.952857  0.96443  missing      0  missing      100.0  missing
   3 │ 1           24.0     66.7     50      1  missing   0.952857  0.96443  missing      0  missing       49.0      9.2
   4 │ 1           36.0     66.7     50      1  missing   0.952857  0.96443  missing      0  missing       32.0      8.5
   5 │ 1           48.0     66.7     50      1  missing   0.952857  0.96443  missing      0  missing       26.0      6.4
   6 │ 1           72.0     66.7     50      1  missing   0.952857  0.96443  missing      0  missing       22.0      4.8
   7 │ 1           96.0     66.7     50      1  missing   0.952857  0.96443  missing      0  missing       28.0      3.1
   8 │ 1          120.0     66.7     50      1  missing   0.952857  0.96443  missing      0  missing       33.0      2.5
   9 │ 2            0.0     66.7     31      1     100.0  0.952857  0.96443        1      1  missing   missing   missing
  10 │ 2            0.0     66.7     31      1  missing   0.952857  0.96443  missing      0  missing      100.0  missing

2.3 Creating a Pumas Population

Below is the creation of a population object in Pumas using read_pumas. Only the conc data are treated as the observation variable:

pop = read_pumas(
    warfarin_data_wide;
    id = :ID,
    time = :TIME,
    amt = :AMOUNT,
    cmt = :CMT,
    evid = :EVID,
    covariates = [:SEX, :WEIGHT, :FSZV, :FSZCL],
    observations = [:conc],
)
Population
  Subjects: 31
  Covariates: SEX, WEIGHT, FSZV, FSZCL
  Observations: conc
# A named tuple of parameter values
param_vals = (
    pop_CL = 0.134,
    pop_Vc = 8.11,
    pop_tabs = 0.523,
    pop_lag = 0.1,
    pk_Ω = Diagonal([0.01, 0.01, 0.01]),
    σ_prop = 0.00752,
    σ_add = 0.0661,
)
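As an aside, the @param block above already declares init values, so the same NamedTuple can presumably be generated from the model itself. A small sketch, assuming the init_params accessor:

# Sketch: build the initial parameter NamedTuple from the inits
# declared in the @param block (assumed accessor)
param_vals_from_model = init_params(warfarin_pk_model)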

foce_fit = fit(warfarin_pk_model, pop, param_vals, FOCE())
[ Info: Checking the initial parameter values.
[ Info: The initial negative log likelihood and its gradient are finite. Check passed.
Iter     Function value   Gradient norm 
     0     1.162661e+04     1.466365e+04
 * time: 0.025365114212036133
     1     2.406460e+03     3.085514e+03
 * time: 1.021204948425293
     2     1.659276e+03     2.059323e+03
 * time: 1.0773899555206299
     3     8.461002e+02     8.448119e+02
 * time: 1.1318740844726562
     4     5.854346e+02     4.065946e+02
 * time: 1.1869699954986572
     5     4.690447e+02     1.615353e+02
 * time: 1.2430610656738281
     6     4.366893e+02     5.711013e+01
 * time: 1.2941889762878418
     7     4.311441e+02     3.977658e+01
 * time: 1.345263957977295
     8     4.306433e+02     3.487033e+01
 * time: 1.396347999572754
     9     4.304605e+02     3.296424e+01
 * time: 1.447127103805542
    10     4.299081e+02     2.832467e+01
 * time: 1.5609760284423828
    11     4.290418e+02     2.641483e+01
 * time: 1.6108300685882568
    12     4.279296e+02     3.246969e+01
 * time: 1.6602370738983154
    13     4.271390e+02     4.261715e+01
 * time: 1.7093570232391357
    14     4.265564e+02     4.580960e+01
 * time: 1.758587121963501
    15     4.259866e+02     3.939591e+01
 * time: 1.8074491024017334
    16     4.253361e+02     2.541924e+01
 * time: 1.8563799858093262
    17     4.246790e+02     1.065400e+01
 * time: 1.905911922454834
    18     4.243666e+02     1.008258e+01
 * time: 1.9565610885620117
    19     4.242953e+02     9.698523e+00
 * time: 2.0095551013946533
    20     4.242637e+02     9.555638e+00
 * time: 2.062530040740967
    21     4.241737e+02     9.311705e+00
 * time: 2.116131067276001
    22     4.239779e+02     1.039981e+01
 * time: 2.169063091278076
    23     4.235278e+02     2.112614e+01
 * time: 2.2540299892425537
    24     4.226852e+02     4.178215e+01
 * time: 2.3052849769592285
    25     4.214158e+02     6.509012e+01
 * time: 2.357930898666382
    26     4.199242e+02     8.198864e+01
 * time: 2.4097890853881836
    27     4.180994e+02     8.928912e+01
 * time: 2.462177038192749
    28     4.147253e+02     1.054369e+02
 * time: 2.516242027282715
    29     4.034609e+02     9.933613e+01
 * time: 2.5748190879821777
    30     3.980708e+02     8.697802e+01
 * time: 2.661442995071411
    31     3.955519e+02     8.275864e+01
 * time: 2.738002061843872
    32     3.873347e+02     6.262355e+01
 * time: 2.7946860790252686
    33     3.832793e+02     7.480027e+01
 * time: 2.851979970932007
    34     3.752425e+02     6.106840e+01
 * time: 2.9374709129333496
    35     3.747485e+02     5.955920e+01
 * time: 3.0083720684051514
    36     3.733720e+02     5.190330e+01
 * time: 3.0635058879852295
    37     3.716220e+02     4.957865e+01
 * time: 3.118527889251709
    38     3.676007e+02     3.354968e+01
 * time: 3.1739320755004883
    39     3.652893e+02     2.435479e+01
 * time: 3.2284951210021973
    40     3.633350e+02     2.229678e+01
 * time: 3.285543918609619
    41     3.623167e+02     1.560549e+01
 * time: 3.346513032913208
    42     3.618529e+02     1.072487e+01
 * time: 3.40645694732666
    43     3.617373e+02     1.102560e+01
 * time: 3.46566104888916
    44     3.616625e+02     1.117617e+01
 * time: 3.526158094406128
    45     3.615728e+02     1.129104e+01
 * time: 3.5859251022338867
    46     3.614525e+02     1.142667e+01
 * time: 3.664763927459717
    47     3.612791e+02     1.160763e+01
 * time: 3.7193970680236816
    48     3.609742e+02     1.204366e+01
 * time: 3.773339033126831
    49     3.603227e+02     1.748493e+01
 * time: 3.8282411098480225
    50     3.588011e+02     2.344365e+01
 * time: 3.883455991744995
    51     3.553125e+02     4.035077e+01
 * time: 3.9389710426330566
    52     3.490364e+02     6.686821e+01
 * time: 3.9982779026031494
    53     3.444280e+02     3.741419e+01
 * time: 4.060928106307983
    54     3.426003e+02     3.103704e+01
 * time: 4.1237170696258545
    55     3.425553e+02     4.206850e+01
 * time: 4.187546968460083
    56     3.406514e+02     1.515976e+01
 * time: 4.249005079269409
    57     3.404596e+02     4.108500e+01
 * time: 4.311809062957764
    58     3.402361e+02     2.379647e+01
 * time: 4.3919970989227295
    59     3.400932e+02     2.407116e+01
 * time: 4.445549011230469
    60     3.396735e+02     2.432991e+01
 * time: 4.499437093734741
    61     3.389708e+02     2.664139e+01
 * time: 4.553574085235596
    62     3.366166e+02     3.272679e+01
 * time: 4.608957052230835
    63     3.311055e+02     6.090289e+01
 * time: 4.667623996734619
    64     3.282486e+02     4.297799e+01
 * time: 4.727225065231323
    65     3.252658e+02     2.043436e+01
 * time: 4.787807941436768
    66     3.228006e+02     1.400875e+01
 * time: 4.849309921264648
    67     3.206502e+02     1.022665e+01
 * time: 4.909164905548096
    68     3.203177e+02     5.080303e+00
 * time: 4.96812891960144
    69     3.200898e+02     3.214824e+00
 * time: 5.051217079162598
    70     3.200164e+02     5.239993e+00
 * time: 5.104501962661743
    71     3.199774e+02     3.229390e+00
 * time: 5.157756090164185
    72     3.199714e+02     1.117530e+00
 * time: 5.210805892944336
    73     3.199711e+02     7.270447e-01
 * time: 5.2631330490112305
    74     3.199710e+02     7.165060e-01
 * time: 5.312026023864746
    75     3.199707e+02     7.049190e-01
 * time: 5.36190390586853
    76     3.199702e+02     6.943763e-01
 * time: 5.414783000946045
    77     3.199687e+02     1.317285e+00
 * time: 5.470673084259033
    78     3.199653e+02     2.280205e+00
 * time: 5.527374982833862
    79     3.199582e+02     3.290792e+00
 * time: 5.583611011505127
    80     3.199475e+02     3.541242e+00
 * time: 5.6400721073150635
    81     3.199386e+02     2.314603e+00
 * time: 5.697320938110352
    82     3.199355e+02     6.986412e-01
 * time: 5.777375936508179
    83     3.199351e+02     9.990660e-02
 * time: 5.8290510177612305
    84     3.199351e+02     9.912811e-02
 * time: 5.878021955490112
    85     3.199351e+02     9.910770e-02
 * time: 5.925197124481201
FittedPumasModel

Successful minimization:                     false

Likelihood approximation:                     FOCE
Likelihood Optimizer:                         BFGS
Dynamical system type:          Matrix exponential

Log-likelihood value:                   -319.93509
Number of subjects:                             31
Number of parameters:         Fixed      Optimized
                                  0              9
Observation records:         Active        Missing
    conc:                       239             47
    Total:                      239             47

-----------------------
             Estimate
-----------------------
pop_CL        0.13065
pop_Vc        8.0708
pop_tabs      0.37156
pop_lag       0.9261
pk_Ω₁,₁       0.051091
pk_Ω₂,₂       0.018918
pk_Ω₃,₃       1.1401
σ_prop        0.090635
σ_add         0.3336
-----------------------

2.4 Evaluating the termination of the fit

Fitting non-linear mixed effects models is not always a simple task. For the fits to run reliably, the model has to be appropriate for the data, the data must be accurately sampled and recorded, no information should be truncated, and so on. Even with the most perfect data collection and modeling there can still be numerical challenges: evaluating various expressions, calculating empirical Bayes estimates in an inner loop, handling vanishing variances, local optima, and more.

In Pumas we use Optim.jl (Mogensen and Riseth 2018) to perform the minimization of any FO, FOCE, or LaplaceI fits, with or without MAP for maximum a posteriori fitting. The returned FittedPumasModel contains the optimization results structure from Optim.jl in the optim field. This can be useful when assessing convergence.

Convergence in non-linear optimization is not always a simple thing to assess:

  • We can look at the gradient element sizes, and we often look at the infinity norm of the gradient to this end. The infinity norm of a vector picks out the largest absolute value of its elements. By default we require this to be less than 1e-3 to declare convergence, but some problems are more or less numerically stable than others, so the appropriate threshold may be larger or smaller depending on the circumstances.
Gradient Elements and Infinity Norm of the Gradient
  1. Gradient Elements
  • The gradient is a vector that tells us which direction to move to increase a function’s value the fastest
  • Each element in this vector corresponds to how much the function changes when we adjust one parameter
  • “Gradient element sizes” refers to the magnitude (absolute value) of each of these directional changes

To illustrate with a simple example, consider a model with 3 parameters (like CL, V, and Ka); the gradient might look like:

gradient = [0.0001, -0.002, 0.0005]
3-element Vector{Float64}:
  0.0001
 -0.002
  0.0005

Here:

  • 0.0001 tells us how much the function would change if we adjust the first parameter (CL)
  • -0.002 for the second parameter (V)
  • 0.0005 for the third parameter (Ka)
  2. Infinity Norm of the Gradient
  • The infinity norm is simply the largest absolute value among all gradient elements
  • In our example above:
|0.0001| = 0.0001
|-0.002| = 0.002 #  <- This is the largest!
|0.0005| = 0.0005

Therefore, infinity norm = 0.002
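In Julia, this quantity can be computed directly; a minimal illustration using the example gradient from above:

using LinearAlgebra

gradient = [0.0001, -0.002, 0.0005]

# norm(v, Inf) returns the largest absolute element, i.e. the infinity norm
norm(gradient, Inf)      # 0.002
maximum(abs, gradient)   # equivalent formulation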

Why This Matters for Convergence:

  • When optimizing a model (finding the best parameters), we want all gradient elements to be close to zero
  • A large gradient element means we can still improve our solution by moving in that direction
  • A small gradient (all elements close to zero) suggests we’re at or near the optimal solution
  • By convention in Pumas, if the infinity norm is less than 0.001 (1e-3), we consider the optimization “converged”

Think of it like hiking to find the highest point:

  • The gradient tells you which direction to walk
  • The size of each gradient element tells you how steep it is in each direction
  • When you’re at the top, it should be flat in all directions (all gradient elements close to zero)
  • The infinity norm tells you about the steepest direction you could still go
  • If the steepest direction is very gentle (< 0.001), you’re probably at the peak!
  • We do not stop according to small changes in the parameters or objective function value by default. This is because such small changes are typically not a direct sign of convergence but rather a sign that progress has stalled. That said, many researchers set a relative change tolerance on the parameters in the same way as is done in NONMEM, because you will often see many final iterations with next to no improvement. This can be done by setting optim_options = (x_reltol = 1e-5,) in fit to match NONMEM, for example, as sketched below.
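A hypothetical refit using this option could look as follows; this is a sketch, not part of the fit above. Note the trailing comma, which is required for a one-element NamedTuple in Julia:

# Sketch: match NONMEM's relative parameter-change tolerance.
# warfarin_pk_model, pop, and param_vals are defined above.
fit_xreltol = fit(
    warfarin_pk_model,
    pop,
    param_vals,
    FOCE();
    optim_options = (x_reltol = 1e-5,),
)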

Let us consider a number of different cases and how you can find and interpret the information in the optim field. The output looks something like the following.

julia> foce_fit.optim
L1 * Status: success
L2
L3 * Candidate solution
L4    Final objective value:     3.199344e+02
L5
L6 * Found with
L7    Algorithm:     BFGS
L8
L9 * Convergence measures
L10   |x - x'|               = 4.36e-04 ≰ 0.0e+00
L11   |x - x'|/|x'|          = 1.10e-04 ≰ 0.0e+00
L12   |f(x) - f(x')|         = 3.57e-06 ≰ 0.0e+00
L13   |f(x) - f(x')|/|f(x')| = 1.12e-08 ≰ 0.0e+00
L14   |g(x)|                 = 5.62e-04 ≤ 1.0e-03
L15
L16 * Work counters
L17   Seconds run:   1  (vs limit Inf)
L18   Iterations:    95
L19   f(x) calls:    101
L20   ∇f(x) calls:   97

We will use the line numbers to the left in the output above to refer to the relevant parts of the output. While interpreting the output it may be worth remembering that the optimization is cast as the problem of minimizing the negative log of the marginal likelihood.

2.4.1 Case 1: gradient convergence

Since a (local) minimum of a smooth objective function satisfies that the partial derivatives are zero, we will often look at the norm of the gradient to declare convergence. The default value for the tolerance, set as the g_tol option in optim_options in Pumas, is 1e-3. The final state of this measure is found in L14 in the output above, where it is seen that the tolerance was met. This tolerance is set to match what is often possible to achieve in well-specified models, but smaller is always better. To derive a theoretically well-reasoned value we would need to know the Hessian at the optimum, but this is not possible to know before running a fit and very expensive to calculate exactly after a fit. Several factors affect the level that can be reached in practice, including the accuracy of the solution of the embedded dynamical system and the curvature of the objective function surface. If the gradient norm is lower than 1e-3 we will in general accept the fit as having converged, but “near convergence” is often accepted as well if there are no other indications of issues with the model.
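The same information can be queried programmatically from the optim field. A small sketch, assuming the standard Optim.jl accessor functions apply to the stored result object:

res = foce_fit.optim

# Assumed Optim.jl accessors (see the Optim.jl documentation)
Pumas.Optim.converged(res)    # overall success flag (L1)
Pumas.Optim.g_converged(res)  # whether |g(x)| ≤ g_tol was met (L14)
Pumas.Optim.iterations(res)   # number of iterations run (L18)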

2.4.2 Case 2: no change in parameters

If x_abstol or x_reltol was not manually set by the user, we only terminate the optimization procedure if no change in the parameters could be made at all from one iteration to the next. The final states of these termination tests are found in L10 and L11, respectively. Essentially, this indicates that the procedure that scales the search direction to find the next set of parameters failed. If this occurs with a relatively small gradient norm we may accept the solution, though the termination reason cannot be used to conclude convergence in isolation. The reasons why no step led to an improvement of the objective function value can be many. Some include:

  1. The objective function is not smooth due to numerical noise from calculations in the objective function including model solutions.
  2. The inverse Hessian approximation in the BFGS algorithm is far from the true inverse Hessian of the objective function. Restarting the fit may help.

As mentioned above, NONMEM sets x_reltol=1e-5 by default but it is 0 in Pumas.

2.4.3 Case 3: small change in parameters

As mentioned above, Pumas defaults to terminating on the change in parameters only when the change from one iteration to the next is exactly zero. If the relative change tolerance x_reltol is set to a number such as 1e-5 to match other software such as NONMEM, then the user may well accept the solution since they deliberately used this criterion. In general, we advise using this setting with caution as it can terminate with a quite large gradient norm.

Again, this information is seen in L10 and L11 but the limit after the inequality will change to the number provided.

2.4.4 Case 4: no change in objective function

This case is related to Case 2. By default we do not allow termination due to the absence of improvement in the objective function, so in general this termination reason should be seen as a failure. The rest of the comments about reasons for stopping, acceptance based on gradient norms, etc. apply here as well.

The information about these termination tests can be found in L12 and L13.

2.4.5 Case 5: small change in objective function

Just as Case 4 is related to Case 2, this case is related to Case 3. A very small improvement is often not a valid reason to accept the returned solution as the maximum likelihood estimates. If the user had good reason to actively set one of the options f_abstol or f_reltol, the result may be accepted.

As in Case 4 the information can be found in L12 and L13 but the right-hand side of the inequalities are updated as in Case 3.

2.4.6 Case 6: “maximum number of iterations reached”

The pre-specified limit on the number of iterations was reached. This can be increased by setting the iterations keyword in optim_options. It defaults to 1000, which should generally be plenty, so termination according to this criterion is not a sign of convergence. The output may be accepted if the solution seems to cycle between parameter values with a gradient norm close to the prescribed tolerance. A hypothetical refit with a higher limit is sketched below.

The number of iterations can be found in L18.
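A sketch of such a refit, reusing the model, population, and initial parameters from above:

# Sketch: allow up to 2000 iterations instead of the default 1000
fit_more_iter = fit(
    warfarin_pk_model,
    pop,
    param_vals,
    FOCE();
    optim_options = (iterations = 2000,),
)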

2.4.7 Case 7: line search failed

This case is related to Case 2, 3, 4, and 5. Given the search direction it was not possible to find a step that sufficiently decreases the objective function value. The advice found in Case 2 applies here as well.

In the output this will be indicated in L1.

2.4.8 Case 8: “user provided callback termination”

This is an advanced use case where a user-provided callback function returned the value true to indicate that the optimization should terminate. It can only happen if the user supplied a custom callback function as part of optim_options.

In the output this will be indicated in L1.

2.4.9 Case 9: “time limit reached”

The optimization routine allows for a soft time limit that is checked after every iteration. The default allows for infinite run-time, but if the time_limit option in optim_options is set this termination reason may occur.

In the output this will be indicated in L17.

2.4.10 Case 10: “objective evaluation limit reached”

The optimization routine allows for a limit to be placed on the total number of objective function evaluations through the f_calls_limit option in optim_options. It is not set by default in Pumas. The interpretation is similar to Case 6.

In the output this will be indicated in L19.

2.4.11 Case 11: “gradient evaluation limit reached”

The optimization routine allows for a limit to be placed on the total number of gradient function evaluations through the g_calls_limit option in optim_options. It is not set by default in Pumas. The interpretation is similar to Case 6.

In the output this will be indicated in L20.

2.4.12 Case 12: “hessian evaluation limit reached”

The optimization routine allows for a limit to be placed on the total number of hessian function evaluations through the h_calls_limit option in optim_options. It is not set by default in Pumas. The interpretation is similar to Case 6.

In the output this would be placed after L20 if it were present, but it is not, because the BFGS algorithm only maintains an approximation to the inverse Hessian; it doesn’t actually compute and utilize the real Hessian.

2.4.13 Case 13: “objective function increased”

This can happen if the allow_f_increases option is set to false in optim_options. The default is true. The causes are the same as in Case 2.

This would be indicated in L1 if it failed.

2.5 Optimization didn’t succeed - then what?

If the optimization did not succeed according to the description above, it is always important to think about what can be done to improve the situation. If the model or the initial parameters are grossly wrong, there may not be much to do about the failed optimization. It is often a good idea to reduce the complexity of the model by removing random effects that cannot be supported by the data or by fixing parameters that are known with high certainty from the relevant literature.

Before changing the model too much it may be worth trying a few different things.

2.5.1 Try different starting parameters

Given the complexity and high number of parameters in many NLME models, it is always important to initialize the fit from several sets of starting parameters instead of only one parameter NamedTuple. It is quite likely that the optimization ends up in a local minimum that is not the global minimum, so we need to check the robustness of our maximum likelihood estimate candidates. Performing this robustness step is explained in more detail in a different tutorial, but just to show the effect we can try a different initial fixed effect guess for the volume of distribution.

foce_fit2 = fit(
    warfarin_pk_model,
    pop,
    merge(param_vals, (; pop_Vc = 12.0)),
    FOCE();
    optim_options = (show_trace = false,),
)
[ Info: Checking the initial parameter values.
[ Info: The initial negative log likelihood and its gradient are finite. Check passed.
FittedPumasModel

Successful minimization:                      true

Likelihood approximation:                     FOCE
Likelihood Optimizer:                         BFGS
Dynamical system type:          Matrix exponential

Log-likelihood value:                   -319.93473
Number of subjects:                             31
Number of parameters:         Fixed      Optimized
                                  0              9
Observation records:         Active        Missing
    conc:                       239             47
    Total:                      239             47

-----------------------
             Estimate
-----------------------
pop_CL        0.13064
pop_Vc        8.0709
pop_tabs      0.37094
pop_lag       0.92625
pk_Ω₁,₁       0.05099
pk_Ω₂,₂       0.019014
pk_Ω₃,₃       1.1482
σ_prop        0.090596
σ_add         0.3337
-----------------------
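A minimal multi-start sketch along these lines could loop over several initial values and compare the resulting log-likelihoods; the grid of volumes below is purely for illustration, and the robustness tutorial referenced above covers this properly:

# Sketch: refit from several initial central volumes and compare
# the resulting maximum log-likelihoods
vc_inits = [4.0, 8.11, 12.0, 20.0]  # hypothetical grid
fits = map(vc_inits) do vc
    fit(
        warfarin_pk_model,
        pop,
        merge(param_vals, (; pop_Vc = vc)),
        FOCE();
        optim_options = (show_trace = false,),
    )
end
loglikelihood.(fits)  # similar values across starts indicate a robust optimum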

2.5.2 Resetting the optimization procedure

Whenever a model fit concludes, we end up with a final set of fixed effect estimates (population level parameters) as well as a set of empirical Bayes estimates. We also end up with a final inverse Hessian approximation. There are ways to restart the optimization at the final fixed effect parameters while resetting the random effects to the central value of their prior, and to reset the optimization procedure itself.

To restart the optimization procedure while reusing the final fixed effects and empirical Bayes estimates, use the following.

foce_fit3 = fit(foce_fit2; reset_optim = true)
[ Info: Checking the initial parameter values.
[ Info: The initial negative log likelihood and its gradient are finite. Check passed.
FittedPumasModel

Successful minimization:                      true

Likelihood approximation:                     FOCE
Likelihood Optimizer:                         BFGS
Dynamical system type:          Matrix exponential

Log-likelihood value:                   -319.93444
Number of subjects:                             31
Number of parameters:         Fixed      Optimized
                                  0              9
Observation records:         Active        Missing
    conc:                       239             47
    Total:                      239             47

-----------------------
             Estimate
-----------------------
pop_CL        0.13064
pop_Vc        8.0712
pop_tabs      0.37132
pop_lag       0.92611
pk_Ω₁,₁       0.050905
pk_Ω₂,₂       0.01913
pk_Ω₃,₃       1.1399
σ_prop        0.090621
σ_add         0.33356
-----------------------

2.5.3 Resetting the optimization procedure and the empirical Bayes estimates

To restart both, we can simply invoke fit as normal with coef(foce_fit3) as the initial fixed effects.

foce_fit4 = fit(
    warfarin_pk_model,
    pop,
    coef(foce_fit3),
    FOCE();
    optim_options = (show_trace = false,),
)
[ Info: Checking the initial parameter values.
[ Info: The initial negative log likelihood and its gradient are finite. Check passed.
FittedPumasModel

Successful minimization:                      true

Likelihood approximation:                     FOCE
Likelihood Optimizer:                         BFGS
Dynamical system type:          Matrix exponential

Log-likelihood value:                   -319.93444
Number of subjects:                             31
Number of parameters:         Fixed      Optimized
                                  0              9
Observation records:         Active        Missing
    conc:                       239             47
    Total:                      239             47

-----------------------
             Estimate
-----------------------
pop_CL        0.13064
pop_Vc        8.0712
pop_tabs      0.37132
pop_lag       0.92611
pk_Ω₁,₁       0.050905
pk_Ω₂,₂       0.01913
pk_Ω₃,₃       1.1399
σ_prop        0.090621
σ_add         0.33356
-----------------------

Restarting the optimization is often a good way to get past problems with gradient convergence.

2.5.4 A computationally expensive alternative

If gradient convergence below a certain threshold is very important, it can often be achieved with GradientDescent. Gradient descent has very good asymptotic properties (for the number of iterations going to infinity) and avoids potential numerical issues with the inverse Hessian approximation in situations with numerical noise in the objective, which could for example come from the numerical integration of the dynamical system. However, since it does not use (an approximation to) second-order information, it may take quite large steps that lead to extreme parameter values, causing warnings from the ODE solvers to appear on screen, and it may well use a lot of iterations to reach the prescribed tolerance. Consider the following.

foce_fit5 = fit(
    warfarin_pk_model,
    pop,
    coef(foce_fit3),
    FOCE();
    optim_alg = Pumas.Optim.GradientDescent(),
    optim_options = (g_tol = 1e-5, show_trace = false),
)
foce_fit5.optim
[ Info: Checking the initial parameter values.
[ Info: The initial negative log likelihood and its gradient are finite. Check passed.
 * Status: success

 * Candidate solution
    Final objective value:     3.199344e+02

 * Found with
    Algorithm:     Gradient Descent

 * Convergence measures
    |x - x'|               = 2.73e-08 ≰ 0.0e+00
    |x - x'|/|x'|          = 6.90e-09 ≰ 0.0e+00
    |f(x) - f(x')|         = 4.65e-06 ≰ 0.0e+00
    |f(x) - f(x')|/|f(x')| = 1.45e-08 ≰ 0.0e+00
    |g(x)|                 = 8.79e-06 ≤ 1.0e-05

 * Work counters
    Seconds run:   50  (vs limit Inf)
    Iterations:    395
    f(x) calls:    989
    ∇f(x) calls:   989

We were indeed able to reach a very low gradient norm, but notice the many iterations used! This may simply not be feasible for very large data sets or computationally expensive models.

3 Calculating Some Key Diagnostics

3.1 Model Fit and Comparison

When we specify FOCE(), LaplaceI(), or FO() we are doing maximum likelihood estimation. This also means that the most natural quantity for gauging and comparing model fit is the log-likelihood. To extract the log-likelihood at the maximum likelihood estimates we can simply use the loglikelihood function, which has a method for FittedPumasModel.

loglikelihood(foce_fit5)
-319.9344482247824

We see that this is the same number as in the printed output, although it was rounded there. We can then compare this number to the log-likelihood of another fitted model, and if the models are nested we can even perform a likelihood ratio test with lrtest. Please consult the docstring (?lrtest) for how to use this function.

Other, more general approaches to model comparison involve information criteria such as aic, aicc, and bic. These all have methods for FittedPumasModel, so to calculate them simply use them as follows:

aic_fit, bic_fit = aic(foce_fit5), bic(foce_fit5)
(657.8688964495648, 689.1570684169484)
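These values follow directly from the log-likelihood. A quick consistency check using the identities AIC = -2·LL + 2k and BIC = -2·LL + k·log(n), where k = 9 optimized parameters and n = 239 active observations (both reported in the fit output):

ll = loglikelihood(foce_fit5)
k, n = 9, 239

# Reproduces (aic_fit, bic_fit) up to floating point rounding
(-2ll + 2k, -2ll + k * log(n))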

3.2 Computing Parameter Precision with infer

The infer function in Pumas estimates the uncertainty (precision) of parameter estimates. Depending on the chosen method, infer can provide standard errors, confidence intervals, correlation matrices, and condition numbers.

An example usage:

inference_results = infer(foce_fit4; level = 0.95)
[ Info: Calculating: variance-covariance matrix.
[ Info: Done.
Asymptotic inference results using sandwich estimator

Successful minimization:                      true

Likelihood approximation:                     FOCE
Likelihood Optimizer:                         BFGS
Dynamical system type:          Matrix exponential

Log-likelihood value:                   -319.93444
Number of subjects:                             31
Number of parameters:         Fixed      Optimized
                                  0              9
Observation records:         Active        Missing
    conc:                       239             47
    Total:                      239             47

---------------------------------------------------------------------
            Estimate           SE                      95.0% C.I.
---------------------------------------------------------------------
pop_CL       0.13064         0.0054839        [ 0.1199   ; 0.14139 ]
pop_Vc       8.0712          0.23799          [ 7.6048   ; 8.5377  ]
pop_tabs     0.37132         0.17068          [ 0.036798 ; 0.70584 ]
pop_lag      0.92611         0.04848          [ 0.83109  ; 1.0211  ]
pk_Ω₁,₁      0.050905        0.017225         [ 0.017143 ; 0.084666]
pk_Ω₂,₂      0.01913         0.0056755        [ 0.0080065; 0.030254]
pk_Ω₃,₃      1.1399          0.70819          [-0.24809  ; 2.528   ]
σ_prop       0.090621        0.014443         [ 0.062313 ; 0.11893 ]
σ_add        0.33356         0.08694          [ 0.16316  ; 0.50396 ]
---------------------------------------------------------------------

This augments the printed table from above with the uncertainty information. A useful function to know is coefficients_table from PumasUtilities. It takes the fitted model and the inference object as inputs and returns a DataFrame with model parameter names, descriptions from the model annotations, estimates, standard errors, relative standard errors, and confidence interval limits.

coefficients_table(foce_fit5, inference_results)
9×7 DataFrame
 Row │ Parameter  Description                    Estimate  Standard Error  Relative_Se  CI Lower  CI Upper
     │ String     Abstract…                      Float64   Float64         Float64      Float64   Float64
─────┼─────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ pop_CL     Clearance (L/hr)\n                0.131           0.005        0.042     0.12      0.141
   2 │ pop_Vc     Central volume (L)\n              8.071           0.238        0.029     7.605     8.538
   3 │ pop_tabs   Absorption half-life (hr)\n       0.371           0.171        0.46      0.037     0.706
   4 │ pop_lag    Lag time (hr)\n                   0.926           0.048        0.052     0.831     1.021
   5 │ pk_Ω₁,₁    ΩCL: Clearance                    0.051           0.017        0.338     0.017     0.085
   6 │ pk_Ω₂,₂    ΩVc: Central volume               0.019           0.006        0.297     0.008     0.03
   7 │ pk_Ω₃,₃    Ωtabs: Absorption half-life       1.14            0.708        0.621    -0.248     2.528
   8 │ σ_prop     σ_prop: Proportional error\n      0.091           0.014        0.159     0.062     0.119
   9 │ σ_add      σ_add: Additive error\n           0.334           0.087        0.261     0.163     0.504

In another tutorial we will look closer at different ways of calculating uncertainty for the maximum likelihood estimates.
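For downstream processing it can be handy to pull these quantities out programmatically. A small sketch, assuming the usual stderror and vcov accessors apply to the inference object:

# Assumed accessors for the inference object (see the Pumas docs)
se = stderror(inference_results)  # standard errors per parameter
Σ = vcov(inference_results)       # estimated variance-covariance matrix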

3.3 Calculate the Condition Number

Some researchers like to include the condition number of the correlation matrix implied by the variance-covariance matrix that was estimated by infer. It is calculated simply by applying the cond function to the output of infer above.

cond(inference_results)
77.88261093269588

The researcher may set a limit as to how large this number can be. Models with large condition numbers are sometimes labeled “unstable” and models with small condition numbers may be labeled “stable”. This categorization is contested and discussed among practitioners and the condition number should be reported and interpreted with care outside of the linear regression context where the interpretation was originally proposed.
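For intuition, here is a sketch of the computation described above: scale the variance-covariance matrix to the implied correlation matrix and take its condition number. This assumes the vcov accessor and that cond(inference_results) is indeed defined this way:

using LinearAlgebra

# Scale the variance-covariance matrix to a correlation matrix and
# compute its 2-norm condition number (ratio of extreme singular values)
Σ = vcov(inference_results)
D = Diagonal(1 ./ sqrt.(diag(Σ)))
cond(D * Σ * D)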

3.4 Shrinkage

A non-linear mixed effects model includes random effects to represent individual-specific variability as well as sampling noise. In cases where the model is over-parameterized compared to the true model or to the available data, we may observe what has been called shrinkage in the literature. The thing that is shrinking is the distribution of empirical Bayes estimates (η-shrinkage) or the distribution of individual weighted residuals (ϵ-shrinkage). If there is insufficient data or the model is over-parameterized, it will be possible to find exact or almost exact fits for each individual. There is not enough data to move the parameters away from the prior on the random effects, and the individual distributions collapse into the population distribution.

3.4.1 Random Effect (eta) shrinkage

η-shrinkage is defined as \[ \eta_{sh} = 1 - \frac{\mathrm{sd}(\eta_i)}{\omega_\eta} \] such that if there is agreement between the prior standard deviation (\(\omega_\eta\)) and the standard deviation of the \(\eta_i\)s, we get no shrinkage, i.e. a shrinkage of 0. As \(\mathrm{sd}(\eta_i)\) shrinks towards 0, the shrinkage gets closer and closer to 1, indicating over-parameterization or a lack of individual data.

In Pumas, this number is simply calculated by invoking ηshrinkage on the fitted model.

ηshrinkage(foce_fit)
(pk_η = [0.009845537426213302, 0.12463273678558184, 0.45335870080737173],)

This returns a NamedTuple with one element per ~ in the @random-block.
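For intuition, the η-shrinkage formula can also be evaluated by hand for the first random effect. This is a sketch assuming the empirical_bayes accessor; the variance estimator used internally may differ slightly, so expect small numerical differences from ηshrinkage:

using Statistics

# Sketch of ηsh = 1 - sd(ηᵢ)/ω for the first element of pk_η
ebes = empirical_bayes(foce_fit)   # one NamedTuple per subject
η1 = [ebe.pk_η[1] for ebe in ebes]
ω1 = sqrt(coef(foce_fit).pk_Ω[1, 1])
1 - std(η1) / ω1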

3.4.2 Error Model (sigma) shrinkage

Sometimes a number called ϵ-shrinkage is reported as well. The definition is analogous to the above: a realized standard deviation is divided by the standard deviation implied by the model to gauge how well they match. The problem for the derived variables is that we often have heteroscedasticity built into our models. This means that the standard deviation is a function of the model prediction, as in the proportional or combined Gaussian error models, so there is not a single standard deviation to compare against. For this reason, we compute the individual weighted residuals (IWRES) and calculate the standard deviation of those. Since IWRES is already weighted to have unit standard deviation under the model, we simply subtract its standard deviation from 1: \[ \epsilon_{sh} = 1 - \mathrm{sd}(\mathrm{IWRES}_i) \] The interpretation is as above. We can calculate this using the ϵshrinkage function in Pumas.

ϵshrinkage(foce_fit)
(conc = 0.13757846337857282,)

This returns a NamedTuple with one element per ~ that defines a Normal distribution in the @derived-block.

3.5 Reporting Metrics

PumasUtilities contains a convenience function called metrics_table that can be used to report a number of metrics in a compact format.

metrics_table(foce_fit)
18×2 DataFrame
 Row │ Metric                      Value
     │ String                      Any
─────┼─────────────────────────────────────
   1 │ Successful                  false
   2 │ Estimation Time             5.925
   3 │ Subjects                    31
   4 │ Fixed Parameters            0
   5 │ Optimized Parameters        9
   6 │ conc Active Observations    239
   7 │ conc Missing Observations   47
   8 │ Total Active Observations   239
   9 │ Total Missing Observations  47
  10 │ Likelihood Approximation    Pumas.FOCE{Optim.NewtonTrustRegion{Float64}, Optim.Options{Float64, Nothing}}
  11 │ LogLikelihood (LL)          -319.935
  12 │ -2LL                        639.87
  13 │ AIC                         657.87
  14 │ BIC                         689.158
  15 │ (η-shrinkage) pk_η₁         0.01
  16 │ (η-shrinkage) pk_η₂         0.125
  17 │ (η-shrinkage) pk_η₃         0.453
  18 │ (ϵ-shrinkage) conc          0.138

This will return a DataFrame with the metrics.

4 Concluding Remarks

This tutorial showcased a subset of the steps users want to perform after a model fit:

  1. Calculate uncertainty of the estimated parameters
  2. Calculate log-likelihood and information criteria for fit assessment and model comparison purposes
  3. Calculate shrinkage numbers to assess over-parameterization or excessive sparseness of data

Other tutorials will cover many more diagnostic steps to assess the quality of the fitted model.

References

Mogensen, P, and A Riseth. 2018. “Optim: A Mathematical Optimization Package for Julia.” Journal of Open Source Software 3 (24).
