Interpreting Results from Pumas Fits

Author

Patrick Kofod Mogensen

1 Introduction

After following a typical workflow for fitting a Pumas model it is important to assess and interpret the results. This includes steps such as

  1. Looking at log-likelihood related quantities such as loglikelihood, aic and bic to compare the fit to other existing fits.
  2. Assessing convergence of the maximum likelihood optimization.
  3. Calculating the condition number of the correlation matrix implied by the estimated variance-covariance matrix.
  4. Calculating shrinkage for random effects and the dependent variable.

The following sections will walk through these steps using a one-compartment PK model for Warfarin, focusing on the PK aspects only.

2 Model and Data

2.1 Model Definition

Below is the PK model, named warfarin_pk_model, defined in Pumas. This model contains:

  • Fixed effects (population parameters): pop_CL, pop_Vc, pop_tabs, pop_lag
  • Inter-individual variability (IIV) components: pk_Ω
  • Residual error model parameters: σ_prop, σ_add
  • Covariates for scaling: FSZCL and FSZV
  • Differential equations describing the PK behavior in the compartments.
using Pumas
using PharmaDatasets
using DataFramesMeta
using CairoMakie
using AlgebraOfGraphics
using PumasUtilities
using Markdown
warfarin_pk_model = @model begin

    @metadata begin
        desc = "Warfarin 1-compartment PK model (PD removed)"
        timeu = u"hr"
    end

    @param begin
        # PK parameters
        """
        Clearance (L/hr)
        """
        pop_CL ∈ RealDomain(lower = 0.0, init = 0.134)
        """
        Central volume (L)
        """
        pop_Vc ∈ RealDomain(lower = 0.0, init = 8.11)
        """
        Absorption half-life (hr)
        """
        pop_tabs ∈ RealDomain(lower = 0.0, init = 0.523)
        """
        Lag time (hr)
        """
        pop_lag ∈ RealDomain(lower = 0.0, init = 0.1)

        # Inter-individual variability
        """
          - ΩCL: Clearance
          - ΩVc: Central volume
          - Ωtabs: Absorption half-life
        """
        pk_Ω ∈ PDiagDomain([0.01, 0.01, 0.01])

        # Residual variability
        """
        σ_prop: Proportional error
        """
        σ_prop ∈ RealDomain(lower = 0.0, init = 0.00752)
        """
        σ_add: Additive error
        """
        σ_add ∈ RealDomain(lower = 0.0, init = 0.0661)
    end

    @random begin
        pk_η ~ MvNormal(pk_Ω)    # mean = 0, covariance = pk_Ω
    end

    @covariates begin
        """
        FSZCL: Clearance scaling factor
        """
        FSZCL
        """
        FSZV: Volume scaling factor
        """
        FSZV
    end

    @pre begin
        CL = FSZCL * pop_CL * exp(pk_η[1])
        Vc = FSZV * pop_Vc * exp(pk_η[2])

        tabs = pop_tabs * exp(pk_η[3])
        Ka = log(2) / tabs
    end

    @dosecontrol begin
        lags = (Depot = pop_lag,)
    end

    @vars begin
        cp := Central / Vc
    end

    @dynamics begin
        Depot' = -Ka * Depot
        Central' = Ka * Depot - (CL / Vc) * Central
    end

    @derived begin
        """
        Concentration (ng/mL)
        """
        conc ~ @. CombinedNormal(cp, σ_add, σ_prop)
    end
end
PumasModel
  Parameters: pop_CL, pop_Vc, pop_tabs, pop_lag, pk_Ω, σ_prop, σ_add
  Random effects: pk_η
  Covariates: FSZCL, FSZV
  Dynamical system variables: Depot, Central
  Dynamical system type: Matrix exponential
  Derived: conc
  Observed: conc
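Note how the @pre block converts the absorption half-life parameter tabs into the first-order absorption rate constant Ka. The relationship can be checked directly at the initial value of pop_tabs:

```julia
# Absorption half-life ↔ rate constant, as used in the @pre block
tabs = 0.523        # initial value of pop_tabs (hr)
Ka = log(2) / tabs  # first-order absorption rate constant (1/hr), ≈ 1.325
```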

2.2 Data Preparation

The Warfarin data used in this tutorial is pulled from PharmaDatasets for demonstration purposes. Note how the code reshapes and prepares the data in “wide” format for reading into Pumas. Only the conc column is treated as observations for the PK model.

warfarin_data = dataset("pumas/warfarin_nonmem")

# Step 2: Fix Duplicate Time Points
# -------------------------------
# Some subjects have duplicate time points for dvid = 1.
# For this dataset, the triple (id, time, dvid) should define
# a row uniquely, but the two row counts below do not agree:
nrow(warfarin_data)
nrow(unique(warfarin_data, ["id", "time", "dvid"]))

# We can identify the problematic rows by grouping on the index variables
@chain warfarin_data begin
    @groupby :id :time :dvid
    @transform :tmp = length(:id)
    @rsubset :tmp > 1
end

# Transform the data in a single chain of operations
warfarin_data_wide = @chain warfarin_data begin
    @rtransform begin
        # Scaling factors
        :FSZV = :wtbl / 70            # volume scaling
        :FSZCL = (:wtbl / 70)^0.75     # clearance scaling (allometric)
        # Column name for the dv
        :dvname = "dv$(:dvid)"
        # Dosing indicator columns
        :cmt = ismissing(:amt) ? missing : 1
        :evid = iszero(:amt) ? 0 : 1
    end
    unstack(Not([:dvid, :dvname, :dv]), :dvname, :dv)
    rename!(:dv1 => :conc, :dv2 => :pca)
    select!(Not([:dv0]))
end
first(warfarin_data_wide, 10)
10×12 DataFrame
 Row │ id     time     evid   amt      wtbl     age    sex      FSZV      FSZCL    cmt    conc      pca
     │ Int64  Float64  Int64  Float64  Float64  Int64  String1  Float64   Float64  Int64  Float64?  Float64?
─────┼───────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ 1      0.0      1      100.0    66.7     50     M        0.952857  0.96443  1      missing   missing
   2 │ 1      0.5      0      0.0      66.7     50     M        0.952857  0.96443  1      0.0       missing
   3 │ 1      1.0      0      0.0      66.7     50     M        0.952857  0.96443  1      1.9       missing
   4 │ 1      2.0      0      0.0      66.7     50     M        0.952857  0.96443  1      3.3       missing
   5 │ 1      3.0      0      0.0      66.7     50     M        0.952857  0.96443  1      6.6       missing
   6 │ 1      6.0      0      0.0      66.7     50     M        0.952857  0.96443  1      9.1       missing
   7 │ 1      9.0      0      0.0      66.7     50     M        0.952857  0.96443  1      10.8      missing
   8 │ 1      12.0     0      0.0      66.7     50     M        0.952857  0.96443  1      8.6       missing
   9 │ 1      24.0     0      0.0      66.7     50     M        0.952857  0.96443  1      5.6       44.0
  10 │ 1      36.0     0      0.0      66.7     50     M        0.952857  0.96443  1      4.0       27.0
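The scaling factors shown in the table can be verified by hand for the first subject (body weight 66.7 kg), using the allometric relationships from the transformation above:

```julia
wtbl = 66.7
FSZV = wtbl / 70          # linear scaling of volume    ≈ 0.952857
FSZCL = (wtbl / 70)^0.75  # allometric scaling of clearance ≈ 0.96443
```

Both values match the FSZV and FSZCL columns for subject 1.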

2.3 Creating a Pumas Population

Below is the creation of a population object in Pumas using read_pumas. Only the conc data are treated as the observation variable:

pop = read_pumas(
    warfarin_data_wide;
    id = :id,
    time = :time,
    amt = :amt,
    cmt = :cmt,
    evid = :evid,
    covariates = [:sex, :wtbl, :FSZV, :FSZCL],
    observations = [:conc],
)
Population
  Subjects: 32
  Covariates: sex, wtbl, FSZV, FSZCL
  Observations: conc
# A named tuple of parameter values
param_vals = (
    pop_CL = 0.134,
    pop_Vc = 8.11,
    pop_tabs = 0.523,
    pop_lag = 0.1,
    pk_Ω = Diagonal([0.01, 0.01, 0.01]),
    σ_prop = 0.00752,
    σ_add = 0.0661,
)

foce_fit = fit(warfarin_pk_model, pop, param_vals, FOCE())
[ Info: Checking the initial parameter values.
[ Info: The initial negative log likelihood and its gradient are finite. Check passed.
Iter     Function value   Gradient norm 
     0     1.209064e+04     1.489225e+04
 * time: 0.05093979835510254
     1     2.643772e+03     3.167516e+03
 * time: 2.6453518867492676
     2     1.836601e+03     2.118430e+03
 * time: 2.733819007873535
     3     9.351337e+02     8.722439e+02
 * time: 2.819694995880127
     4     6.402300e+02     4.199225e+02
 * time: 2.9045958518981934
     5     5.103664e+02     1.642121e+02
 * time: 3.160916805267334
     6     4.760464e+02     5.453749e+01
 * time: 3.2459259033203125
     7     4.703757e+02     3.643518e+01
 * time: 3.3307669162750244
     8     4.699019e+02     3.135992e+01
 * time: 3.4168338775634766
     9     4.697614e+02     2.953531e+01
 * time: 3.501236915588379
    10     4.693153e+02     2.463233e+01
 * time: 3.5851998329162598
    11     4.685743e+02     2.580427e+01
 * time: 3.6660449504852295
    12     4.675133e+02     3.864937e+01
 * time: 3.746372938156128
    13     4.666775e+02     5.495470e+01
 * time: 3.828974962234497
    14     4.661197e+02     5.692101e+01
 * time: 3.911259889602661
    15     4.656782e+02     4.770992e+01
 * time: 3.990532875061035
    16     4.651802e+02     3.087698e+01
 * time: 4.070924997329712
    17     4.645523e+02     1.184834e+01
 * time: 4.193528890609741
    18     4.641447e+02     1.162249e+01
 * time: 4.361615896224976
    19     4.639978e+02     1.125144e+01
 * time: 4.444846868515015
    20     4.639307e+02     1.156463e+01
 * time: 4.528854846954346
    21     4.638001e+02     1.312870e+01
 * time: 4.617182970046997
    22     4.635282e+02     1.480920e+01
 * time: 4.704620838165283
    23     4.630353e+02     2.169377e+01
 * time: 4.794595003128052
    24     4.623847e+02     4.478029e+01
 * time: 4.92241096496582
    25     4.617426e+02     6.468975e+01
 * time: 5.0028228759765625
    26     4.610293e+02     7.776996e+01
 * time: 5.085462808609009
    27     4.597628e+02     8.785260e+01
 * time: 5.168321847915649
    28     4.566753e+02     9.769803e+01
 * time: 5.251665830612183
    29     4.490421e+02     1.008838e+02
 * time: 5.340794801712036
    30     4.391868e+02     9.978816e+01
 * time: 5.443156957626343
    31     4.130704e+02     5.917685e+01
 * time: 5.529854774475098
    32     4.055780e+02     3.852824e+01
 * time: 5.619260787963867
    33     4.023118e+02     3.889618e+01
 * time: 5.70869779586792
    34     4.012516e+02     3.694777e+01
 * time: 5.8281638622283936
    35     4.004391e+02     2.061956e+01
 * time: 5.918389797210693
    36     3.983040e+02     3.508422e+01
 * time: 6.008492946624756
    37     3.969705e+02     3.841036e+01
 * time: 6.1001927852630615
    38     3.965462e+02     3.738341e+01
 * time: 6.189194917678833
    39     3.950409e+02     3.064789e+01
 * time: 6.297652006149292
    40     3.945750e+02     2.876430e+01
 * time: 6.3870909214019775
    41     3.937725e+02     2.571443e+01
 * time: 6.474837779998779
    42     3.933955e+02     2.436114e+01
 * time: 6.5639259815216064
    43     3.927564e+02     2.051066e+01
 * time: 6.655068874359131
    44     3.916020e+02     1.629024e+01
 * time: 6.749541997909546
    45     3.886991e+02     2.689820e+01
 * time: 6.847118854522705
    46     3.870054e+02     2.298579e+01
 * time: 6.942440986633301
    47     3.853691e+02     2.614989e+01
 * time: 7.055938959121704
    48     3.841730e+02     2.207557e+01
 * time: 7.150958776473999
    49     3.825114e+02     2.204387e+01
 * time: 7.245250940322876
    50     3.808880e+02     2.444789e+01
 * time: 7.341068983078003
    51     3.800407e+02     1.250621e+01
 * time: 7.43572998046875
    52     3.798092e+02     1.167926e+01
 * time: 7.529706954956055
    53     3.797789e+02     1.162382e+01
 * time: 7.63646388053894
    54     3.797069e+02     1.152441e+01
 * time: 7.728648900985718
    55     3.794424e+02     1.132716e+01
 * time: 7.819743871688843
    56     3.788132e+02     2.006425e+01
 * time: 7.9083898067474365
    57     3.771525e+02     3.584662e+01
 * time: 7.997523784637451
    58     3.731299e+02     5.697236e+01
 * time: 8.086609840393066
    59     3.658672e+02     6.542037e+01
 * time: 8.19500994682312
    60     3.604196e+02     4.036518e+01
 * time: 8.289736986160278
    61     3.532842e+02     1.574097e+01
 * time: 8.384963989257812
    62     3.520181e+02     1.393316e+01
 * time: 8.481120824813843
    63     3.517984e+02     6.700859e+00
 * time: 8.57617998123169
    64     3.517541e+02     3.503796e+00
 * time: 8.677265882492065
    65     3.516437e+02     8.713309e+00
 * time: 8.799713850021362
    66     3.511843e+02     1.405445e+01
 * time: 8.91054892539978
    67     3.510646e+02     2.540338e+00
 * time: 9.005870819091797
    68     3.510209e+02     3.157030e+00
 * time: 9.092185974121094
    69     3.509959e+02     3.045070e+00
 * time: 9.174425840377808
    70     3.509765e+02     2.672874e+00
 * time: 9.257720947265625
    71     3.509752e+02     2.603935e+00
 * time: 9.353906869888306
    72     3.509724e+02     2.505642e+00
 * time: 9.443802833557129
    73     3.509666e+02     2.379650e+00
 * time: 9.529699802398682
    74     3.509504e+02     3.576567e+00
 * time: 9.612651824951172
    75     3.509122e+02     6.012775e+00
 * time: 9.695895910263062
    76     3.508285e+02     8.828809e+00
 * time: 9.780113935470581
    77     3.506940e+02     9.705226e+00
 * time: 9.874581813812256
    78     3.505765e+02     6.080422e+00
 * time: 9.959722995758057
    79     3.505358e+02     1.726857e+00
 * time: 10.04671597480774
    80     3.505314e+02     6.749542e-01
 * time: 10.133582830429077
    81     3.505313e+02     6.722103e-01
 * time: 10.236495971679688
    82     3.505312e+02     6.699474e-01
 * time: 10.395134925842285
    83     3.505307e+02     6.606710e-01
 * time: 10.500795841217041
    84     3.505298e+02     6.413392e-01
 * time: 10.590865850448608
    85     3.505274e+02     9.087461e-01
 * time: 10.680714845657349
    86     3.505222e+02     1.339575e+00
 * time: 10.768779993057251
    87     3.505129e+02     1.608652e+00
 * time: 10.859548807144165
    88     3.505026e+02     1.292479e+00
 * time: 10.947035789489746
    89     3.504973e+02     5.133008e-01
 * time: 11.041557788848877
    90     3.504963e+02     6.322176e-02
 * time: 11.124586820602417
    91     3.504963e+02     3.129887e-03
 * time: 11.210976839065552
    92     3.504963e+02     3.129887e-03
 * time: 11.33131194114685
    93     3.504963e+02     3.129887e-03
 * time: 11.399275779724121
FittedPumasModel

Dynamical system type:          Matrix exponential

Number of subjects:                             32

Observation records:         Active        Missing
    conc:                       251             47
    Total:                      251             47

Number of parameters:      Constant      Optimized
                                  0              9

Likelihood approximation:                     FOCE
Likelihood optimizer:                         BFGS

Termination Reason:                      NoXChange
Log-likelihood value:                   -350.49625

--------------------
           Estimate
--------------------
pop_CL     0.13465
pop_Vc     8.0534
pop_tabs   0.55063
pop_lag    0.87159
pk_Ω₁,₁    0.070654
pk_Ω₂,₂    0.018296
pk_Ω₃,₃    0.91327
σ_prop     0.090096
σ_add      0.39115
--------------------

2.4 Evaluating the termination of the fit

Fitting non-linear mixed effects models is not always a simple task. The model has to be appropriate for the data, the data must be accurately sampled and recorded, no information should be truncated, and so on, for the fit to run reliably. Even with perfect data collection and modeling there can still be numerical challenges in evaluating various expressions, calculating empirical Bayes estimates in an inner loop, handling vanishing variances, local optima, and more.

In Pumas we use Optim.jl (Mogensen and Riseth 2018) to perform the minimization for any FO, FOCE, or LaplaceI fit, with or without MAP for maximum a posteriori fitting. The returned FittedPumasModel contains the optimization results structure from Optim.jl in the optim field. This can be useful when assessing convergence.

Convergence in non-linear optimization is not always a simple thing to assess:

  • We can look at the gradient element sizes, and we often look at the infinity norm of the gradient to this end. The infinity norm of a vector picks out the largest absolute value of its elements. By default we require this to be less than 1e-3 to declare convergence, but some problems are more or less numerically stable, so the achievable value may be larger or smaller depending on the circumstances.
Gradient Elements and Infinity Norm of the Gradient
  1. Gradient Elements
  • The gradient is a vector that tells us which direction to move to increase a function’s value the fastest
  • Each element in this vector corresponds to how much the function changes when we adjust one parameter
  • “Gradient element sizes” refers to the magnitude (absolute value) of each of these directional changes

To illustrate with a simple example, consider a model with 3 parameters (like CL, V, and Ka), a gradient might look like:

gradient = [0.0001, -0.002, 0.0005]
3-element Vector{Float64}:
  0.0001
 -0.002
  0.0005

Here:

  • 0.0001 tells us how much the function would change if we adjust the first parameter (CL)
  • -0.002 for the second parameter (V)
  • 0.0005 for the third parameter (Ka)
  2. Infinity Norm of the Gradient
  • The infinity norm is simply the largest absolute value among all gradient elements
  • In our example above:
|0.0001| = 0.0001
|-0.002| = 0.002 #  <- This is the largest!
|0.0005| = 0.0005

Therefore, infinity norm = 0.002
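The infinity norm from the example above can be computed in Julia with norm from the LinearAlgebra standard library:

```julia
using LinearAlgebra

gradient = [0.0001, -0.002, 0.0005]
g_inf = norm(gradient, Inf)  # largest absolute element: 0.002
g_inf < 1e-3                 # false: not yet converged by Pumas' default tolerance
```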

Why This Matters for Convergence:

  • When optimizing a model (finding the best parameters), we want all gradient elements to be close to zero
  • A large gradient element means we can still improve our solution by moving in that direction
  • A small gradient (all elements close to zero) suggests we’re at or near the optimal solution
  • By convention in Pumas, if the infinity norm is less than 0.001 (1e-3), we consider the optimization “converged”

Think of it like hiking to find the highest point:

  • The gradient tells you which direction to walk
  • The size of each gradient element tells you how steep it is in each direction
  • When you’re at the top, it should be flat in all directions (all gradient elements close to zero)
  • The infinity norm tells you about the steepest direction you could still go
  • If the steepest direction is very gentle (< 0.001), you’re probably at the peak!
  • We do not stop based on small changes in the parameters or the objective function value by default. This is because such small changes are typically not a direct sign of convergence but rather a sign that progress has stalled. That said, many researchers set a relative change tolerance on the parameters in the same way that NONMEM does, because you will often see many final iterations with next to no improvement. This can be done by setting, for example, optim_options = (; x_reltol = 1e-5) in fit to match NONMEM.
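As a minimal sketch, matching NONMEM's relative-change criterion amounts to passing an options NamedTuple to fit; the fit call itself is commented out since it requires the model and population defined above:

```julia
# One-entry NamedTuple of optimizer options; note the leading `;`
# (a trailing comma after the entry works as well)
opts = (; x_reltol = 1e-5)  # relative parameter-change tolerance, as in NONMEM
# fit(warfarin_pk_model, pop, param_vals, FOCE(); optim_options = opts)
```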

Let us consider a number of different cases and how you can find and interpret the information in the optim field. This looks something like the following.

julia> foce_fit.optim
L1 * Status: success
L2
L3 * Candidate solution
L4    Final objective value:     3.199344e+02
L5
L6 * Found with
L7    Algorithm:     BFGS
L8
L9 * Convergence measures
L10   |x - x'|               = 4.36e-04 ≰ 0.0e+00
L11   |x - x'|/|x'|          = 1.10e-04 ≰ 0.0e+00
L12   |f(x) - f(x')|         = 3.57e-06 ≰ 0.0e+00
L13   |f(x) - f(x')|/|f(x')| = 1.12e-08 ≰ 0.0e+00
L14   |g(x)|                 = 5.62e-04 ≤ 1.0e-03
L15
L16 * Work counters
L17   Seconds run:   1  (vs limit Inf)
L18   Iterations:    95
L19   f(x) calls:    101
L20   ∇f(x) calls:   97

Below we will use the line numbers shown to the left of this output to refer to the relevant parts. While interpreting the output it may be worth remembering that the optimization is cast as the problem of minimizing the negative log of the marginal likelihood.

2.4.1 Case 1: gradient convergence

Since a (local) minimum of a smooth objective function satisfies that the partial derivatives are zero, we often look at the norm of the gradient to declare convergence. The default value for the tolerance, set via the g_tol option in optim_options in Pumas, is 1e-3. The final state of this measure is found in L14 of the output above, where it is seen that the tolerance was met. This tolerance is chosen to match what is often achievable in well-specified models, but smaller is always better. To derive a theoretically well-reasoned value we would need to know the Hessian at the optimum, which is not possible before running a fit and very expensive to calculate exactly after a fit. Several factors affect the level that can be reached in practice, including the accuracy of the solution of the embedded dynamical system and the curvature of the objective function surface. If the gradient norm is below 1e-3 we will in general accept the fit as having converged, but “near convergence” is often accepted as well if there are no other indications of issues with the model.
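For example, the check applied to the gradient norm reported on L14 of the output above can be written out directly:

```julia
# Final |g(x)| from L14 of the example output, checked against the
# default tolerance g_tol = 1e-3
g_norm = 5.62e-4
converged = g_norm ≤ 1e-3  # true: gradient convergence declared
```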

2.4.2 Case 2: no change in parameters

If x_abstol or x_reltol was not manually set by the user, we only terminate the optimization procedure if no change in the parameters could be made at all from one iteration to the next. The final states of these termination tests are found in L10 and L11 respectively. This essentially indicates that the procedure that scales the search direction to find the next parameter values failed. If this occurs with a relatively small gradient norm we may accept the solution, though the termination reason cannot be used to conclude convergence in isolation. The reasons why no step led to an improvement of the objective function value can be many. Some reasons include:

  1. The objective function is not smooth due to numerical noise from calculations in the objective function including model solutions.
  2. The inverse Hessian approximation in the BFGS algorithm is far from the true inverse Hessian of the objective function. Restarting the fit may help.

As mentioned above, NONMEM sets x_reltol=1e-5 by default but it is 0 in Pumas.

2.4.3 Case 3: small change in parameters

As mentioned above, Pumas by default only terminates based on the change in parameters when that change from one iteration to the next is exactly zero. If the relative change tolerance x_reltol is set to a number such as 1e-5 to match other software such as NONMEM, then the user may well accept the solution, since they deliberately used this criterion. In general, we advise using this setting with caution as it can terminate with a quite large gradient norm.

Again, this information is seen in L10 and L11 but the limit after the inequality will change to the number provided.

2.4.4 Case 4: no change in objective function

This case is related to Case 2. We do not allow for termination due to the absence of improvement in the objective function, so in general this termination reason should be seen as a failure. The rest of the comments about reasons for stopping, acceptance based on gradient norms, etc. apply here as well.

The information about these termination tests can be found in L12 and L13.

2.4.5 Case 5: small change in objective function

Just as Case 4 is related to Case 2, this case is related to Case 3. A very small improvement is often not a valid reason to accept the returned solution as the maximum likelihood estimates. If the user has good reason to actively set one of the options f_abstol or f_reltol, the result may be accepted.

As in Case 4 the information can be found in L12 and L13 but the right-hand side of the inequalities are updated as in Case 3.

2.4.6 Case 6: “maximum number of iterations reached”

The pre-specified limit on the number of iterations was reached. This can be increased in optim_options by setting the iterations keyword. It defaults to 1000, which should generally be plenty, so termination according to this criterion is not a sign of convergence. The output may be accepted if the solution seems to cycle between parameter values with a gradient norm close to the prescribed tolerance.

The number of iterations can be found in L18.

2.4.7 Case 7: line search failed

This case is related to Case 2, 3, 4, and 5. Given the search direction it was not possible to find a step that sufficiently decreases the objective function value. The advice found in Case 2 applies here as well.

In the output this will be indicated in L1.

2.4.8 Case 8: “user provided callback termination”

This is an advanced use-case where a user-provided callback function returned the value true to indicate that the optimization should terminate. This can only happen if the user supplied a custom callback function as part of optim_options.

In the output this will be indicated in L1.

2.4.9 Case 9: “time limit reached”

The optimization routine allows for a soft time limit that is checked after every iteration. The default allows for infinite run-time, but if the time_limit option in optim_options is set this termination reason may occur.

In the output this will be indicated in L17.

2.4.10 Case 10: “objective evaluation limit reached”

The optimization routine allows for a limit to be placed on the total number of objective function evaluations through the f_calls_limit option in optim_options. It is not set by default in Pumas. The interpretation is similar to Case 6.

In the output this will be indicated in L19.

2.4.11 Case 11: “gradient evaluation limit reached”

The optimization routine allows for a limit to be placed on the total number of gradient function evaluations through the g_calls_limit option in optim_options. It is not set by default in Pumas. The interpretation is similar to Case 6.

In the output this will be indicated in L20.

2.4.12 Case 12: “hessian evaluation limit reached”

The optimization routine allows for a limit to be placed on the total number of hessian function evaluations through the h_calls_limit option in optim_options. It is not set by default in Pumas. The interpretation is similar to Case 6.

In the output this would be placed after L20 if it were present, but it is not, because the BFGS algorithm only maintains an approximation to the inverse Hessian; it does not actually compute and utilize the real Hessian.

2.4.13 Case 13: “objective function increased”

This can happen if the allow_f_increases option is set to false in optim_options. The default is true. The causes are the same as in Case 2.

This would be indicated in L1 if it failed.

2.5 Optimization didn’t succeed - then what?

If the optimization did not succeed according to the description above, it is important to think about what can be done to improve the situation. If the model or the initial parameters are grossly wrong, there may not be much to do about the failed optimization. It is often a good idea to reduce the complexity of the model by removing random effects that cannot be supported by the data or by fixing parameters that are known with high certainty from the relevant literature.

Before changing the model too much it may be worth trying a few different things.

2.5.1 Try different starting parameters

Given the complexity and high number of parameters in many NLME models, it is always important to initialize the fit from several sets of starting parameters instead of a single parameter NamedTuple. It is quite likely that the optimization ends up in a local minimum that is not the global minimum, so we need to check the robustness of our maximum likelihood estimate candidates. Performing this robustness step is explained in more detail in a different tutorial, but just to show the effect we can try a different initial fixed effect guess for the volume of distribution.
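As a hypothetical sketch, a small grid of starting values for pop_Vc can be built with merge, reusing the param_vals NamedTuple from Section 2.3; the fit loop is commented out since it requires the Pumas model and population from above:

```julia
using LinearAlgebra  # Diagonal for pk_Ω

# Initial parameter values, as in Section 2.3
param_vals = (
    pop_CL = 0.134, pop_Vc = 8.11, pop_tabs = 0.523, pop_lag = 0.1,
    pk_Ω = Diagonal([0.01, 0.01, 0.01]), σ_prop = 0.00752, σ_add = 0.0661,
)

# Hypothetical grid of initial volumes to probe sensitivity to starting values
init_sets = [merge(param_vals, (; pop_Vc = v)) for v in (4.0, 8.11, 12.0)]
# fits = [fit(warfarin_pk_model, pop, p, FOCE()) for p in init_sets]
```

Comparing the resulting log-likelihoods and estimates across starting sets is a simple robustness check.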

foce_fit2 = fit(
    warfarin_pk_model,
    pop,
    merge(param_vals, (; pop_Vc = 12.0)),
    FOCE();
    optim_options = (show_trace = false,),
)
[ Info: Checking the initial parameter values.
[ Info: The initial negative log likelihood and its gradient are finite. Check passed.
FittedPumasModel

Dynamical system type:          Matrix exponential

Number of subjects:                             32

Observation records:         Active        Missing
    conc:                       251             47
    Total:                      251             47

Number of parameters:      Constant      Optimized
                                  0              9

Likelihood approximation:                     FOCE
Likelihood optimizer:                         BFGS

Termination Reason:                      NoXChange
Log-likelihood value:                   -350.49625

--------------------
           Estimate
--------------------
pop_CL     0.13465
pop_Vc     8.0534
pop_tabs   0.55063
pop_lag    0.8716
pk_Ω₁,₁    0.070655
pk_Ω₂,₂    0.018294
pk_Ω₃,₃    0.91324
σ_prop     0.090094
σ_add      0.39116
--------------------

2.5.2 Resetting the optimization procedure

Whenever a model fit concludes, we end up with a final set of fixed effect estimates (population-level parameters) as well as a set of empirical Bayes estimates. We also end up with a final inverse Hessian approximation. There are ways to restart the optimization at the final fixed effect parameters while resetting the random effects to the central value of the prior, and to reset the optimization procedure itself.

To restart the optimization procedure and use the final fixed effects and empirical bayes estimates use the following.

foce_fit3 = fit(foce_fit2; reset_optim = true)
[ Info: Checking the initial parameter values.
[ Info: The initial negative log likelihood and its gradient are finite. Check passed.
FittedPumasModel

Dynamical system type:          Matrix exponential

Number of subjects:                             32

Observation records:         Active        Missing
    conc:                       251             47
    Total:                      251             47

Number of parameters:      Constant      Optimized
                                  0              9

Likelihood approximation:                     FOCE
Likelihood optimizer:                         BFGS

Termination Reason:                      NoXChange
Log-likelihood value:                   -350.49625

--------------------
           Estimate
--------------------
pop_CL     0.13465
pop_Vc     8.0534
pop_tabs   0.55063
pop_lag    0.87159
pk_Ω₁,₁    0.070655
pk_Ω₂,₂    0.018294
pk_Ω₃,₃    0.91324
σ_prop     0.090095
σ_add      0.39116
--------------------

2.5.3 Resetting the optimization procedure and the empirical Bayes estimates

To restart both, we can simply invoke fit as normal with coef(foce_fit3) as the initial fixed effects.

foce_fit4 = fit(
    warfarin_pk_model,
    pop,
    coef(foce_fit3),
    FOCE();
    optim_options = (show_trace = false,),
)
[ Info: Checking the initial parameter values.
[ Info: The initial negative log likelihood and its gradient are finite. Check passed.
FittedPumasModel

Dynamical system type:          Matrix exponential

Number of subjects:                             32

Observation records:         Active        Missing
    conc:                       251             47
    Total:                      251             47

Number of parameters:      Constant      Optimized
                                  0              9

Likelihood approximation:                     FOCE
Likelihood optimizer:                         BFGS

Termination Reason:                      NoXChange
Log-likelihood value:                   -350.49625

--------------------
           Estimate
--------------------
pop_CL     0.13465
pop_Vc     8.0534
pop_tabs   0.55063
pop_lag    0.87159
pk_Ω₁,₁    0.070655
pk_Ω₂,₂    0.018294
pk_Ω₃,₃    0.91324
σ_prop     0.090095
σ_add      0.39116
--------------------

Restarting the optimization is often a good way to avoid problems with gradient convergence.

2.5.4 A computationally expensive alternative

If gradient convergence below a certain threshold is very important, it can often be achieved with GradientDescent. Gradient descent has very good asymptotic properties (as the number of iterations goes to infinity) and avoids potential numerical issues with the inverse Hessian approximation in situations with numerical noise in the objective, which could for example come from numerical integration of the dynamical system. However, since it does not use (an approximation to) second-order information, it may take quite large steps that lead to extreme parameter values, causing warnings from the ODE solvers to appear on screen, and it may well use a lot of iterations to reach the prescribed tolerance. Consider the following.

foce_fit5 = fit(
    warfarin_pk_model,
    pop,
    coef(foce_fit3),
    FOCE();
    optim_alg = Pumas.Optim.GradientDescent(),
    optim_options = (g_tol = 1e-5, show_trace = false),
)
foce_fit5.optim
[ Info: Checking the initial parameter values.
[ Info: The initial negative log likelihood and its gradient are finite. Check passed.
 * Status: success

 * Candidate solution
    Final objective value:     3.504963e+02

 * Found with
    Algorithm:     Gradient Descent

 * Convergence measures
    |x - x'|               = 1.60e-08 ≰ 0.0e+00
    |x - x'|/|x'|          = 3.99e-09 ≰ 0.0e+00
    |f(x) - f(x')|         = 2.83e-07 ≰ 0.0e+00
    |f(x) - f(x')|/|f(x')| = 8.06e-10 ≰ 0.0e+00
    |g(x)|                 = 7.71e-06 ≤ 1.0e-05

 * Work counters
    Seconds run:   163  (vs limit Inf)
    Iterations:    446
    f(x) calls:    1116
    ∇f(x) calls:   447
    ∇f(x)ᵀv calls: 1115

We were indeed able to reach a very low gradient tolerance, but notice how many iterations were used! This may simply not be feasible for very large data sets or computationally expensive models.

3 Calculating Some Key Diagnostics

3.1 Model Fit and Comparison

When we specify FOCE(), LaplaceI() or FO() we are doing maximum likelihood estimation. This means that the most natural number for gauging and comparing model fit is the log-likelihood. To extract the log-likelihood at the maximum likelihood estimates we can simply use the loglikelihood function, which has a method for FittedPumasModel.

loglikelihood(foce_fit5)
-350.49625376090324

We see that this is the same number as in the printed output, although it was rounded there. We can then compare this number to the log-likelihood of another fitted model, and if the models are nested we can even perform a likelihood ratio test with lrtest. Please consult the docstring (?lrtest) for how to use this function.
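For context, the likelihood ratio test compares nested fits via the statistic 2·(LL_full − LL_reduced), referred to a χ² distribution whose degrees of freedom equal the number of extra parameters. A minimal sketch of that arithmetic, using made-up log-likelihoods (not taken from the Warfarin fits above); for df = 2 the χ² survival function has the closed form exp(−x/2), so no extra packages are needed:

```julia
# Hypothetical log-likelihoods of a reduced model and a nested full model
ll_reduced = -355.2   # model with fewer parameters
ll_full    = -350.5   # nested model with 2 extra parameters
df = 2

# Likelihood ratio statistic, compared to a χ²(df) distribution
stat = 2 * (ll_full - ll_reduced)

# For df = 2 the χ² survival function is exp(-x/2)
pvalue = exp(-stat / 2)

stat, pvalue   # statistic 9.4; p-value well below 0.05
```

In practice lrtest performs this comparison directly on the two fitted models.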

Other, more general approaches to model comparison involve information criteria such as aic, aicc, bic, and more. These all have methods for FittedPumasModel, so to calculate them simply use them as follows

aic_fit, bic_fit = aic(foce_fit5), bic(foce_fit5)
(718.9925075218065, 750.7215839739926)
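These criteria are simple functions of the log-likelihood: AIC = −2·LL + 2k and BIC = −2·LL + k·log(n), where k is the number of estimated parameters and n is the number of active observations. We can sanity-check the values above by hand, reading k = 9 optimized parameters and n = 251 active observations off the printed fit summary:

```julia
ll = -350.49625376090324   # loglikelihood at the maximum likelihood estimates
k = 9                      # optimized parameters
n = 251                    # active observations

aic_manual = -2 * ll + 2 * k        # matches aic(foce_fit5)
bic_manual = -2 * ll + k * log(n)   # matches bic(foce_fit5)
```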

3.2 Computing Parameter Precision with infer

The infer function in Pumas estimates the uncertainty (precision) of parameter estimates. Depending on the chosen method, infer can provide standard errors, confidence intervals, correlation matrices, and condition numbers.

An example usage:

inference_results = infer(foce_fit4; level = 0.95)
[ Info: Calculating: variance-covariance matrix.
[ Info: Done.
Asymptotic inference results using sandwich estimator

Dynamical system type:          Matrix exponential

Number of subjects:                             32

Observation records:         Active        Missing
    conc:                       251             47
    Total:                      251             47

Number of parameters:      Constant      Optimized
                                  0              9

Likelihood approximation:                     FOCE
Likelihood optimizer:                         BFGS

Termination Reason:                      NoXChange
Log-likelihood value:                   -350.49625

--------------------------------------------------------
           Estimate   SE          95.0% C.I.
--------------------------------------------------------
pop_CL     0.13465    0.0066547   [ 0.12161  ; 0.1477 ]
pop_Vc     8.0534     0.22108     [ 7.6201   ; 8.4868 ]
pop_tabs   0.55063    0.18702     [ 0.18407  ; 0.91718]
pop_lag    0.87159    0.056684    [ 0.76049  ; 0.98269]
pk_Ω₁,₁    0.070655   0.024585    [ 0.022468 ; 0.11884]
pk_Ω₂,₂    0.018294   0.0051511   [ 0.0081976; 0.02839]
pk_Ω₃,₃    0.91324    0.40637     [ 0.11677  ; 1.7097 ]
σ_prop     0.090095   0.014521    [ 0.061634 ; 0.11855]
σ_add      0.39116    0.065399    [ 0.26298  ; 0.51934]
--------------------------------------------------------

This augments the printed table from above with the uncertainty information. A useful function to know is coefficients_table from PumasUtilities. It takes the model fit and inference objects as inputs and returns a DataFrame with model parameter names, descriptions from the model annotations, estimates, standard errors, relative standard errors, and confidence interval limits.

coefficients_table(foce_fit5, inference_results)
9×8 DataFrame
Row Parameter Description Constant Estimate Standard Error Relative_Se CI Lower CI Upper
String SubStrin… Bool Float64 Float64 Float64 Float64 Float64
1 pop_CL Clearance (L/hr) false 0.135 0.007 0.049 0.122 0.148
2 pop_Vc Central volume (L) false 8.053 0.221 0.027 7.62 8.487
3 pop_tabs Absorption lag time (hr) false 0.551 0.187 0.34 0.184 0.917
4 pop_lag Lag time (hr) false 0.872 0.057 0.065 0.76 0.983
5 pk_Ω₁,₁ ΩCL: Clearance false 0.071 0.025 0.348 0.022 0.119
6 pk_Ω₂,₂ ΩVc: Central volume false 0.018 0.005 0.282 0.008 0.028
7 pk_Ω₃,₃ Ωtabs: Absorption lag time false 0.913 0.406 0.445 0.117 1.71
8 σ_prop σ_prop: Proportional error false 0.09 0.015 0.161 0.062 0.119
9 σ_add σ_add: Additive error false 0.391 0.065 0.167 0.263 0.519

In another tutorial we will look closer at different ways of calculating uncertainty for the maximum likelihood estimates.

3.3 Calculate the Condition Number

Some researchers like to include the condition number of the correlation matrix implied by the variance-covariance matrix estimated by infer. It is calculated by applying the cond function to the output of infer above.

cond(inference_results)
50.113557293111924
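Conceptually, this amounts to scaling the variance-covariance matrix into the implied correlation matrix and taking the ratio of its extreme eigenvalues. A minimal sketch of that computation, using a small hypothetical covariance matrix (not the Warfarin fit):

```julia
using LinearAlgebra

# Hypothetical variance-covariance matrix (for illustration only)
Σ = [0.04  0.01  0.00;
     0.01  0.09  0.02;
     0.00  0.02  0.25]

# Implied correlation matrix: R = D⁻¹ Σ D⁻¹ with D holding the standard deviations
D = Diagonal(sqrt.(diag(Σ)))
R = inv(D) * Σ * inv(D)

# For a symmetric positive-definite matrix the 2-norm condition number
# equals the ratio of its largest to its smallest eigenvalue
λ = eigvals(Symmetric(R))
condition_number = maximum(λ) / minimum(λ)
```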

The researcher may set a limit as to how large this number can be. Models with large condition numbers are sometimes labeled “unstable” and models with small condition numbers may be labeled “stable”. This categorization is contested and discussed among practitioners and the condition number should be reported and interpreted with care outside of the linear regression context where the interpretation was originally proposed.

3.4 Shrinkage

A non-linear mixed effects model includes random effects to represent individual-specific variability as well as sampling noise. In cases where the model is over-parameterized relative to the true model or to the available data we may observe what the literature calls shrinkage. What shrinks is the distribution of the empirical Bayes estimates (η-shrinkage) or of the individual weighted residuals (ϵ-shrinkage). If there is insufficient data or the model is over-parameterized, it becomes possible to find exact or almost exact fits for each individual. There is not enough data to move the parameters away from the prior on the random effects, and the individual distributions collapse into the population distribution.

3.4.1 Random Effect (eta) shrinkage

η-shrinkage is defined as \[ \eta_{sh} = 1-\frac{sd(\eta_i)}{\omega_\eta} \] so that if the prior standard deviation (\(\omega_\eta\)) agrees with the standard deviation of the \(\eta_i\) we get no shrinkage, i.e. a shrinkage of 0. As \(sd(\eta_i)\) shrinks towards 0 the shrinkage approaches 1, indicating over-parameterization or a lack of individual data.
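The formula can be sketched directly, given the empirical Bayes estimates for one random effect and its estimated ω (the η values and ω below are made up for illustration; whether the standard deviation uses the n or n − 1 denominator is a convention detail):

```julia
using Statistics

# Hypothetical empirical Bayes estimates for one η and its prior sd ω
etas = [-0.21, 0.05, 0.13, -0.08, 0.02, 0.11, -0.04, 0.06]
ω = 0.30

# η-shrinkage: 1 - sd(ηᵢ)/ω; 0 means no shrinkage, values near 1 flag
# over-parameterization or sparse individual data
η_shrinkage = 1 - std(etas) / ω
```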

In Pumas, this number is simply calculated by invoking ηshrinkage on the fitted model.

ηshrinkage(foce_fit)
(pk_η = [0.007292801165891594, 0.13558469343660862, 0.41230531924963354],)

This returns a NamedTuple with one element per ~ in the @random block.

3.4.2 Error Model (sigma) shrinkage

Sometimes a number called ϵ-shrinkage is reported as well. The definition is analogous to the above: a standard deviation is divided by a reference standard deviation and subtracted from one to gauge how well they match. The problem for the derived variables is that we often have heteroscedasticity built into our models; the standard deviation is a function of the model prediction, as in the proportional or combined Gaussian error models. Then there is not a single standard deviation to compare to. For this reason, we compute the individual weighted residuals (IWRES) and calculate their standard deviation. Since IWRES is already weighted, we simply subtract its standard deviation from 1. \[ \epsilon_{sh} = 1-sd(IWRES_i) \] The interpretation is as above. We can calculate this using the ϵshrinkage function in Pumas.
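Since IWRES is standardized to unit variance under the model, the computation reduces to a one-liner (the residuals below are made up for illustration):

```julia
using Statistics

# Hypothetical individual weighted residuals (IWRES)
iwres = [-0.9, 0.4, 1.1, -0.3, 0.7, -1.2, 0.2, 0.6, -0.5, 0.8]

# ϵ-shrinkage: 1 - sd(IWRES); values near 1 suggest residuals that are
# too small relative to the assumed error model
ϵ_shrinkage = 1 - std(iwres)
```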

ϵshrinkage(foce_fit)
(conc = 0.13817851132695302,)

This returns a NamedTuple with one element per ~ that defines a Normal distribution in the @derived block.

3.5 Reporting Metrics

PumasUtilities contains a convenience function called metrics_table that can be used to report a number of metrics in a compact format.

metrics_table(foce_fit)
18×2 DataFrame
Row Metric Value
String Any
1 Successful true
2 Estimation Time 11.399
3 Subjects 32
4 Fixed Parameters 0
5 Optimized Parameters 9
6 conc Active Observations 251
7 conc Missing Observations 47
8 Total Active Observations 251
9 Total Missing Observations 47
10 Likelihood Approximation Pumas.FOCE{Optim.NewtonTrustRegion{Float64}, Optim.Options{Float64, Nothing}}
11 LogLikelihood (LL) -350.496
12 -2LL 700.993
13 AIC 718.993
14 BIC 750.722
15 (η-shrinkage) pk_η₁ 0.007
16 (η-shrinkage) pk_η₂ 0.136
17 (η-shrinkage) pk_η₃ 0.412
18 (ϵ-shrinkage) conc 0.138

This will return a DataFrame with the metrics.

4 Concluding Remarks

This tutorial showcased a subset of the steps users want to perform after a model fit:

  1. Calculate uncertainty of the estimated parameters
  2. Calculate log-likelihood and information criteria for fit assessment and model comparison purposes
  3. Calculate shrinkage numbers to assess over-parametrization or excessive sparseness of data

Other tutorials will cover many more diagnostic steps used to assess the quality of the fitted model.

References

Mogensen, P, and A Riseth. 2018. “Optim: A Mathematical Optimization Package for Julia.” Journal of Open Source Software 3 (24).
