Comparing NONMEM and Pumas models


Andreas Noack

using Pumas
using DataFramesMeta
using Downloads: Downloads
using CSV

1 Introduction

When translating a model from NONMEM to Pumas, it is important to verify that the two models are in fact identical. The better way to check that the two models are identical is to compare the the log-likelihood of the two models evaluated in the same parameter values and using the same data. If the two values match, then you can be almost certain that the translation is correct.

However, NONMEM does not report the log-likelihood. Instead, it reports an objective function value (OFV) which is a function of the log-likelihood. This means that comparing Pumas and NONMEM results require adjusting the output of Pumas’ loglikelihood function to make it comparable to the NONMEM OFV. This tutorial will demonstrate how to make this adjustment and explain why the two numbers are different.

2 An example

To demonstrate how to compare NONMEM’s OFV to Pumas’ loglikelihood, we will use an example model from First, we will download the result file for the model

nonmem_lst = readlines(

Starting with NONMEM version 7.4.0, two different versions of the OFV are reported in the output

findfirst(contains("OBJECTIVE FUNCTION VALUE WITH CONSTANT"), nonmem_lst) |>
i -> begin
    _lines = i-3:i+1
    @info "lines $_lines of lst file"
[ Info: lines 373:377 of lst file
 N*LOG(2PI) CONSTANT TO OBJECTIVE FUNCTION:    1361.8669062093250     
 OBJECTIVE FUNCTION VALUE WITH CONSTANT:       3945.1778071510653     

The value OBJECTIVE FUNCTION VALUE WITHOUT CONSTANT is NONMEM’s traditional OFV which is also reported in the optimization trace. Below we print the last iteration of the trace where the OFV matches the one from the output above

findlast(contains("ITERATION NO.:"), nonmem_lst) |>
i -> begin
    _lines = i:i+1
    @info "lines $_lines of lst file"
[ Info: lines 345:346 of lst file
0ITERATION NO.:   37    OBJECTIVE VALUE:   2583.31090094174        NO. OF FUNC. EVALS.:  28

The value for the TOTAL DATA POINTS NORMALLY DISTRIBUTED line is usually the number of observations in the dataset. We can confirm this with

findfirst(contains("TOT. NO. OF OBS RECS"), nonmem_lst) |>
i -> begin
    _lines = i:i+1
    @info "lines $_lines of lst file"
[ Info: lines 85:86 of lst file
 TOT. NO. OF OBS RECS:      741

Be aware that this is not the case if the model is for censored data (also known as M3).

With the traditional OFV and the number N, we can reconstruct all the numbers in the output above with

NONMEM_OFV_nt = (; N = 741, OFV = 2583.3109009417403)
NONMEM_OFV_nt = (; NONMEM_OFV_nt..., constant = NONMEM_OFV_nt.N * log(2 * π))
    (; NONMEM_OFV_nt..., OFV_with_constant = NONMEM_OFV_nt.OFV + NONMEM_OFV_nt.constant)
NONMEM_OFV_nt = NamedTuple{(:N, :constant, :OFV, :OFV_with_constant)}(NONMEM_OFV_nt)
(N = 741,
 constant = 1361.866906209325,
 OFV = 2583.3109009417403,
 OFV_with_constant = 3945.1778071510653,)

How does this relate to Pumas’ loglikelihood? The answer is simply that OBJECTIVE FUNCTION VALUE WITH CONSTANT is the same as Pumas’ -2*loglikelihood(fit). We will confirm this in the following but first we need a Pumas translation of the model. The NONMEM model to be converted to Pumas is

    nonmem_lst[findfirst(contains("\$SUBROU"), nonmem_lst):findfirst(

IF(ETN.EQ.3) ET=1.3
CL = THETA(2)*((WT/70)**0.75)* EXP(ETA(1))
V = THETA(3)*EXP(ETA(2))

(0, 2)  ; KA
(0, 3)  ; CL
(0, 10) ; V2
(0.02)  ; RUVp
(1)     ; RUVa

0.05    ; iiv CL
0.2     ; iiv V2


IF (W.EQ.0) W = 1
FAKE=1 ; for testing


A Pumas translation of this model could look like

mdl = @model begin
    @param begin
        θKa  RealDomain(lower = 0.0, init = 2.0)
        θCL  RealDomain(lower = 0.0, init = 3.0)
        θVc  RealDomain(lower = 0.0, init = 10.0)

        ωCL  RealDomain(lower = 0.0, init = (0.05))
        ωVc  RealDomain(lower = 0.0, init = (0.2))

        σₐ  RealDomain(lower = 0.0, init = 1.0)
        σₚ  RealDomain(lower = 0.0, init = 0.02)
    @covariates wt
    @random begin
        _CL ~ LogNormal(log(θCL), ωCL)
        _Vc ~ LogNormal(log(θVc), ωVc)
    @pre begin
        Ka = θKa
        CL = _CL * (wt / 70)^0.75
        Vc = _Vc
    @dynamics Depots1Central1
    @derived begin
        μ := @. Central / Vc
        dv ~ @. Normal(μ, μ * σₚ + σₐ)
  Parameters: θKa, θCL, θVc, ωCL, ωVc, σₐ, σₚ
  Random effects: _CL, _Vc
  Covariates: wt
  Dynamical system variables: Depot, Central
  Dynamical system type: Closed form
  Derived: dv
  Observed: dv

Notice that the error model is non-standard. The usual combined error model is Normal(μ, √((μ * σₚ)^2 + σₐ^2)), but the version here matches the NONMEM version as we will see below.

Next step is to load the dataset

data_df =
filter!(t -> != 2, data_df)
data_df[!, "cmt"] .= 1
data_pd = read_pumas(data_df; mdv = :mdv, covariates = [:wt])
┌ Warning: 38 row(s) has(ve) non-missing observation(s) with mdv set to one. mdv is taking precedence.
└ @ Pumas ~/.julia/packages/Pumas/Aiu7N/src/data_parsing/io.jl:2026
39-element Vector{Subject{NamedTuple{(:dv,), Tuple{Vector{Union{Missing, Float64}}}}, Pumas.ConstantCovar{NamedTuple{(:wt,), Tuple{Float64}}}, T3, Vector{Float64}} where T3}:
 Subject(id = 1, ...)
 Subject(id = 3, ...)
 Subject(id = 4, ...)
 Subject(id = 5, ...)
 Subject(id = 6, ...)
 Subject(id = 7, ...)
 Subject(id = 8, ...)
 Subject(id = 9, ...)
 Subject(id = 10, ...)
 Subject(id = 11, ...)
 Subject(id = 32, ...)
 Subject(id = 33, ...)
 Subject(id = 34, ...)
 Subject(id = 35, ...)
 Subject(id = 36, ...)
 Subject(id = 37, ...)
 Subject(id = 38, ...)
 Subject(id = 39, ...)
 Subject(id = 40, ...)

The reported OFVs in the lst file are based on the final parameter values, so to compare the two models we need to read these values. Please notice that we evaluate the value of the log-likelihood in Pumas in the final parameters of the NONMEM run instead of fitting the model in Pumas. Not only will this save us time because model fitting might be slow, but more importantly, it focuses the comparison on just the model formulation and leaves out influences of possible estimation differences between Pumas and NONMEM.

Often, the final estimates of a NONMEM run are stored in an accompanying XML file which is easy to load in most programming languages. However, such an XML file is not included in the output available to us. Instead, we can read the parameter values from lst file

NONMEM_final_v =
    findlast(contains("NPARAMETR:"), nonmem_lst) |>
    i -> parse.(Float64, split(nonmem_lst[i])[2:end])
7-element Vector{Float64}:

This parameter vector can then be converted into a NamedTuple of the parameters used in the Pumas model

NONMEM_final_nt = (;
    θKa = NONMEM_final_v[1],
    θCL = NONMEM_final_v[2],
    θVc = NONMEM_final_v[3],
    σₚ = NONMEM_final_v[4],
    σₐ = NONMEM_final_v[5],
    ωCL = sqrt(NONMEM_final_v[6]),
    ωVc = sqrt(NONMEM_final_v[7]),
(θKa = 2.3172,
 θCL = 54.615,
 θVc = 462.51,
 σₚ = -0.082008,
 σₐ = 4.1796,
 ωCL = 0.3138996654983882,
 ωVc = 0.39600505047284434,)

and with these values, we can confirm that -2*loglikelihood is the same as NONMEM’s OFV with constant

(-2 * loglikelihood(mdl, data_pd, NONMEM_final_nt, FOCE()), NONMEM_OFV_nt.OFV_with_constant)
(3945.177809918999, 3945.1778071510653)

As mentioned, the OFV with constant was only introduced in NONMEM version 7.4.0 so it can be useful to compute the version without the constant which all NONMEM versions report. This can easily be done by using the nobs function in Pumas, i.e.

    -2 * loglikelihood(mdl, data_pd, NONMEM_final_nt, FOCE()) - nobs(data_pd) * log(2 * π),
(2583.310903709674, 2583.3109009417403)

This version can also be useful if you want to compare to the OFV of the NONMEM file in the initial parameters either by reading the value at iteration zero

NONMEM_init_OFV_f =
    findfirst(contains("ITERATION NO."), nonmem_lst) |>
    i -> parse(Float64, match(r"[0-9]+\.[0-9]+", nonmem_lst[i]).match)

or setting MAXEVAL=0. In either case, the OFV with constant is not included in the lst file. Evaluating the model in the initial parameters is by far the fastest way to compare models in the case where the model has not yet been run.

    -2 * loglikelihood(mdl, data_pd, init_params(mdl), FOCE()) - nobs(data_pd) * log(2 * π),
(14995.780626411153, 14995.7806174588)

Parallel NONMEM has a bug that causes the N value to be wrong in some but not all cases. The bug affects the reported OFV with constant so it is safer to compare with the traditional OFV when the lst was based on a parallel NONMEM run. The bug has been reported but is present in the current version (7.5.1) of NONMEM.

It is safe to assume that the model has been correctly translated from NONMEM to Pumas if the OFV of NONMEM matches the adjusted log-likelihood value, as in the examples above, but the opposite might not be true. In our experience, the more likely reason for the values to differ is still an error in the model translation, but there could be other reason as well. Other reasons for a difference could be:

  • The dataset. Sometimes the datasets used for the comparison might not be exactly identical. Alternatively, the dataset might have been read incorrectly if a argument to read_pumas was not been specified. There might also be a difference in how the dataset is filtered, i.e. between NONMEM’s IGNORE and a filter call in Pumas.
  • The parameter values. The values might have been incorrectly entered in the Pumas version. Either because the values have been read in the wrong order or because differences in the parametetrization, i.e. a parameter might be a variance in the NONMEM version but a standard deviation in Pumas version and should therefore be converted with the square root.
  • The interpolation direction. NONMEM uses next observation carried backwards (NOCB) while Pumas’ default is last observation carried forward (LOCB).
  • The empirical Bayes estimates. For complicated models, there might be multiple local extrema of the joint likelihood function (as a function of the random effects). In these cases, there is a risk that NONMEM and Pumas do not compute the same empirical Bayes estimates and therefore the objective functions values will differ. It can therefore be useful to compare the NONMEM’s phi file with the output from Pumas’ empirical_bayes.
  • The algorithms. While Pumas has been extensively tested against NONMEM, there might be minor differences in the numerical integration of the differential equations or the calculation of derivatives. There differences can in rare cases cause the results to differ.

3 Why are the two objective functions different?

The constant term discussed in the previous section does not depend on the model parameters and consequently does not incluence any inference aspects of the model. Some applications choose to leave out the constant for simplicity while other include it order to maintain the link between the probability density function (pdf) of the data and the likelihood function. In the following, we will provide the details behind these two versions of the objective function. To simplify the notation, we will use a simplified model with the same constant term as the Gaussian non-linear mixed effects model.

The likelihood function (e.g., Casella and Berger 2002), for a sample, \(Y_1,\dots,Y_n\), of independent and identically distributed (iid) observations is defined as

\[ L(\theta) = \prod_{i=1}^n p(y_i) \]

where \(p\) is the pdf of \(Y_i\). It is generally much simpler to work with the logarithm of the likelihood function, the log-likelihood, \(\ell(\theta) = \log(L(\theta))\). In

\[ \ell(\theta) = \sum_{i=1}^n \log p(y_i) \]

The value of \(\theta\) that maximizes this function also maximizes \(L(\theta)\).

The pdf of a normally distributed random variable \(Y_i\) is \(\frac{1}{\sqrt{2\pi\sigma^2}}\exp\left\{\frac{(y_i - \mu)^2}{2\sigma^2}\right\}\) so the log-likelihood of an indenpendently normally distributed sample is

\[ \ell(\theta) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log(\sigma^2) - \frac{1}{2\sigma^2}\sum_i^n (y_i - \mu)^2 \]

All terms in the expression have a negative sign and \(2\) in the denominator so it is often simpler to work with

\[ -2\ell(\theta) = n\log(2\pi) + n\log(\sigma^2) + \frac{1}{\sigma^2}\sum_i^n (y_i - \mu)^2 \]

Furthermore, it is clear that the term \(n\log(2\pi)\) does not depend on the model parameters \((\mu,\sigma^2)\) so leaving out the term from the optimization will not change the optimizer. The version without the constant term is the NONMEM OFV

\[ OFV(\theta) = -2\ell(\theta) - n\log(2\pi) = n\log(\sigma^2) + \frac{1}{2\sigma^2}\sum_i^n (y_i - \mu)^2 \]

4 References

Casella, George, and Roger Berger. 2002. Statistical Inference. Duxbury.