Error Models in Non-linear Mixed Effects Models: A Comprehensive Tutorial with Pumas

Author

Vijay Ivaturi

1 Fundamentals of Error Models in Non-linear Mixed Effects Models

Error models are essential components of non-linear mixed effects (NLME) models as they describe the unexplained variability in our observations. This variability can arise from multiple sources:

  • Measurement error (e.g., analytical imprecision)
  • Model misspecification
  • Biological variation
  • Unknown or unaccounted factors

At its core, an error model defines how the observed data relates to the model predictions, accounting for randomness or “noise” in the measurements. In NLME modeling, we typically have:

  • Fixed effects: Population parameters that apply to all individuals
  • Random effects: Individual deviations from the population values
  • Residual error: Unexplained variability in the observations

The residual error is what we are addressing with error models, and it’s crucial to select an appropriate error model that reflects the nature of the data being analyzed.

2 Understanding the Derived Block Expression

In Pumas, the @derived block provides a natural and intuitive way to specify how observed data relates to model predictions. This syntax offers significant advantages over other software (like NONMEM) by explicitly mapping the dependent variable to an underlying probability distribution.

2.1 Anatomy of the Derived Block

Let’s break down the syntax of a typical derived block:

@derived begin
    y ~ @. Normal(μ, σ)
end

Each component has a specific meaning:

  • @derived begin ... end: Defines the section of the model that specifies the error model
  • y: The observed dependent variable (e.g., drug concentration, response)
  • ~: The “distributed as” operator, indicating that y follows a probability distribution
  • @.: A broadcast operator that applies the function element-wise (simplifies vectorization)
  • Normal(μ, σ): The probability distribution with parameters μ (mean) and σ (standard deviation)

2.2 Sources of Elements in the Syntax

  • y: Comes from the dataset column containing your observations, mapped via read_pumas (see the sketch after this list)
  • μ: Usually the output of your structural model with or without the random effects (predicted values)
  • σ: The standard deviation parameter that characterizes the residual error magnitude
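
As a minimal sketch of where the observed variable comes from in practice, assuming a dataset laid out in the usual NM-TRAN-style columns (all names here are illustrative; the observation column is called dv), the observation column is declared when building the population with read_pumas:

using Pumas, DataFrames

# Hypothetical dataset: one dose row (evid = 1) and two observation rows
df = DataFrame(
    id   = [1, 1, 1],
    time = [0.0, 0.5, 1.0],
    dv   = [missing, 12.3, 9.8],
    amt  = [100.0, missing, missing],
    evid = [1, 0, 0],
    cmt  = [1, missing, missing],
)

# observations = [:dv] maps the dataset column to the dependent variable
# used on the left-hand side of ~ in the @derived block
pop = read_pumas(df; observations = [:dv])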

2.3 The Power of Distribution-Based Modeling

Pumas leverages Julia’s rich ecosystem of probability distributions (from Distributions.jl) to provide a wide range of options for error models:

  • Continuous distributions: Normal, LogNormal, Gamma, Student’s T, etc.
  • Discrete distributions: Bernoulli, Poisson, Negative Binomial, etc.
  • Censored/Truncated distributions: For handling data below detection limits or with upper/lower bounds

This approach offers several advantages:

  1. Interpretability: The model directly specifies the assumed distribution of observations
  2. Flexibility: Easy to switch between different error structures
  3. Statistical rigor: Proper handling of different data types
  4. Extensibility: Can incorporate any distribution available in Julia

3 Continuous Error Model Specifications

Now, let’s examine specific continuous error models in detail:

3.1 Additive Error Model

y ~ @. Normal(μ, σ)
  • Mathematical form: y = f(\theta, \eta_i) + \epsilon_i, where \epsilon_i \sim N(0, \sigma^2)
  • Meaning: Observations vary around the predicted value with constant standard deviation
  • Dispersion: \sigma^2
  • When to use: When measurement error is constant across the range of predictions
  • Examples: Laboratory measurements with constant analytical error
  • Interpretation: \sigma represents the standard deviation in the original units of the data; the coefficient of variation, \frac{\sigma}{\mu}, therefore varies with the mean

3.2 Proportional Error Model

y ~ @. Normal(μ, μ * σ)
  • Mathematical form: y = f(\theta, \eta_i) \times (1 + \epsilon_i), where \epsilon_i \sim N(0, \sigma^2)
  • Meaning: Standard deviation increases proportionally with the predicted value
  • Dispersion: (\mu \sigma)^2
  • When to use: When error increases with concentration (common in PK modeling)
  • Examples: Drug concentration measurements, where higher concentrations have larger absolute errors
  • Interpretation: \sigma equals the coefficient of variation (CV), since the standard deviation \mu \sigma scales with the mean

3.3 Combined Error Model (Without Correlation)

y ~ @. Normal(μ, sqrt(σ_add^2 + (μ * σ_prop)^2))
  • Mathematical form: y = f(\theta, \eta_i) \times (1 + \epsilon_{1i}) + \epsilon_{2i}, where \epsilon_{1i} \sim N(0, \sigma_{prop}^2) and \epsilon_{2i} \sim N(0, \sigma_{add}^2)
  • Meaning: Combines both additive and proportional error components
  • Dispersion: \sigma_{add}^2 + (\mu \times \sigma_{prop})^2
  • When to use: When both constant and proportional errors are present
  • Examples: Many biological measurements, especially across wide measurement ranges (see the sketch after this list)
  • Interpretation: \sigma_{add} is the additive standard deviation component; \sigma_{prop} approximates the CV at high predictions, where the proportional component dominates
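
To see why the combined model suits wide measurement ranges, here is a small sketch (the σ values are illustrative) of how the total standard deviation behaves at low and high predictions:

# Total SD of the combined error model at a given prediction μ
σ_add, σ_prop = 0.5, 0.2
sd(μ) = sqrt(σ_add^2 + (μ * σ_prop)^2)

sd(1.0)    # ≈ 0.54: the additive component dominates at low predictions
sd(100.0)  # ≈ 20.0: the proportional component dominates at high predictions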

3.4 Combined Error Model (With Correlation)

y ~ @. Normal(μ, sqrt(σ_add^2 + (μ * σ_prop)^2 + 2 * μ * σ_cov))
  • Mathematical form: y = f(\theta, \eta_i) + \epsilon_i, where \epsilon_i has a more complex covariance structure
  • Meaning: Includes correlation between additive and proportional errors
  • Dispersion: \sigma_{add}^2 + (\mu \times \sigma_{prop})^2 + 2 \times \mu \times \sigma_{cov}
  • When to use: When there’s evidence of correlation between error components
  • Examples: Complex analytical methods with interdependent error sources
  • Interpretation: \sigma_{cov} represents the covariance between additive and proportional errors

3.5 Log-Normal Models

There are several ways to implement log-normal error models:

  1. Direct Log-Normal specification:
y ~ @. LogNormal(log(μ), σ)
  • Mathematical form: \log(y) \sim N(\log(\mu), \sigma^2)
  • Meaning: Observations follow a log-normal distribution
  • Dispersion: \sigma^2
  • When to use: When data are strictly positive and errors increase with the response
  • Interpretation: For small \sigma, \sigma ≈ CV; in general, CV = \sqrt{e^{\sigma^2}-1}
  2. Log-transform of the dependent variable:
logy ~ @. Normal(log(μ), σ)
  • Mathematical form: \log(y) = \log(f(\theta, \eta_i)) + \epsilon_i, where \epsilon_i \sim N(0, \sigma^2)
  • Meaning: Log-transformed observations follow a normal distribution
  • Dispersion: \sigma^2
  • When to use: Alternative specification for log-normal errors, especially when data are already log-transformed
  • Interpretation: Same as above, but working directly with the log-transformed data

3.6 The Connection Between Proportional Error and Log-Normal Error

Interestingly, proportional and log-normal error models are closely related:

  • For proportional error: y = \mu \times (1 + \epsilon_i), where \epsilon_i \sim N(0, \sigma^2)
  • For log-normal error: y = \mu \times \exp(\epsilon_i), where \epsilon_i \sim N(0, \sigma^2)

These models are approximately equivalent, since \exp(\epsilon_i) \approx 1 + \epsilon_i for small \epsilon_i:

  1. Log-normal models are often appropriate when proportional error seems reasonable
  2. For data with large variance, log-normal often fits better than proportional
  3. Converting between models requires understanding their mathematical relationship

This relationship also explains why log-transformation of data with proportional error often results in homoscedastic residuals (constant variance across the prediction range).
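
A quick simulation makes the near-equivalence concrete (a sketch using Distributions.jl; μ and σ are illustrative):

using Distributions, Random, Statistics

Random.seed!(1)
μ, σ = 100.0, 0.1
ε = rand(Normal(0, σ), 100_000)

y_prop = μ .* (1 .+ ε)    # proportional error
y_logn = μ .* exp.(ε)     # log-normal error

# exp(ε) ≈ 1 + ε for small ε, so the two agree closely
mean(y_prop), mean(y_logn)    # ≈ 100.0 vs ≈ 100.5 (the log-normal mean is μ * exp(σ^2 / 2))
std(y_prop) / mean(y_prop), std(y_logn) / mean(y_logn)    # both CVs ≈ 0.10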

3.7 Student’s T Distribution

y ~ @. μ + σ * TDist(ν)
  • Meaning: Observations follow a location-scale t-distribution centered at μ with scale σ and ν degrees of freedom (Distributions.jl supports affine transformations of distributions, so μ + σ * TDist(ν) is itself a valid distribution)
  • When to use: When data may contain outliers (the t-distribution has heavier tails than the Normal)
  • Dispersion: ν (smaller values of ν give heavier tails)
  • Examples: Datasets suspected to contain outliers

3.8 Gamma Distribution

y ~ @. Gamma(ν, μ / ν)
  • Meaning: Observations follow a gamma distribution with shape ν and scale μ/ν, so the mean is exactly μ and the variance is μ²/ν
  • Dispersion: 1/ν (CV = 1/√ν)
  • When to use: For strictly positive, right-skewed data
  • Examples: Clearance values, certain biomarker measurements

3.9 Beyond the Gaussian Error Models

An advantage of the Pumas approach, where the conditional distribution of the dependent variable is specified explicitly, is that you are free to use other distributions as well. For example, it can be argued that the (Gaussian) proportional error model is just an approximation of a Gamma model. Hence, instead of approximating the Gamma model with the proportional error model, you could write

dv ~ @. Gamma(ν, cp / ν)

where ν is the dispersion parameter. An advantage of this approach is that you avoid the risk of negative concentrations when simulating from the model. A potential drawback is that you might need to remove any zero concentrations from your sample before estimation, since a zero observation has zero likelihood under a Gamma distribution (when ν > 1).

For the time being, you’d have to use LaplaceI for estimation when using the Gamma distribution; we hope to add support for FOCE as well.

Yet another alternative you could consider is LogNormal, i.e.

dv ~ @. LogNormal(log(cp), σ)

which is very similar to the Gamma and proportional error models for small σ, though the relationship between the conditional mean and cp is nicer for Gamma: the Gamma model above has mean exactly cp, whereas LogNormal(log(cp), σ) has mean cp · exp(σ²/2).

For the LogNormal, both the FOCE and LaplaceI approximations of the marginal likelihood are supported.
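
As a sketch of choosing the approximation at fit time (model_gamma and model_lognormal are hypothetical model objects, and pop and params are assumed to be defined as elsewhere in this tutorial):

# Gamma error model: currently requires LaplaceI()
fpm_gamma = fit(model_gamma, pop, params, LaplaceI())

# LogNormal error model: both FOCE() and LaplaceI() are supported
fpm_logn = fit(model_lognormal, pop, params, FOCE())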

4 Handling Incomplete Data

4.1 Methods for Handling Below Limit of Quantification (BLQ) Data

In pharmacometric modeling, data below the lower limit of quantification (LLOQ) presents a common challenge. Pumas provides several approaches for handling such data:

4.1.1 M1 Method: Discarding BLQ Data

The simplest approach is to discard BLQ data entirely. While easy to implement, this method:

  • Discards potentially valuable information
  • May lead to biased parameter estimates
  • Is not recommended when a substantial portion of data is BLQ

In Pumas, this is done implicitly by filtering out BLQ observations before analysis.
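
As a sketch of M1 preprocessing (assuming a DataFrame with a dv column and a known lloq; the data and names are illustrative):

using DataFrames

lloq = 0.1    # assay lower limit of quantification (illustrative)
data = DataFrame(id = [1, 1, 1], time = [1.0, 8.0, 24.0], dv = [5.2, 0.3, missing])

# M1: drop BLQ and missing observations before building the population
data_m1 = filter(row -> !ismissing(row.dv) && row.dv >= lloq, data)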

4.1.2 M2 Method: Replacing BLQ with LLOQ/2

The M2 method replaces BLQ values with LLOQ/2 (or another fraction). In Pumas, this would be done during data preprocessing:

# Example of preprocessing for the M2 method (row-wise, using DataFramesMeta;
# lloq is assumed to be defined)
df = @rtransform(data, :dv = ismissing(:dv) || :dv < lloq ? lloq / 2 : :dv)

Then use a standard error model:

@derived begin
    cp := @. Central / V
    dv ~ @. Normal(cp, σ)
end

4.1.3 M3 Method: Likelihood-Based Approach with Censored Distribution

y ~ @. censored_latent(Normal(μ, σ), lloq)
  • Meaning: Treats BLQ data as censored observations, incorporating the probability that the true value is below LLOQ
  • When to use: When significant proportion of data is BLQ but still informative
  • Examples: Early or late time points in PK studies

4.1.4 M4 Method: Separate Model for BLQ Probability

M4 is simply M3 with the additional assumption that the underlying (latent) data are naturally truncated, e.g., at zero, since concentrations cannot be negative.


@derived begin
    cp := @. Central / Vc
    cp_normal := @. Normal(cp, sqrt(σ_add^2 + (cp * σ_prop)^2))
    cp_trunc := @. truncated(cp_normal, 0, Inf)
    dv ~ @. censored_latent(cp_trunc, lloq)
end

4.2 Truncated Distribution

y ~ @. truncated(Normal(μ, σ); lower = l, upper = u)
  • Meaning: Normal distribution truncated at specified boundaries
  • When to use: When data can only exist within certain physical bounds
  • Approximation method: LaplaceI()
  • Examples: Physiologically constrained measurements, bounded response variables

4.3 Handling Upper Limit of Quantification (ULOQ)

Similar to LLOQ, data above ULOQ can be handled using censored distributions:

y ~ @. censored_latent(Normal(μ, σ), nothing, uloq)

The full function signature is

censored_latent(dist, lower, upper)

where dist is the uncensored (latent) distribution, lower is the lower limit of quantification (set to nothing if there is no lower level) and upper is the upper limit of quantification (omitted or set to nothing if there is no upper level).

4.4 Complete Model for Both LLOQ and ULOQ

y ~ @. censored_latent(Normal(μ, σ), lloq, uloq)
  • Meaning: Handles both lower and upper censoring simultaneously
  • When to use: For assays with both LLOQ and ULOQ where some observations fall outside these limits

Beyond the choice of distribution, it’s important to understand how error models relate to the statistical properties of our data and how they’re parameterized in Pumas.

5 Parameterization Best Practices

To ensure good convergence and interpretability:

  1. Use untransformed parameters: Define σ directly, not log(σ) or σ².
  2. Constrain to positive values: Use RealDomain(lower=0).
  3. Avoid over-parameterization: For combined error models, ensure both components are identifiable.
  4. Scaling: Consider the magnitude of your data when setting initial estimates (see the sketch after this list).
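
A minimal @param sketch following these practices (parameter names and initial values are illustrative):

@param begin
    σ_add ∈ RealDomain(lower = 0, init = 10.0)    # additive SD, in data units
    σ_prop ∈ RealDomain(lower = 0, init = 0.2)    # proportional SD (≈ CV)
end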

6 Practical Considerations for Error Model Selection and Implementation

6.1 Choosing Between Error Models

Selecting the appropriate error model is a critical decision that can impact parameter estimates and model predictions. Here are some data-driven approaches to guide your selection:

  1. Exploratory data analysis:

    • Plot residual vs. predicted values to identify patterns
    • Create quantile-quantile plots to check for normality
    • Examine variance across the range of observations
  2. Model comparison metrics (see the sketch after this list):

    • AIC/BIC: Lower values indicate better fit
    • Likelihood ratio test for nested models
    • Visual predictive checks (VPCs)
  3. Scientific considerations:

    • Mechanism of measurement error
    • Biological plausibility
    • Literature precedent for similar data
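
As a sketch of the model comparison metrics in item 2, assuming two fitted Pumas results fit_add and fit_prop (Pumas implements the standard StatsBase accessors aic, bic, and loglikelihood for fitted models):

# Compare information criteria across candidate error models
aic(fit_add), aic(fit_prop)    # lower is better
bic(fit_add), bic(fit_prop)

# Likelihood ratio statistic; valid only for nested models
2 * (loglikelihood(fit_prop) - loglikelihood(fit_add))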

6.2 Common Pitfalls and Solutions

6.2.1 Over-parameterization

Problem: Combined error models may not be identifiable if the data don’t support both components.

Solution:

  • Start with simpler error models
  • Check parameter correlation matrix
  • Consider fixing one component if both aren’t identifiable

6.2.2 Poorly Scaled Initial Estimates

Problem: Optimization may fail if initial values are orders of magnitude off.

Solution:

  • Use exploratory analysis to inform initial estimates
  • Consider the units and magnitude of your observations
  • Start with established values from literature when available

6.2.3 Confounding Between Inter-individual Variability and Residual Error

Problem: Unable to distinguish between sources of variability.

Solution:

  • Ensure rich sampling for each individual
  • Consider study design factors
  • Use informative priors in Bayesian approaches

6.3 Transformations and Alternative Parameterizations

Sometimes, transforming the data or reparameterizing the error model can improve estimation:

6.3.1 Log-transformation

Instead of:

y ~ @. LogNormal(log(μ), σ)

You can use:

# During data preparation (outside the model): pre-transform the observations
log_y = log.(y)

# Inside the @derived block, model the transformed variable directly
log_y ~ @. Normal(log(μ), σ)

Benefits:

  • Often improves numerical stability
  • May be more intuitive for some modelers
  • Can be easier to diagnose

6.3.2 Variance Parameterization

Some modelers prefer to work with variance parameters:

# Define variance parameters
@param begin
    σ²_add ∈ RealDomain(lower = 0)
    σ²_prop ∈ RealDomain(lower = 0)
end

# Take the square root of the total variance in the error model
@derived begin
    y ~ @. Normal(μ, sqrt(σ²_add + μ^2 * σ²_prop))
end

6.3.3 Coefficient of Variation Parameterization

For proportional error, you can parameterize directly in terms of CV:

@param begin
    CV ∈ RealDomain(lower = 0, upper = 1)  # Coefficient of variation (0-100%)
end

@derived begin
    y ~ @. Normal(μ, μ * CV)
end

6.4 Understanding the Difference Between Standard Deviation and Variance in Initial Estimates

In Pumas, scalar error models are specified using standard deviation (σ) rather than variance (σ²). This distinction is important for several reasons:

  1. Interpretation: Standard deviation is in the same units as the data, making interpretation more intuitive.
  2. Parameterization: When providing initial estimates, you specify σ, not σ².
  3. Model definition: In the @derived block, distributions expect standard deviation parameters.

For example, in a Normal distribution:

y ~ @. Normal(μ, σ)  # σ is standard deviation, not variance

Converting between variance and standard deviation:

  • Standard deviation = √Variance
  • Variance = Standard deviation²

6.5 Coefficient of Variation and Proportional Error

For proportional error models, the error parameter (σ_prop) is directly related to the coefficient of variation (CV):

  • For the Normal proportional model, the standard deviation is μ × σ_prop, so CV = σ_prop exactly
  • For the closely related log-normal model, CV = √(exp(σ²) - 1), which is ≈ σ for small σ (< 0.3)

This relationship helps interpret the magnitude of proportional error in familiar terms.
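
For example, with σ_prop = 0.2 the log-normal formula gives √(exp(0.04) − 1) ≈ 0.202, essentially indistinguishable from the exact proportional value of 0.20.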

6.6 Initial Estimates for Error Models

Providing good initial estimates for error models is crucial for convergence:

  1. Additive error (σ): Start with approximately 10-20% of the mean observation value.
  2. Proportional error (σ_prop): Start with 0.1-0.3 (representing 10-30% CV).
  3. Combined error: Start with smaller values for both components (e.g., σ_add ≈ 5-10% of mean observation, σ_prop ≈ 0.05-0.15).

Example of setting initial parameters:

# For concentration data with mean around 100 ng/mL
θ_init = (
    # Other parameters...
    σ_add = 10.0,    # 10 ng/mL standard deviation (not variance!)
    σ_prop = 0.2,     # 20% proportional error
)

A common source of confusion when working with error models is the distinction between standard deviation (σ) and variance (σ²). This is particularly important when:

  1. Setting initial estimates: Pumas expects standard deviation, not variance
  2. Interpreting results: Reported values are standard deviations, provided you parameterize with σ in @param as suggested above
  3. Comparing with other software: Some programs (like NONMEM) may report variance

6.6.1 Example: Correct vs. Incorrect Initial Estimates

Correct approach:

# If we expect 20% proportional error (CV = 0.2)
σ_prop_init = 0.2  # Standard deviation

# If we expect 10 units of additive error
σ_add_init = 10.0  # Standard deviation

Incorrect approach:

# WRONG: Using variance instead of standard deviation
σ_prop_init = 0.04  # This is actually the variance (0.2²)
σ_add_init = 100.0  # This is actually the variance (10²)

6.6.2 Converting Literature Values

If a publication reports variance values, convert to standard deviation for Pumas:

# Converting from a published variance to the standard deviation Pumas expects
σ²_published = 0.04                   # example value from the literature
σ_literature = sqrt(σ²_published)     # 0.2

6.6.3 Relationship to Coefficient of Variation

For proportional error models, recall the relationship between the standard deviation parameter (σ_prop) and CV from Section 6.5:

  • Normal proportional model: CV = σ_prop exactly
  • Log-normal model: CV = √(exp(σ²) - 1), which is ≈ σ for σ < 0.3

Understanding these relationships helps you properly interpret and compare results across different modeling platforms.

7 Implementation Examples in Pumas

Here’s a complete example of implementing different error models in a Pumas model:

using Pumas

# Define a simple one-compartment PK model with different error models
model_add = @model begin
    @param begin
        θ ∈ VectorDomain(4)
        Ω ∈ PDiagDomain(2)
        σ ∈ RealDomain(lower = 0)
    end

    @random begin
        η ~ MvNormal(Ω)
    end

    @covariates wt

    @pre begin
        CL = θ[1] * exp(η[1]) * (wt / 70)^0.75
        V = θ[2] * exp(η[2]) * (wt / 70)
        Ka = θ[3]
        F = θ[4]
    end

    @dynamics begin
        Depot' = -Ka * Depot
        Central' = Ka * Depot - (CL / V) * Central
    end

    @derived begin
        cp = @. Central / V
        dv ~ @. Normal(cp, σ)  # Additive error model
    end
end

# Similar model with proportional error
model_prop = @model begin
    # ... same as above until @derived

    @derived begin
        cp = @. Central / V
        dv ~ @. Normal(cp, cp * σ)  # Proportional error model
    end
end

# Combined error model
model_combined = @model begin
    @param begin
        θ ∈ VectorDomain(4)
        Ω ∈ PDiagDomain(2)
        σ_add ∈ RealDomain(lower = 0)
        σ_prop ∈ RealDomain(lower = 0)
    end

    # ... same as above until @derived

    @derived begin
        cp = @. Central / V
        dv ~ @. Normal(cp, sqrt(σ_add^2 + (cp * σ_prop)^2))  # Combined error model
    end
end

# Log-normal error model
model_lognormal = @model begin
    # ... same as above until @derived

    @derived begin
        cp = @. Central / V
        dv ~ @. LogNormal(log(cp), σ)  # Log-normal error model
    end
end

# Handling BLQ data with M3 method
model_m3 = @model begin
    # ... same as above until @derived

    @derived begin
        cp = @. Central / V
        dv ~ @. censored_latent(Normal(cp, σ), lloq)  # M3 method (lloq assumed defined, e.g., as a constant or covariate)
    end
end

# Multiple response variables
model_multi = @model begin
    # ... param, random, etc.

    @derived begin
        cp = @. Central / V
        effect = @. Emax * cp / (EC50 + cp)

        # PK observation
        dv_pk ~ @. Normal(cp, cp * σ_pk)

        # PD observation (continuous)
        dv_pd ~ @. Normal(effect, σ_pd)
    end
end
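
To fit any of these models, pass a population, initial parameter values, and a likelihood approximation to fit. A sketch (pop and all values are illustrative; initial values for a PDiagDomain are supplied as a diagonal matrix):

using LinearAlgebra  # for Diagonal

params = (
    θ = [4.0, 70.0, 1.0, 1.0],    # CL, V, Ka, F
    Ω = Diagonal([0.09, 0.09]),
    σ = 0.5,
)

fpm = fit(model_add, pop, params, FOCE())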

8 Model Diagnostics for Error Models

After fitting a model, it’s crucial to assess if the chosen error model is appropriate:

Key diagnostics to evaluate (a code sketch follows this list):

  • CWRES vs. predictions: Should show random scatter around zero
  • QQ plots: Should follow the line for appropriate distribution
  • VPC: Observed data should fall within prediction intervals
  • Shrinkage: High shrinkage may indicate model misspecification
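
A sketch of generating these diagnostics for a fitted result fpm (the plotting helpers goodness_of_fit and vpc_plot come from PumasUtilities; treat the exact names as assumptions if your version differs):

using PumasUtilities

res = inspect(fpm)       # predictions, residuals (including CWRES), and EBEs
goodness_of_fit(res)     # panel of residual and prediction plots

vpc_res = vpc(fpm)       # visual predictive check
vpc_plot(vpc_res)

ηshrinkage(fpm)          # η-shrinkage diagnostics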

9 Conclusion

Error models are a fundamental component of non-linear mixed effects modeling, determining how we relate observations to model predictions. Pumas provides a flexible, intuitive syntax for specifying error models through the @derived block, supporting a wide range of continuous and discrete distributions.

The ability to implement sophisticated error structures with minimal code is a significant advantage of Pumas over many other NLME platforms. By leveraging Julia’s rich ecosystem of probability distributions, Pumas enables modelers to select the most appropriate error model for their data.

From handling BLQ data through methods like M3 to modeling complex time-to-event outcomes, Pumas provides comprehensive tools for addressing various data challenges. The availability of multiple likelihood approximation methods ensures that even complex models can be estimated efficiently.

This tutorial covers only error models for continuous data; discrete data are covered in later tutorials.