Error Models in Non-linear Mixed Effects Models: A Comprehensive Tutorial with Pumas
1 Fundamentals of Error Models in Non-linear Mixed Effects Models
Error models are essential components of non-linear mixed effects (NLME) models as they describe the unexplained variability in our observations. This variability can arise from multiple sources:
- Measurement error (e.g., analytical imprecision)
- Model misspecification
- Biological variation
- Unknown or unaccounted factors
At its core, an error model defines how the observed data relates to the model predictions, accounting for randomness or “noise” in the measurements. In NLME modeling, we typically have:
- Fixed effects: Population parameters that apply to all individuals
- Random effects: Individual deviations from the population values
- Residual error: Unexplained variability in the observations
The residual error is what we are addressing with error models, and it’s crucial to select an appropriate error model that reflects the nature of the data being analyzed.
2 Understanding the Derived Block Expression
In Pumas, the @derived block provides a natural and intuitive way to specify how observed data relate to model predictions. This syntax offers significant advantages over other software (like NONMEM) by explicitly mapping the dependent variable to an underlying probability distribution.
2.1 Anatomy of the Derived Block
Let’s break down the syntax of a typical derived block:
@derived begin
    y ~ @. Normal(μ, σ)
end
Each component has a specific meaning:
- @derived begin ... end: Defines the section of the model that specifies the error model
- y: The observed dependent variable (e.g., drug concentration, response)
- ~: The “distributed as” operator, indicating that y follows a probability distribution
- @.: A broadcast operator that applies the function element-wise (simplifies vectorization)
- Normal(μ, σ): The probability distribution with parameters μ (mean) and σ (standard deviation)
2.2 Sources of Elements in the Syntax
- y: Comes from your dataset column that contains observations, mapped in read_pumas (see the sketch after this list)
- μ: Usually the output of your structural model with or without the random effects (predicted values)
- σ: The standard deviation parameter that characterizes the residual error magnitude
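As a minimal sketch of that mapping, assuming a DataFrame df with illustrative column names (:id, :time, :y):

using Pumas

# event_data = false because this hypothetical dataset has observations only
pop = read_pumas(df; id = :id, time = :time, observations = [:y], event_data = false)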
2.3 The Power of Distribution-Based Modeling
Pumas leverages Julia’s rich ecosystem of probability distributions (from Distributions.jl) to provide a wide range of options for error models:
- Continuous distributions: Normal, LogNormal, Gamma, Student’s T, etc.
- Discrete distributions: Bernoulli, Poisson, Negative Binomial, etc.
- Censored/Truncated distributions: For handling data below detection limits or with upper/lower bounds
This approach offers several advantages:
- Interpretability: The model directly specifies the assumed distribution of observations
- Flexibility: Easy to switch between different error structures
- Statistical rigor: Proper handling of different data types
- Extensibility: Can incorporate any distribution available in Julia
3 Continuous Error Model Specifications
Now, let’s examine specific continuous error models in detail:
3.1 Additive Error Model
y ~ @. Normal(μ, σ)
- Mathematical form: y = f(\theta, \eta_i) + \epsilon_i, where \epsilon_i \sim N(0, \sigma^2)
- Meaning: Observations vary around the predicted value with constant standard deviation
- Dispersion: \sigma^2
- When to use: When measurement error is constant across the range of predictions
- Examples: Laboratory measurements with constant analytical error
- Interpretation: \sigma represents the standard deviation in the original units of the data. The coefficient of variation, \frac{\sigma}{\mu}, therefore varies with the mean.
3.2 Proportional Error Model
y ~ @. Normal(μ, μ * σ)
- Mathematical form: y = f(\theta, \eta_i) \times (1 + \epsilon_i), where \epsilon_i \sim N(0, \sigma^2)
- Meaning: Standard deviation increases proportionally with the predicted value
- Dispersion: (\mu \sigma)^2
- When to use: When error increases with concentration (common in PK modeling)
- Examples: Drug concentration measurements, where higher concentrations have larger absolute errors
- Interpretation: For small values, \sigma approximately equals the Coefficient of Variation (CV)
3.3 Combined Error Model (Without Correlation)
y ~ @. Normal(μ, sqrt(σ_add^2 + (μ * σ_prop)^2))
- Mathematical form: y = f(\theta, \eta_i) \times (1 + \epsilon_{1i}) + \epsilon_{2i}, where \epsilon_{1i} \sim N(0, \sigma_{prop}^2) and \epsilon_{2i} \sim N(0, \sigma_{add}^2)
- Meaning: Combines both additive and proportional error components
- Dispersion: \sigma_{add}^2 + (\mu \times \sigma_{prop})^2
- When to use: When both constant and proportional errors are present
- Examples: Many biological measurements, especially across wide measurement ranges
- Interpretation: \sigma_{add} is the additive standard deviation component, \sigma_{prop} is approximately the CV if both standard deviations are small.
3.4 Combined Error Model (With Correlation)
y ~ @. Normal(μ, sqrt(σ_add^2 + (μ * σ_prop)^2 + 2 * μ * σ_cov))
- Mathematical form: y = f(\theta, \eta_i) + \epsilon_i, where \epsilon_i has a more complex covariance structure
- Meaning: Includes correlation between additive and proportional errors
- Dispersion: \sigma_{add}^2 + (\mu \times \sigma_{prop})^2 + 2 \times \mu \times \sigma_{cov}
- When to use: When there’s evidence of correlation between error components
- Examples: Complex analytical methods with interdependent error sources
- Interpretation: \sigma_{cov} represents the covariance between additive and proportional errors
3.5 Log-Normal Models
There are several ways to implement log-normal error models:
- Direct Log-Normal specification:
y ~ @. LogNormal(log(μ), σ)
- Mathematical form: \log(y) \sim N(\log(\mu), \sigma^2)
- Meaning: Observations follow a log-normal distribution
- Dispersion: \sigma^2
- When to use: When data are strictly positive and errors increase with the response
- Interpretation: For small \sigma values, \sigma ≈ CV but CV is generally expressed as \sqrt{e^{\sigma^2}-1}
- Log-transform of the dependent variable:
logy ~ @. Normal(log(μ), σ)
- Mathematical form: \log(y) = \log(f(\theta, \eta_i)) + \epsilon_i, where \epsilon_i \sim N(0, \sigma^2)
- Meaning: Log-transformed observations follow a normal distribution
- Dispersion: \sigma^2
- When to use: Alternative specification for log-normal errors, especially when data are already log-transformed
- Interpretation: Same as above, but working directly with the log-transformed data
3.6 The Connection Between Proportional Error and Log-Normal Error
Interestingly, proportional and log-normal error models are closely related:
- For proportional error: y = \mu \times (1 + \epsilon_i), where \epsilon_i \sim N(0, \sigma^2)
- For log-normal error: y = \mu \times \exp(\epsilon_i), where \epsilon_i \sim N(0, \sigma^2)
These models are approximately equivalent for small \epsilon_i, since \exp(\epsilon_i) \approx 1 + \epsilon_i:
- Log-normal models are often appropriate when proportional error seems reasonable
- For data with large variance, log-normal often fits better than proportional
- Converting between models requires understanding their mathematical relationship
This relationship also explains why log-transformation of data with proportional error often results in homoscedastic residuals (constant variance across the prediction range).
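A quick numerical illustration of this near-equivalence, in plain Julia with Distributions.jl (μ and σ are arbitrary example values):

using Distributions, Random

Random.seed!(1)
μ, σ = 100.0, 0.1
ε = rand(Normal(0, σ), 10_000)

y_prop = μ .* (1 .+ ε) # proportional error
y_logn = μ .* exp.(ε)  # log-normal error

# exp(ε) ≈ 1 + ε for small ε, so relative differences are on the order of ε²/2
maximum(abs.(y_prop .- y_logn) ./ μ)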
3.7 Student’s T Distribution
y ~ @. μ + σ * TDist(ν)
- Meaning: Observations follow a t-distribution with ν degrees of freedom
- When to use: When data may contain outliers (t-distribution has heavier tails than Normal)
- Dispersion: ν
- Examples: Datasets suspected to contain outliers
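The heavier tails are easy to see numerically with Distributions.jl (ν = 3 is just an example):

using Distributions

# Probability of falling more than 3 scale units above the center
ccdf(TDist(3), 3)     # ≈ 0.029
ccdf(Normal(0, 1), 3) # ≈ 0.0013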
3.8 Gamma Distribution
y ~ @. Gamma(ν, μ / ν)
- Meaning: Observations follow a gamma distribution
- Dispersion: \nu^{-1}
- When to use: For strictly positive, right-skewed data
- Examples: Clearance values, certain biomarker measurements
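You can verify the moments of this parameterization directly with Distributions.jl (ν and μ are example values):

using Distributions

ν, μ = 25.0, 50.0
d = Gamma(ν, μ / ν) # shape ν, scale μ/ν
mean(d)             # equals μ = 50.0
std(d) / mean(d)    # CV = 1/√ν = 0.2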
An advantage of the Pumas approach, where we specify the conditional distribution of the dependent variable explicitly, is that you are free to use other distributions as well. For example, it can be argued that the (Gaussian) proportional error model is just an approximation of a Gamma model. Hence, instead of approximating the Gamma model with the proportional error model, you could write
dv ~ @. Gamma(ν, cp / ν)
where ν is the dispersion parameter. An advantage of this approach is that you avoid the risk of negative concentrations when simulating from the model. A potential drawback is that you might need to remove any zero concentrations from your sample before estimation, as a value of exactly zero is often unlikely under a Gamma distribution.
For the time being, you’d have to use LaplaceI for estimation when using the Gamma distribution; we hope to add support for FOCE as well.
Yet another alternative you could consider is LogNormal, i.e.
dv ~ @. LogNormal(log(cp), σ)
which is very similar to both the Gamma and the proportional error model for small σ, though the relationship between the conditional mean and cp is nicer for the Gamma.
For the LogNormal, both the FOCE and LaplaceI approximations of the marginal likelihood are supported.
4 Handling Incomplete Data
4.1 Methods for Handling Below Limit of Quantification (BLQ) Data
In pharmacometric modeling, data below the lower limit of quantification (LLOQ) presents a common challenge. Pumas provides several approaches for handling such data:
4.1.1 M1 Method: Discarding BLQ Data
The simplest approach is to discard BLQ data entirely. While easy to implement, this method:
- Discards potentially valuable information
- May lead to biased parameter estimates
- Is not recommended when a substantial portion of data is BLQ
In Pumas, this is done implicitly by filtering out BLQ observations before analysis.
4.1.2 M2 Method: Replacing BLQ with LLOQ/2
The M2 method replaces BLQ values with LLOQ/2 (or another fraction). In Pumas, this would be done during data preprocessing:
# Example of preprocessing for the M2 method
# (row-wise transform; @rtransform is from DataFramesMeta.jl)
df = @rtransform(data, :dv = ismissing(:dv) || :dv < lloq ? lloq / 2 : :dv)
Then use a standard error model:
@derived begin
    cp := @. Central / V
    dv ~ @. Normal(cp, σ)
end
4.1.3 M3 Method: Likelihood-Based Approach with Censored Distribution
y ~ @. censored_latent(Normal(μ, σ); lower = lloq)
- Meaning: Treats BLQ data as censored observations, incorporating the probability that the true value is below LLOQ
- When to use: When a significant proportion of the data is BLQ but still informative
- Examples: Early or late time points in PK studies
Read more about the M3 method in the Pumas documentation.
4.1.4 M4 Method: Separate Model for BLQ Probability
M4 is simply M3 with the added assumption that the original data are also naturally truncated (at zero in the model below, since concentrations cannot be negative).
@derived begin
    cp := @. Central / Vc
    cp_normal := @. Normal(cp, sqrt(σ_add^2 + (cp * σ_prop)^2))
    cp_trunc := @. truncated(cp_normal, 0, Inf)
    dv ~ @. censored_latent(cp_trunc, LLOQ, Inf)
end
4.2 Truncated Distribution
y ~ @. truncated(Normal(μ, σ); lower = l, upper = u)
- Meaning: Normal distribution truncated at specified boundaries
- When to use: When data can only exist within certain physical bounds
- Approximation method: LaplaceI()
- Examples: Physiologically constrained measurements, bounded response variables
4.3 Handling Upper Limit of Quantification (ULOQ)
Similar to LLOQ, data above ULOQ can be handled using censored distributions:
y ~ @. censored_latent(Normal(μ, σ), nothing, uloq)
using the full function signature censored_latent(dist, lower, upper), where dist is the uncensored (latent) distribution, lower is the lower limit of quantification (set to nothing if there is no lower limit), and upper is the upper limit of quantification (omitted or set to nothing if there is no upper limit).
4.4 Complete Model for Both LLOQ and ULOQ
y ~ @. censored_latent(Normal(μ, σ), lloq, uloq)
- Meaning: Handles both lower and upper censoring simultaneously
- When to use: For assays with both LLOQ and ULOQ where some observations fall outside these limits
Having covered the main error model structures, it’s important to understand how they relate to the statistical properties of our data and how they’re parameterized in Pumas.
5 Parameterization Best Practices
To ensure good convergence and interpretability:
- Use untransformed parameters: Define σ directly, not log(σ) or σ².
- Constrain to positive values: Use RealDomain(lower = 0).
- Avoid over-parameterization: For combined error models, ensure both components are identifiable.
- Scaling: Consider the magnitude of your data when setting initial estimates.
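A minimal sketch of a parameter block following these recommendations (the names and initial values are illustrative, not prescriptive):

@param begin
    σ_add ∈ RealDomain(lower = 0, init = 1.0)   # untransformed additive SD, constrained positive
    σ_prop ∈ RealDomain(lower = 0, init = 0.15) # proportional SD (≈ CV for small values)
end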
6 Practical Considerations for Error Model Selection and Implementation
6.1 Choosing Between Error Models
Selecting the appropriate error model is a critical decision that can impact parameter estimates and model predictions. Here are some data-driven approaches to guide your selection:
Exploratory data analysis:
- Plot residual vs. predicted values to identify patterns
- Create quantile-quantile plots to check for normality
- Examine variance across the range of observations
Model comparison metrics:
- AIC/BIC: Lower values indicate better fit
- Likelihood ratio test for nested models
- Visual predictive checks (VPCs)
Scientific considerations:
- Mechanism of measurement error
- Biological plausibility
- Literature precedent for similar data
6.2 Common Pitfalls and Solutions
6.2.1 Over-parameterization
Problem: Combined error models may not be identifiable if data doesn’t support both components.
Solution:
- Start with simpler error models
- Check parameter correlation matrix
- Consider fixing one component if both aren’t identifiable
6.2.2 Poorly Scaled Initial Estimates
Problem: Optimization may fail if initial values are orders of magnitude off.
Solution:
- Use exploratory analysis to inform initial estimates
- Consider the units and magnitude of your observations
- Start with established values from literature when available
6.2.3 Confounding Between Inter-individual Variability and Residual Error
Problem: Unable to distinguish between sources of variability.
Solution:
- Ensure rich sampling for each individual
- Consider study design factors
- Use informative priors in Bayesian approaches
6.3 Transformations and Alternative Parameterizations
Sometimes, transforming the data or reparameterizing the error model can improve estimation:
6.3.1 Log-transformation
Instead of:
y ~ @. LogNormal(log(μ), σ)
You can use:
log_y = log.(y) # Pre-transform data
log_y ~ @. Normal(log(μ), σ)
Benefits:
- Often improves numerical stability
- May be more intuitive for some modelers
- Can be easier to diagnose
6.3.2 Variance Parameterization
Some modelers prefer to work with variance parameters:
# Define variance parameters
@param begin
    σ²_add ∈ RealDomain(lower = 0)
    σ²_prop ∈ RealDomain(lower = 0)
end

# Use square root in error model
@derived begin
    y ~ @. Normal(μ, sqrt(σ²_add + (μ * sqrt(σ²_prop))^2))
end
6.3.3 Coefficient of Variation Parameterization
For proportional error, you can parameterize directly in terms of CV:
@param begin
    CV ∈ RealDomain(lower = 0, upper = 1) # Coefficient of variation (0-100%)
end

@derived begin
    y ~ @. Normal(μ, μ * CV)
end
6.4 Understanding the Difference Between Standard Deviation and Variance in Initial Estimates
In Pumas, scalar error models are specified using standard deviation (σ) rather than variance (σ²). This distinction is important for several reasons:
- Interpretation: Standard deviation is in the same units as the data, making interpretation more intuitive.
- Parameterization: When providing initial estimates, you specify σ, not σ².
- Model definition: In the @derived block, distributions expect standard deviation parameters.
For example, in a Normal distribution:
y ~ @. Normal(μ, σ) # σ is standard deviation, not variance
Converting between variance and standard deviation:
- Standard deviation = √Variance
- Variance = Standard deviation²
6.5 Coefficient of Variation and Proportional Error
For proportional error models, the error parameter (σ_prop) is directly related to the coefficient of variation (CV):
- For small values of σ_prop (< 0.3), CV ≈ σ_prop
- More generally: CV = √(exp(σ_prop²) - 1)
This relationship helps interpret the magnitude of proportional error in familiar terms.
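A quick numeric check of this approximation (plain Julia):

σ_prop = 0.2
cv_exact = sqrt(exp(σ_prop^2) - 1) # ≈ 0.202, close to σ_prop itself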
6.6 Initial Estimates for Error Models
Providing good initial estimates for error models is crucial for convergence:
- Additive error (σ): Start with approximately 10-20% of the mean observation value.
- Proportional error (σ_prop): Start with 0.1-0.3 (representing 10-30% CV).
- Combined error: Start with smaller values for both components (e.g., σ_add ≈ 5-10% of mean observation, σ_prop ≈ 0.05-0.15).
Example of setting initial parameters:
# For concentration data with mean around 100 ng/mL
θ_init = (
    # Other parameters...
    σ_add = 10.0, # 10 ng/mL standard deviation (not variance!)
    σ_prop = 0.2, # 20% proportional error
)
A common source of confusion when working with error models is the distinction between standard deviation (σ) and variance (σ²). This is particularly important when:
- Setting initial estimates: Pumas expects standard deviation, not variance
- Interpreting results: Reported values are standard deviations, provided you follow the suggestion above to parameterize σ (not σ²) in @param
- Comparing with other software: Some programs (like NONMEM) may report variance
6.6.1 Example: Correct vs. Incorrect Initial Estimates
Correct approach:
# If we expect 20% proportional error (CV = 0.2)
σ_prop_init = 0.2 # Standard deviation

# If we expect 10 units of additive error
σ_add_init = 10.0 # Standard deviation
Incorrect approach:
# WRONG: Using variance instead of standard deviation
σ_prop_init = 0.04 # This is actually the variance (0.2²)
σ_add_init = 100.0 # This is actually the variance (10²)
6.6.2 Converting Literature Values
If a publication reports variance values, convert to standard deviation for Pumas:
# Converting from published variance to standard deviation
σ_literature = sqrt(σ²_published)
6.6.3 Relationship to Coefficient of Variation
For proportional error models, the relationship between standard deviation parameter (σ_prop) and CV:
- For smaller values (σ_prop < 0.3): CV ≈ σ_prop
- For any value: CV = √(exp(σ_prop²) - 1)
Understanding these relationships helps you properly interpret and compare results across different modeling platforms.
7 Implementation Examples in Pumas
Here’s a complete example of implementing different error models in a Pumas model:
using Pumas
# Define a simple one-compartment PK model with different error models
model_add = @model begin
    @param begin
        θ ∈ VectorDomain(4)
        Ω ∈ PDiagDomain(2)
        σ ∈ RealDomain(lower = 0)
    end

    @random begin
        η ~ MvNormal(Ω)
    end

    @covariates wt

    @pre begin
        CL = θ[1] * exp(η[1]) * (wt / 70)^0.75
        V = θ[2] * exp(η[2]) * (wt / 70)
        Ka = θ[3]
        F = θ[4]
    end

    @dynamics begin
        Depot' = -Ka * Depot
        Central' = Ka * Depot - (CL / V) * Central
    end

    @derived begin
        cp = @. Central / V
        dv ~ @. Normal(cp, σ) # Additive error model
    end
end
# Similar model with proportional error
model_prop = @model begin
    # ... same as above until @derived
    @derived begin
        cp = @. Central / V
        dv ~ @. Normal(cp, cp * σ) # Proportional error model
    end
end
# Combined error model
model_combined = @model begin
    @param begin
        θ ∈ VectorDomain(4)
        Ω ∈ PDiagDomain(2)
        σ_add ∈ RealDomain(lower = 0)
        σ_prop ∈ RealDomain(lower = 0)
    end
    # ... same as above until @derived
    @derived begin
        cp = @. Central / V
        dv ~ @. Normal(cp, sqrt(σ_add^2 + (cp * σ_prop)^2)) # Combined error model
    end
end
# Log-normal error model
model_lognormal = @model begin
    # ... same as above until @derived
    @derived begin
        cp = @. Central / V
        dv ~ @. LogNormal(log(cp), σ) # Log-normal error model
    end
end
# Handling BLQ data with M3 method
model_m3 = @model begin
    # ... same as above until @derived
    @derived begin
        cp = @. Central / V
        dv ~ @. censored_latent(Normal(cp, σ), lloq) # M3 method
    end
end
# Multiple response variables
model_multi = @model begin
    # ... param, random, etc.
    @derived begin
        cp = @. Central / V
        effect = @. Emax * cp / (EC50 + cp)

        # PK observation
        dv_pk ~ @. Normal(cp, cp * σ_pk)

        # PD observation (continuous)
        dv_pd ~ @. Normal(effect, σ_pd)
    end
end
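To put these models to work, you construct a population and fit. A hedged sketch, where the DataFrame df, its column names (:id, :time, :dv, :amt, :evid, :wt), and the initial values are all illustrative assumptions:

using LinearAlgebra # for Diagonal

pop = read_pumas(df; id = :id, time = :time, observations = [:dv],
                 covariates = [:wt], amt = :amt, evid = :evid)

# Initial estimates; note σ is a standard deviation (Section 6.4)
param0 = (θ = [1.0, 10.0, 1.0, 1.0], Ω = Diagonal([0.09, 0.09]), σ = 0.5)

fitted = fit(model_add, pop, param0, FOCE())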
8 Model Diagnostics for Error Models
After fitting a model, it’s crucial to assess if the chosen error model is appropriate:
Key diagnostics to evaluate:
- CWRES vs. predictions: Should show random scatter around zero
- QQ plots: Should follow the line for appropriate distribution
- VPC: Observed data should fall within prediction intervals
- Shrinkage: High shrinkage may indicate model misspecification
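As a hedged sketch of generating these diagnostics in Pumas, continuing from the fit above (plotting specifics vary by Pumas version):

insp = inspect(fitted) # predictions, residuals (including CWRES), and EBEs
v = vpc(fitted)        # visual predictive check object for plotting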
9 Conclusion
Error models are a fundamental component of non-linear mixed effects modeling, determining how we relate observations to model predictions. Pumas provides a flexible, intuitive syntax for specifying error models through the @derived
block, supporting a wide range of continuous and discrete distributions.
The ability to implement sophisticated error structures with minimal code is a significant advantage of Pumas over many other NLME platforms. By leveraging Julia’s rich ecosystem of probability distributions, Pumas enables modelers to select the most appropriate error model for their data.
From handling BLQ data through methods like M3 to modeling complex time-to-event outcomes, Pumas provides comprehensive tools for addressing various data challenges. The availability of multiple likelihood approximation methods ensures that even complex models can be estimated efficiently.
This tutorial covers only error models for continuous data; discrete data are covered in later tutorials.