Visual Predictive Checks for Continuous Data Models

Author

Patrick Kofod Mogensen

using Pumas
using PharmaDatasets
using DataFramesMeta
using CairoMakie
using AlgebraOfGraphics
using PumasUtilities

1 Learning Objectives

By the end of this tutorial, you will be able to generate and interpret VPC plots in Pumas, explore misspecification via simulation, and apply stratification and prediction correction.

2 Introduction

The original Visual Predictive Check was proposed to compare the observed data to prediction intervals from an estimated model. If the model is not in agreement with the data we will observe data curves outside of the prediction intervals and we may reject the model. In a Bayesian context, a similar plot is used to verify the posterior predictive distribution against the data.

In practice, the interpretation is not so easy. Covariates including different treatments, biologically significant differences between patients, and different sample times complicates matters significantly. This means that the clear visual check quickly becomes muddied. In this tutorial, we will see how to produce a VPC for continuous data in Pumas and we will try to adjust for some of the complications involved with using the VPC in practice.

First, we set up a Warfarin PK model.

warfarin_pk_model = @model begin

    @metadata begin
        desc = "Warfarin 1-compartment PK model (PD removed)"
        timeu = u"hr"
    end

    @param begin
        # PK parameters
        """
        Clearance (L/hr)
        """
        pop_CL  RealDomain(lower = 0.0, init = 0.134)
        """
        Central volume (L)
        """
        pop_Vc  RealDomain(lower = 0.0, init = 8.11)
        """
        Absorption lag time (hr)
        """
        pop_tabs  RealDomain(lower = 0.0, init = 0.523)

        # Inter-individual variability
        """
          - ΩCL: Clearance
          - ΩVc: Central volume
        """
        pk_Ω  PDiagDomain([0.01, 0.01])
        # Residual variability
        """
        σ_prop: Proportional error
        """
        σ_prop  RealDomain(lower = 0.0, init = 0.00752)
    end

    @random begin
        pk_η ~ MvNormal(pk_Ω)    # mean = 0, covariance = pk_Ω
    end

    @covariates begin
        """
        FSZCL: Clearance scaling factor
        """
        FSZCL
        """
        FSZV: Volume scaling factor
        """
        FSZV
    end

    @pre begin
        CL = FSZCL * pop_CL * exp(pk_η[1])
        Vc = FSZV * pop_Vc * exp(pk_η[2])

        tabs = pop_tabs
        Ka = log(2) / tabs
    end

    @vars begin
        cp := Central / Vc
    end

    @dynamics begin
        Depot' = -Ka * Depot
        Central' = Ka * Depot - (CL / Vc) * Central
    end

    @derived begin
        """
        Concentration (ng/mL)
        """
        conc ~ @. Normal(cp, sqrt((σ_prop * cp)^2))
    end
end
PumasModel
  Parameters: pop_CL, pop_Vc, pop_tabs, pk_Ω, σ_prop
  Random effects: pk_η
  Covariates: FSZCL, FSZV
  Dynamical system variables: Depot, Central
  Dynamical system type: Matrix exponential
  Derived: conc
  Observed: conc
warfarin_data = dataset("paganz2024/warfarin_long")

@chain warfarin_data begin
    @groupby :ID :TIME :DVID
    @transform :tmp = length(:ID)
    @rsubset :tmp > 1
end

warfarin_data_processed = @chain warfarin_data begin
    @groupby :ID :TIME :DVID
    @transform :tmp = 1:length(:ID)
    @rtransform :TIME = :tmp == 1 ? :TIME : :TIME + 1e-6
    @select Not(:tmp)
end

# Transform the data in a single chain of operations
warfarin_data_wide = @chain warfarin_data_processed begin
    @rsubset !contains(:ID, "#")
    @rtransform begin
        # Scaling factors
        :FSZV = :WEIGHT / 70            # volume scaling
        :FSZCL = (:WEIGHT / 70)^0.75     # clearance scaling (allometric)
        # Column name for the DV
        :DVNAME = "DV$(:DVID)"
        # Dosing indicator columns
        :CMT = ismissing(:AMOUNT) ? missing : 1
        :EVID = ismissing(:AMOUNT) ? 0 : 1
    end
    unstack(Not([:DVID, :DVNAME, :DV]), :DVNAME, :DV)
    rename!(:DV1 => :conc, :DV2 => :pca)
end
317×13 DataFrame
292 rows omitted
Row ID TIME WEIGHT AGE SEX AMOUNT FSZV FSZCL CMT EVID DV0 pca conc
String3 Float64 Float64 Int64 Int64 Float64? Float64 Float64 Int64? Int64 Float64? Float64? Float64?
1 1 0.0 66.7 50 1 100.0 0.952857 0.96443 1 1 missing missing missing
2 1 0.0 66.7 50 1 missing 0.952857 0.96443 missing 0 missing 100.0 missing
3 1 24.0 66.7 50 1 missing 0.952857 0.96443 missing 0 missing 49.0 9.2
4 1 36.0 66.7 50 1 missing 0.952857 0.96443 missing 0 missing 32.0 8.5
5 1 48.0 66.7 50 1 missing 0.952857 0.96443 missing 0 missing 26.0 6.4
6 1 72.0 66.7 50 1 missing 0.952857 0.96443 missing 0 missing 22.0 4.8
7 1 96.0 66.7 50 1 missing 0.952857 0.96443 missing 0 missing 28.0 3.1
8 1 120.0 66.7 50 1 missing 0.952857 0.96443 missing 0 missing 33.0 2.5
9 2 0.0 66.7 31 1 100.0 0.952857 0.96443 1 1 missing missing missing
10 2 0.0 66.7 31 1 missing 0.952857 0.96443 missing 0 missing 100.0 missing
11 2 0.5 66.7 31 1 missing 0.952857 0.96443 missing 0 missing missing 0.0
12 2 2.0 66.7 31 1 missing 0.952857 0.96443 missing 0 missing missing 8.4
13 2 3.0 66.7 31 1 missing 0.952857 0.96443 missing 0 missing missing 9.7
306 31 48.0 83.3 24 1 missing 1.19 1.13936 missing 0 missing 24.0 6.4
307 31 72.0 83.3 24 1 missing 1.19 1.13936 missing 0 missing 22.0 4.5
308 31 96.0 83.3 24 1 missing 1.19 1.13936 missing 0 missing 28.0 3.4
309 31 120.0 83.3 24 1 missing 1.19 1.13936 missing 0 missing 42.0 2.5
310 32 0.0 62.0 21 1 93.0 0.885714 0.912999 1 1 missing missing missing
311 32 0.0 62.0 21 1 missing 0.885714 0.912999 missing 0 missing 100.0 missing
312 32 24.0 62.0 21 1 missing 0.885714 0.912999 missing 0 missing 36.0 8.9
313 32 36.0 62.0 21 1 missing 0.885714 0.912999 missing 0 missing 27.0 7.7
314 32 48.0 62.0 21 1 missing 0.885714 0.912999 missing 0 missing 24.0 6.9
315 32 72.0 62.0 21 1 missing 0.885714 0.912999 missing 0 missing 23.0 4.4
316 32 96.0 62.0 21 1 missing 0.885714 0.912999 missing 0 missing 20.0 3.5
317 32 120.0 62.0 21 1 missing 0.885714 0.912999 missing 0 missing 22.0 2.5
pop = read_pumas(
    warfarin_data_wide;
    id = :ID,
    time = :TIME,
    amt = :AMOUNT,
    cmt = :CMT,
    evid = :EVID,
    covariates = [:SEX, :WEIGHT, :FSZV, :FSZCL],
    observations = [:conc],
)
Population
  Subjects: 31
  Covariates: SEX, WEIGHT, FSZV, FSZCL
  Observations: conc
foce_fit = fit(warfarin_pk_model, pop, init_params(warfarin_pk_model), FOCE())
[ Info: Checking the initial parameter values.
[ Info: The initial negative log likelihood and its gradient are finite. Check passed.
Iter     Function value   Gradient norm 
     0     7.293083e+04     1.461056e+05
 * time: 0.019902944564819336
     1     9.818354e+03     1.953327e+04
 * time: 0.7908179759979248
     2     7.222705e+03     1.429007e+04
 * time: 0.9616858959197998
     3     3.173581e+03     6.054138e+03
 * time: 0.9866287708282471
     4     1.786367e+03     3.182851e+03
 * time: 1.0113868713378906
     5     1.008524e+03     1.523930e+03
 * time: 1.0362298488616943
     6     6.628559e+02     7.392148e+02
 * time: 1.0604078769683838
     7     5.074320e+02     3.380632e+02
 * time: 1.0861709117889404
     8     4.488285e+02     1.409490e+02
 * time: 1.1124999523162842
     9     4.321458e+02     4.815869e+01
 * time: 1.1397178173065186
    10     4.293762e+02     3.611674e+01
 * time: 1.1664648056030273
    11     4.291782e+02     3.810027e+01
 * time: 1.1914119720458984
    12     4.291570e+02     3.798194e+01
 * time: 1.2148637771606445
    13     4.290988e+02     3.654733e+01
 * time: 1.2384648323059082
    14     4.289761e+02     3.190818e+01
 * time: 1.2619178295135498
    15     4.287351e+02     2.028140e+01
 * time: 1.286815881729126
    16     4.284295e+02     1.615961e+01
 * time: 1.311743974685669
    17     4.282338e+02     1.409536e+01
 * time: 1.337871789932251
    18     4.281796e+02     1.775240e+01
 * time: 1.3619389533996582
    19     4.281680e+02     1.641484e+01
 * time: 1.3849267959594727
    20     4.281590e+02     1.490102e+01
 * time: 1.4081189632415771
    21     4.281320e+02     1.464321e+01
 * time: 1.431372880935669
    22     4.280727e+02     1.477070e+01
 * time: 1.45560884475708
    23     4.279464e+02     1.487623e+01
 * time: 1.480719804763794
    24     4.277608e+02     1.479359e+01
 * time: 1.561551809310913
    25     4.276099e+02     2.182121e+01
 * time: 1.587122917175293
    26     4.275547e+02     2.510023e+01
 * time: 1.6110379695892334
    27     4.275402e+02     2.460539e+01
 * time: 1.6356477737426758
    28     4.275273e+02     2.373992e+01
 * time: 1.6610620021820068
    29     4.274912e+02     2.199978e+01
 * time: 1.6859629154205322
    30     4.274025e+02     1.928910e+01
 * time: 1.7115728855133057
    31     4.271657e+02     1.765219e+01
 * time: 1.7373127937316895
    32     4.265541e+02     3.361826e+01
 * time: 1.7636089324951172
    33     4.250214e+02     5.665070e+01
 * time: 1.7899267673492432
    34     4.222442e+02     7.740390e+01
 * time: 1.8159708976745605
    35     4.202011e+02     7.057124e+01
 * time: 1.8406548500061035
    36     4.179514e+02     4.045950e+01
 * time: 1.8655848503112793
    37     4.170700e+02     7.924687e+00
 * time: 1.8906378746032715
    38     4.170338e+02     9.918974e+00
 * time: 1.911681890487671
    39     4.170322e+02     9.740094e+00
 * time: 1.9316740036010742
    40     4.170294e+02     9.580525e+00
 * time: 1.9518818855285645
    41     4.170224e+02     9.354483e+00
 * time: 1.973510980606079
    42     4.170044e+02     8.980420e+00
 * time: 1.9970958232879639
    43     4.169579e+02     8.341740e+00
 * time: 2.0208258628845215
    44     4.168403e+02     1.197122e+01
 * time: 2.046637773513794
    45     4.165578e+02     1.828136e+01
 * time: 2.072518825531006
    46     4.159678e+02     2.403113e+01
 * time: 2.1215438842773438
    47     4.151117e+02     2.418037e+01
 * time: 2.1470868587493896
    48     4.144995e+02     1.738597e+01
 * time: 2.170883893966675
    49     4.142440e+02     8.526452e+00
 * time: 2.1948938369750977
    50     4.141656e+02     1.084232e+00
 * time: 2.2185938358306885
    51     4.141623e+02     1.683615e-01
 * time: 2.2410969734191895
    52     4.141623e+02     5.899685e-02
 * time: 2.2612109184265137
    53     4.141623e+02     5.110076e-02
 * time: 2.2814009189605713
    54     4.141623e+02     5.095926e-02
 * time: 2.2994728088378906
    55     4.141623e+02     5.095855e-02
 * time: 2.321917772293091
    56     4.141623e+02     5.083699e-02
 * time: 2.3398277759552
    57     4.141623e+02     5.083694e-02
 * time: 2.36370587348938
    58     4.141623e+02     5.083689e-02
 * time: 2.396068811416626
    59     4.141623e+02     5.083453e-02
 * time: 2.4234459400177
    60     4.141623e+02     5.083413e-02
 * time: 2.4542768001556396
    61     4.141623e+02     4.994785e-02
 * time: 2.4739527702331543
    62     4.141623e+02     4.969511e-02
 * time: 2.493306875228882
    63     4.141623e+02     4.809471e-02
 * time: 2.51387095451355
    64     4.141622e+02     4.566696e-02
 * time: 2.5334978103637695
    65     4.141622e+02     5.884378e-02
 * time: 2.553788900375366
    66     4.141621e+02     8.013542e-02
 * time: 2.5743069648742676
    67     4.141620e+02     8.789628e-02
 * time: 2.5943429470062256
    68     4.141618e+02     6.143283e-02
 * time: 2.6160337924957275
    69     4.141618e+02     2.027742e-02
 * time: 2.6388628482818604
    70     4.141618e+02     2.038166e-03
 * time: 2.6602327823638916
    71     4.141618e+02     1.523421e-04
 * time: 2.679537773132324
FittedPumasModel

Successful minimization:                      true

Likelihood approximation:                     FOCE
Likelihood Optimizer:                         BFGS
Dynamical system type:          Matrix exponential

Log-likelihood value:                   -414.16175
Number of subjects:                             31
Number of parameters:         Fixed      Optimized
                                  0              6
Observation records:         Active        Missing
    conc:                       239             47
    Total:                      239             47

------------------------
             Estimate
------------------------
pop_CL        0.13281
pop_Vc        8.2582
pop_tabs      0.93142
pk_Ω₁,₁       0.037972
pk_Ω₂,₂       0.0092456
σ_prop        0.21277
------------------------

3 What exactly is a VPC?

A Visual Predictive Check (VPC) is a plot constructed to identify misspecification of the model. It attempts to compare the distribution of the data with the distribution of simulations from the estimated model. Essentially constructed in parts:

flowchart TD
    Start(["Start"]) --> Q1["For pcVPC calculate the prediction median function<br>and apply it to the observed independent variable."]
    Q1 --> Q2["Calculate the sequence of quantiles (0.1, 0.5, 0.9)<br>for the (corrected) dependent variable observations<br>across the independent variable nodes"]
    Q2 --> Q3
    Q2 --> Q6
    Q3["Simulate all datasets using estimated parameters with<br>random effects and error model sampling (and apply<br>correction for pcVPC)"] --> Q4["Calculate quantiles for each simulated dataset"]
    Q4 --> Q5["Determine prediction intervals across<br>simulations per bin/location"]
    Q5 --> Q6["Plot results (observed vs simulated intervals)"]
    Q6 --> Q7["Check if observed medians lie within intervals"]
    Q7 --> End(["End"])

    %% Labels and styles
    subgraph n1 ["VPC Workflow"]
    end

    classDef step fill:#9ecfdf,stroke:#000,stroke-width:1px,color:#000
    classDef Peach stroke-width:1px,stroke-dasharray:none,stroke:#FBB35A,fill:#FFEFDB,color:#8F632D

    class Q1,Q2,Q3,Q4,Q5,Q6,Q7 step
    class n1 Peach

This differs from other diagnostics such as weighted residuals by considering not only data compared to the modeled mean, but tries to make sure that more or less extreme subjects are also well-predicted by the model.

4 Applying the VPC workflow in Pumas

In Pumas we have a predefined workflow for simulating the required data as well as summarizing the data using the binless VPC method. In short, the binless method avoids binning data across the independent variable. Instead, it applies a smoothed local quantile regression to combine information from data that is close together according to the kernel used. Binless VPCs are advantageous over traditional VPCs as they allow smoother representation of model-data agreement rather than across few discrete intervals.

The workflow consists of just two functions:

  • vpc(...) to simulate the required data and calculate the quantities used for lines and ribbons in the final plots
  • vpc_plot(...) from PumasUtilities that takes the data and presents it in a plot
VPC with simobs

It is also possible to simulate the data using simobs but we will not cover that approach in this tutorial.

4.1 How to simulate?

To simulate, we use the vpc function along with input on how the simulations and quantile calculations should be performed. The most simple invocation of the function is:

myvpc = vpc(foce_fit)
[ Info: Continuous VPC
Visual Predictive Check
  Type of VPC: Continuous VPC
  Simulated populations: 499
  Subjects in data: 31
  Stratification variable(s): None
  Confidence level: 0.95
  VPC lines: quantiles ([0.1, 0.5, 0.9])

This creates an object that has all the information we’ll need. Here’s a brief explanation for each line of the VPC output:

  • Visual Predictive Check: This is the name of the diagnostic tool used to assess how well your model predicts observed data by comparing simulated data distributions.
  • Type of VPC: Continuous VPC: The VPC was performed for a dependent variable (e.g., concentration) that changes continuously over time.
  • Simulated populations: 499: The model generated 499 synthetic datasets to create prediction intervals for comparison.
  • Subjects in data: 31: Your original dataset contains 31 individual subjects.
  • Stratification variable(s): None: The VPC was performed on the entire dataset without splitting it into subgroups (e.g., by sex or age).
  • Confidence level: 0.95: The shaded prediction intervals on the plot are calculated to contain 95% of the simulated data.
  • VPC lines: quantiles ([0.1, 0.5, 0.9]): The VPC plot will display the 10th percentile (0.1), median (0.5), and 90th percentile (0.9) of the simulated data.

While the above function call will produce data for a VPC we often need to make adjustments in practice. Consider the (partial) signature where we omitted some options for clarity of this tutorial:

vpc(
    fpm::AbstractFittedPumasModel;
    samples::Integer = 499,
    observations::Vector{Symbol} = [keys(fpm.data[1].observations)[1]],
    covariates::Vector{Symbol} = [:time],
    stratify_by::Vector{Symbol} = Symbol[],
    bandwidth = nothing,
    prediction_correction::Bool = false,
)

Some important potential keyword arguments in the vpc function are:

  • samples (default is 500) to control the number times the population is re-simulated according to the fitted model
  • observations is a vector of symbols (e.g. [:conc]) to control what endpoint to perform the VPC for if there are multiple present in the model
  • covariates (default is :time) is a vector of symbols (e.g. [:tad]) to control the first axis variable
  • stratify_by to control stratification of VPC calculations and plotting
  • bandwidth is a number (default is 2.0 for continuous VPCs) to control the level of smoothing
  • prediction_correction (default is false) to apply a type of correction for differences in subjects. If prediction correction is applied it is currently not possible to also use stratification.

Please see the docstring (?vpc) for all options and defaults.

With the output of vpc in hand, we can continue to plotting the VPC.

4.2 How to plot?

Plotting is done with the vpc_plot function. This function accepts the output of vpc() and can be customized in the same way as the rest of the built-in plotting functions covered earlier.

vpc_plot(myvpc, figurelegend = (; position = :b, nbanks = 2, tellheight = true))

The VPC looks good. The data medians are generally well within the ribbons.

4.3 A misspecified model

To produce a VPC that does not look good we can intentionally constrain the model to not produce the estimates found above. Let us test whether forcing a higher pop_CL can be detected as misspecification in our setup:

foce_fit_CL = fit(
    warfarin_pk_model,
    pop,
    (init_params(warfarin_pk_model)..., pop_CL = 0.15),
    FOCE();
    constantcoef = (:pop_CL,),
    optim_options = (; show_trace = false),
)
[ Info: Checking the initial parameter values.
[ Info: The initial negative log likelihood and its gradient are finite. Check passed.
FittedPumasModel

Successful minimization:                      true

Likelihood approximation:                     FOCE
Likelihood Optimizer:                         BFGS
Dynamical system type:          Matrix exponential

Log-likelihood value:                   -418.66271
Number of subjects:                             31
Number of parameters:         Fixed      Optimized
                                  1              5
Observation records:         Active        Missing
    conc:                       239             47
    Total:                      239             47

-----------------------
             Estimate
-----------------------
pop_CL        0.15
pop_Vc        8.3615
pop_tabs      0.93424
pk_Ω₁,₁       0.051854
pk_Ω₂,₂       0.010071
σ_prop        0.21565
-----------------------

Then we perform the required calculations

myvpc_CL = vpc(foce_fit_CL)
[ Info: Continuous VPC
Visual Predictive Check
  Type of VPC: Continuous VPC
  Simulated populations: 499
  Subjects in data: 31
  Stratification variable(s): None
  Confidence level: 0.95
  VPC lines: quantiles ([0.1, 0.5, 0.9])

And produce the plot

vpc_plot(myvpc_CL, figurelegend = (; position = :b, nbanks = 2, tellheight = true))

In this case, the model does not look well-specified, and we should consider not fixing the population level clearance parameter. Note, that the interpretation is up to the person looking at the plot. In this case, it appears as if the fit is decent for early time points and bad for later time points. This would tend to rule out a problem in the absorption phase. The question is then if it’s misspecified because of clearance or maybe we didn’t correctly account for a distribution phase with many peripheral compartments? In this case, we introduced the problem ourselves so we have a good idea, but in general it’s not always easy.

4.4 Stratification

If a covariate with a limited number of values is thought to be significant in the model or if we want to make sure that the fit is equally good across strata we should produce a stratified plot.

my_vpc_sex = vpc(foce_fit; stratify_by = [:SEX])
[ Info: Continuous VPC
Visual Predictive Check
  Type of VPC: Continuous VPC
  Simulated populations: 499
  Subjects in data: 31
  Stratification variable(s): [:SEX]
  Confidence level: 0.95
  VPC lines: quantiles ([0.1, 0.5, 0.9])
vpc_plot(my_vpc_sex; figurelegend = (; position = :b, nbanks = 2, tellheight = true))

As we can see, stratification also allows us to investigate the data more closely. Supposedly, the data median starts at a large positive number at time point 0! This leads to a model misspecification according to the red outlier band. When outliers are observed we need to consider at least three possible scenarios:

  • Is the data wrong? There could be unphysiological values that we need to explain and that our model could not be expected to explain. Hopefully, exploratory data analysis (EDA) caught such values.
  • Is the model wrong? If the data looks good, maybe the model simply cannot fit these data points. We may need to take a closer look at the model fit for specific subjects that contribute to this upper part of the observed distribution.
  • Is the VPC misspecified? Just as binned VPCs require appropriate binning, the binless VPC has aggregation of similar data points by use of local regression. We can try to vary the bandwidth parameter to see if deviating from the default value of 2

Let us try experiment with the bandwidth. In the dataset we observe that there are some very large observations at :TIME == 2.0. These could be biasing the curves for the observed data.

my_vpc_sex = vpc(foce_fit; stratify_by = [:SEX], bandwidth = 1.9)
[ Info: Continuous VPC
Visual Predictive Check
  Type of VPC: Continuous VPC
  Simulated populations: 499
  Subjects in data: 31
  Stratification variable(s): [:SEX]
  Confidence level: 0.95
  VPC lines: quantiles ([0.1, 0.5, 0.9])
vpc_plot(my_vpc_sex; figurelegend = (; position = :b, nbanks = 2, tellheight = true))

We see that the black curve now goes down to a small value at :TIME == 0.0 as expected. The problem highlights something that is true for all VPCs and should always be kept in mind. We are aggregating data across time and covariate differences, so it is not always easy to know what the VPC curve “should” look like when the scenarios become complex. Here, we felt comfortable concluding that the first VPC did not invalidate our fit because we don’t think any subjects should start with positive values of the drug in plasma, but it may not always be easy to say if the VPC or the fit is the reason for an apparent indication of misspecification.

4.5 Changing to Time-After-Dose (tad)

When producing a VPC it is often suggested to plot time-after-dose as the first axis variable and stratify by the number of the dose unless the dosage regimen is more complicated and several doses are administered at the same time or very close together. To change the variable on the first axis, we use the covariates keyword. Stratification is done as before.

my_vpc_sex = vpc(foce_fit; covariates = [:tad], stratify_by = [:dosenum], bandwidth = 1.9)
[ Info: Continuous VPC
Visual Predictive Check
  Type of VPC: Continuous VPC
  Simulated populations: 499
  Subjects in data: 31
  Stratification variable(s): [:dosenum]
  Confidence level: 0.95
  VPC lines: quantiles ([0.1, 0.5, 0.9])
vpc_plot(my_vpc_sex; figurelegend = (; position = :b, nbanks = 2, tellheight = true))

In our example, there is not real change because we only have a single dose per subject.

4.6 Prediction correction

Prediction-corrected Visual Predictive Checks (pcVPCs) were developed to account for variability in independent variables (e.g., dose, covariates, or time-varying predictors) that may influence predictions but cannot easily be stratified. In such cases, raw observed and simulated values may differ simply due to differences in design rather than model misspecification. The pcVPC corrects these design-related differences by normalizing both observed and simulated outcomes relative to the population-level prediction at each time point (typically using the model-predicted median). This allows for a clearer comparison of the model’s predictive performance across heterogeneous conditions.

To perform a prediction corrected VPC in Pumas, simply set the prediction_correction keyword to true.

my_pcvpc = vpc(foce_fit; prediction_correction = true, bandwidth = 1.9)
vpc_plot(my_pcvpc; figurelegend = (; position = :t, nbanks = 2, tellheight = true))
[ Info: Continuous VPC

In this instance, the prediction correction does not cause much of a difference.

5 Conclusion

In this tutorial, we saw how to produce a VPC from a fitted model. We showed how to look for misspecification and some of the levers and controls a user has to customize the plot.

6 Further Reading

If you want to read more on the application and properties of visual predictive checks, please have a look at the following papers.

1. Karlsson, M. O., & Holford, N. H. (2008). A tutorial on visual predictive checks. PAGE 17, Abstract 1434. http://www.page-meeting.org/?abstract=1434
2. Bergstrand, M., Hooker, A. C., Wallin, J. E., & Karlsson, M. O. (2011). Prediction-corrected visual predictive checks for diagnosing nonlinear mixed-effects models. CPT: Pharmacometrics & Systems Pharmacology, 89(2), 204–217. https://doi.org/10.1038/clpt.2010.268 
3. Ette, E. I., & Williams, P. J. (2004). Population pharmacokinetics II: estimation methods. Annals of Pharmacotherapy, 38(11), 1907–1915. https://doi.org/10.1345/aph.1E225
4. Bonate, P. L. (2011). Pharmacokinetic-Pharmacodynamic Modeling and Simulation (2nd ed.). Springer. Chapters on model evaluation and simulation diagnostics.