A Forest Plot Workflow in Pumas

Author

Patrick Kofod Mogensen

# Use packages
using Pumas, PumasUtilities, Random, DataFramesMeta
# Set seed
seed = Random.seed!(234)

Welcome to this Pumas tutorial on Forest Plots. Forest plots are used to show covariate model impact on model parameters and and secondary parameters (e.g. exposure metrics) in pharmacometric analyses. In the pharmacometric context of model evaluation they were introduced by Menon-Andersen et. al. (2011) “Essential pharmacokinetic Information for Drug Dosage Decisions: A Concise Visual presentation in the Drug Label”. We will go through the Pumas features for carrying out these steps in this tutorial.

Beta feature

In Pumas v2.6 an experimental table_forest_plot function was added that is supplied with any Pumas distribution. This function is to be considered a beta or experimental feature. This means that the interface may be be subject to changes before the function is released in its final form.

1 The Forest Plot

Before we dive into the simulations and plotting, let’s first clarify the goal of producing a forest plot in pharmacometrics and verbally describe the process. Code sections below will clarify all details.

Overall, the following steps are needed to build a forest plot:

  1. Model Preparation: Prepare a model for assessment, potentially with fitted parameters and inference information, if simulations with parameter uncertainty are desired.
  2. Simulation Dataset Specification: Specify the reference and alternative subjects to be used for the simulations.
  3. Simulation Type: Define the type of simulation to perform—whether to include parameter uncertainty, Between-Subject Variability (BSV) and Residual Unexplained Variability (RUV), or other approaches.
  4. Extraction: Extract exposure and individual coefficient data from the simulations.
  5. Summarization: Summarize the simulation results for the endpoints of interest, focusing on the quantiles of the quantities relative to the reference simulation.
  6. Plotting: Visualize the results from the previous steps with appropriate plots.

1.1 Model preparation

After building a model we often have covariate relationships in the final model that we want to understand. Some usual ones are age, weight, sex, and race, but given the context we can come up with many more examples such as route or other formulation information. These covariates are represented within the model with a functional form and parameters estimated from data, derived through statistical methods for model selection.

Consumers of the (covariate) model need to understand the clinical impact of covariate variations. For example, it’s not obvious from the model alone if a 10% decrease in body weight results in a 5%, 10%, or 20% change in drug exposure at steady state (AUC) just by looking at the functial form and parameter size. Simulations help clarify these effects, and a forest plot effectively summarizes the results visually.

1.2 Simulation dataset specification

To evaluate the impact of covariates on clinical outcomes, we run simulations grouped by covariates or combinations of covariates. Taking body weight as an example, we might choose 70 kg as the reference value. Often, this reference value is pre-defined in the model; however, we can adjust it if simulating for a population different from the model’s original dataset. Then, we define alternative values, such as 50 kg and 90 kg, to see how changes in weight might affect the clinical outcome.

In practice, we set up a hypothetical subject with all covariates at reference values. Then, for each covariate we want to evaluate at alternative values we set up an additional subject. Here, we would set the weight covariate value to 50 kg for one alternative subject and 90 kg for another. Then, we can simulate that subject multiple times.

Note

Be careful when setting up alternative covariate values as there is nothing restricting you from setting alternative values that are out of the range of values used in the estimation part of the workflow. If you use too extreme values you will be extrapolating beyond the data used and the model may not be valid in that region.

1.3 Simulation type

Once the virtual subjects are defined, we simulate to obtain clinical outcomes for each group. The FDA’s Guidance for Industry: Population Pharmacokinetics suggests three potential approaches for simulations, depending on the objectives:

  • Simulating with parameter uncertainty only: This approach excludes between-subject variability (BSV) and residual unexplained variability (RUV). It is often used to assess a typical subject’s ability to reach a drug exposure target or evaluate covariate impact. Given our objective to assess covariate effects, this method may be appropriate.

  • Simulating with BSV and RUV: Here, parameters are fixed to their estimated values, while BSV and RUV are allowed to vary. Although this provides different insights, it is not the standard approach for assessing covariate parameter effects. Covariate effects are typically driven by population parameters (fixed effects, or “thetas” in NONMEM terminology) and covariate values. Using simulation with uncertainty is generally more aligned with the objective of covariate effect assessment.

  • Simulating solely with fixed effects: This method fixes parameters and excludes variability, making it unsuitable for this case. Without variability, the intervals produced for the plot rely entirely on the point estimates of the model fixed effects, which could lead to overconfidence in the results.

1.4 Extract exposure and individual coefficient data

The simulation output will often have to be transformed a bit before you have the final numbers to be used. This may be because things like clearance, volume, or other individual coefficients have to be extracted from the simulation output, or because you need a derived quantity of the simulation output to quantify such as AUC, Cmax, or something else.

  • For exposure measurements there might be more to consider. You can get an AUC in many ways, so the users has to understand exactly what is needed to answer the questions. If the quantity of interest is AUC or Cmax at steady state calculated using the usual NCA rules then using NCA.jl is a very convenient approach. This allows us to take simobs output and generate NCASubjects that will allow us to easily calculate exposure measures. If instead the exposure measure is some function of model output or you want to calculate the exposure in a specific way then a more custom approach may have to be used. We will go through both below.

  • The individual coefficients (icoefs in Pumas vocabulary) can often be extracted directly from the simulation output so there is not much to consider there. We will show below how to do it.

1.5 Summarization

After running the simulations and extracting or generating the relevant measure, we identify a central reference value, usually the median, to use for comparisons. We then calculate the ratio of outcomes in alternative scenarios to this reference value and analyze the distribution of these ratios. This may be done separately or as part of the forest plot functionality.

1.6 Plotting

While the above information could just as well be presented in a table many people prefer to create forest plots for very quick identification of numbers outside specified regions. Forest plots display data generated in the step of summarization in a interpretable way, typically as central values with confidence intervals or empirical density plots.

2 An example

2.1 Model preparation

First, let us define the model to be used in this example. A simple one compartment model with a weight and sex effect on clearance of the drug. Besides the Central compartment we add a compartment to calculate the AUC of the Central(t) curve over time. In the @observed block we divide by volume to obtain the concentration AUC instead.

mdl_auc = @model begin
    @param begin
        tvCL ~ RealDomain(lower = 0.5, upper = 1.5, init = 1.0)
        tvVc ~ RealDomain(lower = 0.1, upper = 20.0, init = 10.0)
        WEIGHTonCL ~ RealDomain(lower = -2.0, upper = 2.0, init = 1.2)
        SEXonCL ~ RealDomain(lower = -2.0, upper = 2.0, init = 1.4)
    end

    @random begin
        etaVc ~ Normal(0.0, 0.05)
        etaCL ~ Normal(0.0, 0.15)
    end
    @covariates WEIGHT SEX
    @pre begin
        CL = tvCL * (WEIGHT / 70)^WEIGHTonCL * SEXonCL^(SEX == "MALE") * exp(etaCL)
        Vc = tvVc * exp(etaVc)
    end
    @dynamics begin
        Central' = -CL / Vc * Central
        CentralAUC' = Central
    end

    @derived begin
        dv ~ @. LogNormal(log(Central / Vc), 0.1)
    end
    @observed begin
        AUC = CentralAUC ./ Vc
    end
end
PumasModel
  Parameters: tvCL, tvVc, WEIGHTonCL, SEXonCL
  Random effects: etaVc, etaCL
  Covariates: WEIGHT, SEX
  Dynamical system variables: Central, CentralAUC
  Dynamical system type: Matrix exponential
  Derived: dv
  Observed: dv, AUC

Let us set up a hypothetical dataset, simulate from it, fit parameters based on the simulated dataset, and run parameter inference:

# Number of subjects to simulate
data_n = 40
# Time points to simulate at
data_time = [0.0, 0.5, 1.0, 4.0, 5.0, 24.0]
# Dose is 10 unit IV Bolus
data_event = DosageRegimen(10; route = NCA.IVBolus)
# Weight distribution has mean around 70
# A more advanced situations may have different distributions across SEX
weight_dist = LogNormal(4.25, 0.2)
# Sex distribution is equal probability for each outcome
sample_covariates(rng) =
    (WEIGHT = rand(rng, weight_dist), SEX = rand(rng, ["MALE", "FEMALE"]))

# Grab initial parameters from model
param0 = init_params(mdl_auc)

# Set up the Subjects to simulate trajectories for
data_population = [
    Subject(
        id = "data" * string(n),
        time = data_time,
        events = data_event,
        covariates = sample_covariates(seed),
    ) for n = 1:data_n
]

# Simulate
simulated_data = simobs(mdl_auc, data_population, param0)

# Fit the model given the simulated data
data_result = fit(
    mdl_auc,
    Subject.(simulated_data),
    param0,
    FOCE();
    optim_options = (; show_trace = false),
)

# Calculate inference based on the asymptotic variance-covariance matrix
data_inference = infer(data_result)
[ Info: Checking the initial parameter values.
[ Info: The initial negative log likelihood and its gradient are finite. Check passed.
[ Info: Calculating: variance-covariance matrix.
[ Info: Done.
Asymptotic inference results using sandwich estimator

Successful minimization:                      true

Likelihood approximation:                     FOCE
Likelihood Optimizer:                         BFGS
Dynamical system type:          Matrix exponential

Log-likelihood value:                    328.09034
Number of subjects:                             40
Number of parameters:         Fixed      Optimized
                                  0              4
Observation records:         Active        Missing
    dv:                         240              0
    Total:                      240              0

----------------------------------------------------------------
              Estimate          SE                 95.0% C.I.
----------------------------------------------------------------
tvCL           1.0025         0.037285       [0.92937;  1.0755]
tvVc          10.054          0.11788        [9.8227 ; 10.285 ]
WEIGHTonCL     1.2081         0.11338        [0.98583;  1.4303]
SEXonCL        1.4353         0.069532       [1.299  ;  1.5715]
----------------------------------------------------------------

We see that we needed to specify observation times, events, and covariates for our Subjects. Then we simulated observations, fit the model and estimated the variance-covariance matrix. Since we now have point estimates for the fixed effects and parameter uncertainty, we are ready to simulate for our forest plots.

2.2 Simulation dataset preparation

A key aspect of the forest plot generation is specifying the datasets to be simulated for later comparison. The idea with the forest plot is, that each row of the forest plot contains values that compare change to only one covariate change compared to the reference specification at a time. This is typically called a univariate or univariable covariate effect. This is of course different than asking questions about what subjects are realistic to draw from the population. It is a theoretical construct to understand the effect of each covariate on its own. Of course, the size of the effect may well depend on the reference values that we pick. In Pumas, we use NamedTuples when specifying covariates:

cov = (; WEIGHT = 70, SEX = "FEMALE")
(WEIGHT = 70,
 SEX = "FEMALE",)

Now, if we want to update the SEX value we can use the merge function. We take the “reference” NamedTuple and merge it with another named tuple that contains the things we want to update:

alt_cov = merge(cov, (; SEX = "MALE"))
(WEIGHT = 70,
 SEX = "MALE",)

We will use this in the code below.

If we want to, it is also possible to show a change of several covariates at a time but in that case it is of course useful to also have the univariate changes in the plot. As an example, if the reference subject is a male of 70 kg we may want to compare their results to a female of 70 kg or a male of 50 kg. However, we may also want to see the effect of being a female of 50 kg. This is not the usual approach of varying only one covariate at a time, but if we did have the first to alternative simulations as described we may be able to also get information out of showing the (female, 50kg) alternative in the plot as well.

There is a special consideration we have not discussed about so far, and that is when to measure the outcomes. In the simulations we can either do one dose and test in that first interval, we can give several doses and measure the AUC of the full time course, or we can do the thing that is often relevant where we simulate to steady state and only compare the covariate effects at steady state. This all depends on the context of the drug and the simulation. Here, we pick a daily dose and we want to measure after 28 doses equivalent to a typical “four week month”.

We use 27 additional doses to the first one to reach a pseudo-steady-state after four weeks of treatment. We will use a dose of 10 units as the reference as was present in the original dataset but we will also set up a different event type with 20 units to show that for the cases where different dose levels are relevant to the forest plot it is also possible to include dosing differences in the list of alternatives.

ii = 24
pseudo_ss = 27
sim_time = 0:1:(pseudo_ss+1)*ii

ss_event = DosageRegimen(10; addl = pseudo_ss, ii)
ss_event20 = DosageRegimen(20; addl = pseudo_ss, ii)
interval = [ii * pseudo_ss, ii * (pseudo_ss + 1)]

With the regimens in place, we need to come up with a structure to hold all the information. We could use individual variables for everything but let us explore the use of a DataFrame to keep all information about a specification (a row) easy to find. Let us look at the code and discuss.

forest_df = DataFrame(
    "type" => ["reference", "alternative", "alternative", "alternative", "alternative"],
    "N" => fill(100, 5),
    "group" => [
        "Reference",
        "SEX (ref: FEMALE)",
        "WEIGHT (ref: 70 kg)",
        "WEIGHT (ref: 70 kg)",
        "DOSE (ref: 10mg)",
    ],
    "label" => ["Uncertainty", "MALE", "50 kg", "90 kg", "20 mg"],
    "time" => fill(sim_time, 5),
    "covariates" => [
        (; WEIGHT = 70, SEX = "FEMALE"),
        (; SEX = "MALE"),
        (; WEIGHT = 50),
        (; WEIGHT = 90),
        (;),
    ],
    "events" => [ss_event, ss_event, ss_event, ss_event, ss_event20],
    "notes" => ["", "", "", "", ""],
)
5×8 DataFrame
Row type N group label time covariates events notes
String Int64 String String StepRang… NamedTup… DosageRe… String
1 reference 100 Reference Uncertainty 0:1:672 (WEIGHT = 70, SEX = "FEMALE") DosageRegimen\n1×10 DataFrame\n Row │ time cmt amt evid ii addl rate duration ss ⋯\n │ Float64 Int64 Float64 Int8 Float64 Int64 Float64 Float64 Int8 ⋯\n─────┼──────────────────────────────────────────────────────────────────────────\n 1 │ 0.0 1 10.0 1 24.0 27 0.0 0.0 0 ⋯\n 1 column omitted
2 alternative 100 SEX (ref: FEMALE) MALE 0:1:672 (SEX = "MALE",) DosageRegimen\n1×10 DataFrame\n Row │ time cmt amt evid ii addl rate duration ss ⋯\n │ Float64 Int64 Float64 Int8 Float64 Int64 Float64 Float64 Int8 ⋯\n─────┼──────────────────────────────────────────────────────────────────────────\n 1 │ 0.0 1 10.0 1 24.0 27 0.0 0.0 0 ⋯\n 1 column omitted
3 alternative 100 WEIGHT (ref: 70 kg) 50 kg 0:1:672 (WEIGHT = 50,) DosageRegimen\n1×10 DataFrame\n Row │ time cmt amt evid ii addl rate duration ss ⋯\n │ Float64 Int64 Float64 Int8 Float64 Int64 Float64 Float64 Int8 ⋯\n─────┼──────────────────────────────────────────────────────────────────────────\n 1 │ 0.0 1 10.0 1 24.0 27 0.0 0.0 0 ⋯\n 1 column omitted
4 alternative 100 WEIGHT (ref: 70 kg) 90 kg 0:1:672 (WEIGHT = 90,) DosageRegimen\n1×10 DataFrame\n Row │ time cmt amt evid ii addl rate duration ss ⋯\n │ Float64 Int64 Float64 Int8 Float64 Int64 Float64 Float64 Int8 ⋯\n─────┼──────────────────────────────────────────────────────────────────────────\n 1 │ 0.0 1 10.0 1 24.0 27 0.0 0.0 0 ⋯\n 1 column omitted
5 alternative 100 DOSE (ref: 10mg) 20 mg 0:1:672 NamedTuple() DosageRegimen\n1×10 DataFrame\n Row │ time cmt amt evid ii addl rate duration ss ⋯\n │ Float64 Int64 Float64 Int8 Float64 Int64 Float64 Float64 Int8 ⋯\n─────┼──────────────────────────────────────────────────────────────────────────\n 1 │ 0.0 1 20.0 1 24.0 27 0.0 0.0 0 ⋯\n 1 column omitted

What we see is that there are 8 columns:

  1. type: refers to whether it is a reference row or alternative row. There can only be one reference row.
  2. N: The number of subjects in this specification to simulate
  3. group: What covariate simulation group this is and what the reference value is. For multiple alternatives in the same group we will have repetitions
  4. label: The chosen alternative value label
  5. time: The time points set in the subjects being constructed
  6. covariates: The reference covariates for the reference row and the covariates to change in the alternative rows
  7. events: The events to be used in the simulation
  8. notes: Notes for the simulation to display in the final plot

Let us create a column of Subjects defined by the other columns. We need to go through the DataFrame by row and replace the covariate in question from the reference value and use the specified times and events. First, we find the reference subject and verify that there is only one. Then, we collect the reference covariates. Finally, we add the column with subjects such that in each row this column represents the information in all the other columns. We collect these steps in a function as we will be performing them more than once.

function prepare_subjects(df)
    # find the reference row
    ref_row = findall(x -> x == "reference", df.type)
    if length(ref_row) != 1
        throw(
            ErrorException(
                "There needs to be exactly one reference row in the input DataFrame",
            ),
        )
    end

    # extract reference covariates
    # ref_row is a vector of length 1 so we use the only value it contains
    ref_covariates = df[first(ref_row), :covariates]

    # use @rtransform to combine several columns worth of information into one
    # new column with the actual Subjects to simulate into
    _df = @rtransform(
        df,
        :subject = Subject(
            id = :group * :label,
            time = :time,
            events = :events,
            covariates = merge(ref_covariates, :covariates),
        )
    )
    # return the new DataFrame
    return _df
end
prepare_subjects (generic function with 1 method)

We can then use this function to take a DataFrame that adheres to our specification above and give us back a copy of that DataFrame with the simulation populations returned. Let us apply it to forest_df:

forest_df_sub = prepare_subjects(forest_df)
5×9 DataFrame
Row type N group label time covariates events notes subject
String Int64 String String StepRang… NamedTup… DosageRe… String Subject…
1 reference 100 Reference Uncertainty 0:1:672 (WEIGHT = 70, SEX = "FEMALE") DosageRegimen\n1×10 DataFrame\n Row │ time cmt amt evid ii addl rate duration ss ⋯\n │ Float64 Int64 Float64 Int8 Float64 Int64 Float64 Float64 Int8 ⋯\n─────┼──────────────────────────────────────────────────────────────────────────\n 1 │ 0.0 1 10.0 1 24.0 27 0.0 0.0 0 ⋯\n 1 column omitted Subject(id = ReferenceUncertainty, ...)
2 alternative 100 SEX (ref: FEMALE) MALE 0:1:672 (SEX = "MALE",) DosageRegimen\n1×10 DataFrame\n Row │ time cmt amt evid ii addl rate duration ss ⋯\n │ Float64 Int64 Float64 Int8 Float64 Int64 Float64 Float64 Int8 ⋯\n─────┼──────────────────────────────────────────────────────────────────────────\n 1 │ 0.0 1 10.0 1 24.0 27 0.0 0.0 0 ⋯\n 1 column omitted Subject(id = SEX (ref: FEMALE)MALE, ...)
3 alternative 100 WEIGHT (ref: 70 kg) 50 kg 0:1:672 (WEIGHT = 50,) DosageRegimen\n1×10 DataFrame\n Row │ time cmt amt evid ii addl rate duration ss ⋯\n │ Float64 Int64 Float64 Int8 Float64 Int64 Float64 Float64 Int8 ⋯\n─────┼──────────────────────────────────────────────────────────────────────────\n 1 │ 0.0 1 10.0 1 24.0 27 0.0 0.0 0 ⋯\n 1 column omitted Subject(id = WEIGHT (ref: 70 kg)50 kg, ...)
4 alternative 100 WEIGHT (ref: 70 kg) 90 kg 0:1:672 (WEIGHT = 90,) DosageRegimen\n1×10 DataFrame\n Row │ time cmt amt evid ii addl rate duration ss ⋯\n │ Float64 Int64 Float64 Int8 Float64 Int64 Float64 Float64 Int8 ⋯\n─────┼──────────────────────────────────────────────────────────────────────────\n 1 │ 0.0 1 10.0 1 24.0 27 0.0 0.0 0 ⋯\n 1 column omitted Subject(id = WEIGHT (ref: 70 kg)90 kg, ...)
5 alternative 100 DOSE (ref: 10mg) 20 mg 0:1:672 NamedTuple() DosageRegimen\n1×10 DataFrame\n Row │ time cmt amt evid ii addl rate duration ss ⋯\n │ Float64 Int64 Float64 Int8 Float64 Int64 Float64 Float64 Int8 ⋯\n─────┼──────────────────────────────────────────────────────────────────────────\n 1 │ 0.0 1 20.0 1 24.0 27 0.0 0.0 0 ⋯\n 1 column omitted Subject(id = DOSE (ref: 10mg)20 mg, ...)

Now we are ready to set up the simulations.

2.3 Simulation Type

Above, we discussed two types of simulations:

  • Simulation with parameter uncertainty
  • Simulation with BSV+RUV

Let us see how to do this in practise.

2.3.1 Simulation with Parameter Uncertainty

To simulate with uncertainty in Pumas, we need to use the simobs function with the data_inference as input.

  • The data_inference object contains the inference results from the estimation step.
  • By default, the inference is based on the asymptotic variance-covariance estimator, but it could also be derived from other methods like bootstrapping or SIR (Sampling Importance Resampling).

To simulate several fixed effects vectors from the uncertainty distribution we need to use the following signature in Pumas:

  1. Set random effects to zero

Use the zero_randeffs function to ensure random effects are not sampled during simulation.

sim_randeffs = zero_randeffs(model, population, parameters)
  1. Simulate without residual error

Set simulate_error=false in the simobs function to avoid sampling residual unexplained variability (RUV) and pass the sim_randeffs from the previous step.

sim_uncertainty =
    simobs(model, population, parameters, sim_randeffs; samples, simulate_error = false)
  1. Simulate a population for a total of samples times to get samples number of draws from the parameter uncertainty distribution.

In this use case, the population consists of the same subject replicated N times, and each needs a unique sample from the uncertainty distribution. Passing the population as input will generate a Vector of Populations. If you simulate a single Subject within a one-subject Population and set samples, the output will be a Vector of SimulatedPopulations, each containing one subject. To extract a usable Vector of SimulatedObservations, you can use broadcasting (.) with the first function that contains only the first (and only) element of each element of the outer vector.

Note

In Pumas, a Population is represented as a vector of Subjects. This means that even a single Subject, when put in a Vector, becomes a Population of one Subject. Here’s how you can create a Population with a single Subject:

# Define a single subject
subject = Subject(id = 1)

# Put the subject in a vector to create a population
population = [subject]

# Now, `population` is a population with one subject

A simple example that can illustrate the above process can be formulated with just a Vector of Vectors of numbers. Consider the following:

V = [[1.0], [2.0], [3.0]]
3-element Vector{Vector{Float64}}:
 [1.0]
 [2.0]
 [3.0]

If we had run simobs(model, [subject], param, sim_randeffs; samples = 3, simulate_error=false) we would have gotten a similar Vector with three Vectors inside each containing one SimulatedSubject. The goal is to create a vector that is [1.0, 2.0, 3.0] and we can get this by using the . to broadcast our function call where the function is first. first is a function that returns the first element of a collection:

first([9, 8, 7])
9

So, given our variable V above, we can do:

first.(V)
3-element Vector{Float64}:
 1.0
 2.0
 3.0

For convenience, let us write a function that unwraps the simulation output in a way that’s useful to us using these insights:

function simobs_uncertainty(data_inference, subject; samples)
    first.(
        simobs(
            data_inference,
            [subject],                      # single subject population
            zero_randeffs(                  # sim_randeffs to avoid BSV simulation
                data_inference.fpm.model,
                fill(subject, samples),
                coef(data_inference),
            ),
            ;
            samples,                        # samples
            simulate_error = false,         # to avoid RUV simulation
        )
    )
end
simobs_uncertainty (generic function with 1 method)
Note

A future version of Pumas will have a signature to simulate one Subject for repeated samples that will return a “Population” of the same subject simulated with samples different draws from the uncertainty distribution.

Then, we take each row in the DataFrame and apply the simulation function to the content of the columns:

forest_df_sims_WU = @rtransform(
    forest_df_sub,
    :simulations = simobs_uncertainty(data_inference, :subject; samples = :N)
)
5×10 DataFrame
Row type N group label time covariates events notes subject simulations
String Int64 String String StepRang… NamedTup… DosageRe… String Subject… Array…
1 reference 100 Reference Uncertainty 0:1:672 (WEIGHT = 70, SEX = "FEMALE") DosageRegimen\n1×10 DataFrame\n Row │ time cmt amt evid ii addl rate duration ss ⋯\n │ Float64 Int64 Float64 Int8 Float64 Int64 Float64 Float64 Int8 ⋯\n─────┼──────────────────────────────────────────────────────────────────────────\n 1 │ 0.0 1 10.0 1 24.0 27 0.0 0.0 0 ⋯\n 1 column omitted Subject(id = ReferenceUncertainty, ...) Simulated population (Vector{<:Subject}), n = 100, variables: dv
2 alternative 100 SEX (ref: FEMALE) MALE 0:1:672 (SEX = "MALE",) DosageRegimen\n1×10 DataFrame\n Row │ time cmt amt evid ii addl rate duration ss ⋯\n │ Float64 Int64 Float64 Int8 Float64 Int64 Float64 Float64 Int8 ⋯\n─────┼──────────────────────────────────────────────────────────────────────────\n 1 │ 0.0 1 10.0 1 24.0 27 0.0 0.0 0 ⋯\n 1 column omitted Subject(id = SEX (ref: FEMALE)MALE, ...) Simulated population (Vector{<:Subject}), n = 100, variables: dv
3 alternative 100 WEIGHT (ref: 70 kg) 50 kg 0:1:672 (WEIGHT = 50,) DosageRegimen\n1×10 DataFrame\n Row │ time cmt amt evid ii addl rate duration ss ⋯\n │ Float64 Int64 Float64 Int8 Float64 Int64 Float64 Float64 Int8 ⋯\n─────┼──────────────────────────────────────────────────────────────────────────\n 1 │ 0.0 1 10.0 1 24.0 27 0.0 0.0 0 ⋯\n 1 column omitted Subject(id = WEIGHT (ref: 70 kg)50 kg, ...) Simulated population (Vector{<:Subject}), n = 100, variables: dv
4 alternative 100 WEIGHT (ref: 70 kg) 90 kg 0:1:672 (WEIGHT = 90,) DosageRegimen\n1×10 DataFrame\n Row │ time cmt amt evid ii addl rate duration ss ⋯\n │ Float64 Int64 Float64 Int8 Float64 Int64 Float64 Float64 Int8 ⋯\n─────┼──────────────────────────────────────────────────────────────────────────\n 1 │ 0.0 1 10.0 1 24.0 27 0.0 0.0 0 ⋯\n 1 column omitted Subject(id = WEIGHT (ref: 70 kg)90 kg, ...) Simulated population (Vector{<:Subject}), n = 100, variables: dv
5 alternative 100 DOSE (ref: 10mg) 20 mg 0:1:672 NamedTuple() DosageRegimen\n1×10 DataFrame\n Row │ time cmt amt evid ii addl rate duration ss ⋯\n │ Float64 Int64 Float64 Int8 Float64 Int64 Float64 Float64 Int8 ⋯\n─────┼──────────────────────────────────────────────────────────────────────────\n 1 │ 0.0 1 20.0 1 24.0 27 0.0 0.0 0 ⋯\n 1 column omitted Subject(id = DOSE (ref: 10mg)20 mg, ...) Simulated population (Vector{<:Subject}), n = 100, variables: dv

Here the naming convention _WU means With Uncertainty.

2.3.2 Simulation with BSV+RUV

Simulating with BSV and RUV is straight forward, as this is the standard way of invoking simobs in Pumas.

forest_df_sims_BR = @rtransform(
    deepcopy(forest_df_sub),
    :simulations = simobs(mdl_auc, fill(:subject, :N), coef(data_result))
)
forest_df_sims_BR.label[1] = "BSV + RUV"
"BSV + RUV"

Here the naming convention _BR means BSV and RUV. Notice, that this is typically not the approach used for covariate model evaluation. If BSV and RUV is used it is typically not used together with simulation with uncertainty as it becomes impossible to separate uncertainty sampling from the other samples being taken for random effects and observed variables.

2.4 Extract exposure and individual coefficient data

After simulations have been performed the next step is to extract the relevant individual coefficients and secondary parameters. How to get these depends on the simulation approach. Later, we will look at how to set up the forest plot based on NCA calculations, but first we will look at the model based approach.

Since we simulated a cumulative AUC we can find the AUC in a specific interval by subtracting the first value in our evaluation interval from the value at the final time point. The following function accepts our simulations, the variable name of the AUC variable and the time interval in which to calculate the AUC:

function dynamic_auc(sims; var, interval)
    auc = map(sims) do sim
        df1 = filter(x -> x.evid == 0, DataFrame(sim))
        df2 = filter(x -> x.time == interval[1] || x.time == interval[2], df1)
        df3 = combine(groupby(df2, :id), var => x -> last(diff(x)))
        getproperty(df3, Symbol(string(var) * "_function"))[1]
    end
    return auc
end
dynamic_auc (generic function with 1 method)

The following function calculates Cmax in the same manner

function dynamic_cmax(sims; var, interval)
    auc = map(sims) do sim
        df1 = filter(x -> x.evid == 0, DataFrame(sim))
        df2 = filter(x -> interval[1] <= x.time <= x.time == interval[2], df1)
        df3 = combine(groupby(df2, :id), var => x -> maximum(skipmissing(x)))
        getproperty(df3, Symbol(string(var) * "_function"))[1]
    end
    return auc
end
dynamic_cmax (generic function with 1 method)

Finally, we can get the icoefs from the df.simulations output. We create a function to grab the first value of each icoef for each simulated subject for each level of simulation:

forest_icoef(df, name) =
    [first.(getproperty.(getproperty.(sims, :icoefs), name)) for sims in df.simulations]
forest_icoef (generic function with 1 method)

2.5 Summarize data

Our plotting function will create all the summarization as part of the plot, but it is instructive to understand what is going on behind the scenes.

Whenever we have a reference Subject and an alternative Subject that we have simulated, we extract their exposure measurements of interest. Let us say this is AUC at steady state. The values that are shown in the forest plot are then calculated as follows:

  1. Calculate the median value of the reference Subject across the N simulations
  2. Divide the values for the alternative Subject with the median from the previous step to generate a ratio
  3. Calculate the 0.05, 0.5 0.95 quantiles of the ratios to get the 90% confidence interval and the median value for the ratio

It is also possible to show the effects at their original scales by modifying the functions below to accept a relative_reference keyword argument instead of hardcoding the value to be true.

These numbers are then reported in the plot as numerical values and as markers with error bands.

2.6 Plotting the data

In Pumas v2.6, a beta feature was added to the PlottingUtilities package called table_forest_plot. The docstring can be explored by typing ?table_forest_plot after using PlottingUtilities. This function is able to take information about a forest plot to set up the visuals according to covariate groups, values that are different from reference values and the numerical values of the median and intervals that are used in the plot. It is possible to explore this function for advanced users, but currently, we provide the following functions in this tutorial that can be used to map from a DataFrame as we have used so far to the input format table_forest_plot needs.

# Function to calculate the triplet to be shown
function forest_values_column(forest_input)
    # For each input, calculate the median and the 5th and 95th percentiles,
    # and return them as a string in the format "median [5th percentile 95th percentile]"
    [
        "$(round(median(input),digits=3)) [$(round(quantile(input,0.05),digits=3)) $(round(quantile(input,0.95),digits=3))]"
        for input in forest_input
    ]
end

function df_forest_plot(df, parameter; kwargs...)
    # Extract the values for the given parameter from the dataframe
    values = getproperty(df, parameter[1])

    # Find the index of the "reference" row in the dataframe
    reference_values = findfirst(x -> x == "reference", df.type)

    # Extract the reference values
    reference = values[reference_values]

    # If the reference values are a vector, calculate their median
    if reference isa AbstractVector
        reference = median(reference)
    end

    # Calculate the numerical text to be displayed in the forest plot
    # This is done by dividing each value by the median of the reference values
    numerical_text = forest_values_column(map(x -> x ./ median(reference), values))

    # Define the headers for the forest plot
    headers = ["Scenario", "Median [90% CI]", all(isempty, df.notes) ? nothing : "Notes"]

    # Define the label for the axis
    axis_label = "Ratio to reference median"

    # Flag to indicate whether the reference values should be relative
    relative_reference = true

    # Calculate the values and reference to be used in the forest plot
    values, reference = if relative_reference
        # If relative_reference is true, divide each value by the reference
        # and set the reference to 1
        values = map(val -> val isa Nothing ? nothing : val ./ reference, values)
        reference = 1
        values, reference
    else
        # If relative_reference is false, use the original values and reference
        values, reference
    end

    # Create the forest plot
    table_forest_plot(
        [df.label, numerical_text, df.notes],  # data to be plotted
        headers,  # headers for the plot
        values,  # values to be plotted
        ;  # start of optional arguments
        axis_header = parameter[2],  # header for the axis
        reference,  # reference values
        groups = df.group,  # groups for the plot
        axis_label,  # label for the axis
        reference_band = [0.8, 1.25],  # reference band for the plot
        kwargs...,  # additional optional arguments
    )
end
df_forest_plot (generic function with 1 method)

The plot generates some custom headers, calculates reference values, and so on. We also set the grey reference band to be plotted from 0.8 to 1.25 which is the typical values recommended for bioequivalence testing. Ideally, it should be set with specific clinical goals or requirements in mind.

2.7 Exposure measures

We can now use the functions we prepared to calculate the exposure measures and summarize them using the plotting function.

The model specification mdl_auc uses a dynamical variable Central_AUC to measure the AUC of the central compartment through the AUC variable set in the @observed block. We can access this through the DataFrame constructor on the SimulatedObservations.

2.7.1 AUC last dosing interval

forest_df_sims_WU.auc_measure =
    [dynamic_auc(sim; var = :AUC, interval) for sim in forest_df_sims_WU.simulations]
df_forest_plot(forest_df_sims_WU, :auc_measure => "AUC", ;)

forest_df_sims_WU.cmax_measure =
    [dynamic_cmax(sim; var = :dv, interval) for sim in forest_df_sims_WU.simulations]
df_forest_plot(forest_df_sims_WU, :cmax_measure => "Cmax", ;)

df_forest_plot(forest_df_sims_WU, :auc_measure => "AUC", ;)

df_forest_plot(forest_df_sims_WU, :auc_measure => "AUC"; show_density = true)

This is one of the first forest plots we have produced, so let us break down its components:

  1. Columns:

    • The first column lists the labels for each scenario.
    • The second column shows the median ratio values along with intervals defined by the 0.05 and 0.95 quantiles (representing the 90% confidence interval).
    • The third column is the “plot” part of the forest plot, where the median ratios are visualized relative to a reference line (1.0) and a shaded range [0.8, 1.25].
  2. Reference Group:

    • The reference group is labeled “uncertainty,” indicating variability in the reference simulation.
    • By definition, it has a median ratio of 1.0 because it serves as the baseline.
    • This group reflects parameter uncertainty, allowing us to assess the overall uncertainty of the plot.
  3. SEX Group:

    • This group shows the effect of changing the SEX of a reference subject from FEMALE to MALE.
    • The median ratio is below 1.0, indicating that being MALE reduces the AUC compared to FEMALE.
    • The median value and the intervals are critical for assessing the significance of this effect.
  4. WEIGHT Group:

    • This group is similar in structure to the SEX group but includes two alternative weight scenarios: 50 kg and 90 kg.
    • Each weight scenario is compared to the reference weight (70 kg), with corresponding median ratios and intervals.

To compare the simulations with parameter uncertainty to the simulations with ruv and bsv we can produce the following plot:

forest_df_sims_BR.label[1] = "BSV + RUV"
forest_df_sims_BR.auc_measure =
    [dynamic_auc(sim; var = :AUC, interval) for sim in forest_df_sims_BR.simulations]
df_forest_plot(forest_df_sims_BR, :auc_measure => "AUC", ;)

We immediately see that including BSV and RUV increases the range of simulated values significantly. The medians are similar.

2.7.2 Cmax at pseudo-steady-state

df_forest_plot(forest_df_sims_WU, :cmax_measure => "Cmax pss", ;)

df_forest_plot(forest_df_sims_WU, :cmax_measure => "Cmax pss", ; show_density = true)

Overall we see fairly limited effects, and of course we see an effect of the big dose group of around 2.

2.8 Individual Coefficients Forest Plots

Depending on the motivation to produce the forest plots it may be useful to produce forest plots for individual parameters. In Pumas, these are called icoefs. We can produce an example for clearance to illustrate how this may be done.

2.8.1 Prepare icoef

forest_df_sims_WU[!, :CL] = forest_icoef(forest_df_sims_WU, :CL)

We could repeat it for all combinations of WU, bsvruv, density=false, or density=true. Here, we will only keep the default density=false and plot clearance for simulation with parameter uncertainty:

df_forest_plot(forest_df_sims_WU, :CL => "CL")

3 Calculation of Exposures with NCA.jl

Let us look at how we can calculate exposure measures using NCA.jl functionality. Instead of using the model functionality we will use steady state doses and the NCA functions in Pumas. We do not need the special AUC compartment in this application. More specifically, we cannot use an AUC compartment because we will be using steady-state doses. The AUC compartment can never reach a steady state as it will be ever-growing, so we need to exclude it when setting ss=1 in the DosageRegimen below.

mdl = @model begin
    @param begin
        tvCL ~ RealDomain(lower = 0.5, upper = 1.5, init = 1.0)
        tvVc ~ RealDomain(lower = 0.1, upper = 20.0, init = 10.0)
        WEIGHTonCL ~ RealDomain(lower = -2.0, upper = 2.0, init = 1.2)
        SEXonCL ~ RealDomain(lower = -2.0, upper = 2.0, init = 1.4)
    end

    @random begin
        etaVc ~ Normal(0.0, 0.05)
        etaCL ~ Normal(0.0, 0.15)
    end
    @covariates WEIGHT SEX
    @pre begin
        CL = tvCL * (WEIGHT / 70)^WEIGHTonCL * SEXonCL^(SEX == "MALE") * exp(etaCL)
        Vc = tvVc * exp(etaVc)
    end
    @dynamics begin
        Central' = -CL / Vc * Central
    end

    @derived begin
        dv ~ @. LogNormal(log(Central / Vc), 0.1)
    end
end
PumasModel
  Parameters: tvCL, tvVc, WEIGHTonCL, SEXonCL
  Random effects: etaVc, etaCL
  Covariates: WEIGHT, SEX
  Dynamical system variables: Central
  Dynamical system type: Matrix exponential
  Derived: dv
  Observed: dv

Again, we simulate the different scenarios: Specifying the route enables us to later use functionality from NCA.jl to calculate AUC.

ii = 24
sim_time = 0:8:ii
ss_event = DosageRegimen(10; ss = 1, ii, route = NCA.IVBolus)
ss_event20 = DosageRegimen(20; ss = 1, ii, route = NCA.IVBolus)

forest_df_nca = DataFrame(
    "type" => ["reference", "alternative", "alternative", "alternative", "alternative"],
    "N" => fill(100, 5),
    "group" => [
        "Reference",
        "SEX (ref: FEMALE)",
        "WEIGHT (ref: 70 kg)",
        "WEIGHT (ref: 70 kg)",
        "DOSE (ref: 10mg)",
    ],
    "label" => ["Uncertainty", "MALE", "50 kg", "90 kg", "20 mg"],
    "time" => fill(sim_time, 5),
    "covariates" => [
        (; WEIGHT = 70, SEX = "FEMALE"),
        (; SEX = "MALE"),
        (; WEIGHT = 50),
        (; WEIGHT = 90),
        (;),
    ],
    "events" => [ss_event, ss_event, ss_event, ss_event, ss_event20],
    "notes" => ["", "", "", "", ""],
)
forest_df_nca = prepare_subjects(forest_df_nca)

And we can simulate the individual datasets:

# Since the original model had an AUC compartment we can no longer simulate from
# the past fit result because we now have steady state dosing and AUC never reaches
# a steady state.
data_result_nca = fit(
    mdl,
    Subject.(simulated_data),
    param0,
    FOCE();
    optim_options = (; show_trace = false),
)
data_inference_nca = infer(data_result_nca)
forest_df_nca = @rtransform(
    forest_df_nca,
    :simulations = simobs_uncertainty(data_inference_nca, :subject; samples = :N)
)
[ Info: Checking the initial parameter values.
[ Info: The initial negative log likelihood and its gradient are finite. Check passed.
[ Info: Calculating: variance-covariance matrix.
[ Info: Done.
5×10 DataFrame
Row type N group label time covariates events notes subject simulations
String Int64 String String StepRang… NamedTup… DosageRe… String Subject… Array…
1 reference 100 Reference Uncertainty 0:8:24 (WEIGHT = 70, SEX = "FEMALE") DosageRegimen\n1×10 DataFrame\n Row │ time cmt amt evid ii addl rate duration ss ⋯\n │ Float64 Int64 Float64 Int8 Float64 Int64 Float64 Float64 Int8 ⋯\n─────┼──────────────────────────────────────────────────────────────────────────\n 1 │ 0.0 1 10.0 1 24.0 0 0.0 0.0 1 ⋯\n 1 column omitted Subject(id = ReferenceUncertainty, ...) Simulated population (Vector{<:Subject}), n = 100, variables: dv
2 alternative 100 SEX (ref: FEMALE) MALE 0:8:24 (SEX = "MALE",) DosageRegimen\n1×10 DataFrame\n Row │ time cmt amt evid ii addl rate duration ss ⋯\n │ Float64 Int64 Float64 Int8 Float64 Int64 Float64 Float64 Int8 ⋯\n─────┼──────────────────────────────────────────────────────────────────────────\n 1 │ 0.0 1 10.0 1 24.0 0 0.0 0.0 1 ⋯\n 1 column omitted Subject(id = SEX (ref: FEMALE)MALE, ...) Simulated population (Vector{<:Subject}), n = 100, variables: dv
3 alternative 100 WEIGHT (ref: 70 kg) 50 kg 0:8:24 (WEIGHT = 50,) DosageRegimen\n1×10 DataFrame\n Row │ time cmt amt evid ii addl rate duration ss ⋯\n │ Float64 Int64 Float64 Int8 Float64 Int64 Float64 Float64 Int8 ⋯\n─────┼──────────────────────────────────────────────────────────────────────────\n 1 │ 0.0 1 10.0 1 24.0 0 0.0 0.0 1 ⋯\n 1 column omitted Subject(id = WEIGHT (ref: 70 kg)50 kg, ...) Simulated population (Vector{<:Subject}), n = 100, variables: dv
4 alternative 100 WEIGHT (ref: 70 kg) 90 kg 0:8:24 (WEIGHT = 90,) DosageRegimen\n1×10 DataFrame\n Row │ time cmt amt evid ii addl rate duration ss ⋯\n │ Float64 Int64 Float64 Int8 Float64 Int64 Float64 Float64 Int8 ⋯\n─────┼──────────────────────────────────────────────────────────────────────────\n 1 │ 0.0 1 10.0 1 24.0 0 0.0 0.0 1 ⋯\n 1 column omitted Subject(id = WEIGHT (ref: 70 kg)90 kg, ...) Simulated population (Vector{<:Subject}), n = 100, variables: dv
5 alternative 100 DOSE (ref: 10mg) 20 mg 0:8:24 NamedTuple() DosageRegimen\n1×10 DataFrame\n Row │ time cmt amt evid ii addl rate duration ss ⋯\n │ Float64 Int64 Float64 Int8 Float64 Int64 Float64 Float64 Int8 ⋯\n─────┼──────────────────────────────────────────────────────────────────────────\n 1 │ 0.0 1 20.0 1 24.0 0 0.0 0.0 1 ⋯\n 1 column omitted Subject(id = DOSE (ref: 10mg)20 mg, ...) Simulated population (Vector{<:Subject}), n = 100, variables: dv

3.1 Individual Coefficients

Since this is an alternative way of calculating the exposure measurements we do not need to repeat a lot of icoef based forest plots, but for good measure let us show that the procedure is the same as before:

forest_df_nca[!, :CL] = forest_icoef(forest_df_nca, :CL)
df_forest_plot(forest_df_nca, :CL => "CL")

3.2 Exposure measures

To calculate auclast we take the simulated subjects, turn them into Subjects and then we turn those into NCASubjects to be able to use the NCA.auclast function.

forest_auc_last(df) =
    [NCA.auclast(NCASubject.(Subject.(sim))).auclast for sim in df.simulations]
forest_auc_last (generic function with 1 method)

Cmax at steady state can also be calculated using NCA. Of course, there will be a significant dose size effect here which should be around 2 as we are doubling the dose. Let us see if the covariate effects affect Cmax as well:

forest_cmax_ss(df) =
    [NCA.cmaxss(NCASubject.(Subject.(sim))).cmaxss for sim in df.simulations]
forest_cmax_ss (generic function with 1 method)

3.2.1 AUC last dosing interval

forest_df_nca.auc_measure = forest_auc_last(forest_df_nca)
df_forest_plot(forest_df_nca, :auc_measure => "AUC", ;)

df_forest_plot(forest_df_nca, :auc_measure => "AUC"; show_density = true)

4 Styles and preferences

We chose one way of creating a forest plot above, but many practitioners have their own preferences of labels, colors, and so on. It is of course important to be able to modify this to preferences or even in-house requirements at different companies. One thing that is sometimes set up different would be the category labels. We put the reference value with the group name but others prefer to only use the covariate name as the group name and write the row labels as alternative_value : reference_value. We can do the same here:

forest_df_sims_WU_newlabel = deepcopy(forest_df_sims_WU)
forest_df_sims_WU_newlabel.group .= ["Reference", "SEX", "WEIGHT", "WEIGHT", "DOSE"]
forest_df_sims_WU_newlabel.label .=
    ["Uncertainty", "MALE : FEMALE", "50 kg : 70 kg", "90 kg : 70 kg", "20 mg : 10 mg"]
df_forest_plot(forest_df_sims_WU_newlabel, :auc_measure => "AUC", ;)

or we can even get creative:

forest_df_sims_WU_newlabel = deepcopy(forest_df_sims_WU)
forest_df_sims_WU_newlabel.group .= ["Reference", "SEX", "WEIGHT", "WEIGHT", "DOSE"]
forest_df_sims_WU_newlabel.label .=
    ["Uncertainty", "MALE ← FEMALE", "50 kg ← 70 kg", "90 kg ← 70 kg", "20 mg ← 10 mg"]
df_forest_plot(forest_df_sims_WU_newlabel, :auc_measure => "AUC", ;)

5 Conclusion

In this tutorial, we saw how to produce forest plots in Pumas using built-in functionality and the new beta feature in PlottingUtilities. We encourage all users try to out the workflow on their own problem and produce feedback an constructive criticism before the final forest plot feature is release in a future version of Pumas.