# Use packages
using Pumas, PumasUtilities, Random, DataFramesMeta
# Set seed
= Random.seed!(234) seed
A Forest Plot Workflow in Pumas
Welcome to this Pumas tutorial on Forest Plots. Forest plots are used to show covariate model impact on model parameters and and secondary parameters (e.g. exposure metrics) in pharmacometric analyses. In the pharmacometric context of model evaluation they were introduced by Menon-Andersen et. al. (2011) “Essential pharmacokinetic Information for Drug Dosage Decisions: A Concise Visual presentation in the Drug Label”. We will go through the Pumas features for carrying out these steps in this tutorial.
In Pumas v2.6 an experimental table_forest_plot
function was added that is supplied with any Pumas distribution. This function is to be considered a beta or experimental feature. This means that the interface may be be subject to changes before the function is released in its final form.
1 The Forest Plot
Before we dive into the simulations and plotting, let’s first clarify the goal of producing a forest plot in pharmacometrics and verbally describe the process. Code sections below will clarify all details.
Overall, the following steps are needed to build a forest plot:
- Model Preparation: Prepare a model for assessment, potentially with fitted parameters and inference information, if simulations with parameter uncertainty are desired.
- Simulation Dataset Specification: Specify the reference and alternative subjects to be used for the simulations.
- Simulation Type: Define the type of simulation to perform—whether to include parameter uncertainty, Between-Subject Variability (BSV) and Residual Unexplained Variability (RUV), or other approaches.
- Extraction: Extract exposure and individual coefficient data from the simulations.
- Summarization: Summarize the simulation results for the endpoints of interest, focusing on the quantiles of the quantities relative to the reference simulation.
- Plotting: Visualize the results from the previous steps with appropriate plots.
1.1 Model preparation
After building a model we often have covariate relationships in the final model that we want to understand. Some usual ones are age, weight, sex, and race, but given the context we can come up with many more examples such as route or other formulation information. These covariates are represented within the model with a functional form and parameters estimated from data, derived through statistical methods for model selection.
Consumers of the (covariate) model need to understand the clinical impact of covariate variations. For example, it’s not obvious from the model alone if a 10% decrease in body weight results in a 5%, 10%, or 20% change in drug exposure at steady state (AUC) just by looking at the functial form and parameter size. Simulations help clarify these effects, and a forest plot effectively summarizes the results visually.
1.2 Simulation dataset specification
To evaluate the impact of covariates on clinical outcomes, we run simulations grouped by covariates or combinations of covariates. Taking body weight as an example, we might choose 70 kg as the reference value. Often, this reference value is pre-defined in the model; however, we can adjust it if simulating for a population different from the model’s original dataset. Then, we define alternative values, such as 50 kg and 90 kg, to see how changes in weight might affect the clinical outcome.
In practice, we set up a hypothetical subject with all covariates at reference values. Then, for each covariate we want to evaluate at alternative values we set up an additional subject. Here, we would set the weight covariate value to 50 kg for one alternative subject and 90 kg for another. Then, we can simulate that subject multiple times.
Be careful when setting up alternative covariate values as there is nothing restricting you from setting alternative values that are out of the range of values used in the estimation part of the workflow. If you use too extreme values you will be extrapolating beyond the data used and the model may not be valid in that region.
1.3 Simulation type
Once the virtual subjects are defined, we simulate to obtain clinical outcomes for each group. The FDA’s Guidance for Industry: Population Pharmacokinetics suggests three potential approaches for simulations, depending on the objectives:
Simulating with parameter uncertainty only: This approach excludes between-subject variability (BSV) and residual unexplained variability (RUV). It is often used to assess a typical subject’s ability to reach a drug exposure target or evaluate covariate impact. Given our objective to assess covariate effects, this method may be appropriate.
Simulating with BSV and RUV: Here, parameters are fixed to their estimated values, while BSV and RUV are allowed to vary. Although this provides different insights, it is not the standard approach for assessing covariate parameter effects. Covariate effects are typically driven by population parameters (fixed effects, or “thetas” in NONMEM terminology) and covariate values. Using simulation with uncertainty is generally more aligned with the objective of covariate effect assessment.
Simulating solely with fixed effects: This method fixes parameters and excludes variability, making it unsuitable for this case. Without variability, the intervals produced for the plot rely entirely on the point estimates of the model fixed effects, which could lead to overconfidence in the results.
1.4 Extract exposure and individual coefficient data
The simulation output will often have to be transformed a bit before you have the final numbers to be used. This may be because things like clearance, volume, or other individual coefficients have to be extracted from the simulation output, or because you need a derived quantity of the simulation output to quantify such as AUC, Cmax, or something else.
For exposure measurements there might be more to consider. You can get an AUC in many ways, so the users has to understand exactly what is needed to answer the questions. If the quantity of interest is AUC or Cmax at steady state calculated using the usual NCA rules then using
NCA.jl
is a very convenient approach. This allows us to takesimobs
output and generateNCASubject
s that will allow us to easily calculate exposure measures. If instead the exposure measure is some function of model output or you want to calculate the exposure in a specific way then a more custom approach may have to be used. We will go through both below.The individual coefficients (
icoef
s in Pumas vocabulary) can often be extracted directly from the simulation output so there is not much to consider there. We will show below how to do it.
1.5 Summarization
After running the simulations and extracting or generating the relevant measure, we identify a central reference value, usually the median, to use for comparisons. We then calculate the ratio of outcomes in alternative scenarios to this reference value and analyze the distribution of these ratios. This may be done separately or as part of the forest plot functionality.
1.6 Plotting
While the above information could just as well be presented in a table many people prefer to create forest plots for very quick identification of numbers outside specified regions. Forest plots display data generated in the step of summarization in a interpretable way, typically as central values with confidence intervals or empirical density plots.
2 An example
2.1 Model preparation
First, let us define the model to be used in this example. A simple one compartment model with a weight and sex effect on clearance of the drug. Besides the Central
compartment we add a compartment to calculate the AUC
of the Central(t)
curve over time. In the @observed
block we divide by volume to obtain the concentration AUC instead.
= @model begin
mdl_auc @param begin
~ RealDomain(lower = 0.5, upper = 1.5, init = 1.0)
tvCL ~ RealDomain(lower = 0.1, upper = 20.0, init = 10.0)
tvVc ~ RealDomain(lower = -2.0, upper = 2.0, init = 1.2)
WEIGHTonCL ~ RealDomain(lower = -2.0, upper = 2.0, init = 1.4)
SEXonCL end
@random begin
~ Normal(0.0, 0.05)
etaVc ~ Normal(0.0, 0.15)
etaCL end
@covariates WEIGHT SEX
@pre begin
= tvCL * (WEIGHT / 70)^WEIGHTonCL * SEXonCL^(SEX == "MALE") * exp(etaCL)
CL = tvVc * exp(etaVc)
Vc end
@dynamics begin
' = -CL / Vc * Central
Central' = Central
CentralAUCend
@derived begin
~ @. LogNormal(log(Central / Vc), 0.1)
dv end
@observed begin
= CentralAUC ./ Vc
AUC end
end
PumasModel
Parameters: tvCL, tvVc, WEIGHTonCL, SEXonCL
Random effects: etaVc, etaCL
Covariates: WEIGHT, SEX
Dynamical system variables: Central, CentralAUC
Dynamical system type: Matrix exponential
Derived: dv
Observed: dv, AUC
Let us set up a hypothetical dataset, simulate from it, fit parameters based on the simulated dataset, and run parameter inference:
# Number of subjects to simulate
= 40
data_n # Time points to simulate at
= [0.0, 0.5, 1.0, 4.0, 5.0, 24.0]
data_time # Dose is 10 unit IV Bolus
= DosageRegimen(10; route = NCA.IVBolus)
data_event # Weight distribution has mean around 70
# A more advanced situations may have different distributions across SEX
= LogNormal(4.25, 0.2)
weight_dist # Sex distribution is equal probability for each outcome
sample_covariates(rng) =
= rand(rng, weight_dist), SEX = rand(rng, ["MALE", "FEMALE"]))
(WEIGHT
# Grab initial parameters from model
= init_params(mdl_auc)
param0
# Set up the Subjects to simulate trajectories for
= [
data_population Subject(
= "data" * string(n),
id = data_time,
time = data_event,
events = sample_covariates(seed),
covariates = 1:data_n
) for n
]
# Simulate
= simobs(mdl_auc, data_population, param0)
simulated_data
# Fit the model given the simulated data
= fit(
data_result
mdl_auc,Subject.(simulated_data),
param0,FOCE();
= (; show_trace = false),
optim_options
)
# Calculate inference based on the asymptotic variance-covariance matrix
= infer(data_result) data_inference
[ Info: Checking the initial parameter values.
[ Info: The initial negative log likelihood and its gradient are finite. Check passed.
[ Info: Calculating: variance-covariance matrix.
[ Info: Done.
Asymptotic inference results using sandwich estimator
Successful minimization: true
Likelihood approximation: FOCE
Likelihood Optimizer: BFGS
Dynamical system type: Matrix exponential
Log-likelihood value: 328.09034
Number of subjects: 40
Number of parameters: Fixed Optimized
0 4
Observation records: Active Missing
dv: 240 0
Total: 240 0
----------------------------------------------------------------
Estimate SE 95.0% C.I.
----------------------------------------------------------------
tvCL 1.0025 0.037285 [0.92937; 1.0755]
tvVc 10.054 0.11788 [9.8227 ; 10.285 ]
WEIGHTonCL 1.2081 0.11338 [0.98583; 1.4303]
SEXonCL 1.4353 0.069532 [1.299 ; 1.5715]
----------------------------------------------------------------
We see that we needed to specify observation times, events, and covariates for our Subject
s. Then we simulated observations, fit the model and estimated the variance-covariance matrix. Since we now have point estimates for the fixed effects and parameter uncertainty, we are ready to simulate for our forest plots.
2.2 Simulation dataset preparation
A key aspect of the forest plot generation is specifying the datasets to be simulated for later comparison. The idea with the forest plot is, that each row of the forest plot contains values that compare change to only one covariate change compared to the reference specification at a time. This is typically called a univariate or univariable covariate effect. This is of course different than asking questions about what subjects are realistic to draw from the population. It is a theoretical construct to understand the effect of each covariate on its own. Of course, the size of the effect may well depend on the reference values that we pick. In Pumas, we use NamedTuple
s when specifying covariates:
= (; WEIGHT = 70, SEX = "FEMALE") cov
(WEIGHT = 70,
SEX = "FEMALE",)
Now, if we want to update the SEX
value we can use the merge
function. We take the “reference” NamedTuple
and merge it with another named tuple that contains the things we want to update:
= merge(cov, (; SEX = "MALE")) alt_cov
(WEIGHT = 70,
SEX = "MALE",)
We will use this in the code below.
If we want to, it is also possible to show a change of several covariates at a time but in that case it is of course useful to also have the univariate changes in the plot. As an example, if the reference subject is a male of 70 kg we may want to compare their results to a female of 70 kg or a male of 50 kg. However, we may also want to see the effect of being a female of 50 kg. This is not the usual approach of varying only one covariate at a time, but if we did have the first to alternative simulations as described we may be able to also get information out of showing the (female, 50kg)
alternative in the plot as well.
There is a special consideration we have not discussed about so far, and that is when to measure the outcomes. In the simulations we can either do one dose and test in that first interval, we can give several doses and measure the AUC of the full time course, or we can do the thing that is often relevant where we simulate to steady state and only compare the covariate effects at steady state. This all depends on the context of the drug and the simulation. Here, we pick a daily dose and we want to measure after 28 doses equivalent to a typical “four week month”.
We use 27 additional doses to the first one to reach a pseudo-steady-state after four weeks of treatment. We will use a dose of 10 units as the reference as was present in the original dataset but we will also set up a different event type with 20 units to show that for the cases where different dose levels are relevant to the forest plot it is also possible to include dosing differences in the list of alternatives.
= 24
ii = 27
pseudo_ss = 0:1:(pseudo_ss+1)*ii
sim_time
= DosageRegimen(10; addl = pseudo_ss, ii)
ss_event = DosageRegimen(20; addl = pseudo_ss, ii)
ss_event20 = [ii * pseudo_ss, ii * (pseudo_ss + 1)] interval
With the regimens in place, we need to come up with a structure to hold all the information. We could use individual variables for everything but let us explore the use of a DataFrame to keep all information about a specification (a row) easy to find. Let us look at the code and discuss.
= DataFrame(
forest_df "type" => ["reference", "alternative", "alternative", "alternative", "alternative"],
"N" => fill(100, 5),
"group" => [
"Reference",
"SEX (ref: FEMALE)",
"WEIGHT (ref: 70 kg)",
"WEIGHT (ref: 70 kg)",
"DOSE (ref: 10mg)",
],"label" => ["Uncertainty", "MALE", "50 kg", "90 kg", "20 mg"],
"time" => fill(sim_time, 5),
"covariates" => [
= 70, SEX = "FEMALE"),
(; WEIGHT = "MALE"),
(; SEX = 50),
(; WEIGHT = 90),
(; WEIGHT
(;),
],"events" => [ss_event, ss_event, ss_event, ss_event, ss_event20],
"notes" => ["", "", "", "", ""],
)
Row | type | N | group | label | time | covariates | events | notes |
---|---|---|---|---|---|---|---|---|
String | Int64 | String | String | StepRang… | NamedTup… | DosageRe… | String | |
1 | reference | 100 | Reference | Uncertainty | 0:1:672 | (WEIGHT = 70, SEX = "FEMALE") | DosageRegimen\n1×10 DataFrame\n Row │ time cmt amt evid ii addl rate duration ss ⋯\n │ Float64 Int64 Float64 Int8 Float64 Int64 Float64 Float64 Int8 ⋯\n─────┼──────────────────────────────────────────────────────────────────────────\n 1 │ 0.0 1 10.0 1 24.0 27 0.0 0.0 0 ⋯\n 1 column omitted | |
2 | alternative | 100 | SEX (ref: FEMALE) | MALE | 0:1:672 | (SEX = "MALE",) | DosageRegimen\n1×10 DataFrame\n Row │ time cmt amt evid ii addl rate duration ss ⋯\n │ Float64 Int64 Float64 Int8 Float64 Int64 Float64 Float64 Int8 ⋯\n─────┼──────────────────────────────────────────────────────────────────────────\n 1 │ 0.0 1 10.0 1 24.0 27 0.0 0.0 0 ⋯\n 1 column omitted | |
3 | alternative | 100 | WEIGHT (ref: 70 kg) | 50 kg | 0:1:672 | (WEIGHT = 50,) | DosageRegimen\n1×10 DataFrame\n Row │ time cmt amt evid ii addl rate duration ss ⋯\n │ Float64 Int64 Float64 Int8 Float64 Int64 Float64 Float64 Int8 ⋯\n─────┼──────────────────────────────────────────────────────────────────────────\n 1 │ 0.0 1 10.0 1 24.0 27 0.0 0.0 0 ⋯\n 1 column omitted | |
4 | alternative | 100 | WEIGHT (ref: 70 kg) | 90 kg | 0:1:672 | (WEIGHT = 90,) | DosageRegimen\n1×10 DataFrame\n Row │ time cmt amt evid ii addl rate duration ss ⋯\n │ Float64 Int64 Float64 Int8 Float64 Int64 Float64 Float64 Int8 ⋯\n─────┼──────────────────────────────────────────────────────────────────────────\n 1 │ 0.0 1 10.0 1 24.0 27 0.0 0.0 0 ⋯\n 1 column omitted | |
5 | alternative | 100 | DOSE (ref: 10mg) | 20 mg | 0:1:672 | NamedTuple() | DosageRegimen\n1×10 DataFrame\n Row │ time cmt amt evid ii addl rate duration ss ⋯\n │ Float64 Int64 Float64 Int8 Float64 Int64 Float64 Float64 Int8 ⋯\n─────┼──────────────────────────────────────────────────────────────────────────\n 1 │ 0.0 1 20.0 1 24.0 27 0.0 0.0 0 ⋯\n 1 column omitted |
What we see is that there are 8 columns:
type
: refers to whether it is a reference row or alternative row. There can only be one reference row.N
: The number of subjects in this specification to simulategroup
: What covariate simulation group this is and what the reference value is. For multiple alternatives in the same group we will have repetitionslabel
: The chosen alternative value labeltime
: The time points set in the subjects being constructedcovariates
: The reference covariates for the reference row and the covariates to change in the alternative rowsevents
: The events to be used in the simulationnotes
: Notes for the simulation to display in the final plot
Let us create a column of Subject
s defined by the other columns. We need to go through the DataFrame
by row and replace the covariate in question from the reference value and use the specified times and events. First, we find the reference subject and verify that there is only one. Then, we collect the reference covariates. Finally, we add the column with subjects such that in each row this column represents the information in all the other columns. We collect these steps in a function as we will be performing them more than once.
function prepare_subjects(df)
# find the reference row
= findall(x -> x == "reference", df.type)
ref_row if length(ref_row) != 1
throw(
ErrorException(
"There needs to be exactly one reference row in the input DataFrame",
),
)end
# extract reference covariates
# ref_row is a vector of length 1 so we use the only value it contains
= df[first(ref_row), :covariates]
ref_covariates
# use @rtransform to combine several columns worth of information into one
# new column with the actual Subjects to simulate into
= @rtransform(
_df
df,:subject = Subject(
= :group * :label,
id = :time,
time = :events,
events = merge(ref_covariates, :covariates),
covariates
)
)# return the new DataFrame
return _df
end
prepare_subjects (generic function with 1 method)
We can then use this function to take a DataFrame
that adheres to our specification above and give us back a copy of that DataFrame
with the simulation populations returned. Let us apply it to forest_df
:
= prepare_subjects(forest_df) forest_df_sub
Row | type | N | group | label | time | covariates | events | notes | subject |
---|---|---|---|---|---|---|---|---|---|
String | Int64 | String | String | StepRang… | NamedTup… | DosageRe… | String | Subject… | |
1 | reference | 100 | Reference | Uncertainty | 0:1:672 | (WEIGHT = 70, SEX = "FEMALE") | DosageRegimen\n1×10 DataFrame\n Row │ time cmt amt evid ii addl rate duration ss ⋯\n │ Float64 Int64 Float64 Int8 Float64 Int64 Float64 Float64 Int8 ⋯\n─────┼──────────────────────────────────────────────────────────────────────────\n 1 │ 0.0 1 10.0 1 24.0 27 0.0 0.0 0 ⋯\n 1 column omitted | Subject(id = ReferenceUncertainty, ...) | |
2 | alternative | 100 | SEX (ref: FEMALE) | MALE | 0:1:672 | (SEX = "MALE",) | DosageRegimen\n1×10 DataFrame\n Row │ time cmt amt evid ii addl rate duration ss ⋯\n │ Float64 Int64 Float64 Int8 Float64 Int64 Float64 Float64 Int8 ⋯\n─────┼──────────────────────────────────────────────────────────────────────────\n 1 │ 0.0 1 10.0 1 24.0 27 0.0 0.0 0 ⋯\n 1 column omitted | Subject(id = SEX (ref: FEMALE)MALE, ...) | |
3 | alternative | 100 | WEIGHT (ref: 70 kg) | 50 kg | 0:1:672 | (WEIGHT = 50,) | DosageRegimen\n1×10 DataFrame\n Row │ time cmt amt evid ii addl rate duration ss ⋯\n │ Float64 Int64 Float64 Int8 Float64 Int64 Float64 Float64 Int8 ⋯\n─────┼──────────────────────────────────────────────────────────────────────────\n 1 │ 0.0 1 10.0 1 24.0 27 0.0 0.0 0 ⋯\n 1 column omitted | Subject(id = WEIGHT (ref: 70 kg)50 kg, ...) | |
4 | alternative | 100 | WEIGHT (ref: 70 kg) | 90 kg | 0:1:672 | (WEIGHT = 90,) | DosageRegimen\n1×10 DataFrame\n Row │ time cmt amt evid ii addl rate duration ss ⋯\n │ Float64 Int64 Float64 Int8 Float64 Int64 Float64 Float64 Int8 ⋯\n─────┼──────────────────────────────────────────────────────────────────────────\n 1 │ 0.0 1 10.0 1 24.0 27 0.0 0.0 0 ⋯\n 1 column omitted | Subject(id = WEIGHT (ref: 70 kg)90 kg, ...) | |
5 | alternative | 100 | DOSE (ref: 10mg) | 20 mg | 0:1:672 | NamedTuple() | DosageRegimen\n1×10 DataFrame\n Row │ time cmt amt evid ii addl rate duration ss ⋯\n │ Float64 Int64 Float64 Int8 Float64 Int64 Float64 Float64 Int8 ⋯\n─────┼──────────────────────────────────────────────────────────────────────────\n 1 │ 0.0 1 20.0 1 24.0 27 0.0 0.0 0 ⋯\n 1 column omitted | Subject(id = DOSE (ref: 10mg)20 mg, ...) |
Now we are ready to set up the simulations.
2.3 Simulation Type
Above, we discussed two types of simulations:
- Simulation with parameter uncertainty
- Simulation with BSV+RUV
Let us see how to do this in practise.
2.3.1 Simulation with Parameter Uncertainty
To simulate with uncertainty in Pumas, we need to use the simobs
function with the data_inference
as input.
- The
data_inference
object contains the inference results from the estimation step. - By default, the inference is based on the asymptotic variance-covariance estimator, but it could also be derived from other methods like bootstrapping or SIR (Sampling Importance Resampling).
To simulate several fixed effects vectors from the uncertainty distribution we need to use the following signature in Pumas:
- Set random effects to zero
Use the zero_randeffs
function to ensure random effects are not sampled during simulation.
= zero_randeffs(model, population, parameters) sim_randeffs
- Simulate without residual error
Set simulate_error=false
in the simobs
function to avoid sampling residual unexplained variability (RUV) and pass the sim_randeffs
from the previous step.
=
sim_uncertainty simobs(model, population, parameters, sim_randeffs; samples, simulate_error = false)
- Simulate a
population
for a total ofsamples
times to getsamples
number of draws from the parameter uncertainty distribution.
In this use case, the population
consists of the same subject replicated N
times, and each needs a unique sample from the uncertainty distribution. Passing the population
as input will generate a Vector
of Populations
. If you simulate a single Subject
within a one-subject Population
and set samples
, the output will be a Vector
of SimulatedPopulations
, each containing one subject. To extract a usable Vector
of SimulatedObservations
, you can use broadcasting (.
) with the first
function that contains only the first (and only) element of each element of the outer vector.
In Pumas, a Population
is represented as a vector of Subject
s. This means that even a single Subject
, when put in a Vector
, becomes a Population
of one Subject
. Here’s how you can create a Population
with a single Subject
:
# Define a single subject
= Subject(id = 1)
subject
# Put the subject in a vector to create a population
= [subject]
population
# Now, `population` is a population with one subject
A simple example that can illustrate the above process can be formulated with just a Vector
of Vector
s of numbers. Consider the following:
= [[1.0], [2.0], [3.0]] V
3-element Vector{Vector{Float64}}:
[1.0]
[2.0]
[3.0]
If we had run simobs(model, [subject], param, sim_randeffs; samples = 3, simulate_error=false)
we would have gotten a similar Vector
with three Vector
s inside each containing one SimulatedSubject
. The goal is to create a vector that is [1.0, 2.0, 3.0]
and we can get this by using the .
to broadcast our function call where the function is first
. first
is a function that returns the first element of a collection:
first([9, 8, 7])
9
So, given our variable V
above, we can do:
first.(V)
3-element Vector{Float64}:
1.0
2.0
3.0
For convenience, let us write a function that unwraps the simulation output in a way that’s useful to us using these insights:
function simobs_uncertainty(data_inference, subject; samples)
first.(
simobs(
data_inference,# single subject population
[subject], zero_randeffs( # sim_randeffs to avoid BSV simulation
data_inference.fpm.model,fill(subject, samples),
coef(data_inference),
),
;# samples
samples, = false, # to avoid RUV simulation
simulate_error
)
)end
simobs_uncertainty (generic function with 1 method)
A future version of Pumas will have a signature to simulate one Subject
for repeated samples
that will return a “Population
” of the same subject simulated with samples
different draws from the uncertainty distribution.
Then, we take each row in the DataFrame
and apply the simulation function to the content of the columns:
= @rtransform(
forest_df_sims_WU
forest_df_sub,:simulations = simobs_uncertainty(data_inference, :subject; samples = :N)
)
Row | type | N | group | label | time | covariates | events | notes | subject | simulations |
---|---|---|---|---|---|---|---|---|---|---|
String | Int64 | String | String | StepRang… | NamedTup… | DosageRe… | String | Subject… | Array… | |
1 | reference | 100 | Reference | Uncertainty | 0:1:672 | (WEIGHT = 70, SEX = "FEMALE") | DosageRegimen\n1×10 DataFrame\n Row │ time cmt amt evid ii addl rate duration ss ⋯\n │ Float64 Int64 Float64 Int8 Float64 Int64 Float64 Float64 Int8 ⋯\n─────┼──────────────────────────────────────────────────────────────────────────\n 1 │ 0.0 1 10.0 1 24.0 27 0.0 0.0 0 ⋯\n 1 column omitted | Subject(id = ReferenceUncertainty, ...) | Simulated population (Vector{<:Subject}), n = 100, variables: dv | |
2 | alternative | 100 | SEX (ref: FEMALE) | MALE | 0:1:672 | (SEX = "MALE",) | DosageRegimen\n1×10 DataFrame\n Row │ time cmt amt evid ii addl rate duration ss ⋯\n │ Float64 Int64 Float64 Int8 Float64 Int64 Float64 Float64 Int8 ⋯\n─────┼──────────────────────────────────────────────────────────────────────────\n 1 │ 0.0 1 10.0 1 24.0 27 0.0 0.0 0 ⋯\n 1 column omitted | Subject(id = SEX (ref: FEMALE)MALE, ...) | Simulated population (Vector{<:Subject}), n = 100, variables: dv | |
3 | alternative | 100 | WEIGHT (ref: 70 kg) | 50 kg | 0:1:672 | (WEIGHT = 50,) | DosageRegimen\n1×10 DataFrame\n Row │ time cmt amt evid ii addl rate duration ss ⋯\n │ Float64 Int64 Float64 Int8 Float64 Int64 Float64 Float64 Int8 ⋯\n─────┼──────────────────────────────────────────────────────────────────────────\n 1 │ 0.0 1 10.0 1 24.0 27 0.0 0.0 0 ⋯\n 1 column omitted | Subject(id = WEIGHT (ref: 70 kg)50 kg, ...) | Simulated population (Vector{<:Subject}), n = 100, variables: dv | |
4 | alternative | 100 | WEIGHT (ref: 70 kg) | 90 kg | 0:1:672 | (WEIGHT = 90,) | DosageRegimen\n1×10 DataFrame\n Row │ time cmt amt evid ii addl rate duration ss ⋯\n │ Float64 Int64 Float64 Int8 Float64 Int64 Float64 Float64 Int8 ⋯\n─────┼──────────────────────────────────────────────────────────────────────────\n 1 │ 0.0 1 10.0 1 24.0 27 0.0 0.0 0 ⋯\n 1 column omitted | Subject(id = WEIGHT (ref: 70 kg)90 kg, ...) | Simulated population (Vector{<:Subject}), n = 100, variables: dv | |
5 | alternative | 100 | DOSE (ref: 10mg) | 20 mg | 0:1:672 | NamedTuple() | DosageRegimen\n1×10 DataFrame\n Row │ time cmt amt evid ii addl rate duration ss ⋯\n │ Float64 Int64 Float64 Int8 Float64 Int64 Float64 Float64 Int8 ⋯\n─────┼──────────────────────────────────────────────────────────────────────────\n 1 │ 0.0 1 20.0 1 24.0 27 0.0 0.0 0 ⋯\n 1 column omitted | Subject(id = DOSE (ref: 10mg)20 mg, ...) | Simulated population (Vector{<:Subject}), n = 100, variables: dv |
Here the naming convention _WU
means With Uncertainty
.
2.3.2 Simulation with BSV+RUV
Simulating with BSV and RUV is straight forward, as this is the standard way of invoking simobs
in Pumas.
= @rtransform(
forest_df_sims_BR deepcopy(forest_df_sub),
:simulations = simobs(mdl_auc, fill(:subject, :N), coef(data_result))
)1] = "BSV + RUV" forest_df_sims_BR.label[
"BSV + RUV"
Here the naming convention _BR
means BSV and RUV
. Notice, that this is typically not the approach used for covariate model evaluation. If BSV and RUV is used it is typically not used together with simulation with uncertainty as it becomes impossible to separate uncertainty sampling from the other samples being taken for random effects and observed variables.
2.4 Extract exposure and individual coefficient data
After simulations have been performed the next step is to extract the relevant individual coefficients and secondary parameters. How to get these depends on the simulation approach. Later, we will look at how to set up the forest plot based on NCA calculations, but first we will look at the model based approach.
Since we simulated a cumulative AUC
we can find the AUC
in a specific interval by subtracting the first value in our evaluation interval from the value at the final time point. The following function accepts our simulations, the variable name of the AUC
variable and the time interval in which to calculate the AUC
:
function dynamic_auc(sims; var, interval)
= map(sims) do sim
auc = filter(x -> x.evid == 0, DataFrame(sim))
df1 = filter(x -> x.time == interval[1] || x.time == interval[2], df1)
df2 = combine(groupby(df2, :id), var => x -> last(diff(x)))
df3 getproperty(df3, Symbol(string(var) * "_function"))[1]
end
return auc
end
dynamic_auc (generic function with 1 method)
The following function calculates Cmax
in the same manner
function dynamic_cmax(sims; var, interval)
= map(sims) do sim
auc = filter(x -> x.evid == 0, DataFrame(sim))
df1 = filter(x -> interval[1] <= x.time <= x.time == interval[2], df1)
df2 = combine(groupby(df2, :id), var => x -> maximum(skipmissing(x)))
df3 getproperty(df3, Symbol(string(var) * "_function"))[1]
end
return auc
end
dynamic_cmax (generic function with 1 method)
Finally, we can get the icoef
s from the df.simulations
output. We create a function to grab the first value of each icoef
for each simulated subject for each level of simulation:
forest_icoef(df, name) =
first.(getproperty.(getproperty.(sims, :icoefs), name)) for sims in df.simulations] [
forest_icoef (generic function with 1 method)
2.5 Summarize data
Our plotting function will create all the summarization as part of the plot, but it is instructive to understand what is going on behind the scenes.
Whenever we have a reference Subject
and an alternative Subject
that we have simulated, we extract their exposure measurements of interest. Let us say this is AUC
at steady state. The values that are shown in the forest plot are then calculated as follows:
- Calculate the median value of the reference
Subject
across theN
simulations - Divide the values for the alternative
Subject
with the median from the previous step to generate a ratio - Calculate the 0.05, 0.5 0.95 quantiles of the ratios to get the 90% confidence interval and the median value for the ratio
It is also possible to show the effects at their original scales by modifying the functions below to accept a relative_reference
keyword argument instead of hardcoding the value to be true
.
These numbers are then reported in the plot as numerical values and as markers with error bands.
2.6 Plotting the data
In Pumas v2.6, a beta feature was added to the PlottingUtilities
package called table_forest_plot
. The docstring can be explored by typing ?table_forest_plot
after using PlottingUtilities
. This function is able to take information about a forest plot to set up the visuals according to covariate groups, values that are different from reference values and the numerical values of the median and intervals that are used in the plot. It is possible to explore this function for advanced users, but currently, we provide the following functions in this tutorial that can be used to map from a DataFrame
as we have used so far to the input format table_forest_plot
needs.
# Function to calculate the triplet to be shown
function forest_values_column(forest_input)
# For each input, calculate the median and the 5th and 95th percentiles,
# and return them as a string in the format "median [5th percentile 95th percentile]"
["$(round(median(input),digits=3)) [$(round(quantile(input,0.05),digits=3)) $(round(quantile(input,0.95),digits=3))]"
in forest_input
for input
]end
function df_forest_plot(df, parameter; kwargs...)
# Extract the values for the given parameter from the dataframe
= getproperty(df, parameter[1])
values
# Find the index of the "reference" row in the dataframe
= findfirst(x -> x == "reference", df.type)
reference_values
# Extract the reference values
= values[reference_values]
reference
# If the reference values are a vector, calculate their median
if reference isa AbstractVector
= median(reference)
reference end
# Calculate the numerical text to be displayed in the forest plot
# This is done by dividing each value by the median of the reference values
= forest_values_column(map(x -> x ./ median(reference), values))
numerical_text
# Define the headers for the forest plot
= ["Scenario", "Median [90% CI]", all(isempty, df.notes) ? nothing : "Notes"]
headers
# Define the label for the axis
= "Ratio to reference median"
axis_label
# Flag to indicate whether the reference values should be relative
= true
relative_reference
# Calculate the values and reference to be used in the forest plot
= if relative_reference
values, reference # If relative_reference is true, divide each value by the reference
# and set the reference to 1
= map(val -> val isa Nothing ? nothing : val ./ reference, values)
values = 1
reference
values, referenceelse
# If relative_reference is false, use the original values and reference
values, referenceend
# Create the forest plot
table_forest_plot(
# data to be plotted
[df.label, numerical_text, df.notes], # headers for the plot
headers, # values to be plotted
values, # start of optional arguments
; = parameter[2], # header for the axis
axis_header # reference values
reference, = df.group, # groups for the plot
groups # label for the axis
axis_label, = [0.8, 1.25], # reference band for the plot
reference_band ..., # additional optional arguments
kwargs
)end
df_forest_plot (generic function with 1 method)
The plot generates some custom headers, calculates reference values, and so on. We also set the grey reference band to be plotted from 0.8 to 1.25 which is the typical values recommended for bioequivalence testing. Ideally, it should be set with specific clinical goals or requirements in mind.
2.7 Exposure measures
We can now use the functions we prepared to calculate the exposure measures and summarize them using the plotting function.
The model specification mdl_auc
uses a dynamical variable Central_AUC
to measure the AUC of the central compartment through the AUC
variable set in the @observed
block. We can access this through the DataFrame
constructor on the SimulatedObservations
.
2.7.1 AUC last dosing interval
=
forest_df_sims_WU.auc_measure dynamic_auc(sim; var = :AUC, interval) for sim in forest_df_sims_WU.simulations]
[df_forest_plot(forest_df_sims_WU, :auc_measure => "AUC", ;)
=
forest_df_sims_WU.cmax_measure dynamic_cmax(sim; var = :dv, interval) for sim in forest_df_sims_WU.simulations]
[df_forest_plot(forest_df_sims_WU, :cmax_measure => "Cmax", ;)
df_forest_plot(forest_df_sims_WU, :auc_measure => "AUC", ;)
df_forest_plot(forest_df_sims_WU, :auc_measure => "AUC"; show_density = true)
This is one of the first forest plots we have produced, so let us break down its components:
Columns:
- The first column lists the labels for each scenario.
- The second column shows the median ratio values along with intervals defined by the 0.05 and 0.95 quantiles (representing the 90% confidence interval).
- The third column is the “plot” part of the forest plot, where the median ratios are visualized relative to a reference line (1.0) and a shaded range [0.8, 1.25].
Reference Group:
- The reference group is labeled “uncertainty,” indicating variability in the reference simulation.
- By definition, it has a median ratio of 1.0 because it serves as the baseline.
- This group reflects parameter uncertainty, allowing us to assess the overall uncertainty of the plot.
SEX Group:
- This group shows the effect of changing the SEX of a reference subject from FEMALE to MALE.
- The median ratio is below 1.0, indicating that being MALE reduces the AUC compared to FEMALE.
- The median value and the intervals are critical for assessing the significance of this effect.
WEIGHT Group:
- This group is similar in structure to the SEX group but includes two alternative weight scenarios: 50 kg and 90 kg.
- Each weight scenario is compared to the reference weight (70 kg), with corresponding median ratios and intervals.
To compare the simulations with parameter uncertainty to the simulations with ruv and bsv we can produce the following plot:
1] = "BSV + RUV"
forest_df_sims_BR.label[=
forest_df_sims_BR.auc_measure dynamic_auc(sim; var = :AUC, interval) for sim in forest_df_sims_BR.simulations]
[df_forest_plot(forest_df_sims_BR, :auc_measure => "AUC", ;)
We immediately see that including BSV and RUV increases the range of simulated values significantly. The medians are similar.
2.7.2 Cmax at pseudo-steady-state
df_forest_plot(forest_df_sims_WU, :cmax_measure => "Cmax pss", ;)
df_forest_plot(forest_df_sims_WU, :cmax_measure => "Cmax pss", ; show_density = true)
Overall we see fairly limited effects, and of course we see an effect of the big dose group of around 2.
2.8 Individual Coefficients Forest Plots
Depending on the motivation to produce the forest plots it may be useful to produce forest plots for individual parameters. In Pumas, these are called icoef
s. We can produce an example for clearance to illustrate how this may be done.
2.8.1 Prepare icoef
:CL] = forest_icoef(forest_df_sims_WU, :CL) forest_df_sims_WU[!,
We could repeat it for all combinations of WU
, bsvruv
, density=false
, or density=true
. Here, we will only keep the default density=false
and plot clearance for simulation with parameter uncertainty:
df_forest_plot(forest_df_sims_WU, :CL => "CL")
3 Calculation of Exposures with NCA.jl
Let us look at how we can calculate exposure measures using NCA.jl
functionality. Instead of using the model functionality we will use steady state doses and the NCA
functions in Pumas. We do not need the special AUC
compartment in this application. More specifically, we cannot use an AUC
compartment because we will be using steady-state doses. The AUC
compartment can never reach a steady state as it will be ever-growing, so we need to exclude it when setting ss=1
in the DosageRegimen
below.
= @model begin
mdl @param begin
~ RealDomain(lower = 0.5, upper = 1.5, init = 1.0)
tvCL ~ RealDomain(lower = 0.1, upper = 20.0, init = 10.0)
tvVc ~ RealDomain(lower = -2.0, upper = 2.0, init = 1.2)
WEIGHTonCL ~ RealDomain(lower = -2.0, upper = 2.0, init = 1.4)
SEXonCL end
@random begin
~ Normal(0.0, 0.05)
etaVc ~ Normal(0.0, 0.15)
etaCL end
@covariates WEIGHT SEX
@pre begin
= tvCL * (WEIGHT / 70)^WEIGHTonCL * SEXonCL^(SEX == "MALE") * exp(etaCL)
CL = tvVc * exp(etaVc)
Vc end
@dynamics begin
' = -CL / Vc * Central
Centralend
@derived begin
~ @. LogNormal(log(Central / Vc), 0.1)
dv end
end
PumasModel
Parameters: tvCL, tvVc, WEIGHTonCL, SEXonCL
Random effects: etaVc, etaCL
Covariates: WEIGHT, SEX
Dynamical system variables: Central
Dynamical system type: Matrix exponential
Derived: dv
Observed: dv
Again, we simulate the different scenarios: Specifying the route
enables us to later use functionality from NCA.jl to calculate AUC.
= 24
ii = 0:8:ii
sim_time = DosageRegimen(10; ss = 1, ii, route = NCA.IVBolus)
ss_event = DosageRegimen(20; ss = 1, ii, route = NCA.IVBolus)
ss_event20
= DataFrame(
forest_df_nca "type" => ["reference", "alternative", "alternative", "alternative", "alternative"],
"N" => fill(100, 5),
"group" => [
"Reference",
"SEX (ref: FEMALE)",
"WEIGHT (ref: 70 kg)",
"WEIGHT (ref: 70 kg)",
"DOSE (ref: 10mg)",
],"label" => ["Uncertainty", "MALE", "50 kg", "90 kg", "20 mg"],
"time" => fill(sim_time, 5),
"covariates" => [
= 70, SEX = "FEMALE"),
(; WEIGHT = "MALE"),
(; SEX = 50),
(; WEIGHT = 90),
(; WEIGHT
(;),
],"events" => [ss_event, ss_event, ss_event, ss_event, ss_event20],
"notes" => ["", "", "", "", ""],
)= prepare_subjects(forest_df_nca) forest_df_nca
And we can simulate the individual datasets:
# Since the original model had an AUC compartment we can no longer simulate from
# the past fit result because we now have steady state dosing and AUC never reaches
# a steady state.
= fit(
data_result_nca
mdl,Subject.(simulated_data),
param0,FOCE();
= (; show_trace = false),
optim_options
)= infer(data_result_nca)
data_inference_nca = @rtransform(
forest_df_nca
forest_df_nca,:simulations = simobs_uncertainty(data_inference_nca, :subject; samples = :N)
)
[ Info: Checking the initial parameter values.
[ Info: The initial negative log likelihood and its gradient are finite. Check passed.
[ Info: Calculating: variance-covariance matrix.
[ Info: Done.
Row | type | N | group | label | time | covariates | events | notes | subject | simulations |
---|---|---|---|---|---|---|---|---|---|---|
String | Int64 | String | String | StepRang… | NamedTup… | DosageRe… | String | Subject… | Array… | |
1 | reference | 100 | Reference | Uncertainty | 0:8:24 | (WEIGHT = 70, SEX = "FEMALE") | DosageRegimen\n1×10 DataFrame\n Row │ time cmt amt evid ii addl rate duration ss ⋯\n │ Float64 Int64 Float64 Int8 Float64 Int64 Float64 Float64 Int8 ⋯\n─────┼──────────────────────────────────────────────────────────────────────────\n 1 │ 0.0 1 10.0 1 24.0 0 0.0 0.0 1 ⋯\n 1 column omitted | Subject(id = ReferenceUncertainty, ...) | Simulated population (Vector{<:Subject}), n = 100, variables: dv | |
2 | alternative | 100 | SEX (ref: FEMALE) | MALE | 0:8:24 | (SEX = "MALE",) | DosageRegimen\n1×10 DataFrame\n Row │ time cmt amt evid ii addl rate duration ss ⋯\n │ Float64 Int64 Float64 Int8 Float64 Int64 Float64 Float64 Int8 ⋯\n─────┼──────────────────────────────────────────────────────────────────────────\n 1 │ 0.0 1 10.0 1 24.0 0 0.0 0.0 1 ⋯\n 1 column omitted | Subject(id = SEX (ref: FEMALE)MALE, ...) | Simulated population (Vector{<:Subject}), n = 100, variables: dv | |
3 | alternative | 100 | WEIGHT (ref: 70 kg) | 50 kg | 0:8:24 | (WEIGHT = 50,) | DosageRegimen\n1×10 DataFrame\n Row │ time cmt amt evid ii addl rate duration ss ⋯\n │ Float64 Int64 Float64 Int8 Float64 Int64 Float64 Float64 Int8 ⋯\n─────┼──────────────────────────────────────────────────────────────────────────\n 1 │ 0.0 1 10.0 1 24.0 0 0.0 0.0 1 ⋯\n 1 column omitted | Subject(id = WEIGHT (ref: 70 kg)50 kg, ...) | Simulated population (Vector{<:Subject}), n = 100, variables: dv | |
4 | alternative | 100 | WEIGHT (ref: 70 kg) | 90 kg | 0:8:24 | (WEIGHT = 90,) | DosageRegimen\n1×10 DataFrame\n Row │ time cmt amt evid ii addl rate duration ss ⋯\n │ Float64 Int64 Float64 Int8 Float64 Int64 Float64 Float64 Int8 ⋯\n─────┼──────────────────────────────────────────────────────────────────────────\n 1 │ 0.0 1 10.0 1 24.0 0 0.0 0.0 1 ⋯\n 1 column omitted | Subject(id = WEIGHT (ref: 70 kg)90 kg, ...) | Simulated population (Vector{<:Subject}), n = 100, variables: dv | |
5 | alternative | 100 | DOSE (ref: 10mg) | 20 mg | 0:8:24 | NamedTuple() | DosageRegimen\n1×10 DataFrame\n Row │ time cmt amt evid ii addl rate duration ss ⋯\n │ Float64 Int64 Float64 Int8 Float64 Int64 Float64 Float64 Int8 ⋯\n─────┼──────────────────────────────────────────────────────────────────────────\n 1 │ 0.0 1 20.0 1 24.0 0 0.0 0.0 1 ⋯\n 1 column omitted | Subject(id = DOSE (ref: 10mg)20 mg, ...) | Simulated population (Vector{<:Subject}), n = 100, variables: dv |
3.1 Individual Coefficients
Since this is an alternative way of calculating the exposure measurements we do not need to repeat a lot of icoef
based forest plots, but for good measure let us show that the procedure is the same as before:
:CL] = forest_icoef(forest_df_nca, :CL) forest_df_nca[!,
df_forest_plot(forest_df_nca, :CL => "CL")
3.2 Exposure measures
To calculate auclast
we take the simulated subjects, turn them into Subject
s and then we turn those into NCASubjects
to be able to use the NCA.auclast
function.
forest_auc_last(df) =
auclast(NCASubject.(Subject.(sim))).auclast for sim in df.simulations] [NCA.
forest_auc_last (generic function with 1 method)
Cmax at steady state can also be calculated using NCA
. Of course, there will be a significant dose size effect here which should be around 2 as we are doubling the dose. Let us see if the covariate effects affect Cmax
as well:
forest_cmax_ss(df) =
cmaxss(NCASubject.(Subject.(sim))).cmaxss for sim in df.simulations] [NCA.
forest_cmax_ss (generic function with 1 method)
3.2.1 AUC last dosing interval
= forest_auc_last(forest_df_nca) forest_df_nca.auc_measure
df_forest_plot(forest_df_nca, :auc_measure => "AUC", ;)
df_forest_plot(forest_df_nca, :auc_measure => "AUC"; show_density = true)
4 Styles and preferences
We chose one way of creating a forest plot above, but many practitioners have their own preferences of labels, colors, and so on. It is of course important to be able to modify this to preferences or even in-house requirements at different companies. One thing that is sometimes set up different would be the category labels. We put the reference value with the group name but others prefer to only use the covariate name as the group name and write the row labels as alternative_value : reference_value
. We can do the same here:
= deepcopy(forest_df_sims_WU)
forest_df_sims_WU_newlabel .= ["Reference", "SEX", "WEIGHT", "WEIGHT", "DOSE"]
forest_df_sims_WU_newlabel.group .=
forest_df_sims_WU_newlabel.label "Uncertainty", "MALE : FEMALE", "50 kg : 70 kg", "90 kg : 70 kg", "20 mg : 10 mg"]
[df_forest_plot(forest_df_sims_WU_newlabel, :auc_measure => "AUC", ;)
or we can even get creative:
= deepcopy(forest_df_sims_WU)
forest_df_sims_WU_newlabel .= ["Reference", "SEX", "WEIGHT", "WEIGHT", "DOSE"]
forest_df_sims_WU_newlabel.group .=
forest_df_sims_WU_newlabel.label "Uncertainty", "MALE ← FEMALE", "50 kg ← 70 kg", "90 kg ← 70 kg", "20 mg ← 10 mg"]
[df_forest_plot(forest_df_sims_WU_newlabel, :auc_measure => "AUC", ;)
5 Conclusion
In this tutorial, we saw how to produce forest plots in Pumas using built-in functionality and the new beta feature in PlottingUtilities
. We encourage all users try to out the workflow on their own problem and produce feedback an constructive criticism before the final forest plot feature is release in a future version of Pumas.