using Pumas # available with Pumas products - also reexports Distributions
using Bioequivalence # available with Pumas products
using PharmaDatasets # available with Pumas products
using CairoMakie
using AlgebraOfGraphics
using DataFrames
using DataFramesMeta
using SummaryTables
using StatsBase
using Random
using MixedModels

A Course on Bioequivalence: Unit 4 - Different experimental designs
This is Unit 4 of a complete bioequivalence analysis course using Pumas. There are 15 units in the course in total.
1 Unit overview
In the previous unit we considered statistical inference concepts and focused on the hypothesis test formulation for bioequivalence. In this unit we explore various experimental designs used in bioequivalence trials. In particular we consider:
- The concept of a multi-period trial and an experimental design.
- Randomization, blindness, and open-label.
- The very popular \(2\times 2\) (sometimes called the \(2\times 2\times 2\)) crossover design.
- An exploration of product specific guidances.
- Other popular experimental designs including partial replicate, fully replicate, and multiple treatment designs.
- Considerations for multiple reference (comparator) products and multiple test products.
- Multiple comparisons and adjustment of \(\alpha\).
- General statistical considerations for choosing an experimental design.
We use the following packages.
We have used these packages in the previous units. One new package we use in this unit is MixedModels, which is used for random effects models.
2 The concept of an experimental design
Our focus up to now was on the case where we break the subjects into cohorts of R and T, and then administer the respective dosages to each of the cohorts. This is the very basic parallel design.
However, there are other experimental designs that are more efficient, powerful, and enable detecting and estimating additional quantities.
In particular, the parallel design has only a single period in the sense that at the onset of the trial participants are administered a single formulation and once that treatment is complete, the trial is over. What if we broke up the participants’ trial journey into multiple periods, and in each successive period we would administer a different formulation to the subject?
As a first basic example of more than one period, assume we have this AUC endpoint dataset arising from a hypothetical bioequivalence trial:
be_data_paired = DataFrame(
    id = vcat(fill.(1:5, 2)...),
    sequence = fill("TR", 10),
    period = repeat(1:2, 5),
    AUC = [21.5, 24.6, 22.5, 28.4, 23.8, 22.1, 19.6, 21.4, 19.9, 21.5],
)
simple_table(be_data_paired)
id | sequence | period | AUC |
1 | TR | 1 | 21.5 |
1 | TR | 2 | 24.6 |
2 | TR | 1 | 22.5 |
2 | TR | 2 | 28.4 |
3 | TR | 1 | 23.8 |
3 | TR | 2 | 22.1 |
4 | TR | 1 | 19.6 |
4 | TR | 2 | 21.4 |
5 | TR | 1 | 19.9 |
5 | TR | 2 | 21.5 |
As you can see there are 5 subjects in this trial with ids 1, …, 5. In contrast to a parallel trial, each subject appears in two rows, once for period 1 and once for period 2. This is sometimes called a paired design because each observation of T on a subject and the corresponding observation of R on the subject form a pair.
There is also a sequence column. In this case there is only a single sequence and it is TR for all subjects. This implies that each subject is first administered the test treatment T in the first period and is then administered the reference treatment R in the second period.
Let us compare and contrast the design we see with be_data_paired with the simpler parallel design that we considered up to now. For clarity let us create be_data_parallel, which has the same number of AUC endpoint measurements.
be_data_parallel = DataFrame(
    id = 1:10,
    group = vcat(fill("T", 5), fill("R", 5)),
    period = 1,
    AUC = [21.1, 21.6, 22.8, 19.6, 23.2, 18.8, 19.9, 21.7, 22.3, 23.2],
)
simple_table(be_data_parallel)
id | group | period | AUC |
1 | T | 1 | 21.1 |
2 | T | 1 | 21.6 |
3 | T | 1 | 22.8 |
4 | T | 1 | 19.6 |
5 | T | 1 | 23.2 |
6 | R | 1 | 18.8 |
7 | R | 1 | 19.9 |
8 | R | 1 | 21.7 |
9 | R | 1 | 22.3 |
10 | R | 1 | 23.2 |
Here are some points contrasting the two designs, focusing on the datasets that represent these designs:
1. The number of subjects in be_data_paired is half of that of be_data_parallel. This is obvious since in the two-period design yielding be_data_paired we “reuse” a subject and administer R after we administer T.
2. The duration of the trial for be_data_paired is typically at least twice as long as the trial for be_data_parallel. This is because each subject has to first receive formulation T. Then after the first period we need a washout period, giving the body time to clear of the active ingredient. Then in the second period the subject receives formulation R. Note that the duration of the washout period can sometimes be extensive and is determined by the half-life property of the product in question.
3. In case administration of the formulation affects a subject in the long term (or beyond a washout period), be_data_paired will have the second period measurements affected by the first period.
4. With be_data_parallel, variability between T and R cannot be isolated from variability between subjects. In a parallel design some of the variation we encounter is due to natural differences between subjects. In contrast, with be_data_paired, since each subject is administered both T and R, by considering the difference between formulations on each subject, the measurements are potentially less influenced by the variability between subjects. Such pairing is a simple form of blocking, which is a key concept in experimental design. It is an approach to remove undesirable variability from the measurements.
Point (1) is often an obvious difference and is a plus of multi-period designs. Having fewer subjects is more ethical and reduces costs. In contrast, point (2) is a plus for the one-period parallel design. It is often a key consideration if the needed washout period is long. Point (3) is a case that occurs if the washout period is “essentially infinite”. This would be a key consideration for trials that consider clinical outcomes, such as curing of a condition. It is less of a consideration in bioequivalence trials. Finally, the most important issue is point (4). From a statistical point of view, isolating subject effects is often critical. This point implies that the power of multi-period designs is often superior to that of single period (parallel) designs.
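Point (4) can be made concrete with a small simulation. The sketch below is a toy illustration of ours (not Pumas functionality): it compares the variance of a paired, within-subject difference against a difference computed across two separate cohorts. The subject effect cancels in the former but not in the latter.

```julia
using Random, Statistics

Random.seed!(1)
n = 1_000  # subjects per arm (large, so the variance comparison is clear)

# Toy model on the log scale: endpoint = subject effect + within-subject noise.
subject_effect = randn(n)   # between-subject variability (sd = 1)
noise_T = 0.1 * randn(n)    # within-subject variability (sd = 0.1)
noise_R = 0.1 * randn(n)

# Paired (crossover-style): each subject receives both T and R,
# so the subject effect cancels in the difference.
paired_diff = (subject_effect .+ noise_T) .- (subject_effect .+ noise_R)

# Parallel-style: T and R are measured on different subjects,
# so the subject effects do not cancel.
other_subject_effect = randn(n)
parallel_diff = (subject_effect .+ noise_T) .- (other_subject_effect .+ noise_R)

println(var(paired_diff))   # about 2 × 0.1² = 0.02
println(var(parallel_diff)) # about 2 × (1² + 0.1²) ≈ 2.02
```

The roughly hundredfold variance reduction is what blocking buys here, and it is why the paired comparison needs far fewer subjects for the same power.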
In summary there are several reasons to use more than one period (points (1) and (4) above), as well as some potential complications (points (2) and (3) above). We now consider the most popular kind of multi-period design, the \(2 \times 2\) crossover design.
3 Overarching principles: randomization and blindness/open-label
A crucial principle that underlies all sound experimental designs is randomization.
Randomization is the process of assigning subjects to treatment groups (e.g., T or R in a parallel design). It requires using a formal, probabilistic mechanism, such as a coin flip or, more commonly, a computer-generated random sequence. Without randomization, we risk introducing bias. For instance, an investigator might subconsciously assign healthier-looking subjects to one group, or the first subjects to enroll in a trial might be systematically different from those who enroll later. Randomization breaks any systematic connection between the characteristics of the subjects and the treatment they receive, ensuring that, on average, the groups are comparable in all respects, both known and unknown.
To see an example of randomization, say that we have \(20\) subjects that we want to partition into \(4\) groups, for example both the reference and test formulations with fasting and without. We can use the sample function from StatsBase with the replace = false option. Here is example code:
num_subjects = 20
group_names = ["R fasting", "T fasting", "R fed", "T fed"]
num_groups = length(group_names)
subjects_per_group = num_subjects ÷ num_groups # ÷ \div + [TAB], integer division
subjects = 1:20

Random.seed!(0)
shuffled_subjects = sample(subjects, length(subjects); replace = false)
groups = Iterators.partition(shuffled_subjects, subjects_per_group)
for (i, group) in enumerate(groups)
println(group_names[i], ": ", sort(group))
end
R fasting: [3, 5, 9, 15, 18]
T fasting: [7, 8, 10, 12, 17]
R fed: [4, 6, 11, 16, 20]
T fed: [1, 2, 13, 14, 19]
Note that the Random.seed!(0) statement ensures the same randomization occurs every time we execute this code. The “seed” 0 can be replaced with any other number and that would imply a different randomization.
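As a quick sanity check of this reproducibility, the sketch below reseeds and repeats the randomization (using shuffle from the Random standard library, which here is equivalent to sampling without replacement):

```julia
using Random

# With the same seed, the randomization is reproduced exactly;
# a different seed would give a different (but equally valid) allocation.
Random.seed!(0)
first_run = shuffle(1:20)
Random.seed!(0)
second_run = shuffle(1:20)
println(first_run == second_run)  # true: same seed, same shuffle
```

This is why fixing a seed in a study's analysis script makes the randomization auditable after the fact.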
A related, but distinct, concept is blinding.
Blinding refers to the practice of concealing the treatment assignment from one or more parties involved in the trial. In a single-blind trial, the subjects are unaware of which treatment they are receiving. In a double-blind trial, considered the gold standard in many clinical trials, neither the subjects nor the investigators and their staff know the treatment assignments. This prevents knowledge of the treatment from influencing subjects’ reporting of effects or investigators’ assessment of outcomes. When all parties are aware of the treatment assignments, the trial is called open-label.
While double-blinding is critical for trials with subjective endpoints (like pain relief or mood), bioequivalence studies are typically conducted in an open-label fashion.
This is because the primary endpoints, such as AUC and Cmax, are objective pharmacokinetic measurements derived from laboratory analysis. These values are not susceptible to patient or investigator subjectivity.
However, even in an open-label setting, proper randomization of subjects to their assigned treatments or sequences is non-negotiable, as it remains the fundamental safeguard against bias and ensures the statistical validity of the trial’s conclusions.
4 The \(2 \times 2\) crossover design
The design that yields the be_data_paired data above has the disadvantage that all subjects were first administered T and only later were they administered R. This can cause a bias in the results in case the T treatment affects the R treatment that follows, or in case conditions during the first period are significantly different in comparison to the second period.
To alleviate such potential biases, the \(2 \times 2\) crossover design (sometimes also called \(2 \times 2 \times 2\)) makes use of the notion of a sequence. There are \(2\) sequences, RT and TR. The first sequence implies that subjects receive R in the first period and T in the second period. Similarly, the second sequence is the opposite, as subjects receive T in the first period and R in the second. Note that the phrase \(2 \times 2\) stands for “2 sequences” and “2 periods”. Further, when called \(2 \times 2 \times 2\), the additional “2” stands for the number of formulations.
Here is a tabular representation of this design:
Sequence | Period 1 | Period 2 |
---|---|---|
RT | R | T |
TR | T | R |
As with any trial, the study needs to be randomized. Recruited subjects are allocated either to the RT sequence or the TR sequence. Now, the same benefits as in the paired design creating be_data_paired are achieved. Further, it is possible to test and rule out period effects since in each period we have subjects from both treatments.
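The sequence notation can also be read programmatically: the sequence string, indexed by the period number, gives the formulation received in that period. Here is a minimal sketch (the helper name formulation is ours, not a Bioequivalence function):

```julia
# In a crossover design, character `period` of the sequence string is the
# formulation administered in that period.
formulation(sequence::AbstractString, period::Integer) = sequence[period]

# Enumerate the 2×2 design cell by cell:
for sequence in ("RT", "TR"), period in 1:2
    println("sequence $sequence, period $period → $(formulation(sequence, period))")
end
```

The same indexing convention extends to the longer sequence strings (e.g. RTRT) used by the replicate designs later in this unit.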
This design is undoubtedly the most common and most recommended design to use in a bioequivalence study. Parallel designs are typically only recommended in cases where there is a long washout period. Other, more complex, designs described below are typically reserved for the specific cases that we describe.
Here is example endpoint data from a \(2 \times 2\) crossover study. We display only the first \(8\) rows.
be_data = dataset(joinpath("bioequivalence", "RT_TR", "PJ2017_3_1.csv"))
simple_table(first(be_data, 8))
id | sequence | period | AUC | Cmax |
1 | RT | 1 | 2849 | 499 |
1 | RT | 2 | 2230 | 436 |
2 | TR | 1 | 2025 | 438 |
2 | TR | 2 | 2000 | 361 |
3 | TR | 1 | 2090 | 535 |
3 | TR | 2 | 1826 | 558 |
4 | RT | 1 | 2790 | 733 |
4 | RT | 2 | 2864 | 416 |
We can use the detect_design function from the Bioequivalence package. Pumas also uses this function internally to automatically determine the type of design and study paradigm from the data.
detect_design(be_data)
(design = "RT|TR",
num_sequences = 2,
num_periods = 2,
num_formulations = 2,
reference_formulation = 'R',
test_formulations = ['T'],
replicated = non_replicated,
supports_rsabe = false,
crossover = true,)
You can see that the design string is RT|TR, capturing the two sequences. You can further see that num_sequences = 2, num_periods = 2, and num_formulations = 2, all in agreement with the \(2 \times 2 \times 2\) naming convention. Other information returned by detect_design also describes this design. For example, you can see that Pumas identifies that the reference formulation is R, and so forth.
Now, carrying out bioequivalence analysis via pumas_be, and considering AUC by default, we obtain:
pumas_be(be_data)
Observation Counts | ||
Sequence ╲ Period | 1 | 2 |
RT | 17 | 17 |
TR | 15 | 15 |
Paradigm: Non replicated crossover design | ||||
Model: Linear model | ||||
Criteria: Standard ABE | ||||
Endpoint: AUC | ||||
Formulations: Reference(R), Test(T) | ||||
Results(AUC) | Assessment | Criteria | ||
R | Geometric Marginal Mean | 1879 | ||
Geometric Naive Mean | 1887 | |||
T | Geometric Marginal Mean | 1848 | ||
Geometric Naive Mean | 1845 | |||
Geometric Mean T/R Ratio (%) | 98.36 | |||
Degrees of Freedom | 30 | |||
90% Confidence Interval (%) | [94.07, 102.8] | Pass | CI ⊆ [80, 125] | |
Variability | CV (%) | σ̂ | 10.52 | 0.1049 | ||
ANOVA | Formulation (p-value) | 0.5334 | ||
Sequence (p-value) | 0.1818 | |||
Period (p-value) | 0.00235 | |||
We’ll analyze elements of this output in further detail in subsequent units, but let us highlight a few points now.
- Observe the Observation Counts table at the start of the output. As is evident there were 17 subjects in the RT sequence, and 15 subjects in the TR sequence. This imbalance is often not planned but can rather occur due to dropouts.
- Observe that the Paradigm is Non replicated crossover design. We discuss replication (non replicated, partially replicated, and fully replicated designs) below.
- Pumas chooses a Linear Model for such a design. We discuss this in the next unit.
- Observe that we obtain estimates of the Marginal Mean both for R and for T. These are model based estimates, and they may differ from the data (Naive) estimates in case of unbalanced data (as we have here).
- The Geometric Mean T/R Ratio is 98.36 as a percentage. This is obtained from the marginal means.
round(100 * 1848 / 1879, digits = 2) # value slightly differs from output due to rounding
98.35
- A model based confidence interval of [94.07, 102.8] is obtained and since it is well contained within the 80–125 margin, we Pass.
- Note that the model uses 30 degrees of freedom. This quantity is related to the variability used to generate the confidence interval. It is a property of the linear model used (see next unit) and the size of the data.
- The variability estimate is at a CV of 10.52 as a percentage. More on this estimation is in the next unit. The associated standard deviation estimate is 0.1049.
- Since a statistical model is used, we are able to investigate if there are any unwanted effects in the model. In particular, Type-III p-values are computed for Formulation, Sequence, and Period. When a p-value is non-small we consider the effect as insignificant and when it is small we consider the effect as significant. In this case, since we see a Period p-value of 0.00235, there may be reason to believe that an (unwanted) period effect is taking place.
Note: p-values are a deceptive quantity. As a rule of thumb, if a p-value is between \(0\%\) and an order of \(5\%\) (or sometimes \(10\%\)) then it is quite a clear indication that the underlying factor has an effect. In particular this holds if p-values are very small. For example above, we see that the period almost certainly has an effect because the p-value of 0.00235 is very small. However, if p-values are greater than \(10\%\) (or sometimes \(5\%\)) then their value is not really important. So a p-value of 0.1818 for Sequence above should be treated exactly the same as the p-value of 0.5334 for Formulation above. In particular, if there is no effect, we can expect the p-value to be a uniform quantity anywhere between \(0\%\) and \(100\%\). Thus we should never speak about “high p-values”. We can rather speak about “very low p-values” (as in the case of 0.00235) or “marginally low p-values” (as would have been the case with a p-value of around \(5\%\) to \(10\%\)).
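The uniformity of p-values under the null can be checked with a quick simulation. The sketch below is a toy illustration of ours, using a simple permutation test (not the linear model used by pumas_be): we generate many two-sample comparisons with no true effect and inspect the resulting p-values.

```julia
using Random, Statistics

Random.seed!(0)

# Two-sided permutation-test p-value for a difference in means.
# Under the null (both samples from the same distribution) such
# p-values are approximately Uniform(0, 1).
function permutation_pvalue(x, y; nperm = 200)
    observed = abs(mean(x) - mean(y))
    pooled = vcat(x, y)
    n = length(x)
    hits = 0
    for _ in 1:nperm
        shuffled = shuffle(pooled)
        hits += abs(mean(shuffled[1:n]) - mean(shuffled[n+1:end])) >= observed
    end
    (hits + 1) / (nperm + 1)
end

pvals = [permutation_pvalue(randn(20), randn(20)) for _ in 1:2_000]
println(round(mean(pvals), digits = 2))         # should be close to 0.5
println(round(mean(pvals .< 0.05), digits = 2)) # should be close to 0.05
```

So under no effect, a p-value anywhere in \((0, 1)\) is equally unremarkable, which is exactly why a “high p-value” carries no special meaning.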
5 Exploring example product specific guidances
Now that we know a bit about parallel and crossover designs, let us carry out a short exploration of the FDA’s website for Product-Specific Guidances (PSG) for Generic Drug Development: FDA (2025). As of July 2025 there are 2,288 PSGs listed. To briefly explore a small subset of these PSGs, we look at a few arbitrary products.
For this, since the FDA lists the PSGs alphabetically by active ingredient, we consider the first PSG entry for each of the first 8 letters A–H. You can click the URL to see the PDF of each PSG with its various guidelines. Below we summarize key points:
- Abacavir sulfate: 2 by 2 cross over design on healthy males and non-pregnant, non-lactating females.
- Bacitracin: This is an ointment. A comparative study is recommended.
- Cabotegravir: parallel design on males and non-pregnant, non-lactating females.
- Dabigatran etexilate mesylate: Both a fasting and a fed study is recommended with a 2-treatment, 2-sequence, 4-period, fully replicated crossover design on normal healthy males and females.
- Econazole nitrate: This is a cream. The design is a clinical endpoint randomized, double blind, parallel, placebo-controlled.
- Famciclovir: 2 by 2 cross over design on healthy males and non-pregnant, non-lactating females.
- Gabapentin: 2 by 2 cross over design on healthy males and non-pregnant, non-lactating females.
- Halcinonide: This is a cream. The FDA recommends two studies, a pilot study and pivotal study.
This gives a taste of the fact that \(2 \times 2\) crossover studies are common. It also motivates looking at fully replicated crossover studies as described below. As is typical for creams and ointments, comparative studies are recommended. In contrast to the bioequivalence studies we cover here, which are based on endpoints, comparative studies typically consider the effects of the product. They are mostly out of the scope of this course.
Note that we present these here only for pedagogic purposes; if conducting a trial for such products you should consult directly with the FDA or the appropriate regulatory agency.
6 Other Designs
So far we have seen the parallel design (single period) and the \(2 \times 2\) crossover design. These are the two most common alternatives. Now that we are aware of the way designs are described based on sequences, periods, and formulations we may consider other designs as well. In particular we may consider designs with more than two sequences, more than two periods, and even more than two formulations.
Let us take an approach of considering some of the common designs via the example datasets supplied by the PharmaDatasets package. In general, with that package, the datasets function returns a list of all of the datasets. Those datasets associated with bioequivalence endpoints are set with the prefix "bioequivalence"; we can then do some string processing to see the design type which follows "bioequivalence". This is the list:
unique([split(ds, "/")[2] for ds in filter(contains("bioequivalence"), datasets())])
15-element Vector{SubString{String}}:
"R_T"
"RR_RT_TR_TT"
"RTRT_TRTR"
"RRT_RTR_TRR"
"RT_TR"
"RTTR_TRRT"
"RTT_TRR"
"RTR_TRT"
"RTR_TRR"
"RR_TT"
"RST_RTS_SRT_STR_TRS_TSR"
"RRTT_TTRR"
"RTRT_RTTR_TRRT_TRTR"
"RRTT_RTTR_TRRT_TTRR"
"ADBC_BACD_CBDA_DCAB"
In this case we use strings such as "R_T", "RR_RT_TR_TT", etc. to specify a design. Let us understand how to read this encoding. For example, the datasets with "RT_TR" are for the \(2 \times 2\) crossover design that we covered above. In this specification we typically use R for the reference and T for the test as usual.
As you can see, certain designs have 3 or 4 formulations. This is evident with "RST_RTS_SRT_STR_TRS_TSR" and "ADBC_BACD_CBDA_DCAB". When it comes to such designs with multiple formulations, one approach is to consider the design as a specification of comparing a single reference to multiple tests. With this approach, if we consider "RST_RTS_SRT_STR_TRS_TSR", in the first case R is still the reference and S and T are two alternative test formulations.
In the second case, "ADBC_BACD_CBDA_DCAB", formulation A is considered as the reference and B, C, and D are considered as test cases. In any case (specific to the Bioequivalence package), we consider the lowest letter in the alphabet as the reference formulation. We discuss more issues associated with multiple formulations below.
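To make the encoding concrete, here is a minimal sketch of our own (not the Bioequivalence implementation of detect_design) that decodes a design string into the sequence, period, and formulation counts, following the lowest-letter-is-reference convention:

```julia
# Decode a design string such as "RST_RTS_SRT_STR_TRS_TSR":
# sequences are separated by "_", each sequence has one character per period,
# and the distinct characters are the formulations.
function describe_design(design_string::AbstractString)
    sequences = split(design_string, "_")
    num_sequences = length(sequences)
    num_periods = length(first(sequences))
    formulations = unique(join(sequences))
    num_formulations = length(formulations)
    reference = minimum(formulations)  # lowest letter is the reference
    (; num_sequences, num_periods, num_formulations, reference)
end

println(describe_design("RST_RTS_SRT_STR_TRS_TSR"))
println(describe_design("ADBC_BACD_CBDA_DCAB"))
```

Running this on "RST_RTS_SRT_STR_TRS_TSR" recovers the \(6 \times 3 \times 3\) description with reference R, matching the convention described above.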
As an illustration let us take an arbitrary dataset from each design group available in PharmaDatasets. We then print the design information for the dataset and specify the design as
\[ \text{Num Sequences} \times \text{Num Periods} \times \text{Num Formulations} \]
design_types =
    unique([split(ds, "/")[2] for ds in filter(contains("bioequivalence"), datasets())])

for design_type in design_types
    dataset_name = first(filter(contains("bioequivalence/$design_type"), datasets()))
    be_data = dataset(dataset_name)
    dd = detect_design(be_data)
    println("Dataset name: $dataset_name")
    println(
        "\tDesign: $(dd.design), $(dd.num_sequences)×$(dd.num_periods)×$(dd.num_formulations)",
    )
end
Dataset name: bioequivalence/R_T/FSL2015_7
Design: R|T, 2×1×2
Dataset name: bioequivalence/RR_RT_TR_TT/CL2009_9_2_1
Design: RR|RT|TR|TT, 4×2×2
Dataset name: bioequivalence/RTRT_TRTR/PJ2017_6
Design: RTRT|TRTR, 2×4×2
Dataset name: bioequivalence/RRT_RTR_TRR/SLTGSF2020_DS07
Design: RRT|RTR|TRR, 3×3×2
Dataset name: bioequivalence/RT_TR/PJ2017_3_12
Design: RT|TR, 2×2×2
Dataset name: bioequivalence/RTTR_TRRT/CL2009_9_4_1
Design: RTTR|TRRT, 2×4×2
Dataset name: bioequivalence/RTT_TRR/SLTGSF2020_DS10
Design: RTT|TRR, 2×3×2
Dataset name: bioequivalence/RTR_TRT/SLTGSF2020_DS03
Design: RTR|TRT, 2×3×2
Dataset name: bioequivalence/RTR_TRR/SLTGSF2020_DS22
Design: RTR|TRR, 2×3×2
Dataset name: bioequivalence/RR_TT/From_PJ2017_4_2
Design: RR|TT, 2×2×2
Dataset name: bioequivalence/RST_RTS_SRT_STR_TRS_TSR/PJ2017_4_5
Design: RST|RTS|SRT|STR|TRS|TSR, 6×3×3
Dataset name: bioequivalence/RRTT_TTRR/SLTGSF2020_DS28
Design: RRTT|TTRR, 2×4×2
Dataset name: bioequivalence/RTRT_RTTR_TRRT_TRTR/SLTGSF2020_DS23
Design: RTRT|RTTR|TRRT|TRTR, 4×4×2
Dataset name: bioequivalence/RRTT_RTTR_TRRT_TTRR/SLTGSF2020_DS24
Design: RRTT|RTTR|TRRT|TTRR, 4×4×2
Dataset name: bioequivalence/ADBC_BACD_CBDA_DCAB/PJ2017_4_6
Design: ADBC|BACD|CBDA|DCAB, 4×4×4
Let us now consider a few of these designs and their uses.
7 Within subject variability and between subject variability
When we measure a drug’s effect, the results are never identical. This variation, or variability, comes from two main sources: differences between subjects, and differences within a subject. Understanding these sources is crucial for designing an efficient study. The variability of products, typically quantified via the CV, is a key property that requires estimation, characterizes the nature of the product, and requires consideration when planning a study.
Between-subject variability (BSV), or inter-subject variability, describes how different people respond to the same drug. It arises from stable differences between individuals, such as genetics, body weight, metabolism, and organ function. This is why one person might consistently have higher AUC values than another, regardless of the formulation they receive.
Within-subject variability (WSV), or intra-subject variability, describes the random, unpredictable fluctuations in a single person’s response. If the same person takes the exact same drug on two different occasions, their AUC values will still differ slightly. This is due to day-to-day physiological changes, minor differences in diet or activity, and measurement error. This variability is an inherent property of the drug and its interaction with a person’s body over time.
A toy example
To get a feel for these two forms of variability let us create a toy model of concentration over time. This is not a well established pharmacokinetic model but is rather an illustrative conceptual example.
Assume we have 4 subjects, which always start with a (normalized) concentration of 1.0 at onset. Then assume that at every time instance the concentration decreases when multiplied by a subject specific subject_average_concentration_loss_per_time parameter. These are the parameters:
subject_average_concentration_loss_per_time = [0.78, 0.87, 0.88, 0.91]
num_subjects = length(subject_average_concentration_loss_per_time);
As you can see this constant varies across subjects and is as low as 0.78 for the first subject and as high as 0.91 for the fourth subject. This can be viewed as some inherent between subject variation.
Let us also assume that at every time step, the concentration for a subject is updated by this inherent factor of the subject multiplied by a slight variation, which is a uniform random quantity between 0.85 and 1.15. These random variations then work together to create the within subject variability.
Hence the simulate_subject! function, which we now create, takes a DataFrame (df) and adds to it the trajectory of such a subject. This is done until the subject leaves the trial, when the concentration drops below 5%.
function simulate_subject!(df, id, subj_per_counter)
    concentration = 1.0
    time = 0
    while concentration > 0.05
        push!(df, (id, subj_per_counter, time, concentration))
        concentration *=
            subject_average_concentration_loss_per_time[id] * rand(Uniform(0.85, 1.15))
        time += 1
    end
end;
We can now simulate this (toy) model for all four subjects, where we also assume that each subject is administered the formulation over \(3\) periods:
df = DataFrame(id = Int[], subj_per = Int[], time = Int[], concentration = Float64[])
subj_per_counter = 1
for id = 1:num_subjects
    for _ = 1:3
        simulate_subject!(df, id, subj_per_counter)
        subj_per_counter += 1
    end
end
Note that in this toy example there is only a single formulation, say R. Thus in a sense the design is:
Sequence | Period 1 | Period 2 | Period 3 |
---|---|---|---|
RRR | R | R | R |
We now plot the results:
df.id = categorical(df.id)
df.subj_per = categorical(df.subj_per)
plt =
    data(df) *
    mapping(:time, :concentration, group = :subj_per, color = :id) *
    visual(Lines)
draw(
    plt,
    axis = (
        xlabel = "Time",
        ylabel = "Concentration",
        title = "Concentration-Time Profiles",
    ),
)
By summing up each of the profiles along the time axis, we get a rough form of an AUC endpoint (here each time step was \(1\), e.g. one hour). We also carry out a log transformation. Note the composition log ∘ sum, where \circ followed by the TAB key yields ∘. Here are these endpoints organized by subject id, where each row is another period.
auc_data = select(
    combine(groupby(df, [:id, :subj_per]), :concentration => log ∘ sum => :auc),
    [:id, :auc],
)
wide_df = DataFrame(
    [g.auc for g in groupby(auc_data, :id)],
    "id: " .* string.(unique(auc_data.id)),
)
simple_table(wide_df)
id: 1 | id: 2 | id: 3 | id: 4 |
1.33 | 1.84 | 1.83 | 2.26 |
1.33 | 1.84 | 2.19 | 2.3 |
1.4 | 1.82 | 1.73 | 2.32 |
Now just by considering the values above you can see that within each subject group (column) there is some variability, which arises from the different replications of the subject. This is the within subject variability. Further, across the various columns there is also variability, which is the between subject variability.
Estimating the WSV and BSV from the toy model data
Without getting into the details at this point, let us use a linear mixed model from the MixedModels package to estimate the WSV and BSV.
The approach we take is sometimes called a one-way random effects ANOVA model. In our case it can be represented as:
\[ \text{AUC}_{ij} = \mu + b_j + w_{ij}, \qquad i = 1,\ldots,3, \qquad j = 1,\ldots,4, \]
where \(i\) is the period index and \(j\) is the subject index. Here \(\mu\) is some overall mean, and it is assumed that \(b_j\) and \(w_{ij}\) are independent (normally distributed) random quantities with a mean of zero. The variability of \(b_j\) is the BSV and the variability of \(w_{ij}\) is the WSV.
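Before turning to MixedModels, it may help to see the classical method-of-moments (ANOVA) estimator for this balanced one-way random effects model. The sketch below is our own illustration of that decomposition; it is not what the lmm fit below does (which uses REML):

```julia
using Statistics

# Method-of-moments variance components for a balanced one-way random
# effects model: rows are the k replicates (periods), columns the n subjects.
function anova_variance_components(y::AbstractMatrix)
    k, n = size(y)
    subject_means = vec(mean(y, dims = 1))
    grand_mean = mean(y)
    msw = sum(abs2, y .- subject_means') / (n * (k - 1))        # E[MSW] = σw²
    msb = k * sum(abs2, subject_means .- grand_mean) / (n - 1)  # E[MSB] = σw² + k·σb²
    (bsv = max(msb - msw, 0) / k, wsv = msw)                    # variance estimates
end

# Tiny check: no within-subject spread, subject means 1 and 2.
anova_variance_components([1.0 2.0; 1.0 2.0])  # (bsv = 0.5, wsv = 0.0)
```

Solving the two expected mean squares for \(\sigma_b^2\) and \(\sigma_w^2\) gives exactly the two variance components that the mixed model estimates below.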
A mixed model allows us to estimate these variabilities. We do not discuss the full details at this point:
VarCorr(lmm(@formula(auc ~ 1 + (1 | id)), auc_data, REML = true))
Column | Variance | Std.Dev | |
id | (Intercept) | 0.145379 | 0.381286 |
Residual | 0.015066 | 0.122742 |
The rightmost column of the table above presents estimates of the standard deviations of \(b_j\) and \(w_{ij}\) respectively. Converting these values to CVs we obtain:
BSV = round(sigma2geocv(0.381286), digits = 4)
WSV = round(sigma2geocv(0.122742), digits = 4)
@show BSV
@show WSV;
BSV = 0.3956
WSV = 0.1232
While this toy model and analysis are not indicative of the type of analysis we may do for actual bioequivalence trials, we still see here that we can estimate the variabilities. In particular, in certain settings it is important to understand the WSV as it indicates the nature of the product in question. For example, when the WSV (as a CV) is greater than \(30\%\), drugs are typically called highly variable.
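For reference, the CV values above follow the standard lognormal relation \(\text{CV} = \sqrt{e^{\sigma^2} - 1}\). The one-liner below is our own re-implementation for illustration; it reproduces the sigma2geocv values shown above:

```julia
# Geometric CV from a log-scale standard deviation, assuming lognormality:
# CV = sqrt(exp(σ²) − 1).
geocv(σ) = sqrt(exp(σ^2) - 1)

println(round(geocv(0.381286), digits = 4))  # 0.3956, the BSV above
println(round(geocv(0.122742), digits = 4))  # 0.1232, the WSV above
```

Note that for small σ the CV is approximately σ itself, which is why the WSV of 0.1232 is so close to the standard deviation 0.122742.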
Design considerations for estimation of variability
A key advantage of crossover designs over parallel designs is that they can account for the large, systematic differences between subjects. By comparing the test and reference formulations within the same person, we effectively remove much of the between-subject variability from the analysis.
However, a standard 2×2 crossover design only allows us to estimate a single, combined measure of the within-subject variability. It cannot distinguish whether the test formulation is inherently more or less variable than the reference formulation.
To estimate the within-subject variability for each formulation separately, we need to administer the same formulation to a subject more than once. This is the primary motivation for replicate designs. Such designs are also useful for drugs that are classified as highly variable drugs (HVD) where there is a high within subject variability. They are also useful for narrow therapeutic index drugs (NTID). Both of these subjects are covered in detail in further units.
8 Replicate designs
A replicate design is a design where a formulation is administered more than once to a subject. There are two categories here:
- A fully replicate design has both (or all) formulations replicated.
- A partial replicate design has only one of the formulations replicated. Note that in this case, it is typically the reference formulation which is replicated.
Fully replicate design
This is a common fully replicate design:
Sequence | Period 1 | Period 2 | Period 3 | Period 4 |
---|---|---|---|---|
TRTR | T | R | T | R |
RTRT | R | T | R | T |
Observe that in each of the sequences TRTR and RTRT, we administer both formulations R and T. Since there are two measurements of each formulation per subject, with this design we can estimate the within subject variability both for R and for T.
In fact, this design is recommended for NTID products as in FDA (2012), as well as for HVD products as in FDA (2011).
With four periods there is obviously more of a chance of encountering dropouts, since subjects are required to participate in the study for longer durations. Nevertheless, such designs generally require a smaller study size than two period designs.
A different variation of this \(2 \times 4 \times 2\) design assigns formulations in a slightly different order than above:
Sequence | Period 1 | Period 2 | Period 3 | Period 4 |
---|---|---|---|---|
RTTR | R | T | T | R |
TRRT | T | R | R | T |
Observe that in both of these variations, subjects in each sequence receive both formulations twice, and each period contains both formulations.
For illustration, here is pumas_be applied to both of these variants of the \(2 \times 4 \times 2\) design (click the tabs to switch between the designs):
be_data = dataset("bioequivalence/RTRT_TRTR/SLTGSF2020_DS12")
pumas_be(be_data, endpoint = :PK)
Observation Counts | ||||
Sequence ╲ Period | 1 | 2 | 3 | 4 |
RTRT | 38 | 38 | 36 | 37 |
TRTR | 39 | 38 | 34 | 38 |
Paradigm: Replicated crossover that supports reference scaling | ||||
Model: Mixed model (unequal variance) | ||||
Criteria: Standard ABE | ||||
Endpoint: PK | ||||
Formulations: Reference(R), Test(T) | ||||
Results(PK) | Assessment | Criteria | ||
R | Geometric Marginal Mean | 0.7645 | ||
Geometric Naive Mean | 0.7588 | |||
T | Geometric Marginal Mean | 0.9105 | ||
Geometric Naive Mean | 0.883 | |||
Geometric Mean T/R Ratio (%) | 119.1 | |||
Degrees of Freedom | 72.68 | |||
90% Confidence Interval (%) | [89.13, 159.2] | Fail | CI ⊆ [80, 125] | |
Variability | CVᵣ (%) | σ̂ᵣ | 221.5 | 1.3327 | ||
CVₜ (%) | σ̂ₜ | 288.9 | 1.495 | |||
Variability Ratio (%) | 112.2 | |||
ANOVA | Formulation (p-value) | 0.3184 | ||
Sequence (p-value) | 0.2292 | |||
Period (p-value) | 0 | |||
be_data = dataset("bioequivalence/RTTR_TRRT/CL2009_9_4_1")
pumas_be(be_data, endpoint = :AUC)
Observation Counts | ||||
Sequence ╲ Period | 1 | 2 | 3 | 4 |
RTTR | 5 | 5 | 5 | 5 |
TRRT | 5 | 5 | 4 | 4 |
Paradigm: Replicated crossover that supports reference scaling | ||||
Model: Mixed model (unequal variance) | ||||
Criteria: Standard ABE | ||||
Endpoint: AUC | ||||
Formulations: Reference(R), Test(T) | ||||
Results(AUC) | Assessment | Criteria | ||
R | Geometric Marginal Mean | 65.41 | ||
Geometric Naive Mean | 65.1 | |||
T | Geometric Marginal Mean | 76.33 | ||
Geometric Naive Mean | 76.73 | |||
Geometric Mean T/R Ratio (%) | 116.7 | |||
Degrees of Freedom | 19.57 | |||
90% Confidence Interval (%) | [98.19, 138.8] | Fail | CI ⊆ [80, 125] | |
Variability | CVᵣ (%) | σ̂ᵣ | 39.62 | 0.3818 | ||
CVₜ (%) | σ̂ₜ | 32.0 | 0.3122 | |||
Variability Ratio (%) | 81.77 | |||
ANOVA | Formulation (p-value) | 0.1386 | ||
Sequence (p-value) | 0.2507 | |||
Period (p-value) | 0.02269 | |||
As you can see, in both cases the output now presents estimates of CVᵣ and CVₜ for the within-subject variability of the reference and test products respectively. Such estimates are not attainable without replication. Also note that the model is a Mixed model with (unequal variance). We discuss these issues in future units.
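The CV values in these tables are obtained from the reported log-scale standard deviations via the standard log-normal relation \(CV = \sqrt{e^{\sigma^2} - 1}\). A minimal base-Julia sketch, using the σ̂ᵣ and σ̂ₜ values from the first tab above:

```julia
# Within-subject CV (%) from a log-scale standard deviation (log-normal relation).
cv_from_sigma(σ) = 100 * sqrt(exp(σ^2) - 1)

round(cv_from_sigma(1.3327), digits = 1)  # 221.5, matching CVᵣ above
round(cv_from_sigma(1.495), digits = 1)   # 288.9, matching CVₜ above
```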
The FDA_HighlyVariable and FDA_NarrowTherapeuticIndex specifications
Future units focus on the topics of highly variable drugs, narrow therapeutic index drugs, and reference scaling approaches. At this point, let us just appreciate that we often (but not always) use such designs with reference scaling, an approach to bioequivalence used for highly variable drugs and narrow therapeutic index drugs.
To carry out such an analysis with pumas_be, we specify FDA_HighlyVariable or FDA_NarrowTherapeuticIndex as the second argument. Here is what the output looks like (click the tabs to switch between FDA_HighlyVariable and FDA_NarrowTherapeuticIndex).
be_data = dataset("bioequivalence/RTRT_TRTR/SLTGSF2020_DS12")
pumas_be(be_data, endpoint = :PK, FDA_HighlyVariable)
Observation Counts | ||||
Sequence ╲ Period | 1 | 2 | 3 | 4 |
RTRT | 38 | 38 | 36 | 37 |
TRTR | 39 | 38 | 34 | 38 |
Paradigm: Replicated crossover that supports reference scaling | ||||
Model: Mixed model (unequal variance) | ||||
Criteria: FDA RSABE for HV | ||||
Endpoint: PK | ||||
Formulations: Reference(R), Test(T) | ||||
Results(PK) | Assessment | Criteria | ||
R | Geometric Marginal Mean | 0.7645 | ||
Geometric Naive Mean | 0.7588 | |||
T | Geometric Marginal Mean | 0.9105 | ||
Geometric Naive Mean | 0.883 | |||
Geometric Mean T/R Ratio (%) | 119.1 | Pass | GMR ∈ [80, 125] | |
Degrees of Freedom | 72.68 | |||
90% Confidence Interval (%) | [89.13, 159.2] | |||
Variability | CVᵣ (%) | σ̂ᵣ | 221.5 | 1.3327 | OK for RS | CVᵣ ≥ Minimal CVᵣ |
CVₜ (%) | σ̂ₜ | 288.9 | 1.495 | |||
Variability Ratio (%) | 112.2 | |||
ANOVA | Formulation (p-value) | 0.3184 | ||
Sequence (p-value) | 0.2292 | |||
Period (p-value) | 0 | |||
Reference Scaling Params | Reference Scaling Constant | 0.7967 | ||
Minimal CVᵣ for Reference Scaling (%) | 30.0 | 0.294 | |||
Reference Scaling Analysis | Geometric Mean T/R Ratio (%) | 131.4 | ||
Standard Error (Log Scale) | 0.1777 | |||
90% Confidence Interval (%) | [97.69, 176.7] | |||
Degrees of Freedom | 71 | |||
Howe's Approx RSABE Stat (95%) | -0.9468 | Pass | ≤ 0 | |
be_data = dataset("bioequivalence/RTTR_TRRT/CL2009_9_4_1")
pumas_be(be_data, endpoint = :AUC, FDA_NarrowTherapeuticIndex)
Observation Counts | ||||
Sequence ╲ Period | 1 | 2 | 3 | 4 |
RTTR | 5 | 5 | 5 | 5 |
TRRT | 5 | 5 | 4 | 4 |
Paradigm: Replicated crossover that supports reference scaling | ||||
Model: Mixed model (unequal variance) | ||||
Criteria: FDA RSABE for NTI | ||||
Endpoint: AUC | ||||
Formulations: Reference(R), Test(T) | ||||
Results(AUC) | Assessment | Criteria | ||
R | Geometric Marginal Mean | 65.41 | ||
Geometric Naive Mean | 65.1 | |||
T | Geometric Marginal Mean | 76.33 | ||
Geometric Naive Mean | 76.73 | |||
Geometric Mean T/R Ratio (%) | 116.7 | |||
Degrees of Freedom | 19.57 | |||
90% Confidence Interval (%) | [98.19, 138.8] | Fail | CI ⊆ [80, 125] | |
Variability | CVᵣ (%) | σ̂ᵣ | 39.62 | 0.3818 | ||
CVₜ (%) | σ̂ₜ | 32.0 | 0.3122 | |||
Variability Ratio (%) | 81.77 | |||
ANOVA | Formulation (p-value) | 0.1386 | ||
Sequence (p-value) | 0.2507 | |||
Period (p-value) | 0.02269 | |||
Reference Scaling Params | Reference Scaling Constant | 1.11 | ||
Reference Scaling Analysis | Geometric Mean T/R Ratio (%) | 118.4 | ||
Standard Error (Log Scale) | 0.0649 | |||
90% Confidence Interval (%) | [104.8, 134] | |||
Degrees of Freedom | 7 | |||
Howe's Approx RSABE Stat (95%) | -0.03577 | Fail | ≤ 0 | |
Variability Ratio Quantile (95%) | 1.592 | Pass | ≤ 2.5 | |
We’ll analyze these approaches and the complete output arising from such specifications in future units.
Partial replicate design
Sometimes, our key motivation for a replicate design is estimating the within subject variability of the reference, and the within subject variability of the test product is not as critical.
For this we can use a partial replicate design. Here is one example:
Sequence | Period 1 | Period 2 | Period 3 |
---|---|---|---|
RRT | R | R | T |
RTR | R | T | R |
TRR | T | R | R |
Here is pumas_be applied to data from such a design.
be_data = dataset("bioequivalence/RRT_RTR_TRR/SLTGSF2020_DS04")
pumas_be(be_data, endpoint = :PK)
Observation Counts | |||
Sequence ╲ Period | 1 | 2 | 3 |
RRT | 17 | 17 | 17 |
RTR | 17 | 17 | 17 |
TRR | 17 | 17 | 17 |
Paradigm: Replicated crossover that supports reference scaling | ||||
Model: Mixed model (unequal variance) | ||||
Criteria: Standard ABE | ||||
Endpoint: PK | ||||
Formulations: Reference(R), Test(T) | ||||
Results(PK) | Assessment | Criteria | ||
R | Geometric Marginal Mean | 41.58 | ||
Geometric Naive Mean | 41.58 | |||
T | Geometric Marginal Mean | 57.06 | ||
Geometric Naive Mean | 57.06 | |||
Geometric Mean T/R Ratio (%) | 137.2 | |||
Degrees of Freedom | 48.95 | |||
90% Confidence Interval (%) | [118.7, 158.6] | Fail | CI ⊆ [80, 125] | |
Variability | CVᵣ (%) | σ̂ᵣ | 61.96 | 0.57 | ||
ANOVA | Formulation (p-value) | 0.0006 | ||
Sequence (p-value) | 0.1543 | |||
Period (p-value) | 0.8618 | |||
As you can see, the estimated variability is CVᵣ, which stands for the within-subject variability of the reference formulation R.
Again, we can use FDA_HighlyVariable for reference scaling; details are in future units.
pumas_be(be_data, FDA_HighlyVariable, endpoint = :PK)
Observation Counts | |||
Sequence ╲ Period | 1 | 2 | 3 |
RRT | 17 | 17 | 17 |
RTR | 17 | 17 | 17 |
TRR | 17 | 17 | 17 |
Paradigm: Replicated crossover that supports reference scaling | ||||
Model: Mixed model (unequal variance) | ||||
Criteria: FDA RSABE for HV | ||||
Endpoint: PK | ||||
Formulations: Reference(R), Test(T) | ||||
Results(PK) | Assessment | Criteria | ||
R | Geometric Marginal Mean | 41.58 | ||
Geometric Naive Mean | 41.58 | |||
T | Geometric Marginal Mean | 57.06 | ||
Geometric Naive Mean | 57.06 | |||
Geometric Mean T/R Ratio (%) | 137.2 | Fail | GMR ∈ [80, 125] | |
Degrees of Freedom | 48.95 | |||
90% Confidence Interval (%) | [118.7, 158.6] | |||
Variability | CVᵣ (%) | σ̂ᵣ | 61.96 | 0.57 | OK for RS | CVᵣ ≥ Minimal CVᵣ |
ANOVA | Formulation (p-value) | 0.0006 | ||
Sequence (p-value) | 0.1543 | |||
Period (p-value) | 0.8618 | |||
Reference Scaling Params | Reference Scaling Constant | 0.7967 | ||
Minimal CVᵣ for Reference Scaling (%) | 30.0 | 0.294 | |||
Reference Scaling Analysis | Geometric Mean T/R Ratio (%) | 137.2 | ||
Standard Error (Log Scale) | 0.0866 | |||
90% Confidence Interval (%) | [118.7, 158.7] | |||
Degrees of Freedom | 48 | |||
Howe's Approx RSABE Stat (95%) | -0.02774 | Pass | ≤ 0 | |
Note that we cannot use FDA_NarrowTherapeuticIndex as this requires a fully replicate design. Also, we can only use FDA_HighlyVariable with certain partial replicate designs. For example, look at supports_rsabe for these two designs:
be_data = dataset("bioequivalence/RRT_RTR_TRR/SLTGSF2020_DS04")
detect_design(be_data)
(design = "RRT|RTR|TRR",
num_sequences = 3,
num_periods = 3,
num_formulations = 2,
reference_formulation = 'R',
test_formulations = ['T'],
replicated = partially_replicated,
supports_rsabe = true,
crossover = true,)
be_data = dataset("bioequivalence/RTR_TRR/SLTGSF2020_DS22")
detect_design(be_data)
(design = "RTR|TRR",
num_sequences = 2,
num_periods = 3,
num_formulations = 2,
reference_formulation = 'R',
test_formulations = ['T'],
replicated = partially_replicated,
supports_rsabe = false,
crossover = true,)
Repeated (replicated) parallel design
Another type of replicated design is a parallel design of the form:
Sequence | Period 1 | Period 2 |
---|---|---|
RR | R | R |
TT | T | T |
be_data = dataset(joinpath("bioequivalence", "RR_TT", "From_PJ2017_4_2"));
design = detect_design(be_data)
(design = "RR|TT",
num_sequences = 2,
num_periods = 2,
num_formulations = 2,
reference_formulation = 'R',
test_formulations = ['T'],
replicated = fully_replicated,
supports_rsabe = false,
crossover = false,)
While this design does not have some of the benefits of a crossover design, it does allow us to estimate the within-subject variability both for R and for T:
pumas_be(be_data, endpoint = :PK)
Observation Counts | ||
Sequence ╲ Period | 1 | 2 |
RR | 34 | 35 |
TT | 38 | 38 |
Paradigm: Parallel design | ||||
Model: T test (equal variance) | ||||
Criteria: Standard ABE | ||||
Endpoint: PK | ||||
Formulations: Reference(R), Test(T) | ||||
Results(PK) | Assessment | Criteria | ||
R | Geometric Naive Mean | 3.72 | ||
T | Geometric Naive Mean | 3.891 | ||
Geometric Mean T/R Ratio (%) | 104.6 | |||
Degrees of Freedom | 143 | |||
90% Confidence Interval (%) | [95.25, 114.9] | Pass | CI ⊆ [80, 125] | |
Variability | CVᵣ (%) | σ̂ᵣ | 13.29 | 0.1323 | ||
CVₜ (%) | σ̂ₜ | 8.14 | 0.0813 | |||
As you can see, both CVᵣ and CVₜ are estimated. A drawback, obviously, is that subject effects cannot be factored out, since a subject is administered either R or T.
9 Multiple formulations (more than 2)
Some studies have more than two formulations. We encode these situations as 1RnT, nR1T, and nRnT (these are not industry-standard encodings but are useful for the discussion here). Here are the types of cases:
- Multiple test formulations (1RnT): In this case there is a single reference and multiple tests (e.g. n=2 or n=3). As an example, this can be a situation where we consider multiple dosage forms of the test formulation and want to show that they are all bioequivalent to the reference product.
- Multiple reference formulations (nR1T): In this case we want to compare a single test product to two or more references. For example, in one trial we want to compare the test product to both a European reference and a US reference (n=2).
- Combinations of test and reference (nRnT): In this case we have more than one test product and more than one reference product. For example, there are two dosage levels (low and high) and a test and reference are tested at each dosage (n=2, and thus there are 4 formulations in total).
Often, such scenarios are carried out in a single bioequivalence trial where the design accounts for the multiple formulations. For example, in the 1RnT case the reference is administered as one formulation and each of the test products is another formulation. In the nR1T case the test product is administered as one formulation and each of the reference products is another formulation. Finally, in the nRnT case, typically with n=2, there would be four formulations, two of which are reference formulations and the other two are the matching test formulations.
Three formulations design
In the case of three formulations, R, S, and T, one popular design is:
Sequence | Period 1 | Period 2 | Period 3 |
---|---|---|---|
RST | R | S | T |
RTS | R | T | S |
SRT | S | R | T |
STR | S | T | R |
TRS | T | R | S |
TSR | T | S | R |
Observe that there are \(6\) possible permutations of the three formulations R, S, and T, and this design uses each possible permutation as a sequence.
- If we consider this design for the 1RnT case then it is sensible to encode R as the reference formulation while the test formulations are S and T.
- If we consider this design for the nR1T case then it is sensible to encode T as the test formulation while the reference formulations are R and S.
Four formulations design
In the case of four formulations, denoted A, B, C, and D, with A the reference, a common design is the following:
Sequence | Period 1 | Period 2 | Period 3 | Period 4 |
---|---|---|---|---|
ADBC | A | D | B | C |
BACD | B | A | C | D |
CBDA | C | B | D | A |
DCAB | D | C | A | B |
This is a Williams design, also known as a balanced Latin square crossover design.
This design ensures that:
- Each formulation appears exactly once in each period and each sequence. This balances period and sequence effects.
- Each formulation both precedes and follows every other formulation the same number of times across all sequences. This distributes potential first-order carryover effects evenly.
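These balance properties can be checked mechanically. Below is a small base-Julia sketch that tallies first-order carryover (ordered adjacent) pairs for the Williams sequences ADBC, BACD, CBDA, DCAB (the design used in the PJ2017_4_6 example later in this unit); in a Williams design each ordered pair of distinct formulations appears the same number of times:

```julia
# Count how often each formulation immediately follows each other formulation
# across all sequences of a candidate Williams design.
sequences = ["ADBC", "BACD", "CBDA", "DCAB"]
follows = Dict{Tuple{Char,Char},Int}()
for seq in sequences, i in 1:length(seq)-1
    pair = (seq[i], seq[i+1])
    follows[pair] = get(follows, pair, 0) + 1
end

length(follows), all(==(1), values(follows))  # (12, true): all 12 ordered pairs, once each
```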
There can be multiple interpretations:
- If we consider this design for the 1RnT case then it is sensible to encode A as the reference formulation while the test formulations are B, C, and D.
- If we consider this design for the nR1T case then it is sensible to encode A as the test formulation while the reference formulations are B, C, and D.
- If we consider this design for the nRnT case, then we can consider A as a reference compared to the test B, and C as a reference compared to the test D.
Multiple comparisons
In each case where there are multiple formulations there are also multiple comparisons. This means that we carry out multiple hypothesis tests. Each such hypothesis test considers a different pair of formulations and has its own null hypothesis \(H_0\) (non-bioequivalence) and alternative hypothesis \(H_1\) (bioequivalence holds).
As an example for n=2:
- 1RnT: We compare the single reference formulation (R) to each of the test formulations (S and T). So we have:
\[ {\cal T}_{\texttt{comparison 1}} = \begin{cases} H_0: & \texttt{R vs T}~\text{not bioequivalent},\\ H_1: & \texttt{R vs T}~\text{bioequivalent}. \end{cases} \]
\[ {\cal T}_{\texttt{comparison 2}} = \begin{cases} H_0: & \texttt{R vs S}~\text{not bioequivalent},\\ H_1: & \texttt{R vs S}~\text{bioequivalent}. \end{cases} \]
- nR1T: We compare the single test formulation (T) to multiple reference formulations (R and S). So we have:
\[ {\cal T}_{\texttt{comparison 1}} = \begin{cases} H_0: & \texttt{R vs T}~\text{not bioequivalent},\\ H_1: & \texttt{R vs T}~\text{bioequivalent}. \end{cases} \]
\[ {\cal T}_{\texttt{comparison 2}} = \begin{cases} H_0: & \texttt{S vs T}~\text{not bioequivalent},\\ H_1: & \texttt{S vs T}~\text{bioequivalent}. \end{cases} \]
- nRnT: We compare reference A to test B and reference C to test D. Hence for each reference formulation there is an \(H_0\) and an \(H_1\). So we have:
\[ {\cal T}_{\texttt{comparison 1}} = \begin{cases} H_0: & \texttt{A vs B}~\text{not bioequivalent},\\ H_1: & \texttt{A vs B}~\text{bioequivalent}. \end{cases} \]
\[ {\cal T}_{\texttt{comparison 2}} = \begin{cases} H_0: & \texttt{C vs D}~\text{not bioequivalent},\\ H_1: & \texttt{C vs D}~\text{bioequivalent}. \end{cases} \]
Alpha (\(\alpha\)) adjustment
Recalling that hypothesis tests control the Type-I error via \(\alpha\), if we carry out multiple comparisons, each with its own \(\alpha\), there is a possibility of increasing the probability of Type-I error.
As a hypothetical example, assume we are in the 1RnT case with \(n=10\).
This is hypothetical because we will rarely try more than three test formulations in a single trial. Now say bioequivalence does not hold for any of the test formulations; that is, \(H_0\) holds for all 10 comparisons. Still, let us see the probability of wrongfully rejecting at least one of the hypotheses and claiming one of the formulations is bioequivalent:
- Under \(H_0\) there is a probability of \(\alpha\) to reject \(H_0\) wrongfully for a specific test formulation.
- Thus there is a probability of \(1-\alpha\) of not making a Type-I error for that test formulation.
- If we assume the tests are independent, then there is a probability of \((1-\alpha)^{10}\) of not making any Type-I error.
- Hence the probability of making at least one Type-I error is \(1-(1-\alpha)^{10}\).
Let us see this for the typical \(\alpha = 5\%\):
α = 0.05
consumer_risk = 1 - (1 - α)^10
0.4012630607616213
Hence the consumer’s risk from that trial is inflated from \(5\%\) to about \(40\%\)!
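As a sanity check on the closed-form value, here is a quick Monte Carlo sketch in base Julia (assuming the ten tests are independent, with p-values uniform under \(H_0\)):

```julia
using Random

Random.seed!(1234)
α, n, trials = 0.05, 10, 100_000
# Simulate n uniform p-values per trial and count trials with ≥ 1 false rejection.
inflated = count(_ -> any(rand(n) .< α), 1:trials) / trials  # ≈ 0.40
```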
Now in practice we won’t carry out n=10 comparisons but rather n=2 or n=3. Still, in those cases:
consumer_risk_2_comparisons = round(1 - (1 - α)^2, digits = 4)
consumer_risk_3_comparisons = round(1 - (1 - α)^3, digits = 4);
@show consumer_risk_2_comparisons
@show consumer_risk_3_comparisons;
consumer_risk_2_comparisons = 0.0975
consumer_risk_3_comparisons = 0.1426
Hence we are still at risk of increasing the probability of some Type-I error to about \(9.8\%\) or about \(14.2\%\).
A simple approach for mitigating this increase in the consumer’s risk is the Bonferroni correction, where we adjust \(\alpha\) by dividing it by the number of comparisons.
See how this approach works for n=2, n=3, and (the hypothetical case of) n=10:
consumer_risk_2_comparisons_adjusted = round(1 - (1 - α / 2)^2, digits = 6)
consumer_risk_3_comparisons_adjusted = round(1 - (1 - α / 3)^3, digits = 6)
consumer_risk_10_comparisons_adjusted = round(1 - (1 - α / 10)^10, digits = 6)
@show consumer_risk_2_comparisons_adjusted
@show consumer_risk_3_comparisons_adjusted
@show consumer_risk_10_comparisons_adjusted;
consumer_risk_2_comparisons_adjusted = 0.049375
consumer_risk_3_comparisons_adjusted = 0.049171
consumer_risk_10_comparisons_adjusted = 0.04889
So, you can see that by adjusting alpha we are able to keep the consumer risk roughly at \(5\%\).
Note however that such adjustments to \(\alpha\) reduce the power of each of the tests. Hence while we are able to keep the consumer’s risk roughly fixed, we end up increasing the producer’s risk. More on this in future units where we discuss power analysis.
The ICH guidance specifically suggests when to adjust \(\alpha\) and when not to. For this consider Section 2.2.5 in International council for harmonisation of technical requirements for pharmaceuticals for human use (2024):
For the case of nR1T (this is called “Multiple Comparator Products” in that guidance) it is recommended not to carry out an adjustment:
In studies with multiple comparator products, multiplicity correction, i.e., alpha adjustment, is not needed because comparator products are considered independent and region-specific. Decisions will be made independently about a test product relative to a single comparator product within a single jurisdiction.
However, in the 1RnT case (“multiple test products”) it is suggested:
The need to apply multiplicity correction in pivotal trials depends on the underlying objectives of the trial:
- If the objective is to achieve BE for all test formulations vs. the comparator product, no alpha adjustment is needed.
- If the objective is to show BE for any of the test formulations, multiplicity (alpha) adjustment may be needed.
In any case, as with any design choice and statistical methodology choice:
The objective of the trial and method for multiplicity correction should be pre-specified in the study protocol.
Models and data use
In each of the cases 1RnT, nR1T, and nRnT we have a choice whether to use a single statistical model based on all the collected data of the design, or to break up the data into separate models. As an example, consider 1RnT with n=2. We can compare the reference formulation to each of the test formulations based on a single model, or we can separate the data and consider two separate models.
That is, we can have one model that generates all the comparisons (in linear models these are sometimes called contrasts), or we can create a different model for each comparison. The one model approach has the benefit of using all data together and thus potentially enables better estimates and higher statistical power. However this is also a drawback because comparisons between two formulations are also influenced by a third (or even fourth) formulation that is not relevant for that comparison.
While there are several approaches, our general suggested approach here is:
- 1RnT: Use a single model comprising all the data. Pumas’ pumas_be function supports this.
- nR1T: Use separate models, each filtering the data down to one reference and the test, converting it to a \(2 \times 2\) crossover design. Then carry out the analysis for each filtered dataset separately.
- nRnT: Similarly, separate out the models, with each model converting the data to a \(2 \times 2\) design, and then carry out the analysis for each filtered dataset separately.
Let us now see some examples.
A 1RnT example with n=2
We’ll use the bioequivalence/RST_RTS_SRT_STR_TRS_TSR/PJ2017_4_5 dataset from PharmaDatasets.
This data is from Example 4.5 of Patterson and Jones (2017). Note that in that book it is treated as an nR1T case, in contrast to our example below. Let’s first load the dataset:
be_data = dataset("bioequivalence/RST_RTS_SRT_STR_TRS_TSR/PJ2017_4_5");
Now let’s apply pumas_be, which considers the study as 1RnT by default. Following the ICH guidance above, let’s assume we are in case (b) where the objective is “to show BE for any of the test formulations”. In this case let’s carry out \(\alpha\) adjustment. In pumas_be this is done via the level keyword argument (not to be confused with the alpha and level_y keyword arguments, which have other uses).
With a Bonferroni correction, since we carry out two comparisons, we want to halve \(\alpha\) from \(5\%\) to \(2.5\%\). So the hypothesis tests are to be carried out with \(\alpha = 0.025\).
However, the level keyword specifies the width of the confidence intervals. As you may recall from Unit 3, carrying out a TOST is based on a \(100\times(1-2\alpha)\%\) confidence interval. Hence the confidence interval width we want is \(95\%\) (in contrast to the default \(90\%\)). Thus we set level = 0.95.
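The arithmetic behind this choice can be written out explicitly (base Julia):

```julia
α = 0.05
α_adjusted = α / 2              # Bonferroni with two comparisons → α = 0.025
level = 1 - 2 * α_adjusted      # TOST uses a 100(1 − 2α)% CI → 0.95
```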
Let us see it both for AUC and Cmax (click on the tabs to switch):
pumas_be(be_data, level = 0.95)
Observation Counts | |||
Sequence ╲ Period | 1 | 2 | 3 |
RST | 8 | 8 | 8 |
RTS | 11 | 11 | 11 |
SRT | 11 | 11 | 11 |
STR | 10 | 10 | 10 |
TRS | 10 | 10 | 10 |
TSR | 10 | 10 | 10 |
Paradigm: 3 formulations | ||||
Model: Linear model | ||||
Criteria: Standard ABE | ||||
Endpoint: AUC | ||||
Formulations: Reference(R), Test(S and T) | ||||
Results(AUC) | Assessment | Criteria | ||
R | Geometric Marginal Mean | 3808 | ||
Geometric Naive Mean | 3824 | |||
S | Geometric Marginal Mean | 5346 | ||
Geometric Naive Mean | 5412 | |||
Geometric Mean S/R Ratio (%) | 140.4 | |||
Degrees of Freedom | 114 | |||
95% Confidence Interval (%) | [130.1, 151.5] | Fail | CI ⊆ [80, 125] | |
T | Geometric Marginal Mean | 4426 | ||
Geometric Naive Mean | 4426 | |||
Geometric Mean T/R Ratio (%) | 116.2 | |||
Degrees of Freedom | 114 | |||
95% Confidence Interval (%) | [107.7, 125.5] | Fail | CI ⊆ [80, 125] | |
Variability | CV (%) | σ̂ | 21.22 | 0.2099 | ||
pumas_be(be_data, level = 0.95, endpoint = :Cmax)
Observation Counts | |||
Sequence ╲ Period | 1 | 2 | 3 |
RST | 9 | 9 | 8 |
RTS | 11 | 11 | 11 |
SRT | 11 | 11 | 11 |
STR | 10 | 10 | 10 |
TRS | 11 | 11 | 10 |
TSR | 10 | 10 | 10 |
Paradigm: 3 formulations | ||||
Model: Linear model | ||||
Criteria: Standard ABE | ||||
Endpoint: Cmax | ||||
Formulations: Reference(R), Test(S and T) | ||||
Results(Cmax) | Assessment | Criteria | ||
R | Geometric Marginal Mean | 832.4 | ||
Geometric Naive Mean | 837.5 | |||
S | Geometric Marginal Mean | 1327 | ||
Geometric Naive Mean | 1340 | |||
Geometric Mean S/R Ratio (%) | 159.4 | |||
Degrees of Freedom | 118 | |||
95% Confidence Interval (%) | [143.6, 176.9] | Fail | CI ⊆ [80, 125] | |
T | Geometric Marginal Mean | 1082 | ||
Geometric Naive Mean | 1078 | |||
Geometric Mean T/R Ratio (%) | 129.9 | |||
Degrees of Freedom | 118 | |||
95% Confidence Interval (%) | [117.1, 144.2] | Fail | CI ⊆ [80, 125] | |
Variability | CV (%) | σ̂ | 29.69 | 0.2907 | ||
You can see that Pumas detects the Paradigm is 3 formulations and then assigns the tests and reference accordingly. For example, for AUC, the estimated S/R GMR is \(140.4\%\) and the confidence interval (notice it is 95%) is [130.1, 151.5], which falls outside of the desired [80, 125] range. We will revisit more details of the output and the model in future units.
An nR1T example with n=2
Again using the bioequivalence/RST_RTS_SRT_STR_TRS_TSR/PJ2017_4_5 dataset, let us now consider it as nR1T (multiple references). Following the approach outlined above, let us transform the dataset into separate \(2 \times 2\) datasets, data_ST_2by2 and data_RT_2by2:
be_data = dataset("bioequivalence/RST_RTS_SRT_STR_TRS_TSR/PJ2017_4_5")
@rtransform! be_data :formulation = :sequence[:period];
data_ST = @rsubset be_data :formulation in ['S', 'T'];
data_ST_2by2 = @rtransform data_ST :sequence = replace(:sequence, "R" => "");
@rtransform! data_ST_2by2 :period = findfirst(==(:formulation), :sequence);
data_RT = @rsubset be_data :formulation in ['R', 'T'];
data_RT_2by2 = @rtransform data_RT :sequence = replace(:sequence, "S" => "");
@rtransform! data_RT_2by2 :period = findfirst(==(:formulation), :sequence);
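To see what these row-wise transformations do, here is the same logic for a single hypothetical subject-period row (sequence SRT, period 2) when building data_RT_2by2, in base Julia only:

```julia
# One subject-period row: sequence "SRT", period 2.
seq, period = "SRT", 2
formulation = seq[period]                        # the formulation given in period 2: 'R'
new_seq = replace(seq, "S" => "")                # drop the S arm → "RT"
new_period = findfirst(==(formulation), new_seq) # 'R' now sits in period 1 of the 2×2 design
```

This relabeling works because after dropping one arm, each remaining formulation appears exactly once in the shortened sequence.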
We can inspect the design of the datasets we formed with this transformation:
detect_design(data_ST_2by2)
(design = "RT|TR",
num_sequences = 2,
num_periods = 2,
num_formulations = 2,
reference_formulation = 'S',
test_formulations = ['T'],
replicated = non_replicated,
supports_rsabe = false,
crossover = true,)
detect_design(data_RT_2by2)
(design = "RT|TR",
num_sequences = 2,
num_periods = 2,
num_formulations = 2,
reference_formulation = 'R',
test_formulations = ['T'],
replicated = non_replicated,
supports_rsabe = false,
crossover = true,)
Note that in both cases the design string is RT|TR (and does not use S). This is just due to the letters R and T being the standard way to specify a design inside Pumas.
Now we can carry out the comparisons (click on the tabs to switch between comparisons - note that in each case we present AUC followed by Cmax):
Note that at the moment, pumas_be’s output labels S as R in both cases, but it should be treated as S:
pumas_be(data_ST_2by2)
Observation Counts | ||
Sequence ╲ Period | 1 | 2 |
RT | 29 | 29 |
TR | 31 | 31 |
Paradigm: Non replicated crossover design | ||||
Model: Linear model | ||||
Criteria: Standard ABE | ||||
Endpoint: AUC | ||||
Formulations: Reference(R), Test(T) | ||||
Results(AUC) | Assessment | Criteria | ||
R | Geometric Marginal Mean | 5339 | ||
Geometric Naive Mean | 5412 | |||
T | Geometric Marginal Mean | 4459 | ||
Geometric Naive Mean | 4426 | |||
Geometric Mean T/R Ratio (%) | 83.52 | |||
Degrees of Freedom | 56 | |||
90% Confidence Interval (%) | [78.84, 88.48] | Fail | CI ⊆ [80, 125] | |
Variability | CV (%) | σ̂ | 18.73 | 0.1857 | ||
ANOVA | Formulation (p-value) | 0 | ||
Sequence (p-value) | 0.4313 | |||
Period (p-value) | 0.174 | |||
pumas_be(data_ST_2by2, endpoint = :Cmax)
Observation Counts | ||
Sequence ╲ Period | 1 | 2 |
RT | 30 | 29 |
TR | 32 | 31 |
Paradigm: Non replicated crossover design | ||||
Model: Linear model | ||||
Criteria: Standard ABE | ||||
Endpoint: Cmax | ||||
Formulations: Reference(R), Test(T) | ||||
Results(Cmax) | Assessment | Criteria | ||
R | Geometric Marginal Mean | 1323 | ||
Geometric Naive Mean | 1340 | |||
T | Geometric Marginal Mean | 1091 | ||
Geometric Naive Mean | 1078 | |||
Geometric Mean T/R Ratio (%) | 82.52 | |||
Degrees of Freedom | 58 | |||
90% Confidence Interval (%) | [76.17, 89.41] | Fail | CI ⊆ [80, 125] | |
Variability | CV (%) | σ̂ | 26.7 | 0.2624 | ||
ANOVA | Formulation (p-value) | 0.00018 | ||
Sequence (p-value) | 0.8896 | |||
Period (p-value) | 0.5804 | |||
pumas_be(data_RT_2by2)
Observation Counts | ||
Sequence ╲ Period | 1 | 2 |
RT | 30 | 30 |
TR | 30 | 30 |
Paradigm: Non replicated crossover design | ||||
Model: Linear model | ||||
Criteria: Standard ABE | ||||
Endpoint: AUC | ||||
Formulations: Reference(R), Test(T) | ||||
Results(AUC) | Assessment | Criteria | ||
R | Geometric Marginal Mean | 3815 | ||
Geometric Naive Mean | 3824 | |||
T | Geometric Marginal Mean | 4430 | ||
Geometric Naive Mean | 4426 | |||
Geometric Mean T/R Ratio (%) | 116.1 | |||
Degrees of Freedom | 57 | |||
90% Confidence Interval (%) | [108.7, 124] | Pass | CI ⊆ [80, 125] | |
Variability | CV (%) | σ̂ | 21.63 | 0.2139 | ||
ANOVA | Formulation (p-value) | 0.00036 | ||
Sequence (p-value) | 0.4096 | |||
Period (p-value) | 0.5645 | |||
pumas_be(data_RT_2by2, endpoint = :Cmax)
Observation Counts | ||
Sequence ╲ Period | 1 | 2 |
RT | 31 | 30 |
TR | 31 | 31 |
Paradigm: Non replicated crossover design | ||||
Model: Linear model | ||||
Criteria: Standard ABE | ||||
Endpoint: Cmax | ||||
Formulations: Reference(R), Test(T) | ||||
Results(Cmax) | Assessment | Criteria | ||
R | Geometric Marginal Mean | 837.5 | ||
Geometric Naive Mean | 837.5 | |||
T | Geometric Marginal Mean | 1083 | ||
Geometric Naive Mean | 1078 | |||
Geometric Mean T/R Ratio (%) | 129.4 | |||
Degrees of Freedom | 59 | |||
90% Confidence Interval (%) | [118.8, 140.8] | Fail | CI ⊆ [80, 125] | |
Variability | CV (%) | σ̂ | 28.56 | 0.28 | ||
ANOVA | Formulation (p-value) | 0 | ||
Sequence (p-value) | 0.2598 | |||
Period (p-value) | 0.07524 | |||
A 1RnT example with n=3
We’ll use the bioequivalence/ADBC_BACD_CBDA_DCAB/PJ2017_4_6 dataset from PharmaDatasets. This data is from Example 4.6 of Patterson and Jones (2017). Note that in that book it is treated as an nRnT case, in contrast to our analysis now, which considers it as having multiple tests.
be_data = dataset("bioequivalence/ADBC_BACD_CBDA_DCAB/PJ2017_4_6")
pumas_be(be_data)
Observation Counts | ||||
Sequence ╲ Period | 1 | 2 | 3 | 4 |
ADBC | 7 | 7 | 7 | 7 |
BACD | 7 | 7 | 7 | 7 |
CBDA | 7 | 7 | 7 | 7 |
DCAB | 7 | 7 | 7 | 7 |
Paradigm: 4 formulations | ||||
Model: Linear model | ||||
Criteria: Standard ABE | ||||
Endpoint: AUC | ||||
Formulations: Reference(A), Test(B, C and D) | ||||
Results(AUC) | Assessment | Criteria | ||
A | Geometric Marginal Mean | 367.9 | ||
Geometric Naive Mean | 367.9 | |||
B | Geometric Marginal Mean | 369.7 | ||
Geometric Naive Mean | 369.7 | |||
Geometric Mean B/A Ratio (%) | 100.5 | |||
Degrees of Freedom | 78 | |||
90% Confidence Interval (%) | [94.7, 106.6] | Pass | CI ⊆ [80, 125] | |
C | Geometric Marginal Mean | 2985 | ||
Geometric Naive Mean | 2985 | |||
Geometric Mean C/A Ratio (%) | 811.4 | |||
Degrees of Freedom | 78 | |||
90% Confidence Interval (%) | [764.8, 860.8] | Fail | CI ⊆ [80, 125] | |
D | Geometric Marginal Mean | 2879 | ||
Geometric Naive Mean | 2879 | |||
Geometric Mean D/A Ratio (%) | 782.6 | |||
Degrees of Freedom | 78 | |||
90% Confidence Interval (%) | [737.6, 830.2] | Fail | CI ⊆ [80, 125] | |
Variability | CV (%) | σ̂ | 13.35 | 0.1329 | ||
You may observe that in this example, formulations C and D fail, with GMR estimates and confidence intervals far outside the desired range. In this dataset this is because they were administered at a different dosage amount, in fact 8 times as high. To account for this, we transform the data by dividing the AUC by 8 whenever the formulation is C or D. This is done with the following code using @rtransform from the DataFramesMeta package.
transformed_data =
    @rtransform(be_data, :AUC = :sequence[:period] in ['C', 'D'] ? :AUC / 8 : :AUC);
be_result = pumas_be(transformed_data)
Observation Counts | ||||
Sequence ╲ Period | 1 | 2 | 3 | 4 |
ADBC | 7 | 7 | 7 | 7 |
BACD | 7 | 7 | 7 | 7 |
CBDA | 7 | 7 | 7 | 7 |
DCAB | 7 | 7 | 7 | 7 |
Paradigm: 4 formulations | ||||
Model: Linear model | ||||
Criteria: Standard ABE | ||||
Endpoint: AUC | ||||
Formulations: Reference(A), Test(B, C and D) | ||||
Results(AUC) | Assessment | Criteria | ||
A | Geometric Marginal Mean | 367.9 | ||
Geometric Naive Mean | 367.9 | |||
B | Geometric Marginal Mean | 369.7 | ||
Geometric Naive Mean | 369.7 | |||
Geometric Mean B/A Ratio (%) | 100.5 | |||
Degrees of Freedom | 78 | |||
90% Confidence Interval (%) | [94.7, 106.6] | Pass | CI ⊆ [80, 125] | |
C | Geometric Marginal Mean | 373.2 | ||
Geometric Naive Mean | 373.2 | |||
Geometric Mean C/A Ratio (%) | 101.4 | |||
Degrees of Freedom | 78 | |||
90% Confidence Interval (%) | [95.6, 107.6] | Pass | CI ⊆ [80, 125] | |
D | Geometric Marginal Mean | 359.9 | ||
Geometric Naive Mean | 359.9 | |||
Geometric Mean D/A Ratio (%) | 97.82 | |||
Degrees of Freedom | 78 | |||
90% Confidence Interval (%) | [92.2, 103.8] | Pass | CI ⊆ [80, 125] | |
Variability | CV (%) | σ̂ | 13.35 | 0.1329 | ||
As you can see, all formulations now pass. Such a transformation should, of course, only be applied when the experimental setup merits it.
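Why dividing by 8 works: scaling every observation by a constant scales the geometric mean by the same constant, so dose-normalizing C and D divides their GMRs by exactly 8. A quick numeric check, sketched in Python for illustration, using round numbers close to the geometric means reported in the tables above:

```python
# Geometric means loosely mirroring the tables above:
# formulation A at the 1x dose, formulation C at the 8x dose.
gm_A = 367.9
gm_C_raw = 2985.0

# The raw geometric mean ratio (in %) is far outside the [80, 125] window.
raw_gmr = 100 * gm_C_raw / gm_A

# Scaling every C observation by 1/8 scales its geometric mean by 1/8,
# so dose normalization divides the GMR by exactly 8.
adj_gmr = raw_gmr / 8

print(round(raw_gmr, 1), round(adj_gmr, 1))  # -> 811.4 101.4
```

The two printed values match the "Fail" and "Pass" GMRs reported before and after the transformation.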
An nR1T example with n=3
Let us now consider the same data as nR1T. For this, let us create three different datasets, each comparing the test D to one of the references A, B, and C.
We'll continue working with transformed_data from above. Let us create a data-processing helper function make_2by2by2_from_4by4by4 that converts the \(4 \times 4 \times 4\) design into a \(2 \times 2 \times 2\) design with the specified formulations:
function make_2by2by2_from_4by4by4(original_df, formulation_1, formulation_2)
    df = deepcopy(original_df)
    all_formulations = unique(vcat(unique.(df.sequence)...))
    formulations_to_drop = setdiff(all_formulations, [formulation_1, formulation_2])
    @rtransform! df :formulation = :sequence[:period]
    @rsubset! df :formulation in [formulation_1, formulation_2]
    @rtransform! df :sequence =
        replace(:sequence, formulations_to_drop[1] => "", formulations_to_drop[2] => "")
    @rtransform! df :period = findfirst(==(:formulation), :sequence)
    return df
end;
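The row-wise bookkeeping this helper performs can be sketched for a single subject. The following Python fragment (for illustration only; the sequence "BACD" is chosen as an example) mirrors how the retained letters define the new two-period sequence and how periods are re-numbered:

```python
sequence = "BACD"   # formulation letters, one per original period
keep = ["A", "D"]   # the two formulations we retain

# Keep only the periods whose formulation is retained; the letters
# that survive form the new two-period sequence.
new_sequence = "".join(f for f in sequence if f in keep)

# Re-number periods: a formulation's new period is its position in
# the reduced sequence.
new_period = {f: new_sequence.index(f) + 1 for f in keep}

print(new_sequence)      # AD
print(new_period["D"])   # 2 (D was period 4 in the original design)
```

The Julia helper does the same per row with @rtransform! and @rsubset!, dropping the unused formulation letters from each subject's sequence string.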
We now use this function for the three comparisons we consider. Here are the results for AUC:
data_AD_2by2 = make_2by2by2_from_4by4by4(transformed_data, 'A', 'D')
pumas_be(data_AD_2by2)
Observation Counts | ||
Sequence ╲ Period | 1 | 2 |
RT | 14 | 14 |
TR | 14 | 14 |
Paradigm: Non replicated crossover design | ||||
Model: Linear model | ||||
Criteria: Standard ABE | ||||
Endpoint: AUC | ||||
Formulations: Reference(R), Test(T) | ||||
Results(AUC) | Assessment | Criteria | ||
R | Geometric Marginal Mean | 367.9 | ||
Geometric Naive Mean | 367.9 | |||
T | Geometric Marginal Mean | 359.9 | ||
Geometric Naive Mean | 359.9 | |||
Geometric Mean T/R Ratio (%) | 97.82 | |||
Degrees of Freedom | 26 | |||
90% Confidence Interval (%) | [92.41, 103.5] | Pass | CI ⊆ [80, 125] | |
Variability | CV (%) | σ̂ | 12.52 | 0.1247 | ||
ANOVA | Formulation (p-value) | 0.514 | ||
Sequence (p-value) | 2.0e-5 | |||
Period (p-value) | 0.00891 | |||
data_BD_2by2 = make_2by2by2_from_4by4by4(transformed_data, 'B', 'D')
pumas_be(data_BD_2by2)
Observation Counts | ||
Sequence ╲ Period | 1 | 2 |
RT | 14 | 14 |
TR | 14 | 14 |
Paradigm: Non replicated crossover design | ||||
Model: Linear model | ||||
Criteria: Standard ABE | ||||
Endpoint: AUC | ||||
Formulations: Reference(R), Test(T) | ||||
Results(AUC) | Assessment | Criteria | ||
R | Geometric Marginal Mean | 369.7 | ||
Geometric Naive Mean | 369.7 | |||
T | Geometric Marginal Mean | 359.9 | ||
Geometric Naive Mean | 359.9 | |||
Geometric Mean T/R Ratio (%) | 97.36 | |||
Degrees of Freedom | 26 | |||
90% Confidence Interval (%) | [90.17, 105.1] | Pass | CI ⊆ [80, 125] | |
Variability | CV (%) | σ̂ | 16.96 | 0.1684 | ||
ANOVA | Formulation (p-value) | 0.5576 | ||
Sequence (p-value) | 0.05751 | |||
Period (p-value) | 0.01014 | |||
data_CD_2by2 = make_2by2by2_from_4by4by4(transformed_data, 'C', 'D')
pumas_be(data_CD_2by2)
Observation Counts | ||
Sequence ╲ Period | 1 | 2 |
RT | 14 | 14 |
TR | 14 | 14 |
Paradigm: Non replicated crossover design | ||||
Model: Linear model | ||||
Criteria: Standard ABE | ||||
Endpoint: AUC | ||||
Formulations: Reference(R), Test(T) | ||||
Results(AUC) | Assessment | Criteria | ||
R | Geometric Marginal Mean | 373.2 | ||
Geometric Naive Mean | 373.2 | |||
T | Geometric Marginal Mean | 359.9 | ||
Geometric Naive Mean | 359.9 | |||
Geometric Mean T/R Ratio (%) | 96.45 | |||
Degrees of Freedom | 26 | |||
90% Confidence Interval (%) | [92.43, 100.6] | Pass | CI ⊆ [80, 125] | |
Variability | CV (%) | σ̂ | 9.37 | 0.0935 | ||
ANOVA | Formulation (p-value) | 0.1599 | ||
Sequence (p-value) | 1.0e-5 | |||
Period (p-value) | 0.0053 | |||
An nRnT example with n=2
Similarly, if we wanted to treat the bioequivalence/ADBC_BACD_CBDA_DCAB/PJ2017_4_6 dataset as in Example 4.6 of Patterson and Jones (2017), we are in the nRnT case where we compare A and B at one dosage level and then compare C and D at a different dosage level.
We continue with transformed_data from above (the scaling does not affect comparisons made within a single dose level) and again convert the design to a \(2 \times 2 \times 2\) design using our make_2by2by2_from_4by4by4 function:
be_data = dataset("bioequivalence/ADBC_BACD_CBDA_DCAB/PJ2017_4_6");
data_AB_2by2 = make_2by2by2_from_4by4by4(transformed_data, 'A', 'B')
detect_design(data_AB_2by2)
(design = "RT|TR",
num_sequences = 2,
num_periods = 2,
num_formulations = 2,
reference_formulation = 'A',
test_formulations = ['B'],
replicated = non_replicated,
supports_rsabe = false,
crossover = true,)
data_CD_2by2 = make_2by2by2_from_4by4by4(transformed_data, 'C', 'D')
detect_design(data_CD_2by2)
(design = "RT|TR",
num_sequences = 2,
num_periods = 2,
num_formulations = 2,
reference_formulation = 'C',
test_formulations = ['D'],
replicated = non_replicated,
supports_rsabe = false,
crossover = true,)
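Both subsets are detected as standard RT|TR crossovers. Note also that because the two formulations in each pair were given at the same dose level, dividing both arms by a common dose factor cannot change their geometric mean ratio. A quick check with hypothetical values, sketched in Python for illustration:

```python
import math

def geo_mean(xs):
    # Geometric mean via the mean of the logs.
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

# Hypothetical AUC values for two arms observed at the same (8x) dose level.
auc_C = [2900.0, 3100.0, 2800.0]
auc_D = [2800.0, 2950.0, 2750.0]

gmr_raw = geo_mean(auc_D) / geo_mean(auc_C)

# Dividing both arms by the same dose factor leaves the ratio unchanged.
gmr_scaled = geo_mean([x / 8 for x in auc_D]) / geo_mean([x / 8 for x in auc_C])

print(abs(gmr_raw - gmr_scaled) < 1e-9)  # True
```

This is why the within-pair GMRs below agree with the dose-normalized analysis.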
Here are the comparisons for both AUC and Cmax in each case:
pumas_be(data_AB_2by2)
Observation Counts | ||
Sequence ╲ Period | 1 | 2 |
RT | 14 | 14 |
TR | 14 | 14 |
Paradigm: Non replicated crossover design | ||||
Model: Linear model | ||||
Criteria: Standard ABE | ||||
Endpoint: AUC | ||||
Formulations: Reference(R), Test(T) | ||||
Results(AUC) | Assessment | Criteria | ||
R | Geometric Marginal Mean | 367.9 | ||
Geometric Naive Mean | 367.9 | |||
T | Geometric Marginal Mean | 369.7 | ||
Geometric Naive Mean | 369.7 | |||
Geometric Mean T/R Ratio (%) | 100.5 | |||
Degrees of Freedom | 26 | |||
90% Confidence Interval (%) | [92.84, 108.7] | Pass | CI ⊆ [80, 125] | |
Variability | CV (%) | σ̂ | 17.45 | 0.1732 | ||
ANOVA | Formulation (p-value) | 0.9201 | ||
Sequence (p-value) | 0.3733 | |||
Period (p-value) | 0.1069 | |||
pumas_be(data_AB_2by2, endpoint = :Cmax)
Observation Counts | ||
Sequence ╲ Period | 1 | 2 |
RT | 14 | 14 |
TR | 14 | 14 |
Paradigm: Non replicated crossover design | ||||
Model: Linear model | ||||
Criteria: Standard ABE | ||||
Endpoint: Cmax | ||||
Formulations: Reference(R), Test(T) | ||||
Results(Cmax) | Assessment | Criteria | ||
R | Geometric Marginal Mean | 73.17 | ||
Geometric Naive Mean | 73.17 | |||
T | Geometric Marginal Mean | 70.62 | ||
Geometric Naive Mean | 70.62 | |||
Geometric Mean T/R Ratio (%) | 96.51 | |||
Degrees of Freedom | 26 | |||
90% Confidence Interval (%) | [87.15, 106.9] | Pass | CI ⊆ [80, 125] | |
Variability | CV (%) | σ̂ | 22.68 | 0.224 | ||
ANOVA | Formulation (p-value) | 0.5586 | ||
Sequence (p-value) | 0.9181 | |||
Period (p-value) | 0.1429 | |||
pumas_be(data_CD_2by2)
Observation Counts | ||
Sequence ╲ Period | 1 | 2 |
RT | 14 | 14 |
TR | 14 | 14 |
Paradigm: Non replicated crossover design | ||||
Model: Linear model | ||||
Criteria: Standard ABE | ||||
Endpoint: AUC | ||||
Formulations: Reference(R), Test(T) | ||||
Results(AUC) | Assessment | Criteria | ||
R | Geometric Marginal Mean | 373.2 | ||
Geometric Naive Mean | 373.2 | |||
T | Geometric Marginal Mean | 359.9 | ||
Geometric Naive Mean | 359.9 | |||
Geometric Mean T/R Ratio (%) | 96.45 | |||
Degrees of Freedom | 26 | |||
90% Confidence Interval (%) | [92.43, 100.6] | Pass | CI ⊆ [80, 125] | |
Variability | CV (%) | σ̂ | 9.37 | 0.0935 | ||
ANOVA | Formulation (p-value) | 0.1599 | ||
Sequence (p-value) | 1.0e-5 | |||
Period (p-value) | 0.0053 | |||
pumas_be(data_CD_2by2, endpoint = :Cmax)
Observation Counts | ||
Sequence ╲ Period | 1 | 2 |
RT | 14 | 14 |
TR | 14 | 14 |
Paradigm: Non replicated crossover design | ||||
Model: Linear model | ||||
Criteria: Standard ABE | ||||
Endpoint: Cmax | ||||
Formulations: Reference(R), Test(T) | ||||
Results(Cmax) | Assessment | Criteria | ||
R | Geometric Marginal Mean | 636 | ||
Geometric Naive Mean | 636 | |||
T | Geometric Marginal Mean | 617.1 | ||
Geometric Naive Mean | 617.1 | |||
Geometric Mean T/R Ratio (%) | 97.04 | |||
Degrees of Freedom | 26 | |||
90% Confidence Interval (%) | [91.11, 103.4] | Pass | CI ⊆ [80, 125] | |
Variability | CV (%) | σ̂ | 13.9 | 0.1383 | ||
ANOVA | Formulation (p-value) | 0.4232 | ||
Sequence (p-value) | 0.00134 | |||
Period (p-value) | 0.669 | |||
10 Conclusion
This unit provided an overview of the most common experimental designs in bioequivalence testing, moving beyond the simple parallel design to more statistically powerful multi-period approaches.
The main focus was the \(2 \times 2\) crossover design, which is the standard for most bioequivalence studies. Its strength lies in controlling for variability between subjects, which generally provides more precise results with a smaller number of participants.
We also covered replicate designs, which are necessary for highly variable drugs (HVDs) and narrow therapeutic index drugs (NTIDs). By administering a drug more than once to the same subject, these designs allow us to measure the within-subject variability, a key requirement for certain regulatory pathways.
Finally, we looked at how to handle studies with multiple test or reference formulations. This raised the important statistical consideration of multiple comparisons and when it might be necessary to adjust the significance level (\(\alpha\)) to maintain the integrity of the results.
Understanding these different designs is crucial because the design choice determines the correct statistical model for the analysis. In the next unit, we will explore the linear and mixed-effects models that Pumas uses to analyze data from these various experimental setups.
11 Unit exercises
Parallel vs. Crossover Design
Consider a bioequivalence trial comparing formulations \(R\) (reference) and \(T\) (test).
- Describe a simple parallel design and a \(2 \times 2\) crossover design in terms of subject allocation and treatment sequence.
- Discuss one statistical advantage and one logistical disadvantage of using a crossover design instead of a parallel design for bioequivalence testing.
Exploring Randomization: In the section on randomization, we assigned 20 subjects to 4 groups.
- Modify the code to randomize 30 subjects into 3 groups: “R”, “T low dose”, and “T high dose”.
- Run the code twice, once with
Random.seed!(0)
and once withRandom.seed!(1)
. What do you observe about the group assignments? Why is this the case?
Understanding Variability with the Toy Model: The toy model in the “Within subject variability and between subject variability” section helps illustrate BSV and WSV.
- Manipulating BSV: The array
subject_average_concentration_loss_per_time
controls the differences between subjects. Make the values in this array more similar to each other (e.g.,[0.84, 0.85, 0.86, 0.87]
). Rerun the simulation and the mixed model analysis (VarCorr(lmm(...))
). How does the estimated standard deviation forid
(which corresponds to BSV) change? - Manipulating WSV: The random component
rand(Uniform(0.85, 1.15))
controls the variability within a single subject’s repeated measurements. Change this to be narrower, for example,rand(Uniform(0.95, 1.05))
. Rerun the simulation and the mixed model analysis. How does the estimated standard deviation for theResidual
(which corresponds to WSV) change? - Based on your experiments, if you were designing a study for a drug known to have very high BSV but low WSV, would you prefer a parallel or a crossover design? Why?
- Manipulating BSV: The array
The Impact of Alpha Correction: In the
1RnT
example with three formulations, we applied a Bonferroni correction by settinglevel = 0.95
.- Rerun the analysis for AUC from that section without the correction, using the default
level = 0.90
. - Compare the
90%
confidence intervals for theT/R
andS/R
comparisons with the95%
confidence intervals from the original example. Are they wider or narrower? - Could failing to apply an alpha correction in a multiple-comparison scenario lead to a “Pass” conclusion when a corrected analysis would “Fail”? Explain your reasoning based on the confidence intervals.
- Rerun the analysis for AUC from that section without the correction, using the default
Analyzing Different Comparisons: In the section on the four-formulation Williams design (
ADBC_BACD_CBDA_DCAB
), we compared testD
to referencesA
,B
, andC
.- Using the
make_2by2by2_from_4by4by4
helper function and thetransformed_data
, write the code to perform a bioequivalence analysis comparing formulationA
(as reference) to formulationC
(as test). - Run
pumas_be
on the resulting dataset. What is the GMR and 90% confidence interval? Do formulationsA
andC
pass the bioequivalence criteria?
- Using the
Investigating a Parallel Design: The list of available designs included
"R_T"
, which corresponds to a parallel design.- Find a dataset with this design using the
datasets()
function fromPharmaDatasets
. - Load the data and run
detect_design
. Confirm that it is a parallel design with 1 period and 2 sequences. - Run
pumas_be
on this dataset. Look at the “Variability” section of the output. What CV is reported? Is it the overallCV
,CVᵣ
, orCVₜ
? Why can’t a parallel design estimate within-subject variability?
- Find a dataset with this design using the