A Course on Bioequivalence: Unit 4 - Different experimental designs

Authors

Yoni Nazarathy

Andreas Noack

This is Unit 4 of a complete bioequivalence analysis course using Pumas. There are 15 units in the course in total.

1 Unit overview

In the previous unit we considered statistical inference concepts and focused on the hypothesis test formulation for bioequivalence. In this unit we explore various experimental designs used in bioequivalence trials. In particular we consider:

  • The concept of a multi-period trial and an experimental design.
  • Randomization, blindness, and open-label.
  • The very popular \(2\times 2\) (sometimes called the \(2\times 2\times 2\)) crossover design.
  • An exploration of product specific guidances.
  • Other popular experimental designs including partial replicate, fully replicate, and multiple treatment designs.
  • Considerations for multiple reference (comparator) products and multiple test products.
  • Multiple comparisons and adjustment of \(\alpha\).
  • General statistical considerations for choosing an experimental design.

We use the following packages.

using Pumas # available with Pumas products - also reexports Distributions
using Bioequivalence  # available with Pumas products
using PharmaDatasets # available with Pumas products
using CairoMakie
using AlgebraOfGraphics
using DataFrames
using DataFramesMeta
using SummaryTables
using StatsBase
using Random
using MixedModels

We have used these packages in the previous units. One new package we use in this unit is MixedModels, which we use for random effects models.

2 The concept of an experimental design

Our focus up to now has been on the case where we break the subjects into cohorts of R and T, and then administer the respective dosages to each of the cohorts. This is the very basic parallel design.

However, there are other experimental designs that are more efficient and more powerful, and that enable detecting and estimating additional quantities.

In particular, the parallel design has only a single period in the sense that at the onset of the trial participants are administered a single formulation, and once that treatment is complete, the trial is over. What if we broke up the participants’ trial journey into multiple periods, and in each successive period administered a different formulation to the subject?

As a first basic example of more than one period, assume we have this AUC endpoint dataset arising from a hypothetical bioequivalence trial:

be_data_paired = DataFrame(
    id = vcat(fill.(1:5, 2)...),
    sequence = fill("TR", 10),
    period = repeat(1:2, 5),
    AUC = [21.5, 24.6, 22.5, 28.4, 23.8, 22.1, 19.6, 21.4, 19.9, 21.5],
)
simple_table(be_data_paired)
id sequence period AUC
1 TR 1 21.5
1 TR 2 24.6
2 TR 1 22.5
2 TR 2 28.4
3 TR 1 23.8
3 TR 2 22.1
4 TR 1 19.6
4 TR 2 21.4
5 TR 1 19.9
5 TR 2 21.5

As you can see there are 5 subjects in this trial with ids 1,…,5. In contrast to a parallel trial, each subject appears in two rows, once for period 1 and once for period 2. This is sometimes called a paired design because each observation of T on a subject and the corresponding observation of R on the subject form a pair.

There is also a sequence column. In this case there is only a single sequence and it is TR for all subjects. This implies that each subject is first administered the test treatment T in the first period and is then administered the reference treatment R in the second period.

Let us compare and contrast the design we see with be_data_paired with the simpler parallel design that we considered up to now. For clarity let us create be_data_parallel which has the same number of AUC endpoint measurements.

be_data_parallel = DataFrame(
    id = 1:10,
    group = vcat(fill("T", 5), fill("R", 5)),
    period = 1,
    AUC = [21.1, 21.6, 22.8, 19.6, 23.2, 18.8, 19.9, 21.7, 22.3, 23.2],
)
simple_table(be_data_parallel)
id group period AUC
1 T 1 21.1
2 T 1 21.6
3 T 1 22.8
4 T 1 19.6
5 T 1 23.2
6 R 1 18.8
7 R 1 19.9
8 R 1 21.7
9 R 1 22.3
10 R 1 23.2

Here are some points contrasting the two designs, focusing on the datasets that represent them:

  1. The number of subjects in be_data_paired is half of that of be_data_parallel. This is obvious since in the two-period design yielding be_data_paired we “reuse” a subject and administer R after we administer T.
  2. The duration of the trial for be_data_paired is typically at least twice as long as the trial for be_data_parallel. This is because each subject has to first receive formulation T. Then after the first period we need a washout period giving the body time to clear the active ingredient. Then in the second period the subject receives formulation R. Note that the duration of the washout period can sometimes be extensive and is determined by the half-life of the product in question (a common rule of thumb is a washout of at least around five half-lives).
  3. In case administration of the formulation affects a subject in the long term (or beyond a washout period), be_data_paired will have the second period measurements affected by the first period.
  4. With be_data_parallel, variability between T and R cannot be isolated from variability between subjects. In a parallel design some of the variation we encounter is due to natural differences between subjects. In contrast, with be_data_paired, since each subject is administered both T and R, by considering the difference between formulations on each subject, the measurements are potentially less influenced by the variability between subjects. Such pairing is a simple form of blocking, which is a key concept in experimental design. It is an approach to remove undesirable variability from the measurements.

Point (1) is often an obvious difference and is a plus of multi-period designs. Having fewer subjects is more ethical and reduces costs. In contrast, point (2) is a plus for the one-period parallel design. It is often a key consideration if the needed washout period is long. Point (3) occurs if the washout period is “essentially infinite”. This would be a key consideration in trials that consider outcomes such as curing of a condition. It is less of a consideration in bioequivalence trials. Finally, the most important issue is point (4). From a statistical point of view, isolating subject effects is often critical. This point implies that the power of multi-period designs is often superior to the power of equivalent single period (parallel) designs.
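To make point (4) concrete, here is a minimal sketch using the toy datasets be_data_paired and be_data_parallel from above. For the paired data we form within-subject log-differences, which cancel out each subject’s own level; for the parallel data the T vs R contrast remains entangled with subject-to-subject differences:

# A minimal sketch contrasting the two comparisons (toy data from above).
# Within-subject comparison: each subject's own level cancels out.
paired = @combine(groupby(be_data_paired, :id),
    :log_TR_diff = log(first(:AUC)) - log(last(:AUC))) # T (period 1) minus R (period 2)
std(paired.log_TR_diff)

# Between-group comparison: the T vs R contrast still carries
# natural subject-to-subject differences.
log_T = log.(@rsubset(be_data_parallel, :group == "T").AUC)
log_R = log.(@rsubset(be_data_parallel, :group == "R").AUC)
mean(log_T) - mean(log_R)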

In summary there are several reasons to use more than one period (points (1) and (4) above), as well as some potential complications (points (2) and (3) above). We now consider the most popular kind of multi-period design, the \(2 \times 2\) crossover design.

3 Overarching principles: randomization and blindness/open-label

A crucial principle that underlies all sound experimental designs is randomization.

Randomization is the process of assigning subjects to treatment groups (e.g., T or R in a parallel design). It requires using a formal, probabilistic mechanism, such as a coin flip or, more commonly, a computer-generated random sequence. Without randomization, we risk introducing bias. For instance, an investigator might subconsciously assign healthier-looking subjects to one group, or the first subjects to enroll in a trial might be systematically different from those who enroll later. Randomization breaks any systematic connection between the characteristics of the subjects and the treatment they receive, ensuring that, on average, the groups are comparable in all respects, both known and unknown.

To see an example of randomization, say that we have \(20\) subjects that we want to partition into \(4\) groups. For example both the reference and test formulations with fasting and without. We can use the sample function from StatsBase with the replace=false option. Here is example code:

num_subjects = 20
group_names = ["R fasting", "T fasting", "R fed", "T fed"]
num_groups = length(group_names)
subjects_per_group = num_subjects ÷ num_groups # ÷ \div + [TAB], integer division
subjects = 1:20

Random.seed!(0)
shuffled_subjects = sample(subjects, length(subjects); replace = false)
groups = Iterators.partition(shuffled_subjects, subjects_per_group)

for (i, group) in enumerate(groups)
    println(group_names[i], ": ", sort(group))
end
R fasting: [3, 5, 9, 15, 18]
T fasting: [7, 8, 10, 12, 17]
R fed: [4, 6, 11, 16, 20]
T fed: [1, 2, 13, 14, 19]

Note that the Random.seed!(0) statement ensures the same randomization occurs every time we execute this code. The “seed” 0 can be replaced with any other number and that would imply a different randomization.
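For instance, here is a small sketch with a different (arbitrary) seed, yielding a different but equally valid allocation. It reuses subjects and subjects_per_group from above:

# A sketch: an alternative seed produces an alternative randomization.
Random.seed!(42) # an arbitrary alternative seed
reshuffled = sample(subjects, length(subjects); replace = false)
alt_groups = collect(Iterators.partition(reshuffled, subjects_per_group))
sort(alt_groups[1]) # the "R fasting" group under this alternative randomization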

A related, but distinct, concept is blinding.

Blinding refers to the practice of concealing the treatment assignment from one or more parties involved in the trial. In a single-blind trial, the subjects are unaware of which treatment they are receiving. In a double-blind trial, considered the gold standard in many clinical trials, neither the subjects nor the investigators and their staff know the treatment assignments. This prevents knowledge of the treatment from influencing subjects’ reporting of effects or investigators’ assessment of outcomes. When all parties are aware of the treatment assignments, the trial is called open-label.

While double-blinding is critical for trials with subjective endpoints (like pain relief or mood), bioequivalence studies are typically conducted in an open-label fashion.

This is because the primary endpoints, such as AUC and Cmax, are objective pharmacokinetic measurements derived from laboratory analysis. These values are not susceptible to patient or investigator subjectivity.

However, even in an open-label setting, proper randomization of subjects to their assigned treatments or sequences is non-negotiable, as it remains the fundamental safeguard against bias and ensures the statistical validity of the trial’s conclusions.

4 The \(2 \times 2\) crossover design

The design that yields the be_data_paired data above has the disadvantage that all subjects were first administered T and only later they were administered R. This can cause a bias in the results in case the T treatment affects the R treatment that follows, or in case conditions during the first period are significantly different in comparison to the second period.

To alleviate such potential biases, the \(2 \times 2\) crossover design (sometimes also called \(2 \times 2 \times 2\)) makes use of the notion of a sequence. There are \(2\) sequences, RT and TR. The first sequence implies that subjects receive R in the first period and T in the second period. Similarly, the second sequence is the opposite as subjects receive T in the first period and R in the second. Note that the phrase \(2 \times 2\) stands for “2 sequences” and “2 periods”. Further, when called \(2 \times 2 \times 2\), the additional “2” stands for the number of formulations.

Here is a tabular representation of this design:

Sequence Period 1 Period 2
RT R T
TR T R

As with any trial, the study needs to be randomized. Recruited subjects are allocated either to the RT sequence or the TR sequence. Now, the same benefits as in the paired design that created be_data_paired are achieved. Further, it is possible to test and rule out period effects since in each period we have subjects from both treatments.
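As a small illustration of such an allocation, here is a sketch with 12 hypothetical subjects, reusing the sample-based approach from the randomization section above:

# A sketch: randomly allocate 12 hypothetical subjects to the RT and TR sequences.
Random.seed!(1)
subject_ids = 1:12
shuffled_ids = sample(subject_ids, length(subject_ids); replace = false)
rt_group = sort(shuffled_ids[1:6])  # administered R in period 1, T in period 2
tr_group = sort(shuffled_ids[7:12]) # administered T in period 1, R in period 2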

This design is undoubtedly the most common and most recommended design to use in a bioequivalence study. Parallel designs are typically only recommended in cases where there is a long washout period. Other, more complex, designs described below are typically reserved for the specific cases that we describe.

Here is example endpoint data from a \(2 \times 2\) crossover study. We display only the first \(8\) rows.

be_data = dataset(joinpath("bioequivalence", "RT_TR", "PJ2017_3_1.csv"))
simple_table(first(be_data, 8))
id sequence period AUC Cmax
1 RT 1 2849 499
1 RT 2 2230 436
2 TR 1 2025 438
2 TR 2 2000 361
3 TR 1 2090 535
3 TR 2 1826 558
4 RT 1 2790 733
4 RT 2 2864 416

We can use the detect_design function from the Bioequivalence package. Pumas also uses this function internally to automatically determine the type of design and study paradigm from the data.

detect_design(be_data)
(design = "RT|TR",
 num_sequences = 2,
 num_periods = 2,
 num_formulations = 2,
 reference_formulation = 'R',
 test_formulations = ['T'],
 replicated = non_replicated,
 supports_rsabe = false,
 crossover = true,)

You can see that the design string is RT|TR, capturing the two sequences. You can further see that num_sequences = 2, num_periods = 2, and num_formulations = 2, all in agreement with the \(2 \times 2 \times 2\) naming convention. Other information returned by detect_design also describes this design. For example, you can see that Pumas identifies that the reference formulation is R, and so forth.
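Since the returned value is a named tuple, its fields can also be queried programmatically. A small sketch (the variable dd is just for illustration):

# A sketch: using detect_design fields programmatically.
dd = detect_design(be_data)
dd.design                                     # "RT|TR"
dd.num_sequences == dd.num_formulations == 2  # true for this 2×2 design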

Now carrying out bioequivalence analysis via pumas_be, and considering AUC by default, we obtain:

pumas_be(be_data)
Observation Counts
Sequence ╲ Period 1 2
RT 17 17
TR 15 15
Paradigm: Non replicated crossover design
Model: Linear model
Criteria: Standard ABE
Endpoint: AUC
Formulations: Reference(R), Test(T)
Results(AUC) Assessment Criteria
R Geometric Marginal Mean 1879
Geometric Naive Mean 1887
T Geometric Marginal Mean 1848
Geometric Naive Mean 1845
Geometric Mean T/R Ratio (%) 98.36
Degrees of Freedom 30
90% Confidence Interval (%) [94.07, 102.8] Pass CI ⊆ [80, 125]
Variability CV (%) | σ̂ 10.52 | 0.1049
ANOVA Formulation (p-value) 0.5334
Sequence (p-value) 0.1818
Period (p-value) 0.00235

We’ll analyze elements of this output in further detail in subsequent units, but let us highlight a few points now.

  • Observe the Observation Counts table at the start of the output. As is evident there were 17 subjects in the RT sequence and 15 subjects in the TR sequence. This imbalance is often not planned but can rather occur due to dropouts.
  • Observe that the Paradigm is Non replicated crossover design. We discuss replication (non replicated, partially replicated, and fully replicated designs) below.
  • Pumas chooses a Linear Model for such a design. We discuss this in the next unit.
  • Observe that we obtain estimates of the Marginal Mean both for R and for T. These are model based estimates, and they may differ from the data (Naive) estimates in case of unbalanced data (as we have here).
  • The Geometric Mean T/R Ratio is 98.36 as a percentage. This is obtained from the marginal means.
round(100 * 1848 / 1879, digits = 2) # value slightly differs from output due to rounding
98.35
  • A model based confidence interval of [94.07, 102.8] is obtained and since it is well contained within the 80–125 margin, we Pass.
  • Note that the model uses 30 degrees of freedom. This quantity is related to the variability used to generate the confidence interval. It is a property of the linear model used (see next unit) and the size of the data.
  • The variability estimate is at a CV of 10.52 as a percentage. More on this estimation is in the next unit. The associated standard deviation estimate is 0.1049.
  • Since a statistical model is used, we are able to investigate if there are any unwanted effects in the model. In particular, Type-III p-values are computed for Formulation, Sequence, and Period. When a p-value is non-small we consider the effect as insignificant and when it is small we consider the effect as significant. In this case, since we see a Period p-value of 0.00235, there may be reason to believe that an (unwanted) period effect is taking place.

Note: p-values are a deceptive quantity. As a rule of thumb, if a p-value is below an order of \(5\%\) (or sometimes \(10\%\)) then it is a fairly clear indication that the underlying factor has an effect. In particular this holds when p-values are very small. For example above, we see that the period almost certainly has an effect because the p-value of 0.00235 is very small. However, if p-values are greater than \(10\%\) (or sometimes \(5\%\)) then their exact value is not really important. So the p-value of 0.1818 for Sequence above should be treated exactly the same as the p-value of 0.5334 for Formulation above. In particular, if there is no effect, the p-value is expected to be uniformly distributed anywhere between \(0\%\) and \(100\%\). Thus we should never speak about “high p-values”. We can rather speak about “very low p-values” (as in the case of 0.00235) or “marginally low p-values” (as would have been the case with a p-value of around \(5\%\) to \(10\%\)).
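To see the uniformity of p-values under no effect in action, here is a small simulation sketch, separate from the bioequivalence workflow itself. It computes two-sided two-sample t-test p-values by hand (using Distributions, which Pumas reexports) when both groups come from the same distribution:

# A sketch: under H0 (no difference) the p-value is uniform on [0, 1].
Random.seed!(3)
function two_sample_pvalue(x, y)
    nx, ny = length(x), length(y)
    pooled_var = ((nx - 1) * var(x) + (ny - 1) * var(y)) / (nx + ny - 2)
    t_stat = (mean(x) - mean(y)) / sqrt(pooled_var * (1 / nx + 1 / ny))
    return 2 * ccdf(TDist(nx + ny - 2), abs(t_stat)) # two-sided p-value
end
pvals = [two_sample_pvalue(randn(10), randn(10)) for _ = 1:10_000]
mean(pvals .< 0.05) # ≈ 0.05: a "significant" result occurs only about 5% of the time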

5 Exploring example product specific guidances

Now that we know a bit about parallel and crossover designs, let us carry out a short exploration of FDA’s website for Product-Specific Guidances (PSG) for Generic Drug Development: FDA (2025). As of July 2025 there are 2,288 PSGs listed. To briefly explore a small subset of these PSGs, we look at a few arbitrary products.

For this, since the FDA lists the PSGs alphabetically based on the active ingredient, we consider the first PSG entry for each of the first 8 letters A–H. You can click the URL to see the PDF of each PSG with its various guidelines. Below we summarize key points:

  • Abacavir sulfate: 2 by 2 crossover design on healthy males and non-pregnant, non-lactating females.
  • Bacitracin: This is an ointment. A comparative study is recommended.
  • Cabotegravir: parallel design on males and non-pregnant, non-lactating females.
  • Dabigatran etexilate mesylate: Both a fasting and a fed study is recommended, with a 2-treatment, 2-sequence, 4-period, fully replicated crossover design on normal healthy males and females.
  • Econazole nitrate: This is a cream. The design is a clinical endpoint randomized, double blind, parallel, placebo-controlled study.
  • Famciclovir: 2 by 2 crossover design on healthy males and non-pregnant, non-lactating females.
  • Gabapentin: 2 by 2 crossover design on healthy males and non-pregnant, non-lactating females.
  • Halcinonide: This is a cream. The FDA recommends two studies, a pilot study and a pivotal study.

This gives a taste of the fact that \(2 \times 2\) crossover studies are common. It also motivates looking at fully replicated crossover studies as described below. As is typical for creams and ointments, comparative studies are recommended. In contrast to the bioequivalence studies we cover here, which are based on endpoints, comparative studies typically consider the clinical effects of the product. They are mostly out of the scope of this course.

Note that we present these here only for pedagogic purposes; if conducting a trial for such products you should consult directly with the FDA or the appropriate regulatory agency.

6 Other Designs

So far we have seen the parallel design (single period) and the \(2 \times 2\) crossover design. These are the two most common alternatives. Now that we are aware of the way designs are described based on sequences, periods, and formulations we may consider other designs as well. In particular we may consider designs with more than two sequences, more than two periods, and even more than two formulations.

Let us take the approach of surveying some of the common designs via the example datasets supplied with the PharmaDatasets package. In general, with that package, the datasets function returns a list of all of the datasets. Those datasets associated with bioequivalence endpoints have the prefix "bioequivalence", so we can do some string processing to extract the design type which follows "bioequivalence". This is the list:

unique([split(ds, "/")[2] for ds in filter(contains("bioequivalence"), datasets())])
15-element Vector{SubString{String}}:
 "R_T"
 "RR_RT_TR_TT"
 "RTRT_TRTR"
 "RRT_RTR_TRR"
 "RT_TR"
 "RTTR_TRRT"
 "RTT_TRR"
 "RTR_TRT"
 "RTR_TRR"
 "RR_TT"
 "RST_RTS_SRT_STR_TRS_TSR"
 "RRTT_TTRR"
 "RTRT_RTTR_TRRT_TRTR"
 "RRTT_RTTR_TRRT_TTRR"
 "ADBC_BACD_CBDA_DCAB"

In this case we use strings such as "R_T", "RR_RT_TR_TT", etc. to specify a design. Let us understand how to read this encoding. For example, the datasets with "RT_TR" are for the \(2 \times 2\) crossover designs that we covered above. In all these specifications we typically use R for the reference and T for the test as usual.

As you can see, certain designs have 3 or 4 formulations. This is evident with "RST_RTS_SRT_STR_TRS_TSR" and "ADBC_BACD_CBDA_DCAB". When it comes to such designs with multiple formulations, one approach is to consider the design as a specification comparing a single reference to multiple tests. With this approach, if we consider "RST_RTS_SRT_STR_TRS_TSR", then R is still the reference and S and T are two alternative test formulations.

In the second case, "ADBC_BACD_CBDA_DCAB", formulation A is considered the reference and B, C, and D are considered test formulations. In any case (specific to the Bioequivalence package), we consider the earliest letter in the alphabet as the reference formulation. We discuss more issues associated with multiple formulations below.
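As a small sketch, this encoding convention can be parsed with a few lines of string processing (the variable names here are just for illustration):

# A sketch: reading a design string into sequences and formulations.
design_string = "RST_RTS_SRT_STR_TRS_TSR"
sequences = split(design_string, "_")        # ["RST", "RTS", ...]
formulations = sort(unique(join(sequences))) # ['R', 'S', 'T']
reference = first(formulations)              # earliest letter: 'R'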

As an illustration let us take an arbitrary dataset from each design group available in PharmaDatasets. We then print the design information for each dataset and specify the design as

\[ \text{Num Sequences} \times \text{Num Periods} \times \text{Num Formulations} \]

design_types =
    unique([split(ds, "/")[2] for ds in filter(contains("bioequivalence"), datasets())])
for design_type in design_types
    dataset_name = first(filter(contains("bioequivalence/$design_type"), datasets()))
    be_data = dataset(dataset_name)
    dd = detect_design(be_data)

    println("Dataset name: $dataset_name")
    println(
        "\tDesign: $(dd.design),  $(dd.num_sequences)×$(dd.num_periods)×$(dd.num_formulations)",
    )
end
Dataset name: bioequivalence/R_T/FSL2015_7
    Design: R|T,  2×1×2
Dataset name: bioequivalence/RR_RT_TR_TT/CL2009_9_2_1
    Design: RR|RT|TR|TT,  4×2×2
Dataset name: bioequivalence/RTRT_TRTR/PJ2017_6
    Design: RTRT|TRTR,  2×4×2
Dataset name: bioequivalence/RRT_RTR_TRR/SLTGSF2020_DS07
    Design: RRT|RTR|TRR,  3×3×2
Dataset name: bioequivalence/RT_TR/PJ2017_3_12
    Design: RT|TR,  2×2×2
Dataset name: bioequivalence/RTTR_TRRT/CL2009_9_4_1
    Design: RTTR|TRRT,  2×4×2
Dataset name: bioequivalence/RTT_TRR/SLTGSF2020_DS10
    Design: RTT|TRR,  2×3×2
Dataset name: bioequivalence/RTR_TRT/SLTGSF2020_DS03
    Design: RTR|TRT,  2×3×2
Dataset name: bioequivalence/RTR_TRR/SLTGSF2020_DS22
    Design: RTR|TRR,  2×3×2
Dataset name: bioequivalence/RR_TT/From_PJ2017_4_2
    Design: RR|TT,  2×2×2
Dataset name: bioequivalence/RST_RTS_SRT_STR_TRS_TSR/PJ2017_4_5
    Design: RST|RTS|SRT|STR|TRS|TSR,  6×3×3
Dataset name: bioequivalence/RRTT_TTRR/SLTGSF2020_DS28
    Design: RRTT|TTRR,  2×4×2
Dataset name: bioequivalence/RTRT_RTTR_TRRT_TRTR/SLTGSF2020_DS23
    Design: RTRT|RTTR|TRRT|TRTR,  4×4×2
Dataset name: bioequivalence/RRTT_RTTR_TRRT_TTRR/SLTGSF2020_DS24
    Design: RRTT|RTTR|TRRT|TTRR,  4×4×2
Dataset name: bioequivalence/ADBC_BACD_CBDA_DCAB/PJ2017_4_6
    Design: ADBC|BACD|CBDA|DCAB,  4×4×4

Let us now consider a few of these designs and their uses.

7 Within subject variability and between subject variability

When we measure a drug’s effect, the results are never identical. This variation, or variability, comes from two main sources: differences between subjects, and differences within a subject. Understanding these sources is crucial for designing an efficient study. The variability of a product, typically quantified via the CV, is a key property that requires estimation, characterizes the nature of the product, and requires consideration when planning a study.

  • Between-subject variability (BSV), or inter-subject variability, describes how different people respond to the same drug. It arises from stable differences between individuals, such as genetics, body weight, metabolism, and organ function. This is why one person might consistently have higher AUC values than another, regardless of the formulation they receive.

  • Within-subject variability (WSV), or intra-subject variability, describes the random, unpredictable fluctuations in a single person’s response. If the same person takes the exact same drug on two different occasions, their AUC values will still differ slightly. This is due to day-to-day physiological changes, minor differences in diet or activity, and measurement error. This variability is an inherent property of the drug and its interaction with a person’s body over time.

A toy example

To get a feel for these two forms of variability let us create a toy model of concentration over time. This is not a well established pharmacokinetic model but rather an illustrative conceptual example.

Assume we have 4 subjects, which always start with a (normalized) concentration of 1.0 at onset. Then assume that at every time instance the concentration decreases when multiplied by a subject specific subject_average_concentration_loss_per_time parameter. These are the parameters:

subject_average_concentration_loss_per_time = [0.78, 0.87, 0.88, 0.91]
num_subjects = length(subject_average_concentration_loss_per_time);

As you can see this constant varies across subjects, ranging from 0.78 for the first subject up to 0.91 for the fourth subject. This can be viewed as some inherent between subject variation.

Let us also assume that at every time step, the concentration for a subject is updated by this inherent factor of the subject multiplied by a slight variation, which is a uniform random quantity between 0.85 and 1.15. These random variations then work together to create the within subject variability.

Hence the simulate_subject! function, which we now create, takes a DataFrame (df) and adds to it the trajectory of such a subject. This continues until the subject leaves the trial, when the concentration drops below 0.05 (5%).

function simulate_subject!(df, id, subj_per_counter)
    concentration = 1.0 # normalized concentration at onset
    time = 0
    while concentration > 0.05 # subject leaves the trial once below 5%
        push!(df, (id, subj_per_counter, time, concentration))
        # subject-specific average loss, perturbed by a uniform factor in [0.85, 1.15]
        concentration *=
            subject_average_concentration_loss_per_time[id] * rand(Uniform(0.85, 1.15))
        time += 1
    end
end;

We can now simulate this (toy) model for all four subjects, where we also assume that each subject is administered the formulation over \(3\) periods:

df = DataFrame(id = Int[], subj_per = Int[], time = Int[], concentration = Float64[])
subj_per_counter = 1
for id = 1:num_subjects
    for _ = 1:3
        simulate_subject!(df, id, subj_per_counter)
        subj_per_counter += 1
    end
end

Note that in this toy example there is only a single formulation, say R. Thus, in a sense, the design of this example is:

Sequence Period 1 Period 2 Period 3
RRR R R R

We now plot the results:

df.id = categorical(df.id)
df.subj_per = categorical(df.subj_per)
plt =
    data(df) *
    mapping(:time, :concentration, group = :subj_per, color = :id) *
    visual(Lines)
draw(
    plt,
    axis = (
        xlabel = "Time",
        ylabel = "Concentration",
        title = "Concentration-Time Profiles",
    ),
)

By summing up each of the profiles along the time axis, we get a rough form of an AUC endpoint (here each time step was \(1\), e.g. one hour). We also carry out a log transformation. Note the composition log ∘ sum, where typing \circ followed by the TAB key yields ∘. Here are these endpoints organized by subject id, where each row is another period.

auc_data = select(
    combine(groupby(df, [:id, :subj_per]), :concentration => log ∘ sum => :auc),
    [:id, :auc],
)
wide_df = DataFrame(
    [g.auc for g in groupby(auc_data, :id)],
    "id: " .* string.(unique(auc_data.id)),
)
simple_table(wide_df)
id: 1 id: 2 id: 3 id: 4
1.33 1.84 1.83 2.26
1.33 1.84 2.19 2.3
1.4 1.82 1.73 2.32

Now just by considering the values above you see that within each subject group (column) there is some variability, which arises from the different replications for the subject. This is the within subject variability. Further, across the various columns there is also variability, which is the between subject variability.

Estimating the WSV and BSV from the toy model data

Without getting into the details at this point, let us use a linear mixed model from the MixedModels package to estimate the WSV and BSV.

The approach we take is sometimes called a one-way random effects ANOVA model. In our case it can be represented as:

\[ \text{AUC}_{ij} = \mu + b_j + w_{ij}, \qquad i = 1,\ldots,3, \qquad j = 1,\ldots,4, \]

where \(i\) is the period index and \(j\) is the subject index. Here \(\mu\) is some overall mean, and it is assumed that \(b_j\) and \(w_{ij}\) are independent (normally distributed) random quantities with a mean of zero. The variability of \(b_j\) is the BSV and the variability of \(w_{ij}\) is the WSV.

A mixed model allows us to estimate these variabilities. We do not discuss the full details at this point:

VarCorr(lmm(@formula(auc ~ 1 + (1 | id)), auc_data, REML = true))
Column Variance Std.Dev
id (Intercept) 0.145379 0.381286
Residual 0.015066 0.122742

The rightmost column of the above table presents estimates of the standard deviations of \(b_j\) and \(w_{ij}\) respectively. Converting these values to CVs we obtain:

BSV = round(sigma2geocv(0.381286), digits = 4)
WSV = round(sigma2geocv(0.122742), digits = 4)
@show BSV
@show WSV;
BSV = 0.3956
WSV = 0.1232
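The sigma2geocv function used above converts a log-scale standard deviation \(\sigma\) to a geometric CV via the standard relation \(\text{CV} = \sqrt{e^{\sigma^2} - 1}\). As a quick sketch, we can verify this relation by hand (manual_geocv is a local helper defined here for illustration):

# A sketch verifying the σ-to-geometric-CV relation used by sigma2geocv.
manual_geocv(σ) = sqrt(exp(σ^2) - 1)
manual_geocv(0.381286) # ≈ 0.3956, matching BSV above
manual_geocv(0.122742) # ≈ 0.1232, matching WSV above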

While this toy model and analysis are not indicative of the type of analysis we would do for actual bioequivalence trials, we still see here that we can estimate the variabilities. In particular, in certain settings it is important to understand the WSV as it indicates the nature of the product in question. For example, when the WSV (as a CV) is greater than \(30\%\), drugs are typically called highly variable.
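Inverting the same relation gives the log-scale threshold corresponding to a \(30\%\) CV. As a sketch (manual_sigma is again a local helper for illustration), note how the result matches the 0.294 value appearing in the reference scaling outputs later in this unit:

# A sketch: the inverse relation σ = sqrt(log(1 + CV²)) at the HVD threshold.
manual_sigma(cv) = sqrt(log(1 + cv^2))
manual_sigma(0.30) # ≈ 0.294, the σ matching a 30% CV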

Design considerations for estimation of variability

A key advantage of crossover designs over parallel designs is that they can account for the large, systematic differences between subjects. By comparing the test and reference formulations within the same person, we effectively remove much of the between-subject variability from the analysis.

However, a standard 2×2 crossover design only allows us to estimate a single, combined measure of the within-subject variability. It cannot distinguish whether the test formulation is inherently more or less variable than the reference formulation.

To estimate the within-subject variability for each formulation separately, we need to administer the same formulation to a subject more than once. This is the primary motivation for replicate designs. Such designs are also useful for drugs that are classified as highly variable drugs (HVD), where there is high within subject variability. They are also useful for narrow therapeutic index drugs (NTID). Both of these topics are covered in detail in further units.

8 Replicate designs

A replicate design is a design where a formulation is administered more than once to a subject. There are two categories here:

  • A fully replicate design has both (or all) formulations replicated.
  • A partial replicate design has only one of the formulations replicated. Note that in this case, it is typically the reference formulation which is replicated.

Fully replicate design

This is a common fully replicate design:

Sequence Period 1 Period 2 Period 3 Period 4
TRTR T R T R
RTRT R T R T

Observe that in each of the sequences TRTR and RTRT, we administer both formulations R and T. Since there are two measurements of each formulation per subject, with this design we can estimate the within subject variability both for R and for T.

In fact, this design is recommended for NTID products as in FDA (2012), as well as for HVD products as in FDA (2011).

With four periods there is obviously more of a chance of encountering dropouts, since subjects are required to participate in the study for longer durations. Nevertheless, such designs generally require a smaller study size than two period designs.

A different variation of this \(2 \times 4 \times 2\) design assigns formulations in a slightly different order than above:

Sequence Period 1 Period 2 Period 3 Period 4
RTTR R T T R
TRRT T R R T

Observe that no matter what, in both of these variations, subjects from each sequence receive both formulations twice, and each period exhibits both formulations.

For illustration here is pumas_be applied to both of these variants of the \(2 \times 4 \times 2\) design (click the tabs to switch between the designs):

be_data = dataset("bioequivalence/RTRT_TRTR/SLTGSF2020_DS12")
pumas_be(be_data, endpoint = :PK)
Observation Counts
Sequence ╲ Period 1 2 3 4
RTRT 38 38 36 37
TRTR 39 38 34 38
Paradigm: Replicated crossover that supports reference scaling
Model: Mixed model (unequal variance)
Criteria: Standard ABE
Endpoint: PK
Formulations: Reference(R), Test(T)
Results(PK) Assessment Criteria
R Geometric Marginal Mean 0.7645
Geometric Naive Mean 0.7588
T Geometric Marginal Mean 0.9105
Geometric Naive Mean 0.883
Geometric Mean T/R Ratio (%) 119.1
Degrees of Freedom 72.68
90% Confidence Interval (%) [89.13, 159.2] Fail CI ⊆ [80, 125]
Variability CVᵣ (%) | σ̂ᵣ 221.5 | 1.3327
CVₜ (%) | σ̂ₜ 288.9 | 1.495
Variability Ratio (%) 112.2
ANOVA Formulation (p-value) 0.3184
Sequence (p-value) 0.2292
Period (p-value) 0
be_data = dataset("bioequivalence/RTTR_TRRT/CL2009_9_4_1")
pumas_be(be_data, endpoint = :AUC)
Observation Counts
Sequence ╲ Period 1 2 3 4
RTTR 5 5 5 5
TRRT 5 5 4 4
Paradigm: Replicated crossover that supports reference scaling
Model: Mixed model (unequal variance)
Criteria: Standard ABE
Endpoint: AUC
Formulations: Reference(R), Test(T)
Results(AUC) Assessment Criteria
R Geometric Marginal Mean 65.41
Geometric Naive Mean 65.1
T Geometric Marginal Mean 76.33
Geometric Naive Mean 76.73
Geometric Mean T/R Ratio (%) 116.7
Degrees of Freedom 19.57
90% Confidence Interval (%) [98.19, 138.8] Fail CI ⊆ [80, 125]
Variability CVᵣ (%) | σ̂ᵣ 39.62 | 0.3818
CVₜ (%) | σ̂ₜ 32.0 | 0.3122
Variability Ratio (%) 81.77
ANOVA Formulation (p-value) 0.1386
Sequence (p-value) 0.2507
Period (p-value) 0.02269

As you can see, in both cases the output now presents estimates of CVᵣ and CVₜ for the within subject variability of the reference and test product respectively. Such estimates are not attainable without replication. Also note that the model is a Mixed model (unequal variance). We discuss these issues in future units.

The FDA_HighlyVariable and FDA_NarrowTherapeuticIndex specifications

Future units are focused on the topics of highly variable drugs, narrow therapeutic index drugs, and reference scaling approaches. At this point let us just appreciate that we often (but not always) use such designs with reference scaling, which is an approach to bioequivalence used for highly variable drugs and narrow therapeutic index drugs.

At this point, appreciate that to carry out such an analysis with pumas_be, we specify FDA_HighlyVariable or FDA_NarrowTherapeuticIndex as the second argument. Here is what the output looks like (click the tabs to switch between FDA_HighlyVariable and FDA_NarrowTherapeuticIndex).

be_data = dataset("bioequivalence/RTRT_TRTR/SLTGSF2020_DS12")
pumas_be(be_data, endpoint = :PK, FDA_HighlyVariable)
Observation Counts
Sequence ╲ Period 1 2 3 4
RTRT 38 38 36 37
TRTR 39 38 34 38
Paradigm: Replicated crossover that supports reference scaling
Model: Mixed model (unequal variance)
Criteria: FDA RSABE for HV
Endpoint: PK
Formulations: Reference(R), Test(T)
Results(PK) Assessment Criteria
R Geometric Marginal Mean 0.7645
Geometric Naive Mean 0.7588
T Geometric Marginal Mean 0.9105
Geometric Naive Mean 0.883
Geometric Mean T/R Ratio (%) 119.1 Pass GMR ∈ [80, 125]
Degrees of Freedom 72.68
90% Confidence Interval (%) [89.13, 159.2]
Variability CVᵣ (%) | σ̂ᵣ 221.5 | 1.3327 OK for RS CVᵣ ≥ Minimal CVᵣ
CVₜ (%) | σ̂ₜ 288.9 | 1.495
Variability Ratio (%) 112.2
ANOVA Formulation (p-value) 0.3184
Sequence (p-value) 0.2292
Period (p-value) 0
Reference Scaling Params Reference Scaling Constant 0.7967
Minimal CVᵣ for Reference Scaling (%) 30.0 | 0.294
Reference Scaling Analysis Geometric Mean T/R Ratio (%) 131.4
Standard Error (Log Scale) 0.1777
90% Confidence Interval (%) [97.69, 176.7]
Degrees of Freedom 71
Howe's Approx RSABE Stat (95%) -0.9468 Pass ≤ 0
be_data = dataset("bioequivalence/RTTR_TRRT/CL2009_9_4_1")
pumas_be(be_data, endpoint = :AUC, FDA_NarrowTherapeuticIndex)
Observation Counts
Sequence ╲ Period 1 2 3 4
RTTR 5 5 5 5
TRRT 5 5 4 4
Paradigm: Replicated crossover that supports reference scaling
Model: Mixed model (unequal variance)
Criteria: FDA RSABE for NTI
Endpoint: AUC
Formulations: Reference(R), Test(T)
Results(AUC) Assessment Criteria
R Geometric Marginal Mean 65.41
Geometric Naive Mean 65.1
T Geometric Marginal Mean 76.33
Geometric Naive Mean 76.73
Geometric Mean T/R Ratio (%) 116.7
Degrees of Freedom 19.57
90% Confidence Interval (%) [98.19, 138.8] Fail CI ⊆ [80, 125]
Variability CVᵣ (%) | σ̂ᵣ 39.62 | 0.3818
CVₜ (%) | σ̂ₜ 32.0 | 0.3122
Variability Ratio (%) 81.77
ANOVA Formulation (p-value) 0.1386
Sequence (p-value) 0.2507
Period (p-value) 0.02269
Reference Scaling Params Reference Scaling Constant 1.11
Reference Scaling Analysis Geometric Mean T/R Ratio (%) 118.4
Standard Error (Log Scale) 0.0649
90% Confidence Interval (%) [104.8, 134]
Degrees of Freedom 7
Howe's Approx RSABE Stat (95%) -0.03577 Fail ≤ 0
Variability Ratio Quantile (95%) 1.592 Pass ≤ 2.5

We’ll analyze these approaches and the complete output arising from such specifications in future units.

Partial replicate design

Sometimes, our key motivation for a replicate design is estimating the within subject variability of the reference, while the within subject variability of the test product is not as critical.

For this we can use a partial replicate design. Here, for example, is such a design:

Sequence Period 1 Period 2 Period 3
RRT R R T
RTR R T R
TRR T R R

Here is pumas_be applied to data from such a design.

be_data = dataset("bioequivalence/RRT_RTR_TRR/SLTGSF2020_DS04")
pumas_be(be_data, endpoint = :PK)
Observation Counts
Sequence ╲ Period 1 2 3
RRT 17 17 17
RTR 17 17 17
TRR 17 17 17
Paradigm: Replicated crossover that supports reference scaling
Model: Mixed model (unequal variance)
Criteria: Standard ABE
Endpoint: PK
Formulations: Reference(R), Test(T)
Results(PK) Assessment Criteria
R Geometric Marginal Mean 41.58
Geometric Naive Mean 41.58
T Geometric Marginal Mean 57.06
Geometric Naive Mean 57.06
Geometric Mean T/R Ratio (%) 137.2
Degrees of Freedom 48.95
90% Confidence Interval (%) [118.7, 158.6] Fail CI ⊆ [80, 125]
Variability CVᵣ (%) | σ̂ᵣ 61.96 | 0.57
ANOVA Formulation (p-value) 0.0006
Sequence (p-value) 0.1543
Period (p-value) 0.8618

As you can see the estimated variability is CVᵣ, which stands for the within subject variability of the reference R.

Again, we can use FDA_HighlyVariable for reference scaling; details are in future units.

pumas_be(be_data, FDA_HighlyVariable, endpoint = :PK)
Observation Counts
Sequence ╲ Period 1 2 3
RRT 17 17 17
RTR 17 17 17
TRR 17 17 17
Paradigm: Replicated crossover that supports reference scaling
Model: Mixed model (unequal variance)
Criteria: FDA RSABE for HV
Endpoint: PK
Formulations: Reference(R), Test(T)
Results(PK) Assessment Criteria
R Geometric Marginal Mean 41.58
Geometric Naive Mean 41.58
T Geometric Marginal Mean 57.06
Geometric Naive Mean 57.06
Geometric Mean T/R Ratio (%) 137.2 Fail GMR ∈ [80, 125]
Degrees of Freedom 48.95
90% Confidence Interval (%) [118.7, 158.6]
Variability CVᵣ (%) | σ̂ᵣ 61.96 | 0.57 OK for RS CVᵣ ≥ Minimal CVᵣ
ANOVA Formulation (p-value) 0.0006
Sequence (p-value) 0.1543
Period (p-value) 0.8618
Reference Scaling Params Reference Scaling Constant 0.7967
Minimal CVᵣ for Reference Scaling (%) 30.0 | 0.294
Reference Scaling Analysis Geometric Mean T/R Ratio (%) 137.2
Standard Error (Log Scale) 0.0866
90% Confidence Interval (%) [118.7, 158.7]
Degrees of Freedom 48
Howe's Approx RSABE Stat (95%) -0.02774 Pass ≤ 0

Note that we cannot use FDA_NarrowTherapeuticIndex as it requires a fully replicate design. Also we can only use FDA_HighlyVariable with certain partial replicate designs. For example, look at the supports_rsabe field for these two designs:

be_data = dataset("bioequivalence/RRT_RTR_TRR/SLTGSF2020_DS04")
detect_design(be_data)
(design = "RRT|RTR|TRR",
 num_sequences = 3,
 num_periods = 3,
 num_formulations = 2,
 reference_formulation = 'R',
 test_formulations = ['T'],
 replicated = partially_replicated,
 supports_rsabe = true,
 crossover = true,)
be_data = dataset("bioequivalence/RTR_TRR/SLTGSF2020_DS22")
detect_design(be_data)
(design = "RTR|TRR",
 num_sequences = 2,
 num_periods = 3,
 num_formulations = 2,
 reference_formulation = 'R',
 test_formulations = ['T'],
 replicated = partially_replicated,
 supports_rsabe = false,
 crossover = true,)

Repeated (replicated) parallel design

Another type of replicated design is a parallel design of the form:

Sequence Period 1 Period 2
RR R R
TT T T
be_data = dataset(joinpath("bioequivalence", "RR_TT", "From_PJ2017_4_2"));
design = detect_design(be_data)
(design = "RR|TT",
 num_sequences = 2,
 num_periods = 2,
 num_formulations = 2,
 reference_formulation = 'R',
 test_formulations = ['T'],
 replicated = fully_replicated,
 supports_rsabe = false,
 crossover = false,)

While this design does not have some of the benefits of crossover designs, it does allow us to estimate the within subject variability both for R and for T:

pumas_be(be_data, endpoint = :PK)
Observation Counts
Sequence ╲ Period 1 2
RR 34 35
TT 38 38
Paradigm: Parallel design
Model: T test (equal variance)
Criteria: Standard ABE
Endpoint: PK
Formulations: Reference(R), Test(T)
Results(PK) Assessment Criteria
R Geometric Naive Mean 3.72
T Geometric Naive Mean 3.891
Geometric Mean T/R Ratio (%) 104.6
Degrees of Freedom 143
90% Confidence Interval (%) [95.25, 114.9] Pass CI ⊆ [80, 125]
Variability CVᵣ (%) | σ̂ᵣ 13.29 | 0.1323
CVₜ (%) | σ̂ₜ 8.14 | 0.0813

As you can see both CVᵣ and CVₜ are estimated. A drawback, obviously, is that subject effects cannot be factored out, since a subject is administered either R or T.

9 Multiple formulations (more than 2)

Some studies have more than two formulations. We encode these situations as 1RnT, nR1T, and nRnT (these are not industry-standard encodings but are useful for the discussion here). Here are the types of cases:

  • Multiple test formulations (1RnT): In this case there is a single reference and multiple tests (e.g. n=2 or n=3). As an example, this can be a situation where we consider multiple dosage forms of the test formulation and want to show that they are all bioequivalent to the reference product.
  • Multiple reference formulations (nR1T): In this case we want to compare a single test product to two or more references. For example in one trial we want to compare the test product to both a European reference and a US reference (n=2).
  • Combinations of test and reference (nRnT): In this case we want to have more than one test product and more than one reference product. For example there are two dosage levels (low and high) and a test and reference is tested on each dosage (n=2, and thus there are 4 formulations in total).

Often, such scenarios are carried out in a single bioequivalence trial where the design accounts for the multiple formulations. For example, in the 1RnT case the reference is administered as one formulation, and each of the test products is another formulation. In the nR1T case the test is administered as one formulation, and each of the reference products is another formulation. Finally in the nRnT case, typically with n=2, there would be four formulations, two of which are reference formulations and the other two are the matching test formulations.

Three formulations design

In the case of three formulations, R, S, and T, one popular design is:

Sequence Period 1 Period 2 Period 3
RST R S T
RTS R T S
SRT S R T
STR S T R
TRS T R S
TSR T S R

Observe that there are \(6\) possible permutations of the three formulations R, S, and T, and this design uses each possible permutation as a sequence.

  • If we consider this design for the 1RnT case then it is sensible to encode R as the reference formulation while the test formulations are S and T.
  • If we consider this design for the nR1T case then it is sensible to encode T as the test formulation while the reference formulations are R and S.

Four formulations design

In the case of four formulations, denoted A, B, C, and D, with A the reference, a common design is the following:

Sequence Period 1 Period 2 Period 3 Period 4
ADBC A D B C
BACD B A C D
CBDA C B D A
DCAB D C A B

This is a Williams design, also known as a balanced Latin square crossover design.

This design ensures that (see the code sketch after this list):

  • Each formulation appears exactly once in each period and each sequence. This balances period and sequence effects.
  • Each formulation both precedes and follows every other formulation the same number of times across all sequences. This distributes potential first-order carryover effects evenly.
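Both properties can be checked directly in code. Here is a small sketch using the four sequences above (countmap comes from StatsBase):

# A sketch verifying the Williams design balance properties.
williams_sequences = ["ADBC", "BACD", "CBDA", "DCAB"]

# Property 1: each formulation appears exactly once in each period.
all(allunique(getindex.(williams_sequences, p)) for p in 1:4) # true

# Property 2: every ordered (previous, next) pair occurs equally often.
carryover_pairs = [(s[p], s[p+1]) for s in williams_sequences for p in 1:3]
countmap(carryover_pairs) # all 12 ordered pairs, each appearing exactly once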

There can be multiple interpretations:

  • If we consider this design for the 1RnT case then it is sensible to encode A as the reference formulation while the test formulations are B, C, and D.
  • If we consider this design for the nR1T case then it is sensible to encode A as the test formulation while the reference formulations are B, C, and D.
  • If we consider this design for the nRnT case, then we can consider A as a reference compared to the test B, and C as a reference compared to the test D.

Multiple comparisons

In each case where there are multiple formulations there are also multiple comparisons. This means that we carry out multiple hypothesis tests. Each such hypothesis test considers a different pair of formulations and has its own null hypothesis \(H_0\) (non-bioequivalence) and alternative hypothesis \(H_1\) (bioequivalence holds).

As an example for n=2:

  • 1RnT: We compare the single reference formulation (R) to each of the test formulations (S and T). So we have:

\[ {\cal T}_{\texttt{comparison 1}} = \begin{cases} H_0: & \texttt{R vs T}~\text{not bioequivalent},\\ H_1: & \texttt{R vs T}~\text{bioequivalent}. \end{cases} \]

\[ {\cal T}_{\texttt{comparison 2}} = \begin{cases} H_0: & \texttt{R vs S}~\text{not bioequivalent},\\ H_1: & \texttt{R vs S}~\text{bioequivalent}. \end{cases} \]

  • nR1T: We compare the single test formulation (T) to multiple reference formulations (R and S). So we have:

\[ {\cal T}_{\texttt{comparison 1}} = \begin{cases} H_0: & \texttt{R vs T}~\text{not bioequivalent},\\ H_1: & \texttt{R vs T}~\text{bioequivalent}. \end{cases} \]

\[ {\cal T}_{\texttt{comparison 2}} = \begin{cases} H_0: & \texttt{S vs T}~\text{not bioequivalent},\\ H_1: & \texttt{S vs T}~\text{bioequivalent}. \end{cases} \]

  • nRnT: We compare reference A to test B and reference C to test D. Hence for each reference formulation there is an \(H_0\) and an \(H_1\). So we have:

\[ {\cal T}_{\texttt{comparison 1}} = \begin{cases} H_0: & \texttt{A vs B}~\text{not bioequivalent},\\ H_1: & \texttt{A vs B}~\text{bioequivalent}. \end{cases} \]

\[ {\cal T}_{\texttt{comparison 2}} = \begin{cases} H_0: & \texttt{C vs D}~\text{not bioequivalent},\\ H_1: & \texttt{C vs D}~\text{bioequivalent}. \end{cases} \]

Alpha (\(\alpha\)) adjustment

Recalling that hypothesis tests control the Type-I error via \(\alpha\), if we carry out multiple comparisons, each with its own \(\alpha\), there is a possibility of increasing the probability of Type-I error.

As a hypothetical example assume we are in the 1RnT case with \(n=10\).

This is hypothetical because we will rarely try more than three test formulations in a single trial. In this case, say bioequivalence does not hold for any of the test formulations. That is, \(H_0\) holds for all 10 comparisons. Still, let us see the probability of wrongfully rejecting at least one of the hypotheses and declaring one of the formulations bioequivalent:

  • Under \(H_0\) there is a probability of \(\alpha\) to reject \(H_0\) wrongfully for a specific test formulation.
  • Thus there is a probability of \(1-\alpha\) of not making a Type-I error for that test formulation.
  • If we assume the tests are independent, then there is a probability of \((1-\alpha)^{10}\) of not making any Type-I error.
  • Hence the probability of making at least one Type-I error is \(1-(1-\alpha)^{10}\).

Let us see this for the typical \(\alpha = 5\%\):

α = 0.05
consumer_risk = 1 - (1 - α)^10
0.4012630607616213

Hence the consumer’s risk from that trial is inflated from \(5\%\) to about \(40\%\)!

Now in practice we won’t carry out n=10 comparisons but rather n=2 or n=3. Still in that case:

consumer_risk_2_comparisons = round(1 - (1 - α)^2, digits = 4)
consumer_risk_3_comparisons = round(1 - (1 - α)^3, digits = 4);
@show consumer_risk_2_comparisons
@show consumer_risk_3_comparisons;
consumer_risk_2_comparisons = 0.0975
consumer_risk_3_comparisons = 0.1426

Hence we are still at risk of inflating the probability of some Type-I error to either about \(9.8\%\) or about \(14.3\%\).

A simple approach for mitigating this increase in the consumer’s risk is the Bonferroni correction, where we adjust \(\alpha\) by dividing it by the number of comparisons.

See how this approach works for n=2, n=3, and (the hypothetical case of) n=10:

consumer_risk_2_comparisons_adjusted = round(1 - (1 - α / 2)^2, digits = 6)
consumer_risk_3_comparisons_adjusted = round(1 - (1 - α / 3)^3, digits = 6)
consumer_risk_10_comparisons_adjusted = round(1 - (1 - α / 10)^10, digits = 6)
@show consumer_risk_2_comparisons_adjusted
@show consumer_risk_3_comparisons_adjusted
@show consumer_risk_10_comparisons_adjusted;
consumer_risk_2_comparisons_adjusted = 0.049375
consumer_risk_3_comparisons_adjusted = 0.049171
consumer_risk_10_comparisons_adjusted = 0.04889

So, you can see that by adjusting alpha we are able to keep the consumer risk roughly at \(5\%\).

Note however that such adjustments to \(\alpha\) reduce the power of each of the tests. Hence while we are able to keep the consumer’s risk roughly fixed, we end up increasing the producer’s risk. More on this in future units where we discuss power analysis.

The ICH guidance specifically suggests when to adjust \(\alpha\) and when not to. For this consider Section 2.2.5 in International council for harmonisation of technical requirements for pharmaceuticals for human use (2024):

For the case of nR1T (this is called “Multiple Comparator Products” in that guidance) it is recommended not to carry out an adjustment:

In studies with multiple comparator products, multiplicity correction, i.e., alpha adjustment, is not needed because comparator products are considered independent and region-specific. Decisions will be made independently about a test product relative to a single comparator product within a single jurisdiction.

However, in the case of 1RnT (“multiple test products”) it is suggested:

The need to apply multiplicity correction in pivotal trials depends on the underlying objectives of the trial:

  a. If the objective is to achieve BE for all test formulations vs. the comparator product, no alpha adjustment is needed.
  b. If the objective is to show BE for any of the test formulations, multiplicity (alpha) adjustment may be needed.

In any case, as with any design choice and statistical methodology choice:

The objective of the trial and method for multiplicity correction should be pre-specified in the study protocol.

Models and data use

In each of the cases 1RnT, nR1T, and nRnT we have a choice of whether to use a single statistical model based on all the collected data of the design, or to break up the data into separate models. As an example, consider 1RnT with n=2: we can compare the reference formulation to each of the test formulations based on a single model, or we can separate the data and consider two separate models.

That is, we can have one model that generates all the comparisons (in linear models these are sometimes called contrasts), or we can create a different model for each comparison. The one model approach has the benefit of using all data together and thus potentially enables better estimates and higher statistical power. However, this can also be a drawback, because comparisons between two formulations are then also influenced by a third (or even fourth) formulation that is not relevant for that comparison.

While there are several approaches, our general suggested approach here is:

  • 1RnT: Use a single model comprised of all data. Pumas’ pumas_be function supports this.
  • nR1T: Use separate models, with each model using only the data relevant to one comparison, converting it to a \(2 \times 2\) crossover design. Then carry out the analysis for each filtered dataset separately.
  • nRnT: Similarly, separate out the models, with each model converting the data to a \(2 \times 2\) design, and then carry out the analysis for each filtered dataset separately.

Let us now see some examples.

A 1RnT example with n=2

We’ll use the bioequivalence/RST_RTS_SRT_STR_TRS_TSR/PJ2017_4_5 dataset from the PharmaDatasets package.

This data is from Example 4.5 of Patterson and Jones (2017). Note that in that book it is treated as an nR1T case, in contrast to our example below. Let’s first load the dataset:

be_data = dataset("bioequivalence/RST_RTS_SRT_STR_TRS_TSR/PJ2017_4_5");

Now let’s apply pumas_be, which considers the study as 1RnT by default. Here, following the ICH guidance above, let’s assume we are in case (b) where the objective is “to show BE for any of the test formulations”. In this case let’s carry out \(\alpha\) adjustment. In pumas_be this is done via the level keyword argument (not to be confused with the alpha and level_y keyword arguments, which have other uses).

With a Bonferroni correction, since we carry out two comparisons, we want to halve \(\alpha\) from \(5\%\) to \(2.5\%\). So the hypothesis tests are to be carried out with \(\alpha = 0.025\).

However, the level keyword is for the width of the confidence intervals. As you can recall from Unit 3, the idea of carrying out a TOST is based on a \(100\times(1-2\alpha)\%\) confidence interval. Hence the confidence interval width we want is \(95\%\) (in contrast to the default \(90\%\)). Thus we set level = 0.95.
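In code, the relationship between the adjusted \(\alpha\) and the confidence level is simply:

# A sketch: Bonferroni-adjusted α and the corresponding TOST confidence level.
num_comparisons = 2
α_adjusted = 0.05 / num_comparisons # 0.025
ci_level = 1 - 2 * α_adjusted       # 0.95, passed below as level = 0.95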

Let us see it both for AUC and Cmax (click on the tabs to switch):

pumas_be(be_data, level = 0.95)
Observation Counts
Sequence ╲ Period 1 2 3
RST 8 8 8
RTS 11 11 11
SRT 11 11 11
STR 10 10 10
TRS 10 10 10
TSR 10 10 10
Paradigm: 3 formulations
Model: Linear model
Criteria: Standard ABE
Endpoint: AUC
Formulations: Reference(R), Test(S and T)
Results(AUC) Assessment Criteria
R Geometric Marginal Mean 3808
Geometric Naive Mean 3824
S Geometric Marginal Mean 5346
Geometric Naive Mean 5412
Geometric Mean S/R Ratio (%) 140.4
Degrees of Freedom 114
95% Confidence Interval (%) [130.1, 151.5] Fail CI ⊆ [80, 125]
T Geometric Marginal Mean 4426
Geometric Naive Mean 4426
Geometric Mean T/R Ratio (%) 116.2
Degrees of Freedom 114
95% Confidence Interval (%) [107.7, 125.5] Fail CI ⊆ [80, 125]
Variability CV (%) | σ̂ 21.22 | 0.2099
pumas_be(be_data, level = 0.95, endpoint = :Cmax)
Observation Counts
Sequence ╲ Period 1 2 3
RST 9 9 8
RTS 11 11 11
SRT 11 11 11
STR 10 10 10
TRS 11 11 10
TSR 10 10 10
Paradigm: 3 formulations
Model: Linear model
Criteria: Standard ABE
Endpoint: Cmax
Formulations: Reference(R), Test(S and T)
Results(Cmax) Assessment Criteria
R Geometric Marginal Mean 832.4
Geometric Naive Mean 837.5
S Geometric Marginal Mean 1327
Geometric Naive Mean 1340
Geometric Mean S/R Ratio (%) 159.4
Degrees of Freedom 118
95% Confidence Interval (%) [143.6, 176.9] Fail CI ⊆ [80, 125]
T Geometric Marginal Mean 1082
Geometric Naive Mean 1078
Geometric Mean T/R Ratio (%) 129.9
Degrees of Freedom 118
95% Confidence Interval (%) [117.1, 144.2] Fail CI ⊆ [80, 125]
Variability CV (%) | σ̂ 29.69 | 0.2907

You can see that Pumas detects that the Paradigm is 3 formulations and then assigns the tests and reference accordingly. For example for AUC, the estimated S/R GMR is \(140.4\%\) and the confidence interval (notice it is 95%) is [130.1, 151.5], which falls outside of the desired [80, 125] range. We will revisit more details of the output and the model in future units.

An nR1T example with n=2

Again using the bioequivalence/RST_RTS_SRT_STR_TRS_TSR/PJ2017_4_5 dataset, let us now consider it as nR1T (multiple references). Following the approach outlined above, let us transform the dataset into separate \(2 \times 2\) datasets, data_ST_2by2 and data_RT_2by2:

be_data = dataset("bioequivalence/RST_RTS_SRT_STR_TRS_TSR/PJ2017_4_5")
# The formulation given in a period is the character at that position of
# the sequence string (e.g. sequence "RST", period 2 ⇒ 'S').
@rtransform! be_data :formulation = :sequence[:period];

# Keep only the S and T observations, drop R from the sequence strings, and
# renumber each subject's periods as positions within the stripped sequence.
data_ST = @rsubset be_data :formulation in ['S', 'T'];
data_ST_2by2 = @rtransform data_ST :sequence = replace(:sequence, "R" => "");
@rtransform! data_ST_2by2 :period = findfirst(==(:formulation), :sequence);

# Likewise for the R and T observations, dropping S.
data_RT = @rsubset be_data :formulation in ['R', 'T'];
data_RT_2by2 = @rtransform data_RT :sequence = replace(:sequence, "S" => "");
@rtransform! data_RT_2by2 :period = findfirst(==(:formulation), :sequence);
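As a quick sanity check (the @assert lines are our own, not part of the workflow), stripping a letter should leave exactly two two-letter sequences in each dataset:

@assert sort(unique(data_ST_2by2.sequence)) == ["ST", "TS"]
@assert sort(unique(data_RT_2by2.sequence)) == ["RT", "TR"]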

We can inspect the design of the datasets we formed with this transformation:

detect_design(data_ST_2by2)
(design = "RT|TR",
 num_sequences = 2,
 num_periods = 2,
 num_formulations = 2,
 reference_formulation = 'S',
 test_formulations = ['T'],
 replicated = non_replicated,
 supports_rsabe = false,
 crossover = true,)
detect_design(data_RT_2by2)
(design = "RT|TR",
 num_sequences = 2,
 num_periods = 2,
 num_formulations = 2,
 reference_formulation = 'R',
 test_formulations = ['T'],
 replicated = non_replicated,
 supports_rsabe = false,
 crossover = true,)

Note that in both cases the design string is RT|TR (and does not use S). This is just due to the letters R and T being a standard way to specify the design inside Pumas.

Now we can carry out the two comparisons; in each case we present AUC followed by Cmax:

Note that at the time of writing, pumas_be’s output labels the reference formulation S as R for the S versus T comparison; wherever R appears in that output it should be read as S:

pumas_be(data_ST_2by2)
Observation Counts
Sequence ╲ Period 1 2
RT 29 29
TR 31 31
Paradigm: Non replicated crossover design
Model: Linear model
Criteria: Standard ABE
Endpoint: AUC
Formulations: Reference(R), Test(T)
Results(AUC) Assessment Criteria
R Geometric Marginal Mean 5339
Geometric Naive Mean 5412
T Geometric Marginal Mean 4459
Geometric Naive Mean 4426
Geometric Mean T/R Ratio (%) 83.52
Degrees of Freedom 56
90% Confidence Interval (%) [78.84, 88.48] Fail CI ⊆ [80, 125]
Variability CV (%) | σ̂ 18.73 | 0.1857
ANOVA Formulation (p-value) 0
Sequence (p-value) 0.4313
Period (p-value) 0.174
pumas_be(data_ST_2by2, endpoint = :Cmax)
Observation Counts
Sequence ╲ Period 1 2
RT 30 29
TR 32 31
Paradigm: Non replicated crossover design
Model: Linear model
Criteria: Standard ABE
Endpoint: Cmax
Formulations: Reference(R), Test(T)
Results(Cmax) Assessment Criteria
R Geometric Marginal Mean 1323
Geometric Naive Mean 1340
T Geometric Marginal Mean 1091
Geometric Naive Mean 1078
Geometric Mean T/R Ratio (%) 82.52
Degrees of Freedom 58
90% Confidence Interval (%) [76.17, 89.41] Fail CI ⊆ [80, 125]
Variability CV (%) | σ̂ 26.7 | 0.2624
ANOVA Formulation (p-value) 0.00018
Sequence (p-value) 0.8896
Period (p-value) 0.5804
pumas_be(data_RT_2by2)
Observation Counts
Sequence ╲ Period 1 2
RT 30 30
TR 30 30
Paradigm: Non replicated crossover design
Model: Linear model
Criteria: Standard ABE
Endpoint: AUC
Formulations: Reference(R), Test(T)
Results(AUC) Assessment Criteria
R Geometric Marginal Mean 3815
Geometric Naive Mean 3824
T Geometric Marginal Mean 4430
Geometric Naive Mean 4426
Geometric Mean T/R Ratio (%) 116.1
Degrees of Freedom 57
90% Confidence Interval (%) [108.7, 124] Pass CI ⊆ [80, 125]
Variability CV (%) | σ̂ 21.63 | 0.2139
ANOVA Formulation (p-value) 0.00036
Sequence (p-value) 0.4096
Period (p-value) 0.5645
pumas_be(data_RT_2by2, endpoint = :Cmax)
Observation Counts
Sequence ╲ Period 1 2
RT 31 30
TR 31 31
Paradigm: Non replicated crossover design
Model: Linear model
Criteria: Standard ABE
Endpoint: Cmax
Formulations: Reference(R), Test(T)
Results(Cmax) Assessment Criteria
R Geometric Marginal Mean 837.5
Geometric Naive Mean 837.5
T Geometric Marginal Mean 1083
Geometric Naive Mean 1078
Geometric Mean T/R Ratio (%) 129.4
Degrees of Freedom 59
90% Confidence Interval (%) [118.8, 140.8] Fail CI ⊆ [80, 125]
Variability CV (%) | σ̂ 28.56 | 0.28
ANOVA Formulation (p-value) 0
Sequence (p-value) 0.2598
Period (p-value) 0.07524

A 1RnT example with n=3

We’ll use the bioequivalence/ADBC_BACD_CBDA_DCAB/PJ2017_4_6 dataset from the PharmaDatasets package. This data is from Example 4.6 of Patterson and Jones (2017). Note that in that book it is treated as an nRnT case, in contrast to our analysis now, which considers it as having multiple tests.
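Before modeling, we can confirm the design as we did earlier; for this dataset we expect detect_design to report four sequences, four periods, and four formulations:

detect_design(dataset("bioequivalence/ADBC_BACD_CBDA_DCAB/PJ2017_4_6"))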

be_data = dataset("bioequivalence/ADBC_BACD_CBDA_DCAB/PJ2017_4_6")
pumas_be(be_data)
Observation Counts
Sequence ╲ Period 1 2 3 4
ADBC 7 7 7 7
BACD 7 7 7 7
CBDA 7 7 7 7
DCAB 7 7 7 7
Paradigm: 4 formulations
Model: Linear model
Criteria: Standard ABE
Endpoint: AUC
Formulations: Reference(A), Test(B, C and D)
Results(AUC) Assessment Criteria
A Geometric Marginal Mean 367.9
Geometric Naive Mean 367.9
B Geometric Marginal Mean 369.7
Geometric Naive Mean 369.7
Geometric Mean B/A Ratio (%) 100.5
Degrees of Freedom 78
90% Confidence Interval (%) [94.7, 106.6] Pass CI ⊆ [80, 125]
C Geometric Marginal Mean 2985
Geometric Naive Mean 2985
Geometric Mean C/A Ratio (%) 811.4
Degrees of Freedom 78
90% Confidence Interval (%) [764.8, 860.8] Fail CI ⊆ [80, 125]
D Geometric Marginal Mean 2879
Geometric Naive Mean 2879
Geometric Mean D/A Ratio (%) 782.6
Degrees of Freedom 78
90% Confidence Interval (%) [737.6, 830.2] Fail CI ⊆ [80, 125]
Variability CV (%) | σ̂ 13.35 | 0.1329

You may observe that in this example formulations C and D fail, with GMR estimates and confidence intervals far outside the desired range. In this dataset that is because C and D were administered at dosage amounts 8 times higher than A and B. To account for this we transform the data, dividing the AUC by 8 whenever the formulation is C or D. This is done with the following code using @rtransform from the DataFramesMeta package.

# Divide AUC by 8 whenever the formulation given in that period is C or D
transformed_data =
    @rtransform(be_data, :AUC = :sequence[:period] in ['C', 'D'] ? :AUC / 8 : :AUC);
be_result = pumas_be(transformed_data)
Observation Counts
Sequence ╲ Period 1 2 3 4
ADBC 7 7 7 7
BACD 7 7 7 7
CBDA 7 7 7 7
DCAB 7 7 7 7
Paradigm: 4 formulations
Model: Linear model
Criteria: Standard ABE
Endpoint: AUC
Formulations: Reference(A), Test(B, C and D)
Results(AUC) Assessment Criteria
A Geometric Marginal Mean 367.9
Geometric Naive Mean 367.9
B Geometric Marginal Mean 369.7
Geometric Naive Mean 369.7
Geometric Mean B/A Ratio (%) 100.5
Degrees of Freedom 78
90% Confidence Interval (%) [94.7, 106.6] Pass CI ⊆ [80, 125]
C Geometric Marginal Mean 373.2
Geometric Naive Mean 373.2
Geometric Mean C/A Ratio (%) 101.4
Degrees of Freedom 78
90% Confidence Interval (%) [95.6, 107.6] Pass CI ⊆ [80, 125]
D Geometric Marginal Mean 359.9
Geometric Naive Mean 359.9
Geometric Mean D/A Ratio (%) 97.82
Degrees of Freedom 78
90% Confidence Interval (%) [92.2, 103.8] Pass CI ⊆ [80, 125]
Variability CV (%) | σ̂ 13.35 | 0.1329

As you can see, in this case all formulations now pass. Such a transformation only needs to be done when the experimental setup merits it.
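If this kind of dose normalization recurs, it can be wrapped in a small helper. Here is a minimal sketch (dose_normalize is our own name, not a Bioequivalence function):

function dose_normalize(df, formulations, factor; endpoint = :AUC)
    # Divide the endpoint by `factor` whenever the formulation given in a
    # period (the character at that position of the sequence string) is
    # one of `formulations`.
    return @rtransform df $endpoint =
        :sequence[:period] in formulations ? $endpoint / factor : $endpoint
end;

# Equivalent to the transformation above:
dose_normalize(be_data, ['C', 'D'], 8)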

An nR1T example with n=3

Let us now consider the same data as nR1T. For this, let us create three different datasets, each comparing the test D to one of the references A, B, and C.

We’ll continue working with transformed_data from above. Let us create a data processing helper function make_2by2by2_from_4by4by4 that converts the \(4 \times 4 \times 4\) design into a \(2 \times 2 \times 2\) design with the specified formulations:

function make_2by2by2_from_4by4by4(original_df, formulation_1, formulation_2)
    df = deepcopy(original_df)
    # All formulation letters appearing across the sequence strings.
    all_formulations = unique(vcat(unique.(df.sequence)...))
    formulations_to_drop = setdiff(all_formulations, [formulation_1, formulation_2])
    # The formulation given in a period is the character at that position
    # of the sequence string.
    @rtransform! df :formulation = :sequence[:period]
    # Keep only the two formulations of interest, strip the other two
    # letters from the sequences, and renumber the periods as positions
    # within the stripped sequence.
    @rsubset! df :formulation in [formulation_1, formulation_2]
    @rtransform! df :sequence =
        replace(:sequence, formulations_to_drop[1] => "", formulations_to_drop[2] => "")
    @rtransform! df :period = findfirst(==(:formulation), :sequence)
    return df
end;

We now use this function for the three comparisons we consider. Here are the results for AUC only:

data_AD_2by2 = make_2by2by2_from_4by4by4(transformed_data, 'A', 'D')
pumas_be(data_AD_2by2)
Observation Counts
Sequence ╲ Period 1 2
RT 14 14
TR 14 14
Paradigm: Non replicated crossover design
Model: Linear model
Criteria: Standard ABE
Endpoint: AUC
Formulations: Reference(R), Test(T)
Results(AUC) Assessment Criteria
R Geometric Marginal Mean 367.9
Geometric Naive Mean 367.9
T Geometric Marginal Mean 359.9
Geometric Naive Mean 359.9
Geometric Mean T/R Ratio (%) 97.82
Degrees of Freedom 26
90% Confidence Interval (%) [92.41, 103.5] Pass CI ⊆ [80, 125]
Variability CV (%) | σ̂ 12.52 | 0.1247
ANOVA Formulation (p-value) 0.514
Sequence (p-value) 2.0e-5
Period (p-value) 0.00891
data_BD_2by2 = make_2by2by2_from_4by4by4(transformed_data, 'B', 'D')
pumas_be(data_BD_2by2)
Observation Counts
Sequence ╲ Period 1 2
RT 14 14
TR 14 14
Paradigm: Non replicated crossover design
Model: Linear model
Criteria: Standard ABE
Endpoint: AUC
Formulations: Reference(R), Test(T)
Results(AUC) Assessment Criteria
R Geometric Marginal Mean 369.7
Geometric Naive Mean 369.7
T Geometric Marginal Mean 359.9
Geometric Naive Mean 359.9
Geometric Mean T/R Ratio (%) 97.36
Degrees of Freedom 26
90% Confidence Interval (%) [90.17, 105.1] Pass CI ⊆ [80, 125]
Variability CV (%) | σ̂ 16.96 | 0.1684
ANOVA Formulation (p-value) 0.5576
Sequence (p-value) 0.05751
Period (p-value) 0.01014
data_CD_2by2 = make_2by2by2_from_4by4by4(transformed_data, 'C', 'D')
pumas_be(data_CD_2by2)
Observation Counts
Sequence ╲ Period 1 2
RT 14 14
TR 14 14
Paradigm: Non replicated crossover design
Model: Linear model
Criteria: Standard ABE
Endpoint: AUC
Formulations: Reference(R), Test(T)
Results(AUC) Assessment Criteria
R Geometric Marginal Mean 373.2
Geometric Naive Mean 373.2
T Geometric Marginal Mean 359.9
Geometric Naive Mean 359.9
Geometric Mean T/R Ratio (%) 96.45
Degrees of Freedom 26
90% Confidence Interval (%) [92.43, 100.6] Pass CI ⊆ [80, 125]
Variability CV (%) | σ̂ 9.37 | 0.0935
ANOVA Formulation (p-value) 0.1599
Sequence (p-value) 1.0e-5
Period (p-value) 0.0053
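Note that these three comparisons were each run at the default level = 0.90. If the study objective required an \(\alpha\) adjustment as in the 1RnT example above, a Bonferroni correction would divide \(\alpha\) by three; a sketch of the adjusted call for the A versus D comparison:

α = 0.05
level_adj = 1 - 2 * (α / 3)  # ≈ 0.9667, i.e. roughly a 96.7% confidence interval
pumas_be(data_AD_2by2, level = level_adj)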

An nRnT example with n=2

Similarly, if we wanted to treat the bioequivalence/ADBC_BACD_CBDA_DCAB/PJ2017_4_6 dataset as in Example 4.6 of Patterson and Jones (2017), we are in the nRnT case, where we compare A and B at one dosage level and then compare C and D at a different dosage level.

We again take the approach of converting the design to a \(2 \times 2 \times 2\) design using our make_2by2by2_from_4by4by4 function from above. We keep working with transformed_data; note that the dose normalization leaves A and B untouched and divides C and D by the same factor, so neither within-pair comparison is affected:

data_AB_2by2 = make_2by2by2_from_4by4by4(transformed_data, 'A', 'B')
detect_design(data_AB_2by2)
(design = "RT|TR",
 num_sequences = 2,
 num_periods = 2,
 num_formulations = 2,
 reference_formulation = 'A',
 test_formulations = ['B'],
 replicated = non_replicated,
 supports_rsabe = false,
 crossover = true,)
data_CD_2by2 = make_2by2by2_from_4by4by4(transformed_data, 'C', 'D')
detect_design(data_CD_2by2)
(design = "RT|TR",
 num_sequences = 2,
 num_periods = 2,
 num_formulations = 2,
 reference_formulation = 'C',
 test_formulations = ['D'],
 replicated = non_replicated,
 supports_rsabe = false,
 crossover = true,)

Here are the comparisons, for both AUC and Cmax, in each case:

pumas_be(data_AB_2by2)
Observation Counts
Sequence ╲ Period 1 2
RT 14 14
TR 14 14
Paradigm: Non replicated crossover design
Model: Linear model
Criteria: Standard ABE
Endpoint: AUC
Formulations: Reference(R), Test(T)
Results(AUC) Assessment Criteria
R Geometric Marginal Mean 367.9
Geometric Naive Mean 367.9
T Geometric Marginal Mean 369.7
Geometric Naive Mean 369.7
Geometric Mean T/R Ratio (%) 100.5
Degrees of Freedom 26
90% Confidence Interval (%) [92.84, 108.7] Pass CI ⊆ [80, 125]
Variability CV (%) | σ̂ 17.45 | 0.1732
ANOVA Formulation (p-value) 0.9201
Sequence (p-value) 0.3733
Period (p-value) 0.1069
pumas_be(data_AB_2by2, endpoint = :Cmax)
Observation Counts
Sequence ╲ Period 1 2
RT 14 14
TR 14 14
Paradigm: Non replicated crossover design
Model: Linear model
Criteria: Standard ABE
Endpoint: Cmax
Formulations: Reference(R), Test(T)
Results(Cmax) Assessment Criteria
R Geometric Marginal Mean 73.17
Geometric Naive Mean 73.17
T Geometric Marginal Mean 70.62
Geometric Naive Mean 70.62
Geometric Mean T/R Ratio (%) 96.51
Degrees of Freedom 26
90% Confidence Interval (%) [87.15, 106.9] Pass CI ⊆ [80, 125]
Variability CV (%) | σ̂ 22.68 | 0.224
ANOVA Formulation (p-value) 0.5586
Sequence (p-value) 0.9181
Period (p-value) 0.1429
pumas_be(data_CD_2by2)
Observation Counts
Sequence ╲ Period 1 2
RT 14 14
TR 14 14
Paradigm: Non replicated crossover design
Model: Linear model
Criteria: Standard ABE
Endpoint: AUC
Formulations: Reference(R), Test(T)
Results(AUC) Assessment Criteria
R Geometric Marginal Mean 373.2
Geometric Naive Mean 373.2
T Geometric Marginal Mean 359.9
Geometric Naive Mean 359.9
Geometric Mean T/R Ratio (%) 96.45
Degrees of Freedom 26
90% Confidence Interval (%) [92.43, 100.6] Pass CI ⊆ [80, 125]
Variability CV (%) | σ̂ 9.37 | 0.0935
ANOVA Formulation (p-value) 0.1599
Sequence (p-value) 1.0e-5
Period (p-value) 0.0053
pumas_be(data_CD_2by2, endpoint = :Cmax)
Observation Counts
Sequence ╲ Period 1 2
RT 14 14
TR 14 14
Paradigm: Non replicated crossover design
Model: Linear model
Criteria: Standard ABE
Endpoint: Cmax
Formulations: Reference(R), Test(T)
Results(Cmax) Assessment Criteria
R Geometric Marginal Mean 636
Geometric Naive Mean 636
T Geometric Marginal Mean 617.1
Geometric Naive Mean 617.1
Geometric Mean T/R Ratio (%) 97.04
Degrees of Freedom 26
90% Confidence Interval (%) [91.11, 103.4] Pass CI ⊆ [80, 125]
Variability CV (%) | σ̂ 13.9 | 0.1383
ANOVA Formulation (p-value) 0.4232
Sequence (p-value) 0.00134
Period (p-value) 0.669

10 Conclusion

This unit provided an overview of the most common experimental designs in bioequivalence testing, moving beyond the simple parallel design to more statistically powerful multi-period approaches.

The main focus was the \(2 \times 2\) crossover design, which is the standard for most bioequivalence studies. Its strength lies in controlling for variability between subjects, which generally provides more precise results with a smaller number of participants.

We also covered replicate designs, which are necessary for highly variable drugs (HVDs) and narrow therapeutic index drugs (NTIDs). By administering a drug more than once to the same subject, these designs allow us to measure the within-subject variability, a key requirement for certain regulatory pathways.

Finally, we looked at how to handle studies with multiple test or reference formulations. This raised the important statistical consideration of multiple comparisons and when it might be necessary to adjust the significance level (\(\alpha\)) to maintain the integrity of the results.

Understanding these different designs is crucial because the design choice determines the correct statistical model for the analysis. In the next unit, we will explore the linear and mixed-effects models that Pumas uses to analyze data from these various experimental setups.

11 Unit exercises

  1. Parallel vs. Crossover Design

    Consider a bioequivalence trial comparing formulations \(R\) (reference) and \(T\) (test).

    1. Describe a simple parallel design and a \(2 \times 2\) crossover design in terms of subject allocation and treatment sequence.
    2. Discuss one statistical advantage and one logistical disadvantage of using a crossover design instead of a parallel design for bioequivalence testing.
  2. Exploring Randomization: In the section on randomization, we assigned 20 subjects to 4 groups.

    1. Modify the code to randomize 30 subjects into 3 groups: “R”, “T low dose”, and “T high dose”.
    2. Run the code twice, once with Random.seed!(0) and once with Random.seed!(1). What do you observe about the group assignments? Why is this the case?
  3. Understanding Variability with the Toy Model: The toy model in the “Within subject variability and between subject variability” section helps illustrate BSV and WSV.

    1. Manipulating BSV: The array subject_average_concentration_loss_per_time controls the differences between subjects. Make the values in this array more similar to each other (e.g., [0.84, 0.85, 0.86, 0.87]). Rerun the simulation and the mixed model analysis (VarCorr(lmm(...))). How does the estimated standard deviation for id (which corresponds to BSV) change?
    2. Manipulating WSV: The random component rand(Uniform(0.85, 1.15)) controls the variability within a single subject’s repeated measurements. Change this to be narrower, for example, rand(Uniform(0.95, 1.05)). Rerun the simulation and the mixed model analysis. How does the estimated standard deviation for the Residual (which corresponds to WSV) change?
    3. Based on your experiments, if you were designing a study for a drug known to have very high BSV but low WSV, would you prefer a parallel or a crossover design? Why?
  4. The Impact of Alpha Correction: In the 1RnT example with three formulations, we applied a Bonferroni correction by setting level = 0.95.

    1. Rerun the analysis for AUC from that section without the correction, using the default level = 0.90.
    2. Compare the 90% confidence intervals for the T/R and S/R comparisons with the 95% confidence intervals from the original example. Are they wider or narrower?
    3. Could failing to apply an alpha correction in a multiple-comparison scenario lead to a “Pass” conclusion when a corrected analysis would “Fail”? Explain your reasoning based on the confidence intervals.
  5. Analyzing Different Comparisons: In the section on the four-formulation Williams design (ADBC_BACD_CBDA_DCAB), we compared test D to references A, B, and C.

    1. Using the make_2by2by2_from_4by4by4 helper function and the transformed_data, write the code to perform a bioequivalence analysis comparing formulation A (as reference) to formulation C (as test).
    2. Run pumas_be on the resulting dataset. What is the GMR and 90% confidence interval? Do formulations A and C pass the bioequivalence criteria?
  6. Investigating a Parallel Design: The list of available designs included "R_T", which corresponds to a parallel design.

    1. Find a dataset with this design using the datasets() function from PharmaDatasets.
    2. Load the data and run detect_design. Confirm that it is a parallel design with 1 period and 2 sequences.
    3. Run pumas_be on this dataset. Look at the “Variability” section of the output. What CV is reported? Is it the overall CV, CVᵣ, or CVₜ? Why can’t a parallel design estimate within-subject variability?

References

FDA. 2011. “Draft Guidance on Progesterone. Recommended Apr 2010; Revised Feb 2011.” https://www.accessdata.fda.gov/drugsatfda_docs/psg/Progesterone_caps_19781_RC02-11.pdf.
———. 2012. “Draft Guidance on Warfarin Sodium. Recommended Dec 2012.” https://www.accessdata.fda.gov/drugsatfda_docs/psg/Warfarin_Sodium_tab_09218_RC12-12.pdf.
———. 2025. “Product-Specific Guidances for Generic Drug Development.” https://www.accessdata.fda.gov/scripts/cder/psg/index.cfm.
ICH. 2024. “M13A: Bioequivalence for Immediate Release Solid Oral Dosage Forms. July 2024.” International Council for Harmonisation of Technical Requirements for Pharmaceuticals for Human Use. https://database.ich.org/sites/default/files/ICH_M13A_Step4_Final_Guideline_2024_0723.pdf.
Patterson, Scott D, and Byron Jones. 2017. Bioequivalence and Statistics in Clinical Pharmacology. Chapman & Hall/CRC. https://www.routledge.com/Bioequivalence-and-Statistics-in-Clinical-Pharmacology/Patterson-Jones/p/book/9780367782443.
