using Pumas # available with Pumas products
using Bioequivalence # available with Pumas products
using BioequivalencePower # available with Pumas products
using PharmaDatasets # available with Pumas products
using DataFrames
using DataFramesMeta
using StatsBase
using SummaryTables
using CairoMakie

A Course on Bioequivalence: Unit 8 - Basics of Power Analysis
This is Unit 8 of a complete bioequivalence analysis course using Pumas. There are 15 units in the course in total.
1 Unit overview
In this unit we cover the basics of power analysis, extending the introduction to power analysis from Unit 3.
Planning of bioequivalence trials is intimately related to statistical power. Loosely, the power is the probability of succeeding with the trial. More formally, it is the probability of rejecting \(H_0\) when the actual reality is within \(H_1\).
The most important levers that we have when planning a trial are:
- The number of subjects in the trial.
- The design.
With power analysis we see how to vary these choices so as to achieve a desired level of power, all under certain assumptions about the underlying attributes of the reference and test product.
We use the following packages in this unit.
We have used all of these packages in previous units. The key package in the current unit is `BioequivalencePower`. You may also consult the documentation for this package for further insights.
2 Revisiting the basics
Recall from Unit 3 that when we carry out bioequivalence analysis, the type of hypothesis test that we are implementing is the following:
\[ {\cal T_{\text{TOST}}} = \begin{cases} H_{0}:& \mu_T - \mu_R \le \delta_- \qquad \text{or} \qquad \delta_+ \le \mu_T - \mu_R \\[5pt] H_{1}:& \qquad \qquad \delta_- < \mu_T - \mu_R < \delta_+\\ \end{cases} \]
Here \(H_0\) implies not being bioequivalent and \(H_1\) implies being bioequivalent.
Typical values for the (non-statistical) limits are \(\delta_- = \log(0.8) \approx -0.22314\) and \(\delta_+ = \log(1.25) \approx 0.22314\).
The hypothesis test is about the population parameters \(\mu_R\) and \(\mu_T\), their difference, or equivalently about the geometric mean ratio
\[ \text{GMR} = e^{\mu_T - \mu_R}. \]
In Unit 2 we saw the GMR as,
\[ \text{GMR} = \frac{\text{geomean}(x_1^T,\ldots,x_{n_T}^T)}{\text{geomean}(x_1^R,\ldots,x^R_{n_R})} = e^{\text{mean}(\log(x_1^T),\ldots,\log(x_{n_T}^T)) ~-~ \text{mean}(\log(x_1^R),\ldots,\log(x^R_{n_R}))}. \]
Yet this form is an estimator of the GMR based on data, whereas when we carry out power analysis we typically do not use data, but rather consider the theoretical population quantity \(e^{\mu_T - \mu_R}\) or the difference \(\mu_T - \mu_R\).
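As a quick numerical illustration of the identity above, the two forms of the estimator agree. Here is a small sketch with made-up AUC values (the `geomean` function comes from StatsBase, which we have already loaded):

```julia
using StatsBase # geomean, mean

# Hypothetical AUC values for test (T) and reference (R) subjects
xT = [19.2, 19.1, 15.9, 24.8]
xR = [15.0, 15.2, 14.9, 15.3]

gmr_ratio = geomean(xT) / geomean(xR)            # ratio of geometric means
gmr_log   = exp(mean(log.(xT)) - mean(log.(xR))) # exp of the log-mean difference

gmr_ratio ≈ gmr_log  # the two forms are mathematically identical
```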
We should also realize that while the hypothesis test \({\cal T_{\text{TOST}}}\) is not about the CV, the variability (in one of its several forms) is another related population quantity that has an effect.
Alpha, Beta, and Power
Recall from Unit 3 that the decision framework has Type-I errors and Type-II errors as follows:
| | Decision: Retain \(H_0\) | Decision: Reject \(H_0\) |
|---|---|---|
| Reality: \(H_0\) | ✓ (True Negative) | Type I Error (False Positive) (erroneous rejection of \(H_0\)) |
| Reality: \(H_1\) | Type II Error (False Negative) (erroneously failing to reject \(H_0\)) | ✓ (True Positive) |
Type-I errors capture the consumer's risk, while Type-II errors capture the producer's risk.
In particular \(\alpha\) is defined as the maximal probability of Type-I error.
Let us also define \(\beta\) as the probability of a Type-II error; its complement is the power.
Hence in summary:
\[ \begin{align*} \alpha &\ge {\mathbb P}(\text{Type I error}) \\ \beta &= {\mathbb P}(\text{Type II error}) \\ \text{power} &= 1 - \beta \end{align*} \]
In general our goal is to plan a study and a statistical study protocol such that:
- For sure \(\alpha = 0.05\) (consumers are guaranteed to be protected), even if our assumptions about certain parameters such as the GMR and CV were off.
- The power is on the order of \(0.8\) (or \(\beta = 0.2\)), based on assumptions about the GMR and CV.
Consumers are the priority by default. We then need to make sure producers have a good chance of success.
A simple hypothesis test example
To understand the interplay of \(\alpha\) and \(\beta\), let us momentarily digress from the complex \({\cal T}_{\text{TOST}}\) hypothesis and consider a simpler case where we only need to decide about the mean of a normal distribution.
In this very simple case, assume the standard deviation is known at \(\sigma = 2\).
In particular, take this as our simple hypothesis formulation:
\[ {\cal T_{\text{simple}}} = \begin{cases} H_{0}:& \mu = 15 \\[5pt] H_{1}:& \mu = 20\\ \end{cases} \]
While we only present this formulation for pedagogic purposes, we can also link it to a hypothetical example. For example, assume we know that molar dose concentrations from two production batches vary, with a mean of \(15\) for batch \(0\) and a mean of \(20\) for batch \(1\). Assume further that we take a single physical pill and, using some measurement, we need to determine if we are in \(H_0\) or \(H_1\).
We use a threshold \(\tau\) and our rule is that if the measured quantity is greater than \(\tau\) we determine \(H_1\), and otherwise we determine \(H_0\).
In this example assume \(\tau = 18.29\) (specifically set so that \(\alpha = 0.05\)).
For such a threshold we can compute \(\alpha\) and \(\beta\) using the `cdf` and `ccdf` functions from the `Distributions` package. The `cdf` function, standing for "cumulative distribution function", tells us the probability that a quantity is less than or equal to a value. The `ccdf` function, with the extra `c` meaning "complementary", is one minus that quantity and thus gives the probability of being greater than a value.
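To make the complementarity concrete, here is a minimal self-contained check (using only the Distributions package):

```julia
using Distributions

d = Normal(15, 2)
x = 18.29

cdf(d, x) + ccdf(d, x)     # = 1, by definition of "complementary"
ccdf(d, quantile(d, 0.95)) # = 0.05: the upper tail beyond the 95th percentile
```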
```julia
μ₀, μ₁, σ = 15, 20, 2
dist₀, dist₁ = Normal(μ₀, σ), Normal(μ₁, σ)
τ = round(quantile(dist₀, 0.95), digits = 3)
α = round(ccdf(dist₀, τ), digits = 3)
β = round(cdf(dist₁, τ), digits = 3)
println("τ: ", τ)
println("Probability of Type I error: ", α)
println("Probability of Type II error: ", β)
println("Power: ", 1 - β)
```
τ: 18.29
Probability of Type I error: 0.05
Probability of Type II error: 0.196
Power: 0.804
As you can see, these values are aligned with the typical target values of \(\alpha = 0.05\) exactly, and the power being at about \(80\%\).
The nice thing about such a simplistic hypothesis is that it is easy to visualize. Here it is:
```julia
grid = 5:0.1:25
fig, ax = lines(grid, pdf.(dist₀, grid), color = :blue)
h₀grid, h₁grid = τ:0.1:25, 5:0.1:τ+0.1
fillx = collect(h₀grid)
filly = pdf.(dist₀, fillx)
poly!(
    ax,
    [fillx; reverse(fillx)],
    [zeros(length(filly)); reverse(filly)],
    color = (:blue, 0.25),
    strokewidth = 0,
)
lines!(ax, grid, pdf.(dist₁, grid), color = :green)
fillx2 = collect(h₁grid)
filly2 = pdf.(dist₁, fillx2)
poly!(
    ax,
    [fillx2; reverse(fillx2)],
    [zeros(length(filly2)); reverse(filly2)],
    color = (:green, 0.25),
    strokewidth = 0,
)
lines!(ax, [τ, τ], [0, 0.25], color = :red, linewidth = 3)
xlims!(ax, 5, 25)
ylims!(ax, 0, 0.25)
text!(ax, L"\beta", position = (17, 0.015), align = (:center, :center), fontsize = 20)
text!(ax, L"\alpha", position = (19, 0.015), align = (:center, :center), fontsize = 20)
text!(ax, L"\tau", position = (τ + 0.25, 0.24), align = (:center, :center), fontsize = 20)
text!(ax, L"H_0", position = (μ₀, 0.21), align = (:center, :center), fontsize = 20)
text!(ax, L"H_1", position = (μ₁, 0.21), align = (:center, :center), fontsize = 20)
ax.xlabel = "x"
ax.ylabel = "Probability density"
fig
```
As you can see, the power is the area under the \(H_1\) density to the right of the green zone.
With this consider what would happen if the value of \(\mu\) under \(H_1\) would change, for example to \(\mu = 21\), or to \(\mu = 19\). What would this do to the power? How would it change from \(80.4\%\)? Let’s calculate it:
```julia
μ₁ = 19
dist₁ = Normal(μ₁, σ)
β = round(cdf(dist₁, τ), digits = 3)
println("Probability of Type II error: ", β)
println("Power: ", 1 - β)
```
Probability of Type II error: 0.361
Power: 0.639
```julia
μ₁ = 21
dist₁ = Normal(μ₁, σ)
β = round(cdf(dist₁, τ), digits = 3)
println("Probability of Type II error: ", β)
println("Power: ", 1 - β)
```
Probability of Type II error: 0.088
Power: 0.912
Understanding a power curve
Such modifications of the actual \(H_1\) scenario (\(\mu\) above) help create power curves. In general, in the bioequivalence context, we can modify other parameters and settings such as:
- CV
- GMR
- \(\alpha\)
- Sample size
- Design
A power curve is a visual representation of the power as a function of such parameters, assumptions, and settings.
Still, outside the bioequivalence context, let us see this for the simple hypothesis case above, where we vary \(\mu_1\) and also consider various alternative values of \(\sigma\).
\[ {\cal T_{\text{simple}}} = \begin{cases} H_{0}:& \mu = 15 \\[5pt] H_{1}:& \mu = \mu_1\\ \end{cases} \]
Here we calculate and plot the curve:
```julia
function simple_power(μ₁, σ; μ₀ = 15, α = 0.05)
    dist₀, dist₁ = Normal(μ₀, σ), Normal(μ₁, σ)
    τ = round(quantile(dist₀, 0.95), digits = 3)
    α = round(ccdf(dist₀, τ), digits = 3)
    β = round(cdf(dist₁, τ), digits = 3)
    return 1 - β
end
```
```julia
μ₁_range = 15:0.1:25

power_curve = [simple_power(μ₁, 2) for μ₁ in μ₁_range]

fig = Figure()
ax = Axis(
    fig[1, 1],
    xlabel = "μ₁",
    ylabel = "Power",
    title = "Power Curve (σ = 2, μ₀ = 15, α = 0.05)",
    xticks = 15:25,
    yticks = 0:0.1:1,
)
lines!(ax, μ₁_range, power_curve, color = :blue, label = "Power")
xlims!(ax, 15, 25)
ylims!(ax, 0, 1)
fig
```
Observe that here the power will depend on \(\mu_1\). For example, in an extreme case where \(\mu_1=15\) (as in \(H_0\)), the power will be \(\alpha\). Then as \(\mu_1\) is farther to the right from \(15\), the power increases. In the quite extreme case of \(\mu_1 = 25\) there is hardly any overlap between the distribution of \(H_0\) and \(H_1\) and thus the power is almost \(100\%\).
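These two endpoints can be verified directly. Below is a self-contained, unrounded variant of the power computation (using only Distributions; `τ` is recomputed from `α` rather than hard-coded):

```julia
using Distributions

# Power of the simple test as a function of the H₁ mean μ₁ (no rounding)
function simple_power_exact(μ₁; μ₀ = 15, σ = 2, α = 0.05)
    τ = quantile(Normal(μ₀, σ), 1 - α) # threshold calibrated so P(Type I error) = α
    return ccdf(Normal(μ₁, σ), τ)      # power = P(measurement > τ | H₁)
end

simple_power_exact(15) # = 0.05: the power degenerates to α when H₁ coincides with H₀
simple_power_exact(25) # ≈ 0.9996: essentially no overlap between H₀ and H₁
```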
Now say we have a few finite scenarios for \(\sigma\), e.g. \(0.75\), \(2\) (as before), or \(3.5\). Then we can compute the power for each of these cases, where in this case we still present it as a function of \(\mu_1\) on the plot.
```julia
σ_levels = [0.75, 2.0, 3.5]
power_curve_low_var_ = [simple_power(μ₁, σ_levels[1]) for μ₁ in μ₁_range]
power_curve_mid_var_ = [simple_power(μ₁, σ_levels[2]) for μ₁ in μ₁_range]
power_curve_high_var_ = [simple_power(μ₁, σ_levels[3]) for μ₁ in μ₁_range]

fig = Figure()
ax = Axis(
    fig[1, 1],
    xlabel = "μ₁",
    ylabel = "Power",
    title = "Power Curve (various σ values, μ₀ = 15, α = 0.05)",
    xticks = 15:25,
    yticks = 0:0.1:1,
)
lines!(ax, μ₁_range, power_curve_low_var_, color = :green, label = "σ = $(σ_levels[1])")
lines!(ax, μ₁_range, power_curve_mid_var_, color = :blue, label = "σ = $(σ_levels[2])")
lines!(ax, μ₁_range, power_curve_high_var_, color = :red, label = "σ = $(σ_levels[3])")
axislegend(ax, position = :rb)
xlims!(ax, 15, 25)
ylims!(ax, 0, 1)
fig
```
It is obvious that lower \(\sigma\) (less variability) increases the power in this case.
Such analysis can go on and on. When applied to more realistic hypothesis tests, such as the T-test, a critical quantity that affects power is the sample size. In classical statistics books you will find power curves for such tests, and before computers such curves were often used to help choose sample sizes. We now see how this approach works for bioequivalence.
You may consider further aspects of power analysis and general hypothesis tests in the context of the Julia language in Chapter 7 of Nazarathy and Klok (2021).
3 Back to the bioequivalence case
In the bioequivalence case a similar analysis holds; the difference is that the \({\cal T_{\text{TOST}}}\) hypothesis formulation is more complicated, and hence the underlying power curves are more complicated. For this we rely on more complex computations under the hood: sometimes analytical, sometimes numerical, and sometimes Monte Carlo simulation.
All these cases are captured by the `BioequivalencePower` package.
A pilot study
As a simple example, assume we are constrained (e.g. by product-specific guidances) to carry out a \(2 \times 2\) crossover study for a new generic product `T`, which we want to compare to `R`.
Assume that before embarking on the complete study, we carry out a small pilot study which yields this AUC data:
```julia
pilot_study = DataFrame(
    id = [1, 1, 2, 2, 3, 3, 4, 4, 10, 10, 11, 11, 12, 12, 13, 13, 14],
    sequence = vcat(fill("TR", 8), fill("RT", 9)),
    period = [1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1],
    AUC = vcat(
        [19.2, missing, 19.1, 15.9, 24.8, 14.9, 15.5, 15.3],
        [15.0, 15.2, 14.9, missing, 15.3, 15.2, 15.1, 15.3, 15.2],
    ),
)
simple_table(pilot_study)
```
id | sequence | period | AUC |
1 | TR | 1 | 19.2 |
1 | TR | 2 | missing |
2 | TR | 1 | 19.1 |
2 | TR | 2 | 15.9 |
3 | TR | 1 | 24.8 |
3 | TR | 2 | 14.9 |
4 | TR | 1 | 15.5 |
4 | TR | 2 | 15.3 |
10 | RT | 1 | 15 |
10 | RT | 2 | 15.2 |
11 | RT | 1 | 14.9 |
11 | RT | 2 | missing |
12 | RT | 1 | 15.3 |
12 | RT | 2 | 15.2 |
13 | RT | 1 | 15.1 |
13 | RT | 2 | 15.3 |
14 | RT | 1 | 15.2 |
In this case assume we had \(8\) subjects in each sequence (`RT` and `TR`), but due to complications there were multiple dropouts, so only 3 subjects completed both periods in each of the sequences.
When we run `pumas_be` on the data we get the following:
pumas_be(pilot_study)
Observation Counts | ||
Sequence ╲ Period | 1 | 2 |
RT | 5 | 3 |
TR | 4 | 3 |
Paradigm: Non replicated crossover design | ||||
Model: Linear model | ||||
Criteria: Standard ABE | ||||
Endpoint: AUC | ||||
Formulations: Reference(R), Test(T) | ||||
Results(AUC) | Assessment | Criteria | ||
R | Geometric Marginal Mean | 15.21 | ||
Geometric Naive Mean | 15.2 | |||
T | Geometric Marginal Mean | 17.16 | ||
Geometric Naive Mean | 17.48 | |||
Geometric Mean T/R Ratio (%) | 112.9 | |||
Degrees of Freedom | 4 | |||
90% Confidence Interval (%) | [96.61, 131.8] | Fail | CI ⊆ [80, 125] | |
Variability | CV (%) | 12.68 | |
 | σ̂ | 0.1263 | |
ANOVA | Formulation (p-value) | 0.1724 | ||
Sequence (p-value) | 0.1528 | |||
Period (p-value) | 0.1919 | |||
The fact that this pilot study fails BE (the GMR confidence interval as a percentage is `[96.61, 131.8]`) is not the key point.
The more important point of value from the pilot study is learning that the GMR point estimate is \(112.9\%\) and that the CV estimate is \(12.68\%\). While these are clearly estimates and do not reflect the exact population values, we get an indication of the kind of GMR and CV to expect.
Such preliminary estimates can also be obtained from other historical studies of the approved drug, and/or similar approvals in other jurisdictions, etc.
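The reported CV and \(\hat{\sigma}\) are linked through the standard lognormal relation \(\text{CV} = \sqrt{e^{\sigma^2} - 1}\). As a quick check (assuming this is the relation used in the report), the two numbers in the output above are consistent:

```julia
sigma_hat = 0.1263              # residual SD on the log scale, from the output above
cv = sqrt(exp(sigma_hat^2) - 1) # lognormal relation between σ and the CV

round(100 * cv, digits = 2)     # ≈ 12.68, the reported CV (%)
```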
Planning the pivotal study
Importantly, we now want to plan a \(2\times 2\) trial that has a low enough producer's risk (no more than \(20\%\)) while, as always, keeping the consumer's risk at no more than \(5\%\).
One approach for this is to consider slightly more conservative estimates of the measured GMR and CV from the pilot study. For example we can take the GMR at \(115\%\) and the CV at \(15\%\).
Now with these assumptions let us rely on the `power_analysis` function from `BioequivalencePower`. In specifying `RatioTOST, ED2x2Crossover` as the first two arguments to `power_analysis`, we are indicating that the study is a crossover design and that the input parameters are in terms of GMRs and CVs, as opposed to differences of means and standard deviations.
Importantly, the specifications `CV = 0.15` and `peθ₀ = 1.15` indicate our assumptions about the CV and the GMR. In particular, notice that this `1.15` value, which log-transforms to \(0.1398\), lies within \(H_1\).
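A one-line check confirms that the assumed GMR sits inside the equivalence region on the log scale:

```julia
lo, hi = log(0.8), log(1.25) # the equivalence limits δ₋ and δ₊ on the log scale

log(1.15)                    # ≈ 0.1398
lo < log(1.15) < hi          # true: the assumed GMR lies within H₁
```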
Note that this format of `power_analysis` is motivated by the popular PowerTOST R package. In fact, Pumas's `BioequivalencePower` provides feature parity with PowerTOST as well as a few more options.
power_analysis(RatioTOST, ED2x2Crossover, CV = 0.15, peθ₀ = 1.15)
Note that we see `θ₀ = 114%` instead of `θ₀ = 115%` due to rounding issues.
Let us outline the essence of this `power_analysis` display.
- We see that the analysis was for a target power of \(80\%\) with `Target = 80%`.
- A key result of the analysis is choosing `n = 42` subjects (this means `21` in each sequence). This choice of subject count will yield a power of \(81\%\), as you can see via `Estimated = 81%`. The target power can be updated as an input setting.
- The minimum acceptable power is set (by default) at \(70\%\). This quantity is used by the sensitivity analysis with respect to the GMR and the CV to check how far those quantities can deviate.
- On the top left display we see part of a power curve as a function of the CV. The curve starts at the input CV of `15%`, where the power is `81%`. Yet we see that if the actual CV is greater, the power decreases. This is plotted up until the power hits the minimum acceptable power of `70%`, at a CV of `17.4%`. We are also presented with the fact that this increase of the CV, as a percentage, is a `+16.3%` increase. So it means we have a "margin" of error on our assumed CV of about `16%`.
- On the top right display we see a similar analysis for the GMR. Here we see that we can have a GMR of up to `116.3%` if we wish to stay above the minimum acceptable power.
- Finally, the bottom left display has a discrete plot as it deals with the sample size (in steps of 2 because there are two sequences in this design). We see that we will stay above the minimum acceptable power if we have up to `10` dropouts, reducing the number of subjects from `42` to `32`.
This type of analysis then allows us to get a bit more certainty about our trial planning. It combines power analysis and some robustness analysis on our parameter assumptions.
Such analysis is not the only approach. A different approach is a Bayesian one, where we assume some distribution on the GMR and CV. See the `BioequivalencePower` docs as a starting point. You may also consider Lv et al. (2023).
Still sticking with this robustness analysis, as another example say we now have a target power of \(90\%\) and we certainly don't want to go below \(80\%\). With this we can use the `power_analysis` function via a slightly more complex interface:
```julia
power_analysis(
    RatioTOST{ED2x2Crossover}(CV = 0.15, peθ₀ = 1.15),
    HypothesisTestMetaParams(target_power = 0.9);
    pac = PowerAnalysisConstants(min_power = 0.8),
)
```
You can see that now \(58\) subjects are recommended and that we can sustain dropouts down to \(42\) subjects, consistent with the computation above.
4 More features of the BioequivalencePower package
The two main functions of the package that we use are `power` and `samplesize`.
- `power` - takes in parameters and details of a study, including the number of subjects and the assumptions on the point in \(H_1\) where we lie, and determines the power we shall obtain.
- `samplesize` - helps determine the sample size we need to achieve a target power.
Here are some examples that illustrate what can be done. The implication of each case should be self-evident.
Basic samplesize and power
Here is a sample size calculation for a parallel trial with CV at \(30\%\) and the GMR at \(93\%\).
samplesize(RatioTOST, ED2Parallel, CV = 0.3, peθ₀ = 0.93)
Determined sample size: n = 96 achieved power = 80.171%
HypothesisTestMetaParams Settings: α = 0.05 target_power = 0.8
Study: BioequivalencePower.RatioTOST{BioequivalencePower.ED2Parallel} CV = 0.3 θ₀ = 0.93 [θ₁, θ₂] = [0.8, 1.25]
Since the suggestion is `n = 96` subjects, let us verify that the `power` is indeed at that level with this number of subjects:
power(RatioTOST, ED2Parallel, n = 96, CV = 0.3, peθ₀ = 0.93)
Computed achieved power: 80.171%
HypothesisTestMetaParams Settings: n = 96 α = 0.05
Study: BioequivalencePower.RatioTOST{BioequivalencePower.ED2Parallel} CV = 0.3 θ₀ = 0.93 [θ₁, θ₂] = [0.8, 1.25]
And if we were to have one less subject in each treatment, hence `n = 94`, observe that the power drops below \(80\%\):
power(RatioTOST, ED2Parallel, n = 94, CV = 0.3, peθ₀ = 0.93)
Computed achieved power: 79.414%
HypothesisTestMetaParams Settings: n = 94 α = 0.05
Study: BioequivalencePower.RatioTOST{BioequivalencePower.ED2Parallel} CV = 0.3 θ₀ = 0.93 [θ₁, θ₂] = [0.8, 1.25]
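As a sanity check on these numbers, the TOST power for a parallel design can also be approximated from first principles with a Monte Carlo simulation using only the Distributions package. This is a sketch, not the exact algorithm of `BioequivalencePower` (which here computes the power analytically); the lognormal model, the equal group split, and all helper names below are our own assumptions:

```julia
using Distributions, Random

# Monte Carlo approximation of TOST power for a two-group parallel design
function tost_power_parallel(n; cv = 0.3, gmr = 0.93, α = 0.05, reps = 200_000)
    σ  = sqrt(log(1 + cv^2))        # log-scale SD implied by the CV
    Δ  = log(gmr)                   # true log-difference μ_T - μ_R
    df = n - 2                      # residual degrees of freedom
    se = σ * sqrt(4 / n)            # SE of the difference with n/2 subjects per group
    tc = quantile(TDist(df), 1 - α) # critical value for the 90% confidence interval
    lo, hi = log(0.8), log(1.25)    # equivalence limits
    rng = MersenneTwister(0)
    hits = 0
    for _ in 1:reps
        dhat  = Δ + se * randn(rng)                   # simulated point estimate
        sehat = se * sqrt(rand(rng, Chisq(df)) / df)  # simulated estimated SE
        # BE is concluded when the 90% CI lies strictly inside the limits
        hits += (dhat - tc * sehat > lo) && (dhat + tc * sehat < hi)
    end
    return hits / reps
end

tost_power_parallel(96) # ≈ 0.80, in line with the exact 80.171% above
tost_power_parallel(94) # slightly lower, in line with the 79.414% above
```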
Power for unbalanced design
While sample size calculations always give us numbers for balanced designs (equal number of subjects per sequence), we can query the power in unbalanced cases (as we would sometimes get with dropouts). For example:
power(RatioTOST, ED2Parallel, n = [52, 44], CV = 0.3, peθ₀ = 0.93)
Computed achieved power: 79.925%
HypothesisTestMetaParams Settings: n = [52, 44] α = 0.05
Study: BioequivalencePower.RatioTOST{BioequivalencePower.ED2Parallel} CV = 0.3 θ₀ = 0.93 [θ₁, θ₂] = [0.8, 1.25]
Seeing the samplesize search
By using `verbose = true`, we can see the steps that the `samplesize` function takes to find a suggested sample size. In this case, let us consider a more complicated design (here the first \(2\) stands for the number of formulations, the next \(4\) for the number of sequences, and the final \(4\) for the number of periods):
```julia
samplesize(
    RatioTOST,
    ED2x4x4ReplicateCrossover,
    peθ₀ = 0.85,
    CV = 0.35,
    target_power = 0.75,
    verbose = true,
)
```
[ Info: n = 88: 0.5094811021600534
[ Info: n = 92: 0.524505955896785
[ Info: n = 96: 0.5391723757631878
[ Info: n = 104: 0.5674425265970613
[ Info: n = 116: 0.6072452349329205
[ Info: n = 140: 0.6779087111438776
[ Info: n = 180: 0.7718052680276569
[ Info: n = 160: 0.7283330721661994
[ Info: Note: There is a non-uniform split: [43, 43, 42, 42]
[ Info: n = 170: 0.7508422488260919
[ Info: Note: There is a non-uniform split: [42, 41, 41, 41]
[ Info: n = 165: 0.7397844130016891
[ Info: Note: There is a non-uniform split: [42, 42, 42, 41]
[ Info: n = 167: 0.7442620616723767
[ Info: n = 168: 0.7465159421565855
[ Info: Note: There is a non-uniform split: [43, 42, 42, 42]
[ Info: n = 169: 0.7486745735975423
Determined sample size: n = 172 achieved power = 75.52%
HypothesisTestMetaParams Settings: α = 0.05 target_power = 0.75
Study: BioequivalencePower.RatioTOST{BioequivalencePower.ED2x4x4ReplicateCrossover} CV = 0.35 θ₀ = 0.85 [θ₁, θ₂] = [0.8, 1.25]
You can see that at some points during the search the number of subjects `n` is not divisible by 4, and hence there is a `non-uniform split` message. Pumas then adjusts the search until finding `n = 172`, which is the lowest multiple of 4 with power greater than \(75\%\).
More on the package
Finally, this function provides an overview of most of the features you can find in the `BioequivalencePower` package.
supported_designs_and_studies()
(designs = 15×8 DataFrame
 Row │ DesignType                 DesignString  NumTreatm ⋯
     │ DataType                   String        Union…    ⋯
─────┼─────────────────────────────────────────────────────
   1 │ ED2Parallel                parallel      2         ⋯
   2 │ ED2x2Crossover             2x2           2
   3 │ ED3x3Crossover             3x3           3
   4 │ ED3x6x3Crossover           3x6x3         3
   5 │ ED4x4Crossover             4x4           4         ⋯
   6 │ ED2x2x3ReplicateCrossover  2x2x3         2
   7 │ ED2x2x4ReplicateCrossover  2x2x4         2
   8 │ ED2x4x4ReplicateCrossover  2x4x4         2
   9 │ ED2x3x3PartialReplicate    2x3x3         2         ⋯
  10 │ ED2x4x2Balaam              2x4x2         2
  11 │ ED2x2x2RepeatedCrossover   2x2x2r        2
  12 │ EDPaired                   paired        1
  13 │ EDGeneralCrossover         A general design used for dose p…  ⋯
  14 │ EDGeneralParallel          A general design used for dose p…
  15 │ EDGeneral                  A general design used for dose p…
 6 columns omitted,
 studies = 15×16 DataFrame
 Row │ StudyType                          ED2Parallel  ED2x2Crossover  ED3x3Cr ⋯
     │ Any                                String       String          String  ⋯
─────┼──────────────────────────────────────────────────────────────────────────
   1 │ DifferenceTOST                     num|mc       num|mc          num|mc  ⋯
   2 │ RatioTOST                          num|mc       num|mc|slmc     num|mc
   3 │ TwoDifferenceTOSTs                 mc           mc              mc
   4 │ TwoRatioTOSTs                      mc           mc              mc
   5 │ FiducialRatioTOST                  mc           mc              -       ⋯
   6 │ NonInferiorityDifferenceOST        num          num             num
   7 │ NonInferiorityRatioOST             num          num             num
   8 │ ExpandingLimitsRatioTOST           -            -               -
   9 │ ReferenceScaledRatioTOST           -            -               -       ⋯
  10 │ NarrowTherapeuticIndexDrugRatioT…  -            -               -
  11 │ DoseProportionalityStudy           -            -               -
  12 │ (BayesianWrapper, DifferenceTOST)  nume         nume            nume
  13 │ (BayesianWrapper, RatioTOST)       nume         nume            nume    ⋯
  14 │ (BayesianWrapper, NonInferiority…  nume         nume            nume
  15 │ (BayesianWrapper, NonInferiority…  nume         nume            nume
 13 columns omitted,
 legend = 5×2 DataFrame
 Row │ Code    Meaning
     │ String  String
─────┼───────────────────────────────
   1 │ num     Numerical
   2 │ anum    Approximate Numerical
   3 │ nume    Numerical Expectation
   4 │ mc      Monte Carlo
   5 │ slmc    Subject Level Monte Carlo,
 methods = 15×8 DataFrame
 Row │ StudyType                          power   samplesize  pvalue  confint ⋯
     │ Any                                String  String      String  String  ⋯
─────┼──────────────────────────────────────────────────────────────────────────
   1 │ DifferenceTOST                     yes     yes         yes     yes     ⋯
   2 │ RatioTOST                          yes     yes         yes     yes
   3 │ TwoDifferenceTOSTs                 yes     yes
   4 │ TwoRatioTOSTs                      yes     yes
   5 │ FiducialRatioTOST                  yes     yes         yes             ⋯
   6 │ NonInferiorityDifferenceOST        yes     yes
   7 │ NonInferiorityRatioOST             yes     yes
   8 │ ExpandingLimitsRatioTOST           yes     yes
   9 │ ReferenceScaledRatioTOST           yes     yes                         ⋯
  10 │ NarrowTherapeuticIndexDrugRatioT…  yes     yes
  11 │ DoseProportionalityStudy           yes     yes
  12 │ (BayesianWrapper, DifferenceTOST)  yes     yes
  13 │ (BayesianWrapper, RatioTOST)       yes     yes                         ⋯
  14 │ (BayesianWrapper, NonInferiority…  yes     yes
  15 │ (BayesianWrapper, NonInferiority…  yes     yes
 3 columns omitted,)
As you can see, there are many features and variations with the package. Consult the docs for further information.
5 Conclusion
In this unit, we delved into the fundamentals of power analysis, a critical step in planning any bioequivalence trial. We established that power is the probability of correctly concluding bioequivalence when the test and reference products are indeed equivalent (i.e., rejecting H₀ when H₁ is true). The primary goal of power analysis is to determine the necessary sample size to achieve a desired level of power, typically 80% or 90%, given a set of assumptions.
The key levers at our disposal are the study design and the number of subjects. We saw how these, combined with our assumptions about the true geometric mean ratio (GMR) and coefficient of variation (CV), dictate the expected power of a study. These assumptions are often derived from pilot studies or existing literature.
The `BioequivalencePower` package provides a powerful and user-friendly toolkit for these tasks. We explored how the `samplesize` function directly computes the required number of subjects, the `power` function calculates the power for a given study setup, and the comprehensive `power_analysis` function not only determines the sample size but also provides a valuable sensitivity analysis. This sensitivity analysis shows how robust our study plan is to deviations from our initial GMR and CV assumptions, giving us a crucial margin of safety.
Ultimately, a well-executed power analysis ensures that a study is designed efficiently—large enough to have a high probability of success, but not wastefully large—while always maintaining the strict 5% limit on consumer risk (Type I error).
We also mention that many blog posts on BEBAC (2025) can be a highly valuable resource for understanding power analysis for bioequivalence trials.
6 Unit exercises
Sample Size for Different Designs
A pharmaceutical company is planning a study for a new drug formulation. Based on a pilot study, they assume a GMR of 95% (`peθ₀ = 0.95`) and an intra-subject CV of 20% (`CV = 0.20`). They want to achieve at least 80% power.
- If they plan to use a standard 2x2 crossover design (`ED2x2Crossover`), how many subjects will they need in total? Use the `samplesize` function.
- Due to concerns about potential carryover effects, they are also considering a parallel design (`ED2Parallel`). How many subjects would be needed for this design with the same assumptions?
- Briefly explain the difference in the required sample sizes between the crossover and parallel designs.
The Sensitivity of Power
Imagine a 2x2 crossover study has been planned with 36 subjects. The planners assumed a GMR of 1.05 and a CV of 25%.
- Using the `power` function, calculate the expected power for this study under the planned assumptions.
- After the study is complete, the analysis reveals that the true CV was actually closer to 30%. What would the power have been with this higher variability?
- In another scenario, what if the CV was indeed 25%, but the true GMR was 1.10 (further from 1.00 than assumed)? Would the power be higher or lower than the original calculation in (a)? Calculate it to confirm your intuition.
- What if the GMR was 1.02 (closer to 1.00 than assumed)? Calculate the power in this case. This exercise demonstrates how power is a function of the true (but unknown) parameters.
Interpreting a Full Power Analysis Report
Run the following command to generate a power analysis report:
power_analysis(RatioTOST, ED2x2Crossover, CV = 0.18, peθ₀ = 1.08, target_power = 0.90)
Based on the output, answer the following questions:
- What is the recommended total sample size to achieve the target power of 90%?
- The sensitivity analysis for the CV shows a “Crit. val”. What does this value represent in the context of the study’s power?
- If the true GMR turned out to be 1.10, would the study still have power greater than the minimum acceptable level of 70%? (Hint: Look at the GMR sensitivity plot).
- According to the “Sensitivity to n” plot, what is the smallest number of subjects the study could have (due to dropouts) and still maintain the minimum acceptable power of 70%?
The “Cost” of Power
A sponsor is planning a parallel design study and, based on previous data, is confident that the GMR is 0.97 and the CV is 0.30. They are debating how much power to aim for.
- Use the `samplesize` function to determine the required sample size for 80% power.
- Now, calculate the required sample size for 90% power.
- Finally, calculate the required sample size for 95% power.
- Compare the increase in the number of subjects required to go from 80% to 90% power, versus the increase required to go from 90% to 95% power. What does this tell you about the "cost" of achieving very high levels of power in terms of sample size?