get.min.size {LiftTest} | R Documentation |
This function computes the minimum total sample size required to detect a given lift at significance level alpha with a specified power. For more details, please refer to Liu et al. (2023).
get.min.size(p1, p2, p_treat, method='relative', power=0.8, alpha=0.05)
p1 |
success probability of the control group |
p2 |
success probability of the treatment group |
p_treat |
the proportion of the total sample assigned to the treatment group (e.g. 0.5 for an even split) |
method |
two methods are provided, one based on the relative lift and one based on the absolute lift. The default is method = 'relative' |
power |
the power you want to achieve. Industry standard is power = 0.8, which is also the default value |
alpha |
significance level. By default alpha = 0.05 |
The minimum required sample size is approximated by the asymptotic
power function.
Let N = n_1 + n_2 and \kappa = n_1/N. We define
\sigma_{a,n} = \sqrt{n_1^{-1}p_1(1-p_1) + n_2^{-1}p_2(1-p_2)},
\bar\sigma_{a,n} = \sqrt{(n_1^{-1} + n_2^{-1})\bar p(1-\bar p)},
where \bar p = \kappa p_1 + (1-\kappa) p_2. Here \sigma_{a,n} is the standard deviation of the absolute lift, and \bar\sigma_{a,n} can be viewed as the standard deviation of the combined sample of the control and treatment groups.
Let \delta_a = p_2 - p_1 be the absolute lift.
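As a rough illustration of these definitions, the quantities can be computed directly in base R. The input values below (p1, p2, n1, n2) are hypothetical and chosen only for the example; n1 is taken to be the size of the group with success probability p1, as in the formula for \sigma_{a,n}.

```r
# Hypothetical inputs, for illustration only.
p1 <- 0.1   # success probability of the control group
p2 <- 0.2   # success probability of the treatment group
n1 <- 200   # size of the group with success probability p1
n2 <- 200   # size of the group with success probability p2

N     <- n1 + n2
kappa <- n1 / N
p_bar <- kappa * p1 + (1 - kappa) * p2   # pooled success probability

# Standard deviation of the absolute lift.
sigma_a <- sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
# Pooled standard deviation based on the combined sample.
sigma_bar <- sqrt((1 / n1 + 1 / n2) * p_bar * (1 - p_bar))
# Absolute lift.
delta_a <- p2 - p1

c(p_bar = p_bar, sigma_a = sigma_a, sigma_bar = sigma_bar, delta_a = delta_a)
```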
The asymptotic power function based on the absolute lift is given by
\beta_{Absolute}(\delta_a) \approx \Phi\left( -cz_{\alpha/2} +
\frac{\delta_a}{\sigma_{a,n}} \right) + \Phi\left( -cz_{\alpha/2} -
\frac{\delta_a}{\sigma_{a,n}} \right).
The asymptotic power function based on the relative lift is given by
\beta_{Relative}(\delta_a) \approx \Phi
\left( -cz_{\alpha/2} \frac{p_1}{\bar p} +
\frac{\delta_a}{\sigma_{a,n}} \right) +
\Phi \left( -cz_{\alpha/2} \frac{p_1}{\bar p} -
\frac{\delta_a}{\sigma_{a,n}} \right),
where \Phi(\cdot) is the CDF of the standard normal distribution N(0,1), z_{\alpha/2} is the (1-\alpha/2) quantile of N(0,1), and c = \bar\sigma_{a,n}/\sigma_{a,n}.
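The two power functions can be evaluated numerically with a few lines of base R. The helper below, lift_power, is a hypothetical name and not part of the package; it is only a sketch of the formulas above. It assumes that the ratio multiplying cz_{\alpha/2} in the relative version uses the control success probability p1 in the numerator.

```r
# Hypothetical helper (not part of LiftTest): asymptotic power of the
# two-sided test based on the absolute or the relative lift.
lift_power <- function(p1, p2, n1, n2,
                       method = c("absolute", "relative"), alpha = 0.05) {
  method <- match.arg(method)
  kappa  <- n1 / (n1 + n2)
  p_bar  <- kappa * p1 + (1 - kappa) * p2

  sigma_a   <- sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
  sigma_bar <- sqrt((1 / n1 + 1 / n2) * p_bar * (1 - p_bar))
  c_ratio   <- sigma_bar / sigma_a
  z         <- qnorm(1 - alpha / 2)

  # Assumption: the relative version rescales the critical value by p1 / p_bar.
  scale <- if (method == "relative") p1 / p_bar else 1
  shift <- (p2 - p1) / sigma_a

  pnorm(-c_ratio * z * scale + shift) + pnorm(-c_ratio * z * scale - shift)
}

# With no lift (p1 == p2) the power reduces to the significance level.
lift_power(0.1, 0.1, n1 = 200, n2 = 200, method = "absolute")
# Power to detect a lift from 0.1 to 0.2 with 200 subjects per group.
lift_power(0.1, 0.2, n1 = 200, n2 = 200, method = "absolute")
lift_power(0.1, 0.2, n1 = 200, n2 = 200, method = "relative")
```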
Given a target power \beta (say \beta = 0.80), the power function cannot be inverted in closed form to obtain the minimum sample size. However, when \delta_a > 0 the first term of the power function dominates the second, so the second term can be ignored and a closed form derived. Similarly, when \delta_a < 0 the second term dominates and the first term can be ignored. The resulting closed form for the minimum sample size is
N_{Relative} = \left( \frac{p_1(1-p_1)}{\kappa} + \frac{p_2(1-p_2)}{(1-\kappa)} \right) \left( \Phi^{-1}(\beta)p_1/\bar p + cz_{\alpha/2} \right)^2 / \delta_a^2,
N_{Absolute} = \left( \frac{p_1(1-p_1)}{\kappa} + \frac{p_2(1-p_2)}{(1-\kappa)} \right) \left( \Phi^{-1}(\beta) + cz_{\alpha/2} \right)^2 / \delta_a^2.
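The closed forms can be transcribed into base R as follows. This is a sketch of the formulas as written above, not the package's source code: min_size_sketch is a hypothetical name, and it takes \kappa = n_1/N directly rather than p_treat, since the mapping between the two is not stated here.

```r
# Hypothetical helper (not part of LiftTest): closed-form minimum total
# sample size N, transcribed from the N_Relative and N_Absolute formulas.
min_size_sketch <- function(p1, p2, kappa,
                            method = c("relative", "absolute"),
                            power = 0.8, alpha = 0.05) {
  method <- match.arg(method)
  p_bar  <- kappa * p1 + (1 - kappa) * p2

  # N * sigma_{a,n}^2 and N * bar sigma_{a,n}^2; their ratio is free of N.
  v       <- p1 * (1 - p1) / kappa + p2 * (1 - p2) / (1 - kappa)
  v_bar   <- (1 / kappa + 1 / (1 - kappa)) * p_bar * (1 - p_bar)
  c_ratio <- sqrt(v_bar / v)

  z <- qnorm(1 - alpha / 2)   # critical value z_{alpha/2}
  q <- qnorm(power)           # Phi^{-1}(beta)

  bracket <- if (method == "relative") {
    q * p1 / p_bar + c_ratio * z
  } else {
    q + c_ratio * z
  }
  v * bracket^2 / (p2 - p1)^2
}

# Round up, since a sample size must be a whole number of subjects.
ceiling(min_size_sketch(0.1, 0.2, kappa = 0.5, method = "absolute"))
ceiling(min_size_sketch(0.1, 0.2, kappa = 0.5, method = "relative"))
```

Because c depends on n_1 and n_2 only through \kappa, the right-hand side does not involve N, which is what makes the expression a genuine closed form.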
Returns the required minimum sample size. This is the total sample size of the control and treatment groups combined, i.e. N = n_1 + n_2.
Wanjun Liu, Xiufan Yu, Jialiang Mao, Xiaoxu Wu, and Justin Dyer. 2023. Quantifying the Effectiveness of Advertising: A Bootstrap Proportion Test for Brand Lift Testing. In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management (CIKM ’23).
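# Success probabilities of the control (p1) and treatment (p2) groups
p1 <- 0.1; p2 <- 0.2
# Minimum total sample size for 80% power at the 5% level, with half the sample in treatment
get.min.size(p1, p2, p_treat=0.5, method='relative', power=0.8, alpha=0.05)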