This function computes the minimum sample size required to obtain a statistically significant result at a given power. For more details, please refer to Liu et al. (2023).
get.min.size(p1, p2, p_treat, method='relative', power=0.8, alpha=0.05)
Returns the required minimum sample size. This is the total sample size of the control group and the treatment group combined.
p1: success probability of the control group
p2: success probability of the treatment group
p_treat: the proportion of the total sample assigned to the treatment group, given as a fraction (e.g. 0.5)
method: two methods are provided: method = c('relative', 'absolute'). 'relative' computes the minimum sample size based on the relative lift. 'absolute' computes the minimum sample size based on the absolute lift. By default method = 'relative'
power: the power you want to achieve. The industry standard is power = 0.8, which is also the default value
alpha: significance level. By default alpha = 0.05
The minimum required sample size is approximated using the asymptotic power function. Let \(N = n_1 + n_2\) and \(\kappa = n_1/N\). We define
$$ \sigma_{a,n} = \sqrt{n_1^{-1}p_1(1-p_1) + n_2^{-1}p_2(1-p_2)}, $$
$$ \bar\sigma_{a,n} = \sqrt{(n_1^{-1} + n_2^{-1})\bar p(1-\bar p)}, $$
where \(\bar p = \kappa p_1 + (1-\kappa) p_2\). Here \(\sigma_{a,n}\) is the standard deviation of the absolute lift, and \(\bar\sigma_{a,n}\) can be viewed as the standard deviation of the combined sample of the control and treatment groups.
Let \(\delta_a = p_2 - p_1\) be the absolute lift. The asymptotic power function based on the absolute lift is given by
$$ \beta_{Absolute}(\delta_a) \approx \Phi\left( -cz_{\alpha/2} + \frac{\delta_a}{\sigma_{a,n}} \right) + \Phi\left( -cz_{\alpha/2} - \frac{\delta_a}{\sigma_{a,n}} \right). $$
The asymptotic power function based on the relative lift is given by
$$ \beta_{Relative}(\delta_a) \approx \Phi \left( -cz_{\alpha/2} \frac{p_0}{\bar p} + \frac{\delta_a}{\sigma_{a,n}} \right) + \Phi \left( -cz_{\alpha/2} \frac{p_0}{\bar p} - \frac{\delta_a}{\sigma_{a,n}} \right), $$
where \(\Phi(\cdot)\) is the CDF of the standard normal distribution \(N(0,1)\), \(z_{\alpha/2}\) is the \((1-\alpha/2)\) quantile of \(N(0,1)\) (equivalently, its upper \(\alpha/2\) quantile), and \(c = {\bar\sigma_{a,n}}/\sigma_{a,n}\).
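To make the two power functions concrete, here is a minimal base-R sketch that transcribes them directly. It is an illustration only, not the package's implementation: the name `asymptotic_power` is hypothetical, `n1` and `n2` are assumed to be the sizes of the groups with success probabilities `p1` and `p2`, and the symbol \(p_0\) in the relative-lift formula is read as \(p_1\), which is an assumption.

```r
# Asymptotic power of the two-sided test, transcribed from the formulas above.
# Assumptions (not confirmed by the package source): n1/n2 are the sizes of the
# groups with success probabilities p1/p2, and p_0 is read as p1.
asymptotic_power <- function(p1, p2, n1, n2,
                             method = c("relative", "absolute"),
                             alpha = 0.05) {
  method <- match.arg(method)
  kappa <- n1 / (n1 + n2)
  p_bar <- kappa * p1 + (1 - kappa) * p2

  # sigma_{a,n} and bar-sigma_{a,n} as defined in the text
  sigma <- sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
  sigma_bar <- sqrt((1 / n1 + 1 / n2) * p_bar * (1 - p_bar))
  c_ratio <- sigma_bar / sigma

  z <- qnorm(1 - alpha / 2)        # z_{alpha/2}
  delta_a <- p2 - p1               # absolute lift

  # The relative-lift version rescales the critical value by p1 / p_bar
  ratio <- if (method == "relative") p1 / p_bar else 1
  crit <- c_ratio * z * ratio

  pnorm(-crit + delta_a / sigma) + pnorm(-crit - delta_a / sigma)
}

asymptotic_power(0.1, 0.2, n1 = 100, n2 = 100, method = "absolute")
```

When `p1` equals `p2`, both versions reduce to `2 * pnorm(-z)`, that is, to the significance level `alpha`, which is a quick sanity check on the transcription.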
Given a target power \(\beta\) (say \(\beta = 0.80\)), it is difficult to obtain a closed form for the minimum sample size, because the power function is a sum of two normal CDF terms. Note that when \(\delta_a > 0\), the first term of the power function dominates the second, so we can ignore the second term and derive a closed form for the minimum sample size. Similarly, when \(\delta_a < 0\), the second term dominates the first, so we can ignore the first term. In particular, the closed form for the minimum sample size is given by
$$ N_{Relative} = \left( \frac{p_1(1-p_1)}{\kappa} + \frac{p_2(1-p_2)}{(1-\kappa)} \right) \left( \Phi^{-1}(\beta)p_1/\bar p + cz_{\alpha/2} \right)^2 / \delta_a^2, $$
$$ N_{Absolute} = \left( \frac{p_1(1-p_1)}{\kappa} + \frac{p_2(1-p_2)}{(1-\kappa)} \right) \left( \Phi^{-1}(\beta) + cz_{\alpha/2} \right)^2 / \delta_a^2. $$
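The closed forms above can be sketched in a few lines of base R. This transcribes the two expressions exactly as printed and is not the package's own code: the function name `min_size_closed_form` is hypothetical, and it takes \(\kappa\) directly rather than `p_treat`, since how `get.min.size` maps `p_treat` to \(\kappa\) is not stated here. Note that \(c\) depends only on \(\kappa\), \(p_1\) and \(p_2\), because the factor \(1/N\) cancels in the ratio \(\bar\sigma_{a,n}/\sigma_{a,n}\).

```r
# Closed-form minimum total sample size, transcribed from the formulas above.
# Hypothetical helper for illustration; kappa = n1 / N is passed in directly.
min_size_closed_form <- function(p1, p2, kappa,
                                 method = c("relative", "absolute"),
                                 power = 0.8, alpha = 0.05) {
  method <- match.arg(method)
  p_bar <- kappa * p1 + (1 - kappa) * p2

  # N * sigma_{a,n}^2 and N * bar-sigma_{a,n}^2 (the 1/N factor cancels in c)
  v <- p1 * (1 - p1) / kappa + p2 * (1 - p2) / (1 - kappa)
  v_bar <- (1 / kappa + 1 / (1 - kappa)) * p_bar * (1 - p_bar)
  c_ratio <- sqrt(v_bar / v)

  z <- qnorm(1 - alpha / 2)   # z_{alpha/2}
  q <- qnorm(power)           # Phi^{-1}(beta)

  # As printed, the relative-lift form scales Phi^{-1}(beta) by p1 / p_bar
  if (method == "relative") q <- q * p1 / p_bar

  v * (q + c_ratio * z)^2 / (p2 - p1)^2
}

n_abs <- min_size_closed_form(0.1, 0.2, kappa = 0.5, method = "absolute")
n_rel <- min_size_closed_form(0.1, 0.2, kappa = 0.5, method = "relative")
ceiling(c(absolute = n_abs, relative = n_rel))
```

The result is a real number; rounding up with `ceiling()` is a choice made in this sketch, and the value returned by `get.min.size` may be rounded differently.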
Wanjun Liu, Xiufan Yu, Jialiang Mao, Xiaoxu Wu, and Justin Dyer. 2023. Quantifying the Effectiveness of Advertising: A Bootstrap Proportion Test for Brand Lift Testing. In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management (CIKM '23).
p1 <- 0.1; p2 <- 0.2
get.min.size(p1, p2, p_treat=0.5, method='relative', power=0.8, alpha=0.05)