To test if a SNP is associated with a gene probe, we use the simple linear regression
$$y_i = \beta_0+\beta_1 x_i + \epsilon_i,$$
where \(y_i\) is the gene expression level of the \(i\)-th subject,
\(x_i\) is the genotype of the \(i\)-th subject, and
\(\epsilon_i\) is the random error term with mean zero and standard deviation \(\sigma\). Additive coding for genotype is used. That is,
\(x_i=0\) indicates wildtype homozygotes;
\(x_i=1\) indicates heterozygotes; and \(x_i=2\) indicates mutation homozygotes.
To test if the SNP is associated with the gene probe, we test the null hypothesis \(H_0: \beta_1=0\) versus the alternative hypothesis \(H_1: \beta_1 = \delta\), where \(\delta\neq 0\).
Let \(\theta\) denote the minor allele frequency (MAF) of the SNP. Under Hardy-Weinberg equilibrium, the genotype \(x_i\) follows a binomial distribution with 2 trials and success probability \(\theta\), so the variance of the predictor (i.e., the SNP) is
\(\sigma^2_x=2 \theta (1-\theta)\).
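The variance formula can be checked by enumerating the three genotype classes. The following is a minimal sketch; the function name `hwe_genotype_variance` and the example MAF value are illustrative assumptions, not part of the original method.

```python
def hwe_genotype_variance(maf):
    """Variance of an additively coded genotype (0, 1, 2) under HWE."""
    if not 0.0 <= maf <= 1.0:
        raise ValueError("maf must lie in [0, 1]")
    # Genotype probabilities under Hardy-Weinberg equilibrium.
    probs = {0: (1 - maf) ** 2, 1: 2 * maf * (1 - maf), 2: maf ** 2}
    mean = sum(x * p for x, p in probs.items())
    second_moment = sum(x * x * p for x, p in probs.items())
    return second_moment - mean ** 2


maf = 0.1  # illustrative value
enumerated = hwe_genotype_variance(maf)
closed_form = 2 * maf * (1 - maf)
# The two numbers should agree up to floating-point rounding.
print(enumerated, closed_form)
```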
The exact power calculation formula can be derived as
$$1-T_{n-2, \lambda}(t_{n-2}(\alpha/2)) + T_{n-2, \lambda}(-t_{n-2}(\alpha/2)),$$
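where \(T_{n-2, \lambda}(a)\) is the value at \(a\) of the cumulative distribution function of the non-central t distribution with \(n-2\) degrees of freedom
and non-centrality parameter \(\lambda=\delta/\sqrt{\sigma^2/[(n-1)\tilde{\sigma}^2_{x}]}\), \(t_{n-2}(\alpha/2)\) is the upper \(100\alpha/2\) percentile of the central t distribution with \(n-2\) degrees of freedom, \(\alpha\) is the significance level, and \(\tilde{\sigma}^2_{x}=\sum_{i=1}^n(x_i - \bar{x})^2/(n-1)\) is the sample variance of the genotypes.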
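The power formula above can be evaluated numerically once \(\lambda\) is known. The sketch below assumes SciPy is available and uses `scipy.stats.t.ppf` and `scipy.stats.nct.cdf`; the function name `power_from_ncp` and the default \(\alpha=0.05\) are illustrative choices rather than anything prescribed by the text.

```python
from scipy import stats


def power_from_ncp(n, ncp, alpha=0.05):
    """Two-sided power of the slope t-test given the non-centrality parameter."""
    df = n - 2
    if df < 1:
        raise ValueError("n must be at least 3")
    # Upper alpha/2 critical value of the central t distribution.
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    # P(T > t_crit) + P(T < -t_crit) for T ~ non-central t(df, ncp).
    upper_tail = 1 - stats.nct.cdf(t_crit, df, ncp)
    lower_tail = stats.nct.cdf(-t_crit, df, ncp)
    return upper_tail + lower_tail


# Illustrative call with hypothetical inputs.
print(power_from_ncp(n=100, ncp=3.0))
```

When the non-centrality parameter is zero, the function returns the significance level, which is a quick sanity check on the implementation.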
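Dupont and Plummer (1998) give the following relationship:
$$\sigma^2 = \sigma^2_y - \beta_1^2 \sigma^2_x,$$
where \(\sigma^2_y\) is the variance of the outcome \(y_i\). We can therefore substitute this expression for \(\sigma^2\), with \(\beta_1=\delta\) under the alternative hypothesis, into the power calculation formula.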
Recall that under Hardy-Weinberg equilibrium \(\sigma_x^2=2\theta(1-\theta)\),
where \(\theta\) is the minor allele frequency (MAF).
Replacing the sample variance \(\tilde{\sigma}^2_{x}\) by \(\sigma_x^2\), the non-centrality parameter can be rewritten as
$$\lambda=\frac{\delta}{\sqrt{
\left(\sigma_y^2 - \delta^2 2\left(1-\theta\right)\theta\right)/
\left[(n-1)2\left(1-\theta\right)\theta\right]
}}.$$
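As a minimal sketch, the rewritten non-centrality parameter can be computed directly from \(\delta\), \(\sigma_y\), the MAF, and \(n\). The function name `ncp_from_maf` and the input values in the example call are hypothetical and chosen only for illustration.

```python
import math


def ncp_from_maf(delta, sigma_y, maf, n):
    """Non-centrality parameter expressed through the MAF and the outcome SD."""
    if not 0.0 < maf < 1.0:
        raise ValueError("maf must lie strictly between 0 and 1")
    if n < 3:
        raise ValueError("n must be at least 3")
    # Genotype variance under Hardy-Weinberg equilibrium.
    sigma2_x = 2 * maf * (1 - maf)
    # Residual variance: sigma^2 = sigma_y^2 - delta^2 * sigma_x^2.
    sigma2 = sigma_y ** 2 - delta ** 2 * sigma2_x
    if sigma2 <= 0:
        raise ValueError("sigma_y^2 must exceed delta^2 * 2 * maf * (1 - maf)")
    return delta / math.sqrt(sigma2 / ((n - 1) * sigma2_x))


# Illustrative call with hypothetical inputs.
print(ncp_from_maf(delta=0.13, sigma_y=0.14, maf=0.1, n=200))
```

The explicit check on the residual variance reflects that the formula is only defined when \(\sigma_y^2 > \delta^2\, 2\theta(1-\theta)\).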
We adopted the parameters from the GTEx cohort (see the "Power analysis" section of Nature Genetics, 2013; https://www.nature.com/articles/ng.2653), where the expression data were modeled as log-normally distributed with a log standard deviation of 0.13 within each genotype class (AA, AB, BB). This level of noise is based on estimates from initial GTEx data. Their power analysis assumed an across-genotype difference of \(\delta = 0.13\) (i.e., a log expression change equal to the standard deviation within a single genotype class).
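As a worked sketch, assume that \(y_i\) is measured on the log scale, so that the within-genotype standard deviation of 0.13 plays the role of \(\sigma\) in the regression model (this reading is an assumption, not something the source states). The relationship \(\sigma^2 = \sigma^2_y - \beta_1^2 \sigma^2_x\) then gives
$$\sigma_y^2 = \sigma^2 + \delta^2\, 2\theta(1-\theta) = 0.13^2\left[1 + 2\theta(1-\theta)\right],$$
and, because \(\delta = \sigma\), the non-centrality parameter simplifies to
$$\lambda = \frac{\delta}{\sigma}\sqrt{(n-1)\,2\theta(1-\theta)} = \sqrt{(n-1)\,2\theta(1-\theta)}.$$
For example, with the illustrative values \(\theta = 0.1\) and \(n = 100\), \(\lambda = \sqrt{99 \times 0.18} \approx 4.22\).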