If we would like to test potential non-linear relationship between genotype of a SNP and expression of a gene, we can use un-balanced one-way ANOVA. Actually, an article published by the GTEx Consortium in 2013 used this approach.
Suppose there are \(k=3\) groups of subjects: (1) mutation homozygotes; (2) heterozygotes; and (3) wildtype homozygotes. We would like to test if the mean expression \(\mu_i\), \(i=1, \ldots, k\), of the gene is the same among the \(k\) groups of subjects. We can use the following one-way ANOVA model to characterize the relationship between observed gene expression level \(y_{ij}\) and the population mean expression level \(\mu_i\):
$$y_{ij} = \mu_i + \epsilon_{ij}, \quad \epsilon_{ij} \sim N\left(0, \sigma^2\right),$$
where \(i=1,\ldots, k\), \(j= 1, \ldots, n_i\),
\(y_{ij}\) is the observed gene expression level for the \(j\)-th subject in the \(i\)-th group, \(\mu_i\) is the mean gene expression level of the \(i\)-th group, \(\epsilon_{ij}\) is the random error, \(k\) is the number of groups, \(n_i\) is the number of subjects in the \(i\)-th group.
Denote the total number of subjects as \(N = \sum_{i=1}^{k} n_i\).
That is, we have \(n_1\) mutation homozygotes, \(n_2\) heterozygotes, and \(n_3\) wildtype homozygotes.
We would like to test the null hypothesis \(H_0\) and alternative hypothesis \(H_1\):
$$H_0: \mu_1 = \mu_2 = \mu_3,$$
$$H_1: \mbox{not all means are the same}.$$
According to O'Brien and Muller (1993), the power calculation formula for unbalanced one-way ANOVA is
$$power=Pr\left(\left.F\geq F_{1-\alpha}\left(k-1, N-k\right)\right| F\sim
F_{k-1, N-k, \lambda}\right),$$
where \(k=3\) is the number of groups of subjects, \(N\) is the total number
of subjects, \(F_{1-\alpha}\left(k-1, N-k\right)\) is the
\(100(1-\alpha)\)-th percentile of central F distribution with degrees of freedoms \(k-1\) and \(N-k\), and \(F_{k-1, N-k, \lambda}\) is the non-central F distribution
with degrees of freedoms \(k-1\) and \(N-k\) and non-central parameter (ncp)
\(\lambda\). The ncp \(\lambda\) is equal to
$$\lambda=\frac{N}{\sigma^2}\sum_{i=1}^{k} w_i \left(\mu_i-\mu\right)^2,$$ where \(\mu_i\) is the mean gene expression level
for the \(i\)-th group of subjects, \(w_i\) is the weight for the \(i\)-th group of subjects, \(\sigma^2\) is the variance of the random errors in ANOVA (assuming each group has equal variance), and
\(\mu\) is the weighted mean gene expression level
$$\mu=\sum_{i=1}^{k}w_i \mu_i.$$
The weights \(w_i=n_i/N\) are the sample proportions for the 3 groups of subjects, where \(N=n_1+n_2+n_3\) is the total number of subjects. Hence,
\(\sum_{i=1}^{3}w_i = 1\).
Based on Hardy-Weinberg Equilibrium, we have
\(w_1 = \theta^2\), \(w_2 = 2\theta(1-\theta)\), and \(w_3 = (1-\theta)^2\),
where \(\theta\) is MAF.
Without loss of generality, we set \(\mu_1 = -\delta_1\), \(\mu_2=0\),
and \(\mu_3 = \delta_2\).
We adopted the parameters from the GTEx cohort (see the Power analysis" section of Nature Genetics, 2013; https://www.nature.com/articles/ng.2653), where they modeled the expression data as having a log-normal distribution with a log standard deviation of 0.13 within each genotype class (AA, AB, BB). This level of noise is based on estimates from initial GTEx data. In their power analysis, they assumed the across-genotype difference delta = 0.13 (i.e., equivalent to detecting a log expression change similar to the standard deviation within a single genotype class).