powerEQTL.ANOVA: Power Calculation for EQTL Analysis Based on Un-Balanced One-Way ANOVA

Description

Power calculation for eQTL analysis that tests if a SNP is associated to a gene probe by using un-balanced one-way ANOVA. This function can be used to calculate one of the 3 parameters (power, sample size, and minimum allowable MAF) by setting the corresponding parameter as NULL and providing values for the other 2 parameters.

Usage

powerEQTL.ANOVA(MAF,
                deltaVec=c(-0.13, 0.13),
                n=200,
                power = NULL,
                sigma = 0.13,
                FWER = 0.05,
                nTests = 200000,
                n.lower = 4,
                n.upper = 1e+30)

Arguments

MAF

numeric. Minor allele frequency.

deltaVec

numeric. A vector having 2 elements. The first element is equal to $\mu_2-\mu_1$ and the second element is equal to $\mu_3-\mu_2$, where $\mu_1$ is the mean gene expression level for the mutation homozygotes, $\mu_2$ is the mean gene expression level for the heterozygotes, and $\mu_3$ is the mean gene expression level for the wild-type gene expression level.

integer. Total number of subjects.

power

numeric. Power for testing if 3 genotypes have the same mean gene expression levels.

sigma

numeric. Standard deviation of the random error.

FWER

numeric. Family-wise Type I error rate.

nTests

integer. Number of tests (i.e., number of all (SNP, gene) pairs) in eQTL analysis.

n.lower

numeric. Lower bound of the total number of subjects. Only used when calculating total number of subjects.

n.upper

numeric. Upper bound of the total number of subjects. Only used when calculating total number of subjects.

Value

power if the input parameter power = NULL.

sample size (total number of subjects) if the input parameter n = NULL;

minimum detectable slope if the input parameter slope = NULL.

Details

If we would like to test potential non-linear relationship between genotype of a SNP and expression of a gene, we can use un-balanced one-way ANOVA. Actually, an article published by the GTEx Consortium in 2013 used this approach.

Suppose there are $k=3$ groups of subjects: (1) mutation homozygotes; (2) heterozygotes; and (3) wildtype homozygotes. We would like to test if the mean expression $\mu_i$, $i=1, \ldots, k$, of the gene is the same among the $k$ groups of subjects. We can use the following one-way ANOVA model to characterize the relationship between observed gene expression level $y_{ij}$ and the population mean expression level $\mu_i$: $$y_{ij} = \mu_i + \epsilon_{ij}, \quad \epsilon_{ij} \sim N\left(0, \sigma^2\right),$$ where $i=1,\ldots, k$, $j= 1, \ldots, n_i$, $y_{ij}$ is the observed gene expression level for the $j$-th subject in the $i$-th group, $\mu_i$ is the mean gene expression level of the $i$-th group, $\epsilon_{ij}$ is the random error, $k$ is the number of groups, $n_i$ is the number of subjects in the $i$-th group. Denote the total number of subjects as $N = \sum_{i=1}^{k} n_i$. That is, we have $n_1$ mutation homozygotes, $n_2$ heterozygotes, and $n_3$ wildtype homozygotes.

We would like to test the null hypothesis $H_0$ and alternative hypothesis $H_1$: $$H_0: \mu_1 = \mu_2 = \mu_3,$$ $$H_1: \mbox{not all means are the same}.$$

According to O'Brien and Muller (1993), the power calculation formula for unbalanced one-way ANOVA is $$power=Pr\left(\left.F\geq F_{1-\alpha}\left(k-1, N-k\right)\right| F\sim F_{k-1, N-k, \lambda}\right),$$ where $k=3$ is the number of groups of subjects, $N$ is the total number of subjects, $F_{1-\alpha}\left(k-1, N-k\right)$ is the $100(1-\alpha)$-th percentile of central F distribution with degrees of freedoms $k-1$ and $N-k$, and $F_{k-1, N-k, \lambda}$ is the non-central F distribution with degrees of freedoms $k-1$ and $N-k$ and non-central parameter (ncp) $\lambda$. The ncp $\lambda$ is equal to $$\lambda=\frac{N}{\sigma^2}\sum_{i=1}^{k} w_i \left(\mu_i-\mu\right)^2,$$ where $\mu_i$ is the mean gene expression level for the $i$-th group of subjects, $w_i$ is the weight for the $i$-th group of subjects, $\sigma^2$ is the variance of the random errors in ANOVA (assuming each group has equal variance), and $\mu$ is the weighted mean gene expression level $$\mu=\sum_{i=1}^{k}w_i \mu_i.$$ The weights $w_i=n_i/N$ are the sample proportions for the 3 groups of subjects, where $N=n_1+n_2+n_3$ is the total number of subjects. Hence, $\sum_{i=1}^{3}w_i = 1$. Based on Hardy-Weinberg Equilibrium, we have $w_1 = \theta^2$, $w_2 = 2\theta(1-\theta)$, and $w_3 = (1-\theta)^2$, where $\theta$ is MAF.

Without loss of generality, we set $\mu_1 = -\delta_1$, $\mu_2=0$, and $\mu_3 = \delta_2$.

We adopted the parameters from the GTEx cohort (see the Power analysis" section of Nature Genetics, 2013; https://www.nature.com/articles/ng.2653), where they modeled the expression data as having a log-normal distribution with a log standard deviation of 0.13 within each genotype class (AA, AB, BB). This level of noise is based on estimates from initial GTEx data. In their power analysis, they assumed the across-genotype difference delta = 0.13 (i.e., equivalent to detecting a log expression change similar to the standard deviation within a single genotype class).

References

The GTEx Consortium. The Genotype-Tissue Expression (GTEx) project. Nature Genetics, 45:580-585, 2013.

O'Brien, RG and Muller, KE. Unified power analysis for t-tests through multivariate hypotheses. In LK Edwards, editor, Applied Analysis of Variance in Behavioral Science, pages 297-344. New York: Dekker, 1993.

Dong X, Li X, Chang T-W, Scherzer CR, Weiss ST, and Qiu W. powerEQTL: An R package and shiny application for sample size and power calculation of bulk tissue and single-cell eQTL analysis. Bioinformatics, 2021;, btab385

Examples

Run this code

# NOT RUN {
# calculate power
powerEQTL.ANOVA(MAF = 0.1,
                deltaVec = c(-0.13, 0.13),
                n = 282,
                power = NULL,
                sigma = 0.13,
                FWER = 0.05,
                nTests = 200000)
                
# calculate sample size (total number of subjects)
powerEQTL.ANOVA(MAF = 0.1,
                deltaVec = c(-0.13, 0.13),
                n = NULL,
                power = 0.8,
                sigma = 0.13,
                FWER = 0.05,
                nTests = 200000) 
                
# calculate minimum allowable MAF
powerEQTL.ANOVA(MAF = NULL,
                deltaVec = c(-0.13, 0.13),
                n = 282,
                power = 0.8,
                sigma = 0.13,
                FWER = 0.05,
                nTests = 200000)                 
                       
                       
# }

Run the code above in your browser using DataLab