An ExpressionSet object storing simulated genotype data. The minor allele frequency (MAF) of cases has different prior than that of controls.
data("esSimDiffPriors")In this simulation, we generate additive-coded genotypes for 3 clusters of SNPs based on a mixture of 3 Bayesian hierarchical models.
In cluster \(+\), the minor allele frequency (MAF) \(\theta_{x+}\) of cases is greater than the MAF \(\theta_{y+}\) of controls.
In cluster \(0\), the MAF \(\theta_{0}\) of cases is equal to the MAF of controls.
In cluster \(-\), the MAF \(\theta_{x-}\) of cases is smaller than the MAF \(\theta_{y-}\) of controls.
The proportions of the 3 clusters of SNPs are \(\pi_{+}\), \(\pi_{0}\), and \(\pi_{-}\), respectively.
We assume a “half-flat shape” bivariate prior for the MAF in cluster \(+\) $$2h_{x+}\left(\theta_{x+}\right)h_{y+}\left(\theta_{y+}\right) I\left(\theta_{x+}>\theta_{y+}\right), $$ where \(I(a)\) is hte indicator function taking value \(1\) if the event \(a\) is true, and value \(0\) otherwise. The function \(h_{x+}\) is the probability density function of the beta distribution \(Beta\left(\alpha_{x+}, \beta_{x+}\right)\). The function \(h_{y+}\) is the probability density function of the beta distribution \(Beta\left(\alpha_{y+}, \beta_{y+}\right)\).
We assume \(\theta_{0}\) has the beta prior \(Beta(\alpha_0, \beta_0)\).
We also assume a “half-flat shape” bivariate prior for the MAF in cluster \(-\) $$2h_{x-}\left(\theta_{x-}\right)h_{y-}\left(\theta_{y-}\right) I\left(\theta_{x-}>\theta_{y-}\right). $$ The function \(h_{x-}\) is the probability density function of the beta distribution \(Beta\left(\alpha_{x-}, \beta_{x-}\right)\). The function \(h_{y-}\) is the probability density function of the beta distribution \(Beta\left(\alpha_{y-}, \beta_{y-}\right)\).
Given a SNP, we assume Hardy-Weinberg equilibrium holds for its genotypes. That is, given MAF \(\theta\), the probabilities of genotypes are $$Pr(geno=2) = \theta^2$$ $$Pr(geno=1) = 2\theta\left(1-\theta\right)$$ $$Pr(geno=0) = \left(1-\theta\right)^2$$
We also assume the genotypes \(0\) (wild-type), \(1\) (heterozygote), and \(2\) (mutation) follows a multinomial distribution \(Multinomial\left\{1, \left[ \theta^2, 2\theta\left(1-\theta\right), \left(1-\theta\right)^2 \right]\right\}\)
We set the number of cases as \(100\), the number of controls as \(100\), and the number of SNPs as \(1000\).
The hyperparameters are \(\alpha_{x+}=2\), \(\beta_{x+}=3\), \(\alpha_{y+}=2\), \(\beta_{y+}=8\), \(\pi_{+}=0.1\),
\(\alpha_{0}=2\), \(\beta_{0}=5\), \(\pi_{0}=0.8\),
\(\alpha_{x-}=2\), \(\beta_{x-}=8\), \(\alpha_{y-}=2\), \(\beta_{y-}=3\), \(\pi_{-}=0.1\).
Note that when we generate MAFs from the half-flat shape bivariate priors, we might get very small MAFs or get MAFs \(>0.5\). In these cased, we then delete this SNP.
So the final number of SNPs generated might be less than the initially-set number \(1000\) of SNPs.
For the dataset stored in esSim, there are \(838\) SNPs.
\(64\) SNPs are in cluster -, \(708\) SNPs are in cluster \(0\),
and \(66\) SNPs are in cluster \(+\).
Yan X, Xing L, Su J, Zhang X, Qiu W. Model-based clustering for identifying disease-associated SNPs in case-control genome-wide association studies. Scientific Reports 9, Article number: 13686 (2019) https://www.nature.com/articles/s41598-019-50229-6.
# NOT RUN {
data(esSimDiffPriors)
print(esSimDiffPriors)
pDat=pData(esSimDiffPriors)
print(pDat[1:2,])
print(table(pDat$memSubjs))
fDat=fData(esSimDiffPriors)
print(fDat[1:2,])
print(table(fDat$memGenes))
print(table(fDat$memGenes2))
# }
Run the code above in your browser using DataLab