gwas: Empirical Bayes Genome Wide Association Mapping

Description

The gwas function calculates the likelihood ratio for each marker under the empirical Bayesian framework. The method also works with multiple populations.

Usage

gwas(y,gen,fam=NULL,chr=NULL,window=NULL)

Arguments

y

Numeric vector of observations ($n$) describing the trait to be analyzed. NA is allowed.

gen

Numeric matrix containing the genotypic data, with $n$ rows of observations and $m$ columns of molecular markers. SNPs must be coded as 0, 1, 2 for founder homozygous, heterozygous and reference homozygous, respectively. NA is allowed.

fam

Numeric vector of length $n$ indicating which subpopulation (i.e. family) each observation comes from. Default assumes that all observations are from the same population.

chr

Numeric vector indicating the number of markers in each chromosome. The sum of $chr$ must be equal to the number of columns in $gen$. Default assumes that all markers are from the same chromosome.

window

Numeric. If specified, the genetic distance between markers is used for a moving-window strategy (Wang 2015). The window must be specified in Morgans (e.g. 0.05 represents 5 cM). Genetic distance is calculated assuming that individuals are RILs.
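As a rough illustration of how these arguments fit together, the sketch below builds toy inputs in base R and checks the shape constraints stated in the argument descriptions. The helper `check_gwas_inputs` and the simulated data are hypothetical and not part of the package; the requirement that `window` be a single positive number is likewise an assumption.

```r
# Hypothetical helper (not part of the package): checks the shape
# constraints stated in the argument descriptions.
check_gwas_inputs <- function(y, gen, fam = NULL, chr = NULL, window = NULL) {
  n <- length(y)
  m <- ncol(gen)
  stopifnot(is.numeric(y), is.matrix(gen), is.numeric(gen))
  stopifnot(nrow(gen) == n)                       # one row of gen per observation
  stopifnot(all(gen %in% c(0, 1, 2, NA)))         # SNP coding 0/1/2, NA allowed
  if (!is.null(fam)) stopifnot(length(fam) == n)  # one family label per observation
  if (!is.null(chr)) stopifnot(sum(chr) == m)     # markers per chromosome add up to m
  # Assumption: window is a single positive number, in Morgans
  if (!is.null(window)) stopifnot(is.numeric(window), length(window) == 1, window > 0)
  invisible(TRUE)
}

# Toy data: 20 observations, 12 markers, 2 families, 2 chromosomes
set.seed(1)
n <- 20
m <- 12
gen <- matrix(sample(c(0, 1, 2), n * m, replace = TRUE), nrow = n, ncol = m)
y <- rnorm(n)
y[3] <- NA                    # missing phenotypes are allowed
fam <- rep(1:2, each = 10)
chr <- c(5, 7)                # 5 + 7 = 12 = ncol(gen)
window <- 5 / 100             # 5 cM expressed in Morgans

ok <- check_gwas_inputs(y, gen, fam, chr, window)
```

Running a check of this kind before calling `gwas` can catch the most common mismatch, a `chr` vector whose sum differs from the number of columns in `gen`.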

Value

The function gwas returns a list containing the method deployed ($Method$), the predicted parameters and statistical tests ($PolyTest$), the genetic map for NAM panels ($MAP$) and the marker names ($SNPs$).

Details

A special incidence matrix is created to make the best use of the information provided by the subpopulations. Each locus is re-coded as a vector with length $f$ equal to the number of subpopulations, or NAM families. For example, a heterozygous locus from an individual from subpopulation 2 is coded as [1, 0, 1, ..., $f$], a locus homozygous for the reference allele from any subpopulation is coded as [2, 0, 0, ..., $f$] and a locus homozygous for the founder allele from an individual from subpopulation 1 is coded as [0, 2, 0, ..., $f$].

The base model for genome scanning includes the fixed effect ($Xb$), the marker ($Zu$), the polygene ($g$) and the residuals ($e$). If the $window$ term is specified, the model for genome scanning includes three extra terms: the left side genome ($Zu[k-1]$), the right side genome ($Zu[k+1]$) and the window polygene ($-g[k]$). The polygenic term is calculated only once (Zhang et al. 2010) using eigendecomposition (Zhou and Stephens 2012). Efficient inversion of the capacitance matrix is obtained through the Woodbury matrix identity.

To avoid memory issues, one can use the function $gwas2$ with the same arguments, except that window is not allowed.
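The recoding of a single locus can be sketched in base R as follows. This is an illustration only, not the package's internal code: `recode_locus` is a hypothetical helper, and the layout (a leading reference-allele count followed by one founder-allele column per subpopulation, so $f$ + 1 entries) is an assumption inferred from the three examples given in the text.

```r
# Hypothetical helper (not part of the package): recodes one locus for one
# individual following the three examples in the text. Assumed layout:
# position 1 counts reference alleles, position 1 + k counts founder
# alleles for subpopulation k.
recode_locus <- function(genotype, subpop, f) {
  stopifnot(genotype %in% c(0, 1, 2), subpop >= 1, subpop <= f)
  out <- numeric(f + 1)
  out[1] <- genotype               # 0, 1 or 2 copies of the reference allele
  out[1 + subpop] <- 2 - genotype  # remaining copies carry the founder allele
  out
}

f <- 3
het_fam2 <- recode_locus(1, 2, f)  # heterozygous, subpopulation 2
ref_hom  <- recode_locus(2, 1, f)  # reference homozygous, any subpopulation
fnd_fam1 <- recode_locus(0, 1, f)  # founder homozygous, subpopulation 1
```

With three subpopulations, the three calls reproduce the leading entries of the coded vectors quoted in the text, so that each family receives its own founder-allele effect while the reference allele is shared across families.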

References

Wang, Q. (2015). An Empirical Bayes Method for Genome-Wide Association Studies. W799/Statistical Genomics. PAG XXXII.

Zhang et al. (2010). Mixed linear model approach adapted for genome-wide association studies. Nature Genetics, 42, 355-360.

Zhou, X., & Stephens, M. (2012). Genome-wide efficient mixed-model analysis for association studies. Nature Genetics, 44(7), 821-824.

Examples

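library(NAM)  # package providing gwas() and the tpod example data
data(tpod)    # loads y, gen, fam and chr
# Scan the first 240 markers (first 12 chromosomes) with a 0.05 Morgan window
test <- gwas(y = y, gen = gen[, 1:240], fam = fam, chr = chr[1:12], window = 0.05)
manhattan(test, type = "h", lwd = 3)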