rrBLUP (version 3.2)

GWA: Genome-wide association analysis

Description

Performs genome-wide association analysis based on the mixed model $$y = X \beta + Z g + \varepsilon$$ where $\beta$ is a vector of fixed effects that can model both environmental factors and population structure. The variable $g$ models the genetic background of each line as a random effect with $Var[g] = K \sigma^2_g$. The residual variance is $Var[\varepsilon] = I \sigma_e^2$.
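As a worked consequence of the two variance statements above, and assuming in addition that $g$ and $\varepsilon$ are independent (an assumption not stated explicitly in this description), the marginal covariance of the observations is $$Var[y] = Z \, Var[g] \, Z' + Var[\varepsilon] = Z K Z' \sigma^2_g + I \sigma_e^2$$ so observations on related lines are correlated through $K$, while the residual term contributes only to the diagonal.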

Usage

GWA(y, G, Z=NULL, X=NULL, min.MAF=0.05, n.core=1, check.rank=FALSE)

Arguments

y
Vector ($n \times 1$) of observations. Missing values (NA) are omitted.
G
Matrix ($t \times m$) of genotypes for $t$ lines with $m$ bi-allelic markers. Genotypes should be coded as {-1,0,1} = {aa,Aa,AA}. Fractional (imputed) and missing (NA) values are allowed.
Z
0-1 matrix ($n \times t$) relating observations to lines. If not passed, the identity matrix is used.
X
Design matrix ($n \times p$) for the fixed effects. If not passed, a vector of 1's is used to model the intercept.
min.MAF
Specifies the minimum minor allele frequency (MAF). If a marker has a MAF less than min.MAF, it is assigned a zero score.
n.core
For Mac, Linux, and UNIX users, setting n.core > 1 will enable parallel execution on a machine with multiple cores. R package multicore must be installed for this to work. Do not run multicore from within the R GUI; you must use the command line.
check.rank
If TRUE, the function checks the rank of the augmented design matrix for each marker. Markers for which the design matrix is singular are assigned a zero score.
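The shapes of Z and X and the effect of min.MAF can be illustrated with a small base-R sketch. The data frame, line names, and environment factor below are hypothetical, and the allele-frequency calculation is one plausible way to obtain a MAF from the {-1,0,1} coding; it is not necessarily how GWA computes it internally.

```r
# Hypothetical data: 6 observations on 4 lines (all names are illustrative).
line_ids <- c("L1", "L2", "L3", "L4")
obs <- data.frame(
  line = factor(c("L1", "L1", "L2", "L3", "L4", "L4"), levels = line_ids),
  env  = factor(c("E1", "E2", "E1", "E2", "E1", "E2"))
)

# Z (n x t): one row per observation, with a single 1 in the column of its line.
Z <- model.matrix(~ line - 1, data = obs)
colnames(Z) <- line_ids

# X (n x p): intercept plus an environment term as fixed effects.
X <- model.matrix(~ env, data = obs)

# Genotypes coded {-1, 0, 1} for 4 lines and 3 markers, with one missing value.
G <- matrix(c(1,  1, -1,
              1,  0, -1,
              1, -1,  1,
              1, NA,  1), nrow = 4, byrow = TRUE)

# (G + 1) / 2 averaged over lines is the frequency of allele A at each marker.
freq_A <- colMeans(G + 1, na.rm = TRUE) / 2
maf <- pmin(freq_A, 1 - freq_A)

# Markers that would fall below the default min.MAF of 0.05.
low_maf <- maf < 0.05
```

Here the first marker is monomorphic, so its MAF is 0 and it would be flagged, while the other two markers pass the default threshold.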

Value

  • Returns an $m \times 1$ vector of marker scores, where each score equals $-\log_{10}$(p-value).
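Because each score is $-\log_{10}$ of a p-value, the p-values can be recovered by inverting that transformation. The sketch below uses a made-up score vector rather than real GWA output, and the Bonferroni threshold is only one possible choice of significance cutoff; it is not part of the function.

```r
# Hypothetical scores for five markers (illustrative values, not GWA output).
scores <- c(0.3, 1.2, 6.5, 0, 4.0)

# Invert score = -log10(p) to recover the p-values.
p_values <- 10^(-scores)

# Bonferroni cutoff at a family-wise level of 0.05, expressed on the score scale.
alpha <- 0.05
threshold <- -log10(alpha / length(scores))

# Indices of markers whose score exceeds the cutoff.
significant <- which(scores > threshold)
```

Note that a zero score maps back to a p-value of 1, which is consistent with markers excluded by min.MAF or check.rank being assigned a zero score.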

Details

This function implements the iterative, generalized least-squares method of Kang et al. (2010), using mixed.solve for variance component estimation. The use of a minimum MAF is typically adequate to ensure the problem is well-posed. However, if an error message indicates the problem is singular, set check.rank to TRUE. This will slow down the algorithm but should fix the error.
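To make the generalized least-squares step concrete, here is a minimal base-R sketch of scoring a single marker. It is not the rrBLUP source code: the function name gls_marker_score is hypothetical, the variance components are passed in as known values (GWA estimates them with mixed.solve), the marker covariate is assumed to enter the model as Z times the genotype column, and the test is assumed to be an F-test on the marker effect. The relationship matrix in the usage lines is likewise a simple assumed choice.

```r
# Sketch only: GLS score for one marker, variance components taken as known.
gls_marker_score <- function(y, X, Z, K, x_marker, var_g, var_e) {
  n <- length(y)
  # Marginal covariance of y implied by the mixed model
  V <- var_g * Z %*% K %*% t(Z) + var_e * diag(n)
  V_inv <- solve(V)
  # Augmented design matrix: fixed effects plus the marker covariate
  W <- cbind(X, Z %*% x_marker)
  p <- ncol(W)
  WtVi <- crossprod(W, V_inv)        # W' V^-1
  A <- WtVi %*% W                    # W' V^-1 W
  beta <- solve(A, WtVi %*% y)       # GLS estimates of all effects
  resid <- y - W %*% beta
  # Residual scale estimate and variance of the marker effect
  s2 <- as.numeric(crossprod(resid, V_inv %*% resid)) / (n - p)
  se2 <- s2 * solve(A)[p, p]
  # F-test on the marker effect (last column of W), reported as -log10(p)
  f_stat <- as.numeric(beta[p])^2 / se2
  -log10(pf(f_stat, df1 = 1, df2 = n - p, lower.tail = FALSE))
}

# Usage on simulated data (all values below are illustrative assumptions).
set.seed(42)
n_lines <- 50
m <- 200
G <- matrix(sample(c(-1, 1), n_lines * m, replace = TRUE), n_lines, m)
K <- tcrossprod(G) / m            # assumed marker-based relationship matrix
Z <- diag(n_lines)                # one observation per line
X <- matrix(1, n_lines, 1)        # intercept only
y <- 2 * G[, 10] + rnorm(n_lines) # marker 10 has a true effect
score_10 <- gls_marker_score(y, X, Z, K, G[, 10], var_g = 0.5, var_e = 1)
```

Setting var_g to 0 reduces V to a multiple of the identity, in which case the score matches the one from an ordinary least-squares fit of y on the marker; this gives a quick sanity check of the sketch. The rank check described above corresponds to W' V^-1 W being singular, which would make the calls to solve fail.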

References

Kang et al. 2010. Variance component model to account for sample structure in genome-wide association studies. Nat. Genet. 42:348-354.

Examples

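library(rrBLUP)

# Random population of 200 lines with 1000 markers, coded -1/1
G <- matrix(0, 200, 1000)
for (i in 1:200) {
  G[i, ] <- ifelse(runif(1000) < 0.5, -1, 1)
}

QTL <- 100 * (1:5)  # pick 5 QTL (markers 100, 200, ..., 500)
u <- rep(0, 1000)   # marker effects
u[QTL] <- 1
g <- as.vector(G %*% u)  # genetic values of the lines
h2 <- 0.5                # heritability
# Residual s.d. chosen so that var(g) / (var(g) + var(e)) equals h2
y <- g + rnorm(200, mean = 0, sd = sqrt((1 - h2) / h2 * var(g)))

scores <- GWA(y = y, G = G)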
