Learn R Programming

gaston (version 1.4.9)

association.test: Association Test

Description

Association tests between phenotype and SNPs.

Usage

association.test(x, Y = x@ped$pheno, X = matrix(1, nrow(x)), method = c("lm", "lmm"), response = c("quantitative", "binary"), test = c("score", "wald", "lrt"), K, eigenK, beg = 1, end = ncol(x), p = 0, tol = .Machine$double.eps^0.25, multithreaded = FALSE, ...)

Arguments

Y
The phenotype vector. Default is the column (pheno) of x@ped
X
A covariable matrix. The default is a column vector of ones, to include an intercept in the model
method
Method to use: "lm" for (generalized) linear model, and "lmm" for (generalized) linear mixed model
response
Is "Y" a quantitative or a binary phenotype?
test
Which test to use. For binary phenotypes, test = "score" is mandatory
K
A Genetic Relationship Matrix (as produced by GRM), or a list of such matrices. For test = "score".
eigenK
Eigen decomposition of the Genetic Relationship Matrix (as produced by the function eigen). For test = "wald" or "lrt".
beg
Index of the first SNP tested for association
end
Index of the last SNP tested for association
p
Number of Principal Components to include in the model with fixed effect (for test = "wald" or "lrt")
tol
Parameter for the likelihood maximization (as in optimize)
multithreaded
Logical. If TRUE, several threads are used.
...
Additional parameters for lmm.aireml or logistic.mm.aireml (if test = "score").

Value

A data frame, giving for each considered SNP, some of the following columns depending on the values of method and test:

Details

Tests the association between the phenotype and requested SNPs in x.

If method = "lm" and response = "quantitative" are used, a simple linear regression is performed to test each SNP in the considered interval. Precisely, the following model is considered for each SNP, $$ Y = (X|PC)\alpha + G\beta + \varepsilon $$ with $epsilon ~ N(0, sigma^2 I_n)$, $G$ the genotype vector of the SNP, $X$ the covariates matrix, and $PC$ the matrix of the first $p$ principal components. A Wald test is performed, independently of the value of test.

There is no test available yet for method = "lm" and response = "binary". You can consider treating the binary phenotype as quantitative. If method = "lmm" and response = "quantitative", the following model in considered for each SNP $$ Y = (X|PC)\alpha + G\beta + \omega + \varepsilon $$ with $omega ~ N(0, tau K)$ and $epsilon ~ N(0, sigma^2 I_n)$. $G$ is the genotype vector of the SNP, $K$ is a Genetic Relationship Matrix (GRM) $X$ the covariates matrix, and $PC$ the matrix of the first $p$ principal components.

If test = "score", all parameters are estimated with the same procedure as in lmm.aireml and the argument K is used to specify the GRM matrix (or a list of GRM matrices for inclusion of several random effects in the model). The argument p is ignored. For Wald and LRT tests the procedure used is the same as in lmm.diago and eigenK is used to specify the GRM matrix.

If method = "lmm" and response = "binary", the following model in considered for each SNP $$ \mbox{logit}(P[Y=1| X, G, \omega]) = X\alpha + G\beta + \omega$$ with $omega ~ N(0, tau K)$. $G$ is the genotype vector of the SNP, $K$ is a Genetic Relationship Matrix (GRM), $X$ the covariable matrix. A score test is performed, independantly of the value of test. All parameters under null model are estimated with the same procedure as in logistic.mm.aireml. It is possible to give a list in argument K for inclusion of several random effects in the model. The argument p is ignored.

Note: in the present implementation, using several threads is possible for Wald and LRT tests, but is likely to be slower than using only one, hence the multithreaded parameter. This will change in the future.

See Also

lmm.diago, lmm.aireml, logistic.mm.aireml, optimize

Examples

Run this code

# Load data
data(AGT)
x <- as.bed.matrix(AGT.gen, AGT.fam, AGT.bim)
standardize(x) <- "p"

# generate a random positive matrix (to play the role of the GRM)
set.seed(1)
R <- random.pm(nrow(x))

# simulate quantitative phenotype with effect of SNP #351 and a polygenic component 
y <- x %*% c(rep(0,350),0.25,rep(0,ncol(x)-351)) + lmm.simu(0.3,1,eigenK=R$eigen)$y

# association test with mixed model (default is quantitative response, score test)
t_score <- association.test(x, y, K = R$K, method = "lmm")
t_wald <- association.test(x, y, eigenK = R$eigen, method = "lmm", test = "wald")

# comparaison des p-valeurs obtenues par les deux tests
plot( t_score$p, t_wald$p, log = "xy", xlab = "score", ylab = "wald")

# (mini) Manhattan plot
plot(-log10(t_score$p), xlab="SNP index", ylab = "-log(p)")

# link between p-values and LD with SNP #351
lds <- LD(x, 351, c(1,ncol(x)))
plot(lds, -log10(t_score$p), xlab="r^2", ylab="-log(p)")

# use y to simulate a binary phenotype
y1 <- as.numeric(y > mean(y))
t_binary <- association.test(x, y1, K = R$K, response = "binary")
# (mini) Manhattan plot
plot(-log10(t_binary$p), xlab="SNP index", ylab = "-log(p)")

Run the code above in your browser using DataLab