vbdm: fit a discrete mixture model

Description

Fits a discrete mixture model for rare variant association analysis. Uses an approximate variational Bayes coordinate ascent algorithm for a computationally efficient solution.

Usage

vbdm(y, G, X=NULL, thres=0.05, genotypes=TRUE,
     include.mean=TRUE, minor.allele=TRUE, impute="MEAN",
     eps=1e-4, scaling=TRUE, nperm=0, maxit=1000, hyper=c(2,2))

Arguments

A vector of continuous phenotypes.

A matrix of genotypes or variables of interest. Rows are treated as individuals and columns are treated as genotypes.

An optional matrix of covariates.

thres

If the matrix is of genotypes, then this specifies a minor allele frequency threshold. Variants with a MAF greater than this threshold are excluded from the analysis.

genotypes

This specifies whether or not to treat G as a matrix of genotypes. If it is treated as genotypes then it will be filtered based on thres, and there are more options for missing data imputations. The default genotype encoding is additive (e.

include.mean

This specifies whether to add an interecept term to the model. If no covariates are provided it is automatically added, but if there are covariates provided it can be optional.

minor.allele

When minor.allele=TRUE and genotypes=TRUE the genotypes are flipped so that the major allele genotype is encoded as 0.

impute

If there is missing data in G this specifies the method with which to impute the missing data. There are two options impute="MEAN" which sets any missing genotype to the expected dosage given the MAF, or impute="MAJOR"

eps

The tolerance for convergence of the coordinate ascent algorithm based on the change in the lower bound of the log marginal likeilhood.

scaling

Whether or not to scale the genotypes to have mean 0 and variance 1.

nperm

Optional parameter defining the number of null permutations of the vbdm likelihood ratio test. This can be used to generate an exact p-value

maxit

The maximum number of iterations allowed for the vbdm algorithm.

hyper

The hyperparameters for the prior defined over the mixing probability parameter. The first hyperparameter is the alpha parameter, and the second is the beta parameter.

Value

yThe phenotype vector passed to vbdm.
GThe genotype matrix passed to vbdm. Note that any variables that were dropped will be dropped from this matrix.
XThe covariate matrix passed to vbdm. Will include intercept term if it was added earlier.
keepA vector of indices of the kept variables in G (if any were excluded based on thres)
pvecThe vector of estimated posterior probabilities for each variable in G.
gammaA vector of additive covariate effect estimates.
thetaThe estimated effect of the variables in G.
sigmaThe estimated error variance.
probThe estimated mixing parameter.
lbThe lower bound of the marginal log likelihood.
lbnullThe lower bound of the marginal log likelihood under the null model.
lrtThe approximate likelihood ratio test based on the lower bounds.
p.valueA p-value computed based on lrt with the assumption that lrt~chi^2_1
lbpermIf nperm>0, the lower bound of the fitted null permutations.
lrtpermIf nperm>0, the likelihood ratio test of the fitted null permutations.
p.value.permIf nperm>0, the empirical p-value based on the fitted null permutations.

References

Logsdon, B.A., et al. (2014) A Variational Bayes Discrete Mixture Test for Rare Variant Association., Genetic Epidemiology, Vol. 38(1), 21-30 2014

Examples

Run this code

#generate some test data
library(vbdm)
set.seed(3)
n <- 1000
m <- 20
G <- matrix(rbinom(n*m,2,.01),n,m);
beta1 <- rbinom(m,1,.2)
y <- G%*%beta1+rnorm(n,0,1.3)

#with scaling:
res <- vbdm(y=y,G=G);
T5 <- summary(lm(y~rowSums(scale(G))))$coef[2,4];
cat('vbdm p-value:',res$p.value,'<nT5>p-value:',T5,'<n>')
#vbdm p-value: 0.001345869 
#T5 p-value: 0.9481797 

#without scaling:
res <- vbdm(y=y,G=G,scaling=FALSE)
T5 <- summary(lm(y~rowSums(G)))$coef[2,4];
cat('vbdm p-value:',res$p.value,'<nT5>p-value:',T5,'<n>')
#vbdm p-value: 0.0005315836 
#T5 p-value: 0.904476 

#run 100 permutations
set.seed(2)
res <- vbdm(y=y,G=G,scaling=FALSE,nperm=1e2);
cat('vbdm approximate p-value:',res$p.value,'<nvbdm>permutation p-value <',res$p.value.perm,'<n>');
#vbdm approximate p-value: 0.0005315836 
#vbdm permutation p-value: 0</n><keyword>vbdm</keyword>
<keyword>association</keyword>
<keyword>genetic</keyword>
<keyword>rare</keyword>
<keyword>variational</keyword></nvbdm></n></nT5></n></nT5>