vbdmR: fit a discrete mixture model (R implementation)

Description

Fits a discrete mixture model for rare variant association analysis using an approximate variational Bayes coordinate ascent algorithm, which gives a computationally efficient solution. This function is the slower but well-documented R implementation of that algorithm.
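One reading of the underlying model is written out below. It is an assumption, not taken from this page: it is inferred from the returned fields (gamma for covariates, a single effect theta for the variables in G, per-variable probabilities pvec, a mixing parameter prob, an error variance sigma) and from hyper being the alpha and beta of a prior on the mixing probability. Here G_j is the j-th column of G and z_j is a latent indicator for variable j.

$$
y = X\gamma + \theta \sum_{j=1}^{m} z_j G_j + \varepsilon,\qquad
z_j \sim \mathrm{Bernoulli}(\pi),\qquad
\pi \sim \mathrm{Beta}(\alpha, \beta),\qquad
\varepsilon \sim N(0, \sigma^2 I)
$$

Under this reading, pvec[j] approximates the posterior probability that z_j = 1, prob estimates \pi, and sigma estimates \sigma^2.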

Usage

vbdmR(y, G, X=NULL, tol=1e-4, thres=0.05, scaling=TRUE, 
      hyper=c(2,2))

Arguments

y

A vector of continuous phenotypes.

G

A matrix of genotypes or other variables of interest, with one row per individual and one column per variable.

X

An optional matrix of covariates.

tol

The convergence tolerance: the vbdm algorithm stops iterating when the change in the lower bound on the marginal log likelihood falls below tol.

thres

If G contains genotypes, this is a minor allele frequency (MAF) threshold: variants with a MAF greater than thres are excluded from the analysis.

scaling

Whether to scale each column of G to have mean 0 and variance 1.

hyper

The hyperparameters of the prior on the mixing probability: the first element is the alpha parameter and the second is the beta parameter.
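The tol stopping rule can be sketched as below. update_once() is a hypothetical stand-in for one full sweep of coordinate ascent updates that returns the new lower bound; the actual updates in vbdmR are not reproduced, and comparing the absolute change against tol is an assumption.

```r
# Hypothetical stand-in for one coordinate ascent sweep: returns a lower
# bound that increases monotonically and levels off at -100.
update_once <- function(iter) -100 - 50 * 0.5^iter

run_until_converged <- function(tol = 1e-4, max_iter = 1000) {
  lb_old <- -Inf
  for (iter in seq_len(max_iter)) {
    lb_new <- update_once(iter)
    # Stop once the lower bound changes by less than tol between sweeps
    if (abs(lb_new - lb_old) < tol) {
      return(list(lb = lb_new, iterations = iter))
    }
    lb_old <- lb_new
  }
  warning("no convergence within max_iter sweeps")
  list(lb = lb_new, iterations = max_iter)
}

fit <- run_until_converged(tol = 1e-4)
```

A smaller tol buys a slightly tighter lower bound at the cost of more sweeps.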
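The effect of thres and scaling on G can be sketched as follows. This is a minimal illustration, not vbdmR's code: it assumes G holds 0/1/2 allele counts, that the allele frequency is folded to obtain the MAF, and that scaling uses base R's scale(). preprocess_G is a hypothetical name.

```r
# Hypothetical helper: apply a MAF threshold, then optionally standardise.
# Assumes G is coded as 0/1/2 allele counts.
preprocess_G <- function(G, thres = 0.05, scaling = TRUE) {
  af <- colMeans(G) / 2               # frequency of the counted allele
  maf <- pmin(af, 1 - af)             # fold so the MAF is at most 0.5
  keep <- which(maf <= thres)         # variants with MAF > thres are excluded
  G <- G[, keep, drop = FALSE]
  if (scaling) G <- scale(G)          # each column to mean 0, variance 1
  list(G = G, keep = keep)
}

G <- cbind(c(0, 1, 0, 0), c(2, 1, 1, 0), c(0, 0, 0, 1))
pre <- preprocess_G(G, thres = 0.15)
pre$keep
```

Here the second column (MAF 0.5) is dropped and pre$keep plays the role of the keep value. Note that a monomorphic column has MAF 0, passes this filter, and becomes NaN after scale(), so a real implementation would need to remove such columns first.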
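To see what a given hyper encodes, the sketch below summarises the prior on the mixing probability. It assumes that hyper = c(alpha, beta) are the shape parameters of a Beta distribution; prior_summary is a hypothetical helper built on base R's qbeta().

```r
# Assumption: hyper = c(alpha, beta) are Beta shape parameters for the
# prior on the mixing probability.
prior_summary <- function(hyper) {
  shape1 <- hyper[1]
  shape2 <- hyper[2]
  c(mean  = shape1 / (shape1 + shape2),      # prior mean
    lower = qbeta(0.025, shape1, shape2),    # central 95% interval
    upper = qbeta(0.975, shape1, shape2))
}

default_prior <- prior_summary(c(2, 2))   # symmetric, centred on 0.5
sparse_prior  <- prior_summary(c(1, 9))   # favours a small mixing probability
```

Under this reading, the default c(2,2) is symmetric around 0.5, while a choice such as c(1,9) would express a prior belief that only a small fraction of variants carry an effect.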

Value

y

The phenotype vector passed to vbdmR.

G

The genotype matrix passed to vbdmR, with any variables excluded by thres removed.

X

The covariate matrix used in the fit, including the intercept column if one was added.

keep

A vector of the indices of the variables in G that were retained (relevant when some were excluded by thres).

pvec

The vector of estimated posterior probabilities, one for each variable in G.

gamma

A vector of estimated additive covariate effects.

theta

The estimated effect of the variables in G.

sigma

The estimated error variance.

prob

The estimated mixing parameter.

lb

The lower bound on the marginal log likelihood.

lbnull

The lower bound on the marginal log likelihood under the null model.

lrt

The approximate likelihood ratio test statistic, computed from lb and lbnull.

p.value

A p-value computed from lrt, assuming that lrt follows a chi-squared distribution with 1 degree of freedom.
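The relationship between lb, lbnull, lrt and p.value can be sketched as below. The chi-squared reference with 1 degree of freedom is stated above; the exact form lrt = 2 * (lb - lbnull) is an assumption, and lrt_pvalue and the input values are hypothetical.

```r
# Hypothetical helper. Assumption: lrt = 2 * (lb - lbnull), referred to a
# chi-squared distribution with 1 degree of freedom (upper tail).
lrt_pvalue <- function(lb, lbnull) {
  lrt <- 2 * (lb - lbnull)
  list(lrt = lrt, p.value = pchisq(lrt, df = 1, lower.tail = FALSE))
}

out <- lrt_pvalue(lb = -1500.2, lbnull = -1505.3)
```

For a fitted object res, pchisq(res$lrt, df = 1, lower.tail = FALSE) should reproduce res$p.value if the chi-squared assumption described above is how the p-value is computed.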

Details

vbdmR is a much slower but well-documented R implementation of the vbdm algorithm. It lacks some of the sanity checks performed by vbdm and should therefore be used only for diagnostic purposes.

Examples


#generate some test data
library(vbdm)
set.seed(3)
n <- 1000
m <- 20
G <- matrix(rbinom(n*m,2,.01),n,m);
beta1 <- rbinom(m,1,.2)
y <- G%*%beta1+rnorm(n,0,1.3)

#compare implementations
res1 <- vbdm(y=y,G=G);
res2 <- vbdmR(y=y,G=G);
T5 <- summary(lm(y~rowSums(scale(G))))$coef[2,4];
cat('vbdm p-value:',res1$p.value,
  '\nvbdmR p-value:',res2$p.value,
  '\nT5 p-value:',T5,'\n')
#vbdm p-value: 0.001345869 
#vbdmR p-value: 0.001345869 
#T5 p-value: 0.9481797