mbmdr: Model Based Multifactor Dimensionality Reduction

Description

mbmdr implements the Model Based Multifactor Dimensionality Reduction (MB-MDR) method proposed by Calle et al.(2008) as a dimension reduction method for exploring gene-gene interactions.

Usage

mbmdr(y, data, order, covar=NULL, exclude=NA, risk.threshold=0.1, 
        output=NULL, adjust=c("none","covariates","main effects","both"), 
        first.model=NULL, list.models=NULL, use.logistf=TRUE, 
        printStep1=FALSE, ...)

Arguments

Vector containing the dependent variable.

data

A data.frame (or object coercible by as.data.frame to a data frame) containing the SNP information with values 0,1,2. For example: 0 = common homozygous genotype, 1 = heterozygous genotype, 2 = variant homozygous genotype.

order

Single integer that specifies the order of interactions to be analyzed. If list.models = NULL (value by default) all possible interactions of the specified order are analyzed.

covar

(Optional) A data.frame or object coercible by as.data.frame to a data frame containing the covariates for adjusting regression models. Only used if adjust="covariates" or adjust="both".

exclude

(Optional) Value/s of missing data. If missings in data are coded differently than NA it should be specified. For example exclude=c(NA,-1) specifies that both, NA and -1 indicate a

risk.threshold

Threshold used at the first MB-MDR stage for defining the risk category of a multilocus genotype. It should be a conservative value. The default value is risk.threshold=0.1.

output

(Optional) Output file name for storing mbmdr results on file, or NULL (default) for output as R object. If the number of models to be analyzed is too large, it is preferable to store the output in a file. This

adjust

Type of regressions adjustment. Options are "none", "covariates", "main effects" or "both". By default no adjustment is performed; adjust="none".

first.model

(Optional) Numerical vector of length equal to order for specifying the initial interaction model for mbmdr; previous models will not be evaluated. This is useful to continue mbmdr computation af

list.models

(Optional) Exhaustive list of models to be analyzed. Only models in list will be analyzed. It can be: a vector of lenght order specifying a unique model, a matrix (n x order) containing models by rows, or a stri

use.logistf

Boolean value indicating whether or not to use the logistf package for logistic regressions when separation of data points is observed. It only has effect if logistic regression (family=binomial(link = "logit")) is specifie

printStep1

Boolean value. If true, the details of mbmdr step 1 are printed for every model. This slows the process, so it is only advisable when the number of models to analyze is small. By default printStep1=FALSE.

...

For regression arguments: arguments to be passed to glm calls. Mainly to specify the error distribution and link function to be used in the regression models. For example, use family=binomial(link=logit) for specifying logistic r

Value

mbmdr returns an object of class mbmdr with the following attributes:
callThe matched call.
yThe outcome used.
dataThe SNPs data used.
covarThe covariate data used.
resultDataframe with those interactions that have at least a significant genotype. For each interaction (rows), the following information is returned: lll{ SNP1...SNPx Names of snps in interaction. NH Number of significant High risk genotypes in the interaction. betaH Regresion coeficient in step2 for High risk exposition. WH Wald statistic for High risk category. PH P-value of the Wald test for the High risk category. NL Number of significant Low risk genotypes in the interaction. betaL Regresion coeficient in step2 for Low risk exposition. WL Wald statistic for Low risk category. PL P-value of the Wald test for the Low risk category. MIN.P Minimun p-value (min(PH,PL)) for the interaction model. }
If printStep1 argument is set to TRUE, the result of the first step in mbmdr is printed for each genotype with the following information: lll{ ... Genotype. cases (only for case/control outcome) Number of cases with the specific genotype. controls (only for case/control outcome) Number of controls with the specific genotype. beta Regression coefficient for this genotype. p.value Wald test p-value for this genotype. category Predicted risk category for this genotype. }

Details

MB-MDR is a method for identifying multi-locus genotypes that are associated with a phenotype of interest, and allows to adjust for marginal and confounding effects. The exploration of interacions is performed in three steps: Step1 Each genotype is tested for association with the response and is classified as high risk, low risk or not significant, and all genotypes of the same class are merged. The threshold for considering significant evidence is the value specified in risk.threshold (by default risk.threshold=0.1). If printStep1=TRUE, the MBMDR function prints this classification. Step2 For each risk categories, high and low, a new association test is performed. The result provides a Wald statistic for the high and for the low categories. Step3 The significance is explored through a permutation test on the maximum Wald statistics.

References

Calle M.L., Urrea V., Vellalta G., Malats N., Steen K.V. (2008) Improving strategies for detecting genetic patterns of disease susceptibility in association studies. Statistics in Medicine 27, 6532-6546.

Examples

Run this code

#---  Dicotomous outcome  -------------

# load example data
data(simSNP)

# MB-MDR analysis of all possible 2nd order interactions (it may take some time)
# The order of the interactions to be explored is specified by order=2
# fit <- mbmdr(y=simSNP$Y,data=simSNP[,3:12],order=2,family=binomial(link=logit))
# print(fit)

# MB-MDR analysis of the epistatic effect of SNP1 and SNP2 (Model 2 1)
# The specific model to be analyzed is specified by list.models=c(2,1)
fit <- mbmdr(y=simSNP$Y,data=simSNP[,3:12],order=2,list.models=c(2,1),
             family=binomial(link=logit),printStep1=TRUE)
print(fit)


# MB-MDR analysis of the epistatic effect of SNP1 and SNP2, adjusted for variable X
# The specific model to be analyzed is specified by list.models=c(2,1)
# The adjustment statement is specified by adjust="covariates"
fit <- mbmdr(y=simSNP$Y,data=simSNP[,3:12],order=2,list.models=c(2,1),
             covar=simSNP$X,adjust="covariates",family=binomial(link=logit))
print(fit)


# MB-MDR analysis of all 2nd order interactions restricted to a subset of snps 
# (all possible 2nd order interactions among SNP1, SNP2 and SNP3 are explored,
# it may take some time)
# SNP1, SNP2 and SNP3 are placed in columns 3 to 5 of simSNP. This is specified
# by data=simSNP[,3:5]
# fit <- mbmdr(y=simSNP$Y,data=simSNP[,3:5],order=2,family=binomial(link=logit),
#             printStep1=TRUE)
# print(fit)


# MB-MDR analysis of all possible 3rd order interactions (it may take some time)
# The order of the interactions to be explored is specified by order=3
# fit <- mbmdr(y=simSNP$Y,data=simSNP[,3:12],order=3,family=binomial(link=logit))
# print(fit)


# MB-MDR analysis of the 3rd order epistatic effect of SNP1, SNP2 and SNP3
# The specific model to be analyzed is specified by list.models=c(3,2,1)
# fit <- mbmdr(y=simSNP$Y,data=simSNP[,3:12],order=3,list.models=c(3,2,1),
#              family=binomial(link=logit),printStep1=TRUE)
# print(fit)


#---  Continuous outcome  --------------
# load example data
data(simSNPcont)

# MB-MDR analysis of all possible 2nd order interactions (it may take some time)
# The order of the interactions to be explored is specified by order=2
# fit <- mbmdr(y=simSNPcont$Y,data=simSNPcont[,2:11],order=2)
# print(fit)

# MB-MDR analysis of the epistatic effect of SNP1 and SNP2 (Model 2 1)
# The specific model to be analyzed is specified by list.models=c(2,1)
fit <- mbmdr(y=simSNPcont$Y,data=simSNPcont[,2:11],order=2,
	           list.models=c(2,1),printStep1=TRUE)
print(fit)

Run the code above in your browser using DataLab