permute.mdr: Function to perform a permutation test after fitting an MDR model

Description

After fitting an object of class 'mdr', performs a permutation test to assess the statistical significance of the balanced accuracy evaluation measure of the 'best model'.

Usage

permute.mdr(accuracy, loci, N.permute, method = c("CV", "3WS", "none"), data, cv, K, x = NULL, proportion = NULL, ratio = NULL, equal = "HR", genotype = c(0, 1, 2), LRT=FALSE)

Arguments

accuracy

the accuracy measure reported from the MDR model fit (after fitting mdr.cv, mdr.3WS, or mdr)

loci

the identified loci from the MDR model fit with mdr.cv or mdr.3WS, or prespecified set of loci fit with mdr

N.permute

the number of data permutations to perform

method

internal validation method used to fit the model: "CV" for mdr.cv, "3WS" for mdr.3WS, "none" for mdr

data

dataset used to fit the MDR model; first column is the binary response vector and subsequent columns are numeric SNP data

cv

if method="CV", the number of cross-validation intervals

K

the maximum size of interaction to consider

x

if method="3WS", the number of models to save from the training set to be evaluated in the testing set; if NULL, default is number of total loci

proportion

if method="3WS", a vector with the ratio of data for training:testing:validation sets; if NULL, default is c(2,2,1)

ratio

case/control ratio threshold to ascribe high-risk/low-risk status of a genotype combination; if NULL, default is the ratio of cases to controls in the whole dataset

equal

how to treat genotype combinations with case/control ratio equal to the threshold; if NULL, default is "HR" for high-risk, but can also consider "LR" for low-risk

genotype

a numeric vector of possible genotypes arising in data; if NULL, default is c(0,1,2)

LRT

a logical indicating if a likelihood ratio test for significant interaction should be performed
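
The roles of ratio and equal can be illustrated with a minimal base R sketch. It assumes made-up case and control counts for three genotype combinations and is only an illustration of the rule described above, not the package's internal code; the names cases, controls, cell_ratio, and status are hypothetical.

```r
# Hypothetical case/control counts for three genotype combinations
cases    <- c(12, 5, 8)
controls <- c(6, 10, 8)

ratio <- 1     # threshold; stands in for the `ratio` argument
equal <- "HR"  # tie rule; stands in for the `equal` argument

# Case/control ratio within each genotype combination
cell_ratio <- cases / controls

# Above the threshold: high-risk; below: low-risk; exactly equal: use `equal`
status <- ifelse(cell_ratio > ratio, "HR",
                 ifelse(cell_ratio < ratio, "LR", equal))
status
```

With equal set to "LR" instead, the third combination (ratio exactly 1) would be labelled low-risk.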

Value

Permutation P-value: the empirical p-value based on the permutation distribution; i.e. the proportion of permutations with balanced accuracy > accuracy
Permutation Distribution: a vector with the top balanced accuracies from all N.permute permutations
LRT P-value: if LRT=TRUE, the empirical p-value for a test of interaction based on the LRT distribution
LRT Distribution: if LRT=TRUE, a vector with p-values for the LRT test of interaction from all N.permute permutations
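
As a rough illustration of how the permutation p-value relates to the permutation distribution, the sketch below uses made-up accuracies; observed_accuracy and perm_accuracies are hypothetical stand-ins for the accuracy argument and the returned distribution, not output from the package.

```r
# Hypothetical values for illustration only
observed_accuracy <- 0.62
perm_accuracies <- c(0.51, 0.55, 0.48, 0.63, 0.50,
                     0.57, 0.49, 0.52, 0.66, 0.54)

# Proportion of permuted accuracies strictly greater than the observed one
p_value <- mean(perm_accuracies > observed_accuracy)
p_value
```

With only 10 permutations the p-value can take values only in steps of 0.1, so a larger N.permute is needed to resolve small p-values.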

Warning

MDR is a combinatorial search approach, so considering high-order interactions and a large number of permutations can be computationally expensive.

Details

Obtains permuted datasets by permuting the response vector only, in order to preserve the LD structure within the genetic data.
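
A minimal base R sketch of this permutation scheme follows. It assumes a toy data frame laid out as described for data (binary response in the first column, SNP columns after it); toy and permute_response are hypothetical names used only for illustration and are not part of the package.

```r
set.seed(1)

# Hypothetical toy dataset: binary response first, then three SNP columns
toy <- data.frame(
  y    = rbinom(20, 1, 0.5),
  snp1 = sample(0:2, 20, replace = TRUE),
  snp2 = sample(0:2, 20, replace = TRUE),
  snp3 = sample(0:2, 20, replace = TRUE)
)

# Hypothetical helper: shuffle only the response column, leaving every
# genotype row (and hence the correlation among SNPs) untouched
permute_response <- function(d) {
  d[[1]] <- sample(d[[1]])
  d
}

perm_data <- permute_response(toy)

# Genotype columns are unchanged; case/control counts are preserved
identical(perm_data[, -1], toy[, -1])
table(perm_data$y)
```

Because only the response is shuffled, any association between genotype and case status is broken while the joint distribution of the SNPs stays exactly as observed.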

References

Ritchie MD et al (2001). Multifactor-dimensionality reduction reveals high-order interactions among estrogen-metabolism genes in sporadic breast cancer. Am J Hum Genet 69(1): 138-147.

Hahn LW, Ritchie MD, Moore JH (2003). Multifactor dimensionality reduction software for detecting gene-gene and gene-environment interactions. Bioinformatics 19(3): 376-382.

Velez DR et al (2007). A balanced accuracy function for epistasis modeling in imbalanced datasets using multifactor dimensionality reduction. Genet Epidemiol 31(4): 306-315.

Motsinger-Reif AA (2008). The effect of alternative permutation testing strategies on the performance of multifactor dimensionality reduction. BMC Research Notes 1:139.

Edwards TL et al (2010). A General Framework for Formal Tests of Interaction after Exhaustive Search Methods with Applications to MDR and MDR-PDT. PLoS One 5(2).

Examples

# load data
data(mdr1)

# fit an mdr object to a subset of the sample data
fit <- mdr.3WS(data = mdr1[, 1:11], K = 2)

# save the accuracy
acc <- fit$'final model accuracy'

# save the final model loci
loc <- fit$'final model'

# run permutation test on 10 permutations
perm <- permute.mdr(accuracy = acc, loci = loc, N.permute = 10, method = "3WS", data = mdr1[, 1:11], K = 2, LRT = TRUE)

# empirical p-value
perm$'Permutation P-value'
