mdr2: Sample data for MDR package for n=250, p=100
Description
This dataset provides case/control disease status and genetic information.
Format
A simulated data frame with 250 observations on 101 variables. 'Response' is a binary vector representing case (1) or control (0) status for a disease. Variables 'SNP.1' to 'SNP.100' are numeric variables that represent genotype information (coded as 0, 1, 2) at 100 loci.
Details
This data was simulated with an equal number of cases and controls according to a variation on the purely epistatic XOR model of Li and Reich (2000). It represents a two-way interaction in the absence of marginal effects at 5 percent heritability. The true disease-causing loci are SNP.4 and SNP.9, generated with minor allele frequency 0.5. The expected balanced accuracy for this model is 67.09 percent. The penetrance function used to generate the case/control data, based on the 9 possible genotype combinations, is as follows:
Genotype | BB    | Bb    | bb    |
AA       | 0.199 | 0.05  | 0.199 |
Aa       | 0.05  | 0.199 | 0.05  |
aa       | 0.199 | 0.05  | 0.199 |
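The figures quoted in the Details section can be cross-checked from the penetrance table. The following is a minimal Python sketch, not part of the MDR package. It assumes Hardy-Weinberg genotype frequencies at both loci, takes heritability to be the variance of the penetrance divided by K(1 - K) for prevalence K, and takes the expected balanced accuracy to be that of a classifier labelling every genotype cell with above-average penetrance as high risk. The source states none of these definitions, and all variable and function names are illustrative.

```python
from itertools import product

# Penetrance table from the Details section:
# P(case | genotype at locus A, genotype at locus B).
PENETRANCE = {
    ("AA", "BB"): 0.199, ("AA", "Bb"): 0.05,  ("AA", "bb"): 0.199,
    ("Aa", "BB"): 0.05,  ("Aa", "Bb"): 0.199, ("Aa", "bb"): 0.05,
    ("aa", "BB"): 0.199, ("aa", "Bb"): 0.05,  ("aa", "bb"): 0.199,
}

MAF = 0.5  # minor allele frequency stated for both causal loci


def genotype_freqs(maf, alleles):
    """Genotype frequencies under Hardy-Weinberg equilibrium (an assumption)."""
    major, minor = alleles
    return {
        major * 2: (1 - maf) ** 2,
        major + minor: 2 * maf * (1 - maf),
        minor * 2: maf ** 2,
    }


freq_a = genotype_freqs(MAF, "Aa")
freq_b = genotype_freqs(MAF, "Bb")

# Joint frequency of each two-locus genotype, assuming independent loci.
joint = {(ga, gb): freq_a[ga] * freq_b[gb] for ga, gb in product(freq_a, freq_b)}

# Population prevalence K: penetrance averaged over all nine genotype cells.
prevalence = sum(joint[g] * PENETRANCE[g] for g in joint)

# Marginal penetrance per single-locus genotype; a purely epistatic model
# should give the same value (K) for every genotype at each locus.
marginal_a = {ga: sum(freq_b[gb] * PENETRANCE[(ga, gb)] for gb in freq_b) for ga in freq_a}
marginal_b = {gb: sum(freq_a[ga] * PENETRANCE[(ga, gb)] for ga in freq_a) for gb in freq_b}

# Heritability taken as Var(penetrance) / (K * (1 - K)) (assumed definition).
variance = sum(joint[g] * (PENETRANCE[g] - prevalence) ** 2 for g in joint)
heritability = variance / (prevalence * (1 - prevalence))

# Classifier that calls a cell high risk when its penetrance exceeds K.
high_risk = {g for g in joint if PENETRANCE[g] > prevalence}
sensitivity = sum(joint[g] * PENETRANCE[g] for g in high_risk) / prevalence
specificity = sum(
    joint[g] * (1 - PENETRANCE[g]) for g in joint if g not in high_risk
) / (1 - prevalence)
balanced_accuracy = (sensitivity + specificity) / 2

print(f"prevalence        = {prevalence:.4f}")
print(f"marginal (A)      = {marginal_a}")
print(f"marginal (B)      = {marginal_b}")
print(f"heritability      = {heritability:.4f}")
print(f"balanced accuracy = {100 * balanced_accuracy:.2f} percent")
```

Under these assumptions every marginal penetrance equals the prevalence of 0.1245, the heritability comes out at roughly 0.051, and the balanced accuracy at roughly 67.09 percent, consistent with the values stated above.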
References
Li W, Reich J (2000). A complete enumeration and classification of two-locus disease models. Hum Hered, 50(6):334-349.
Culverhouse R, et al (2002). A perspective on epistasis: limits of models displaying no main effect. Am J Hum Genet, 70(2):461-471.