This function performs the W-test to calculate high-order interactions in case-control studies
for categorical data sets. The test measures the distributional difference of the target variables between cases and controls via a combined
log odds ratio. The test statistic follows a chi-squared probability distribution with data-adaptive degrees of freedom. For interaction
calculation, the user has 3 options: (1) calculate the W-value of a single variable set (which.pair); (2) calculate pairwise or high-order interactions among the variables
whose main effect p-values are smaller than a threshold (input.pval); (3) calculate the pairwise or high-order interactions exhaustively for all variables.
For both main effect and interaction effect calculation, the output can be filtered by p-value, so that only sets with a p-value smaller
than a threshold (output.pval) are returned.
wtest.high(data, y, w.order = 3, hf1 = "default.hf1",
  hf.high.order = "default.high", which.pair = NULL, output.pval = NULL,
  sort = TRUE, input.pval = 0.1, input.poolsize = 10)
data: a data frame or matrix containing genotypes in the columns. Genotypes should be coded as (0, 1, 2) or (0, 1).
y: a numeric vector composed of 0 or 1, or a factor variable with two levels.
w.order: an integer value indicating the order of the high-order interaction. For example, w.order = 3 for three-way interaction analysis.
hf1: a data frame or matrix containing the h and f values for main effect (w.order = 1) calculation, where the number of categorical combinations (k) is 2 or 3. Default hf1 is h = (k-1)/k and f = k-1 for k = 2 and 3; the first row holds the h and f for k = 2, and the second row holds the h and f for k = 3.
hf.high.order: a data frame or matrix containing the h and f values for high-order interaction effect calculation (w.order > 1). Default hf.high.order is h = (k-1)/k and f = k-1, where k is the number of genotype categories of a cluster of SNPs.
which.pair: a numeric vector with length = w.order. It contains the column numbers of the variable set to calculate. If which.pair is specified, the W-value for that set is returned. Default which.pair = NULL, in which case the main or interaction effects are calculated exhaustively.
output.pval: a p-value threshold for filtering the output. If NULL, all results are listed; otherwise, the function only outputs the results with p-values smaller than output.pval.
sort: a logical value indicating whether or not to sort the output by p-value in ascending order. Default = TRUE.
input.pval: a p-value threshold to select markers for high-order interaction calculation, used only when w.order > 1. When specified, only markers with a main effect p-value smaller than input.pval are passed to interaction effect calculation. Default = 0.10. Set input.pval = NULL or 1 for exhaustive pairwise calculation.
input.poolsize: an integer with a value less than the number of input variables. It is an optional filter that controls the maximum number of variables passed to high-order interaction calculation, used only when w.order > 2. It selects the top input.poolsize variables, ranked by smallest main effect p-value, for interaction calculation. It can be used separately or jointly with input.pval, whichever gives the smaller input variable pool. Default = 10. Set input.poolsize = NULL for exhaustive calculation. It can be useful when the user is exploring the data and there may be a large number of variables with extremely small main effect p-values.
An object "wtest" containing:
the "w.order" specified.
When order > 1 and which.pair = NULL, the test results include: (for pair) [pair name, W-value, k, p-value]; (for first variable in the pair) [W-value, k, p-value]; (for second variable in the pair) [W-value, k, p-value]...
The h and f values used in main effect calculation.
The h and f values used in high-order interaction calculation.
The W-test is a model-free statistical test originally proposed to measure main effects or pairwise interactions in case-control studies with categorical variables. It can be extended to high-order interaction detection with the wtest.high() function. Theoretically, the test statistic follows a chi-squared distribution with f degrees of freedom. The data-adaptive degrees of freedom f and a scalar h in the test statistic allow the W-test to correct for distributional bias due to sparse data and small sample size. Let k be the number of columns of the 2 by k contingency table formed by a single variable or a variable set. When the sample size is large and there is no population stratification, h and f will approximate well to the theoretical values h = (k-1)/k and f = k-1. When the sample size is small or there is population stratification, h and f will vary to correct for distributional bias caused by the data structure.
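The statistic described above can be illustrated with a minimal base R sketch for a single variable. This is not the package's internal code. The function name w_stat_sketch is hypothetical, and the exact form used here (h times the sum of squared standardized log odds ratios over the k columns of the 2 by k table) is an assumption based on the cited paper. The sketch also assumes every cell of the table is non-zero and applies no correction for empty cells.

```r
# Illustrative sketch only: NOT the wtest package's implementation.
# Assumed form of the statistic: W = h * sum_i (log(OR_i) / SE_i)^2,
# compared against a chi-squared distribution with f degrees of freedom.
w_stat_sketch <- function(x, y, h = NULL, f = NULL) {
  tab <- table(y, x)                  # 2 x k table: row 1 = controls, row 2 = cases
  stopifnot(nrow(tab) == 2, all(tab > 0))
  k <- ncol(tab)
  if (is.null(h)) h <- (k - 1) / k    # large-sample value stated in the text
  if (is.null(f)) f <- k - 1          # large-sample value stated in the text
  n0 <- as.numeric(tab[1, ])          # control counts per category
  n1 <- as.numeric(tab[2, ])          # case counts per category
  N0 <- sum(n0)
  N1 <- sum(n1)
  # Odds ratio of each category versus all other categories
  log_or <- log((n1 / (N1 - n1)) / (n0 / (N0 - n0)))
  se <- sqrt(1 / n1 + 1 / (N1 - n1) + 1 / n0 + 1 / (N0 - n0))
  w <- h * sum((log_or / se)^2)
  c(W = w, k = k, p.value = pchisq(w, df = f, lower.tail = FALSE))
}

# Toy data: 100 controls and 100 cases with different genotype distributions
y <- c(rep(0, 100), rep(1, 100))
x <- c(rep(0:2, c(50, 30, 20)), rep(0:2, c(30, 40, 30)))
res <- w_stat_sketch(x, y)
print(res)
```

In practice the h and f passed to such a calculation would come from the data-adaptive estimates rather than the large-sample defaults used in this sketch.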
When w.order > 1, wtest.high() automatically calculates the main effect first and then applies a pre-filter before calculating interactions.
This filtering avoids overloading the memory before the user has a better understanding of the data. The user can specify a smaller input.pval, such as 0.05 or 0.001,
for less output, or input.pval = 1 or NULL for exhaustive pairwise calculation. Another optional filter is input.poolsize. It takes the top input.poolsize
variables, selected by smallest p-value, and calculates their pairwise effects exhaustively; when used together with input.pval, the smaller set is passed to pairwise calculation.
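The pre-filter described above can be sketched in base R as follows. This is a hedged illustration of the selection rule as described in the text, not the package's internal code. The helper name select_pool_sketch and the toy p-values are hypothetical.

```r
# Illustrative sketch only: NOT the wtest package's implementation.
# Select the variables passed on to interaction calculation from their
# main effect p-values, using the two optional filters.
select_pool_sketch <- function(main_pvals, input.pval = 0.1, input.poolsize = 10) {
  idx <- order(main_pvals)                       # rank variables by smallest p-value
  if (!is.null(input.pval)) {
    idx <- idx[main_pvals[idx] < input.pval]     # keep markers below the threshold
  }
  if (!is.null(input.poolsize)) {
    idx <- idx[seq_len(min(length(idx), input.poolsize))]  # cap the pool size
  }
  idx                                            # column indices of retained variables
}

# Hypothetical main effect p-values for five variables
p <- c(0.5, 0.01, 0.2, 0.03, 0.08)
pool_both <- select_pool_sketch(p, input.pval = 0.1, input.poolsize = 2)
pool_pval <- select_pool_sketch(p, input.pval = 0.1, input.poolsize = NULL)
pool_all  <- select_pool_sketch(p, input.pval = NULL, input.poolsize = NULL)

# Number of three-way sets that would be tested for each pool
n_sets <- choose(c(length(pool_both), length(pool_pval), length(pool_all)), 3)
```

Because both filters act on the same ranking, applying them in sequence keeps the smaller of the two candidate sets. The number of sets to test grows as choose(pool size, w.order), which is why a small pool is a sensible starting point when exploring data.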
Maggie Haitian Wang, Rui Sun, Junfeng Guo, Haoyi Weng, Jack Lee, Inchi Hu, Pak Sham and Benny C.Y. Zee (2016). A fast and powerful W-test for pairwise epistasis testing. Nucleic Acids Research. doi:10.1093/nar/gkw347.
# NOT RUN {
data(mydata)
data(phenotype1)
## Step 1. HF Calculation.
# Please note that parameter B is recommended to be greater than 400 for w.order = 1 or 2.
# For high order interaction analysis (w.order > 2), it is recommended to use default n.sample.
hf1 <- hf(data = mydata, w.order = 1, B = 100)
hf.high <- hf(data = mydata, w.order = 3, B = 30, n.marker = 10)
## Step 2. Application
w1 <- wtest.high(mydata, phenotype1, w.order = 1, hf1 = hf1)
w3 <- wtest.high(mydata, phenotype1, w.order = 3, input.pval = 0.3,
input.poolsize = 50, output.pval = 0.5, hf1 = hf1, hf.high.order = hf.high)
w.pair <- wtest.high(mydata, phenotype1, w.order = 3, which.pair = c(10,13,20),
hf.high.order = hf.high)
# }