wtest.high: W-test for high order interaction analysis

Description

This function performs the W-test to detect high-order interactions in case-control studies with categorical data. The test measures the distributional difference of the target variables between cases and controls through a combined log odds ratio, and the test statistic follows a chi-squared distribution with data-adaptive degrees of freedom. For interaction calculation, the user has three options: (1) calculate the W-value of a single variable set (which.pair); (2) calculate pairwise or high-order interactions among the variables whose main-effect p-values are smaller than a threshold (input.pval); (3) calculate pairwise or high-order interactions exhaustively for all variables. For both main-effect and interaction calculation, the output can be filtered by p-value, so that only sets with p-values smaller than a threshold (output.pval) are returned.

Usage

wtest.high(data, y, w.order = 3, hf1 = "default.hf1",
  hf.high.order = "default.high", which.pair = NULL, output.pval = NULL,
  sort = TRUE, input.pval = 0.1, input.poolsize = 10)

Arguments

data

a data frame or matrix containing genotypes in the columns. Genotypes should be coded as (0, 1, 2) or (0, 1).

y

the case-control status: a numeric vector of 0s and 1s, or a factor variable with two levels.

w.order

an integer value indicating the order of the interaction to test. For example, w.order = 3 for three-way interaction analysis; w.order = 1 calculates main effects only.

hf1

a data frame or matrix containing the h and f values for main-effect calculation (w.order = 1) when the number of genotype categories (k) is 2 or 3. The first row holds h and f for k = 2, and the second row holds h and f for k = 3. The default hf1 uses the theoretical values h = (k-1)/k and f = k-1.

hf.high.order

a data frame or matrix containing the h and f values for high-order interaction calculation (w.order > 1). The default hf.high.order uses h = (k-1)/k and f = k-1, where k is the number of genotype combinations observed for a set of SNPs.

which.pair

a numeric vector of length w.order containing the column indices of a single variable set. If which.pair is specified, only the W-value for that set is returned. Default which.pair = NULL, in which case main or interaction effects are calculated exhaustively.

output.pval

a p-value threshold for filtering the output. If NULL, all results are listed; otherwise, only results with p-values smaller than output.pval are returned.

sort

a logical value indicating whether or not to sort the output by p-values in ascending order. Default = TRUE.

input.pval

a p-value threshold for selecting markers for high-order interaction calculation, used only when w.order > 1. Only markers with main-effect p-values smaller than input.pval are passed to the interaction calculation. Default = 0.10. Set input.pval = NULL or 1 for exhaustive interaction calculation.

input.poolsize

an integer smaller than the number of input variables. It is an optional filter that caps the number of variables entering the high-order interaction calculation, used only when w.order > 2. It keeps the input.poolsize variables with the smallest main-effect p-values. It can be used alone or together with input.pval, in which case whichever filter gives the smaller variable pool is applied. Default = 10. Set input.poolsize = NULL for exhaustive interaction calculation. This filter is useful when exploring data in which a large number of variables have extremely small main-effect p-values.
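The interplay of input.pval and input.poolsize can be sketched as follows. select_pool() is a hypothetical helper, not a wtest function, and it assumes the main-effect p-values are already available as a numeric vector indexed by column. Because both filters keep a prefix of the variables ranked by main-effect p-value, applying them jointly reduces to keeping the shorter prefix.

```r
# Hypothetical helper (not part of wtest): returns the column indices passed to
# the interaction step, given the main-effect p-value of every column.
select_pool <- function(main.p, input.pval = 0.1, input.poolsize = 10) {
  ranked <- order(main.p)  # column indices, smallest main-effect p-value first
  # Pool size allowed by the p-value threshold (NULL means no threshold)
  n_pval <- if (is.null(input.pval)) length(main.p) else sum(main.p < input.pval, na.rm = TRUE)
  # Pool size allowed by the size cap (NULL means no cap)
  n_size <- if (is.null(input.poolsize)) length(main.p) else min(input.poolsize, length(main.p))
  # Both filters keep a prefix of 'ranked', so the joint filter is the shorter prefix
  ranked[seq_len(min(n_pval, n_size))]
}

p <- c(0.04, 0.5, 0.001, 0.2, 0.03)
select_pool(p, input.pval = 0.1, input.poolsize = 2)     # columns 3 and 5
select_pool(p, input.pval = 0.1, input.poolsize = NULL)  # columns 3, 5 and 1
select_pool(p, input.pval = NULL, input.poolsize = NULL) # all five columns
```

Setting input.pval = 1 has the same effect as NULL here, since every p-value is below 1 unless it is exactly 1.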

Value

A "wtest" object containing:

order

the "w.order" specified.

results

When order > 1 and which.pair = NULL, the test results include, for each variable set: [set name, W-value, k, p-value] for the set itself, followed by [W-value, k, p-value] for each of its variables in turn (first variable, second variable, ...).

hf1

The h and f values used in main effect calculation.

hf2

The h and f values used in high-order interaction calculation.
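The effect of output.pval and sort on a results table can be sketched as below. The table is toy input, its column names (set, p.value) are assumptions rather than the exact layout of the results component, and filter_results() is a hypothetical helper that only mirrors the documented behaviour.

```r
# Toy results table; the column names are assumptions, not the exact layout
# returned by wtest.high()
res <- data.frame(set = c("SNP1:SNP2:SNP3", "SNP1:SNP4:SNP5", "SNP2:SNP4:SNP6"),
                  p.value = c(0.09, 0.90, 0.004),
                  stringsAsFactors = FALSE)

# Hypothetical helper mirroring the documented behaviour of output.pval and sort
filter_results <- function(res, output.pval = NULL, sort = TRUE) {
  if (!is.null(output.pval)) {
    res <- res[res$p.value < output.pval, , drop = FALSE]  # keep p < output.pval
  }
  if (sort) {
    res <- res[order(res$p.value), , drop = FALSE]          # ascending p-values
  }
  res
}

filter_results(res, output.pval = 0.1)  # SNP2:SNP4:SNP6 first, then SNP1:SNP2:SNP3
```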

Details

W-test is a model-free statistical test originally proposed to measure main effects or pairwise interactions in case-control studies with categorical variables; the wtest.high() function extends it to high-order interaction detection. Theoretically, the test statistic follows a chi-squared distribution with f degrees of freedom. The data-adaptive degrees of freedom f and the scaling factor h in the test statistic allow the W-test to correct for distributional bias due to sparse data and small sample size. Let k be the number of columns of the 2 by k contingency table formed by a single variable or a variable set. When the sample size is large and there is no population stratification, h and f closely approximate their theoretical values h = (k-1)/k and f = k-1. When the sample size is small or there is population stratification, h and f deviate from these values to correct for the distributional bias caused by the data structure.
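To make the roles of k, h and f concrete, the sketch below builds the 2 by k table for a variable set and computes a W-type statistic with its chi-squared p-value. It is a hedged illustration, not the package's implementation: the helpers case_control_table() and w_stat() are hypothetical, and the statistic is assumed to be the h-scaled sum of squared standardized per-column log odds ratios. The exact definition, including how empty cells are handled, is given in the reference cited on this page.

```r
# Hypothetical helpers, NOT part of the wtest package. They ASSUME the statistic
# has the form W = h * sum_i (log OR_i / SE_i)^2, where OR_i compares the odds of
# falling in column i (versus all other columns) between cases and controls.

# Build the 2 x k table: rows = y (0 = control, 1 = case), columns = observed
# genotype combinations of the selected columns
case_control_table <- function(data, y, cols) {
  combo <- do.call(paste, c(as.data.frame(data)[, cols, drop = FALSE], sep = ":"))
  table(y, combo)
}

w_stat <- function(tab, h, f) {
  ctrl <- tab[1, ]  # control counts per column
  case <- tab[2, ]  # case counts per column
  n0 <- sum(ctrl)
  n1 <- sum(case)
  # Per-column log odds ratio and its Woolf standard error (assumes no empty cells)
  log_or <- log(case / (n1 - case)) - log(ctrl / (n0 - ctrl))
  se <- sqrt(1 / case + 1 / (n1 - case) + 1 / ctrl + 1 / (n0 - ctrl))
  w <- h * sum((log_or / se)^2)
  c(W = w, k = ncol(tab), p.value = pchisq(w, df = f, lower.tail = FALSE))
}

# Toy data: three SNPs coded (0, 1, 2) and a random 0/1 status
set.seed(1)
geno <- matrix(sample(0:2, 500 * 3, replace = TRUE), ncol = 3)
status <- rbinom(500, 1, 0.5)

tab <- case_control_table(geno, status, cols = c(1, 2))
k <- ncol(tab)  # at most 3^2 = 9 combinations for two SNPs
# Plug in the theoretical h and f; the package instead estimates them from the data
w_stat(tab, h = (k - 1) / k, f = k - 1)
```

The choice h = (k-1)/k follows from matching means: each of the k squared standardized log odds ratios has expectation close to 1 under the null, so scaling their sum by (k-1)/k gives the mean k-1 of a chi-squared variable with f = k-1.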

When w.order > 1, wtest.high() automatically calculates the main effects first and pre-filters the variables before calculating interactions. This filtering avoids overloading memory before the structure of the data is better understood. Users can specify a smaller input.pval, such as 0.05 or 0.001, to obtain less output, or input.pval = 1 or NULL for exhaustive calculation. Another optional filter is input.poolsize, which takes the input.poolsize variables with the smallest main-effect p-values and calculates interaction effects among them exhaustively; when it is used together with input.pval, the smaller of the two variable pools is passed to the interaction calculation.
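The memory concern behind the pre-filter is combinatorial: with n candidate variables, the interaction step has to test choose(n, w.order) variable sets. The sketch below uses only base R's choose() and combn(); n_sets() is a small illustrative wrapper and the pool vector is toy input.

```r
# Number of w.order-way variable sets among n candidate variables
n_sets <- function(n, w.order) choose(n, w.order)

n_sets(10, 3)    # 120 three-way sets from a pool of 10 variables
n_sets(1000, 3)  # 166167000 three-way sets for 1,000 variables without a pre-filter

# Enumerate the sets explicitly: one column per three-way set
pool <- c(3, 5, 1, 8)  # toy pool of column indices
sets <- combn(pool, 3)
dim(sets)        # 3 rows, 4 columns (choose(4, 3) = 4 sets)
```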

References

Maggie Haitian Wang, Rui Sun, Junfeng Guo, Haoyi Weng, Jack Lee, Inchi Hu, Pak Sham and Benny C.Y. Zee (2016). A fast and powerful W-test for pairwise epistasis testing. Nucleic Acids Research. doi:10.1093/nar/gkw347.

Examples


# NOT RUN {
data(mydata)
data(phenotype1)

## Step 1. HF Calculation.
# Parameter B is recommended to be greater than 400 for w.order = 1 or 2;
# the smaller values below keep the example fast.
# For high-order interaction analysis (w.order > 2), the default n.sample is recommended.
hf1 <- hf(data = mydata, w.order = 1, B = 100)
hf.high <- hf(data = mydata, w.order = 3, B = 30, n.marker = 10)

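## Step 2. Application
# Main effects only
w1 <- wtest.high(mydata, phenotype1, w.order = 1, hf1 = hf1)
# Three-way interactions among markers that pass the input filters;
# only sets with p-value < 0.5 are returned
w3 <- wtest.high(mydata, phenotype1, w.order = 3, input.pval = 0.3,
            input.poolsize = 50, output.pval = 0.5, hf1 = hf1, hf.high.order = hf.high)
# W-test for a single three-way set: columns 10, 13 and 20 of mydata
w.pair <- wtest.high(mydata, phenotype1, w.order = 3, which.pair = c(10, 13, 20),
            hf.high.order = hf.high)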
# }
