wtest: W-test

Description

This function performs the W-test to calculate main effects or pairwise interactions in case-control studies with categorical data. The test measures the distributional difference of target variables between cases and controls via a combined log odds ratio. The test statistic follows a Chi-squared distribution with data-adaptive degrees of freedom. For pairwise interaction calculation, the user has three options: (1) calculate the W-value of a single pair; (2) calculate pairwise interactions among the variables whose main effect p-values are smaller than a threshold (input.pval); (3) calculate pairwise interactions exhaustively for all variables. For both main effect and interaction calculation, the output can be filtered by p-value, so that only sets with a p-value smaller than a threshold (output.pval) are returned. An extension of the W-test for rare variant analysis is available in the zfa package.

Usage

wtest(data, y, w.order = c(1, 2), hf1 = "default.hf1",
  hf2 = "default.hf2", which.marker = NULL, output.pval = NULL,
  sort = TRUE, input.pval = 0.1, input.poolsize = 150)

Arguments

data

a data frame or matrix containing genotypes in the columns. Genotypes should be coded as (0, 1, 2) or (0, 1).

y

a numeric vector of 0 or 1.

w.order

an integer value of 1 or 2. w.order = 1 for main effect calculation; w.order = 2 for pairwise calculation.

hf1

h and f values to calculate main effect, organized as a matrix, with columns (k, h, f), k = 2 to 3. Needed when w.order = 1.

hf2

h and f values to calculate interaction associations, organized as a matrix, with columns (k, h, f), k = 2 to 9. Needed when w.order = 2.

which.marker

a numeric vector. When w.order = 1, a single value giving the column index of the SNP to calculate; when w.order = 2, a vector of two values giving the column indices of the SNP pair to calculate. The default, which.marker = NULL, means that main or interaction effects are calculated exhaustively.

output.pval

a p-value threshold for filtering the output. If NULL, all results are listed; otherwise, the function only outputs results with p-values smaller than output.pval.

sort

a logical value indicating whether or not to sort the output by p-values in ascending order. Default = TRUE.

input.pval

a p-value threshold to select markers for pairwise calculation, used only when w.order = 2. When specified, only markers with main effect p-value smaller than input.pval will be passed to interaction effect calculation. Default = 0.10. Set input.pval = NULL or 1 for exhaustive pairwise calculation.

input.poolsize

an integer smaller than the number of input variables. It is an optional filter that controls the maximum number of variables included in pairwise calculation, used only when w.order = 2. When specified, the function selects the top input.poolsize variables, ranked by main effect p-value, for pairwise interaction calculation. It can be used alone or jointly with input.pval, in which case whichever gives the smaller input variable pool applies. Default = 150. Set input.poolsize = NULL for exhaustive pairwise calculation. This filter is useful for data exploration when a large number of variables have extremely small main effect p-values.
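The effect of output.pval and sort can be pictured with a small mock results table. This is only a sketch in base R: the data frame, its column names and its values are hypothetical and do not come from the package.

```r
# Mock results table; column names and values are hypothetical.
res <- data.frame(marker = c("snp1", "snp2", "snp3", "snp4"),
                  W = c(2.1, 15.3, 9.8, 0.4),
                  k = c(3, 3, 2, 3),
                  p.value = c(0.35, 0.0005, 0.0017, 0.82),
                  stringsAsFactors = FALSE)

output.pval <- 0.01
kept <- res[res$p.value < output.pval, ]   # keep rows below the threshold
kept <- kept[order(kept$p.value), ]        # sort = TRUE: ascending p-values
kept$marker                                # "snp2" "snp3"
```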

Value

An object "wtest" containing:

order

the "w.order" specified.

results

When w.order = 1, the test results include: the ID of the SNP, the W-value, k, and the p-value. When w.order = 2 and which.marker = NULL, the test results include: (information on the pair, columns 1-5) [SNP1 name, SNP2 name, W-value, k, p-value]; (information on the first variable in the pair, columns 6-8) [W-value, k, p-value]; (information on the second variable in the pair, columns 9-11) [W-value, k, p-value].

hf1

The h and f values used in main effect calculation.

hf2

The h and f values used in pairwise interaction calculation.

Details

The W-test is a model-free statistical test to measure main effects or pairwise interactions in case-control studies with categorical variables. Theoretically, the test statistic follows a Chi-squared distribution with f degrees of freedom. The data-adaptive degrees of freedom f and a scalar h in the test statistic allow the W-test to correct for distributional bias due to sparse data and small sample size. Let k be the number of columns of the 2 by k contingency table formed by a single variable or a variable pair. When the sample size is large and there is no population stratification, h and f approximate the theoretical values h = (k-1)/k and f = k-1 well. When the sample size is small or there is population stratification, h and f vary to correct for the distributional bias caused by the data structure.
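As a rough illustration of these quantities, the base R sketch below builds the 2 by k table for one variable and for a variable pair, then plugs in the large-sample values h = (k-1)/k and f = k-1. The form of the statistic (a scaled sum of squared, standardized log odds ratios) is an assumption based on the description in Wang et al. (2016), not the package's code. The helper name w_sketch, the simulated data and the 0.5 cell correction are hypothetical.

```r
# Illustrative sketch only: not the wtest package implementation.
# Assumes W = h * sum_i (log(OR_i) / SE_i)^2 with the large-sample h and f.
w_sketch <- function(x, y) {
  tab <- table(factor(y, levels = c(0, 1)), x)  # 2 by k table: controls, cases
  tab <- tab[, colSums(tab) > 0, drop = FALSE]  # keep observed categories only
  k <- ncol(tab)
  n0 <- tab[1, ] + 0.5                          # hypothetical 0.5 correction
  n1 <- tab[2, ] + 0.5                          # guards against empty cells
  N0 <- sum(n0)
  N1 <- sum(n1)
  # Odds ratio of each category against all other categories combined
  log_or <- log((n1 / (N1 - n1)) / (n0 / (N0 - n0)))
  se <- sqrt(1 / n1 + 1 / (N1 - n1) + 1 / n0 + 1 / (N0 - n0))
  h <- (k - 1) / k                              # theoretical scalar
  f <- k - 1                                    # theoretical degrees of freedom
  w <- h * sum((log_or / se)^2)
  c(W = w, k = k, p.value = pchisq(w, df = f, lower.tail = FALSE))
}

set.seed(1)
y <- rbinom(500, 1, 0.5)       # simulated case-control labels
snp1 <- rbinom(500, 2, 0.3)    # simulated genotypes coded 0, 1, 2
snp2 <- rbinom(500, 2, 0.4)

main <- w_sketch(snp1, y)                     # single variable: k at most 3
pair <- w_sketch(interaction(snp1, snp2), y)  # variable pair: k at most 9
```

In the package itself, h and f are estimated from the data with hf() rather than fixed at their theoretical values, so results from this sketch will generally differ from those of wtest().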

When w.order = 2, wtest() automatically calculates the main effects first and then applies a pre-filter before calculating interactions. This filtering avoids overloading memory before the user has a better understanding of the data. The user can specify a smaller input.pval, such as 0.05 or 0.001, for less output, or input.pval = 1 or NULL for exhaustive pairwise calculation. Another optional filter is input.poolsize. It takes the top input.poolsize variables, selected by smallest main effect p-value, and calculates their pairwise effects exhaustively; when used together with input.pval, the smaller set is passed to pairwise calculation.
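The pre-filter described above can be sketched in base R as follows. This is a minimal illustration of the selection logic under the stated rules, not the package's internal code; the helper select_pool and the simulated p-values in main_p are hypothetical.

```r
# Illustrative sketch of the pre-filter logic; not the package's internal code.
# `main_p` stands in for main effect p-values (hypothetical, simulated below).
select_pool <- function(main_p, input.pval = 0.1, input.poolsize = 150) {
  idx <- order(main_p)                      # smallest p-values first
  if (!is.null(input.pval)) {
    idx <- idx[main_p[idx] < input.pval]    # p-value threshold
  }
  if (!is.null(input.poolsize)) {
    idx <- head(idx, input.poolsize)        # cap on the pool size
  }
  idx                                       # column indices passed on to pairing
}

set.seed(2)
main_p <- runif(1000)                       # 1000 hypothetical markers
pool <- select_pool(main_p, input.pval = 0.3, input.poolsize = 50)

length(pool)              # 50: the pool size cap is the tighter filter here
choose(length(pool), 2)   # 1225 pairs instead of choose(1000, 2) = 499500
```

Applying both filters in sequence yields the smaller of the two candidate sets, which matches the "whichever gives the smaller pool" rule.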

References

Maggie Haitian Wang, Rui Sun, Junfeng Guo, Haoyi Weng, Jack Lee, Inchi Hu, Pak Sham and Benny C.Y. Zee (2016). A fast and powerful W-test for pairwise epistasis testing. Nucleic Acids Research. doi:10.1093/nar/gkw347.

Maggie Haitian Wang, Haoyi Weng, Rui Sun, Jack Lee, William K.K. Wu, Ka Chun Chong, Benny C.Y. Zee. (2017). A Zoom-Focus algorithm (ZFA) to locate the optimal testing region for rare variant association tests. Bioinformatics, 33(15), 2330-2336.

Examples

# NOT RUN {
data(diabetes.geno)
data(phenotype1)

## Step 1. HF Calculation
# Please note that parameter B is recommended to be greater than 400.
hf1 <- hf(data = diabetes.geno, w.order = 1, B = 100)
hf2 <- hf(data = diabetes.geno, w.order = 2, B = 50)

## Step 2. W-test Calculation
w1 <- wtest(diabetes.geno, phenotype1, w.order = 1, hf1 = hf1)
w2 <- wtest(diabetes.geno, phenotype1, w.order = 2, input.pval = 0.3,
            input.poolsize = 50, output.pval = 0.01, hf1 = hf1, hf2 = hf2)
w.pair <- wtest(diabetes.geno, phenotype1, w.order = 2, which.marker = c(10,13), hf2 = hf2)
# }
