minPtest(y, x, SNPtoGene, formula = NULL, cov = NULL, matchset = NULL,
         permutation = 1000, seed = NULL, subset = NULL, parallel = FALSE,
         ccparallel = FALSE, trace = FALSE, aggregation.fun = min,
         adj.method = c("bonferroni", "holm", "hochberg", "hommel", "BH",
                        "BY", "fdr", "none"), ...)
y
Response vector of length n.

x
An n * p matrix of covariates (i.e. SNPs) containing the genotypes coded as 0, 1 and 2. Each column is assumed to represent one SNP, with corresponding column names. Details on SNP coding are given below.

SNPtoGene
A p x 2 matrix comprising the SNP names (first column), which must be the same as the column names of x, and the names of the genes (second column) on which the SNPs are located.

formula
Model formula for a fit with glm or clogistic; the latter requires package Epi. Otherwise the default method, the Cochran Armitage Trend Test, which requires package scrime, is fitted. Details of model specification are given below.

cov
An n * q matrix containing the covariates for adjustment, with corresponding column names.

matchset
A vector of length n containing matching numbers, needed for conditional logistic regression.

seed
A vector of length permutation. Allows reproducibility even when running in parallel and for different numbers of parallel processes.

ccparallel
Parallelization on a compute cluster: sfInit() should be called before calling minPtest. See Details.

aggregation.fun
By default, the minimum ("min") is applied to obtain candidate gene region-level summaries. Any other function that integrates the p-values into a single test statistic can be used, e.g. median or a user-defined function.

adj.method
Correction method for multiple testing. By default, "bonferroni" is used. Any other correction method as in p.adjust can be used.

...
Further arguments passed to aggregation.fun. If NA/NaN values occur within the marginal trend p-values for the SNPs from the original data (psnp) or within the permuted trend p-values from the permutation samples (psnpperm), na.rm=TRUE has to be specified.
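To illustrate the two base R building blocks named in these arguments, the following minimal sketch uses made-up p-values (an assumption for illustration, not output of minPtest). It shows why na.rm=TRUE matters for an aggregation function such as min, and how p.adjust applies a correction method.

```r
# Hypothetical marginal p-values for three SNPs on one gene; one is missing.
p_snp <- c(SNP1 = 0.03, SNP2 = NA, SNP3 = 0.20)

# Without na.rm, a single NA propagates to the gene-level summary.
agg_default <- min(p_snp)
# With na.rm = TRUE, the minimum is taken over the available p-values.
agg_narm <- min(p_snp, na.rm = TRUE)

# Hypothetical gene-level p-values, corrected with two of the methods
# accepted by p.adjust().
p_gene <- c(GeneA = 0.01, GeneB = 0.04, GeneC = 0.03)
p_bonf <- p.adjust(p_gene, method = "bonferroni")  # p times 3, capped at 1
p_holm <- p.adjust(p_gene, method = "holm")
print(rbind(raw = p_gene, bonferroni = p_bonf, holm = p_holm))
```

Holm-adjusted p-values are never larger than the Bonferroni-adjusted ones, which is one reason to consider a method other than the default.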
An nrgene * 1 matrix of permutation-based p-values of the min P test for each candidate gene.
An nrgene * 1 matrix of corrected permutation-based p-values for each candidate gene.
An nrsnp * 1 matrix of marginal trend p-values for each SNP from the original data set.
An nrsnp * 1 matrix of corrected marginal trend p-values for each SNP from the original data set.
An nrsnp * n.permute matrix of permuted trend p-values for each SNP in each permutation step.
An nrgene * 1 matrix of min P test statistics for each candidate gene from the original data set.
An nrgene * n.permute matrix of permuted min P test statistics for each candidate gene in each permutation step.
A p x 2 matrix comprising the SNP names (first column) and the names of the genes (second column) on which the SNPs are located.

Computation of the min P test is based on the marginal trend p-values for a set of univariate SNP-disease associations and on the trend p-values for the permutation samples for each SNP. The minPtest package brings together three different kinds of tests for computing such p-values, which are otherwise scattered over several R packages, and automatically selects the one most appropriate for the design at hand. In any case, a response vector y, a SNP matrix x and a mapping matrix SNPtoGene are required. By default, a Cochran Armitage Trend Test (Cochran, 1954; Armitage, 1955) is then fitted automatically to compute the p-values. The Cochran Armitage Trend Test does not depend on covariates or on the matching scenario.

If a formula (see also glm from package stats) is added, optionally with a covariate matrix cov, an unconditional logistic regression is fitted. Unconditional logistic regression can be used without or with covariates for adjustment, i.e. with either formula=y~1 or formula=y~cov1+cov2+.... The former needs no information on covariates or on the matching scenario. The latter is the general choice for frequency matching, with the matching variables included for adjustment and specified in the covariate matrix cov.

If a matchset, as in 1:1, 1:2 etc. matching, and a formula (see also clogistic from package Epi) are provided, a conditional logistic regression is fitted. Conditional logistic regression can likewise be used without or with covariates for adjustment, i.e. with either formula=y~1 or formula=y~cov1+cov2+.... In the latter case, covariates other than the matching variables can be used and have to be specified in the covariate matrix cov.

In general, there are two ways to specify the formula. First, if no covariates are used for adjustment, the formula has to be written as y~1, without specifying the covariate matrix cov. Second, if covariates other than SNPs are used for adjustment, the formula has to be written with the response vector y on the left of the ~ operator and the clinical covariates on the right, and a covariate matrix cov has to be specified.
If SNP genotypes are coded as 0, 1 and 2, they are included as continuous variables in the logistic regression models. If SNPs are coded as carrier SNPs (0 and 1), they are included as binary variables in the logistic regression models. If covariates are used for adjustment, the column names of the covariate matrix cov have to match the names used in the formula, so that the formula can be linked to the covariate matrix cov.
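The permutation idea behind the min P test can be sketched in base R. This is a minimal sketch under stated assumptions, not the package's implementation: the toy data, the helper names snp_pvalues and min_p, the use of a plain y ~ SNP logistic regression for the marginal p-values, and the definition of the permutation p-value as the share of permuted statistics at or below the observed one are all choices made here for illustration.

```r
set.seed(42)

# Simulated toy data (all names and sizes here are assumptions for illustration).
n <- 60
x <- matrix(rbinom(n * 4, size = 2, prob = 0.3), nrow = n,
            dimnames = list(NULL, paste0("SNP", 1:4)))
y <- rbinom(n, size = 1, prob = 0.5)
SNPtoGene <- cbind(colnames(x), c("G1", "G1", "G2", "G2"))

# Marginal p-value per SNP from an unconditional logistic regression y ~ SNP.
snp_pvalues <- function(y, x) {
  apply(x, 2, function(snp) {
    fit <- glm(y ~ snp, family = binomial)
    coef(summary(fit))["snp", "Pr(>|z|)"]
  })
}

# Gene-level statistic: minimum p-value over the SNPs mapped to each gene.
gene_of_snp <- SNPtoGene[match(colnames(x), SNPtoGene[, 1]), 2]
min_p <- function(p) sapply(split(p, gene_of_snp), min)

minp_obs <- min_p(snp_pvalues(y, x))

# Null distribution: recompute the statistic after permuting the response.
B <- 200
minp_perm <- replicate(B, min_p(snp_pvalues(sample(y), x)))  # genes x B matrix

# Permutation p-value: share of permuted statistics at or below the observed one.
p_gene <- rowMeans(minp_perm <= minp_obs)
print(p_gene)
```

Permuting y while leaving x untouched keeps the correlation between SNPs on the same gene intact, which is why the minimum over correlated SNP p-values can be calibrated this way.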
Missing SNP genotypes in x or, if used, missing values in cov are accounted for: each marginal test uses only the data available for that SNP in x and for the covariates in cov. minPtest uses all subjects with available data for each SNP (and covariates) when fitting the Cochran Armitage Trend Test or unconditional logistic regression. Note that in conditional logistic regression with 1:1 matching, matched subjects are removed together. In the 1:2 matching scenario, the whole matched set is removed when the missing value occurs in the case; when it occurs in one control, only that control is removed.
Concerning parallelization on a compute cluster, i.e. with argument ccparallel=TRUE, there are two possibilities to run minPtest:
sfInit() should be called before calling minPtest.
ccparallel has to be set to TRUE, and the number of CPUs can be chosen in the sfInit() function.
sfCluster is a Unix tool for convenient management of R parallel processes. It is available at www.imbi.uni-freiburg.de/parallel, with detailed information.
A print function returns a short overview of the results: the number of subjects included in the analysis, the method used by the package, the number of genes, the number of SNPs, the number of missing values in the SNP matrix x, and the number of permutations used for the fit. A summary.minPtest and a plot.minPtest function are also available.
Chen, B.E. et al. (2006). Resampling-based multiple hypothesis testing procedures for genetic case-control association studies. Genetic Epidemiology, 30, 495-507.
Cochran, W.G. (1954). Some methods for strengthening the common chi-squared tests. Biometrics, 10(4), 417-451.
Knaus, J. et al. (2009). Easier parallel computing in R with snowfall and sfCluster. The R Journal, 1, 54-59.
Westfall, P.H. et al. (2002). Multiple tests for genetic effects in association studies. Methods Mol Biol, 184, 143-168.
Westfall, P.H. and Young, S.S. (1993). Resampling-Based Multiple Testing: Examples and Methods for p-Value Adjustment. Wiley, New York.
summary.minPtest, plot.minPtest
library(minPtest)

# Generate a simulated data set as in the example of the function generateSNPs,
# consisting of 100 subjects and 200 SNPs on 5 genes.
SNP <- c(6, 26, 54, 135, 156, 186)
BETA <- c(0.9, 0.7, 1.5, 0.5, 0.6, 0.8)
SNPtoBETA <- matrix(c(SNP, BETA), ncol = 2, nrow = 6)
colnames(SNPtoBETA) <- c("SNP.item", "SNP.beta")
set.seed(191)
sim1 <- generateSNPs(n = 100, gene.no = 5, block.no = 4, block.size = 10,
                     p.same = 0.9, p.different = 0.75,
                     p.minor = c(0.1, 0.4, 0.1, 0.4), n.sample = 80,
                     SNPtoBETA = SNPtoBETA)

# Cochran Armitage Trend Test without covariates and with the default
# number of permutations. Example: run R sequentially.

# One seed per permutation (default: 1000 permutations)
set.seed(10)
seed1 <- sample(1:1e7, size = 1000)

minPtest.object <- minPtest(y = sim1$y, x = sim1$x, SNPtoGene = sim1$SNPtoGene,
                            seed = seed1)