skatOMeta: Combine SKAT-O analyses from one or more studies.

Description

Takes as input `seqMeta` objects (e.g. from prepScores) and meta-analyzes them using SKAT-O. See the package vignette for more extensive documentation.

Usage

skatOMeta(..., SNPInfo = NULL,
  skat.wts = function(maf) { stats::dbeta(maf, 1, 25) },
  burden.wts = function(maf) { as.numeric(maf <= 0.01) },
  rho = c(0, 1), method = c("integration", "saddlepoint", "liu"),
  snpNames = "Name", aggregateBy = "gene", mafRange = c(0, 0.5),
  verbose = FALSE)

Arguments

...

seqMeta objects

SNPInfo

The SNP Info file. This should contain the fields listed in snpNames and aggregateBy. Only SNPs in this table will be meta-analyzed, so this may be used to restrict the analysis.

skat.wts

Either a function to calculate testing weights for SKAT, or a character string naming a column of weights in the SNPInfo file. For skatOMeta the default is the 'beta' weights.

burden.wts

Either a function to calculate weights for the burden test, or a character string naming a column of weights in the SNPInfo file. For skatOMeta the default is the T1 weights.

rho

A sequence of values that specify combinations of SKAT and a burden test to be considered. Default is c(0,1), which considers SKAT and a burden test.

method

p-value calculation method. Should be one of 'saddlepoint', 'integration', or 'liu'.

snpNames

The field of SNPInfo where the SNP identifiers are found. Default is 'Name'.

aggregateBy

The field of SNPInfo on which the SKAT results were aggregated. Default is 'gene'. Though gene groupings are not explicitly required for single SNP analysis, this field is required to find where single SNP information is stored in the seqMeta objects.

mafRange

Range of MAFs to include in the analysis (endpoints included). Default is all SNPs, i.e. c(0, 0.5).

verbose

logical. Whether progress bars should be printed.
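
The default weight functions and the mafRange filter can be illustrated with a small base R sketch. The MAF values below are made up for illustration, and the variable names (maf, skat_w, burden_w, in_range, rho_grid) are not part of the package; the two weight expressions are copied from the Usage section.

```r
# Hypothetical minor allele frequencies for five SNPs
maf <- c(0.001, 0.005, 0.01, 0.05, 0.30)

# Default skat.wts: Beta(1, 25) density, which up-weights rarer variants
skat_w <- stats::dbeta(maf, 1, 25)

# Default burden.wts (T1): indicator that the MAF is at most 1%
burden_w <- as.numeric(maf <= 0.01)

# mafRange = c(0, 0.5) keeps SNPs whose MAF lies in the closed interval
maf_range <- c(0, 0.5)
in_range <- maf >= maf_range[1] & maf <= maf_range[2]

# A finer grid for the rho argument than the default c(0, 1)
rho_grid <- seq(0, 1, length.out = 11)

print(data.frame(maf, skat_w, burden_w, in_range))
```

The beta weights decrease smoothly with MAF, while the T1 weights drop to zero above 1%, so the two tests can emphasize different variants in the same gene.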

Value

a data frame with the following columns:

Details

skatOMeta() implements the SKAT-Optimal test, which picks the 'best' combination of SKAT and a burden test, and then corrects for the flexibility afforded by this choice. Specifically, if the SKAT statistic is Q1, and the squared score for a burden test is Q2, SKAT-O considers tests of the form (1-rho)*Q1 + rho*Q2, where rho is between 0 and 1. The values of rho are specified by the user through the argument rho. In its simplest form, which is the default, SKAT-O computes a SKAT test and a T1 test, and reports the minimum p-value, corrected for multiple testing. See the vignette or the accompanying references for more details.

If there is a single variant in the gene, or the burden test is undefined (e.g. there are no rare alleles for the T1 test), SKAT is reported (i.e. rho=0).

Note 1: The SKAT package uses the same weights for both SKAT and the burden test, which this function does not.

Note 2: All studies must use coordinated SNP Info files; that is, the SNP names and gene definitions must be the same.

Note 3: The method of p-value calculation is much more important here than in SKAT. The 'integration' method is fast and typically accurate for p-values larger than 1e-9. The 'saddlepoint' method is slower, but has higher relative accuracy.

Note 4: Since p-value calculation can be slow for SKAT-O, and less accurate for small p-values, a reasonable alternative is to first calculate SKAT and a burden test, and record the minimum p-value, which is a lower bound for the SKAT-O p-value. This can be done quickly and accurately. SKAT-O then only needs to be performed on the small subset of genes that are potentially interesting. Please see the package vignette for more details.
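
The family of statistics and the screening strategy from Note 4 can be sketched in base R as follows. All numbers are hypothetical, and the object names (Q1, Q2, screen, alpha, candidates, toy_info) are made up for illustration. In practice the p-values would come from skatMeta() and burdenMeta(), and the restricted table would be passed to skatOMeta() as SNPInfo.

```r
# Toy statistics for one gene (hypothetical values)
Q1 <- 12.4                            # SKAT statistic
Q2 <- 3.1                             # squared burden score
rho <- seq(0, 1, length.out = 11)
Q_rho <- (1 - rho) * Q1 + rho * Q2    # one candidate statistic per rho
# Each Q_rho has its own null distribution, so these values are not
# directly comparable; SKAT-O compares their p-values instead.

# Screening: hypothetical per-gene p-values from SKAT and a T1 burden test
screen <- data.frame(
  gene   = c("g1", "g2", "g3", "g4"),
  p_skat = c(0.40, 1e-6, 0.03, NA),   # NA: test undefined for this gene
  p_t1   = c(0.25, 0.002, 1e-5, 0.70)
)

# The minimum of the two p-values is a lower bound for the SKAT-O p-value
screen$p_min <- pmin(screen$p_skat, screen$p_t1, na.rm = TRUE)

# Genes whose lower bound already exceeds the threshold cannot reach it
alpha <- 1e-4                         # illustrative threshold
candidates <- screen$gene[screen$p_min < alpha]

# Restrict a toy SNP info table to the candidate genes
toy_info <- data.frame(
  Name = paste0("snp", 1:6),
  gene = c("g1", "g2", "g2", "g3", "g4", "g4")
)
toy_info_sub <- toy_info[toy_info$gene %in% candidates, ]
print(toy_info_sub)
```

Because only SNPs present in SNPInfo are meta-analyzed, passing the restricted table limits the slower SKAT-O computation to the genes that survived the screen.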

References

Wu, M.C., Lee, S., Cai, T., Li, Y., Boehnke, M., and Lin, X. (2011) Rare Variant Association Testing for Sequencing Data Using the Sequence Kernel Association Test (SKAT). American Journal of Human Genetics.

Lee, S., Wu, M.C., and Lin, X. (2012) Optimal tests for rare variant effects in sequencing association studies. Biostatistics.

Examples

## Not run: 
# ### load example data for 2 studies
# data(seqMetaExample)
# 
# ####run on each study:
# cohort1 <- prepScores(Z=Z1, y~sex+bmi, SNPInfo = SNPInfo, data =pheno1)
# cohort2 <- prepScores(Z=Z2, y~sex+bmi, SNPInfo = SNPInfo, kins=kins, data=pheno2)
# 
# #### combine results:
# ##skat-O with default settings:
# out1 <- skatOMeta(cohort1, cohort2, SNPInfo = SNPInfo, method = "int")
# head(out1)
# 
# ##skat-O, using a large number of combinations between SKAT and T1 tests:
# out2 <- skatOMeta(cohort1, cohort2, rho=seq(0,1,length=11), SNPInfo=SNPInfo, method="int")
# head(out2)
# 
# #rho = 0 indicates SKAT gave the smaller p-value (or the T1 is undefined) 
# #rho=1 indicates the burden test was chosen
# # 0 < rho < 1 indicates some other value was chosen
# #notice that most of the time either the SKAT or T1 is chosen
# table(out2$rho)
# 
# ##skat-O with beta-weights used in the burden test:
# out3 <- skatOMeta(cohort1,cohort2, burden.wts = function(maf){dbeta(maf,1,25) }, 
#                   rho=seq(0,1,length=11),SNPInfo = SNPInfo, method="int")
# head(out3)
# table(out3$rho)
# 
# ########################
# ####binary data
# cohort1 <- prepScores(Z=Z1, ybin~1, family=binomial(), SNPInfo=SNPInfo, data=pheno1)
# out.bin <- skatOMeta(cohort1, SNPInfo = SNPInfo, method="int")
# head(out.bin)
# 
# ####################
# ####survival data
# cohort1 <- prepCox(Z=Z1, Surv(time,status)~strata(sex)+bmi, SNPInfo=SNPInfo, 
#                    data=pheno1)
# out.surv <- skatOMeta(cohort1, SNPInfo = SNPInfo, method="int")
# head(out.surv)
# 
# ##########################################
# ###Compare with SKAT and T1 tests on their own:
# cohort1 <- prepScores(Z=Z1, y~sex+bmi, SNPInfo=SNPInfo, data=pheno1)
# cohort2 <- prepScores(Z=Z2, y~sex+bmi, SNPInfo=SNPInfo, kins=kins, data=pheno2)
# 
# out.skat <- skatMeta(cohort1,cohort2,SNPInfo=SNPInfo)
# out.t1 <- burdenMeta(cohort1,cohort2, wts= function(maf){as.numeric(maf <= 0.01)}, 
#                      SNPInfo=SNPInfo)
#            
# #plot results 
# #We compare the minimum p-value of SKAT and T1, adjusting for multiple tests 
# #using the Sidak correction, to that of SKAT-O.
# 
# par(mfrow=c(1,3))
# pseq <- seq(0,1,length=100)
# plot(y=out.skat$p, x=out1$p,xlab="SKAT-O p-value", ylab="SKAT p-value", main ="SKAT-O vs SKAT")
# lines(y=pseq,x=1-(1-pseq)^2,col=2,lty=2, lwd=2)
# abline(0,1)
# 
# plot(y=out.t1$p, x=out1$p,xlab="SKAT-O p-value", ylab="T1 p-value", main ="SKAT-O vs T1")
# lines(y=pseq,x=1-(1-pseq)^2,col=2,lty=2, lwd=2)
# abline(0,1)
# 
# plot(y=pmin(out.t1$p, out.skat$p, na.rm=TRUE), x=out1$p, xlab="SKAT-O p-value", 
#      ylab="min(T1,SKAT) p-value", main ="min(T1,SKAT) vs SKAT-O")
# lines(y=pseq,x=1-(1-pseq)^2,col=2,lty=2, lwd=2)
# abline(0,1)
# legend("bottomright", lwd=2,lty=2,col=2,legend="Sidak correction")
# ## End(Not run)