GWASTools (version 1.12.2)

batchTest: Batch Effects of Genotyping

Description

batchChisqTest calculates Chi-square values for batches from 2-by-2 tables of SNPs, comparing each batch with the other batches. batchFisherTest calculates Fisher's exact test values.

Usage

batchChisqTest(genoData, batchVar, chrom.include = 1:22, sex.include = c("M", "F"), scan.exclude = NULL, return.by.snp = FALSE, correct = TRUE, verbose = TRUE, outfile = NULL)
batchFisherTest(genoData, batchVar, chrom.include = 1:22, sex.include = c("M", "F"), scan.exclude = NULL, return.by.snp = FALSE, conf.int = FALSE, verbose = TRUE, outfile = NULL)

Arguments

genoData
A GenotypeData object.
batchVar
A character string indicating which annotation variable should be used as the batch.
chrom.include
Integer vector with codes for chromosomes to include. Default is 1:22 (autosomes). Use 23, 24, 25, 26, 27 for X, XY, Y, M, Unmapped respectively.
sex.include
Character vector with sex to include. Default is c("M", "F"). If sex chromosomes are present in chrom.include, only one sex is allowed.
scan.exclude
An integer vector containing the IDs of scans to be excluded.
return.by.snp
Logical value to indicate whether SNP-by-batch matrices should be returned.
conf.int
Logical value to indicate if a confidence interval should be computed.
correct
Logical value to specify whether to apply the Yates continuity correction.
verbose
Logical value specifying whether to show progress information.
outfile
A character string to append in front of ".RData" for naming the output file.

Value

If outfile=NULL (default), all results are returned as a list. If outfile is specified, no data is returned but the list is saved to disk as "outfile.RData".
batchChisqTest returns a list with the following elements:
mean.chisq
a vector of mean chi-squared values for each batch.
lambda
a vector of genomic inflation factors, computed as median(chisq) / 0.456 for each batch.
chisq
a matrix of chi-squared values with SNPs as rows and batches as columns. Only returned if return.by.snp=TRUE.
batchFisherTest returns a list with the following elements:
mean.or
a vector of mean odds-ratio values for each batch. mean.or is computed as 1/mean(pmin(or, 1/or)) since the odds ratio is >1 when the batch has a higher allele frequency than the other batches and <1 for the reverse.
lambda
a vector of genomic inflation factors, computed as median(-2*log(pval)) / 1.39 for each batch.
Each of the following is a matrix with SNPs as rows and batches as columns, and is only returned if return.by.snp=TRUE:
pval
P value
oddsratio
Odds ratio
confint.low
Low value of the confidence interval for the odds ratio. Only returned if conf.int=TRUE.
confint.high
High value of the confidence interval for the odds ratio. Only returned if conf.int=TRUE.
batchChisqTest and batchFisherTest both also return the following if return.by.snp=TRUE:
allele.counts
matrix with total number of A and B alleles over all batches.
min.exp.freq
matrix of minimum expected allele frequency with SNPs as rows and batches as columns.
Warnings: If outfile is not NULL, another file will be saved with the name "outfile.warnings.RData" that contains any warnings generated by the function.
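The per-batch summaries listed above (mean.chisq, lambda, mean.or) can be reproduced by hand from per-SNP values. The following is a minimal sketch using the formulas quoted in this section. The input vectors are made-up illustrative numbers for a single batch, not output from the package, and the use of na.rm = TRUE (to skip monomorphic SNPs set to NA) is an assumption about how missing values are handled.

```r
# Hypothetical per-SNP results for one batch (illustrative values only)
chisq <- c(0.10, 0.45, 1.20, 0.02, 3.80)   # chi-squared statistics
or    <- c(0.5, 2.0, 1.0, 4.0, 0.25)       # odds ratios
pval  <- c(0.90, 0.50, 0.27, 0.05, 0.63)   # Fisher p-values

# The divisors in the lambda formulas are close to the medians of
# chi-squared distributions with 1 and 2 degrees of freedom
med.df1 <- qchisq(0.5, df = 1)
med.df2 <- qchisq(0.5, df = 2)   # equals 2 * log(2)

# Summaries as described in the Value section
# (na.rm = TRUE is an assumption, see the text above)
mean.chisq    <- mean(chisq, na.rm = TRUE)
lambda.chisq  <- median(chisq, na.rm = TRUE) / 0.456
lambda.fisher <- median(-2 * log(pval), na.rm = TRUE) / 1.39

# Fold odds ratios into (0, 1] so that 0.5 and 2 count as the same
# strength of effect, then invert the mean
mean.or <- 1 / mean(pmin(or, 1 / or), na.rm = TRUE)
```

Folding with pmin(or, 1/or) keeps odds ratios above and below 1 from cancelling each other out when averaged.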

Details

Because sample processing and genotype calling can introduce batch effects, batch is an important experimental design factor. batchChisqTest calculates the chi-square statistic from a 2-by-2 table for each SNP, comparing each batch with the other batches. batchFisherTest calculates Fisher's exact test from a 2-by-2 table for each SNP, comparing each batch with the other batches. For each SNP and each batch, the batch effect is evaluated with a 2-by-2 table: the number of A alleles and the number of B alleles in the batch, versus the number of A alleles and the number of B alleles in the other batches. Monomorphic SNPs are set to NA for all batches. The default behavior is to combine allele frequencies from males and females and return results for autosomes only. If results for sex chromosomes (X or Y) are desired, use chrom.include with values 23 and/or 25 and set sex.include to either "M" or "F".
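The 2-by-2 comparison described above can be sketched for a single SNP with base R alone. The allele counts and the helper batch_table are hypothetical and exist only for this illustration. Calling chisq.test and fisher.test directly on the table is an assumption about how the statistics are obtained, based on the functions listed under See Also.

```r
# Hypothetical A and B allele counts for one SNP in three batches
counts <- rbind(batch1 = c(A = 60, B = 40),
                batch2 = c(A = 55, B = 45),
                batch3 = c(A = 70, B = 30))

# Hypothetical helper: 2-by-2 table of one batch versus all other batches
batch_table <- function(counts, batch) {
  others <- counts[rownames(counts) != batch, , drop = FALSE]
  rbind(batch = counts[batch, ],
        other = colSums(others))
}

tab <- batch_table(counts, "batch1")
tab

# Chi-squared statistic with the Yates continuity correction (correct = TRUE)
cs <- chisq.test(tab, correct = TRUE)
cs$statistic

# Fisher's exact test without a confidence interval (conf.int = FALSE)
ft <- fisher.test(tab, conf.int = FALSE)
ft$p.value
ft$estimate   # odds ratio for the batch relative to the other batches
```

Repeating the call for "batch2" and "batch3" gives one statistic per batch for this SNP, which corresponds to one row of the SNP-by-batch matrices returned when return.by.snp=TRUE.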

If there are only two batches, the calculation is performed only once and the values for each batch will be identical.
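A short check with base R shows why a single calculation is enough in the two-batch case: the table for batch 2 is the table for batch 1 with its rows swapped, and neither the chi-squared statistic nor the two-sided Fisher p-value changes under that swap. The counts are hypothetical.

```r
# Hypothetical allele counts for one SNP in two batches
tab1 <- rbind(batch1 = c(A = 60, B = 40),
              batch2 = c(A = 55, B = 45))

# "batch2 versus the rest" is the same table with the rows swapped
tab2 <- tab1[2:1, ]

stat1 <- unname(chisq.test(tab1)$statistic)
stat2 <- unname(chisq.test(tab2)$statistic)
p1 <- fisher.test(tab1)$p.value
p2 <- fisher.test(tab2)$p.value

all.equal(stat1, stat2)
all.equal(p1, p2)
```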

See Also

GenotypeData, chisq.test, fisher.test

Examples

library(GWASdata)
file <- system.file("extdata", "illumina_geno.gds", package="GWASdata")
gds <- GdsGenotypeReader(file)
data(illuminaScanADF)
genoData <- GenotypeData(gds, scanAnnot=illuminaScanADF)

# autosomes only, sexes combined (default)
res.chisq <- batchChisqTest(genoData, batchVar="plate")
res.chisq$mean.chisq
res.chisq$lambda

# X chromosome for females
res.chisq <- batchChisqTest(genoData, batchVar="status",
  chrom.include=23, sex.include="F", return.by.snp=TRUE)
head(res.chisq$chisq)

# Fisher exact test of "status" on X chromosome for females
res.fisher <- batchFisherTest(genoData, batchVar="status",
  chrom.include=23, sex.include="F", return.by.snp=TRUE)
qqPlot(res.fisher$pval)

close(genoData)
