gPCA (version 1.0)

gPCA.batchdetect: Guided Principal Components Analysis

Description

Tests for batch effects in an $n \times p$ data set, with the batch vector given by batch, using the $\delta$ statistic resulting from guided principal components analysis (gPCA).

Usage

gPCA.batchdetect(x, batch, filt = NULL, nperm = 1000, center = FALSE, scaleY=FALSE, seed = NULL)

Arguments

x
an $n \times p$ matrix of data where $n$ denotes the number of observations and $p$ denotes the number of features (e.g. probe, gene, SNP, etc.).
batch
a length $n$ vector that indicates batch (group or class) for each observation.
filt
(optional) the number of features to retain after applying a variance filter. If NULL, no filter is applied. Filtering can significantly reduce the processing time in the case of very large data sets.
nperm
the number of permutations to perform for the permutation test, default is 1000.
center
(logical) Is your data x already centered? If not, set center=FALSE (default) and gPCA.batchdetect will center it for you.
scaleY
(logical) Do you want to scale the Y matrix by the number of samples in each batch? If not, then scaleY=FALSE (default); otherwise, scaleY=TRUE.
seed
the seed number for set.seed(). Default is NULL.

Value

delta
test statistic $\delta$ from gPCA.
p.val
$p$-value associated with $\delta$ resulting from gPCA.
delta.p
length nperm vector of $\delta$ values resulting from the permutation test.
batch
returns your length $n$ batch vector.
filt
returns the number of features the variance filter retained.
n
the number of observations
p
the number of features
b
the number of batches
PCu
principal component matrix from unguided PCA.
PCg
principal component matrix from gPCA.
varPCu1
the proportion out of the total variance associated with the first principal component of unguided PCA.
varPCg1
the proportion out of the total variance associated with the first principal component of gPCA.
cumulative.var.u
length $n$ vector of the cumulative variance of the $i=1,\dots,n$ principal components from unguided PCA.
cumulative.var.g
length $b$ vector of the cumulative variance of the $k=1,\dots,b$ principal components from gPCA.

Details

Guided principal components analysis (gPCA) is an extension of principal components analysis (PCA) that guides the singular value decomposition (SVD) of PCA by applying SVD to $\mathbf{Y}'\mathbf{X}$ rather than to $\mathbf{X}$ alone. Here $\mathbf{X}$ is the $n \times p$ data matrix and $\mathbf{Y}$ is an $n \times b$ batch indicator matrix whose entry $(i, k)$ is one when observation $i$ ($i=1,\dots,n$) is in batch $k$ ($k=1,\dots,b$) and zero otherwise.
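The construction above can be sketched in base R. This is a minimal illustration on simulated data, not the package's implementation. It assumes that $\delta$ is the ratio of the variance of the data projected onto the first gPCA loading vector to the variance of the data projected onto the first unguided PCA loading vector; that definition is an assumption here and should be checked against the reference. The simulated data and all variable names are hypothetical.

```r
# Sketch only: base R on simulated data; not the gPCA package's code.
set.seed(1)
n <- 30; p <- 50
batch <- rep(c("A", "B", "C"), each = 10)

# Simulated data with a shift in the first 10 features of batch "B".
X <- matrix(rnorm(n * p), nrow = n, ncol = p)
X[batch == "B", 1:10] <- X[batch == "B", 1:10] + 2

# Column-center X.
Xc <- scale(X, center = TRUE, scale = FALSE)

# n x b batch indicator matrix: Y[i, k] = 1 if observation i is in batch k.
Y <- model.matrix(~ 0 + factor(batch))

# Unguided PCA uses the SVD of X; guided PCA uses the SVD of Y'X.
svd_u <- svd(Xc)
svd_g <- svd(t(Y) %*% Xc)

# Project the data onto the first loading vector of each decomposition.
pc_u1 <- as.vector(Xc %*% svd_u$v[, 1])
pc_g1 <- as.vector(Xc %*% svd_g$v[, 1])

# Assumed definition of delta: guided / unguided first-PC variance.
delta <- var(pc_g1) / var(pc_u1)
delta
```

Under this assumed definition, $\delta$ cannot exceed 1, because the first unguided loading vector maximizes the projected variance over all unit vectors. Values near 1 suggest that the direction separating the batches is close to the dominant direction of variation in the data.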

A gPCA.batchdetect() call returns the test statistic $\delta$ and its one-sided $p$-value, along with the values of $\delta_p$ from the permutation test. The $\delta_p$ values can be used to visualize the permutation distribution of your test using the gDist function. For more information on gPCA, please see Reese et al. (listed under References).
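The permutation test can be sketched in the same way. The helper delta_stat below is hypothetical and is not part of the gPCA package. It reuses the assumed ratio-of-variances definition of $\delta$, and it assumes the one-sided $p$-value is the proportion of permuted $\delta_p$ values at least as large as the observed $\delta$; the package's exact computation may differ.

```r
# Sketch only: delta_stat() is a hypothetical helper, not a gPCA function.
# Assumed definition: variance along the first guided loading vector
# divided by variance along the first unguided loading vector.
delta_stat <- function(X, batch) {
  Xc <- scale(X, center = TRUE, scale = FALSE)
  Y <- model.matrix(~ 0 + factor(batch))
  v_u <- svd(Xc)$v[, 1]
  v_g <- svd(t(Y) %*% Xc)$v[, 1]
  var(as.vector(Xc %*% v_g)) / var(as.vector(Xc %*% v_u))
}

set.seed(1)
n <- 30; p <- 50; nperm <- 200
batch <- rep(c("A", "B", "C"), each = 10)

# Simulated data with a shift in the first 10 features of batch "B".
X <- matrix(rnorm(n * p), nrow = n, ncol = p)
X[batch == "B", 1:10] <- X[batch == "B", 1:10] + 2

# Observed statistic.
delta <- delta_stat(X, batch)

# Permutation distribution: shuffle the batch labels and recompute delta.
delta_p <- replicate(nperm, delta_stat(X, sample(batch)))

# Assumed one-sided p-value: share of permuted deltas >= observed delta.
p_val <- mean(delta_p >= delta)
c(delta = delta, p.val = p_val)
```

Calling hist(delta_p) and marking the observed value with abline(v = delta) gives a rough base R view of the permutation distribution, similar in spirit to what gDist displays.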

References

Reese, S. E., Archer, K. J., Therneau, T. M., Atkinson, E. J., Vachon, C. M., de Andrade, M., Kocher, J. A., and Eckel-Passow, J. E. A new statistic for identifying batch effects in high-throughput genomic data that uses guided principal components analysis. Bioinformatics, (in review).

See Also

gDist, PCplot, CumulativeVarPlot

Examples

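## Load the package and the example data set
library(gPCA)
data(caseDat)
batch <- caseDat$batch
data <- caseDat$data

## Test for a batch effect; center = FALSE lets the function center the data
out <- gPCA.batchdetect(x = data, batch = batch, center = FALSE, nperm = 250)
out$delta; out$p.val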

## Plots:
gDist(out)
CumulativeVarPlot(out,ug="unguided",col="blue")
PCplot(out,ug="unguided",type="1v2")
PCplot(out,ug="unguided",type="comp",npcs=4)
