dils (version 0.8.1)

ScalablePCA: Perform Principal Component Analysis on a large data set

Description

Run prcomp on subsamples of the data set and compile the results for the first dimension.

Usage

ScalablePCA(x, filename = NULL, db = NULL, subsample = 10000, n.subsamples = 1000, ignore.cols, use.cols, return.sds = FALSE, progress.bar = FALSE)

Arguments

x
data.frame, data over which to run PCA
filename
character, name of the file containing the data. This must be a tab-delimited file with a header row formatted per the default options for read.delim.
db
Object type, database connection to table containing the data (NOT IMPLEMENTED).
subsample
numeric or logical, if an integer, the size of each subsample; if FALSE, runs PCA on the entire data set.
n.subsamples
numeric, number of subsamples.
ignore.cols
numeric, indices of columns not to include.
use.cols
numeric, indices of columns to use.
return.sds
logical, if TRUE return the standard deviations of each network's edge weights.
progress.bar
logical, if TRUE then progress in running subsamples will be shown.
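
The filename argument expects a tab-delimited file with a header row that read.delim can read with its defaults. A minimal sketch of producing such a file with base R follows; the temporary file path and the use of iris are illustrative assumptions, and nothing here calls ScalablePCA itself.

```r
# Write a data frame as a tab-delimited file with a header row,
# the layout that read.delim reads with its default options.
tab_file <- tempfile(fileext = ".txt")   # hypothetical path, for illustration only
write.table(iris, file = tab_file, sep = "\t",
            row.names = FALSE, quote = FALSE)

# Read it back with the defaults to confirm the layout round-trips.
check <- read.delim(tab_file)
dim(check)
names(check)
```

Dropping row names matters here: with row.names = TRUE, write.table emits one fewer header field than data fields, and read.delim would then treat the first column as row names rather than data.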

Value

If return.sds is FALSE, returns a named vector of component weights for the first dimension of the principal component analysis (see the example for a comparison to prcomp). If return.sds is TRUE, returns a list with the following elements.
coefficients
named vector of the component weights for the first dimension of the principal component analysis (see the example for a comparison to prcomp).
sds
named vector of the standard deviations of each network's edge weights.

Details

Scales the function prcomp to data sets with an arbitrarily large number of rows by running prcomp on repeated subsamples of the rows.
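
The subsampling idea can be sketched in a few lines of base R. This is an illustration only, not the package's implementation: the helper name subsample_pca_sketch, the averaging of loadings across subsamples, the sign alignment step, and the definition of sds as the spread of loadings across subsamples are all assumptions made for this example.

```r
# Hypothetical helper, not part of dils: average the loadings of the first
# principal component over repeated random subsamples of the rows.
subsample_pca_sketch <- function(x, subsample = 10, n.subsamples = 100) {
  x <- as.matrix(x)
  loadings <- matrix(NA_real_, nrow = n.subsamples, ncol = ncol(x),
                     dimnames = list(NULL, colnames(x)))
  for (i in seq_len(n.subsamples)) {
    rows <- sample(nrow(x), size = subsample)   # rows drawn without replacement
    v <- prcomp(x[rows, , drop = FALSE],
                center = FALSE, scale. = FALSE)$rotation[, 1]
    # The sign of a principal component is arbitrary, so align it before
    # averaging (assumption: flip so that the loadings sum to a positive value).
    if (sum(v) < 0) v <- -v
    loadings[i, ] <- v
  }
  list(coefficients = colMeans(loadings),      # mean loading per column
       sds = apply(loadings, 2, sd))           # spread across subsamples
}

set.seed(1)
res <- subsample_pca_sketch(iris[, 1:4], subsample = 10, n.subsamples = 200)

# Full-data loadings for comparison, with the same sign convention.
full <- prcomp(iris[, 1:4], center = FALSE, scale. = FALSE)$rotation[, 1]
if (sum(full) < 0) full <- -full
rbind(subsampled = res$coefficients, full = full)
```

Each call to prcomp only ever sees subsample rows, so memory use per iteration depends on the subsample size rather than on the total number of rows.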

References

https://github.com/shaptonstahl/

See Also

prcomp

Examples

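library(dils)     # provides ScalablePCA
data(iris)        # provides example data
# First-component loadings from prcomp on the full data, for comparison
prcomp(iris[,1:4], center=FALSE, scale.=FALSE)$rotation[,1]
# The same four columns, selected first by inclusion and then by exclusion
ScalablePCA(iris, subsample=10, use.cols=1:4)
ScalablePCA(iris, subsample=10, ignore.cols=5)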