jackstraw (version 1.1)

jackstraw: Non-Parametric Jackstraw (Wrapper)

Description

Estimates statistical significance of association between variables and their latent variables (LVs).

Usage

jackstraw(dat, method = "PCA", FUN = NULL, r = NULL, ...)

Value

jackstraw returns a list consisting of

p.value

m p-values of association tests between variables and their principal components

obs.stat

m observed F-test statistics

null.stat

s*B null F-test statistics

Optional Arguments (see linked functions)

s

the number of "synthetic" null variables. Out of m variables, s variables are independently permuted.

B

the number of resampling iterations.

r1

a numeric vector of latent variables (e.g., PCs) of interest. Not appropriate for all methods or functions.

covariate

a model matrix of covariates with n observations. Must include an intercept in the first column. Not appropriate for all methods and functions.

verbose

a logical specifying whether to print the computational progress. By default, FALSE.

seed

a seed for the random number generator.

Details

This is a wrapper for several functions that implement the jackstraw method. It computes m p-values of association between m variables and their LVs. The resampling strategy accounts for the over-fitting that arises when LVs are computed directly from the observed data, and thereby protects against an anti-conservative bias.
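To make the resampling idea concrete, here is a minimal base-R sketch of a jackstraw-style procedure for PCA. It is an illustration under stated assumptions, not the package's implementation: the helper names `f_stat` and `jackstraw_sketch` are hypothetical, the F-statistic compares an intercept-plus-PCs model against an intercept-only model, and the p-values are taken as the simple proportion of null statistics at least as large as each observed statistic.

```r
# Hypothetical helper: F-statistic per row, intercept + LVs vs. intercept only.
f_stat <- function(dat, lv) {
  lv <- as.matrix(lv)
  n <- ncol(dat)
  r <- ncol(lv)
  X <- cbind(1, lv)
  H <- X %*% solve(crossprod(X), t(X))      # n x n hat matrix of the full model
  rss1 <- rowSums((dat - dat %*% H)^2)      # residual SS, full model
  rss0 <- rowSums((dat - rowMeans(dat))^2)  # residual SS, intercept only
  ((rss0 - rss1) / r) / (rss1 / (n - r - 1))
}

# Hypothetical sketch of the resampling loop (not the package's code).
jackstraw_sketch <- function(dat, r = 1, s = 10, B = 50) {
  m <- nrow(dat)
  dat <- dat - rowMeans(dat)                # center each variable (row)
  obs <- f_stat(dat, svd(dat, nu = 0, nv = r)$v)
  null <- matrix(NA_real_, nrow = s, ncol = B)
  for (b in seq_len(B)) {
    idx <- sample(m, s)                     # choose s rows to turn into nulls
    jdat <- dat
    # permute each chosen row independently, breaking its link to the LVs
    jdat[idx, ] <- t(apply(dat[idx, , drop = FALSE], 1, sample))
    # re-estimate the PCs from the partially permuted data
    pcs <- svd(jdat, nu = 0, nv = r)$v
    null[, b] <- f_stat(jdat[idx, , drop = FALSE], pcs)
  }
  # assumption: plain empirical p-values against the pooled s*B null statistics
  p <- vapply(obs, function(x) mean(null >= x), numeric(1))
  list(p.value = p, obs.stat = obs, null.stat = as.vector(null))
}

# Small simulated example: 40 of 200 variables depend on one latent variable.
set.seed(1234)
m <- 200; n <- 20
loadings <- c(rep(1, 20), rep(-1, 20), rep(0, 160))
dat <- loadings %*% t(rnorm(n)) + matrix(rnorm(m * n), nrow = m)
out <- jackstraw_sketch(dat, r = 1, s = 20, B = 50)
```

Because the PCs are re-estimated from data that still contain the permuted rows, the null statistics carry the same over-fitting as the observed ones, which is the property the description above relies on.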

For advanced use, one may consider computing association between variables and a subset of r estimated LVs. For example, when there may be r=3 significant PCs, a user can carry out significance tests for the top two PCs (while adjusting for the third PC), by specifying r1=c(1,2) and r=3.
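As a rough illustration of what "adjusting for the third PC" can mean, the base-R sketch below computes a nested-model F-statistic for r1=c(1,2) with r=3. It assumes the test compares a full model (intercept plus all r PCs) against a reduced model (intercept plus the PCs not in r1); the variable names are hypothetical and the data are pure noise, used only to show the mechanics.

```r
set.seed(1)
n <- 20; m <- 100
dat <- matrix(rnorm(m * n), nrow = m)
dat <- dat - rowMeans(dat)                 # center each variable (row)

r  <- 3                                    # number of PCs assumed significant
r1 <- c(1, 2)                              # PCs of interest
pcs <- svd(dat, nu = 0, nv = r)$v          # n x r matrix of top PCs

full    <- cbind(1, pcs)                       # intercept + all r PCs
reduced <- cbind(1, pcs[, -r1, drop = FALSE])  # intercept + adjustment PC(s)

# residual sum of squares of every row regressed on design matrix X
rss <- function(X) {
  H <- X %*% solve(crossprod(X), t(X))
  rowSums((dat - dat %*% H)^2)
}

df1 <- length(r1)                          # numerator degrees of freedom
df2 <- n - ncol(full)                      # denominator degrees of freedom
Fstat <- ((rss(reduced) - rss(full)) / df1) / (rss(full) / df2)
```

Each entry of `Fstat` measures how much PCs 1 and 2 explain in one variable beyond what the third PC already explains.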

Please take a careful look at your data and use appropriate graphical and statistical criteria to determine the number of interesting/significant LVs, r. It is assumed that r latent variables account for systematic variation in the data.

For advanced usage, see jackstraw.PCA, jackstraw.LFA, and jackstraw.FUN.

If s is not supplied, s is set to about 10% of m variables. If B is not supplied, B is set to m*10/s.
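The default rule can be worked through for the m = 1000 variables used in the example data. The snippet assumes that "about 10%" means rounding to the nearest integer, which the text above does not state explicitly.

```r
m <- 1000                 # number of variables (rows of dat)
s <- round(m / 10)        # assumed rounding: about 10% of m
B <- round(m * 10 / s)    # resampling iterations under the stated rule
n_null <- s * B           # total null statistics, roughly 10 * m
```

Under this reading, the defaults yield about ten null statistics per variable regardless of m.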

References

Chung and Storey (2015). Statistical significance of variables driving systematic variation in high-dimensional data. Bioinformatics, 31(4): 545-554. http://bioinformatics.oxfordjournals.org/content/31/4/545

See Also

permutationPA, jackstraw.PCA, jackstraw.LFA, jackstraw.FUN

Examples

# NOT RUN {
set.seed(1234)
## simulate data from a latent variable model: Y = BL + E
B = c(rep(1,50),rep(-1,50), rep(0,900))
L = rnorm(20)
E = matrix(rnorm(1000*20), nrow=1000)
dat = B %*% t(L) + E
dat = t(scale(t(dat), center=TRUE, scale=TRUE))

## apply the jackstraw
out = jackstraw(dat, r=1, method="PCA")

# }
# NOT RUN {
## Use optional arguments
## For example, set s and B to balance the speed of the algorithm against the accuracy of the p-values
out = jackstraw(dat, r=1, s=10, B=1000, seed=5678)
# }