brokenStick: The Broken Stick Method

Description

The Broken Stick model is one proposed method for estimating the number of statistically significant principal components.

Usage

brokenStick(k, n)
bsDimension(lambda, FUZZ = 0.005)

Value

The brokenStick function returns, as a real number, the expected value of the k-th longest piece when breaking a stick of length one into n total pieces. It is most commonly used via the idiom brokenStick(1:N, N), which returns the entire vector of expected lengths at once.
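
To illustrate what these values look like: the expected length of the k-th longest piece has a well-known closed form, \(\frac{1}{n}\sum_{i=k}^{n}\frac{1}{i}\). The sketch below computes it directly. The name bsExpected is hypothetical and not part of the package, and the package's own implementation may differ.

```r
# Hypothetical helper (not part of the package): closed-form expected
# length of the k-th longest piece of a unit stick broken into n pieces.
bsExpected <- function(k, n) {
  # For each requested rank i, sum 1/i, ..., 1/n and divide by n.
  sapply(k, function(i) sum(1 / (i:n)) / n)
}

bsExpected(1:2, 2)         # 0.75 0.25
sum(bsExpected(1:10, 10))  # 1, since the pieces make up the whole stick
```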

The bsDimension function returns an integer, the number of significant components under this model. This is computed by finding the last point at which the observed variance is larger than the expected value under the broken stick model by at least FUZZ.
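
The rule described above can be sketched as follows. This is a rough illustration under stated assumptions, not the package's code: the name bsDimSketch is hypothetical, the variances are assumed to be rescaled to fractions of the total so that they are comparable with a stick of length one, and "last point" is read literally as the largest index that exceeds the expected value by more than FUZZ.

```r
# Hypothetical sketch of the rule (not the package implementation).
bsDimSketch <- function(lambda, FUZZ = 0.005) {
  n <- length(lambda)
  # Assumption: compare fractions of total variance with the unit stick.
  observed <- lambda / sum(lambda)
  # Closed-form expected piece lengths under the broken stick model.
  expected <- sapply(1:n, function(k) sum(1 / (k:n)) / n)
  # Indices where the observed fraction beats the expectation by more than FUZZ.
  above <- which(observed - expected > FUZZ)
  if (length(above) == 0) 0L else max(above)
}

bsDimSketch(c(30, 20, 8, 4, 3, 2, 1))  # 2 under these assumptions
```

With these made-up variances, only the first two components exceed their broken stick expectations, so the sketch reports two components.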

Arguments

k: An integer between 1 and n.
n: An integer; the total number of principal components.
lambda: The variances of the components from a principal components analysis. These are assumed to be already sorted in decreasing order. You can also supply a SamplePCA object, and the variances will be automatically extracted.
FUZZ: A real number; anything smaller than FUZZ is assumed to equal zero for all practical purposes.

Author

Kevin R. Coombes <krc@silicovore.com>

Details

The Broken Stick model is one proposed method for estimating the number of statistically significant principal components. The idea is to model \(N\) variances by taking a stick of unit length and breaking it into \(N\) pieces by randomly (and simultaneously) selecting \(N-1\) break points from a uniform distribution.
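
The breaking process can be simulated directly, which gives a quick check on the expected piece lengths. The following is a minimal sketch using base R only. The variable names, the seed, and the choice of five pieces are illustrative assumptions, and the closed form \(\frac{1}{N}\sum_{i=k}^{N}\frac{1}{i}\) is the standard result for this model, not a quotation of the package code.

```r
set.seed(1)    # fixed seed so the run is repeatable
n <- 5         # number of pieces
reps <- 20000  # number of simulated sticks

# Each column holds one stick: n - 1 uniform break points, then the
# resulting piece lengths sorted from longest to shortest.
pieces <- replicate(reps, {
  breaks <- sort(runif(n - 1))
  sort(diff(c(0, breaks, 1)), decreasing = TRUE)
})

# Average length of the k-th longest piece across all simulated sticks.
simulated <- rowMeans(pieces)
# Closed-form expectation for comparison.
closedForm <- sapply(1:n, function(k) sum(1 / (k:n)) / n)

# The two columns should agree to within Monte Carlo error.
round(cbind(simulated, closedForm), 3)
```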

References

Jackson, D. A. (1993). Stopping rules in principal components analysis: a comparison of heuristical and statistical approaches. Ecology 74, 2204-2214.

Legendre, P. and Legendre, L. (1998). Numerical Ecology. 2nd English ed. Elsevier.

Examples

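# expected lengths of all 10 pieces, from longest to shortest
brokenStick(1:10, 10)
# the expected lengths add up to the full stick
sum( brokenStick(1:10, 10) )
# made-up variances, sorted in decreasing order
fakeVar <- c(30, 20, 8, 4, 3, 2, 1)
bsDimension(fakeVar)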
