hdpca (version 1.1.3)

select.nspike: Finding Distant Spikes

Description

Estimates the number of distant spikes in the population under the Generalized Spiked Population model. A finite upper bound (n.spikes.max) on the number of distant spikes must be provided.

Usage

select.nspike(samp.eval, p, n, n.spikes.max, evals.out = FALSE, smooth = TRUE)

Arguments

samp.eval

Numeric vector containing the sample eigenvalues. The vector must have length n or n-1; it may be unordered.

p

The number of features.

n

The number of samples.

n.spikes.max

Upper bound on the number of distant spikes in the population.

evals.out

Logical. If TRUE, the estimated spikes and non-spikes are returned in addition to the estimated number of spikes. Default is FALSE.

smooth

Logical. If TRUE, kernel smoothing will be performed on the estimated population eigenvalue spectrum. Default is TRUE.

Value

n.spikes

Estimated number of distant spikes.

spikes

If evals.out=TRUE, estimated distant spikes are returned.

nonspikes

If evals.out=TRUE, estimated non-spikes are returned.

loss

If evals.out=TRUE, L-infinity loss function for the spectrum estimation is returned.

Details

The function searches over candidate values from \(0\) to n.spikes.max to estimate the number of distant spikes in the population. It also estimates both the non-spiked and the spiked eigenvalues using the \(\lambda\)-estimation method.

The argument smooth is useful when the user assumes the population spectral distribution to be continuous.
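
The samp.eval argument is a vector of sample eigenvalues of length n or n-1. As a rough illustration of how such a vector might be obtained from raw data, the sketch below simulates a hypothetical n x p matrix X with two large population variances and computes the eigenvalues of its sample covariance through the n x n Gram matrix, which shares its nonzero eigenvalues with the p x p covariance matrix when p > n. The simulated data, the variable names, and the n - 1 divisor are assumptions made for this example; this page does not state which normalization select.nspike expects.

```r
set.seed(1)
n <- 50    # number of samples (hypothetical)
p <- 200   # number of features (hypothetical)

# Hypothetical population: two features with large variance, the rest with variance 1
sds <- c(sqrt(50), sqrt(30), rep(1, p - 2))
X <- sweep(matrix(rnorm(n * p), n, p), 2, sds, `*`)

# Center each feature, then form the n x n Gram matrix.
# Assumption: sample covariance with divisor n - 1.
Xc <- scale(X, center = TRUE, scale = FALSE)
G <- tcrossprod(Xc) / (n - 1)

# eigen() returns eigenvalues in decreasing order
samp.eval <- eigen(G, symmetric = TRUE, only.values = TRUE)$values

# Centering reduces the rank to at most n - 1, so the last
# eigenvalue is numerically zero and is dropped here
samp.eval <- samp.eval[seq_len(n - 1)]

length(samp.eval)
```

A vector built this way, together with p and n, could then be supplied as the first three arguments of select.nspike.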

References

Dey, R. and Lee, S. (2019). Asymptotic properties of principal component analysis and shrinkage-bias adjustment under the generalized spiked population model. Journal of Multivariate Analysis, Vol 173, 145-164.

See Also

hdpc_est, pc_adjust

Examples

library(hdpca)
data(hapmap)
# n = 198, p = 75435 for this data

# If you just want the estimated number of spikes
train.eval <- hapmap$train.eval
n <- hapmap$nSamp
p <- hapmap$nSNP

select.nspike(train.eval, p, n, n.spikes.max = 10, evals.out = FALSE)

# If you want the estimated spikes and non-spikes
out <- select.nspike(train.eval, p, n, n.spikes.max = 10, evals.out = TRUE)