hdpca (version 1.1.3)

hdpc_est: High-dimensional PCA estimation

Description

Estimates the population eigenvalues, angles between the sample and population eigenvectors, correlations between the sample and population PC scores, and the asymptotic shrinkage factors. Three different estimation methods can be used.

Usage

hdpc_est(samp.eval, p, n, method = c("d.gsp", "l.gsp", "osp"), 
n.spikes, n.spikes.max, n.spikes.out, nonspikes.out = FALSE, smooth = TRUE)

Arguments

samp.eval

Numeric vector containing the sample eigenvalues. The vector must have length n or n-1; it may be unordered.

p

The number of features.

n

The number of samples.

method

String specifying the estimation method. Possible values are "d.gsp" (default), "l.gsp" and "osp".

n.spikes

Number of distant spikes in the population (Optional).

n.spikes.max

Upper bound on the number of distant spikes in the population. Optional, but required if n.spikes is not specified. Ignored if n.spikes is specified.

n.spikes.out

Number of distant spikes to be returned in the output (Optional). If not specified, all the estimated distant spikes are returned.

nonspikes.out

Logical. If TRUE and method="l.gsp", the estimated set of non-spikes is returned. If TRUE and method="osp", the estimated value of the non-spike is returned.

smooth

Logical. If TRUE and method="l.gsp", kernel smoothing will be performed on the estimated population eigenvalue spectrum. Default is TRUE.
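The inputs samp.eval, p and n can be prepared from a raw data matrix. The following is a minimal sketch using base R only; the simulated matrix X and the two planted high-variance features are hypothetical, and it assumes that the sample eigenvalues are those of the sample covariance matrix as computed by prcomp (divisor n-1). The scaling convention expected by the package is not stated here, so check it against your own pipeline.

```r
# Hypothetical data: n samples in rows, p features in columns.
set.seed(1)
n <- 50
p <- 200
X <- matrix(rnorm(n * p), nrow = n, ncol = p)

# Plant two high-variance features so the spectrum has two clear spikes
# (illustration only).
X[, 1] <- 10 * X[, 1]
X[, 2] <- 6 * X[, 2]

# prcomp() centers the columns by default; sdev^2 gives the eigenvalues of
# the sample covariance matrix in decreasing order (min(n, p) values).
samp.eval <- prcomp(X)$sdev^2

# After centering, at most n - 1 eigenvalues are nonzero, so keep n - 1.
samp.eval <- samp.eval[seq_len(n - 1)]

length(samp.eval)  # n - 1
```

The resulting vector can then be passed as the first argument, for example hdpc_est(samp.eval, p, n, n.spikes = 2).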

Value

spikes

An array of estimated distant spikes. If n.spikes.out is specified, only the largest n.spikes.out eigenvalues are returned.

n.spikes

Number of distant spikes. If n.spikes is not provided, then the estimated value is returned.

angles

An array of estimated cosines of the angles between the sample and population eigenvectors corresponding to the distant spikes. The \(k^{th}\) element of the array is the estimated cosine of the angle between the \(k^{th}\) sample and population eigenvectors. If n.spikes.out is specified, only the first n.spikes.out cosines are returned.

correlations

An array of estimated correlations between the sample and population PC scores corresponding to the distant spikes. The \(k^{th}\) element of the array is the estimated correlation between the \(k^{th}\) sample and population PC scores. If n.spikes.out is specified, only the first n.spikes.out correlations are returned.

shrinkage

An array of estimated asymptotic shrinkage factors corresponding to the distant spikes. If n.spikes.out is specified, only the first n.spikes.out shrinkage factors are returned.

loss

If method="l.gsp", the value of the L-infinity loss function for the spectrum estimation is returned.

nonspikes

If nonspikes.out=TRUE, the estimated non-spikes are returned. If the \(\lambda\)-estimation method is used, this is a numeric vector of length p-n.spikes. If the OSP model based method is used, this is a scalar.

Details

The different choices for method are:

  • "d.gsp": \(d\)-estimation method based on the Generalized Spiked Population (GSP) model.

  • "l.gsp": \(\lambda\)-estimation method based on the GSP model.

  • "osp": Estimation method based on the Ordinary Spiked Population (OSP) model.
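To give a feel for what estimation under the OSP model involves, the sketch below uses the well-known asymptotic limits for the simple spiked covariance model: with \(\gamma = p/n\) and unit non-spike variance, a population spike \(\ell > 1 + \sqrt{\gamma}\) produces a sample eigenvalue close to \(\ell(1 + \gamma/(\ell - 1))\), and the squared cosine of the angle between the sample and population eigenvectors is close to \((1 - \gamma/(\ell - 1)^2)/(1 + \gamma/(\ell - 1))\). The function names osp_spike and osp_cosine are hypothetical, and this is an illustration under those assumptions, not the package's implementation.

```r
# Invert d = l * (1 + gamma / (l - 1)) for the population spike l.
# sigma2 is the (assumed known) non-spike variance; default 1.
osp_spike <- function(d, gamma, sigma2 = 1) {
  d <- d / sigma2
  # The inversion is only valid above the bulk edge (1 + sqrt(gamma))^2.
  stopifnot(all(d > (1 + sqrt(gamma))^2))
  b <- d + 1 - gamma
  l <- (b + sqrt(b^2 - 4 * d)) / 2  # larger root of l^2 - b * l + d = 0
  l * sigma2
}

# Asymptotic cosine of the angle between sample and population eigenvectors.
osp_cosine <- function(l, gamma, sigma2 = 1) {
  l <- l / sigma2
  sqrt((1 - gamma / (l - 1)^2) / (1 + gamma / (l - 1)))
}

gamma <- 2                       # p / n
l_hat <- osp_spike(7.5, gamma)   # 5
cos_hat <- osp_cosine(l_hat, gamma)  # about 0.764
```

In this toy case a sample eigenvalue of 7.5 corresponds to a population spike of 5, which shows the upward bias of sample eigenvalues when p is large relative to n.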

At least one of n.spikes and n.spikes.max must be provided. If n.spikes is provided, n.spikes.max is ignored; otherwise n.spikes.max is used to estimate the number of distant spikes using select.nspike.

The argument nonspikes.out is ignored if method="d.gsp".

The argument smooth is useful when the user assumes the population spectral distribution to be continuous.

References

Dey, R. and Lee, S. (2019). Asymptotic properties of principal component analysis and shrinkage-bias adjustment under the generalized spiked population model. Journal of Multivariate Analysis, Vol 173, 145-164.

See Also

select.nspike, pc_adjust

Examples

library(hdpca)

data(hapmap)
# n = 198, p = 75435 for this data

train.eval <- hapmap$train.eval
n <- hapmap$nSamp
p <- hapmap$nSNP

# Estimate the number of distant spikes, allowing at most 10
m <- select.nspike(train.eval, p, n, n.spikes.max = 10, evals.out = FALSE)$n.spikes

# d-estimation: output 2 spikes, no non-spikes
out <- hdpc_est(train.eval, p, n, method = "d.gsp",
                n.spikes = m, n.spikes.out = 2, nonspikes.out = FALSE)

# lambda-estimation: output all m spikes, no non-spikes
out <- hdpc_est(train.eval, p, n, method = "l.gsp",
                n.spikes = m, nonspikes.out = FALSE)

# lambda-estimation with the number of spikes estimated internally:
# output all eigenvalues (spikes and non-spikes)
out <- hdpc_est(train.eval, p, n, method = "l.gsp",
                n.spikes.max = 10, nonspikes.out = TRUE)

# OSP: output 2 spikes and the estimated non-spike value
out <- hdpc_est(train.eval, p, n, method = "osp",
                n.spikes = m, n.spikes.out = 2, nonspikes.out = TRUE)
