esaBcv (version 1.0.1)

EsaBcv: Estimate Latent Factor Matrix

Description

Select the best number of factors using Bi-Cross-Validation (BCV) with Early-Stopping-Alternation (ESA), then estimate the factor matrix.

Usage

EsaBcv(Y, X = NULL, r.limit = 20, niter = 3, nRepeat = 12, only.r = F,
  svd.method = "fast", center = F)

Arguments

Y
observed data matrix of dimension c(n, p), where n is the sample size and p is the number of variables.
X
the known predictors of size c(n, k) if any. Default is NULL (no known predictors). k is the number of known covariates.
r.limit
the maximum number of factors to try. Default is 20. Can be set to Inf.
niter
the number of iterations for ESA. Default is 3.
nRepeat
number of repeats of BCV. In other words, the random partition of $Y$ is repeated nRepeat times. Default is 12.
only.r
logical, whether to estimate and return only the number of factors. Default is FALSE.
svd.method
either "fast", "propack" or "standard". "fast" is using the fast.svd function in package corpcor to compute SVD, "propack" is using the propack.svd
center
logical, whether to center the data before factor analysis. Default is FALSE.

Value

EsaBcv returns an object of class "esabcv". The function plot plots the cross-validation results and marks the estimated number of factors. An object of class "esabcv" is a list containing the following components:
  • best.r: the best number of factors estimated.
  • estSigma: the diagonal entries of the estimated $\Sigma$, a vector of length p.
  • estU: the estimated $U$. Dimension is c(n, r).
  • estD: the estimated diagonal entries of $D$, a vector of length r.
  • estV: the estimated $V$. Dimension is c(p, r).
  • beta: the estimated $\beta$, a matrix of size c(k, p). NULL if the argument X is NULL.
  • estS: the estimated signal (factor) matrix $S$, where $$S = 1 \mu' + X \beta + n^{1/2}U D V'$$
  • mu: the sample centers of each variable, a vector of length p. It is an estimate of $\mu$. NULL if the argument center is FALSE.
  • max.r: the actual maximum number of factors used. For details of how this is decided, refer to Owen and Wang (2015).
  • result.list: a matrix with dimension c(nRepeat, (max.r + 1)) storing the detailed BCV entrywise MSE of each repeat for r from 0 to max.r.
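The formula for $S$ above can be checked by reassembling the signal matrix from the other returned components. The sketch below uses base R only and a hypothetical stand-in list named `fit` with random values in the documented shapes; `rebuild_signal` is an illustrative helper, not part of the package.

```r
# Hypothetical stand-in for a list returned by EsaBcv(); the values are random
# and only the shapes follow the Value section.
set.seed(1)
n <- 6; p <- 4; r <- 2; k <- 1
X <- matrix(rnorm(n * k), nrow = n)               # known predictors, c(n, k)
fit <- list(
  estU = qr.Q(qr(matrix(rnorm(n * r), n, r))),    # c(n, r), orthonormal columns
  estD = c(2, 1),                                 # length r
  estV = qr.Q(qr(matrix(rnorm(p * r), p, r))),    # c(p, r), orthonormal columns
  beta = matrix(rnorm(k * p), k, p),              # c(k, p)
  mu   = rnorm(p)                                 # length p
)

# Illustrative helper: S = 1 mu' + X beta + sqrt(n) U D V'
rebuild_signal <- function(fit, X = NULL) {
  n <- nrow(fit$estU)
  D <- diag(fit$estD, nrow = length(fit$estD))
  S <- sqrt(n) * fit$estU %*% D %*% t(fit$estV)
  if (!is.null(fit$mu)) {
    S <- S + outer(rep(1, n), fit$mu)             # add the mean to every row
  }
  if (!is.null(X) && !is.null(fit$beta)) {
    S <- S + X %*% fit$beta                       # add the known-covariate part
  }
  S
}

S <- rebuild_signal(fit, X)
dim(S)
```

With a real fit, the result would be expected to agree with the `estS` component up to numerical error, assuming the components are stored exactly as described above.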

Details

The model is $$Y = 1 \mu' + X \beta + n^{1/2}U D V' + E \Sigma^{1/2}$$ where $D$ and $\Sigma$ are diagonal matrices, $U$ and $V$ are orthogonal, and $\mu'$ and $V'$ denote the transposes of $\mu$ and $V$ respectively. The entries of $E$ are assumed to be i.i.d. standard Gaussian. The model allows heteroscedastic noise and works especially well for high-dimensional data. The method is based on Owen and Wang (2015).

Warning: when a non-null X is given or centering is requested (which is essentially adding a known covariate of all $1$s), the method first uses linear regression to estimate the coefficients of the known covariates and then estimates the latent factors from the residuals. The low-rank part of these residuals is the portion of the latent factors that is orthogonal to X, minus the noise that is projected onto X. To keep the residuals low rank, k should therefore be a small number. Although the latent factors estimated from the residuals will be biased, the estimate of the whole signal (factor) matrix $S$ will still be reasonable.
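To make the model and the regression step concrete, the following base R sketch simulates data from the model and then regresses out the intercept and the known covariates before inspecting the residuals. All sizes and parameter values are arbitrary assumptions chosen for illustration, and the final step uses a plain SVD of the residuals rather than ESA, so it does not reproduce what EsaBcv does internally.

```r
set.seed(123)
n <- 50; p <- 200; r <- 3; k <- 2                 # illustrative sizes only

# Orthonormal columns for U (n x r) and V (p x r), obtained through QR
U <- qr.Q(qr(matrix(rnorm(n * r), n, r)))
V <- qr.Q(qr(matrix(rnorm(p * r), p, r)))
D <- diag(c(3, 2, 1))                             # diagonal factor strengths
mu <- rnorm(p)                                    # per-variable means
X <- matrix(rnorm(n * k), n, k)                   # known covariates
beta <- matrix(rnorm(k * p), k, p)
sigma2 <- runif(p, 0.5, 2)                        # diagonal of Sigma (heteroscedastic)
E <- matrix(rnorm(n * p), n, p)                   # i.i.d. standard Gaussian entries

# Y = 1 mu' + X beta + sqrt(n) U D V' + E Sigma^{1/2}
S <- outer(rep(1, n), mu) + X %*% beta + sqrt(n) * U %*% D %*% t(V)
Y <- S + E %*% diag(sqrt(sigma2))

# Step 1: least-squares regression of Y on the intercept and X
X1 <- cbind(1, X)
coef_hat <- qr.solve(X1, Y)                       # (k + 1) x p coefficient matrix
resid <- Y - X1 %*% coef_hat

# Step 2 (stand-in for ESA): singular values of the residual matrix
sv <- svd(resid)$d
head(sv, 6)
```

The simulated `Y` and `X` could then be passed to `EsaBcv(Y, X = X, center = TRUE)` to compare the selected number of factors with the `r` used in the simulation.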

References

Art B. Owen and Jingshu Wang (2015), Bi-cross-validation for factor analysis, http://de.arxiv.org/pdf/1503.03515

See Also

ESA, plot.esabcv

Examples

library(esaBcv)
Y <- matrix(rnorm(100), nrow = 10)  # 10 x 10 matrix of standard Gaussian noise
EsaBcv(Y)
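A slightly longer usage sketch shows how the returned components might be inspected. It uses only the arguments and components documented on this page; the matrix size, the seed, and the values of `r.limit` and `nRepeat` are arbitrary choices, and the call is skipped when the esaBcv package is not installed.

```r
set.seed(2015)
Y <- matrix(rnorm(30 * 40), nrow = 30)            # n = 30 samples, p = 40 variables

fit <- NULL
mean_mse <- NULL
if (requireNamespace("esaBcv", quietly = TRUE)) {
  # Fewer repeats and a small rank limit keep the run short
  fit <- esaBcv::EsaBcv(Y, r.limit = 5, nRepeat = 4)

  # Average the BCV entrywise MSE over repeats, one value per candidate rank
  mean_mse <- colMeans(fit$result.list)

  print(fit$best.r)                               # estimated number of factors
  print(round(mean_mse, 3))
}
```

Since `Y` here is pure noise with no latent factors, a small value of `best.r` would be the natural outcome, though the result depends on the random partitions.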