bcv (version 1.0.1)

cv.svd: Cross-Validation for choosing the rank of an SVD approximation.

Description

Perform Wold- or Gabriel-style cross-validation to determine the appropriate rank of an SVD approximation of a matrix.

Usage

cv.svd.gabriel(x, krow = 2, kcol = 2,
               maxrank = floor(min(n - n/krow, p - p/kcol)))

cv.svd.wold(x, k = 5, maxrank = 20, tol = 1e-4, maxiter = 20)

Arguments

x

the matrix to cross-validate.

k

the number of folds (for Wold-style CV).

krow

the number of row folds (for Gabriel-style CV).

kcol

the number of column folds (for Gabriel-style CV).

maxrank

the maximum rank to cross-validate up to.

tol

the convergence tolerance for impute.svd.

maxiter

the maximum number of iterations for impute.svd.

Value

call

the function call

msep

the mean square error of prediction (MSEP); this is a matrix whose columns contain the mean square errors in the predictions of the holdout sets for ranks 0, 1, ..., maxrank across the different replicates (a short sketch showing how to summarize this matrix appears after this list).

maxrank

the maximum rank for which prediction error is estimated; this is equal to nrow(msep) - 1.

krow

the number of row folds (for Gabriel-style only).

kcol

the number of column folds (for Gabriel-style only).

rowsets

the partition of rows into krow holdout sets (for Gabriel-style only).

colsets

the partition of the columns into kcol holdout sets (for Gabriel-style only).

k

the number of folds (for Wold-style only).

sets

the partition of indices into k holdout sets (for Wold-style only).
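
As a usage sketch (not itself part of the returned value), the MSEP estimates can be averaged over replicates and the minimizing rank read off as follows, where cv stands for the list returned by either function:

  msep.mean <- rowMeans(cv$msep)          # one estimate per rank 0, 1, ..., maxrank
  best.rank <- which.min(msep.mean) - 1   # subtract 1 because row 1 corresponds to rank 0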

Details

These functions cross-validate the SVD of a matrix. They assume a model $X = U D V' + E$, where $U D V'$ is the signal and $E$ is noise, and try to find the rank at which to truncate the SVD of x so as to minimize prediction error. Here, prediction error is measured as the sum of squared residuals between the truncated SVD and the signal part.
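
For reference, a rank-r truncation of the SVD can be formed directly in base R (a minimal sketch, not part of the package):

  # rank-r truncated SVD approximation of x
  truncate.svd <- function(x, r) {
    if (r == 0) return(matrix(0, nrow(x), ncol(x)))
    s <- svd(x, nu = r, nv = r)
    s$u %*% diag(s$d[1:r], r, r) %*% t(s$v)
  }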

For both types of cross-validation, each replicate leaves out part of the matrix, fits an SVD approximation to the held-in part, and measures prediction error on the held-out part.

In Wold-style cross-validation, the holdout set is "speckled", a random set of elements in the matrix. The missing elements are predicted using impute.svd.
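
Schematically, a single Wold-style replicate proceeds roughly as follows (an illustrative sketch only, assuming impute.svd returns the completed matrix in its x component; the actual fold bookkeeping is handled inside cv.svd.wold):

  # hold out a random "speckle" of roughly 1/5 of the entries of x
  holdout <- sample(seq_along(x), length(x) %/% 5)
  xmiss <- x
  xmiss[holdout] <- NA
  # complete the matrix with a rank-2 fit and score the held-out entries
  xhat <- impute.svd(xmiss, k = 2)$x
  mean((xhat[holdout] - x[holdout])^2)   # MSEP estimate for rank 2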

In Gabriel-style cross-validation, the holdout set is "blocked". We permute the rows and columns of the matrix, leave out the lower-right block, and use a modified Schur complement to predict the held-out block. In Gabriel-style cross-validation there are krow*kcol folds in total.
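
Concretely, following Owen and Perry (2009), write the permuted matrix in blocks as [x11, x12; x21, x22] with x22 the held-out lower-right block. The rank-k prediction of x22 replaces the inverse in the Schur complement with the pseudoinverse of the rank-k truncation of x11. A minimal sketch (illustrative only; the package's internal implementation may differ in details):

  # predict the held-out lower-right block x22 at candidate rank k
  predict.block <- function(x11, x12, x21, k) {
    if (k == 0) return(matrix(0, nrow(x21), ncol(x12)))
    s <- svd(x11)
    # pseudoinverse of the rank-k truncation of x11
    x11.pinv <- s$v[, 1:k, drop = FALSE] %*%
      diag(1 / s$d[1:k], k, k) %*% t(s$u[, 1:k, drop = FALSE])
    x21 %*% x11.pinv %*% x12
  }

The contribution to msep for that fold and rank is then mean((predict.block(x11, x12, x21, k) - x22)^2).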

References

Gabriel, K.R. (2002). Le biplot--outil d'exploration de données multidimensionnelles. Journal de la Société Française de Statistique 143(3-4) 5--55.

Owen, A.B. and Perry, P.O. (2009). Bi-cross-validation of the SVD and the non-negative matrix factorization. Annals of Applied Statistics 3(2) 564--594.

Wold, S. (1978). Cross-validatory estimation of the number of components in factor and principal components models. Technometrics 20(4) 397--405.

See Also

impute.svd, plot.cvsvd, print.cvsvd, summary.cvsvd

Examples

  # generate a rank-2 matrix plus noise
  n <- 50; p <- 20; k <- 2
  u <- matrix( rnorm( n*k ), n, k )
  v <- matrix( rnorm( p*k ), p, k )
  e <- matrix( rnorm( n*p ), n, p )
  x <- u %*% t(v) + e
  
  # perform 5-fold Wold-style cross-validation
  (cvw <- cv.svd.wold( x, 5, maxrank=10 ))
  
  # perform (2,2)-fold Gabriel-style cross-validation
  (cvg <- cv.svd.gabriel( x, 2, 2, maxrank=10 ))
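
  # inspect and plot the estimated prediction errors
  # (illustrative; uses the summary and plot methods listed under See Also)
  summary( cvg )
  plot( cvg )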