svdr: Find a few approximate largest singular values and corresponding singular vectors of a matrix.

Description

The randomized method for truncated SVD by P. G. Martinsson and colleagues finds a few approximate largest singular values and corresponding singular vectors of a sparse or dense matrix. It is a fast and memory-efficient way to compute a partial SVD, similar in performance for many problems to irlba. The svdr method is a block method and may produce more accurate estimations with less work for problems with clustered large singular values (see the examples). In other problems, irlba may exhibit faster convergence.

Usage

svdr(x, k, it = 3, extra = 10, center = NULL, Q = NULL)

Arguments

numeric real- or complex-valued matrix or real-valued sparse matrix.

dimension of subspace to estimate (number of approximate singular values to compute).

fixed number of algorithm iterations, larger values improve accuracy.

extra

number of extra vectors of dimension ncol(x), larger values generally improve accuracy (with increased computational cost).

center

optional column centering vector whose values are implicitly subtracted from each column of A without explicitly forming the centered matrix (preserving sparsity). Optionally specify center=TRUE as shorthand for center=colMeans(x). Use for efficient principal components computation.

optional initial random matrix, defaults to a matrix of size ncol(x) by k + extra with entries sampled from a normal random distribution.

Value

Returns a list with entries:

d:: k approximate singular values
u:: k approximate left singular vectors
v:: k approximate right singular vectors
mprod:: The total number of matrix vector products carried out

References

Finding structure with randomness: Stochastic algorithms for constructing approximate matrix decompositions N. Halko, P. G. Martinsson, J. Tropp. Sep. 2009.

Examples

Run this code

# NOT RUN {
set.seed(1)

A <- matrix(runif(400), nrow=20)
svdr(A, 3)$d

# Compare with svd
svd(A)$d[1:3]

# Compare with irlba
irlba(A, 3)$d

# }
# NOT RUN {
# A problem with clustered large singular values where svdr out-performs irlba.
tprolate <- function(n, w=0.25)
{
  a <- rep(0, n)
  a[1] <- 2 * w
  a[2:n] <- sin( 2 * pi * w * (1:(n-1)) ) / ( pi * (1:(n-1)) )
  toeplitz(a)
}

x <- tprolate(512)
set.seed(1)
tL <- system.time(L <- irlba(x, 20))
tR <- system.time(R <- svdr(x, 20))
S <- svd(x)
plot(S$d)
data.frame(time=c(tL[3], tR[3]),
           error=sqrt(c(crossprod(L$d - S$d[1:20]), crossprod(R$d - S$d[1:20]))),
           row.names=c("IRLBA", "Randomized SVD"))

# But, here is a similar problem with clustered singular values where svdr
# doesn't out-perform irlba as easily...clusters of singular values are,
# in general, very hard to deal with!
# (This example based on https://github.com/bwlewis/irlba/issues/16.)
set.seed(1)
s <- svd(matrix(rnorm(200 * 200), 200))
x <- s$u %*% (c(exp(-(1:100)^0.3) * 1e-12 + 1, rep(0.5, 100)) * t(s$v))
tL <- system.time(L <- irlba(x, 5))
tR <- system.time(R <- svdr(x, 5))
S <- svd(x)
plot(S$d)
data.frame(time=c(tL[3], tR[3]),
           error=sqrt(c(crossprod(L$d - S$d[1:5]), crossprod(R$d - S$d[1:5]))),
           row.names=c("IRLBA", "Randomized SVD"))
# }