
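cor.sparse computes correlation coefficients between the columns of sparse matrices, and is faster than the cor function. However, because the resulting matrix is not sparse, this function still cannot be used with very large matrices.
cor.sparse(X, Y = NULL, cov = FALSE)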
The result is a square (non-sparse!) Matrix with the correlation coefficients between the columns of X. When Y is specified, the result is a rectangular (non-sparse!) Matrix of size ncol(X) by ncol(Y) with the correlation coefficients between the columns of X and Y.
When cov = TRUE, the result is a covariance matrix (i.e. a non-normalized correlation).
The computation of the standard deviation (to turn covariance into correlation) is trivial in the case Y = NULL, as the variances (the squared standard deviations) are found on the diagonal of the covariance matrix. In the case Y != NULL
uses the principle that
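The two cases can be illustrated with a minimal sketch. It assumes the shortcut identity sum((x - mean(x))^2) = sum(x^2) - n * mean(x)^2, which lets the standard deviations be computed from column sums without centring (and so densifying) the input. The helper name cor_sparse_sketch is hypothetical, and whether the package relies on exactly this identity is an assumption; only functions from the Matrix package and base R are used.

```r
library(Matrix)

# Hypothetical helper for illustration only; not the package's cor.sparse.
cor_sparse_sketch <- function(X, Y = NULL) {
  n <- nrow(X)
  muX <- colMeans(X)
  if (is.null(Y)) {
    # Covariance from the sparse cross-product, corrected for the column means
    covmat <- (as.matrix(crossprod(X)) - n * outer(muX, muX)) / (n - 1)
    # Y = NULL: the variances sit on the diagonal of the covariance matrix
    sdX <- sqrt(diag(covmat))
    return(covmat / outer(sdX, sdX))
  }
  stopifnot(nrow(X) == nrow(Y))
  muY <- colMeans(Y)
  covmat <- (as.matrix(crossprod(X, Y)) - n * outer(muX, muY)) / (n - 1)
  # Y given: no diagonal to read from, so use sum(x^2) - n * mean(x)^2
  sdX <- sqrt((colSums(X^2) - n * muX^2) / (n - 1))
  sdY <- sqrt((colSums(Y^2) - n * muY^2) / (n - 1))
  covmat / outer(sdX, sdY)
}

# Small deterministic inputs: 5 observations, 3 and 2 columns
X <- Matrix(c(1, 0, 0, 2, 0,
              0, 3, 0, 0, 1,
              0, 0, 4, 1, 0), nrow = 5, sparse = TRUE)
Y <- Matrix(c(0, 1, 0, 0, 2,
              5, 0, 0, 1, 0), nrow = 5, sparse = TRUE)

dim(cor_sparse_sketch(X, Y))  # one row per column of X, one column per column of Y
all.equal(cor_sparse_sketch(X), cor(as.matrix(X)), check.attributes = FALSE)
```

Only the cross-product is turned into a regular matrix here; X and Y themselves stay sparse throughout.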
cor in the base packages, cosSparse, assocSparse for other sparse association measures.
# reasonably fast (though not instantly!) with
# sparse matrices up to a resulting matrix size of 1e8 cells.
# However, the calculations and the resulting matrix take up lots of memory
X <- rSparseMatrix(1e4, 1e4, 1e5)
system.time(M <- cor.sparse(X))
print(object.size(M), units = "auto") # more than 750 Mb
# Most values are low, so it often makes sense
# to remove low values to keep results sparse
M <- drop0(M, tol = 0.4)
print(object.size(M), units = "auto") # normally reduces size by half or more
length(M@x) / prod(dim(M)) # down to less than 0.05% non-zero entries
# comparison with other methods
# cor.sparse is much faster than cor from the stats package
# but cosSparse is even quicker than both!
X <- rSparseMatrix(1e3, 1e3, 1e4)
X2 <- as.matrix(X)
# if there is a warning, try again with different random X
system.time(McorRegular <- cor(X2))
system.time(McorSparse <- cor.sparse(X))
system.time(McosSparse <- cosSparse(X))
# cor and cor.sparse give identical results
all.equal(McorSparse, McorRegular)
# cor.sparse and cosSparse are not identical, but close
McosSparse <- as.matrix(McosSparse)
dimnames(McosSparse) <- NULL
all.equal(McorSparse, McosSparse)
# Actually, cosSparse and cor.sparse are *almost* identical!
cor(as.dist(McorSparse), as.dist(McosSparse))
# Visually it looks completely identical
# Note: this takes some time to plot
plot(as.dist(McorSparse), as.dist(McosSparse))
# So: consider using cosSparse instead of cor or cor.sparse.
# With sparse matrices, this gives mostly the same results,
# but much larger matrices are possible
# and the computations are quicker and more sparse
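One way to see why the two measures agree so closely: a correlation is the cosine similarity of the mean-centred columns, and in a very sparse matrix the column means are close to zero, so centring changes little. The following base-R sketch illustrates this on a small random 0/1 matrix. The helper cosine_cols is hypothetical and only stands in for the idea of a column-wise cosine; it is not the package's cosSparse.

```r
# Hypothetical helper: cosine similarity between the columns of a regular matrix
cosine_cols <- function(A) {
  norms <- sqrt(colSums(A^2))
  crossprod(A) / outer(norms, norms)
}

set.seed(1)
A <- matrix(0, nrow = 2000, ncol = 5)
A[sample(length(A), 200)] <- 1  # about 2% non-zero entries

# Centring the columns turns the cosine into the Pearson correlation
A_centred <- scale(A, center = TRUE, scale = FALSE)
all.equal(cor(A), cosine_cols(A_centred), check.attributes = FALSE)

# Without centring, the cosine differs from the correlation only slightly,
# because the column means of a sparse matrix are small
max(abs(cor(A) - cosine_cols(A)))
```

The cosine never needs the centred matrix, so its zero entries stay zero. That is consistent with the remark above that the cosine results are sparser and scale to larger matrices.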