bigmemory::big.matrix inputs
Perform principal component analysis (PCA) directly on a
bigmemory::big.matrix without copying the data into R memory. The
exported helpers mirror the structure of base R's prcomp() while avoiding
the need to materialise large matrices.
resolve_big_pointer(x, arg, allow_null = FALSE)
pca_scores_bigmatrix(
  xpMat,
  rotation,
  center,
  scale,
  ncomp = -1L,
  block_size = 1024L
)
pca_variable_loadings(rotation, sdev)
pca_variable_correlations(rotation, sdev, column_sd, scale = NULL)
pca_variable_contributions(loadings)
pca_individual_contributions(scores, sdev, total_weight = NA_real_)
pca_individual_cos2(scores)
pca_variable_cos2(correlations)
# S3 method for bigpca
summary(object, ...)
# S3 method for summary.bigpca
print(x, digits = max(3, getOption("digits") - 3), ...)
# S3 method for bigpca
plot(
  x,
  y,
  type = c("scree", "contributions", "correlation_circle", "biplot"),
  max_components = 25L,
  component = 1L,
  top_n = 20L,
  components = c(1L, 2L),
  data = NULL,
  draw = TRUE,
  ...
)
For pca_bigmatrix(), a bigpca object mirroring a prcomp result
with elements sdev, rotation, optional center and scale vectors,
column_sd, eigenvalues, explained_variance, cumulative_variance, and
the sample covariance matrix. The object participates in S3 generics such as
summary() and plot().
A numeric matrix of scores with rows corresponding to observations and columns to retained components.
A numeric matrix containing variable loadings for each component.
A numeric matrix of correlations between variables and components.
A numeric matrix where each entry represents the contribution of a variable to a component.
For summary.bigpca(), a summary.bigpca object containing
component importance measures.
A summary.bigpca object.
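The relationship between sdev, eigenvalues, explained_variance, and cumulative_variance can be sketched in base R. This is an illustration only: prcomp() on an ordinary in-memory matrix stands in for pca_bigmatrix(), all components are retained, and the assumption that explained variance is each eigenvalue divided by the sum of all eigenvalues follows the usual prcomp convention rather than anything confirmed for this package.

```r
# Base R sketch; prcomp() is a stand-in for pca_bigmatrix() (assumption).
set.seed(1)
x <- matrix(rnorm(200), nrow = 50, ncol = 4)
fit <- prcomp(x, center = TRUE, scale. = TRUE)

# Eigenvalues are the squared component standard deviations.
eigenvalues <- fit$sdev^2

# Share of total variance per component, and its running total.
explained_variance <- eigenvalues / sum(eigenvalues)
cumulative_variance <- cumsum(explained_variance)

# A table similar in spirit to what a PCA summary reports.
importance <- rbind(
  "Standard deviation"     = fit$sdev,
  "Proportion of Variance" = explained_variance,
  "Cumulative Proportion"  = cumulative_variance
)
colnames(importance) <- paste0("PC", seq_along(fit$sdev))
round(importance, 3)
```

Because the variables are standardised here, the eigenvalues sum to the number of columns. When fewer components are retained than there are variables, the denominator used for the proportions matters, and the package may handle that case differently.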
arg: Character string naming the argument being validated. Used to construct informative error messages.
allow_null: Logical flag indicating whether NULL is accepted for the
argument. When TRUE, a NULL input is returned unchanged.
xpMat: Either a bigmemory::big.matrix or an external pointer such
as mat@address that references the source big.matrix.
rotation: A rotation matrix such as the rotation element returned by
pca_bigmatrix().
center: For pca_scores_bigmatrix(), a numeric vector of column means
(optional).
scale: Optional numeric vector of scaling factors returned by
pca_bigmatrix(). If supplied, it indicates the PCA was performed on
standardised variables.
ncomp: Number of components to retain. Use a non-positive value to keep all components returned by the decomposition.
block_size: Number of rows to process per block when streaming data through BLAS kernels. Larger values improve throughput at the cost of additional memory.
sdev: A numeric vector of component standard deviations, typically the
sdev element from pca_bigmatrix().
column_sd: A numeric vector with the marginal standard deviation of
each original variable. When scale is supplied, correlations are computed
on the standardised scale without rescaling by column_sd.
loadings: A numeric matrix such as the result of
pca_variable_loadings().
scores: For pca_individual_contributions() and
pca_individual_cos2(), a numeric matrix of component scores where rows
correspond to observations and columns to components.
total_weight: Optional positive scalar giving the effective number of
observations to use when computing contributions. Defaults to the number of
rows in scores.
correlations: For pca_variable_cos2(), a numeric matrix of
correlations between variables and components.
A bigpca object created by pca_bigmatrix(),
pca_stream_bigmatrix(), or related helpers.
...: Additional arguments passed to plotting helpers.
digits: Number of significant digits to display when printing importance metrics.
y: Currently unused.
type: The plot to draw. Options include "scree" (variance explained), "contributions" (top contributing variables), "correlation_circle" (variable correlations with selected components), and "biplot" (joint display of scores and loadings).
max_components: Maximum number of components to display in scree plots.
component: Component index to highlight when drawing contribution plots.
top_n: Number of variables to display in contribution plots.
components: Length-two integer vector selecting the components for correlation circle and biplot views.
data: Optional data source (matrix, data frame, bigmemory::big.matrix,
or external pointer) used to compute scores for biplots when
x$scores is unavailable.
draw: Logical; if FALSE, return the data prepared for the selected
plot instead of drawing it.
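To illustrate what processing a fixed number of rows per block means, the following base R sketch projects an ordinary matrix onto a rotation one block at a time. The helper project_in_blocks() is hypothetical and written only for this illustration; it is not part of the package, and the package's compiled implementation may differ in every detail.

```r
# Hypothetical helper for illustration only; not part of the package.
project_in_blocks <- function(x, rotation, center = NULL, scale = NULL,
                              block_size = 1024L) {
  n <- nrow(x)
  scores <- matrix(0, nrow = n, ncol = ncol(rotation))
  for (start in seq(1L, n, by = block_size)) {
    rows <- start:min(start + block_size - 1L, n)
    # Only this slice of rows is materialised at a time.
    block <- x[rows, , drop = FALSE]
    if (!is.null(center)) block <- sweep(block, 2L, center, "-")
    if (!is.null(scale)) block <- sweep(block, 2L, scale, "/")
    scores[rows, ] <- block %*% rotation
  }
  scores
}

set.seed(42)
x <- matrix(rnorm(300), nrow = 60, ncol = 5)
fit <- prcomp(x, center = TRUE, scale. = TRUE)  # stand-in for pca_bigmatrix()
blocked <- project_in_blocks(x, fit$rotation, fit$center, fit$scale,
                             block_size = 16L)
```

The blocked result matches the scores stored in fit$x up to floating-point tolerance. The block size changes only how many rows are held in memory at once, not the result.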
pca_scores_bigmatrix(): Project observations into principal component
space while streaming from a big.matrix.
pca_variable_loadings(): Compute variable loadings (covariances between
original variables and components).
pca_variable_correlations(): Compute variable-component correlations given
column standard deviations.
pca_variable_contributions(): Derive the relative contribution of each variable
to the retained components.
pca_individual_contributions(): Compute the relative contribution of individual
observations to each component.
pca_individual_cos2(): Compute squared cosine values measuring the quality
of representation for individual observations.
pca_variable_cos2(): Compute squared cosine values measuring the quality
of representation for variables.
summary(bigpca): Summarise the component importance metrics for a
bigpca result.
print(summary.bigpca): Print the component importance summary produced by
summary.bigpca().
plot(bigpca): Visualise PCA diagnostics such as scree, correlation
circle, contribution, and biplot displays.
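The diagnostics listed above can be sketched in base R under common PCA conventions. This is an assumption-laden illustration, not the package's implementation: prcomp() on standardised data stands in for pca_bigmatrix(), contributions are expressed as proportions (the package may report percentages), and individual contributions are normalised by column sums, whereas the package's total_weight argument suggests it normalises by an effective observation count.

```r
set.seed(7)
x <- matrix(rnorm(400), nrow = 80, ncol = 5,
            dimnames = list(NULL, paste0("v", 1:5)))
fit <- prcomp(x, center = TRUE, scale. = TRUE)  # stand-in for pca_bigmatrix()

# Variable loadings: eigenvectors scaled by component standard deviations.
loadings <- sweep(fit$rotation, 2L, fit$sdev, "*")

# With standardised variables the loadings equal the
# variable-component correlations.
correlations <- loadings

# Variable contributions: each squared loading as a share of its column total.
var_contrib <- sweep(loadings^2, 2L, colSums(loadings^2), "/")

# Variable cos2: squared correlations (quality of representation).
var_cos2 <- correlations^2

# Individual cos2: squared scores as a share of each observation's
# squared distance from the origin in component space.
ind_cos2 <- fit$x^2 / rowSums(fit$x^2)

# Individual contributions: squared scores as a share of each column total.
ind_contrib <- sweep(fit$x^2, 2L, colSums(fit$x^2), "/")
```

With all components retained, each row of var_cos2 and ind_cos2 sums to one, and each column of var_contrib and ind_contrib sums to one.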
bigpca, pca_scores_bigmatrix(), pca_variable_loadings(),
pca_variable_correlations(), pca_variable_contributions(), and the
streaming variants pca_stream_bigmatrix() and companions.
if (requireNamespace("bigmemory", quietly = TRUE)) {
  set.seed(123)
  # Small big.matrix with 10 observations and 4 variables
  mat <- bigmemory::as.big.matrix(matrix(rnorm(40), nrow = 10))
  pca <- pca_bigmatrix(mat, center = TRUE, scale = TRUE, ncomp = 3)
  # Project the observations onto the retained components
  scores <- pca_scores_bigmatrix(mat, pca$rotation, pca$center, pca$scale, ncomp = 3)
  # Variable-level diagnostics derived from the fitted rotation
  loadings <- pca_variable_loadings(pca$rotation, pca$sdev)
  correlations <- pca_variable_correlations(pca$rotation, pca$sdev, pca$column_sd, pca$scale)
  contributions <- pca_variable_contributions(loadings)
  list(scores = scores, loadings = loadings, correlations = correlations,
       contributions = contributions)
}