Learn R Programming

bigPCAcpp (version 0.9.0)

pca_bigmatrix: Principal component analysis for bigmemory::big.matrix inputs

Description

Perform principal component analysis (PCA) directly on a bigmemory::big.matrix without copying the data into R memory. The exported helpers mirror the structure of base R's prcomp() while avoiding the need to materialise large matrices.

Usage

resolve_big_pointer(x, arg, allow_null = FALSE)

pca_scores_bigmatrix( xpMat, rotation, center, scale, ncomp = -1L, block_size = 1024L )

pca_variable_loadings(rotation, sdev)

pca_variable_correlations(rotation, sdev, column_sd, scale = NULL)

pca_variable_contributions(loadings)

pca_individual_contributions(scores, sdev, total_weight = NA_real_)

pca_individual_cos2(scores)

pca_variable_cos2(correlations)

# S3 method for bigpca summary(object, ...)

# S3 method for summary.bigpca print(x, digits = max(3, getOption("digits") - 3), ...)

# S3 method for bigpca plot( x, y, type = c("scree", "contributions", "correlation_circle", "biplot"), max_components = 25L, component = 1L, top_n = 20L, components = c(1L, 2L), data = NULL, draw = TRUE, ... )

Value

For pca_bigmatrix(), a bigpca object mirroring a prcomp result with elements sdev, rotation, optional center and scale vectors, column_sd, eigenvalues, explained_variance, cumulative_variance, and the sample covariance matrix. The object participates in S3 generics such as summary() and plot().

A numeric matrix of scores with rows corresponding to observations and columns to retained components.

A numeric matrix containing variable loadings for each component.

A numeric matrix of correlations between variables and components.

A numeric matrix where each entry represents the contribution of a variable to a component.

For summary.bigpca(), a summary.bigpca object containing component importance measures.

Arguments

x

A summary.bigpca object.

arg

Character string naming the argument being validated. Used to construct informative error messages.

allow_null

Logical flag indicating whether NULL is accepted for the argument. When TRUE, a NULL input is returned unchanged.

xpMat

Either a bigmemory::big.matrix or an external pointer such as mat@address that references the source big.matrix.

rotation

A rotation matrix such as the rotation element returned by pca_bigmatrix().

center

For pca_scores_bigmatrix(), a numeric vector of column means (optional).

scale

Optional numeric vector of scaling factors returned by pca_bigmatrix(). If supplied, it indicates the PCA was performed on standardised variables.

ncomp

Number of components to retain. Use a non-positive value to keep all components returned by the decomposition.

block_size

Number of rows to process per block when streaming data through BLAS kernels. Larger values improve throughput at the cost of additional memory.

sdev

A numeric vector of component standard deviations, typically the sdev element from pca_bigmatrix().

column_sd

A numeric vector with the marginal standard deviation of each original variable. When scale is supplied, correlations are computed on the standardised scale without rescaling by column_sd.

loadings

A numeric matrix such as the result of pca_variable_loadings().

scores

For pca_individual_contributions() and pca_individual_cos2(), a numeric matrix of component scores where rows correspond to observations and columns to components.

total_weight

Optional positive scalar giving the effective number of observations to use when computing contributions. Defaults to the number of rows in scores.

correlations

For pca_variable_cos2(), a numeric matrix of correlations between variables and components.

object

A bigpca object created by pca_bigmatrix(), pca_stream_bigmatrix(), or related helpers.

...

Additional arguments passed to plotting helpers.

digits

Number of significant digits to display when printing importance metrics.

y

Currently unused.

type

The plot to draw. Options include "scree" (variance explained), "contributions" (top contributing variables), "correlation_circle" (variable correlations with selected components), and "biplot" (joint display of scores and loadings).

max_components

Maximum number of components to display in scree plots.

component

Component index to highlight when drawing contribution plots.

top_n

Number of variables to display in contribution plots.

components

Length-two integer vector selecting the components for correlation circle and biplot views.

data

Optional data source (matrix, data frame, bigmemory::big.matrix, or external pointer) used to compute scores for biplots when x$scores is unavailable.

draw

Logical; if FALSE, return the data prepared for the selected plot instead of drawing it.

Functions

  • pca_scores_bigmatrix(): Project observations into principal component space while streaming from a big.matrix.

  • pca_variable_loadings(): Compute variable loadings (covariances between original variables and components).

  • pca_variable_correlations(): Compute variable-component correlations given column standard deviations.

  • pca_variable_contributions(): Derive the relative contribution of each variable to the retained components.

  • pca_individual_contributions(): Compute the relative contribution of individual observations to each component.

  • pca_individual_cos2(): Compute squared cosine values measuring the quality of representation for individual observations.

  • pca_variable_cos2(): Compute squared cosine values measuring the quality of representation for variables.

  • summary(bigpca): Summarise the component importance metrics for a bigpca result.

  • print(summary.bigpca): Print the component importance summary produced by summary.bigpca().

  • plot(bigpca): Visualise PCA diagnostics such as scree, correlation circle, contribution, and biplot displays.

See Also

bigpca, pca_scores_bigmatrix(), pca_variable_loadings(), pca_variable_correlations(), pca_variable_contributions(), and the streaming variants pca_stream_bigmatrix() and companions.

bigpca

bigpca

bigpca

Examples

Run this code
if (FALSE) { # requireNamespace("bigmemory", quietly = TRUE)
set.seed(123)
mat <- bigmemory::as.big.matrix(matrix(rnorm(40), nrow = 10))
pca <- pca_bigmatrix(mat, center = TRUE, scale = TRUE, ncomp = 3)
scores <- pca_scores_bigmatrix(mat, pca$rotation, pca$center, pca$scale, ncomp = 3)
loadings <- pca_variable_loadings(pca$rotation, pca$sdev)
correlations <- pca_variable_correlations(pca$rotation, pca$sdev, pca$column_sd, pca$scale)
contributions <- pca_variable_contributions(loadings)
list(scores = scores, loadings = loadings, correlations = correlations,
     contributions = contributions)
}

Run the code above in your browser using DataLab