match.bca.gen: Block Coordinate Ascent Method for General (Balanced or Unbalanced) Data

Description

Solve a feature matching problem by block coordinate ascent.

Usage

match.bca.gen(x, unit = NULL, cluster = NULL, w = NULL, 
	method = c("cyclical", "random"), control = list())

Value

A list of class matchFeat with components

cluster: integer vector of cluster assignments (length = nrow(x))
objective: minimum objective value
mu: sample mean for each cluster/class (feature-by-cluster matrix)
V: sample covariance for each cluster/class (feature-by-feature-by-cluster 3D array)
size: integer vector of cluster sizes
call: function call

Arguments

x: data matrix (rows=instances, columns=features)
unit: vector of unit labels (length = number of rows of x)
cluster: either a single integer giving the number of classes/clusters to assign the feature vectors to, or an integer vector specifying the initial cluster assignment
w: feature weights in the loss function. Can be specified as a single positive number, a vector, or a positive definite matrix
method: sweeping method for block coordinate ascent: cyclical or random (simple random sampling without replacement)
control: optional list of tuning parameters
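
As a rough illustration of the argument descriptions above, the base R sketch below builds the three documented forms of w and the two sweeping orders. The names p, n_blocks, w_scalar, w_vector, w_matrix, order_cyclical and order_random are hypothetical and chosen for this example only. It also assumes that a vector-valued w carries one weight per feature and that one sweep visits each block once; neither point is stated here.

```r
p <- 4  # hypothetical number of features (columns of x)

# Three forms of w accepted according to the argument description
w_scalar <- 2                    # single positive number
w_vector <- c(1, 0.5, 2, 1)      # assumed: one positive weight per feature
w_matrix <- matrix(c(2, 1, 0, 0,
                     1, 2, 1, 0,
                     0, 1, 2, 1,
                     0, 0, 1, 2), nrow = p, ncol = p)

# A matrix-valued w should be symmetric positive definite.
# chol() throws an error when its argument is not positive definite.
is_sym <- isSymmetric(w_matrix)
is_pd  <- !inherits(try(chol(w_matrix), silent = TRUE), "try-error")

# Sweeping orders over a hypothetical set of 5 blocks
n_blocks <- 5
order_cyclical <- seq_len(n_blocks)    # fixed order 1, 2, ..., n_blocks
set.seed(1)
order_random   <- sample.int(n_blocks) # random permutation (no replacement)
```

Checking a weight matrix this way before calling match.bca.gen is optional; it only makes an invalid w fail early with a clearer message.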

Details

If cluster is an integer vector, it must have the same length as unit and its values must range between 1 and the number of clusters.
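
The two conditions above can be checked before calling the function. The sketch below uses a hypothetical helper, check_init_cluster, which is not part of matchFeat. The toy initial assignment additionally assumes that labels should be distinct within each unit; the lower bound on the number of clusters used in the Examples suggests this, but it is not stated here.

```r
# Hypothetical helper (not part of matchFeat): checks the documented
# conditions on an initial cluster vector
check_init_cluster <- function(cluster, unit, m) {
  stopifnot(length(cluster) == length(unit))   # same length as unit
  stopifnot(all(cluster == round(cluster)))    # integer-valued labels
  stopifnot(all(cluster >= 1 & cluster <= m))  # labels between 1 and m
  invisible(TRUE)
}

# Toy unbalanced data: 3 units with 3, 2 and 3 observations
unit <- c(1, 1, 1, 2, 2, 3, 3, 3)
m <- max(table(unit))  # 3 clusters

# Assumption: labels are distinct within a unit, so they are drawn
# without replacement from 1..m separately for each unit
set.seed(1)
init <- ave(unit, unit, FUN = function(u) sample.int(m, length(u)))

ok <- check_init_cluster(init, unit, m)
```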

The list control can contain a field maxit, an integer that fixes the maximum number of algorithm iterations.
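
A minimal sketch of passing maxit through control follows. The value 50 is an arbitrary choice for illustration, not a recommended setting, and the call is guarded so that the snippet still runs when matchFeat is not installed.

```r
# Cap the number of algorithm iterations (50 is an arbitrary example value)
ctrl <- list(maxit = 50L)

if (requireNamespace("matchFeat", quietly = TRUE)) {
  data(optdigits, package = "matchFeat")
  # Number of clusters: largest number of observations in any unit
  m <- max(table(optdigits$unit))
  result <- matchFeat::match.bca.gen(optdigits$x, optdigits$unit, m,
                                     control = ctrl)
}
```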

References

Degras (2022). "Scalable feature matching across large data collections." doi:10.1080/10618600.2022.2074429
Wright (2015). "Coordinate descent algorithms." https://arxiv.org/abs/1502.04759

Examples

library(matchFeat)
data(optdigits)
nobs <- nrow(optdigits$x) # total number of observations
n <- length(unique(optdigits$unit)) # number of statistical units
rmv <- sample.int(nobs, n-1) # remove (n-1) observations to make data unbalanced
# smallest possible number of clusters = largest number of observations in any unit;
# lower values will result in an error message
min.m <- max(table(optdigits$unit[-rmv]))
m <- min.m
result <- match.bca.gen(optdigits$x[-rmv,], optdigits$unit[-rmv], m)
