This function implements the sequentially-allocated latent structure optimization (SALSO) to find a clustering or feature allocation that minimizes one of several loss functions. The SALSO method was presented at the workshop "Bayesian Nonparametric Inference: Dependence Structures and their Applications" in Oaxaca, Mexico on December 6, 2017.
salso(expectedPairwiseAllocationMatrix, structure = c("clustering",
"featureAllocation")[1], loss = c("squaredError", "absoluteError", "binder",
"lowerBoundVariationOfInformation")[1], nCandidates = 100,
budgetInSeconds = 10, maxSize = 0)
expectedPairwiseAllocationMatrix: An n-by-n symmetric matrix whose (i,j) element gives the estimated expected number of times that items i and j are in the same subset (i.e., cluster or feature).
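For clusterings, the (i,j) element is simply the fraction of sampled clusterings in which items i and j share a cluster. The examples below build this matrix with expectedPairwiseAllocationMatrix; the sketch here only illustrates the arithmetic in base R, assuming the sampled clusterings are stored as rows of a label matrix. The helper name pairwiseCoclustering and the data are hypothetical.

```r
# Hypothetical helper (not part of the package): estimate the n-by-n matrix of
# pairwise co-clustering probabilities from sampled clusterings.
# Rows of 'draws' are samples, columns are items, entries are cluster labels.
pairwiseCoclustering <- function(draws) {
  draws <- as.matrix(draws)
  n <- ncol(draws)
  counts <- matrix(0, n, n)
  for (s in seq_len(nrow(draws))) {
    labels <- draws[s, ]
    # Adds 1 wherever items i and j share a cluster in sample s
    counts <- counts + outer(labels, labels, "==")
  }
  counts / nrow(draws)
}

# Four made-up clusterings of three items
draws <- rbind(c(1, 1, 2),
               c(1, 2, 2),
               c(3, 3, 3),
               c(1, 1, 2))
P <- pairwiseCoclustering(draws)
P[1, 2]  # items 1 and 2 share a cluster in 3 of 4 samples: 0.75
```

Only the partition matters, not the label values: the third sample uses label 3 but still places all items together.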
structure: Either "clustering" or "featureAllocation", indicating whether the optimization should produce a clustering or a feature allocation.
loss: One of "squaredError", "absoluteError", "binder", or "lowerBoundVariationOfInformation", indicating that the optimization should minimize squared error loss, absolute error loss, Binder loss (Binder 1978), or the lower bound of the variation of information loss (Wade & Ghahramani 2017), respectively. When structure = "clustering", the first three are equivalent. When structure = "featureAllocation", only the first two are valid.
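To make two of these criteria concrete for clusterings, here is a minimal base R sketch of the expected Binder loss (unit costs) and the lower bound of the expected variation of information from Wade & Ghahramani (2017), each evaluated from the pairwise probability matrix alone. The function names and the probability matrix are hypothetical, and the package may scale these quantities differently.

```r
# Expected Binder loss with unit costs: for each pair i < j, the probability
# that the candidate clustering and the random clustering disagree on the pair.
expectedBinderLoss <- function(labels, P) {
  same <- outer(labels, labels, "==")
  pairLoss <- ifelse(same, 1 - P, P)
  sum(pairLoss[upper.tri(pairLoss)])
}

# Lower bound of the expected variation of information (Wade & Ghahramani 2017),
# obtained by moving the expectation inside the logarithms (Jensen); base 2.
expectedVILowerBound <- function(labels, P) {
  same <- outer(labels, labels, "==")
  (sum(log2(rowSums(same))) + sum(log2(rowSums(P))) -
     2 * sum(log2(rowSums(same * P)))) / length(labels)
}

# Made-up pairwise co-clustering probabilities for three items
P <- matrix(c(1,    0.75, 0.25,
              0.75, 1,    0.5,
              0.25, 0.5,  1), nrow = 3, byrow = TRUE)
candidates <- list(c(1, 1, 2), c(1, 1, 1), c(1, 2, 3))
sapply(candidates, expectedBinderLoss, P = P)    # c(1, 1.5, 1.5)
sapply(candidates, expectedVILowerBound, P = P)
```

Both functions depend on the random clustering only through P, which is why salso needs nothing more than the expected pairwise allocation matrix.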
nCandidates: The maximum number of candidates to consider. Fewer than nCandidates may be considered if the time limit in budgetInSeconds is exceeded. The computational cost is linear in the number of candidates, and the returns from additional candidates diminish rapidly.
budgetInSeconds: The maximum number of seconds to devote to the optimization. Once this time is exceeded, no more candidates are considered.
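How nCandidates, budgetInSeconds, and maxSize interact can be illustrated with a simplified sequential allocation loop: each candidate allocates the items one at a time in a random order, placing each item in the existing or new cluster that minimizes expected Binder loss among the items allocated so far, and the best candidate is kept. This is a hypothetical sketch for clusterings only, not the package's implementation; the function name is made up.

```r
# Hypothetical, simplified sketch of sequential allocation with random restarts.
# It minimizes expected Binder loss only and is NOT the package's implementation.
sequentialAllocationSketch <- function(P, nCandidates = 100,
                                       budgetInSeconds = 10, maxSize = 0) {
  n <- nrow(P)
  # Expected Binder loss restricted to the items in 'idx'
  binderOnSubset <- function(labels, idx) {
    sub <- P[idx, idx, drop = FALSE]
    same <- outer(labels[idx], labels[idx], "==")
    pairLoss <- ifelse(same, 1 - sub, sub)
    sum(pairLoss[upper.tri(pairLoss)])
  }
  start <- Sys.time()
  best <- NULL
  bestLoss <- Inf
  for (candidate in seq_len(nCandidates)) {
    elapsed <- as.numeric(difftime(Sys.time(), start, units = "secs"))
    # Always try at least one candidate; stop once the time budget is spent
    if (candidate > 1 && elapsed > budgetInSeconds) break
    ord <- sample.int(n)  # random order in which items are allocated
    labels <- integer(n)
    nClusters <- 0
    for (k in seq_len(n)) {
      i <- ord[k]
      options <- seq_len(nClusters)
      # A new cluster is allowed unless a positive maxSize has been reached
      if (maxSize == 0 || nClusters < maxSize) options <- c(options, nClusters + 1)
      losses <- vapply(options, function(cl) {
        labels[i] <- cl  # modifies a local copy only
        binderOnSubset(labels, ord[seq_len(k)])
      }, numeric(1))
      labels[i] <- options[which.min(losses)]
      nClusters <- max(nClusters, labels[i])
    }
    loss <- binderOnSubset(labels, seq_len(n))
    if (loss < bestLoss) {
      best <- labels
      bestLoss <- loss
    }
  }
  list(labels = best, loss = bestLoss)
}

set.seed(1)
P <- matrix(c(1,    0.75, 0.25,
              0.75, 1,    0.5,
              0.25, 0.5,  1), nrow = 3, byrow = TRUE)
res <- sequentialAllocationSketch(P, nCandidates = 100, budgetInSeconds = 5)
resOne <- sequentialAllocationSketch(P, nCandidates = 10, maxSize = 1)
```

Some random orders lead the greedy pass to a worse clustering, so restarting from several orders helps; each restart costs one full pass, which matches the linear cost in nCandidates described above.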
maxSize: Either zero or a positive integer. If a positive integer, the optimization is constrained to produce solutions whose number of clusters or number of features is no more than the supplied value. If zero, the size is not constrained. To avoid overfitting in feature allocation estimation, it is recommended that maxSize be close to the mean number of features (i.e., columns) in the feature allocations that generated the expectedPairwiseAllocationMatrix.
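For feature allocations, if Z is an n-by-K binary matrix (rows are items, columns are features), then the (i,j) element of Z %*% t(Z) counts the features shared by items i and j, and its diagonal counts each item's features. Averaging over sampled allocations therefore estimates the expected pairwise allocation matrix. The sketch below assumes the samples are stored as a list of such matrices, uses made-up data, and computes a suggested maxSize as the rounded mean number of columns, following the recommendation above.

```r
# Made-up sampled feature allocations for four items; the number of
# features (columns) differs from sample to sample.
Z1 <- matrix(c(1, 0,
               1, 1,
               0, 1,
               0, 0), nrow = 4, byrow = TRUE)
Z2 <- matrix(c(1,
               1,
               0,
               1), nrow = 4, byrow = TRUE)
Z3 <- matrix(c(1, 0, 0,
               1, 1, 0,
               0, 1, 0,
               0, 0, 1), nrow = 4, byrow = TRUE)
draws <- list(Z1, Z2, Z3)

# tcrossprod(Z) equals Z %*% t(Z): shared-feature counts for every pair of items
expectedCounts <- Reduce(`+`, lapply(draws, tcrossprod)) / length(draws)

# Suggested maxSize: mean number of features per sampled allocation, rounded
suggestedMaxSize <- round(mean(vapply(draws, ncol, integer(1))))
```

With the package loaded, these could then be passed as salso(expectedCounts, structure = "featureAllocation", maxSize = suggestedMaxSize).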
A clustering (as a vector of cluster labels) or a feature allocation (as a binary matrix of feature indicators).
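How the result is summarized depends on structure. A minimal base R sketch, using made-up stand-ins for the two kinds of return value rather than actual salso output:

```r
# Made-up stand-ins for the two kinds of return value (not actual salso output)
clustering <- c(1, 1, 2, 3, 2)              # one cluster label per item
featureAllocation <- matrix(c(1, 0,
                              1, 1,
                              0, 1), nrow = 3, byrow = TRUE)  # items x features

clusterSizes <- table(clustering)             # number of items in each cluster
nClusters <- length(unique(clustering))       # number of clusters
nFeatures <- ncol(featureAllocation)          # number of features
featuresPerItem <- rowSums(featureAllocation) # features held by each item
```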
Wade, S. and Ghahramani, Z. (2017). Bayesian cluster analysis: Point estimation and credible balls. Bayesian Analysis.
Binder, D. (1978). Bayesian cluster analysis. Biometrika, 65: 31–38.
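# Clustering: pairwise allocation probabilities from the example clusterings,
# then a point estimate under the default loss.
probabilities <- expectedPairwiseAllocationMatrix(iris.clusterings)
salso(probabilities)
# Feature allocation: expected numbers of shared features, then a point estimate.
expectedCounts <- expectedPairwiseAllocationMatrix(USArrests.featureAllocations)
salso(expectedCounts, structure = "featureAllocation")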