chips

This function finds a partition of a subset of items that has high
marginal probability, based on samples from a partition distribution,
using the CHiPS greedy search method (Dahl, Page, Barrientos, 2024).

The SALSO algorithm is an efficient randomized greedy search method to find a point estimate for a random partition based on a loss function and posterior Monte Carlo samples. The algorithm is implemented for many loss functions, including the Binder loss and a generalization of the variation of information loss, both of which allow for unequal weights on the two types of clustering mistakes. Efficient implementations are also provided for Monte Carlo estimation of the posterior expected loss of a given clustering estimate. See Dahl, Johnson, Müller (2022) <doi:10.1080/10618600.2022.2069779>.

David B Dahl

salso

Search Algorithms and Loss Functions for Bayesian Clustering

David B. Dahl

Devin J. Johnson

Peter Müller

Alex Crichton

Brendan Zabarauskas

David Tolnay

Jim Turner

Josh Stone

R. Janis Goldschmidt

Sean McArthur

Stefan Lankes

The Cranelift Project Developers 

The CryptoCorrosion Contributors 

The Rand Project Developers 

The Rust Project Developers 

Ulrik Sverdrup "bluss"

bluss 

chips function

<dl><dt>x</dt>
<dd>A \(B\)-by-\(n\) matrix, where each of the \(B\) rows
represents a clustering of \(n\) items using cluster labels. For the
\(b\)th clustering, items \(i\) and \(j\) are in the same cluster if
<code>x[b, i] == x[b, j]</code>.</dd>
<dt>threshold</dt>
<dd>The minimum marginal probability for the partial partition.
Values closer to 1.0 will yield a partition of fewer items and values
closer to 0.0 will yield a partition of more items.</dd>
<dt>nRuns</dt>
<dd>The number of runs to try, where the best result is returned.</dd>
<dt>intermediateResults</dt>
<dd>Should intermediate subset partitions be returned?</dd>
<dt>allCandidates</dt>
<dd>Should all the final subset partitions from multiple runs
be returned?</dd>
<dt>nCores</dt>
<dd>The number of CPU cores to use, i.e., the number of
simultaneous runs at any given time. A value of zero uses all
cores on the system.</dd></dl>
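The input format for <code>x</code> and the role of <code>threshold</code> can be illustrated with a small base R sketch. It does not call <code>chips</code> or reproduce its search. It only assumes that the marginal probability of a partial partition is estimated as the proportion of sampled clusterings that agree with it on the chosen items. The helpers <code>same_cluster</code>, <code>canonical</code>, and <code>marginal_prob</code> are hypothetical names introduced here and are not part of the package.

```r
# Toy input: B = 4 sampled clusterings (rows) of n = 5 items (columns).
x <- rbind(
  c(1, 1, 2, 2, 3),
  c(1, 1, 2, 2, 2),
  c(2, 2, 1, 1, 3),  # same partition as row 1, with different labels
  c(1, 2, 2, 3, 3)
)

# Items i and j share a cluster in sample b when their labels match.
same_cluster <- function(x, b, i, j) x[b, i] == x[b, j]

# Relabel clusters in order of first appearance, so that label
# permutations of the same partition compare equal.
canonical <- function(labels) match(labels, unique(labels))

# Hypothetical helper (an assumption, not the package's code): the
# proportion of samples whose restriction to `items` equals the
# partial partition described by `labels`.
marginal_prob <- function(x, items, labels) {
  target <- canonical(labels)
  hits <- apply(x[, items, drop = FALSE], 1,
                function(row) identical(canonical(row), target))
  mean(hits)
}

# A partition of four items versus a partition of all five items.
p_small <- marginal_prob(x, items = 1:4, labels = c(1, 1, 2, 2))
p_full  <- marginal_prob(x, items = 1:5, labels = c(1, 1, 2, 2, 3))
```

In this toy data the four-item partition is supported by three of the four samples and the five-item partition by two of the four, so a threshold of 0.6 would admit only the smaller one. This matches the argument description: higher thresholds yield partitions of fewer items. With the package installed, the same matrix could presumably be passed as <code>chips(x, threshold = 0.6, nCores = 1)</code>, assuming the remaining arguments have defaults.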

Arguments

CHiPS Partition Greedy Search — chips


