Learn R Programming

Dirichlet Process Clustering with Dissimilarities

The DPCD package implements a Bayesian hierarchical model for clustering dissimilarity data. The two main objectives of DPCD are to 1. Find a configuration of the objects such that the distance between them approximate the input dissimilarities, accounting for measurement error, and 2. Obtain a meaningful clustering structure for the objects.

Installation

You can install the DPCD package from CRAN with:

install.packages('DPCD')

Usage

The main function of the DPCD package is run_dpcd(), which fits a specified DPCD model to dissimilarity data. Here’s a basic example:

library(DPCD)

# Simulate data and calculate distances
set.seed(123)
x <- rbind(matrix(rnorm(20), ncol = 2), matrix(rnorm(30, mean = 4), ncol = 2))
dis_mat <- dist(x)

# Fit the Equal Spherical DPCD model
fit <- run_dpcd(dis_mat, model = "ES", p = 2, niter = 50000, nburn = 10000)
#> |-------------|-------------|-------------|-------------|
#> |-------------------------------------------------------|

The posterior samples can be accessed via the samples attribute of the returned object. You can use the extract_clusters() function to obtain a cluster assignment for each observation:

clusters <- extract_clusters(fit$samples)
clusters
#>  [1] 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2

Posterior predictive checks can be performed using the post_predictive() function to ensure that the model fits well. Note that, by default, the input dissimilarities are scaled so that the maximum value is 1. This can be toggled off with scale = TRUE, but it is recommended to provide different hyperparameters (using the hyper_params argument) if scaling is disabled.

ppc <- post_predictive(fit$samples, dis_mat)

The latent object configuration can be plotted using the plot_objects() function. Since the latent coordinates of the objects are non-identifiable, a target matrix must be provided to which the posterior draws of the object coordinates are aligned.

target_matrix <- cmdscale(dis_mat, k = 2)
plot_objects(fit$samples, target_matrix = target_matrix, show_clusters = TRUE,
             main = "DPCD Estimated Configuration")

Copy Link

Version

Install

install.packages('DPCD')

Version

0.0.1

License

MIT + file LICENSE

Issues

Pull Requests

Stars

Forks

Maintainer

Samuel Morrissette

Last Published

December 19th, 2025

Functions in DPCD (0.0.1)

bs_score

Calculate the Bayesian Silhouette Score
procrustes

Procrustes Transformation
prior_predictive

Prior Predictive Check
post_predictive

Posterior Predictive Check
extract_clusters

Extract clusters from MCMC samples
plot_objects

Plot the Object Configuration
makeSphericalSigma

Create a spherical covariance matrix
mcmc_example

MCMC Example Output
makeDiagonalSigma

Create a diagonal covariance matrix
dis_mat_example

Dissimilarity Matrix Example
run_dpcd

Run Dirichlet Process Clustering with Dissimilarities