The package provides a basic implementation of a Gibbs sampler for the Chinese Restaurant Process, along with visual aids that help explain how the sampling works.

It was developed as part of a postgraduate project for an Advanced Bayesian Nonparametric course, and is inspired by Tamara Broderick's presentation on nonparametric Bayesian statistics given at the Simons Institute.

Example usages

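library(nonparametric.bayes)

# Draw from a Dirichlet distribution and show the resulting clusters
generate_dirichlet_clusters(10, 10)
# `split_data` is not defined in this snippet; it is assumed to hold the points to cluster in `split_data$x`
cluster_datapoints(split_data$x, sigma0=diag(3^2, 2))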

Install

install.packages('nonparametric.bayes')

Monthly Downloads

226

Version

0.0.1

License

MIT + file LICENSE

Maintainer

Erik-Cristian Seulean

Last Published

November 29th, 2021

Functions in nonparametric.bayes (0.0.1)

cluster_datapoints

Gibbs sampling for the Chinese Restaurant Process. Implementation details can be found in the associated paper. The algorithm stops at every 1000th iteration and prints the current cluster configuration.
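
To make the Chinese Restaurant Process concrete, here is a minimal sketch of drawing cluster labels from the CRP prior alone. It is not the package's Gibbs sampler, which also conditions on the data. The function name `crp_sketch` and the concentration parameter `alpha` are hypothetical names chosen for this illustration.

```r
# Hypothetical helper (not part of nonparametric.bayes): sequential draw of
# cluster labels from a Chinese Restaurant Process prior.
crp_sketch <- function(n, alpha = 1) {
  stopifnot(n >= 1, alpha > 0)
  z <- integer(n)
  z[1] <- 1L                      # the first customer always opens table 1
  for (i in seq_len(n)[-1]) {
    counts <- tabulate(z[seq_len(i - 1)])   # customers at each existing table
    # Join table k with weight n_k, or open a new table with weight alpha.
    probs <- c(counts, alpha) / (i - 1 + alpha)
    z[i] <- sample.int(length(probs), size = 1, prob = probs)
  }
  z
}

set.seed(1)
z <- crp_sketch(50, alpha = 1)
table(z)  # number of points assigned to each cluster
```

A Gibbs sampler for a CRP mixture typically reuses these same weights, multiplied by the likelihood of each data point under each cluster, when it resamples one label at a time.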
rDPM_visual

Sequentially generates draws from a Dirichlet process mixture model, showing the iterations step by step. The plot is centered at 0, with x and y ranging from -5 to 5. The mixture draws the cluster centres from a Normal distribution with mean mu and standard deviation sigma_0.
rdirichlet

Generates a sample from a Dirichlet distribution, using the method described at https://en.wikipedia.org/wiki/Dirichlet_distribution#Random_number_generation
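
The method described at that link normalises independent Gamma variates. A minimal sketch of the idea follows; `dirichlet_sketch` is a hypothetical name, and the package's own `rdirichlet` may differ in its arguments and return value.

```r
# Hypothetical helper (not the package's rdirichlet): one draw from
# Dirichlet(alpha), obtained by normalising independent Gamma(alpha_i, 1) variates.
dirichlet_sketch <- function(alpha) {
  stopifnot(is.numeric(alpha), length(alpha) >= 1, all(alpha > 0))
  g <- rgamma(length(alpha), shape = alpha, rate = 1)
  g / sum(g)
}

set.seed(1)
rho <- dirichlet_sketch(rep(0.5, 5))
rho       # five non-negative weights
sum(rho)  # equals 1 up to floating-point error
```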
rDPM

Sequentially generates draws from a Dirichlet process mixture model, showing the iterations step by step. The plot is centered at 0, with x and y ranging from -5 to 5. The mixture draws the cluster centres from a Normal distribution with mean mu and standard deviation sigma_0. In addition to plotting the points, it returns the sampled points.
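
The sequential scheme can be sketched without the plotting step. The version below is an illustration under stated assumptions, not the package's code: the name `dpm_sketch`, the concentration `alpha`, the within-cluster standard deviation `sigma`, and the shape of the returned list are all hypothetical. Only `mu` and `sigma_0` correspond to the parameters named in the description above.

```r
# Hypothetical sketch of sequential draws from a Dirichlet process mixture in 2D.
# Each point joins an existing cluster with weight equal to its size, or opens a
# new cluster with weight alpha. New centres are drawn from Normal(mu, sigma_0).
dpm_sketch <- function(n, alpha = 1, mu = c(0, 0), sigma_0 = 1.5, sigma = 0.3) {
  centres <- matrix(numeric(0), ncol = 2)
  counts <- integer(0)
  z <- integer(n)
  x <- matrix(NA_real_, nrow = n, ncol = 2)
  for (i in seq_len(n)) {
    probs <- c(counts, alpha)                     # unnormalised weights
    k <- sample.int(length(probs), size = 1, prob = probs)
    if (k > length(counts)) {                     # open a new cluster
      counts <- c(counts, 0L)
      centres <- rbind(centres, rnorm(2, mean = mu, sd = sigma_0))
    }
    counts[k] <- counts[k] + 1L
    z[i] <- k
    x[i, ] <- rnorm(2, mean = centres[k, ], sd = sigma)  # point around its centre
  }
  list(x = x, z = z, centres = centres)
}

set.seed(42)
draws <- dpm_sketch(100)
plot(draws$x, col = draws$z, xlim = c(-5, 5), ylim = c(-5, 5), xlab = "x", ylab = "y")
```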
generate_dirichlet_clusters

Draws from a Dirichlet distribution and shows the clusters generated by this draw. Varying alpha puts more or less mass in the first clusters compared to higher clusters (rhos).
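
The effect of alpha can be checked numerically. The sketch below assumes a symmetric Dirichlet with the same parameter `a` for each of `K` components, which may not match how `generate_dirichlet_clusters` interprets its arguments, and it tracks the largest weight rather than the first cluster. The helper name `mean_largest_weight` is hypothetical.

```r
# Hypothetical helper: average size of the largest weight over `reps` draws
# from a symmetric Dirichlet(a, ..., a) with K components.
mean_largest_weight <- function(a, K = 10, reps = 500) {
  largest <- replicate(reps, {
    g <- rgamma(K, shape = a, rate = 1)  # Gamma variates, normalised below
    max(g / sum(g))
  })
  mean(largest)
}

set.seed(123)
small_alpha <- mean_largest_weight(0.1)  # mass tends to concentrate in few clusters
large_alpha <- mean_largest_weight(10)   # mass tends to spread across clusters
c(small_alpha = small_alpha, large_alpha = large_alpha)
```

With a small parameter most of the mass typically falls on one or two clusters, while a large parameter gives weights close to 1/K each.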
generate_split_data

Generates a dataset used to illustrate clustering. The cluster centers are set relatively far apart to show how well the algorithm performs in simple scenarios.
generate_dirichlet_clusters_with_sampled_points

Draws from a Dirichlet distribution and shows the clusters generated by this draw. Additionally, adds points to these clusters and shows which clusters are occupied.