dcem_cluster_uv

(matrix): The dataset provided by the user (converted to matrix format).

data

(vector): The vector containing the initial means of the Gaussians.

mean_vector

(vector): The vector containing the initial standard deviations of the Gaussians. The initial
standard deviations are set to 1 and are updated during the iterations of the algorithm.

sd_vector

(vector): The vector containing the initial priors for the Gaussians. They are initialized
uniformly.

prior_vec

(numeric): The number of clusters specified by the user. Default value is 2.

(numeric): The maximum number of iterations for which the algorithm should run. If
convergence is not achieved within this number of iterations, the algorithm stops and exits.
Default: 200.

iteration_count

(numeric): A small value used to check for convergence (if the estimated mean(s) are within this
threshold of their previous values, the algorithm stops and exits).
Note: Choosing a very small threshold (0.0000001) can increase the runtime substantially,
and the algorithm may not converge. On the other hand, choosing a larger value (0.1)
can lead to sub-optimal clustering. Default: 0.00001.

threshold

(numeric): Number of rows in the dataset (after processing the missing values).

numrows

(numeric): Number of columns in the dataset (after processing the missing values).

numcols
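
As a rough illustration of how the arguments described above fit together, the following base R sketch prepares inputs in the way the descriptions suggest: means drawn at random from the data, standard deviations set to 1, and uniform priors. The simulated data and the name `num_clusters` are assumptions made for this example; the preprocessing actually performed by dcem_train may differ.

```r
# Illustrative sketch only; not the package's internal code.
set.seed(42)

# Simulated univariate data from two Gaussians (assumed example data).
raw <- c(rnorm(100, mean = 0, sd = 1), rnorm(100, mean = 5, sd = 1))
raw[c(3, 150)] <- NA                      # inject two missing values

# Drop missing values and convert to a one-column matrix.
data <- as.matrix(raw[!is.na(raw)])

num_clusters <- 2                         # hypothetical name for the cluster count
numrows <- nrow(data)                     # rows after removing missing values
numcols <- ncol(data)                     # 1 for univariate data

# Initial means: randomly selected samples from the data.
mean_vector <- data[sample(numrows, num_clusters), 1]

# Initial standard deviations are 1; initial priors are uniform.
sd_vector <- rep(1, num_clusters)
prior_vec <- rep(1 / num_clusters, num_clusters)
```

Each of the three vectors has one entry per cluster, so their lengths should always equal the number of clusters.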

Implements the Expectation Maximization algorithm for univariate data. This function is called
internally by the dcem_train routine.
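
To make the roles of the arguments concrete, here is a minimal sketch of a textbook univariate EM loop in base R. It is not the DCEM implementation: the function name `em_univariate_sketch` and the names of the returned list elements are hypothetical, and the convergence rule (every mean moves by less than the threshold between iterations) is an assumption based on the description of the threshold argument.

```r
# Textbook univariate EM for a Gaussian mixture; NOT the package's code.
em_univariate_sketch <- function(data, mean_vector, sd_vector, prior_vec,
                                 iteration_count = 200, threshold = 1e-5) {
  x <- as.numeric(data)
  k <- length(mean_vector)

  for (iter in seq_len(iteration_count)) {
    # E-step: prior-weighted densities, one column per Gaussian.
    dens <- sapply(seq_len(k), function(j) {
      prior_vec[j] * dnorm(x, mean_vector[j], sd_vector[j])
    })
    post <- dens / rowSums(dens)          # posterior probabilities (n x k)

    # M-step: re-estimate means, standard deviations and priors.
    nk <- colSums(post)
    new_mean <- colSums(post * x) / nk
    sd_vector <- sqrt(colSums(post * outer(x, new_mean, "-")^2) / nk)
    prior_vec <- nk / length(x)

    # Assumed convergence rule: all means moved less than the threshold.
    converged <- all(abs(new_mean - mean_vector) < threshold)
    mean_vector <- new_mean
    if (converged) break
  }

  list(prob = post, mean = mean_vector, sd = sd_vector,
       prior = prior_vec, iterations = iter)
}

# Usage on simulated data (assumed example, two well-separated clusters).
set.seed(1)
x <- c(rnorm(200, mean = 0, sd = 1), rnorm(200, mean = 6, sd = 1))
fit <- em_univariate_sketch(
  data = as.matrix(x),
  mean_vector = c(min(x), max(x)),        # deterministic start for the demo
  sd_vector = c(1, 1),
  prior_vec = c(0.5, 0.5)
)
```

The demo starts from the smallest and largest observations so that the run is reproducible; the package instead selects the initial means at random from the data. A smaller threshold generally means more passes through the loop, which is the runtime trade-off noted for the threshold argument.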

Implements the Expectation Maximization (EM) and EM* algorithms for clustering finite Gaussian mixture models for
both multivariate and univariate datasets. Initialization is done by randomly selecting samples from the
dataset as the means of the Gaussians. This version implements the faster alternative, EM*, which avoids revisiting
data by leveraging a heap structure. The algorithm returns a set of Gaussian parameters: posterior probabilities, means,
covariance matrices (multivariate data) or standard deviations (univariate data), and priors.
Reference: Hasan Kurban, Mark Jenne, Mehmet M. Dalkilic (2016) <doi:10.1007/s41060-017-0062-1>.
This work is partially supported by NCI Grant 1R01CA213466-01.

Sharma Parichit

DCEM

Clustering for Multivariate and Univariate Data Using
Expectation Maximization Algorithm

Kurban Hasan

Jenne Mark

Dalkilic Mehmet

dcem_cluster_uv function

dcem_cluster_uv: dcem_cluster_uv (univariate data): Part of DCEM package.

Description

Usage

Arguments

Value

References