dcem_star_cluster_mv

Arguments

data (matrix): The dataset provided by the user.

mean_mat (matrix): The matrix containing the initial mean(s) for the Gaussian(s).

cov_list (list): A list containing the initial covariance matrices for the Gaussian(s).

prior_vec (vector): A vector containing the initial priors for the Gaussian(s).

(numeric): The number of clusters specified by the user. Default: 2.

iteration_count (numeric): The maximum number of iterations for which the algorithm should run. If convergence is not achieved within this many iterations, the algorithm stops and exits. Default: 200.

threshold (numeric): A small value used to check for convergence. If the estimated mean(s) change by less than this threshold between iterations, the algorithm stops and exits. Note: choosing a very small value (0.0000001) can increase the runtime substantially, and the algorithm may not converge. On the other hand, choosing a larger value (0.1) can lead to sub-optimal clustering. Default: 0.00001.

numrows (numeric): Number of rows in the dataset (after processing the missing values).

numcols (numeric): Number of columns in the dataset (after processing the missing values).
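The interaction between iteration_count and threshold can be illustrated with a minimal Python sketch. It assumes that the convergence test compares successive mean estimates by their largest absolute coordinate change. DCEM's exact criterion may differ. The names run_em, max_mean_shift and the em_step callable are hypothetical stand-ins for one EM/EM* update and are not part of the DCEM API.

```python
from typing import Callable, List, Tuple

Matrix = List[List[float]]  # one row per cluster mean


def max_mean_shift(old: Matrix, new: Matrix) -> float:
    """Largest absolute change in any coordinate of any cluster mean."""
    return max(
        abs(a - b)
        for old_row, new_row in zip(old, new)
        for a, b in zip(old_row, new_row)
    )


def run_em(
    em_step: Callable[[Matrix], Matrix],
    means: Matrix,
    iteration_count: int = 200,
    threshold: float = 0.00001,
) -> Tuple[Matrix, int, bool]:
    """Apply em_step until the means settle or iteration_count is reached.

    Returns (final means, iterations run, converged flag).
    """
    for iteration in range(1, iteration_count + 1):
        new_means = em_step(means)
        shift = max_mean_shift(means, new_means)
        means = new_means
        if shift < threshold:
            # Assumed criterion: every coordinate of every mean moved less than threshold.
            return means, iteration, True
    # Iteration cap reached without convergence: stop and exit anyway.
    return means, iteration_count, False


if __name__ == "__main__":
    # Hypothetical toy update standing in for one EM step: move each mean halfway to (1, 1).
    def halve_towards_one(m: Matrix) -> Matrix:
        return [[(x + 1.0) / 2 for x in row] for row in m]

    print(run_em(halve_towards_one, [[0.0, 0.0], [3.0, 3.0]]))
    print(run_em(halve_towards_one, [[0.0, 0.0], [3.0, 3.0]], iteration_count=5))
```

With the default threshold, the toy problem converges after 18 iterations. Capping iteration_count at 5 returns a converged flag of False. Using the maximum coordinate change is one design choice. A per-mean Euclidean norm would serve equally well as a criterion.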

Description

Implements the EM* algorithm for multivariate data. This function is called internally by the dcem_star_train routine.
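For multivariate data, the E-step converts the current means, covariance matrices and priors into posterior probabilities. The Python sketch below shows this under standard Gaussian-mixture assumptions: full covariance matrices, a Cholesky factorisation for the density, and log-space normalisation. It was written for this page as an illustration. The function names are hypothetical and do not correspond to DCEM internals.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def cholesky(cov: Matrix) -> Matrix:
    """Lower-triangular L with L @ L^T == cov (cov must be positive definite)."""
    n = len(cov)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(cov[i][i] - s)
            else:
                L[i][j] = (cov[i][j] - s) / L[j][j]
    return L


def gaussian_log_pdf(x: Vector, mean: Vector, cov: Matrix) -> float:
    """log N(x | mean, cov), computed via a Cholesky factorisation."""
    d = len(x)
    L = cholesky(cov)
    # Forward substitution solves L z = (x - mean); the Mahalanobis term is then |z|^2.
    z = [0.0] * d
    for i in range(d):
        z[i] = (x[i] - mean[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(d))
    return -0.5 * (d * math.log(2 * math.pi) + log_det + sum(v * v for v in z))


def e_step(data: Matrix, means: Matrix, covs: List[Matrix], priors: Vector) -> Matrix:
    """Posterior probability of each cluster (columns) for each row of data."""
    posteriors = []
    for x in data:
        logs = [math.log(p) + gaussian_log_pdf(x, m, c)
                for m, c, p in zip(means, covs, priors)]
        top = max(logs)  # log-sum-exp trick for numerical stability
        weights = [math.exp(v - top) for v in logs]
        total = sum(weights)
        posteriors.append([w / total for w in weights])
    return posteriors
```

Working in log space with log-sum-exp normalisation avoids underflow when a point lies far from every mean. This situation is common with higher-dimensional data.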

Implements the Expectation Maximisation (EM)/(EM*) algorithm for clustering finite Gaussian mixture models for both multivariate and univariate datasets. The initialization is done by randomly selecting samples from the dataset as the mean(s) of the Gaussian(s). This version implements EM*, a faster alternative to EM that avoids revisiting data by leveraging a heap structure. The algorithm returns a set of Gaussian parameters: posterior probabilities, mean(s), covariance matrices (multivariate data) or standard deviations (univariate data), and priors.
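As we understand the EM* reference cited on this page, EM* maintains a heap for each cluster, with the cluster's points ordered by how strongly they belong to it. Later iterations then re-examine mainly the points near the bottom of these heaps. The Python sketch below illustrates that idea under simplified assumptions, and it is not DCEM's implementation. Each point is assigned to its most probable cluster, the heap key is that posterior probability, and the points to revisit are those stored at the leaf positions of each binary max-heap. The names build_cluster_heaps and points_to_revisit are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def build_cluster_heaps(posteriors: Matrix) -> Dict[int, List[Tuple[float, int]]]:
    """Assign each point to its most probable cluster and build one max-heap per cluster.

    heapq implements a min-heap, so keys are negated: the heap root is the point
    that belongs most strongly to its cluster.
    """
    heaps: Dict[int, List[Tuple[float, int]]] = {}
    for idx, row in enumerate(posteriors):
        cluster = max(range(len(row)), key=row.__getitem__)
        heaps.setdefault(cluster, []).append((-row[cluster], idx))
    for heap in heaps.values():
        heapq.heapify(heap)
    return heaps


def points_to_revisit(posteriors: Matrix) -> List[int]:
    """Indices of points stored at leaf positions of their cluster's heap.

    In an array-backed binary heap of size n, positions n // 2 .. n - 1 are the
    leaves; points held in internal nodes are treated as settled for now.
    """
    revisit: List[int] = []
    for heap in build_cluster_heaps(posteriors).values():
        n = len(heap)
        revisit.extend(idx for _, idx in heap[n // 2:])
    return sorted(revisit)


if __name__ == "__main__":
    probs = [[0.9, 0.1], [0.8, 0.2], [0.6, 0.4], [0.55, 0.45], [0.2, 0.8], [0.3, 0.7]]
    print(points_to_revisit(probs))
```

The root of each heap holds the point that belongs most strongly to its cluster. The point with the weakest membership is always at a leaf. For this reason, leaf positions are a cheap proxy for points that might still change cluster.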
References

Hasan Kurban, Mark Jenne, Mehmet M. Dalkilic (2016) <doi:10.1007/s41060-017-0062-1>.

This work is partially supported by NCI Grant 1R01CA213466-01.

Authors: Sharma Parichit, Kurban Hasan, Jenne Mark, Dalkilic Mehmet

Package: DCEM (Clustering for Multivariate and Univariate Data Using Expectation Maximization Algorithm)


dcem_star_cluster_mv (multivariate data): Part of the DCEM package.
