dcem_cluster_uv

(matrix): The dataset provided by the user (converted to matrix format).

data

(vector): The vector containing the initial meu (cluster means).

meu

(vector): The vector containing the initial standard deviation.

sigma

(vector): The vector containing the initial prior.

prior

(numeric): The number of clusters specified by the user. Default is 2.

num_clusters

(numeric): The maximum number of iterations for which the algorithm should run. If
convergence is not achieved within this many iterations, the algorithm stops.
Default: 200.

iteration_count

(numeric): A small value used to check for convergence (if the estimated meu(s)
change by less than the threshold between iterations, the algorithm stops).

Note: Choosing a very small threshold (0.0000001) can increase the runtime
substantially, and the algorithm may not converge. On the other hand, choosing a larger
value (0.1) can lead to sub-optimal clustering. Default: 0.00001.

threshold

(numeric): The total number of observations in the data.

num_data

(numeric): Number of columns in the dataset (After processing the
missing values).

numcols

Implements the Expectation Maximization algorithm for univariate data. This function is
called internally by the dcem_train routine.
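
To make the roles of meu, sigma, prior, iteration_count and threshold concrete, here is a minimal sketch of a textbook univariate Gaussian-mixture EM loop. It is an illustration only, not the DCEM source: the function name `em_univariate_sketch` is hypothetical, and the stopping rule (largest change in meu below threshold) is an assumption based on the argument descriptions above.

```r
# Illustrative sketch only; NOT the DCEM implementation.
# em_univariate_sketch is a hypothetical name chosen for this example.
em_univariate_sketch <- function(x, meu, sigma, prior,
                                 iteration_count = 200, threshold = 1e-5) {
  x <- as.numeric(x)   # accept a one-column matrix or a plain vector
  k <- length(meu)
  n <- length(x)

  for (iter in seq_len(iteration_count)) {
    # E-step: prior-weighted Gaussian density of each observation per cluster
    dens <- sapply(seq_len(k), function(j) prior[j] * dnorm(x, meu[j], sigma[j]))
    resp <- dens / rowSums(dens)   # n x k matrix of responsibilities

    # M-step: re-estimate means, standard deviations and priors
    nk <- colSums(resp)
    new_meu <- colSums(resp * x) / nk
    sigma <- sqrt(colSums(resp * outer(x, new_meu, "-")^2) / nk)
    prior <- nk / n

    # Assumed convergence rule: stop when no mean moved by threshold or more
    converged <- all(abs(new_meu - meu) < threshold)
    meu <- new_meu
    if (converged) break
  }

  list(meu = meu, sigma = sigma, prior = prior, iterations = iter)
}

# Usage on synthetic data with two well-separated clusters
set.seed(1)
x <- c(rnorm(200, mean = 0, sd = 1), rnorm(200, mean = 6, sd = 1))
fit <- em_univariate_sketch(x, meu = c(-1, 5), sigma = c(1, 1),
                            prior = c(0.5, 0.5))
```

A smaller threshold forces more iterations before the loop exits, which matches the runtime trade-off described for the threshold argument.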

Implements the improved Expectation Maximization algorithm (EM*) and the traditional EM algorithm for
clustering big data (Gaussian mixture models for both multivariate and univariate datasets). This version
implements the faster alternative, EM*, which expedites convergence via structure-based data segregation.
The implementation supports both random and K-means++ based initialization. References: Parichit Sharma,
Hasan Kurban, Mehmet Dalkilic (2022) <doi:10.1016/j.softx.2021.100944>; Hasan Kurban,
Mark Jenne, Mehmet Dalkilic (2016) <doi:10.1007/s41060-017-0062-1>.

Sharma Parichit

DCEM

Clustering Big Data using Expectation Maximization Star (EM*) Algorithm

Kurban Hasan

Dalkilic Mehmet

dcem_cluster_uv function

dcem_cluster_uv: dcem_cluster_uv (univariate data): Part of DCEM package.
