Implements the Expectation Maximization (EM) algorithm for univariate data. This function is called internally by the dcem_train routine.
dcem_cluster_uv(data, meu, sigma, prior, num_clusters, iteration_count,
threshold, num_data, numcols)
data (matrix): The dataset provided by the user (converted to matrix format).
meu (vector): The vector containing the initial meu (cluster means).
sigma (vector): The vector containing the initial standard deviations.
prior (vector): The vector containing the initial priors.
num_clusters (numeric): The number of clusters specified by the user. Default: 2.
iteration_count (numeric): The maximum number of iterations to run. If convergence is not reached within this many iterations, the algorithm stops. Default: 200.
threshold (numeric): A small value used to check for convergence (if the estimated meu(s) change by less than the threshold, the algorithm stops).
Note: Choosing a very small value (0.0000001) for threshold can increase the runtime substantially, and the algorithm may not converge. On the other hand, choosing a larger value (0.1) can lead to sub-optimal clustering. Default: 0.00001.
num_data (numeric): The total number of observations in the data.
numcols (numeric): The number of columns in the dataset (after processing the missing values).
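To make the roles of these arguments concrete, the following is a minimal base-R sketch of an EM loop for a univariate Gaussian mixture. It is not the DCEM implementation: the function name em_uv_sketch is hypothetical, the posterior matrix is assumed to hold one row per observation, and the convergence test (every meu changes by less than threshold) is one reading of the description above. No guard against numerical underflow is included.

```r
# Hypothetical helper, NOT part of DCEM: a plain EM loop for a univariate
# Gaussian mixture that reuses the argument names from the usage line.
em_uv_sketch <- function(data, meu, sigma, prior, num_clusters,
                         iteration_count = 200, threshold = 0.00001) {
  x <- as.numeric(data)
  for (iter in seq_len(iteration_count)) {
    # E-step: prior-weighted densities, one column per cluster,
    # normalized so that each row sums to 1 (assumed layout).
    dens <- sapply(seq_len(num_clusters), function(k) {
      prior[k] * dnorm(x, mean = meu[k], sd = sigma[k])
    })
    prob <- dens / rowSums(dens)

    # M-step: re-estimate prior, meu and sigma from the posteriors.
    weight <- colSums(prob)
    new_meu <- colSums(prob * x) / weight
    new_sigma <- sqrt(colSums(prob * outer(x, new_meu, "-")^2) / weight)
    prior <- weight / length(x)

    # Convergence check on the change in meu (assumed interpretation).
    converged <- all(abs(new_meu - meu) < threshold)
    meu <- new_meu
    sigma <- new_sigma
    if (converged) break
  }
  list(prob = prob, meu = meu, sigma = sigma, prior = prior,
       membership = max.col(prob, ties.method = "first"))
}

# Usage on synthetic data: two well-separated Gaussians.
set.seed(1)
x <- c(rnorm(100, mean = 0, sd = 1), rnorm(100, mean = 10, sd = 1))
res <- em_uv_sketch(matrix(x), meu = c(1, 8), sigma = c(1, 1),
                    prior = c(0.5, 0.5), num_clusters = 2)
```

A tighter threshold makes the loop run for more iterations before the change in meu falls below it, which is the runtime trade-off described in the note on threshold.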
A list of objects. This list contains the parameters associated with the Gaussian(s) (posterior probabilities, meu, standard deviation and prior), along with the cluster membership.
(1) Posterior probabilities: prob: A matrix of posterior probabilities.
(2) Meu(s): meu: A vector of meu. Each element of the vector corresponds to one Gaussian.
(3) Standard deviation(s): sigma: A vector of standard deviations.
(4) Prior(s): prior: A vector of priors.
(5) Membership: membership: A vector of cluster memberships for the data.
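As an illustration of how a list with these components might be consumed, the sketch below builds a stand-in object by hand and derives the membership from the posterior matrix. The values are invented for illustration, and the layout of prob (one row per observation, one column per cluster) is an assumption, not something the text above confirms.

```r
# Hypothetical stand-in for the returned list; all values are invented.
out <- list(
  prob  = matrix(c(0.9, 0.1,
                   0.2, 0.8,
                   0.6, 0.4), ncol = 2, byrow = TRUE),
  meu   = c(1.5, 7.2),
  sigma = c(0.8, 1.1),
  prior = c(0.5, 0.5)
)

# Assign each observation to the cluster with the highest posterior.
out$membership <- max.col(out$prob, ties.method = "first")

# Count how many observations fall in each cluster.
cluster_sizes <- table(out$membership)
```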
Parichit Sharma, Hasan Kurban, Mehmet Dalkilic. DCEM: An R package for clustering big data via data-centric modification of Expectation Maximization. SoftwareX, 17, 100944. URL https://doi.org/10.1016/j.softx.2021.100944