DCEM (version 1.0.0)

dcem_star_cluster_uv (univariate data): Part of the DCEM package.

Description

Implements the EM* algorithm for univariate data. This function is called internally by the dcem_star_train routine.

Usage

dcem_star_cluster_uv(data, mean_vector, sd_vector, prior_vec, num, iteration_count,
threshold, numrows, numcols)

Arguments

data

(matrix): The dataset provided by the user (converted to matrix format).

mean_vector

(vector): The vector containing the initial means of the Gaussians.

sd_vector

(vector): The vector containing the initial standard deviations of the Gaussians. The initial standard deviations are set to 1 and are updated during the iterations of the algorithm.

prior_vec

(vector): The vector containing the initial priors for the Gaussians (initialized uniformly).

num

(numeric): The number of clusters specified by the user. Default is 2.

iteration_count

(numeric): The maximum number of iterations for which the algorithm should run. If convergence (as defined by threshold) is not achieved within this number of iterations, the algorithm stops and exits. Default is 200.

threshold

(numeric): A small value used to check for convergence (if the estimated mean(s) are within this specified threshold, the algorithm stops and exits).

Note: Choosing a very small value for threshold (e.g. 0.0000001) can increase the runtime substantially, and the algorithm may not converge. On the other hand, choosing a larger value (e.g. 0.1) can lead to sub-optimal clustering. Default is 0.00001.
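The interplay between iteration_count and threshold can be illustrated with a minimal sketch. The code below is a plain EM loop for a univariate Gaussian mixture written in base R; it is not the package's EM* implementation. The function name em_uv_sketch, the extra iterations and converged fields, and the convergence rule (Euclidean distance between successive mean estimates) are assumptions made for illustration only.

```r
# Plain EM for a univariate Gaussian mixture (illustrative, NOT the DCEM EM* code).
em_uv_sketch <- function(x, mean_vector, sd_vector, prior_vec,
                         iteration_count = 200, threshold = 0.00001) {
  num <- length(mean_vector)
  converged <- FALSE
  for (iter in seq_len(iteration_count)) {
    # E-step: prior-weighted densities, one column per Gaussian
    dens <- sapply(seq_len(num), function(k) {
      prior_vec[k] * dnorm(x, mean_vector[k], sd_vector[k])
    })
    prob <- dens / rowSums(dens)   # posterior probabilities (rows sum to 1)

    # M-step: update means, standard deviations and priors
    nk <- colSums(prob)
    new_mean <- colSums(prob * x) / nk
    sd_vector <- sqrt(colSums(prob * outer(x, new_mean, "-")^2) / nk)
    prior_vec <- nk / length(x)

    # Assumed convergence rule: stop once the means move less than threshold
    mean_diff <- sqrt(sum((new_mean - mean_vector)^2))
    mean_vector <- new_mean
    if (mean_diff < threshold) {
      converged <- TRUE
      break
    }
  }
  list(prob = prob, mean = mean_vector, sd = sd_vector, prior = prior_vec,
       iterations = iter, converged = converged)
}

# Hypothetical data: two well-separated groups
set.seed(1)
x <- c(rnorm(200, mean = 0, sd = 1), rnorm(200, mean = 8, sd = 1))

fit   <- em_uv_sketch(x, c(-1, 5), c(1, 1), c(0.5, 0.5))
loose <- em_uv_sketch(x, c(-1, 5), c(1, 1), c(0.5, 0.5), threshold = 0.1)

# A larger threshold can never need more iterations than a smaller one
c(tight = fit$iterations, loose = loose$iterations)
```

Under this assumed rule, a larger threshold stops the loop earlier, possibly before the means have settled, while a very small threshold may exhaust iteration_count without ever being satisfied.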

numrows

(numeric): Number of rows in the dataset (after processing the missing values).

numcols

(numeric): Number of columns in the dataset (after processing the missing values).
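As a rough sketch of how the arguments above fit together, the following base R code builds each input as described. The dataset is invented, and picking random data points as initial means is an assumption, not something this page states. Because this function is normally called internally by dcem_star_train, and its signature may differ in other DCEM versions, the call itself is attempted only if DCEM 1.0.0 is installed and the function can be found.

```r
# Illustrative sketch: prepare the documented inputs in base R.
set.seed(42)

# Hypothetical univariate dataset with two groups
x <- c(rnorm(100, mean = 0, sd = 1), rnorm(100, mean = 6, sd = 1))
data <- matrix(x, ncol = 1)        # dataset converted to matrix format

num     <- 2                       # number of clusters
numrows <- nrow(data)              # rows after handling missing values
numcols <- ncol(data)              # columns (1 for univariate data)

# Assumption: initial means are chosen as random data points
mean_vector <- data[sample(numrows, num), 1]
sd_vector   <- rep(1, num)         # initial standard deviations set to 1
prior_vec   <- rep(1 / num, num)   # uniform initial priors

args <- list(data, mean_vector, sd_vector, prior_vec, num,
             200,                  # iteration_count (documented default)
             0.00001,              # threshold (documented default)
             numrows, numcols)

# Guarded call: skipped unless this exact package version is available
if (requireNamespace("DCEM", quietly = TRUE) &&
    utils::packageVersion("DCEM") == "1.0.0" &&
    exists("dcem_star_cluster_uv", envir = asNamespace("DCEM"),
           inherits = FALSE)) {
  fit <- do.call(get("dcem_star_cluster_uv", envir = asNamespace("DCEM")), args)
  str(fit)
}
```

In normal use, dcem_star_train is expected to prepare these inputs itself, so a direct call like this is mainly useful for understanding what each argument holds.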

Value

A list of objects. This list contains the parameters associated with the Gaussian(s) (posterior probabilities, mean, covariance/standard deviation and priors).

  1. Posterior probabilities: sample_out$prob. A matrix of posterior probabilities.

  2. Mean(s): sample_out$mean

    For univariate data: A vector of means. Each element of the vector corresponds to one Gaussian.

  3. Standard deviation(s): sample_out$sd

    For univariate data: A vector of standard deviations for the Gaussian(s).

  4. Priors: sample_out$prior. A vector of priors for the Gaussian(s).
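A short sketch of how the returned list might be used is given below. The sample_out object here is a hand-built stand-in with invented values, and the orientation of the posterior matrix (one row per observation, one column per Gaussian) is an assumption; if the actual matrix is oriented the other way, transpose it with t() first.

```r
# Hypothetical stand-in for the returned list (values invented for illustration)
sample_out <- list(
  prob  = matrix(c(0.9, 0.1,
                   0.2, 0.8,
                   0.6, 0.4), ncol = 2, byrow = TRUE),
  mean  = c(0.1, 5.9),
  sd    = c(1.0, 1.1),
  prior = c(0.5, 0.5)
)

# Assumption: rows are observations, columns are Gaussians.
# Assign each observation to the Gaussian with the highest posterior probability.
cluster_id <- apply(sample_out$prob, 1, which.max)
cluster_id

# One row of fitted parameters per Gaussian
params <- data.frame(
  cluster = seq_along(sample_out$mean),
  mean    = sample_out$mean,
  sd      = sample_out$sd,
  prior   = sample_out$prior
)
params
```

Hard assignment with which.max discards the uncertainty carried by the posterior probabilities, so the full matrix is worth keeping when observations lie between clusters.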

References

Hasan Kurban, Mark Jenne, Mehmet M. Dalkilic (2016) <https://doi.org/10.1007/s41060-017-0062-1>.