
DCEM (version 1.0.0)

dcem_star_cluster_mv: EM* clustering for multivariate data (part of the DCEM package)

Description

Implements the EM* algorithm for multivariate data. This function is internally called by the dcem_star_train routine.
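The following base-R sketch makes the description more concrete. It performs one iteration of standard EM for a multivariate Gaussian mixture. This is plain EM, not EM*: the optimizations described in the referenced EM* paper are not reproduced. The helper names dmvnorm_base and em_step_mv are hypothetical and are not part of DCEM. The returned list mirrors the prob/mean/cov/prior fields documented under Value, assuming that prob is stored with one row per Gaussian.

```r
# Hypothetical helpers (not part of DCEM): one iteration of standard EM for a
# multivariate Gaussian mixture. EM*'s optimizations are NOT reproduced here.

# Multivariate normal density evaluated at every row of x.
dmvnorm_base <- function(x, mu, sigma) {
  d <- ncol(x)
  diff <- sweep(x, 2, mu)                           # center each row on mu
  quad <- rowSums((diff %*% solve(sigma)) * diff)   # Mahalanobis terms
  exp(-0.5 * quad) / sqrt((2 * pi)^d * det(sigma))
}

em_step_mv <- function(data, mean_mat, cov_list, prior_vec) {
  data <- as.matrix(data)
  num <- length(prior_vec)
  # E-step: weighted densities, one row per Gaussian, one column per point.
  dens <- t(sapply(seq_len(num), function(k)
    prior_vec[k] * dmvnorm_base(data, mean_mat[k, ], cov_list[[k]])))
  prob <- sweep(dens, 2, colSums(dens), "/")        # each column sums to 1
  # M-step: re-estimate priors, means and covariances from the posteriors.
  weights <- rowSums(prob)                          # effective points per Gaussian
  new_prior <- weights / ncol(prob)
  new_mean <- (prob %*% data) / weights             # row k divided by weights[k]
  new_cov <- lapply(seq_len(num), function(k) {
    diff <- sweep(data, 2, new_mean[k, ])
    crossprod(diff * prob[k, ], diff) / weights[k]  # posterior-weighted scatter
  })
  list(prob = prob, mean = new_mean, cov = new_cov, prior = new_prior)
}

# Usage on two well-separated synthetic clusters.
set.seed(1)
data <- rbind(matrix(rnorm(200, mean = 0), ncol = 2),
              matrix(rnorm(200, mean = 4), ncol = 2))
step <- em_step_mv(data,
                   mean_mat = data[c(1, 101), ],   # one seed point per cluster
                   cov_list = list(diag(2), diag(2)),
                   prior_vec = c(0.5, 0.5))
```

Repeating this step until the means stop moving, or until an iteration cap is reached, gives the full EM loop. EM* changes which points are revisited in each iteration, not the form of these updates.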

Usage

dcem_star_cluster_mv(data, mean_mat, cov_list, prior_vec, num, iteration_count,
threshold, numrows, numcols)
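The arguments in the signature above must be assembled before the function is called. The sketch below builds them in base R for a toy two-cluster dataset. make_initial_params is a hypothetical helper. Its initialization choices are assumptions made for illustration and are not necessarily what dcem_star_train does: random rows serve as the means, the sample covariance is used for every Gaussian, and the priors are equal. The DCEM call itself is left as a comment so that the snippet runs without the package installed.

```r
# Hypothetical helper (not part of DCEM): builds initial parameters in the
# shapes dcem_star_cluster_mv() documents, using base R only.
make_initial_params <- function(data, num = 2, seed = 1) {
  data <- as.matrix(data)
  set.seed(seed)
  # Assumption: pick `num` distinct rows at random as the initial means.
  mean_mat <- data[sample(nrow(data), num), , drop = FALSE]
  # Assumption: start every Gaussian with the sample covariance of the data.
  cov_list <- replicate(num, cov(data), simplify = FALSE)
  # Equal initial priors that sum to 1.
  prior_vec <- rep(1 / num, num)
  list(mean_mat = mean_mat, cov_list = cov_list, prior_vec = prior_vec,
       numrows = nrow(data), numcols = ncol(data))
}

set.seed(42)
toy <- rbind(matrix(rnorm(100, 0), ncol = 2), matrix(rnorm(100, 5), ncol = 2))
init <- make_initial_params(toy, num = 2)

# With DCEM installed, the call would look like this (not run here):
# sample_out <- dcem_star_cluster_mv(toy, init$mean_mat, init$cov_list,
#                                    init$prior_vec, num = 2,
#                                    iteration_count = 200, threshold = 0.00001,
#                                    numrows = init$numrows, numcols = init$numcols)
```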

Arguments

data

(matrix): The dataset provided by the user.

mean_mat

(matrix): The matrix containing the initial mean(s) for the Gaussian(s).

cov_list

(list): A list containing the initial covariance matrices for the Gaussian(s).

prior_vec

(vector): A vector containing the initial priors for the Gaussian(s).

num

(numeric): The number of clusters specified by the user. Default: 2.

iteration_count

(numeric): The maximum number of iterations for which the algorithm runs. If convergence within the specified threshold has not been achieved by then, the algorithm stops and exits. Default: 200.

threshold

(numeric): A small value used to check for convergence. If the estimated mean(s) change by no more than this threshold between successive iterations, the algorithm stops and exits.

Note: Choosing a very small threshold (e.g., 0.0000001) can increase the runtime substantially, and the algorithm may not converge. Conversely, choosing a larger value (e.g., 0.1) can lead to sub-optimal clustering. Default: 0.00001.
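The interplay between iteration_count and threshold can be expressed as a single stopping test. em_should_stop is a hypothetical helper. It assumes that convergence is measured as the largest absolute change in any mean coordinate between successive iterations; DCEM's exact criterion may differ.

```r
# Hypothetical helper (not part of DCEM). Assumption: convergence means the
# largest absolute change in any mean coordinate is <= threshold.
em_should_stop <- function(old_mean, new_mean, threshold, iter, iteration_count) {
  converged <- max(abs(new_mean - old_mean)) <= threshold
  out_of_iterations <- iter >= iteration_count   # hard cap on iterations
  converged || out_of_iterations
}

old_mean <- matrix(c(0, 0, 5, 5), nrow = 2, byrow = TRUE)
new_mean <- old_mean + 1e-6

em_should_stop(old_mean, new_mean, threshold = 0.00001, iter = 10, iteration_count = 200)         # TRUE: means barely moved
em_should_stop(old_mean, old_mean + 0.01, threshold = 0.00001, iter = 10, iteration_count = 200)  # FALSE: keep iterating
em_should_stop(old_mean, old_mean + 0.01, threshold = 0.00001, iter = 200, iteration_count = 200) # TRUE: iteration cap reached
```

The second and third calls show the trade-off described in the note above. A threshold that is too tight keeps the loop running until iteration_count stops it, whether or not the means have settled.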

numrows

(numeric): Number of rows in the dataset (after processing the missing values).

numcols

(numeric): Number of columns in the dataset (after processing the missing values).
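numrows and numcols refer to the data after missing values have been handled. The hypothetical helper below shows how the two counts follow from the cleaned matrix. Purely for illustration, it assumes that rows containing any NA are dropped; DCEM may treat missing values differently.

```r
# Hypothetical preprocessing helper (not part of DCEM).
# Assumption: rows containing any NA are removed before clustering.
prepare_data <- function(data) {
  data <- as.matrix(data)
  data <- data[stats::complete.cases(data), , drop = FALSE]
  list(data = data, numrows = nrow(data), numcols = ncol(data))
}

raw <- matrix(c(1.0, 2.0,
                NA,  3.0,
                4.0, 5.0), ncol = 2, byrow = TRUE)
prepared <- prepare_data(raw)
prepared$numrows  # 2 (the row with NA was dropped)
prepared$numcols  # 2
```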

Value

A list of objects containing the parameters associated with the Gaussian(s): posterior probabilities, means, covariances, and priors.

  1. Posterior probabilities: sample_out$prob (a matrix of posterior probabilities for the points in the dataset).

  2. Mean(s): sample_out$mean

    For multivariate data: a matrix of means for the Gaussian(s). Each row in the matrix corresponds to the mean of one Gaussian.

  3. Covariance matrices (for multivariate data): sample_out$cov (a list of covariance matrices for the Gaussian(s)).

  4. Priors: sample_out$prior (a vector of priors for the Gaussian(s)).
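A common next step is to turn the posterior probabilities into hard cluster labels. The sketch below uses a toy stand-in for sample_out whose numbers are made up, not produced by DCEM. It assumes that sample_out$prob stores one row per Gaussian and one column per data point. If your version stores it the other way round, transpose it first.

```r
# Toy stand-in for the list returned by dcem_star_cluster_mv(); values are
# invented for illustration only.
# Assumption: prob has one row per Gaussian and one column per data point.
sample_out <- list(
  prob  = matrix(c(0.9, 0.2, 0.6,
                   0.1, 0.8, 0.4), nrow = 2, byrow = TRUE),
  mean  = matrix(c(0, 0, 5, 5), nrow = 2, byrow = TRUE),
  cov   = list(diag(2), diag(2)),
  prior = c(0.55, 0.45)
)

# Hard assignment: each point goes to the Gaussian with the highest posterior.
membership <- apply(sample_out$prob, 2, which.max)
membership  # 1 2 1

# Sanity checks: each point's posteriors sum to 1, and so do the priors.
stopifnot(isTRUE(all.equal(colSums(sample_out$prob), rep(1, 3))))
stopifnot(isTRUE(all.equal(sum(sample_out$prior), 1)))
```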

References

Hasan Kurban, Mark Jenne, Mehmet M. Dalkilic (2016). Using data to build a better EM: EM* for big data. <https://doi.org/10.1007/s41060-017-0062-1>.