Package Overview

Implements the Expectation-Maximisation (EM) algorithm for clustering multivariate and univariate datasets. Two versions are implemented: EM and EM* (which converges faster by avoiding revisiting the data). For more details on EM*, see the 'References' section below.
The package has been tested with numerical datasets and is not recommended for categorical or ordinal data. It comes bundled with a dataset for demonstration (ionosphere_data.csv). For more help, type ?DCEM in the R console after installing the package.
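To make the idea concrete, here is a minimal sketch of plain EM for a two-component univariate Gaussian mixture, written in base R. It is not the package's implementation: the synthetic data, the initial values, the variable names and the stopping rule (a change in the means below 1e-4, or 50 iterations) are all assumptions chosen for illustration.

```r
# Illustrative EM for a univariate Gaussian mixture (base R only).
# Everything here is a hypothetical example, not DCEM's internal code.
set.seed(42)
x <- c(rnorm(100, mean = 0, sd = 1), rnorm(100, mean = 6, sd = 1))

k <- 2
mu <- c(min(x), max(x))     # crude initial means
sigma <- rep(sd(x), k)      # initial standard deviations
prior <- rep(1 / k, k)      # initial mixing weights

for (iter in seq_len(50)) {
  # E-step: posterior probability of each Gaussian for each point
  dens <- sapply(seq_len(k), function(j) prior[j] * dnorm(x, mu[j], sigma[j]))
  post <- dens / rowSums(dens)

  # M-step: re-estimate means, standard deviations and priors
  nk <- colSums(post)
  new_mu <- colSums(post * x) / nk
  sigma <- sqrt(colSums(post * outer(x, new_mu, "-")^2) / nk)
  prior <- nk / length(x)

  # Stop once the means no longer move by more than the threshold
  converged <- max(abs(new_mu - mu)) < 1e-4
  mu <- new_mu
  if (converged) break
}

print(round(mu, 2))
print(round(prior, 2))
```

Every iteration in this sketch touches all data points in the E-step, which is the cost that EM* is described as reducing.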

Currently, data imputation is not supported; the user must handle missing data before using the package.
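Two common ways to handle missing values before clustering are sketched below in base R. The data frame is a hypothetical stand-in, not the bundled dataset, and which option is appropriate depends on the data.

```r
# Hypothetical data frame with missing values (not the bundled dataset).
df <- data.frame(
  a = c(1.0, NA, 3.0, 4.0),
  b = c(0.5, 0.7, NA, 0.9)
)

# Option 1: drop every row that contains at least one NA.
df_complete <- df[complete.cases(df), , drop = FALSE]

# Option 2: replace each NA with the mean of its column.
df_imputed <- as.data.frame(lapply(df, function(col) {
  col[is.na(col)] <- mean(col, na.rm = TRUE)
  col
}))
```

Dropping rows is the safer default when few rows are affected; mean imputation keeps all rows but shrinks the variance of the imputed columns.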

Contact

For any Bug Fixes/Feature Update(s)

[Parichit Sharma: parishar@iu.edu]

For Reporting Issues

Issues

GitHub Repository Link

Github Repository

Installation Instructions

Installing from CRAN

install.packages("DCEM")

Installing from the Binary Package

install.packages("dcem_1.0.0.tgz", repos = NULL, type = "source")

How to use the package (An Example: working with the default bundled dataset)

  • The dcem package comes bundled with the ionosphere_data.csv for demonstration. Help about the dataset can be seen by typing ?ionosphere_data in the R console. Additional details can be seen at the link Ionosphere data

  • To use this dataset, paste the following code into the R console.

# Read the bundled CSV; assumes the working directory contains data/ionosphere_data.csv
ionosphere_data = read.csv2(
  file = file.path(getwd(), "data", "ionosphere_data.csv"),
  sep = ",",
  header = FALSE,
  stringsAsFactors = FALSE
)
  • Cleaning the data: Before the model can be trained with dcem_train(), the data must be cleaned. This simply means removing all redundant columns (the label column, for example). This dataset contains labels in the last (35th) column and only 0's in the 2nd column, so let's remove both.

Paste the code below into the R session to clean the dataset.

ionosphere_data =  trim_data("35,2", ionosphere_data)
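For readers who want to see what this step amounts to, the sketch below drops the same column positions with base R indexing. It assumes that trim_data() simply removes the listed columns, which has not been checked against the package source; the demo data frame is a hypothetical stand-in with the same number of columns.

```r
# Hypothetical stand-in: 5 rows and 35 columns (not the real dataset).
set.seed(1)
demo <- as.data.frame(matrix(runif(5 * 35), nrow = 5))
demo[[2]] <- 0                              # constant column, like column 2
demo[[35]] <- c("g", "b", "g", "g", "b")    # label column, like column 35

# Drop the columns by position using negative indexing.
drop_cols <- c(35, 2)
demo_clean <- demo[, -drop_cols, drop = FALSE]

dim(demo_clean)   # 5 33
```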
  • Clustering the data: dcem_train() learns the parameters of the Gaussian(s) from the input data. It internally calls dcem_cluster_mv() or dcem_cluster_uv() for multivariate and univariate data respectively. These functions assign probabilistic weights to the samples in the dataset.

Paste the code below into the R session to call dcem_train().

dcem_out = dcem_train(data = ionosphere_data, threshold = 0.0001, iteration_count = 50, num_clusters = 2)
  • Accessing the output: The list returned by dcem_train() is stored in the dcem_out object. It contains the parameters associated with the clusters (Gaussians): posterior probabilities, means, covariance matrices (multivariate data) or standard deviations (univariate data), and priors. Each parameter can be accessed in the R session as follows.
          [1] Posterior probabilities: dcem_out$prob
              A matrix of posterior probabilities for the points in the dataset.

          [2] Mean(s): dcem_out$mean
              For multivariate data: a matrix of means for the Gaussians. Each row of the
              matrix is the mean of one Gaussian.
              For univariate data: a vector of means. Each element of the vector corresponds
              to one Gaussian.

          [3] Covariance matrices or standard deviations
              For multivariate data: dcem_out$cov, a list of covariance matrices for the Gaussians.
              For univariate data: dcem_out$sd, a vector of standard deviations for the Gaussians.

          [4] Priors: dcem_out$prior
              A vector of priors for the Gaussians.
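A common next step is to turn the posterior probabilities into hard cluster labels by picking the most probable Gaussian for each point. The sketch below uses a small hypothetical matrix rather than real package output, and it assumes one row per point and one column per cluster. If dcem_out$prob is laid out the other way round, transpose it with t() first.

```r
# Hypothetical posterior matrix: 4 points (rows) x 2 clusters (columns).
post <- matrix(
  c(0.90, 0.10,
    0.20, 0.80,
    0.60, 0.40,
    0.05, 0.95),
  ncol = 2, byrow = TRUE
)

# Hard label = index of the most probable cluster for each point.
labels <- max.col(post, ties.method = "first")
labels          # 1 2 1 2

# Cluster sizes under the hard assignment.
table(labels)
```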

How to access the help (after installing the package)

?dcem_star_train
?dcem_train
?dcem_test
?DCEM

Version: 1.0.0
License: GPL-3
Maintainer: Sharma Parichit
Last Published: July 24th, 2019

Functions in DCEM (1.0.0)

  • DCEM: Data clustering through Expectation-Maximization algorithm.
  • means_uv: Part of DCEM package.
  • build_heap: Part of DCEM package.
  • means_uv_impr: Part of DCEM package.
  • dcem_star_cluster_mv (multivariate data): Part of DCEM package.
  • dcem_cluster_uv (univariate data): Part of DCEM package.
  • dcem_test: Part of DCEM package.
  • dcem_train: Part of DCEM package.
  • get_leaves: Part of DCEM package.
  • cov_mv: Part of DCEM package.
  • remove_node: Part of DCEM package.
  • priors: Part of DCEM package.
  • dcem_star_cluster_uv (univariate data): Part of DCEM package.
  • dcem_cluster_mv (multivariate data): Part of DCEM package.
  • validate_data: Part of DCEM package.
  • max_heapify: Part of DCEM package.
  • ionosphere_data: Ionosphere data, a dataset of 351 radar readings.
  • trim_data: Part of DCEM package.
  • dcem_star_train: Part of DCEM package.
  • sd_uv: Part of DCEM package.
  • insert_node: Part of DCEM package.
  • means_mv: Part of DCEM package.
  • means_mv_impr: Part of DCEM package.