R-package binomialMix

Copyright 2019 Faustine Bousquet (faustine.bousquet@tabmo.io or faustine.bousquet@umontpellier.fr) from TabMo and IMAG (Institut Montpelliérain Alexander Grothendieck, University of Montpellier). The binomialMix package is available under the Apache2 license.

Description

The binomialMix package provides a clustering method for longitudinal, non-Gaussian data. It relies on an EM algorithm for generalized linear models (GLM).

Instructions for users

Installation

You can install the binomialMix R package with either of the following R commands:

# install.packages("devtools")
devtools::install_git("https://gitlab.com/tabmo/binomialmix")
devtools::install_gitlab("tabmo/binomialMix")

You can also use the git repository directly:

git clone https://gitlab.com/tabmo/binomialMix

Once you have cloned the git repository, run the following command to install the binomialMix package:

devtools::install("/path/to/binomialMix/pkg") # edit the path

Example of use

  • Load the library:
library(binomialMix)
  • Load the data:
data(adcampaign)

You can of course use your own data. It must have the following format:

  • a data frame
  • an id column of type factor identifying the objects you want to cluster
  • a target variable
  • a weight variable, since the data are binomial
  • at least one column used as an explanatory variable
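As an illustration of this layout, here is a minimal sketch that builds a toy data frame. The column names mirror the adcampaign example (id, timeSlot, impressions, ctr), but every value is made up, and treating the target as a proportion of successes (clicks divided by impressions) is an assumption drawn from the variable names rather than something the package documentation states here.

```r
# Hypothetical toy data in the layout described above; all values are made up.
set.seed(1)
toy <- data.frame(
  id          = factor(rep(c("camp_A", "camp_B"), each = 4)),    # objects to cluster
  timeSlot    = factor(rep(c("morning", "evening"), times = 4)), # explanatory variable
  impressions = c(120, 80, 200, 150, 90, 60, 300, 110)           # binomial weights (trials)
)

# Assumed target: observed proportion of successes, always within [0, 1].
clicks  <- rbinom(nrow(toy), size = toy$impressions, prob = 0.05)
toy$ctr <- clicks / toy$impressions

str(toy)
```

Each id level appears on several rows, which is what makes the data longitudinal.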

  • Run the clustering algorithm:

Here, we want to cluster advertising campaigns. Each campaign (column "id") is composed of n_c observations from the whole dataset, so we have repeated measures for the same id level. The explanatory variables could be day, timeSlot or app_or_site. We try K=3 clusters.

model_formula<-"ctr~timeSlot+day"
weighted_variable<-"impressions"
nb_cluster<-3
df_tocluster<-adcampaign
col_id<-"id"
result_K3<-runEM(model_formula,
                  weighted_variable,
                  nb_cluster,
                  df_tocluster,
                  col_id)
  • Analysis of the clustering obtained:

The output of the runEM function provides the following values:

  • log-likelihood at each EM iteration
  • estimates of the β, λ and π parameters
  • BIC and ICL values
  • number of Fisher scoring iterations needed for each M-step

Plotting the evolution of the log-likelihood over the iterations

# Plot the log-likelihood:
# install.packages("ggplot2")  # run once if ggplot2 is not installed
library(ggplot2)
qplot(seq_along(result_K3[[1]]), result_K3[[1]])
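Besides plotting, the trace can be checked numerically, since an EM algorithm should never decrease the log-likelihood from one iteration to the next. The sketch below uses a made-up vector named loglik in place of result_K3[[1]], and assumes that element is a plain numeric vector (wrap it in unlist() if it is a list). The tolerance is an arbitrary illustration value.

```r
# Made-up log-likelihood trace standing in for result_K3[[1]].
loglik <- c(-2105.4, -1987.2, -1950.8, -1942.1, -1940.3, -1940.1)

increments <- diff(loglik)     # change between consecutive iterations
all(increments >= 0)           # TRUE when the trace is non-decreasing

# Relative change per step; small values suggest the algorithm has stabilised.
rel_change <- abs(increments) / abs(head(loglik, -1))
tol <- 1e-3                    # arbitrary tolerance chosen for illustration
which(rel_change < tol)[1]     # first step whose relative change is below tol
```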

Matrix of estimated beta coefficients (values from the last iteration):

head(result_K3[[2]][[length(result_K3[[2]])]])
##            [,1]       [,2]       [,3]
## [1,] -3.8126661 -5.2914380 -3.2418550
## [2,] -0.4134079  0.3794783  0.4115441
## [3,] -0.2975236  0.2407683  0.4076950
## [4,] -0.1948168  0.2122175  0.3753815
## [5,] -0.1590104  0.4028323  0.1885215
## [6,] -0.2160946  0.3545593  0.1872363

Vector of cluster proportions (values from the last iteration):

result_K3[[3]][[length(result_K3[[3]])]]
## [1] 0.1871000 0.7246125 0.0883000

Matrix of the probabilities for each campaign to belong to each cluster (values from the last iteration):

## Too large to print here
result_K3[[4]][[length(result_K3[[4]])]]
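A common way to use such a probability matrix is to assign each campaign to its most probable cluster. The sketch below does this on a small made-up matrix named tau. It assumes one row per campaign and one column per cluster, which should be checked against the dimensions of the real output before reuse.

```r
# Made-up posterior probabilities: one row per campaign, one column per cluster.
tau <- matrix(c(0.90, 0.05, 0.05,
                0.10, 0.70, 0.20,
                0.02, 0.08, 0.90,
                0.30, 0.60, 0.10),
              ncol = 3, byrow = TRUE,
              dimnames = list(paste0("camp_", 1:4), paste0("cluster_", 1:3)))

# Index of the most probable cluster in each row.
hard_label <- max.col(tau, ties.method = "first")
names(hard_label) <- rownames(tau)

table(factor(hard_label, levels = seq_len(ncol(tau))))  # cluster sizes
colMeans(tau)                                           # soft cluster proportions
```

Rows whose largest probability is well below 1 are campaigns whose assignment is uncertain and may deserve a closer look.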

BIC value, as a numeric value:

paste0("BIC=",result_K3[[5]][[length(result_K3[[5]])]])
## [1] "BIC=387914.537681485"

ICL value, as a numeric value:

paste0("ICL value=",result_K3[[6]][[length(result_K3[[6]])]])
## [1] "ICL value=387919.96962191"

Total number of EM iterations, as a numeric value:

paste0("Number of EM iteration :",length(result_K3[[7]]))
## [1] "Number of EM iteration :10"

Matrix of the number of Fisher scoring iterations at each M-step:

matrix(unlist(result_K3[[7]]),ncol=length(result_K3[[7]])-1)
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
## [1,]    4    3    4    6    3    3    2    1    1
## [2,]    3    2    2    2    2    2    2    1    1
## [3,]    5    4    2    2    3    1    1    1    1
# nrow is equal to the number of clusters
# ncol is equal to the number of iterations
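The same matrix can be built by fixing the number of rows to the number of clusters, which avoids hard-coding the "length minus one" column count. This is only a sketch on a made-up list named fisher_iter. The empty first element is an assumption that would explain the "-1" above, not something the package documents here.

```r
# Made-up stand-in for result_K3[[7]]: one vector of counts per EM iteration,
# with an empty first element (assumed, see the note above).
K <- 3
fisher_iter <- list(integer(0), c(4, 3, 5), c(3, 2, 4), c(4, 2, 2))

# Fix the number of rows to K and let the number of columns follow.
fisher_mat <- matrix(unlist(fisher_iter), nrow = K)
dimnames(fisher_mat) <- list(paste0("cluster_", seq_len(K)),
                             paste0("iter_", seq_len(ncol(fisher_mat))))

fisher_mat
colSums(fisher_mat)   # total Fisher scoring steps spent in each M-step
```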

Install

install.packages('binomialMix')

Version: 1.0.1
License: GPL-3
Maintainer: Faustine Bousquet
Last Published: March 23rd, 2020

Functions in binomialMix (1.0.1)

  • Incomplete_Loglikelihood_binomiale: Calculate the incomplete log-likelihood of a mixture of binomials
  • init_tau: Initialize the matrix of probabilities for each id level to belong to each cluster
  • log_density_binom: Calculate the log density of a binomial
  • my_BIC: Calculate the Bayesian Information Criterion (BIC)
  • my_ICL: Calculate the Integrated Complete Likelihood (ICL)
  • init_lambda: Initialize the vector lambda of mixture proportions
  • init_subset: Initialize the estimation of beta
  • update_tau: E-step: update tau
  • update_w: M-step: update the diagonal matrix W from the beta iterative equation
  • extract_id: Extract levels as numeric from the id column of the dataset
  • extract_target: Extract the target value of the GLM
  • runEM: Run an EM algorithm to obtain a mixture of binomials with K clusters
  • update_beta: M-step: update the beta parameters
  • extract_variables: Extract variables from the GLM model
  • init_design_matrices: Initialize design matrices from the data frame to cluster
  • adcampaign: Advertising campaign dataset
  • update_z: M-step: update the matrix of working variables Z from the beta iterative equation
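To give a feel for how the E-step and the mixture proportions fit together, here is a generic sketch of one E-step for a mixture of binomials, written with base R only. It is not the package's code: the data are made up, fixed per-cluster success probabilities (p_k) stand in for the GLM fit, and the names tau and lambda are borrowed from the function list above.

```r
# Generic E-step sketch for a mixture of binomials (not the package's code).
set.seed(42)
id     <- factor(rep(c("a", "b", "c"), each = 3))  # 3 objects, 3 observations each
trials <- rep(100, 9)
succ   <- c(rbinom(3, 100, 0.05), rbinom(3, 100, 0.05), rbinom(3, 100, 0.30))
p_k    <- c(0.05, 0.30)   # per-cluster success probability (stands in for the GLM fit)
lambda <- c(0.5, 0.5)     # current mixture proportions

# Log density of every observation under every cluster, summed within each id.
log_dens    <- sapply(p_k, function(p) dbinom(succ, trials, p, log = TRUE))
log_dens_id <- rowsum(log_dens, group = id)        # one row per id, one column per cluster

# Posterior membership probabilities, using log-sum-exp for numerical stability.
log_num <- sweep(log_dens_id, 2, log(lambda), "+")
row_max <- apply(log_num, 1, max)
tau     <- exp(log_num - row_max)
tau     <- tau / rowSums(tau)

lambda_new <- colMeans(tau)   # updated mixture proportions
incomplete_loglik <- sum(row_max + log(rowSums(exp(log_num - row_max))))
```

Summing the log densities within each id before normalising is what makes all repeated measures of one object share a single cluster membership.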