boscoclust: Function to perform a co-clustering

Description

This function runs a co-clustering algorithm on ordinal data using the latent block model (see references for further details). A BOS distribution is used, and parameter inference is performed with the SEM-Gibbs algorithm.

Usage

boscoclust(x=matrix(0,nrow=1,ncol=1), idx_list=c(1), kr, kc, init, nbSEM, nbSEMburn, 
          nbRepeat=1, nbindmini, m=0, percentRandomB=0)

Arguments

x

Matrix of ordinal data of dimension N*Jtot. Features with the same number of levels must be placed side by side. Missing values should be coded as NA.

idx_list

Vector of length D. This argument is useful when variables have different numbers of levels. Element d should indicate the column of matrix x at which the variables with number of levels m[d] begin.
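
As an illustration of how idx_list and m relate, the following sketch derives both from a vector holding the number of levels of each column. The name levels_per_col and the level counts are hypothetical, and the sketch assumes the columns are already ordered so that equal level counts are adjacent, as required for x.

```r
# Hypothetical number of levels for each of 8 columns of x,
# already grouped so that equal counts sit side by side.
levels_per_col <- c(3, 3, 3, 5, 5, 7, 7, 7)

# Run-length encoding yields one run per group of variables.
runs <- rle(levels_per_col)

# m: number of levels of each group (length D = 3 here).
m <- runs$values

# idx_list: column index at which each group starts in x.
idx_list <- cumsum(c(1, head(runs$lengths, -1)))

print(m)         # 3 5 7
print(idx_list)  # 1 4 6
```

The resulting vectors have one entry per group of variables, so both have length D.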

kr

Number of row classes.

kc

Vector of length D. The d^th element indicates the number of column clusters for the d^th group of variables.

m

Vector of length D. The d^th element defines the number of levels of the ordinal data in the d^th group of variables.

nbSEM

Number of SEM-Gibbs iterations run to estimate the parameters.

nbSEMburn

Number of SEM-Gibbs burn-in iterations used when estimating the parameters. This value must be less than nbSEM.

nbRepeat

Number of times the sampling on rows and columns is repeated within each SEM-Gibbs iteration.

nbindmini

Minimum number of cells belonging to a block.

init

String indicating the kind of initialisation. Must be one of "kmeans", "random" or "randomBurnin".

percentRandomB

Vector of length 2. Indicates the percentage of resampling when init is equal to "randomBurnin".

Value

@V

Matrix of dimension N*kr such that V[i,g]=1 if i belongs to cluster g.
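
To make the indicator encoding concrete, the sketch below builds such a matrix from a row partition and then recovers the partition from it. The partition values are hypothetical and the code uses base R only; it illustrates the encoding and is not taken from the package internals.

```r
# Hypothetical row partition of N = 5 observations into kr = 3 clusters.
zr <- c(1, 3, 2, 1, 3)
kr <- 3

# Indicator matrix: V[i, g] is 1 when observation i belongs to cluster g, else 0.
V <- outer(zr, seq_len(kr), FUN = "==") * 1

# Recover the partition from V: position of the single 1 in each row.
zr_back <- max.col(V, ties.method = "first")
```

Each row of V contains exactly one 1, so the row sums are all equal to 1.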

@icl

ICL value for co-clustering.

@name

Name of the result.

@paramschain

List of length nbSEMburn. The parameters of the blocks are stored for each iteration of the SEM-Gibbs algorithm.

@pichain

List of length nbSEM. Item i is a vector of length kr that contains the row mixing proportions at iteration i.

@rhochain

List of length nbSEM. Item i is a list of length D whose d^th element contains the column mixing proportions of the group of variables d, for iteration i.

@zc

List of length D. The d^th item is a vector of length J[d] representing the column partitions for the group of variables d.

@zr

Vector of length N with resulting row partitions.

@W

List of length D. Item d is a matrix of dimension J[d]*kc[d] such that W[j,h]=1 if j belongs to cluster h.

@m

Vector of length D. The d^th element represents the number of levels of the d^th group of variables.

@params

List of length D. The d^th item represents the block parameters for the d^th group of variables.

@pi

Vector of length kr. This corresponds to the row mixing proportions.

@rho

List of length D. The d^th item represents the column mixing proportions for the d^th group of variables.

@xhat

List of length D. The d^th item represents the dataset of the d^th group of variables, with missing values completed.

@zrchain

Matrix of dimension nbSEM*N. Row i represents the row cluster partitions at iteration i.
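
One way such a chain might be summarised is to discard the burn-in rows and keep, for each observation, the most frequent cluster label. The sketch below does this on a simulated matrix standing in for a real chain; all values are hypothetical, label switching between iterations is ignored, and nothing here claims to reproduce how the package derives its final partition.

```r
set.seed(1)

# Hypothetical settings and a simulated chain of dimension nbSEM*N.
nbSEM <- 10
nbSEMburn <- 4
N <- 6
kr <- 3
zrchain <- matrix(sample(seq_len(kr), nbSEM * N, replace = TRUE),
                  nrow = nbSEM, ncol = N)

# Drop the burn-in iterations.
kept <- zrchain[(nbSEMburn + 1):nbSEM, , drop = FALSE]

# For each observation (column), keep the most frequent label after burn-in.
zr_mode <- apply(kept, 2, function(labels) which.max(tabulate(labels, nbins = kr)))
```

Ties are resolved in favour of the smallest label, because which.max returns the first maximum.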

@zcchain

List of length D. Item d is a matrix of dimension nbSEM*J[d]. Row i represents the column cluster partitions at iteration i.

Examples

# NOT RUN {
    
  library(ordinalClust)

  # load the real dataset
  data("dataqol")
  set.seed(5)

  # extract the ordinal variables (columns 2 to 29) as a matrix
  M <- as.matrix(dataqol[, 2:29])

  # number of levels of the ordinal variables
  m <- 4

  # number of row and column clusters
  krow <- 4
  kcol <- 4

  # configuration for the inference
  nbSEM <- 50
  nbSEMburn <- 40
  nbindmini <- 2
  init <- "randomBurnin"
  percentRandomB <- c(20, 20)

  # run the co-clustering
  object <- boscoclust(x = M, kr = krow, kc = kcol, m = m, nbSEM = nbSEM,
                       nbSEMburn = nbSEMburn, nbindmini = nbindmini,
                       init = init, percentRandomB = percentRandomB)

  
# }