modelSelection_Q: Selects the number of groups with the ICL criterion

Description

Selects the number of groups \(Q\) in [Qmin, Qmax] with the Integrated Classification Likelihood (ICL) criterion.
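Read together with the Value section, the selection rule can be summarized as follows. This is a hedged summary that assumes the standard ICL form: the penalty reported in all.pen is subtracted from the estimated complete log-likelihood reported in all.compl.log.likelihood, and Qbest is the value that maximizes ICL. The symbols \(\ell_c(Q)\) and \(\mathrm{pen}(Q)\) are notation introduced here for that purpose.

```latex
\mathrm{ICL}(Q) = \ell_c(Q) - \mathrm{pen}(Q),
\qquad
Q_{\mathrm{best}} = \operatorname*{arg\,max}_{Q_{\min} \le Q \le Q_{\max}} \mathrm{ICL}(Q)
```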

Usage

modelSelection_Q(
  data,
  n,
  Qmin = 1,
  Qmax,
  directed = TRUE,
  sparse = FALSE,
  sol.hist.sauv
)

Value

The function outputs a list of 7 components:

Qbest Selected value of the number of groups in [Qmin, Qmax].
sol.Qbest Solution of the mainVEM function for the number of groups Qbest.
Qmin Minimum number of groups used.
all.J Vector of length Qmax-Qmin+1. Each value is the estimated value of the ELBO (evidence lower bound) \(J\) for estimation with \(Q\) groups, \(Qmin \le Q \le Qmax\).
all.ICL Vector of length Qmax-Qmin+1. Each value is the ICL value for estimation with \(Q\) groups, \(Qmin \le Q \le Qmax\).
all.compl.log.likelihood Vector of length Qmax-Qmin+1. Each value is the estimated complete log-likelihood value for estimation with \(Q\) groups, \(Qmin \le Q \le Qmax\).
all.pen Vector of length Qmax-Qmin+1. Each value is the penalty term in ICL for estimation with \(Q\) groups, \(Qmin \le Q \le Qmax\).
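The vector components above are indexed by Q - Qmin + 1. The following minimal base-R sketch shows how Qbest could be recovered from them. It assumes that ICL equals the complete log-likelihood minus the penalty and that Qbest maximizes ICL, and this page states neither explicitly. The helper pick_Qbest and all numbers are hypothetical illustrations, not output of modelSelection_Q.

```r
# Hypothetical helper: recover the selected number of groups from an ICL
# vector whose q-th element corresponds to Q = Qmin + q - 1.
pick_Qbest <- function(all.ICL, Qmin) {
  Qmin - 1 + which.max(all.ICL)
}

# Illustrative values only, for Qmin = 1 and Qmax = 4
Qmin <- 1
all.compl.log.likelihood <- c(-520, -455, -430, -428)
all.pen <- c(5, 12, 20, 30)

# Assumed relation: ICL = complete log-likelihood - penalty
all.ICL <- all.compl.log.likelihood - all.pen

Qbest <- pick_Qbest(all.ICL, Qmin)
print(all.ICL)  # -525 -467 -450 -458
print(Qbest)    # 3
```

Note that which.max returns the first maximum, so ties would be resolved in favour of the smaller Q in this sketch.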

Arguments

data

List with 2 components:

Time - Positive real number. [0,Time] is the total time interval of observation.
Nijk - Data matrix of event counts per process \(N_{ij}\) and per sub-interval \(1\le k\le K\).
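The Examples section builds Nijk with the package's statistics() function, which is the supported route. The sketch below shows the underlying idea of binning each process's event times into K equal sub-intervals of [0,Time]. The helper bin_counts, its input format (one numeric vector of event times per process), and the layout of one row per process and one column per sub-interval are assumptions made for illustration only, not the package's implementation.

```r
# Hypothetical helper (not the package's statistics()): count the events of
# each process in K equal sub-intervals of [0, Time].
bin_counts <- function(event_times, Time, K) {
  counts <- lapply(event_times, function(t) {
    # sub-interval index: ((k-1)*Time/K, k*Time/K] -> k; an event at 0 goes to 1
    k <- pmax(1, ceiling(t * K / Time))
    tabulate(k, nbins = K)
  })
  # one row per process, one column per sub-interval (assumed layout)
  do.call(rbind, counts)
}

Time <- 10
K <- 4  # four sub-intervals of length 2.5
events <- list(c(0.5, 1.0, 7.2), 9.9, numeric(0))
Nijk <- bin_counts(events, Time, K)
data_toy <- list(Nijk = Nijk, Time = Time)
print(Nijk)
```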

n

Total number of nodes, \(1\le i \le n\).

Qmin

Minimum number of groups.

Qmax

Maximum number of groups.

directed

Boolean for directed (TRUE) or undirected (FALSE) case.
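Whether the graph is directed changes how many node pairs, and hence counting processes \(N_{ij}\), there are. This small sketch assumes no self-loops, so that a graph on n nodes has n(n-1) ordered pairs when directed and n(n-1)/2 unordered pairs when undirected. The helper n_processes is hypothetical. Whether this count equals the number of rows of Nijk, and in which order the package stores the pairs, is not stated on this page.

```r
# Hypothetical helper: number of node pairs (i, j) with i != j, each
# carrying one counting process N_ij (self-loops assumed absent).
n_processes <- function(n, directed = TRUE) {
  if (directed) n * (n - 1) else n * (n - 1) / 2
}

# Enumerate the unordered pairs i < j of a small undirected graph
pairs <- t(combn(4, 2))

n_processes(50, directed = FALSE)  # 1225
n_processes(50, directed = TRUE)   # 2450
```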

sparse

Boolean for sparse (TRUE) or not sparse (FALSE) case.

sol.hist.sauv

List of size Qmax-Qmin+1, with one solution per number of groups from Qmin to Qmax, obtained by running mainVEM on the data with method='hist'.
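Because sol.hist.sauv must contain one solution per candidate value of Q, a quick consistency check before calling the function can catch a mismatch between Qmin, Qmax and the saved solutions. This is a minimal sketch. The helper check_selection_inputs is hypothetical, and its checks are inferred from the argument descriptions on this page rather than taken from the package. The mapping of element q to Q = Qmin + q - 1 is likewise an assumption.

```r
# Hypothetical helper: validate Qmin, Qmax and the length of sol.hist.sauv,
# and return the (assumed) position of each Q in sol.hist.sauv.
check_selection_inputs <- function(Qmin, Qmax, sol.hist.sauv) {
  stopifnot(Qmin >= 1, Qmax >= Qmin)
  expected <- Qmax - Qmin + 1
  if (length(sol.hist.sauv) != expected) {
    stop(sprintf("sol.hist.sauv has %d elements, expected Qmax - Qmin + 1 = %d",
                 length(sol.hist.sauv), expected))
  }
  # element q is assumed to hold the solution for Q = Qmin + q - 1
  setNames(seq_len(expected), paste0("Q=", Qmin:Qmax))
}

# Usage with a placeholder list standing in for real mainVEM solutions
idx <- check_selection_inputs(2, 4, vector("list", 3))
print(idx)
```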

References

BIERNACKI, C., CELEUX, G. & GOVAERT, G. (2000). Assessing a mixture model for clustering with the integrated completed likelihood. IEEE Trans. Pattern Anal. Machine Intel. 22, 719–725.

CORNELI, M., LATOUCHE, P. & ROSSI, F. (2016). Exact ICL maximization in a non-stationary temporal extension of the stochastic block model for dynamic networks. Neurocomputing 192, 81–91.

DAUDIN, J.-J., PICARD, F. & ROBIN, S. (2008). A mixture model for random graphs. Statist. Comput. 18, 173–183.

MATIAS, C., REBAFKA, T. & VILLERS, F. (2018). A semiparametric extension of the stochastic block model for longitudinal networks. Biometrika 105(3), 665–680.

Examples


# load data of a synthetic graph with 50 individuals and 3 clusters
n <- 50

# compute the data matrix of counts per sub-interval with precision d_max=3
# (i.e. number of sub-intervals K=2^{d_max}=8)
K <- 2^3
data <- list(Nijk=statistics(generated_Q3$data,n,K,directed=FALSE),
    Time=generated_Q3$data$Time)

# ICL-model selection with groups ranging from 1 to 4
sol.selec_Q <- modelSelection_Q(data,n,Qmin=1,Qmax=4,directed=FALSE,
    sparse=FALSE,sol.hist.sauv=generated_sol_hist)

# best number Q of clusters:
sol.selec_Q$Qbest
