mainVEM: Adaptative VEM algorithm

Description

Principal adaptative VEM algorithm for histogram with model selection or for kernel method.

Usage

mainVEM(data, n, Qmin, Qmax = Qmin, directed = TRUE, sparse = FALSE,
  method = c("hist", "kernel"), init.tau = NULL, cores = 1, d_part = 5,
  n_perturb = 10, perc_perturb = 0.2, n_random = 0, nb.iter = 50,
  fix.iter = 10, epsilon = 1e-06, filename = NULL)

Arguments

data

Data format depends on the estimation method used!!

Data with hist method - list with 2 components:

data$Time

[0,data$Time] is the total time interval of observation

data$Nijk

Data matrix with counts per process $N_{ij}$ and sub-intervals ; matrix of size $N*Dmax$ where $N = n(n-1)$ or $n(n-1)/2$ is the number of possible node pairs in the graph and $Dmax = 2^{dmax}$ is the size of the finest partition in the histrogram approach

Counts are pre-computed - Obtained through function 'statistics' (auxiliary.R) on data with second format

Data with kernel method - list with 3 components:

data$time.seq: Sequence of observed time points of the m-th event (M-vector)
data$type.seq: Sequence of observed values convertNodePair(i,j,n,directed) (auxiliary.R) of process that produced the mth event (M-vector).
data$Time: [0,data$Time] is the total time interval of observation

Total number of nodes

Qmin

Minimum number of groups

Qmax

Maximum number of groups

directed

Boolean for directed (TRUE) or undirected (FALSE) case

sparse

Boolean for sparse (TRUE) or not sparse (FALSE) case

method

List of string. Can be "hist" for histogram method or "kernel" for kernel method

init.tau

List of initial values of $\tau$ - all tau's are matrices with size $Q\times n$ (might be with different values of Q)

cores

Number of cores for parallel execution

If set to 1 it does sequential execution

Beware: parallelization with fork (multicore) : doesn't work on Windows!

d_part

Maximal level for finest partition of time interval [0,T] used for k-means initializations.

Algorithm takes partition up to depth $2^d$ with $d=1,...,d_{part}$
Explore partitions $[0,T], [0,T/2], [T/2,T], ... [0,T/2^d], ...[(2^d-1)T/2^d,T]$
Total number of partitions $npart= 2^{(d_{part} +1)} -1$

n_perturb

Number of different perturbations on k-means result

When $Qmin < Qmax$, number of perturbations on the result with $Q-1$ or $Q+1$ groups

perc_perturb

Percentage of labels that are to be perturbed (= randomly switched)

n_random

Number of completely random initial points. The total number of initializations for the VEM is $npart*(1+n_{perturb}) +n_{random}$

nb.iter

Number of iterations of the VEM algorithm

fix.iter

Maximum number of iterations of the fixed point into the VE step

epsilon

Threshold for the stopping criterion of VEM and fixed point iterations

filename

Name of the file where to save the results along the computation (increasing steps for $Q$, these are the longest).

The file will contain a list of 'best' results.

Details

The sparse version works only for the histogram approach.

References

DAUDIN, J.-J., PICARD, F. & ROBIN, S. (2008). A mixture model for random graphs. Statist. Comput. 18, 173–183.

DEMPSTER, A. P., LAIRD, N. M. & RUBIN, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Statist. Soc. Ser. B 39, 1–38.

JORDAN, M., GHAHRAMANI, Z., JAAKKOLA, T. & SAUL, L. (1999). An introduction to variational methods for graphical models. Mach. Learn. 37, 183–233.

MATIAS, C., REBAFKA, T. & VILLERS, F. (2018). A semiparametric extension of the stochastic block model for longitudinal networks. Biometrika.

MATIAS, C. & ROBIN, S. (2014). Modeling heterogeneity in random graphs through latent space models: a selective review. Esaim Proc. & Surveys 47, 55–74.

Examples

Run this code

# load data of a synthetic graph with 50 individuals and 3 clusters
n <- 20
Q <- 3

Time <- generated_Q3_n20$data$Time
data <- generated_Q3_n20$data
z <- generated_Q3_n20$z

step <- .001
x0 <- seq(0,Time,by=step)
intens <-  generated_Q3_n20$intens

# VEM-algo kernel
sol.kernel <- mainVEM(data,n,Q,directed=FALSE,method='kernel', d_part=0,
    n_perturb=0)[[1]]
# compute smooth intensity estimators
sol.kernel.intensities <- kernelIntensities(data,sol.kernel$tau,Q,n,directed=FALSE)
# eliminate label switching
intensities.kernel <- sortIntensities(sol.kernel.intensities,z,sol.kernel$tau,
    directed=FALSE)

# VEM-algo hist
# compute data matrix with precision d_max=3
Dmax <- 2^3
Nijk <- statistics(data,n,Dmax,directed=FALSE)
sol.hist <- mainVEM(list(Nijk=Nijk,Time=Time),n,Q,directed=FALSE, method='hist',
    d_part=0,n_perturb=0,n_random=0)[[1]]
log.intensities.hist <- sortIntensities(sol.hist$logintensities.ql,z,sol.hist$tau,
     directed=FALSE)

# plot estimators
par(mfrow=c(2,3))
ind.ql <- 0
for (q in 1:Q){
  for (l in q:Q){
    ind.ql <- ind.ql + 1
    true.val <- intens[[ind.ql]]$intens(x0)
    values <- c(intensities.kernel[ind.ql,],exp(log.intensities.hist[ind.ql,]),true.val)
    plot(x0,true.val,type='l',xlab=paste0("(q,l)=(",q,",",l,")"),ylab='',
        ylim=c(0,max(values)+.1))
    lines(seq(0,1,by=1/Dmax),c(exp(log.intensities.hist[ind.ql,]),
        exp(log.intensities.hist[ind.ql,Dmax])),type='s',col=2,lty=2)
    lines(seq(0,1,by=.001),intensities.kernel[ind.ql,],col=4,lty=3)
  }
}

Run the code above in your browser using DataLab