Estimation of dynamic stochastic block models for different numbers of groups. Each model combines a stochastic block model (SBM) for its static part with independent Markov chains for the evolution of the node groups through time.
select.dynsbm(Y, present=NULL, Qmin, Qmax,
edge.type=c("binary","discrete","continuous"), K=-1,
directed=FALSE, self.loop=FALSE,
nb.cores=1,
iter.max=20, nstart=25, perturbation.rate=0.2,
fixed.param=FALSE, bipartition=NULL,
plot=TRUE)
An object of class array
of dimension (T x N x N) containing T adjacency matrices of size (N x N), where N is the number of nodes in the network and T is the number of time points.
NULL
or an object of class matrix
of size (N x T)
containing the presence/absence (coded with 1/0 respectively) of
each of the N nodes at each of the T time points. When set to NULL
, this object is deduced from Y
(see the "Details" section). Every node must be present at least once among the time points.
Minimum number of groups >1.
Maximum number of groups.
Type of adjacency matrices. This should be (an unambiguous abbreviation of) one of binary
, discrete
or continuous
.
See the "Details" section.
Only if edge.type=="discrete"
. Number of non-zero discrete values (i.e. in 1,..,K).
If TRUE
, the network is supposed to be directed (and
therefore Y
is supposed to be asymmetric).
If TRUE
, self-loops (edges from one node to the same node) are allowed and taken into account in the estimation procedure.
Number of cores to use, i.e. how many child processes and how many threads will be run simultaneously during the initialization and the estimation steps respectively.
Maximal number of algorithm iterations.
Number of starting points for the iterative estimation algorithm. See the "Details" section.
Rate of perturbation (in [0,1], see nstart
) for the iterative estimation algorithm. This rate is
the fraction of nodes whose group is randomly shuffled.
If TRUE
, the model parameters remain fixed and
constant in time. By default, fixed.param
is automatically
set to TRUE
in the bipartite case (i.e. bipartition
is not
NULL
; see the "Details" section).
NULL
or a vector of size N specifying a node bipartition
in the case of bipartite networks (see the "Details" section). Each element of this vector is set to 1 or 2 to specify the node belongs to the
first or second set of nodes.
Display a plot with the loglikelihood and the ICL criteria if edge.type=="binary"/"continuous"
. See the "Details" section.
Returns a list of dynsbm
objects. Each object of class dynsbm
is a list
with the following components:
The Markov chain transition matrix
of size (Q x Q).
An object of class matrix
of size (N x T)
containing the group membership estimated by MAP (>0, =0 for absent nodes).
An object of class matrix
of size (T x Q x Q) containing
the sparsity parameters of the model.
Only if edge.type=="discrete"
. An object of class matrix
of size (T x Q x Q x K) containing
the model parameters.
Only if edge.type=="continuous".
An object of class matrix
of size (T x Q x Q)
and a vector containing the model parameters.
Completed data log-likelihood.
Number of algorithm iterations used.
Specifies whether the model is built for directed networks.
Specifies whether the model allows self-loops.
This function deals with binary or weighted dynamic/temporal/evolving networks (with discrete or continuous edges).
The adjacency matrices must be coded with 0/1 in the binary
case,
with 0/y where y belongs to the set 1,..,K in the discrete
case
or with 0/y where y is a positive numeric value assumed to follow a Gaussian mixture in
the continuous
case.
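As a rough illustration of the expected layout in the binary case, the sketch below builds a small (T x N x N) array by hand in base R. The sizes, the edge probability and the variable names (`n_times`, `N`, `Y`) are arbitrary choices made for this example, not values taken from the package.

```r
# Illustrative only: build a (T x N x N) binary, undirected array by hand.
set.seed(1)
n_times <- 3   # number of time points (T)
N <- 5         # number of nodes
Y <- array(0, dim = c(n_times, N, N))
for (t in seq_len(n_times)) {
  A <- matrix(rbinom(N * N, size = 1, prob = 0.4), N, N)
  A[lower.tri(A)] <- t(A)[lower.tri(A)]  # copy upper triangle: symmetric matrix
  diag(A) <- 0                           # no self-loops
  Y[t, , ] <- A                          # time is the first dimension
}
```

The time index comes first, so `Y[t, , ]` is the adjacency matrix at time point `t`.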
Presence/absence information makes it possible to model node arrival or
departure, birth or death, or simply to specify missing data
(as absent nodes). If this information is missing (NULL), the presence/absence is deduced automatically from Y
by searching for nodes that do not participate in any edge (rows/columns of 0 in Y
) and declaring them as absent.
This function does not support the existence of nodes that are never present (error message in this case).
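The deduction rule described above can be sketched in base R as follows. `deduce_presence` is a hypothetical helper written for this illustration; it is not a function of the package and may differ from what the package does internally.

```r
# Hypothetical helper (not part of dynsbm): deduce an (N x T) presence matrix
# from a (T x N x N) array by flagging nodes that have no edge at a time point.
deduce_presence <- function(Y) {
  n_times <- dim(Y)[1]
  N <- dim(Y)[2]
  present <- matrix(0L, nrow = N, ncol = n_times)
  for (t in seq_len(n_times)) {
    A <- Y[t, , ]
    # a node is present if its row or its column holds at least one edge
    present[, t] <- as.integer(rowSums(A != 0) + colSums(A != 0) > 0)
  }
  present
}

Y <- array(0, dim = c(2, 3, 3))
Y[1, 1, 2] <- 1; Y[1, 2, 1] <- 1   # edge 1-2 at time 1; node 3 is isolated
Y[2, 2, 3] <- 1; Y[2, 3, 2] <- 1   # edge 2-3 at time 2; node 1 is isolated
present <- deduce_presence(Y)
never_present <- which(rowSums(present) == 0)  # nodes absent at all time points
```

A non-empty `never_present` would correspond to the unsupported situation mentioned above.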
The estimation algorithm is iterative and relies on a starting point. Therefore, it is possible to start the algorithm many times with 'nstart' starting points.
The first starting point is obtained with an ad-hoc use of the kmeans
function.
The following starting points are obtained by perturbing the first one (see perturbation.rate
).
The greater nstart
, the more accurate the results.
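To give an idea of what perturbing a starting point could look like, here is a minimal sketch in base R. `perturb_membership` is a hypothetical helper, and shuffling labels among the selected nodes is one possible reading of "randomly shuffled"; the package's internal procedure may differ.

```r
# Hypothetical helper (not the package's internal code): pick a fraction
# `rate` of the nodes and shuffle their group labels among themselves.
perturb_membership <- function(membership, rate = 0.2) {
  n_shuffled <- round(rate * length(membership))
  if (n_shuffled < 2) return(membership)   # nothing to shuffle
  idx <- sample.int(length(membership), n_shuffled)
  membership[idx] <- membership[sample(idx)]
  membership
}

set.seed(42)
start <- rep(1:4, each = 10)                 # initial grouping of 40 nodes
perturbed <- perturb_membership(start, 0.2)  # at most 8 nodes change group
```

With this scheme the group sizes are preserved and only the selected nodes can change group.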
To select the best number of groups, the "elbow" method consists in finding the point where the slope of the loglikelihood flattens (i.e. the loglikelihood reaches a plateau).
If edge.type=="binary"
, the ICL criterion (plotted in red) should
be used: the best number of groups is the one that maximizes the ICL
criterion.
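The two selection rules can be sketched numerically as follows. The `loglik` and `icl` values are made up for illustration (they are not output of `select.dynsbm`), and the 20% threshold used to locate the elbow is an arbitrary assumption; in practice the elbow is usually read off the plot.

```r
# Illustrative values only (not output of select.dynsbm).
Q <- 1:6
loglik <- c(-5200, -4700, -4350, -4100, -4060, -4040)

# Gain in loglikelihood when moving from Q groups to Q+1 groups.
gain <- diff(loglik)

# Elbow heuristic (an assumption of this sketch): first Q after which the
# gain falls below 20% of the largest gain.
elbow_Q <- Q[which(gain < 0.2 * max(gain))[1]]

# ICL rule: keep the number of groups with the largest ICL value.
icl <- c(-5250, -4800, -4500, -4300, -4320, -4380)
best_Q_icl <- Q[which.max(icl)]
```

Here both rules point to Q = 4, but they need not agree on real data.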
This function has been extended to the case of bipartite networks. In
this case, although Y
has to be of dimension (T x N x N), it is
possible to give a bipartition of the nodes into two disjoint
sets. For statistical reasons, fixed.param
is automatically set
to TRUE
. Given the total number of groups Q between Qmin and
Qmax, there are Q/2 groups for each set of nodes (when Q is odd,
there are floor(Q/2)+1 groups for the larger set of nodes);
however, there is no guarantee that the final groups are
coherent with the bipartition, i.e. that every group is composed of
nodes from only one of the two sets (if not, a warning message is generated).
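The allocation of groups between the two node sets, and the coherence check mentioned above, can be illustrated with two small base R helpers. Both `split_groups` and `is_coherent` are hypothetical names introduced for this sketch and are not part of the package; ties between equally sized sets are resolved in favour of the first set here, which is an assumption.

```r
# Hypothetical helper: number of groups allotted to each node set for a total Q.
split_groups <- function(Q, bipartition) {
  sizes <- tabulate(bipartition, nbins = 2)   # number of nodes in set 1 and 2
  larger <- which.max(sizes)                  # first set wins in case of a tie
  groups <- rep(Q %/% 2, 2)
  if (Q %% 2 == 1) groups[larger] <- groups[larger] + 1
  groups
}

# Hypothetical helper: TRUE if every group contains nodes from a single set.
is_coherent <- function(membership, bipartition) {
  all(tapply(bipartition, membership, function(s) length(unique(s)) == 1))
}

bipartition <- c(1, 1, 1, 2, 2)
split_groups(5, bipartition)                   # 3 groups for set 1, 2 for set 2
is_coherent(c(1, 1, 2, 3, 3), bipartition)     # groups respect the bipartition
is_coherent(c(1, 1, 2, 2, 3), bipartition)     # group 2 mixes both sets
```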
Catherine Matias and Vincent Miele, Statistical clustering of temporal networks through a dynamic stochastic block model, Journal of the Royal Statistical Society: Series B (2017) http://dx.doi.org/10.1111/rssb.12200 http://arxiv.org/abs/1506.07464
Vincent Miele and Catherine Matias, Revealing the hidden structure of dynamic ecological networks, Royal Society Open Science (2017) http://dx.doi.org/10.1098/rsos.170251 https://arxiv.org/abs/1701.01355
# NOT RUN {
data(simdataT5Q4N40binary)
## estimation for Q=1..6 groups
list.dynsbm <- select.dynsbm(simdataT5Q4N40binary,
Qmin=1, Qmax=6, edge.type="binary", nstart=1)
# }
# NOT RUN {
## better to use nstart>1 starting points
## but estimation can take 1-2 minutes
list.dynsbm <- select.dynsbm(simdataT5Q4N40binary,
Qmin=1, Qmax=6, edge.type="binary", nstart=25)
# }
# NOT RUN {
## selection of Q=4
dynsbm <- list.dynsbm[[4]]
## plotting intra/inter connectivity patterns
connectivity.plot(dynsbm, simdataT5Q4N40binary)
## plotting switches between groups
alluvial.plot(dynsbm)
# }