# Block-Constrained Configuration Model

The next model we can estimate is the block-constrained configuration model (bccm). The Karate Club network has a well-known partition into two communities, which can be loaded from the package. We fit a bccm using the bccm function, passing the community assignment as vertex labels.

    data("vertexlabels")
    (blockmodel <- bccm(adj = adj_karate, labels = vertexlabels,
                        directed = directed, selfloops = selfloops))
    print(blockmodel$blockOmega)

By default, the function fits a 'full' block structure, in which every in-block (within a block) and out-block (between blocks) relation has its own parameter. Here this corresponds to three parameters: one for block 1, one for block 2, and one for the edges between the two blocks. In some cases, however, the parameters of the in-block relations are similar to each other, as can be seen above from the diagonal entries of the block matrix. A full block structure may then overfit the data, and a simpler model may be preferable. Such a model can be fitted by setting the parameter homophily=TRUE. It has only two parameters, one for in-block edges and one for out-block edges, irrespective of the number of blocks. The result is shown below.

    (blockmodel_2 <- bccm(adj = adj_karate, labels = vertexlabels,
                          directed = directed, selfloops = selfloops,
                          homophily = TRUE))
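To make the parameter count concrete, the two block structures can be written as matrices of block parameters for the two-community case. This is only a sketch: it assumes a symmetric block matrix, as the three-parameter count in the text implies, and the symbols $\omega_{11}$, $\omega_{22}$, $\omega_{12}$, $\omega_{\mathrm{in}}$ and $\omega_{\mathrm{out}}$ are notation introduced here for illustration, not names taken from the package output.

$$
\Omega_{\text{full}} =
\begin{pmatrix}
\omega_{11} & \omega_{12} \\
\omega_{12} & \omega_{22}
\end{pmatrix},
\qquad
\Omega_{\text{homophily}} =
\begin{pmatrix}
\omega_{\mathrm{in}} & \omega_{\mathrm{out}} \\
\omega_{\mathrm{out}} & \omega_{\mathrm{in}}
\end{pmatrix}
$$

Under this notation the homophily model is the special case $\omega_{11} = \omega_{22}$ of the full model, so the two models are nested and differ by a single free parameter.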

The first thing to test is whether either of the models just fitted significantly improves the fit to the data. We do so by performing a likelihood-ratio test between the configuration model and each of the bccms.

    lr.test(nullmodel = confmodel, altmodel = blockmodel, seed = 123)
    lr.test(nullmodel = confmodel, altmodel = blockmodel_2, seed = 123)
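For reference, a likelihood-ratio test compares the maximised likelihoods of two nested models. In its standard textbook form, writing $\hat{L}_0$ and $\hat{L}_1$ for the maximised likelihoods of the null and alternative model (notation assumed here, not taken from the package), the test statistic is

$$
\Lambda = -2 \left( \ln \hat{L}_0 - \ln \hat{L}_1 \right)
$$

Large values of $\Lambda$ are evidence against the null model. Under the usual regularity conditions $\Lambda$ is asymptotically $\chi^2$ distributed, with degrees of freedom equal to the difference in the number of free parameters. How lr.test actually obtains its p-value, and what role the seed argument plays in that, should be checked in the package documentation.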

Unsurprisingly, both tests give low p-values, confirming the presence of a block structure. Again, we can perform a similar analysis using AIC:

    AIC(confmodel)
    AIC(blockmodel_2)
    AIC(blockmodel)

From this, we note that although the full bccm improves the score compared to the two-parameter model, the improvement is relatively small and does not appear to justify the increased complexity. We can verify this again with a likelihood-ratio test:

    lr.test(nullmodel = blockmodel_2, altmodel = blockmodel, seed = 123)

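The result shows that the added complexity of the full bccm is not justified by the data. We investigate this further by comparing the results with what is obtained from random variations: we sample random graphs from the two-parameter model, fit both bccms to each sample, and compare their AICs.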

    # First generate random sample from blockmodel
    random_graphs_bccm2 <- rghype(nsamples = 100, model = blockmodel_2, seed = 123)
    # Generate the two models for random graphs
    blockmodels <- lapply(X = random_graphs_bccm2, FUN = bccm,
                          labels = vertexlabels,
                          directed = directed, selfloops = selfloops)
    blockmodel_2s <- lapply(X = random_graphs_bccm2, FUN = bccm,
                            labels = vertexlabels,
                            directed = directed, selfloops = selfloops,
                            homophily = TRUE)
    # Compute AICs
    AIC_blockmodels <- sapply(X = blockmodels, FUN = AIC)
    AIC_blockmodel_2s <- sapply(X = blockmodel_2s, FUN = AIC)
    # mean difference in AIC, high value means more complex model is better
    summary(AIC_blockmodel_2s - AIC_blockmodels)

This confirms that the asymmetry between the two blocks can be ascribed to random variations.
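The AIC comparisons in this section can be related to the likelihood-ratio statistic through the standard definition of AIC. The derivation below is a sketch in notation assumed here: $k$ is the number of free parameters, $\hat{L}$ the maximised likelihood, and the subscripts $2$ and $\text{full}$ refer to the two-parameter and the full bccm. It also assumes that the AIC method for these models follows the standard definition.

$$
\mathrm{AIC} = 2k - 2 \ln \hat{L}
$$

$$
\Delta \mathrm{AIC}
= \mathrm{AIC}_{2} - \mathrm{AIC}_{\text{full}}
= 2 \left( \ln \hat{L}_{\text{full}} - \ln \hat{L}_{2} \right) - 2 \left( k_{\text{full}} - k_{2} \right)
$$

A lower AIC is better, so a positive $\Delta \mathrm{AIC}$ favours the full model. If the two models differ by one parameter, as the three-versus-two count in the text suggests, the full model is preferred only when twice its gain in log-likelihood exceeds 2. A gain of that size can easily arise by chance, which is what the comparison on random graphs checks.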

# Goodness of Fit

Finally, our framework allows us to evaluate the goodness-of-fit of a model to the data. As in a multinomial goodness-of-fit test, we test the chosen model against the maximally complex model that can be formulated. This model provides the baseline against which simpler models are compared in terms of their goodness-of-fit. The 'full model' is specified such that the observed graph is the expected one. It is fitted using the function ghype with the flag unbiased=FALSE. The likelihood-ratio test can be performed either manually or using the function gof.test.

    fullmodel <- ghype(graph = adj_karate, directed = directed,
                       selfloops = selfloops, unbiased = FALSE)
    lr.test(nullmodel = blockmodel_2, altmodel = fullmodel, seed = 123)
    gof.test(model = blockmodel_2, seed = 123)

The reason for this result is that, although the bccm is a good fit for the empirical graph, the latter is characterised by some particular structures that are not encoded by the bccm. For example, the empirical graph has only a few 'bridges' between the two communities, formed by relatively low-degree vertices. The bccm instead assumes that all vertices weakly connect the two communities, with a number of edges proportional to their degree.