computeGraphClusters: Perform graph clustering of various types.

Description

Graph clustering (or decomposition) divides a graph into a set of subgraphs that together span the whole graph. Depending on the type argument, the subgraphs can be either non-intersecting or overlapping. Available types of decomposition include finding connected components and modularity clustering.
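To make the default type = "connected" decomposition concrete, here is a minimal base R sketch that labels connected components of a small undirected graph in memory. It does not use Aster or toaster, and it is not how computeGraphClusters is implemented; the data and the helper name connectedComponents are hypothetical and serve only as an illustration.

```r
# Toy undirected graph (hypothetical data, unrelated to the Aster tables)
vertices <- c("a", "b", "c", "d", "e", "f")
edges <- data.frame(
  source = c("a", "b", "d"),
  target = c("b", "c", "e"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: label connected components by repeatedly
# propagating the smallest label across every edge until nothing changes.
connectedComponents <- function(vertices, edges) {
  # start with every vertex in its own component
  membership <- setNames(seq_along(vertices), vertices)
  repeat {
    changed <- FALSE
    for (i in seq_len(nrow(edges))) {
      ends <- c(edges$source[i], edges$target[i])
      lowest <- min(membership[ends])
      if (any(membership[ends] != lowest)) {
        membership[ends] <- lowest
        changed <- TRUE
      }
    }
    if (!changed) break
  }
  # relabel components with consecutive ids 1, 2, ...
  membership[] <- match(membership, unique(membership))
  membership
}

membership <- connectedComponents(vertices, edges)
# component sizes, analogous to cluster sizes in a clustering result
componentSizes <- table(membership)
```

Here the subgraphs are non-intersecting: every vertex belongs to exactly one component, and an isolated vertex such as "f" forms a component of its own.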

Usage

computeGraphClusters(channel, graph, type = "connected", createMembership = FALSE, includeMembership = FALSE, weight = FALSE, vertexWhere = graph$vertexWhere, edgeWhere = graph$edgeWhere, distanceTableName = NULL, membershipTableName = NULL, schema = NULL, allTables = NULL, test = FALSE, ...)

Arguments

channel

connection object as returned by odbcConnect.

graph

an object of class 'toagraph' referencing graph tables in an Aster database.

type

specifies the type of clustering or community detection to perform.

createMembership

logical: indicates whether a vertex cluster membership table should be created (see membershipTableName). Currently, you must set it to TRUE if cluster membership data (see includeMembership) is expected in the result. It is also required if operations that create graphs corresponding to some of the clusters are to be performed later.

includeMembership

logical: indicates whether the result should contain vertex cluster membership information. Currently, this is only supported when createMembership is TRUE. WARNING: including cluster membership may result in a very large data set being returned from Aster into memory.

weight

logical or character: if logical, TRUE indicates using the 'weight' edge attribute; otherwise no weight is used. If character, it is used as the name of the edge weight attribute. The edge weight may apply with types 'clustering', 'shortestpath' and centrality measures.

vertexWhere

optionally, a SQL WHERE clause to subset the vertex table. When not NULL, it overrides the vertexWhere condition from the graph.

edgeWhere

optionally, a SQL WHERE clause to subset the edge table. When not NULL, it overrides the edgeWhere condition from the graph.

distanceTableName

this table will contain distances between vertices (or other corresponding metrics associated with the chosen community detection algorithm). By default, a random table name that begins with toa_temp_graphcluster_distance is generated.

membershipTableName

when createMembership is TRUE, this table will contain vertex cluster membership information. By default, a random table name that begins with toa_temp_graphcluster_membership is generated. This argument is ignored when createMembership is FALSE.

schema

name of the Aster schema for the table name arguments distanceTableName and membershipTableName. There are two distinct approaches to providing table names: one uses an explicit schema name given by this argument, and the other uses table names that already contain the schema followed by a dot and the table name. The latter method is not applicable when generating a random table name with a schema.

allTables

pre-built information about existing tables.

test

logical: if TRUE, only show what would be done (similar to the parameter test in the RODBC functions sqlQuery and sqlSave).

...

other arguments passed on to Aster graph functions, except for the EDGEWEIGHT argument: use the argument weight instead. Aster function arguments are not case-sensitive.
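The two ways of naming output tables described under schema can be sketched as follows. This is an illustration only: conn and policeGraphUn are assumed to exist as in the Examples section, the table names are placeholders, and the assumption that both forms address the same tables is inferred from the description of schema rather than verified against a database.

```r
# Approach 1: bare table names plus an explicit schema argument
withSchema <- list(
  distanceTableName   = "shortestpathdistances",
  membershipTableName = "clustermembership",
  schema              = "public"
)

# Approach 2: schema-qualified table names, schema left at its default (NULL)
qualified <- list(
  distanceTableName   = "public.shortestpathdistances",
  membershipTableName = "public.clustermembership"
)

if (interactive()) {
  # conn and policeGraphUn are assumed to be defined as in the Examples section
  communities <- do.call(
    computeGraphClusters,
    c(list(conn, policeGraphUn, type = "connected", createMembership = TRUE),
      withSchema)
  )
}
```

Whichever form is chosen, using it consistently for both table arguments avoids mixing a schema argument with already-qualified names.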

Value

computeGraphClusters returns an object of class "toacommunities" (compatible with both class "communities" and the value returned by clusters - all from the package igraph). It is a list with the following components:

Examples

if(interactive()) {

# undirected graph
policeGraphUn = toaGraph("dallaspolice_officer_vertices", "dallaspolice_officer_edges_un", 
     directed = FALSE, key = "officer", source = "officer1", target = "officer2", 
     vertexAttrnames = c("offense_count"), edgeAttrnames = c("weight"))
     
communities = computeGraphClusters(conn, policeGraphUn, type="connected", 
                                   createMembership = TRUE, includeMembership = TRUE,
                                   distanceTableName = "public.shortestpathdistances",
                                   membershipTableName = "public.clustermembership")
                                   
# get first 5 largest connected components as graphs
cluster_graphs = computeGraphClustersAsGraphs(conn, communities = communities, ids = 1:5)

# visualize component 2
library(GGally)
ggnet2(cluster_graphs[[2]], node.label="vertex.names", node.size="offense_count", 
       node.color="color", legend.position="none")

# compute connected components for a subgraph that
# includes only vertices whose names start with a letter
communities2 = computeGraphClusters(conn, policeGraphUn, type="connected", createMembership = TRUE,
                                    distanceTableName = "public.shortestpathdistances",
                                    vertexWhere = "officer ~ '[A-Z].*'", 
                                    edgeWhere = "weight > 0.36")
}