nacf: Sample Network Covariance and Correlation Functions

Description

nacf computes the sample network covariance/correlation function for a specified variable on a given input network. Moran's $I$ and Geary's $C$ statistics at multiple orders may be computed as well.

Usage

nacf(net, y, lag.max = NULL, type = c("correlation", "covariance", "moran", "geary"), neighborhood.type = c("in", "out", "total"), partial.neighborhood = TRUE, mode = "digraph", diag = FALSE, thresh = 0, demean = TRUE)

Arguments

net

one or more graphs.

y

a numerical vector, of length equal to the order of net.

lag.max

optionally, the maximum geodesic lag at which to compute dependence (defaults to the order of net, minus 1).

type

the type of dependence statistic to be computed.

neighborhood.type

the type of neighborhood to be employed when assessing dependence (as per neighborhood).

partial.neighborhood

logical; should partial (rather than cumulative) neighborhoods be employed at higher orders?

mode

"digraph" for directed graphs, or "graph" if net is undirected.

diag

logical; does the diagonal of net contain valid data?

thresh

threshold at which to dichotomize net.

demean

logical; demean y prior to analysis?

Value

A vector containing the dependence statistics (ascending from order 0).

Details

nacf computes dependence statistics for the vector y on network net, for neighborhoods of various orders. Specifically, let $A_i$ be the $i$th order adjacency matrix of net, with $A_{ijk}$ denoting its $(j,k)$ cell. The sample network autocovariance of $y$ on $A_i$ is then given by

$$ \sigma_i = \frac{\mathbf{y}^T \mathbf{A}_i \mathbf{y}}{E}, $$

where $E = \sum_j \sum_k A_{ijk}$ is the sum of the cells of $A_i$. Similarly, the sample network autocorrelation in the above case is $\sigma_i/\sigma_0$, where $\sigma_0$ is the variance of $y$. Moran's $I$ and Geary's $C$ statistics are defined in the usual fashion as

$$ I_i = \frac{N \sum_{j=1}^N \sum_{k=1}^N (y_j-\bar{y}) (y_k-\bar{y}) A_{ijk}}{E \sum_{j=1}^N (y_j-\bar{y})^2} $$

and

$$ C_i = \frac{(N-1) \sum_{j=1}^N \sum_{k=1}^N (y_j-y_k)^2 A_{ijk}}{2 E \sum_{j=1}^N (y_j-\bar{y})^2} $$

respectively, where $N$ is the order of $A_i$ and $\bar{y}$ is the mean of $y$.

The adjacency matrix associated with the $i$th order neighborhood is defined as the identity matrix for order 0, and otherwise depends on the type of neighborhood involved. For input graph $G=(V,E)$, let the base relation, $R$, be given by the underlying graph of $G$ (i.e., $G \cup G^T$) if total neighborhoods are sought, the transpose of $G$ if incoming neighborhoods are sought, or $G$ otherwise. The partial neighborhood structure of order $i>0$ on $R$ is then defined to be the digraph on $V$ whose edge set consists of the ordered pairs $(j,k)$ having geodesic distance $i$ in $R$. The corresponding cumulative neighborhood is formed by the ordered pairs having geodesic distance less than or equal to $i$ in $R$. For purposes of nacf, these neighborhoods are calculated using neighborhood, with the specified parameters (including dichotomization at thresh).
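As a rough illustration of the definitions above, the following base-R sketch evaluates all four statistics for a single neighborhood matrix. It is not the package's implementation: dep_stats is a hypothetical helper, the centered sum of squares is used in both the Moran and Geary denominators, and $\sigma_0$ is assumed to be R's sample variance var(y).

```r
# Sketch only: dep_stats() is a hypothetical helper, not part of any package.
# A is one neighborhood (adjacency) matrix A_i; y is the vertex variable.
dep_stats <- function(A, y, demean = TRUE) {
  N  <- length(y)
  E  <- sum(A)                      # sum of the cells of A_i
  yc <- y - mean(y)                 # centered copy of y
  yy <- if (demean) yc else y       # variable used for covariance/correlation
  acov <- as.numeric(t(yy) %*% A %*% yy) / E
  c(covariance  = acov,
    correlation = acov / var(y),    # assumption: sigma_0 taken as var(y)
    moran = N * sum(outer(yc, yc) * A) / (E * sum(yc^2)),
    geary = (N - 1) * sum(outer(y, y, "-")^2 * A) / (2 * E * sum(yc^2)))
}

# First-order neighborhood of the directed 4-cycle 1 -> 2 -> 3 -> 4 -> 1
A <- matrix(0, 4, 4)
A[cbind(1:4, c(2:4, 1))] <- 1
y <- c(1, 2, 3, 4)
stats <- dep_stats(A, y)
print(stats)
```

On this small cycle the autocovariance and Moran's $I$ are negative and Geary's $C$ is below 1, which shows that the statistics need not agree in sign convention: $C$ measures squared differences rather than cross-products.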

The return value for nacf is the selected dependence statistic, calculated for each neighborhood structure from order 0 (the identity) through order lag.max (or $N-1$, if lag.max==NULL). This vector can be used much like the conventional autocorrelation function, to identify dependencies at various lags. This may, in turn, suggest a starting point for modeling via routines such as lnam.
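The sequence of statistics described above can be sketched in base R as follows, using the autocovariance on outgoing partial neighborhoods of a small directed path. The helpers geo_dist and nbhd are hypothetical and stand in for the package's neighborhood routine. Excluding the diagonal from cumulative neighborhoods and returning NA for an empty neighborhood are choices made for this sketch only, not documented behavior.

```r
# Sketch only: geo_dist() and nbhd() are hypothetical helpers.
# Geodesic distances on a binary relation R, found by extending walks one
# step at a time and recording the first length at which a pair is reached.
geo_dist <- function(R) {
  N <- nrow(R)
  D <- matrix(Inf, N, N)
  diag(D) <- 0
  reach <- diag(N)                         # pairs joined by a walk of length d
  for (d in seq_len(N - 1)) {
    reach <- ((reach %*% R) > 0) * 1       # extend every walk by one step
    newly <- (reach == 1) & is.infinite(D) # pairs reached for the first time
    D[newly] <- d
  }
  D
}

# Neighborhood matrix of order i: identity at order 0, otherwise pairs at
# distance exactly i (partial) or between 1 and i (cumulative; assumption:
# the diagonal is excluded).
nbhd <- function(D, i, partial = TRUE) {
  if (i == 0) return(diag(nrow(D)))
  if (partial) (D == i) * 1 else ((D > 0) & (D <= i)) * 1
}

G <- matrix(0, 5, 5)
G[cbind(1:4, 2:5)] <- 1                    # directed path 1 -> 2 -> 3 -> 4 -> 5
y <- c(2, 4, 6, 8, 10)

R <- G                                     # outgoing: G; incoming: t(G);
                                           # total: ((G + t(G)) > 0) * 1
D  <- geo_dist(R)
yc <- y - mean(y)
lag.max <- nrow(G) - 1

acov <- sapply(0:lag.max, function(i) {
  A <- nbhd(D, i)
  if (sum(A) == 0) return(NA_real_)        # sketch's choice for empty orders
  as.numeric(t(yc) %*% A %*% yc) / sum(A)
})
names(acov) <- 0:lag.max
print(acov)
```

Because y increases steadily along the path, the autocovariance falls as the lag grows, much as a conventional autocovariance function would for a trending series.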

References

Geary, R.C. (1954). “The Contiguity Ratio and Statistical Mapping.” The Incorporated Statistician, 5: 115-145.

Moran, P.A.P. (1950). “Notes on Continuous Stochastic Phenomena.” Biometrika, 37: 17-23.

Examples

library(sna)

#Create a random graph, and an autocorrelated variable
g<-rgraph(50,tp=4/49)
y<-qr.solve(diag(50)-0.8*g,rnorm(50,0,0.05))

#Examine the network autocorrelation function
nacf(g,y)                             #Partial neighborhoods
nacf(g,y,partial.neighborhood=FALSE)  #Cumulative neighborhoods

#Repeat, using Moran's I on the underlying graph (total neighborhoods)
nacf(g,y,type="moran",neighborhood.type="total")
nacf(g,y,partial.neighborhood=FALSE,type="moran",neighborhood.type="total")
