
PAFit (version 0.9.5)

GetStatistics: Getting summarized statistics from input data

Description

The function summarizes input data into sufficient statistics for estimating the attachment function and node fitness, together with additional information about the data, such as the total number of nodes, the number of time-steps, the maximum degree, and the final degree of each node in the network. It also provides mechanisms for dealing with very large datasets automatically: binning the degree, setting a degree threshold, or grouping time-steps.

Usage

GetStatistics(net, net_type = "directed", only_PA = FALSE, only_true_deg_matrix = FALSE, Binning = TRUE, G = 50, start_deg = 0, deg_threshold = 5, CompressMode = 0, CompressRatio = 0.5, CustomTime = NULL)

Arguments

net
A three-column matrix in which each row describes one edge in the form (from_node id, to_node id, time_stamp). from_node id is the id of the source node, to_node id is the id of the destination node, and time_stamp is the arrival time of the edge. from_node id and to_node id are assumed to be integers starting from 0. time_stamp can be either numeric or string. The value of a time stamp can be arbitrary, but a smaller time_stamp (as ordered by the sort function in R) is assumed to represent an earlier arrival time.
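As a minimal sketch of the input format expected for net, the following builds a small edge list by hand in base R. The node ids and time stamps are made up for illustration; only the three-column layout, the zero-based ids, and the use of sort to order time stamps come from the description above.

```r
# Hypothetical toy edge list: each row is (from_node id, to_node id, time_stamp).
net <- rbind(
  c(1, 0, 1),
  c(2, 0, 2),
  c(3, 1, 2),
  c(4, 0, 3)
)

# Node ids are integers starting from 0.
node_ids <- sort(unique(c(net[, 1], net[, 2])))

# Time-steps are ordered the way sort() orders the time stamps.
time_steps <- sort(unique(net[, 3]))
```

Here two edges share time stamp 2, so the four edges span three distinct time-steps.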
net_type
String. Possible values: "directed" or "undirected". Indicates the type of network. Default value is "directed".
only_PA
Logical. Indicates whether only the statistics for estimating $A_k$ are summarized. If TRUE, the statistics for estimating $\eta_i$ are NOT collected. This saves memory at the cost of being unable to estimate node fitness. Default value is FALSE.
only_true_deg_matrix
Logical. If TRUE, only the true degree matrix (without binning) is returned, and no other statistics are returned. The result cannot be used in the PAFit function to estimate PA or fitness. This option is useful when all that is needed is a degree matrix that summarizes the growth process of a very big network, for example for plotting. Default value is FALSE.
Binning
Logical. Indicates whether the degree should be binned together. Default value is TRUE.
G
Positive integer. Number of bins. Default value is 50.
start_deg
Integer. The degree from which the program starts binning degrees together. Default value is 0.
deg_threshold
Integer. We only estimate the fitnesses of nodes whose number of new edges acquired is at least deg_threshold. The fitnesses of all other nodes are fixed at 1. Default value is 0.
CompressMode
Integer. Indicates whether and how the timeline should be compressed. Possible values:

0: No compression

1: Compressed by using a subset of time-steps. The time stamps in this subset are equally spaced. The size of this subset is CompressRatio times the size of the set of all time stamps.

2: Compressed by starting only from the first time-step at which CompressRatio * 100 percent of the total number of edges (in the final state of the network) have already been added to the network.

3: This mode offers the most flexibility, but requires the user to supply the time stamps in CustomTime. Only the time stamps in CustomTime will be used. This mode can be used, for example, when investigating how the attachment function or node fitness changes across different time intervals.

Default value is 0, i.e. no compression.
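The selection rules for modes 1 and 2 can be sketched in base R as follows. This is only an illustration of the descriptions above, not the code PAFit runs internally: the time stamps, the per-step edge counts, and the way the equally spaced subset is picked (every round(1 / CompressRatio)-th time stamp) are assumptions made for this example.

```r
# Hypothetical timeline: 10 distinct time stamps with 2 new edges at each one.
time_stamps    <- 1:10
edges_per_step <- rep(2, length(time_stamps))
CompressRatio  <- 0.5

# Mode 1 (sketch): keep an equally spaced subset of the time stamps whose size
# is CompressRatio times the number of all time stamps.
step        <- round(1 / CompressRatio)
mode1_times <- time_stamps[seq(1, length(time_stamps), by = step)]

# Mode 2 (sketch): start from the first time-step at which CompressRatio * 100
# percent of all edges have already been added.
cum_edges   <- cumsum(edges_per_step)
start       <- which(cum_edges >= CompressRatio * sum(edges_per_step))[1]
mode2_times <- time_stamps[start:length(time_stamps)]
```

With these toy values, mode 1 keeps 5 of the 10 time stamps, while mode 2 drops the early part of the timeline and keeps every time stamp from the halfway point of edge arrivals onward.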

CompressRatio
Numeric. Indicates how much we should compress if CompressMode is 1 or 2. Default value is 0.5.
CustomTime
Vector. Custom time stamps. This vector is a subset of the vector that contains all time-stamps. Only effective if CompressMode == 3. In that case, only these time stamps are used.
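One way to prepare a CustomTime vector is sketched below in base R. The toy edge list and the choice of keeping only the later half of the timeline are hypothetical; the only requirement taken from the description above is that the vector be a subset of all time stamps found in net.

```r
# Hypothetical toy edge list: (from_node id, to_node id, time_stamp).
net <- rbind(
  c(1, 0, 1),
  c(2, 0, 2),
  c(3, 1, 3),
  c(4, 0, 4),
  c(5, 2, 5),
  c(6, 1, 6)
)

# All distinct time stamps, in the order given by sort().
all_times <- sort(unique(net[, 3]))

# Keep only the later half of the timeline (an arbitrary choice for illustration).
custom_time <- all_times[all_times > median(all_times)]

# CustomTime must be a subset of the time stamps present in the data.
stopifnot(all(custom_time %in% all_times))
```

A vector built this way would be supplied as CustomTime together with CompressMode = 3.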

Value

PAFit_data, which is a list. Some important fields are:

References

1. Pham, T., Sheridan, P. & Shimodaira, H. (2016). Nonparametric Estimation of the Preferential Attachment Function in Complex Networks: Evidence of Deviations from Log Linearity, Proceedings of ECCS 2014, 141-153 (Springer International Publishing) (http://dx.doi.org/10.1007/978-3-319-29228-1_13).

2. Pham, T., Sheridan, P. & Shimodaira, H. (2015). PAFit: A Statistical Method for Measuring Preferential Attachment in Temporal Complex Networks. PLoS ONE 10(9): e0137796. doi:10.1371/journal.pone.0137796 (http://dx.doi.org/10.1371/journal.pone.0137796).

3. Pham, T., Sheridan, P. & Shimodaira, H. (2016). Joint Estimation of Preferential Attachment and Node Fitness in Growing Complex Networks. Scientific Reports 6, Article number: 32558. doi:10.1038/srep32558 (www.nature.com/articles/srep32558).

Examples

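library("PAFit")
# Simulate a network with GenerateNet
net       <- GenerateNet(N = 100, m = 1, mode = 1, alpha = 1, shape = 5, rate = 5)
# Summarize the edge list stored in net$graph using the default options
net_stats <- GetStatistics(net$graph)
summary(net_stats)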
