The function summarizes the input data into sufficient statistics for estimating the attachment function and node fitnesses, together with additional information about the data, such as the total number of nodes, the number of time steps, the maximum degree, and the final degree of each node. It also provides mechanisms for handling very large datasets automatically, by binning the degrees, setting a degree threshold, or grouping time steps.
get_statistics(net_object, only_PA = FALSE,
               only_true_deg_matrix = FALSE,
               binning = TRUE, g = 50,
               deg_threshold = 0,
               compress_mode = 0, compress_ratio = 0.5,
               custom_time = NULL)
Returns an object of class PAFit_data, which is a list. Some important fields are:
A matrix where the (t,k+1) element is the number of nodes with degree \(k\) at time \(t\), counting only the nodes whose number of new edges acquired is less than deg_threshold
A matrix where the (t,k+1) element is the number of nodes with degree \(k\) at time \(t\)
A matrix where the (t,k+1) element is the number of new edges connecting to a degree-\(k\) node at time \(t\)
A vector where the (k+1)-th element is the total number of new edges that connected to a degree-\(k\) node, summed over all time steps
A matrix recording, at each time step, the degree of every node that satisfies the deg_threshold condition
A vector where the t-th element is the number of new edges at time \(t\)
A vector where the j-th element is the total number of new edges that connected to node \(j\)
Numeric. The number of nodes in the network
Numeric. The number of time steps
Numeric. The maximum degree in the final network
A vector containing the IDs of all nodes
A vector containing the final degrees of all nodes (including those that do not satisfy the deg_threshold condition)
Integer. The specified degree threshold.
Numeric vector. The indices, in the node_id vector, of the nodes whose fitnesses are estimated (i.e., nodes whose number of new edges acquired is at least deg_threshold)
Integer. The specified degree at which we start binning.
Numeric vector containing the beginning degree of each bin
Numeric vector containing the ending degree of each bin
Numeric vector containing the length of each bin.
Logical. Indicates whether binning was applied or not.
Integer. Number of bins
Integer. The mode of time compression.
Integer. The number of time stamps actually used
The time stamps that are actually used
Numeric.
Vector. The time stamps specified by the user.
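To make the first few fields concrete, here is a minimal base-R sketch (not PAFit's implementation) that summarizes a small hypothetical edge list into a node-count matrix and a new-edge matrix indexed by (t, k+1), plus the per-step and per-degree edge totals. It assumes directed edges, uses in-degree as the degree, and counts a node at time t once it has appeared in any edge up to t; the function and field names are hypothetical.

```r
# Minimal sketch, not PAFit's implementation. Assumptions: directed edges,
# degree = in-degree, and a node is counted from the first time step in
# which it appears in any edge. All names here are hypothetical.
edges <- data.frame(
  from = c(2, 3, 3, 4, 4, 5),
  to   = c(1, 1, 2, 1, 3, 1),
  time = c(1, 2, 2, 3, 3, 4)
)

summarize_edges <- function(edges) {
  times <- sort(unique(edges$time))
  nodes <- sort(unique(c(edges$from, edges$to)))
  # First time step in which each node appears (as source or target)
  first_seen <- sapply(nodes, function(v)
    min(edges$time[edges$from == v | edges$to == v]))
  k_max <- max(table(edges$to))        # maximum in-degree in the final network
  deg <- numeric(length(nodes))        # running in-degree of each node
  nodes_by_deg     <- matrix(0, length(times), k_max + 1)
  new_edges_by_deg <- matrix(0, length(times), k_max + 1)
  for (i in seq_along(times)) {
    present <- which(first_seen <= times[i])
    target  <- match(edges$to[edges$time == times[i]], nodes)
    # (t, k+1) element: number of present nodes with degree k before step t
    nodes_by_deg[i, ] <- tabulate(deg[present] + 1, nbins = k_max + 1)
    # (t, k+1) element: number of new edges at step t that hit a degree-k node
    new_edges_by_deg[i, ] <- tabulate(deg[target] + 1, nbins = k_max + 1)
    # Update degrees afterwards; tabulate() also counts repeated targets
    deg <- deg + tabulate(target, nbins = length(nodes))
  }
  list(nodes_by_deg       = nodes_by_deg,
       new_edges_by_deg   = new_edges_by_deg,
       new_edges_per_step = rowSums(new_edges_by_deg),
       edges_per_deg      = colSums(new_edges_by_deg),
       final_deg          = deg)
}

stats <- summarize_edges(edges)
print(stats$new_edges_by_deg)
```

Degrees are updated only after all edges of a time step have been counted, so every new edge at time t is attributed to the degree its target had just before t.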
The parameters can be divided into four groups. The first group specifies the input data and how the data are summarized:
An object of class PAFit_net. Use as.PAFit_net to convert an edge-list matrix, from_igraph to convert an igraph object, from_networkDynamic to convert a networkDynamic object, or graph_from_file to read a network from a file.
Logical. Indicates whether only the statistics for estimating \(A_k\) are summarized. If TRUE, the statistics for estimating \(\eta_i\) are NOT collected; this saves memory at the cost of not being able to estimate node fitnesses. Default value is FALSE.
Logical. If TRUE, return only the true degree matrix (without binning); no other statistics are returned, so the result cannot be used in the PAFit function to estimate PA or fitness. This option is useful when we only need a degree matrix summarizing the growth process of a very large network, for example for plotting. Default value is FALSE.
The second group of parameters specifies how to bin the degrees:
Logical. Indicates whether the degree should be binned together. Default value is TRUE.
Positive integer. Number of bins. Should be at least 3. Default value is 50.
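Binning groups nearby degrees together, so the statistics track fewer distinct degree classes. The base-R sketch below shows one common scheme, logarithmic binning, where bin widths grow roughly geometrically; it is an illustration under that assumption, PAFit's exact binning rule may differ, and the function names and counts are hypothetical.

```r
# Illustrative logarithmic binning of degrees 0..k_max into at most g bins.
# Assumption: a generic scheme, not necessarily PAFit's exact rule.
log_bins <- function(k_max, g) {
  # Breakpoints on the shifted degree k + 1, evenly spaced on a log scale;
  # rounding can create duplicates, so fewer than g bins may result.
  b <- unique(round(exp(seq(0, log(k_max + 2), length.out = g + 1))))
  data.frame(begin_deg = head(b, -1) - 1,   # first degree in each bin
             end_deg   = tail(b, -1) - 2,   # last degree in each bin
             length    = diff(b))           # number of degrees per bin
}

# Bin index of each degree k (bins are contiguous and sorted)
bin_of_degree <- function(k, bins) findInterval(k, bins$begin_deg)

bins <- log_bins(k_max = 9, g = 3)
print(bins)

# Collapse hypothetical per-degree counts (degrees 0..9) into bin totals
counts <- c(5, 4, 3, 2, 2, 1, 1, 0, 0, 1)
binned <- as.vector(rowsum(counts, bin_of_degree(0:9, bins)))
print(binned)
```

Here the bins cover degrees 0, 1 to 3, and 4 to 9, so low degrees keep fine resolution while the sparse high-degree tail is pooled.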
The third group contains a single parameter specifying how to reduce the number of node fitnesses to be estimated:
Integer. We only estimate the fitnesses of nodes whose number of new edges acquired is at least deg_threshold. The fitnesses of all other nodes are fixed at 1. Default value is 0.
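The selection rule for deg_threshold can be illustrated in a few lines of base R: given hypothetical per-node counts of new edges acquired, only nodes at or above the threshold keep a free fitness, and the rest stay fixed at 1. This is a sketch of the rule described above, not PAFit's code.

```r
# Hypothetical numbers of new edges acquired by five nodes
new_edges <- c(a = 12, b = 0, c = 3, d = 7, e = 1)
deg_threshold <- 3

# Positions of nodes whose fitness would be estimated
estimated <- which(new_edges >= deg_threshold)
# Positions of nodes whose fitness stays fixed at 1
fixed <- which(new_edges < deg_threshold)

print(names(estimated))                 # "a" "c" "d"
print(length(which(new_edges >= 0)))    # 5: the default threshold 0 keeps every node
```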
The last group of parameters specifies how to group the time stamps:
Integer. Indicates whether and how the timeline should be compressed. Possible values of compress_mode:
0: No compression
1: Compress by using a subset of time steps. The time stamps in this subset are equally spaced, and the size of the subset is compress_ratio times the number of all time stamps.
2: Compress by starting only from the first time step at which \(100 \times\) compress_ratio percent of the total number of edges (in the final state of the network) have already been added to the network.
3: This mode offers the most flexibility but requires the user to supply the time stamps in custom_time; only those time stamps are used. It can be used, for example, to investigate how the attachment function or node fitnesses change across different time intervals.
Default value is 0, i.e. no compression.
Numeric. Indicates how much to compress when compress_mode is 1 or 2. Default value is 0.5.
Vector. Custom time stamps, which must be a subset of the vector of all time stamps. Only effective if compress_mode == 3, in which case only these time stamps are used.
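The compression modes can be sketched in base R as a function that returns the time stamps kept. This is an illustration under stated assumptions, not PAFit's implementation: for mode 1 it assumes the equally spaced subset includes the first and last time stamps, and for mode 2 it assumes the starting step is the first one at which the cumulative number of edges reaches compress_ratio times the final total. The function name and example data are hypothetical.

```r
# Illustrative sketch of compress modes 0-3 (not PAFit's code).
# edges_per_step[i] is the (hypothetical) number of new edges at times[i].
compress_times <- function(times, edges_per_step, mode = 0,
                           ratio = 0.5, custom_time = NULL) {
  n <- length(times)
  if (mode == 0) return(times)                      # no compression
  if (mode == 1) {
    # Assumption: equally spaced positions including first and last stamp
    size <- max(2, ceiling(ratio * n))
    idx <- unique(floor(seq(1, n, length.out = size)))
    return(times[idx])
  }
  if (mode == 2) {
    # Assumption: start where cumulative edges first reach ratio * total
    cum <- cumsum(edges_per_step)
    start <- which(cum >= ratio * sum(edges_per_step))[1]
    return(times[start:n])
  }
  if (mode == 3) return(times[times %in% custom_time])  # user-chosen stamps
  stop("mode must be 0, 1, 2, or 3")
}

yrs <- 2001:2010
e <- c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5)
print(compress_times(yrs, e, mode = 1))   # 2001 2003 2005 2007 2010
print(compress_times(yrs, e, mode = 2))   # 2007 2008 2009 2010
print(compress_times(yrs, e, mode = 3, custom_time = c(2002, 2004, 2008)))
```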
Thong Pham thongphamthe@gmail.com
For creating the needed input for this function (a PAFit_net object), see as.PAFit_net, from_igraph, from_networkDynamic, and graph_from_file.
For the next step, see Newman, Jeong or only_A_estimate for estimating the attachment function in isolation, only_F_estimate for estimating node fitnesses in isolation, and joint_estimate for joint estimation of the attachment function and node fitnesses.
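library("PAFit")
# Simulate a network from the Barabasi-Albert model with 100 nodes (m = 1)
net <- generate_BA(N = 100, m = 1)
# Summarize the growth process into the statistics needed for estimation
net_stats <- get_statistics(net)
summary(net_stats)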