PAFit:

Description

From a PAFit_data object, which contains summary statistics of the dataset, PAFit estimates the attachment function \(A_k\) (the PA function) and the node fitnesses \(\eta_i\) by penalized log-likelihood maximization. It also quantifies the remaining uncertainty in the estimates by approximating confidence intervals for \(A_k\) and \(\eta_i\). Estimating either the attachment function or the node fitnesses in isolation is also supported. To estimate the PA function with \(\eta_i = 1\), set only_PA = TRUE. To estimate node fitnesses with either \(A_k = k\) or \(A_k = 1\), set only_f = TRUE.

Usage

PAFit (net_stat, 
       only_PA        = FALSE       , only_f         = FALSE       , 
       mode_f         = "Linear_PA" ,
       true_A         = NULL        , true_f         = NULL        , 
       
       mode_reg_A     = 0           , weight_PA_mode = 1           ,
       s              = 10          , lambda         = 1           , 
       auto_lambda    = TRUE        ,  r             = 0.01        , 
       
       alpha_start    = 1           , start_mode_A   = "Log_linear", 
       start_mode_f   = "Constant"  ,
       
       auto_stop      = TRUE        , stop_cond      = 10^-7       , 
       iteration      = 200         , max_iter       = 2e+05       , 
       debug          = FALSE       , q              = 1           ,   
       step_size      = 0.5         ,
      
       normalized_f   = FALSE       , interpolate    = FALSE)

Arguments

net_stat

An object of class "PAFit_data" containing the summary statistics computed from the data by the function GetStatistics.

only_PA

Logical. TRUE means that the attachment function \(A_k\) is estimated in isolation (fixing \(\eta_i = 1\)). Default is FALSE.

only_f

Logical. TRUE means that the node fitnesses are estimated in isolation, with the attachment function taking the form specified by mode_f. Default is FALSE.

mode_f

String. Possible values: "Linear_PA", "Constant_PA" or "Log_linear". In the first two cases, the PA function is fixed. If mode_f == "Linear_PA", then \(A_k = k\) for \(k \ge 1\) and \(A_0 = 1\). If mode_f == "Constant_PA", then \(A_k = 1\) for all \(k\). If mode_f == "Log_linear", then \(A_k = k^\alpha\) for \(k \ge 1\) and \(A_0 = 1\), and the value of \(\alpha\) is also estimated. Default value is "Linear_PA".
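The three forms of \(A_k\) selected by mode_f can be sketched in base R as follows. The helper name pa_function is hypothetical and not part of PAFit; in the "Log_linear" case the package estimates \(\alpha\), whereas here it is simply passed in by hand.

```r
# Hypothetical helper (not part of PAFit): evaluates A_k for a given mode_f.
pa_function <- function(k, mode_f = "Linear_PA", alpha = 1) {
  A <- switch(mode_f,
    Linear_PA   = as.numeric(k),
    Constant_PA = rep(1, length(k)),
    Log_linear  = k^alpha,
    stop("unknown mode_f: ", mode_f)
  )
  # A_0 = 1 in every mode.
  A[k == 0] <- 1
  A
}

k <- 0:4
pa_function(k, "Linear_PA")
pa_function(k, "Constant_PA")
pa_function(k, "Log_linear", alpha = 0.5)
```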

true_A

Numeric vector. User-supplied values of the PA function. If true_A is supplied, only the node fitnesses are estimated.

true_f

Numeric vector. User-supplied values of the node fitnesses. If true_f is supplied, only the PA function is estimated.

mode_reg_A

Integer. Possible values: 0, 1 or 2. Indicates which regularization term is used for the PA function. For the regularization function used in the PLoS ONE and Scientific Reports papers (references 2 and 3), use 0. Default value is 0.

weight_PA_mode

Binary. Indicates how the regularization terms for \(A_k\) are weighted. If weight_PA_mode == 0, the regularization term for \(A_k\) is weighted by the total number of edges connected to degree \(k\) nodes. If weight_PA_mode == 1, the regularization terms have uniform weights. Default value is 1.

s

Positive numeric. The regularization parameter s for node fitness. Default value is 10.

lambda

Non-negative numeric. The absolute strength of the regularization for the PA function. Ignored when auto_lambda == TRUE. lambda == 0 means no regularization for the PA function. Default value is 1.

auto_lambda

Logical. If auto_lambda == TRUE, lambda will be determined automatically from the data by r. Default is TRUE.

r

Non-negative numeric. The regularization parameter r for the PA function, which indicates the relative strength of the regularization term. If auto_lambda == TRUE, the value of lambda is determined automatically from r. Default value is 0.01.

alpha_start

Non-negative numeric. The starting value for \(\alpha\) when we use the model \(k^\alpha\). Default value is 1.

start_mode_A

String. Takes one of two values: "Log_linear" (the initial PA function is set to \(k^\alpha\) with \(\alpha\) equal to alpha_start) or "Random" (the initial PA function is randomly sampled from a uniform distribution). Default value is "Log_linear".

start_mode_f

String. Takes one of two values: "Constant" (the initial node fitnesses are all set to 1) or "Random" (the initial node fitnesses are randomly sampled from a gamma distribution). Default value is "Constant".

auto_stop

Logical. Indicates whether the algorithm stops automatically. Default is TRUE.

stop_cond

Numeric. If auto_stop == TRUE, the iterative algorithm stops when \(|h(ii) - h(ii + 1)| / (|h(ii)| + 1)\) falls below stop_cond, where \(h(ii)\) is the value of the objective function at iteration \(ii\). We recommend choosing stop_cond no larger than \(10^{-(d + 2)}\), where \(d\) is the number of digits of \(h\), so that when the algorithm stops, the increase in posterior probability is less than 1% of the current posterior probability. Default is 10^-7.
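The stopping rule can be written as a one-line test in base R. The helper name should_stop and the objective values below are made up for illustration; this is a sketch of the stated criterion, not the package's internal code.

```r
# Hypothetical helper (not part of PAFit): relative-change stopping test.
should_stop <- function(h_current, h_next, stop_cond = 10^-7) {
  abs(h_current - h_next) / (abs(h_current) + 1) < stop_cond
}

# Relative change of about 6e-6: above the default threshold, keep iterating.
should_stop(-12345.678, -12345.600)
# Relative change of about 8e-9: below the default threshold, stop.
should_stop(-12345.678, -12345.6779)
```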

iteration

Integer. The number of iterations. Ignored if auto_stop == TRUE. Default value is 200.

max_iter

Integer. The maximum number of iterations. Regardless of other settings, the algorithm will stop once the number of iterations reaches this threshold. Default value is 2e+05.

debug

Logical. If debug == TRUE, the value of the objective function \(h\) is printed out at each step. Default is FALSE.

q

Integer. The number of previous steps used in the quasi-Newton speedup. Ignored if \(q \le 1\). Default is 1.

step_size

Numeric. A number in \((0, 1]\) giving the step size of the quasi-Newton speedup. Ignored (no quasi-Newton speedup) if \(q \le 1\). Default is 0.5.

normalized_f

Logical. Indicates whether the estimated node fitnesses should be normalized after estimation. Default value is FALSE.

interpolate

Logical. Indicates whether gaps in the estimated \(A_k\) should be filled by interpolation. The interpolation, if performed, is a linear regression on the log scale. Default value is FALSE.
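One way to read "linear regression on log-scale" for interpolate is sketched below in base R: regress \(\log A_k\) on \(\log k\) over the degrees that have an estimate, then predict the missing ones. The data and variable names are invented for illustration, and the procedure PAFit uses internally may differ in detail (for example, it may interpolate only between neighboring values).

```r
# Illustrative data: estimates of A_k with gaps (NA) at k = 4 and k = 6.
k <- 1:8
A <- c(1.0, 1.9, 2.7, NA, 4.2, NA, 5.6, 6.1)

# Fit a straight line on the log-log scale using the observed degrees only.
observed <- !is.na(A)
fit <- lm(log(A[observed]) ~ log(k[observed]))

# Predict the missing values from the fitted line and map back from log-scale.
coefs <- coef(fit)
A_filled <- A
A_filled[!observed] <- exp(coefs[1] + coefs[2] * log(k[!observed]))
A_filled
```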

Value

An object of class "PAFit_result", which is a list. Its most important fields fall into five groups. The first group gives the estimated preferential attachment function:

k

The observed degree vector

A

The estimated attachment function corresponding to k

center_k

The logarithmic center of the bins

theta

Preferential attachment value corresponding to center_k (before mapping back to \(A_k\))

weight_of_A

The number of \(A_k\) values in each bin

loglinear_fit

Result of fitting the log-linear model \(\log A_k = \alpha \log k + C\) to the estimated \(A_k\)

alpha

The estimated attachment exponent of the log-linear model \(A_k = k^\alpha\)

The confidence interval of the attachment exponent, at the two-sigma level. When mode_f != "Log_linear", this interval is computed from the log-linear fit (regressing \(\log A_k\) on \(\log k\)) using the confint function, so it has the usual interpretation as a 95% confidence interval.

alpha_series

The series of \(\alpha\) over iterations if mode_f == "Log_linear"
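The log-linear fit and its confint-based interval described in this group can be illustrated in base R. The data below are simulated and the variable names alpha_hat and alpha_ci are hypothetical; this is a sketch of the general technique under those assumptions, not the package's internal code.

```r
# Illustrative data: an attachment function close to k^0.8 with small noise.
k <- 1:20
set.seed(42)
A <- k^0.8 * exp(rnorm(length(k), sd = 0.05))

# Fit log A_k = alpha * log k + C by ordinary least squares.
loglinear_fit <- lm(log(A) ~ log(k))
alpha_hat <- unname(coef(loglinear_fit)["log(k)"])

# 95% confidence interval for the slope alpha.
alpha_ci <- confint(loglinear_fit, "log(k)", level = 0.95)
alpha_hat
alpha_ci
```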

The second group gives the confidence intervals of the estimated PA function:

var_A

Variances of the estimated A

var_logA

Variances of \(\log(A)\)

upper_A

The upper value of the two-sigma confidence interval of A

lower_A

The lower value of the two-sigma confidence interval of A

upper_bin

The upper value of the two-sigma confidence interval of theta

lower_bin

The lower value of the two-sigma confidence interval of theta

The third group gives the estimated node fitnesses:

f

The estimated node fitnesses \(\eta\)

The fourth group gives the confidence intervals of the estimated node fitnesses:

var_f

Variances of the estimated node fitnesses

upper_f

The upper value of the two-sigma confidence interval of node fitness \(\eta\)

lower_f

The lower value of the two-sigma confidence interval of node fitness \(\eta\)

The final group gives additional information on the iterative process:

objective_value

Values of the objective function \(h\) (posterior probability in log-scale) recorded at each iteration

References

1. Pham, T., Sheridan, P. & Shimodaira, H. (2016). Nonparametric Estimation of the Preferential Attachment Function in Complex Networks: Evidence of Deviations from Log Linearity. Proceedings of ECCS 2014, 141-153 (Springer International Publishing) (http://dx.doi.org/10.1007/978-3-319-29228-1_13).

2. Pham, T., Sheridan, P. & Shimodaira, H. (2015). PAFit: A Statistical Method for Measuring Preferential Attachment in Temporal Complex Networks. PLoS ONE 10(9): e0137796. doi:10.1371/journal.pone.0137796 (http://dx.doi.org/10.1371/journal.pone.0137796).

3. Pham, T., Sheridan, P. & Shimodaira, H. (2016). Joint Estimation of Preferential Attachment and Node Fitness in Growing Complex Networks. Scientific Reports 6, Article number: 32558. doi:10.1038/srep32558 (www.nature.com/articles/srep32558).

Examples

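library("PAFit")
# Simulate a small network with attachment exponent alpha = 0.5.
net        <- GenerateNet(N = 50, m = 10, mode = 1, alpha = 0.5, shape = 100, rate = 100)
# Summarize the network into a PAFit_data object.
net_stats  <- GetStatistics(net$graph)
# Jointly estimate the attachment function and the node fitnesses.
result     <- PAFit(net_stats, r = 0.01, s = 100)
summary(result)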
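
The isolated estimation modes described above can be called in the same way. The sketch below assumes the version of the package documented on this page, which exports GenerateNet, GetStatistics and PAFit; other versions may use different function names, so the code checks for these exports and does nothing if they are absent. The result variable names are arbitrary.

```r
# Run only if the documented PAFit interface is available (an assumption).
needed <- c("GenerateNet", "GetStatistics", "PAFit")
if (requireNamespace("PAFit", quietly = TRUE) &&
    all(needed %in% getNamespaceExports("PAFit"))) {
  library("PAFit")
  net       <- GenerateNet(N = 50, m = 10, mode = 1, alpha = 0.5, shape = 100, rate = 100)
  net_stats <- GetStatistics(net$graph)

  # Attachment function only, with all node fitnesses fixed at 1.
  result_PA <- PAFit(net_stats, only_PA = TRUE)

  # Node fitnesses only, with the attachment function fixed at A_k = k.
  result_f_linear <- PAFit(net_stats, only_f = TRUE, mode_f = "Linear_PA")

  # Node fitnesses only, with the attachment function fixed at A_k = 1.
  result_f_const <- PAFit(net_stats, only_f = TRUE, mode_f = "Constant_PA")
}
```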
