PAFit: Joint inference of preferential attachment and node fitness by Minorize-Maximization algorithms

Description

From a PAFit_data object, which contains summary statistics of the dataset, PAFit estimates the attachment function $A_k$ and node fitness $\eta_i$ by penalized log-likelihood maximization. It also infers the remaining uncertainties in the estimated results by approximating the confidence intervals of $A_k$ and $\eta_i$.

Estimation of either the attachment function or node fitness in isolation is also supported. To estimate the PA function with $\eta_i = 1$, set only_PA = TRUE. To estimate node fitness with either $A_k = k$ or $A_k = 1$, set only_f = TRUE and choose the form of $A_k$ with mode_f.

Usage

PAFit(net_stat,
      only_PA        = FALSE,
      only_f         = FALSE,
      mode_f         = "Linear_PA",
      true_A         = NULL,
      true_f         = NULL,
      mode_reg_A     = 0,
      weight_PA_mode = 1,
      s              = 10,
      lambda         = 1,
      auto_lambda    = TRUE,
      r              = 0.01,
      alpha_start    = 1,
      start_mode_A   = "Log_linear",
      start_mode_f   = "Constant",
      auto_stop      = TRUE,
      stop_cond      = 10^-7,
      iteration      = 200,
      max_iter       = 2e+05,
      debug          = FALSE,
      q              = 1,
      step_size      = 0.5,
      normalized_f   = FALSE,
      interpolate    = TRUE)

Arguments

net_stat

An object of class "PAFit_data" containing all the summary statistics computed from the data by the function GetStatistics.

only_PA

Logical. TRUE means that the attachment function $A_k$ is estimated in isolation (fixing $\eta_i = 1$). Default is FALSE.

only_f

Logical. TRUE means that node fitness is estimated in isolation, with the form of the PA function specified by mode_f. Default is FALSE.

mode_f

String. Possible values: "Linear_PA", "Constant_PA" or "Log_linear". In the first two cases, the PA function is fixed. If mode_f == "Linear_PA" then $A_k = k$ for $k \ge 1$ and $A_0 = 1$. If mode_f == "Constant_PA" then $A_k = 1$ for all $k$. If mode_f == "Log_linear" then $A_k = k^\alpha$ for $k \ge 1$ and $A_0 = 1$, and the value of $\alpha$ is estimated as well. Default value is "Linear_PA".
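The three forms of $A_k$ selected by mode_f can be written out in a few lines of base R. This is only an illustrative sketch: attachment_values is a hypothetical helper, not a PAFit function, and alpha is passed in as a fixed number here, whereas PAFit estimates it when mode_f == "Log_linear".

```r
# Hypothetical helper (not part of PAFit): evaluate the attachment function
# A_k implied by each documented value of mode_f.
attachment_values <- function(k, mode_f = "Linear_PA", alpha = 1) {
  stopifnot(all(k >= 0))
  switch(mode_f,
    Linear_PA   = ifelse(k >= 1, k, 1),        # A_k = k for k >= 1, A_0 = 1
    Constant_PA = rep(1, length(k)),           # A_k = 1 for all k
    Log_linear  = ifelse(k >= 1, k^alpha, 1),  # A_k = k^alpha for k >= 1, A_0 = 1
    stop("unknown mode_f: ", mode_f)
  )
}

k <- 0:4
print(attachment_values(k, "Linear_PA"))
print(attachment_values(k, "Constant_PA"))
print(attachment_values(k, "Log_linear", alpha = 0.5))
```

Note that $A_0$ is set to 1 in the "Linear_PA" and "Log_linear" cases, so nodes of degree zero can still acquire edges.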

true_A

Numeric vector. User-supplied values of the PA function. If true_A is supplied, then only node fitnesses are estimated.

true_f

Numeric vector. User-supplied values of node fitnesses. If true_f is supplied, then only the PA function is estimated.

mode_reg_A

Integer. Possible values: 0, 1 or 2. Indicates which regularization term is used for the PA function. For the regularization function used in the PLoS ONE and Scientific Reports papers (references 2 and 3), use 0. Default value is 0.

weight_PA_mode

Binary. Indicates how the regularization terms for $A_k$ are weighted. If weight_PA_mode == 0, the regularization term for $A_k$ is weighted by the total number of edges connected to degree $k$ nodes. If weight_PA_mode == 1, the regularization terms have uniform weights. Default value is 1.

s

Positive numeric. The regularization parameter s for node fitness. Default value is 10.

lambda

Non-negative numeric. The absolute strength of the regularization for PA function. Ignored when auto_lambda == TRUE. Default value is 1. lambda == 0 means no regularization for PA.

auto_lambda

Logical. If auto_lambda == TRUE, lambda will be determined automatically from the data by r. Default is TRUE.

r

Non-negative numeric. The regularization parameter r for the PA function, which indicates the relative strength of the regularization term. If auto_lambda == TRUE, the value of lambda is determined automatically from r. Default value is 0.01.

alpha_start

Non-negative numeric. The starting value for alpha when we use the model $k^\alpha$. Default value is 1.

start_mode_A

String. Takes one of two values: "Log_linear" (the initial PA function set to k^alpha_start) or "Random" (the initial function is randomly sampled from a uniform distribution). Default value is "Log_linear".

start_mode_f

String. Takes one of two values: "Constant" (the initial node fitnesses are all set to 1) or "Random" (the initial node fitnesses are randomly sampled from a gamma distribution). Default value is "Constant".

auto_stop

Logical. Indicates whether the algorithm stops automatically. Default is TRUE.

stop_cond

Numeric. If auto_stop == TRUE, the iterative algorithm stops when $|h(ii) - h(ii + 1)| / (|h(ii)| + 1) < \mathrm{stop\_cond}$, where $h(ii)$ is the value of the objective function at iteration $ii$. We recommend choosing stop_cond no larger than $10^{-(d + 2)}$, where $d$ is the number of digits of $h$, so that when the algorithm stops, the increase in posterior probability is less than 1% of the current posterior probability. Default is 10^-7.
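The stopping rule for stop_cond can be checked by hand with a short base R sketch. The helper should_stop is hypothetical and not part of PAFit, and the objective values in h are invented purely for illustration.

```r
# Hypothetical helper (not part of PAFit): the relative-change stopping rule
# |h(ii) - h(ii + 1)| / (|h(ii)| + 1) < stop_cond.
should_stop <- function(h_prev, h_next, stop_cond = 10^-7) {
  abs(h_prev - h_next) / (abs(h_prev) + 1) < stop_cond
}

# Toy objective values, invented for illustration only.
h <- c(-1250.0, -1100.0, -1099.9, -1099.89999)
for (ii in seq_len(length(h) - 1)) {
  cat(sprintf("iteration %d: stop = %s\n", ii, should_stop(h[ii], h[ii + 1])))
}
```

The "+ 1" in the denominator keeps the ratio well defined when the objective value is close to zero.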

iteration

Integer. The number of iterations. Ignored if auto_stop == TRUE. Default value is 200.

max_iter

Integer. The maximum number of iterations. Regardless of other settings, the algorithm will stop once the number of iterations reaches this threshold. Default value is 2e+05.

debug

Logical. If debug == TRUE, the value of the objective function $h$ is printed at each step. Default is FALSE.

q

Integer. Indicates the number of previous steps used in the quasi-Newton speedup. Ignored if $q \le 1$. Default is 1.

step_size

Numeric. A number in $(0,1]$ that sets the step size of the quasi-Newton speedup. Ignored (no quasi-Newton speedup) if $q \le 1$. Default is 0.5.

normalized_f

Logical. Indicates whether we should normalize the estimated value of node fitness after estimation. Default value is FALSE.

interpolate

Logical. Indicates whether we should perform interpolation for the missing values of the estimated $A_k$. The interpolation, if performed, is a linear regression on log-scale. Default value is TRUE.
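To make the idea behind interpolate concrete, the following base R sketch fills missing values of $A_k$ with a linear regression on the log scale. It rests on assumptions that the text above does not confirm: that the regression is of $\log A_k$ on $\log k$, and that it is fitted once to all observed degrees $k \ge 1$. The helper interpolate_A is hypothetical and is not the routine PAFit uses internally.

```r
# Hypothetical helper (not part of PAFit): fill missing A_k values by
# regressing log(A_k) on log(k) over the observed degrees k >= 1.
interpolate_A <- function(k, A) {
  obs  <- !is.na(A) & k >= 1
  miss <- is.na(A) & k >= 1
  dat  <- data.frame(log_k = log(k[obs]), log_A = log(A[obs]))
  fit  <- lm(log_A ~ log_k, data = dat)
  if (any(miss)) {
    new_dat <- data.frame(log_k = log(k[miss]))
    A[miss] <- exp(predict(fit, newdata = new_dat))
  }
  A
}

# Toy data, invented for illustration: A_k = k^2 with two gaps.
k <- 1:6
A <- c(1, 4, NA, 16, NA, 36)
print(interpolate_A(k, A))
```

Degree $k = 0$ is left untouched in this sketch because $\log 0$ is undefined.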

Value

An object of class "PAFit_result", which is a list. Its most important fields fall into five groups: the estimated preferential attachment function; the confidence intervals of the estimated PA function; the estimated node fitnesses; the confidence intervals of the estimated node fitnesses; and additional information on the iterative process.

References

1. Pham, T., Sheridan, P. & Shimodaira, H. (2016). Nonparametric Estimation of the Preferential Attachment Function in Complex Networks: Evidence of Deviations from Log Linearity, Proceedings of ECCS 2014, 141-153 (Springer International Publishing) (http://dx.doi.org/10.1007/978-3-319-29228-1_13).

2. Pham, T., Sheridan, P. & Shimodaira, H. (2015). PAFit: A Statistical Method for Measuring Preferential Attachment in Temporal Complex Networks. PLoS ONE 10(9): e0137796. doi:10.1371/journal.pone.0137796 (http://dx.doi.org/10.1371/journal.pone.0137796).

3. Pham, T., Sheridan, P. & Shimodaira, H. (2016). Joint Estimation of Preferential Attachment and Node Fitness in Growing Complex Networks. Scientific Reports 6, Article number: 32558. doi:10.1038/srep32558 (www.nature.com/articles/srep32558).

Examples

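library("PAFit")
# Simulate a small network with 50 nodes
net        <- GenerateNet(N = 50, m = 1, mode = 1, alpha = 1, shape = 10, rate = 10)
# Summarize the network into a PAFit_data object
net_stats  <- GetStatistics(net$graph)
# Jointly estimate the attachment function and node fitnesses
result     <- PAFit(net_stats)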
