page_rank
The Page Rank algorithm
Calculates the Google PageRank for the specified vertices.
- Keywords
- graphs
Usage
page_rank(
graph,
algo = c("prpack", "arpack", "power"),
vids = V(graph),
directed = TRUE,
damping = 0.85,
personalized = NULL,
weights = NULL,
options = NULL
)page_rank_old(
graph,
vids = V(graph),
directed = TRUE,
niter = 1000,
eps = 0.001,
damping = 0.85,
old = FALSE
)
Arguments
- graph
The graph object.
- algo
Character scalar, which implementation to use to carry out the calculation. The default is
"prpack"
, which uses the PRPACK library (https://github.com/dgleich/prpack). This is a new implementation in igraph version 0.7, and the suggested one, as it is the most stable and the fastest for all but small graphs."arpack"
uses the ARPACK library, the default implementation from igraph version 0.5 until version 0.7.power
uses a simple implementation of the power method, this was the default in igraph before version 0.5 and is the same as callingpage_rank_old
.- vids
The vertices of interest.
- directed
Logical, if true directed paths will be considered for directed graphs. It is ignored for undirected graphs.
- damping
The damping factor (‘d’ in the original paper).
- personalized
Optional vector giving a probability distribution to calculate personalized PageRank. For personalized PageRank, the probability of jumping to a node when abandoning the random walk is not uniform, but it is given by this vector. The vector should contains an entry for each vertex and it will be rescaled to sum up to one.
- weights
A numerical vector or
NULL
. This argument can be used to give edge weights for calculating the weighted PageRank of vertices. If this isNULL
and the graph has aweight
edge attribute then that is used. Ifweights
is a numerical vector then it used, even if the graph has aweights
edge attribute. If this isNA
, then no edge weights are used (even if the graph has aweight
edge attribute. This function interprets edge weights as connection strengths. In the random surfer model, an edge with a larger weight is more likely to be selected by the surfer.- options
Either a named list, to override some ARPACK options. See
arpack
for details; or a named list to override the default options for the power method (ifalgo="power"
). The default options for the power method areniter=1000
andeps=0.001
. This argument is ignored if the PRPACK implementation is used.- niter
The maximum number of iterations to perform.
- eps
The algorithm will consider the calculation as complete if the difference of PageRank values between iterations change less than this value for every node.
- old
A logical scalar, whether the old style (pre igraph 0.5) normalization to use. See details below.
Details
For the explanation of the PageRank algorithm, see the following webpage: http://infolab.stanford.edu/~backrub/google.html, or the following reference:
Sergey Brin and Larry Page: The Anatomy of a Large-Scale Hypertextual Web Search Engine. Proceedings of the 7th World-Wide Web Conference, Brisbane, Australia, April 1998.
igraph 0.5 (and later) contains two PageRank calculation implementations.
The page_rank
function uses ARPACK to perform the calculation, see
also arpack
.
The page_rank_old
function performs a simple power method, this is
the implementation that was available under the name page_rank
in pre
0.5 igraph versions. Note that page_rank_old
has an argument called
old
. If this argument is FALSE
(the default), then the proper
PageRank algorithm is used, i.e. \((1-d)/n\) is added to the weighted
PageRank of vertices to calculate the next iteration. If this argument is
TRUE
then \((1-d)\) is added, just like in the PageRank paper;
\(d\) is the damping factor, and \(n\) is the total number of vertices.
A further difference is that the old implementation does not renormalize the
page rank vector after each iteration. Note that the old=FALSE
method is not stable, is does not necessarily converge to a fixed point. It
should be avoided for new code, it is only included for compatibility with
old igraph versions.
Please note that the PageRank of a given vertex depends on the PageRank of all other vertices, so even if you want to calculate the PageRank for only some of the vertices, all of them must be calculated. Requesting the PageRank for only some of the vertices does not result in any performance increase at all.
Since the calculation is an iterative process, the algorithm is stopped after a given count of iterations or if the PageRank value differences between iterations are less than a predefined value.
Value
For page_rank
a named list with entries:
A numeric vector with the PageRank scores.
The eigenvalue corresponding to the eigenvector with the page rank scores. It should be always exactly one.
Some information about the underlying
ARPACK calculation. See arpack
for details. This entry is
NULL
if not the ARPACK implementation was used.
For page_rank_old a numeric vector of Page Rank scores.
References
Sergey Brin and Larry Page: The Anatomy of a Large-Scale Hypertextual Web Search Engine. Proceedings of the 7th World-Wide Web Conference, Brisbane, Australia, April 1998.
See Also
Other centrality scores: closeness
,
betweenness
, degree
Examples
# NOT RUN {
g <- sample_gnp(20, 5/20, directed=TRUE)
page_rank(g)$vector
g2 <- make_star(10)
page_rank(g2)$vector
# Personalized PageRank
g3 <- make_ring(10)
page_rank(g3)$vector
reset <- seq(vcount(g3))
page_rank(g3, personalized=reset)$vector
# }