Learn R Programming

crandep

The goal of crandep is to provide functions for analysing the dependencies of CRAN packages using social network analysis.

Installation

You can install crandep from github with:

# install.packages("devtools")
devtools::install_github("clement-lee/crandep")
library(crandep)
library(dplyr)
library(ggplot2)
library(igraph)

Overview

The functions and example dataset can be divided into the following categories:

  1. For obtaining data frames of package dependencies, use get_dep(), get_dep_all_packages().
  2. For obtaining igraph objects of package dependencies, use get_graph_all_packages() and df_to_graph().
  3. For modelling the number of dependencies, use *pol() and *mix2().
  4. There is also an example data set cran_dependencies.

One or multiple types of dependencies

To obtain the information about various kinds of dependencies of a package, we can use the function get_dep() which takes the package name and the type of dependencies as the first and second arguments, respectively. Currently, the second argument accepts a character vector of one or more of the following words: Depends, Imports, LinkingTo, Suggests, Enhances, Reverse_depends, Reverse_imports, Reverse_linking_to, Reverse_suggests, and Reverse_enhances, or any variations in their letter cases, or if the underscore "_" is replaced by a space.

get_dep("dplyr", "Imports")
#>     from         to    type reverse
#> 1  dplyr        cli imports   FALSE
#> 2  dplyr   generics imports   FALSE
#> 3  dplyr       glue imports   FALSE
#> 4  dplyr  lifecycle imports   FALSE
#> 5  dplyr   magrittr imports   FALSE
#> 6  dplyr    methods imports   FALSE
#> 7  dplyr     pillar imports   FALSE
#> 8  dplyr         R6 imports   FALSE
#> 9  dplyr      rlang imports   FALSE
#> 10 dplyr     tibble imports   FALSE
#> 11 dplyr tidyselect imports   FALSE
#> 12 dplyr      utils imports   FALSE
#> 13 dplyr      vctrs imports   FALSE
get_dep("MASS", c("depends", "suggests"))
#>   from        to     type reverse
#> 1 MASS grDevices  depends   FALSE
#> 2 MASS  graphics  depends   FALSE
#> 3 MASS     stats  depends   FALSE
#> 4 MASS     utils  depends   FALSE
#> 5 MASS   lattice suggests   FALSE
#> 6 MASS      nlme suggests   FALSE
#> 7 MASS      nnet suggests   FALSE
#> 8 MASS  survival suggests   FALSE

For more information on different types of dependencies, see the official guidelines and https://r-pkgs.org/description.html.

In the output, the column type is the type of the dependency converted to lower case. Also, LinkingTo is now converted to linking to for consistency.

get_dep("xts", "LinkingTo")
#>   from  to       type reverse
#> 1  xts zoo linking to   FALSE
get_dep("xts", "linking to")
#>   from  to       type reverse
#> 1  xts zoo linking to   FALSE

For the reverse dependencies, instead of including the prefix “Reverse” in type, we use the argument reverse:

get_dep("abc", c("depends", "depends"), reverse = TRUE)
#>   from       to    type reverse
#> 1  abc abctools depends    TRUE
#> 2  abc  EasyABC depends    TRUE
get_dep("xts", c("linking to", "linking to"), reverse = TRUE)
#>   from       to       type reverse
#> 1  xts ichimoku linking to    TRUE
#> 2  xts  RcppXts linking to    TRUE
#> 3  xts      TTR linking to    TRUE

Theoretically, for each forward dependency

#>   from to type reverse
#> 1    A  B    c   FALSE

there should be an equivalent reverse dependency

#>   from to type reverse
#> 1    B  A    c    TRUE

Aligning the type in the forward dependency and the reverse dependency enables this to be checked easily.

To obtain all types of dependencies, we can use "all" in the second argument, instead of typing a character vector of all words:

df0.rstan <- get_dep("rstan", "all")
dplyr::count(df0.rstan, type)
#>         type  n
#> 1    depends  1
#> 2    imports 10
#> 3 linking to  5
#> 4   suggests 12
df1.rstan <- get_dep("rstan", "all", reverse = TRUE) # too many rows to display
dplyr::count(df1.rstan, type) # hence the summary using count()
#>         type   n
#> 1    depends  20
#> 2   enhances   3
#> 3    imports 139
#> 4 linking to 120
#> 5   suggests  33

As of 2024-08-02, there are 0 packages that have all 10 types of dependencies, and 6 packages that have 9 types of dependencies: Matrix, bigmemory, miceadds, quanteda, rstan, xts.

Building and visualising a dependency network

To build a dependency network, we have to obtain the dependencies for multiple packages. For illustration, we choose the core packages of the tidyverse, and find out what each package Imports. We put all the dependencies into one data frame, in which the package in the from column imports the package in the to column. This is essentially the edge list of the dependency network.

df0.imports <- rbind(
  get_dep("ggplot2", "Imports"),
  get_dep("dplyr", "Imports"),
  get_dep("tidyr", "Imports"),
  get_dep("readr", "Imports"),
  get_dep("purrr", "Imports"),
  get_dep("tibble", "Imports"),
  get_dep("stringr", "Imports"),
  get_dep("forcats", "Imports")
)
head(df0.imports)
#>      from        to    type reverse
#> 1 ggplot2       cli imports   FALSE
#> 2 ggplot2      glue imports   FALSE
#> 3 ggplot2 grDevices imports   FALSE
#> 4 ggplot2      grid imports   FALSE
#> 5 ggplot2    gtable imports   FALSE
#> 6 ggplot2   isoband imports   FALSE
tail(df0.imports)
#>       from        to    type reverse
#> 73 forcats       cli imports   FALSE
#> 74 forcats      glue imports   FALSE
#> 75 forcats lifecycle imports   FALSE
#> 76 forcats  magrittr imports   FALSE
#> 77 forcats     rlang imports   FALSE
#> 78 forcats    tibble imports   FALSE

All types of dependencies, in a data frame

The example dataset cran_dependencies contains all dependencies as of 2020-05-09.

data(cran_dependencies)
cran_dependencies
#> # A tibble: 211,381 × 4
#>    from  to             type     reverse
#>    <chr> <chr>          <chr>    <lgl>  
#>  1 A3    xtable         depends  FALSE  
#>  2 A3    pbapply        depends  FALSE  
#>  3 A3    randomForest   suggests FALSE  
#>  4 A3    e1071          suggests FALSE  
#>  5 aaSEA DT             imports  FALSE  
#>  6 aaSEA networkD3      imports  FALSE  
#>  7 aaSEA shiny          imports  FALSE  
#>  8 aaSEA shinydashboard imports  FALSE  
#>  9 aaSEA magrittr       imports  FALSE  
#> 10 aaSEA Bios2cor       imports  FALSE  
#> # ℹ 211,371 more rows
dplyr::count(cran_dependencies, type, reverse)
#> # A tibble: 8 × 3
#>   type       reverse     n
#>   <chr>      <lgl>   <int>
#> 1 depends    FALSE   11123
#> 2 depends    TRUE     9672
#> 3 imports    FALSE   57617
#> 4 imports    TRUE    51913
#> 5 linking to FALSE    3433
#> 6 linking to TRUE     3721
#> 7 suggests   FALSE   35018
#> 8 suggests   TRUE    38884

This is essentially a snapshot of CRAN. We can obtain all the current dependencies using get_dep_all_packages(), which requires no arguments:

df0.cran <- get_dep_all_packages()
head(df0.cran)
#>       from         to    type reverse
#> 3 AATtools   magrittr imports   FALSE
#> 4 AATtools      dplyr imports   FALSE
#> 5 AATtools doParallel imports   FALSE
#> 6 AATtools    foreach imports   FALSE
#> 7   ABACUS    ggplot2 imports   FALSE
#> 8   ABACUS      shiny imports   FALSE
dplyr::count(df0.cran, type, reverse) # numbers in general larger than above
#>          type reverse      n
#> 1     depends   FALSE  10525
#> 2     depends    TRUE   9097
#> 3    enhances   FALSE    638
#> 4    enhances    TRUE    652
#> 5     imports   FALSE 103771
#> 6     imports    TRUE  95407
#> 7  linking to   FALSE   5872
#> 8  linking to    TRUE   6273
#> 9    suggests   FALSE  65414
#> 10   suggests    TRUE  72346

Network of one type of dependencies, as an igraph object

We can build dependency network using get_graph_all_packages(). Furthermore, we can verify that the forward and reverse dependency networks are (almost) the same, by looking at their size (number of edges) and order (number of nodes).

g0.depends <- get_graph_all_packages(type = "depends")
g0.depends
#> IGRAPH 4c9c1ff DN-- 4627 7491 -- 
#> + attr: name (v/c)
#> + edges from 4c9c1ff (vertex names):
#>  [1] A3         ->xtable   A3         ->pbapply 
#>  [3] abc        ->abc.data abc        ->nnet    
#>  [5] abc        ->quantreg abc        ->MASS    
#>  [7] abc        ->locfit   ABCp2      ->MASS    
#>  [9] abctools   ->abc      abctools   ->abind   
#> [11] abctools   ->plyr     abctools   ->Hmisc   
#> [13] abd        ->nlme     abd        ->lattice 
#> [15] abd        ->mosaic   abodOutlier->cluster 
#> + ... omitted several edges

We could obtain essentially the same graph, but with the direction of the edges reversed, by specifying type = "reverse depends":

# Not run
g0.rev_depends <- get_graph_all_packages(type = "depends", reverse = TRUE)
g0.rev_depends

The dependency words accepted by the argument type is the same as in get_dep(). The two networks’ size and order should be very close if not identical to each other. Because of the dependency direction, their edge lists should be the same but with the column names from and to swapped.

For verification, the exact same graphs can be obtained by filtering the data frame for the required dependency and applying df_to_graph():

g1.depends <- df0.cran |>
  dplyr::filter(type == "depends" & !reverse) |>
  df_to_graph(nodelist = dplyr::rename(df0.cran, name = from))
g1.depends # same as g0.depends
#> IGRAPH 0cb388d DN-- 4627 7491 -- 
#> + attr: name (v/c), type (e/c), reverse (e/l)
#> + edges from 0cb388d (vertex names):
#>  [1] A3         ->xtable   A3         ->pbapply 
#>  [3] abc        ->abc.data abc        ->nnet    
#>  [5] abc        ->quantreg abc        ->MASS    
#>  [7] abc        ->locfit   ABCp2      ->MASS    
#>  [9] abctools   ->abc      abctools   ->abind   
#> [11] abctools   ->plyr     abctools   ->Hmisc   
#> [13] abd        ->nlme     abd        ->lattice 
#> [15] abd        ->mosaic   abodOutlier->cluster 
#> + ... omitted several edges

Copy Link

Version

Install

install.packages('crandep')

Monthly Downloads

334

Version

0.3.11

License

GPL (>= 2)

Issues

Pull Requests

Stars

Forks

Maintainer

Clement Lee

Last Published

December 6th, 2024

Functions in crandep (0.3.11)

lpost_bulk_wrapper

Wrapper of lpost_bulk, assuming power law (theta = 1.0)
lpost_pol_wrapper

Wrapper of lpost_pol, assuming power law (theta = 1.0)
lpost_pow

Unnormalised log-posterior density of discrete power law
mcmc_mix1_wrapper

Wrapper of mcmc_mix1
marg_pow

Marginal log-likelihood and posterior density of discrete power law via numerical integration
mcmc_mix1

Markov chain Monte Carlo for TZP-power-law mixture
mcmc_pol_wrapper

Wrapper of mcmc_pol
mcmc_mix2

Markov chain Monte Carlo for 2-component discrete extreme value mixture distribution
mcmc_pol

Markov chain Monte Carlo for Zipf-polylog distribution
mcmc_mix2_wrapper

Wrapper of mcmc_mix2
obtain_u_set_mix1

Obtain set of thresholds with high posterior density for the TZP-power-law mixture model
obtain_u_set_mix2

Obtain set of thresholds with high posterior density for the 2-component mixture model
mcmc_mix3_wrapper

Wrapper of mcmc_mix3
mcmc_mix3

Markov chain Monte Carlo for 3-component discrete extreme value mixture distribution
obtain_u_set_mix2_constrained

Obtain set of thresholds with high posterior density for the constrained 2-component mixture model
obtain_u_set_mix3

Obtain set of thresholds with high posterior density for the 3-component mixture model
reshape_dep

Reshape the data frame of dependencies
df_to_graph

Construct the giant component of the network from two data frames
Smix2

Survival function of 2-component discrete extreme value mixture distribution
Spol

Survival function of Zipf-polylog distribution
chi_citations

Citation network of CHI papers
dmix3

Probability mass function (PMF) of 3-component discrete extreme value mixture distribution
Smix3

Survival function of 3-component discrete extreme value mixture distribution
dmix2

Probability mass function (PMF) of 2-component discrete extreme value mixture distribution
get_dep

Multiple types of dependencies
get_graph_all_packages

Graph of dependencies of all CRAN packages
dpol

Probability mass function (PMF) of Zipf-polylog distribution
get_dep_all_packages

Dependencies of all CRAN packages
check_dep_word

Check and convert dependency word(s)
conditional_change

Conditionally change a string
cran_dependencies

Dependencies of CRAN packages
lpost_mix2_constrained

Wrapper of lpost_mix2, assuming power law (theta = 1.0) & contrained (alpha > 1.0, xi < 1.0 / (alpha - 1.0))
get_dep_vec

Split a string to a list of dependencies