crandep
The goal of crandep is to provide functions for analysing the dependencies of CRAN packages using social network analysis.
Installation
You can install crandep from GitHub with:
# install.packages("devtools")
devtools::install_github("clement-lee/crandep")
library(crandep)
library(dplyr)
library(ggplot2)
library(igraph)
Overview
The functions and example dataset can be divided into the following categories:
- For obtaining data frames of package dependencies, use get_dep() and get_dep_all_packages().
- For obtaining igraph objects of package dependencies, use
get_graph_all_packages() and df_to_graph().
- For modelling the number of dependencies, use *pol() and *mix2().
- There is also an example data set cran_dependencies.
One or multiple types of dependencies
To obtain the information about various kinds of dependencies of a
package, we can use the function get_dep() which takes the package
name and the type of dependencies as the first and second arguments,
respectively. Currently, the second argument accepts a character vector
of one or more of the following words: Depends, Imports,
LinkingTo, Suggests, Enhances, Reverse_depends,
Reverse_imports, Reverse_linking_to, Reverse_suggests, and
Reverse_enhances. Letter case is ignored, and the underscore "_" may be
replaced by a space.
get_dep("dplyr", "Imports")
#>     from         to    type reverse
#> 1  dplyr        cli imports   FALSE
#> 2  dplyr   generics imports   FALSE
#> 3  dplyr       glue imports   FALSE
#> 4  dplyr  lifecycle imports   FALSE
#> 5  dplyr   magrittr imports   FALSE
#> 6  dplyr    methods imports   FALSE
#> 7  dplyr     pillar imports   FALSE
#> 8  dplyr         R6 imports   FALSE
#> 9  dplyr      rlang imports   FALSE
#> 10 dplyr     tibble imports   FALSE
#> 11 dplyr tidyselect imports   FALSE
#> 12 dplyr      utils imports   FALSE
#> 13 dplyr      vctrs imports   FALSE
get_dep("MASS", c("depends", "suggests"))
#>   from        to     type reverse
#> 1 MASS grDevices  depends   FALSE
#> 2 MASS  graphics  depends   FALSE
#> 3 MASS     stats  depends   FALSE
#> 4 MASS     utils  depends   FALSE
#> 5 MASS   lattice suggests   FALSE
#> 6 MASS      nlme suggests   FALSE
#> 7 MASS      nnet suggests   FALSE
#> 8 MASS  survival suggests   FALSE
For more information on different types of dependencies, see the official guidelines and https://r-pkgs.org/description.html.
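The flexible matching of the type argument described above (any letter case, underscore or space) can be pictured as a normalisation step. The base R sketch below is only an illustration: normalise_type() is a hypothetical helper, not a crandep function, and it makes no claim about how get_dep() handles its argument internally.

```r
# Hypothetical helper (NOT part of crandep): maps user-supplied dependency
# words to a lower-case type plus a logical reverse flag.
normalise_type <- function(x) {
  x <- tolower(x)                        # ignore letter case
  x <- gsub("_", " ", x, fixed = TRUE)   # treat "_" and " " alike
  reverse <- startsWith(x, "reverse ")   # record the "Reverse" prefix
  x <- sub("^reverse ", "", x)           # then drop it from the type
  x[x == "linkingto"] <- "linking to"    # one spelling for both directions
  data.frame(type = x, reverse = reverse, stringsAsFactors = FALSE)
}

res <- normalise_type(c("Imports", "LinkingTo", "Reverse_linking_to", "reverse suggests"))
print(res)
```

Separating the direction from the type in this way gives forward and reverse dependencies the same vocabulary.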
In the output, the column type is the type of the dependency converted
to lower case. Also, LinkingTo is now converted to linking to for
consistency.
get_dep("xts", "LinkingTo")
#>   from  to       type reverse
#> 1  xts zoo linking to   FALSE
get_dep("xts", "linking to")
#>   from  to       type reverse
#> 1  xts zoo linking to   FALSE
For the reverse dependencies, instead of including the prefix “Reverse”
in type, we use the argument reverse:
get_dep("abc", c("depends", "depends"), reverse = TRUE)
#>   from       to    type reverse
#> 1  abc abctools depends    TRUE
#> 2  abc  EasyABC depends    TRUE
get_dep("xts", c("linking to", "linking to"), reverse = TRUE)
#>   from       to       type reverse
#> 1  xts ichimoku linking to    TRUE
#> 2  xts  RcppXts linking to    TRUE
#> 3  xts      TTR linking to    TRUE
Theoretically, for each forward dependency
#>   from to type reverse
#> 1    A  B    c   FALSE
there should be an equivalent reverse dependency
#>   from to type reverse
#> 1    B  A    c    TRUE
Aligning the type in the forward dependency and the reverse dependency
enables this to be checked easily.
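One way such a check could look is sketched below in base R. The two data frames are made-up toy data with the same columns as the get_dep() output; in practice they would come from get_dep() or get_dep_all_packages(). The variable names are illustrative assumptions.

```r
# Toy data (made up): three forward edges and their reverse counterparts.
forward <- data.frame(
  from = c("A", "A", "B"),
  to = c("B", "C", "C"),
  type = "imports",
  reverse = FALSE,
  stringsAsFactors = FALSE
)
backward <- data.frame(
  from = c("B", "C", "C"),
  to = c("A", "A", "B"),
  type = "imports",
  reverse = TRUE,
  stringsAsFactors = FALSE
)

# Write each reverse row as the forward edge it implies, so that both
# sides can be compared through a common "from to type" key.
key_forward <- paste(forward$from, forward$to, forward$type)
key_implied <- paste(backward$to, backward$from, backward$type)

# Edges recorded in one direction only; both are empty when the views agree.
missing_reverse <- setdiff(key_forward, key_implied)
missing_forward <- setdiff(key_implied, key_forward)
print(list(missing_reverse = missing_reverse, missing_forward = missing_forward))
```

Any non-empty result points to a dependency that is recorded in one direction only.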
To obtain all types of dependencies, we can use "all" in the second
argument, instead of typing a character vector of all words:
df0.rstan <- get_dep("rstan", "all")
dplyr::count(df0.rstan, type)
#>         type  n
#> 1    depends  1
#> 2    imports 10
#> 3 linking to  5
#> 4   suggests 12
df1.rstan <- get_dep("rstan", "all", reverse = TRUE) # too many rows to display
dplyr::count(df1.rstan, type) # hence the summary using count()
#>         type   n
#> 1    depends  20
#> 2   enhances   3
#> 3    imports 139
#> 4 linking to 120
#> 5   suggests  33
As of 2024-08-02, no package has all 10 types of dependencies, and 6 packages have 9 types: Matrix, bigmemory, miceadds, quanteda, rstan, xts.
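A count like this can be derived from a data frame with the columns from, type and reverse, treating each combination of type and direction as one kind of dependency. The sketch below uses made-up toy data and base R only; it is not necessarily how the figures above were computed.

```r
# Toy data (made up), with the same columns as the get_dep() output.
deps <- data.frame(
  from = c("pkgA", "pkgA", "pkgA", "pkgB", "pkgB"),
  to = c("x", "y", "z", "x", "y"),
  type = c("imports", "imports", "suggests", "imports", "imports"),
  reverse = c(FALSE, TRUE, FALSE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# One label per (type, direction) pair, e.g. "imports TRUE" for reverse imports.
kind <- paste(deps$type, deps$reverse)

# Number of distinct kinds of dependencies for each package.
n_kinds <- sapply(split(kind, deps$from), function(k) length(unique(k)))
print(n_kinds)
```

Packages with a given number of kinds can then be picked out with, for example, names(n_kinds)[n_kinds == 3].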
Building and visualising a dependency network
To build a dependency network, we have to obtain the dependencies for
multiple packages. For illustration, we choose the core packages of the
tidyverse, and find out what each
package Imports. We put all the dependencies into one data frame, in
which the package in the from column imports the package in the to
column. This is essentially the edge list of the dependency network.
df0.imports <- rbind(
  get_dep("ggplot2", "Imports"),
  get_dep("dplyr", "Imports"),
  get_dep("tidyr", "Imports"),
  get_dep("readr", "Imports"),
  get_dep("purrr", "Imports"),
  get_dep("tibble", "Imports"),
  get_dep("stringr", "Imports"),
  get_dep("forcats", "Imports")
)
head(df0.imports)
#>      from        to    type reverse
#> 1 ggplot2       cli imports   FALSE
#> 2 ggplot2      glue imports   FALSE
#> 3 ggplot2 grDevices imports   FALSE
#> 4 ggplot2      grid imports   FALSE
#> 5 ggplot2    gtable imports   FALSE
#> 6 ggplot2   isoband imports   FALSE
tail(df0.imports)
#>       from        to    type reverse
#> 73 forcats       cli imports   FALSE
#> 74 forcats      glue imports   FALSE
#> 75 forcats lifecycle imports   FALSE
#> 76 forcats  magrittr imports   FALSE
#> 77 forcats     rlang imports   FALSE
#> 78 forcats    tibble imports   FALSE
All types of dependencies, in a data frame
The example dataset cran_dependencies contains all dependencies as of
2020-05-09.
data(cran_dependencies)
cran_dependencies
#> # A tibble: 211,381 × 4
#>    from  to             type     reverse
#>    <chr> <chr>          <chr>    <lgl>  
#>  1 A3    xtable         depends  FALSE  
#>  2 A3    pbapply        depends  FALSE  
#>  3 A3    randomForest   suggests FALSE  
#>  4 A3    e1071          suggests FALSE  
#>  5 aaSEA DT             imports  FALSE  
#>  6 aaSEA networkD3      imports  FALSE  
#>  7 aaSEA shiny          imports  FALSE  
#>  8 aaSEA shinydashboard imports  FALSE  
#>  9 aaSEA magrittr       imports  FALSE  
#> 10 aaSEA Bios2cor       imports  FALSE  
#> # ℹ 211,371 more rows
dplyr::count(cran_dependencies, type, reverse)
#> # A tibble: 8 × 3
#>   type       reverse     n
#>   <chr>      <lgl>   <int>
#> 1 depends    FALSE   11123
#> 2 depends    TRUE     9672
#> 3 imports    FALSE   57617
#> 4 imports    TRUE    51913
#> 5 linking to FALSE    3433
#> 6 linking to TRUE     3721
#> 7 suggests   FALSE   35018
#> 8 suggests   TRUE    38884
This is essentially a snapshot of CRAN. We can obtain all the current
dependencies using get_dep_all_packages(), which requires no
arguments:
df0.cran <- get_dep_all_packages()
head(df0.cran)
#>       from         to    type reverse
#> 3 AATtools   magrittr imports   FALSE
#> 4 AATtools      dplyr imports   FALSE
#> 5 AATtools doParallel imports   FALSE
#> 6 AATtools    foreach imports   FALSE
#> 7   ABACUS    ggplot2 imports   FALSE
#> 8   ABACUS      shiny imports   FALSE
dplyr::count(df0.cran, type, reverse) # numbers in general larger than above
#>          type reverse      n
#> 1     depends   FALSE  10525
#> 2     depends    TRUE   9097
#> 3    enhances   FALSE    638
#> 4    enhances    TRUE    652
#> 5     imports   FALSE 103771
#> 6     imports    TRUE  95407
#> 7  linking to   FALSE   5872
#> 8  linking to    TRUE   6273
#> 9    suggests   FALSE  65414
#> 10   suggests    TRUE  72346
Network of one type of dependencies, as an igraph object
We can build a dependency network using get_graph_all_packages().
Furthermore, we can verify that the forward and reverse dependency
networks are (almost) the same, by looking at their size (number of
edges) and order (number of nodes).
g0.depends <- get_graph_all_packages(type = "depends")
g0.depends
#> IGRAPH 4c9c1ff DN-- 4627 7491 -- 
#> + attr: name (v/c)
#> + edges from 4c9c1ff (vertex names):
#>  [1] A3         ->xtable   A3         ->pbapply 
#>  [3] abc        ->abc.data abc        ->nnet    
#>  [5] abc        ->quantreg abc        ->MASS    
#>  [7] abc        ->locfit   ABCp2      ->MASS    
#>  [9] abctools   ->abc      abctools   ->abind   
#> [11] abctools   ->plyr     abctools   ->Hmisc   
#> [13] abd        ->nlme     abd        ->lattice 
#> [15] abd        ->mosaic   abodOutlier->cluster 
#> + ... omitted several edges
We could obtain essentially the same graph, but with the direction of
the edges reversed, by specifying reverse = TRUE:
# Not run
g0.rev_depends <- get_graph_all_packages(type = "depends", reverse = TRUE)
g0.rev_depends
The dependency words accepted by the argument type are the same as in
get_dep(). The two networks’ size and order should be very close if
not identical to each other. Because of the dependency direction, their
edge lists should be the same but with the column names from and to
swapped.
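To make the size and order comparison concrete, the sketch below works directly on toy edge lists in base R. The helper names graph_size() and graph_order() are illustrative assumptions, not crandep functions.

```r
# Toy forward edge list (made up).
forward <- data.frame(
  from = c("A", "A", "B"),
  to = c("B", "C", "C"),
  stringsAsFactors = FALSE
)
# The reverse network: the same edges with the direction flipped.
backward <- data.frame(from = forward$to, to = forward$from, stringsAsFactors = FALSE)

# Size = number of distinct edges; order = number of distinct nodes.
graph_size <- function(edges) nrow(unique(edges))
graph_order <- function(edges) length(unique(c(edges$from, edges$to)))

same_size <- graph_size(forward) == graph_size(backward)
same_order <- graph_order(forward) == graph_order(backward)
print(c(same_size = same_size, same_order = same_order))
```

For igraph objects such as g0.depends, igraph::gsize() and igraph::gorder() return the number of edges and the number of nodes, respectively.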
For verification, the exact same graphs can be obtained by filtering the
data frame for the required dependency and applying df_to_graph():
g1.depends <- df0.cran |>
  dplyr::filter(type == "depends" & !reverse) |>
  df_to_graph(nodelist = dplyr::rename(df0.cran, name = from))
g1.depends # same as g0.depends
#> IGRAPH 0cb388d DN-- 4627 7491 -- 
#> + attr: name (v/c), type (e/c), reverse (e/l)
#> + edges from 0cb388d (vertex names):
#>  [1] A3         ->xtable   A3         ->pbapply 
#>  [3] abc        ->abc.data abc        ->nnet    
#>  [5] abc        ->quantreg abc        ->MASS    
#>  [7] abc        ->locfit   ABCp2      ->MASS    
#>  [9] abctools   ->abc      abctools   ->abind   
#> [11] abctools   ->plyr     abctools   ->Hmisc   
#> [13] abd        ->nlme     abd        ->lattice 
#> [15] abd        ->mosaic   abodOutlier->cluster 
#> + ... omitted several edges