Check the Heaviness of Package Dependencies

When developing R packages, we should try to avoid directly setting dependencies to "heavy packages". The "heaviness" for a package means, the number of additional dependent packages it brings to. If your package directly depends on a heavy package, it brings several consequences:

  1. Users need to install a lot of additional packages if your package is installed (which brings the risk that installation of some packages may fail that makes your package cannot be installed neither).
  2. The namespaces that are loaded into your R session after loading your package (by library(your-pkg)) will be huge (you can see the loaded namespaces by sessionInfo()).

You package will be "heavy" as well and it may take long time to load your package.

In the DESCRIPTION file of your package, those "directly dependent pakcages" are always listed in the "Depends" or "Imports" fields. To get rid of the heavy packages that are not offen used in your package, it is better to move them into the "Suggests" fields and load them only when they are needed.

Here pkgndep package checks the heaviness of the packages that your package depends on. For each package listed in the "Depends", "Imports" and "Suggests" fields in the DESCRIPTION file, it opens a new R session, loads the package and counts the number of namespaces that are loaded. The summary of the dependencies is visualized by a customized heatmap.

As an example, I am developing a package cola which depends on a lot of other packages. The dependency heatmap looks like (Figure in the original size is here):

In the heatmap, rows are the packages listed in "Depends", "Imports" and "Suggests" fields, columns are the namespaces that are loaded if each of the package is only loaded to a new R session. The barplots on the right show the number of namespaces that are imported by each package and the time of only loading one of the packages into R.

We can see if all the packages are put in the "Imports" field, 166 namespaces will be loaded after library(cola). Some of the heavy packages such as WGCNA and clusterProfiler are not very frequently used in cola, moving them to "Suggests" field and loading them only when they are needed helps to speed up loading cola. Now the number of namespaces are reduced to only 25 after library(cola).

Usage

To use this package:

library(pkgndep) x = pkgndep("package-name") plot(x)

or

x = pkgndep("path-to-the-package") plot(x)

Executable examples:

library(pkgndep) x = pkgndep("ComplexHeatmap") x plot(x)

Statistics

I ran pkgndep on all packages that are installed in my computer. The table of the number of loaded namespaces as well as the dependency heatmaps are available at https://jokergoo.github.io/pkgndep/stat/.

For a quick look, the top 10 packages with the largest dependencies are:

Package # Namespaces also load packages in Suggests Heatmap
ReportingTools 125 131 view
TCGAbiolinks 118 209 view
epik 116 116 view
minfiData 109 109 view
minfiDataEPIC 109 109 view
ggbio 108 119 view
FlowSorted.Blood.450k 108 108 view
IlluminaHumanMethylation450kanno.ilmn12.hg19 108 108 view
IlluminaHumanMethylation450kmanifest 108 108 view
IlluminaHumanMethylationEPICanno.ilm10b2.hg19 108 108 view

And the top 10 packages with the largest dependencies where packages in "Suggests" are also loaded are:

Package # Namespaces also load packages in Suggests Heatmap
TCGAbiolinks 118 209 view
cola 25 174 view
broom 29 171 view
GSEABase 29 135 view
sesame 73 134 view
ReportingTools 125 131 view
GenomicRanges 17 128 view
ensembldb 57 126 view
AER 36 126 view
BiocGenerics 8 125 view