# packageRank v0.1.0

Compute and visualize the cross-sectional and longitudinal number and rank percentile of package downloads from RStudio's CRAN mirror.

### features

• provide S3 plot methods for ‘cranlogs’ output.
• compute the rank percentile and nominal rank of a package’s downloads from RStudio’s CRAN mirror.
• visualize a package’s position in the distribution of package download counts for a given day (cross-sectionally) or over time (longitudinally).

NOTE: ‘packageRank’ relies on an active internet connection.

### background

The ‘cranlogs’ package reports the number of package downloads from RStudio’s CRAN mirror. For example, we can see that the ‘HistData’ package was downloaded 51 times on the first day of 2019:

cranlogs::cran_downloads(packages = "HistData", from = "2019-01-01",
to = "2019-01-01")
>         date count  package
> 1 2019-01-01    51 HistData


And 787 times in the first week:

cranlogs::cran_downloads(packages = "HistData", from = "2019-01-01",
to = "2019-01-07")
>         date count  package
> 1 2019-01-01    51 HistData
> 2 2019-01-02   100 HistData
> 3 2019-01-03   137 HistData
> 4 2019-01-04   113 HistData
> 5 2019-01-05    85 HistData
> 6 2019-01-06    96 HistData
> 7 2019-01-07   205 HistData
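
The weekly figure is simply the sum of the daily counts. As a quick check that needs no network access, the sketch below rebuilds the table by hand from the output above (the data frame name `week` is just for illustration):

```r
# Daily counts copied from the output above.
week <- data.frame(
  date = as.Date("2019-01-01") + 0:6,
  count = c(51, 100, 137, 113, 85, 96, 205),
  package = "HistData"
)

# Total downloads for the first week of 2019.
weekly_total <- sum(week$count)
weekly_total
# [1] 787
```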


In both cases, the “compared to what?” question lurks in the background. Is 51 downloads large or small? Is the pattern during that week typical or unusual? To help answer these questions, ‘packageRank’ tries to provide some perspective on package download counts.

### visualizing ‘cranlogs’

To visualize output from cranlogs::cran_downloads(), ‘packageRank’ provides easy-to-use S3 plot() methods. All you need to do is use cran_downloads2() in place of cran_downloads():

plot(cran_downloads2(package = c("data.table", "Rcpp", "rlang"),
from = "2019-01-01", to = "2019-01-01"), graphics_pkg = "base")


plot(cran_downloads2(package = c("data.table", "Rcpp", "rlang"),
when = "last-month"))


plot(cran_downloads2(package = c("data.table", "Rcpp", "rlang"),
from = "2019-01-01", to = "2019-01-31"))


### compute percentiles and ranks

To compute a package’s rank percentile and nominal rank, use packageRank():

cran_downloads2(package = "HistData", from = "2019-01-01", to = "2019-01-01")
>         date count  package
> 1 2019-01-01    51 HistData

packageRank(package = "HistData", date = "2019-01-01")
> 1 2019-01-01 HistData        51       93.4 920 of 14,020


The rank is “nominal” because multiple packages can have identical numbers of downloads. As a result, a package’s nominal rank (but not its rank percentile) will sometimes be affected by its name: ties are sorted alphabetically by package name. Thus, ‘HistData’ benefits from the fact that it appears second in the list (vector) of packages with 51 downloads:

pkg.rank <- packageRank(package = "HistData", date = "2019-01-01")

>
>  dynamicTreeCut        HistData          kimisc  NeuralNetTools
>              51              51              51              51
>   OpenStreetMap       pkgKitten plotlyGeoAssets            spls
>              51              51              51              51
>        webutils            zoom
>              51              51
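
To make the distinction concrete, here is a minimal sketch on made-up data. It is not the package's internal code. The counts are hypothetical, and it assumes that "rank percentile" means the percentage of packages with strictly fewer downloads and that ties are ordered case-insensitively by name, as the listing above suggests.

```r
# Hypothetical download counts for one day; most names and all numbers are made up.
counts <- c(zoom = 51, alpha = 120, HistData = 51, dynamicTreeCut = 51,
            beta = 7, gamma = 300, delta = 7, epsilon = 90)

# Sort by downloads (descending); break ties by lower-cased package name.
ranked <- counts[order(-counts, tolower(names(counts)))]

# Nominal rank: position in the sorted vector, so it depends on the name.
nominal_rank <- match("HistData", names(ranked))

# Rank percentile: share of packages with strictly fewer downloads.
# Every package tied at 51 downloads gets the same value.
rank_percentile <- 100 * mean(counts < counts[["HistData"]])

nominal_rank     # 5 ('dynamicTreeCut' is 4, 'zoom' is 6)
rank_percentile  # 25
```

The three packages tied at 51 downloads occupy nominal ranks 4 through 6 but share a rank percentile of 25.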


To avoid the bottleneck of downloading multiple log files, packageRank() is currently limited to individual days. However, to reduce the need to re-download logs for each function call, ‘packageRank’ uses memoization via the ‘memoise’ package.

### memoization

Here’s the relevant code:

fetchLog <- function(x) data.table::fread(x)

mfetchLog <- memoise::memoise(fetchLog)

if (RCurl::url.exists(url)) {
cran_log <- mfetchLog(url)
}


If you use fetchLog(), the log file, which can be as large as 50 MB, will be downloaded with each function call. If you use mfetchLog(), logs are intelligently cached: logs that have already been downloaded in the current session will not be downloaded again.
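
The caching idea can be illustrated without any downloads. The sketch below is a base-R approximation of memoization, not the ‘memoise’ implementation. The names `slow_fetch`, `memoize1` and `cached_fetch` and the example URLs are hypothetical, and the "download" is a stand-in that only counts how often it runs.

```r
# Hypothetical stand-in for a slow download; counts how often it really runs.
n_fetches <- 0
slow_fetch <- function(url) {
  n_fetches <<- n_fetches + 1
  paste("contents of", url)
}

# Minimal memoizer for one character argument: results are stored in an
# environment keyed by that argument and reused on later calls.
memoize1 <- function(f) {
  cache <- new.env(parent = emptyenv())
  function(key) {
    if (!exists(key, envir = cache, inherits = FALSE)) {
      assign(key, f(key), envir = cache)
    }
    get(key, envir = cache, inherits = FALSE)
  }
}

cached_fetch <- memoize1(slow_fetch)
first  <- cached_fetch("http://example.org/2019-01-01.csv.gz")
second <- cached_fetch("http://example.org/2019-01-01.csv.gz")  # served from cache
other  <- cached_fetch("http://example.org/2019-01-02.csv.gz")  # new key, fetched

n_fetches
# [1] 2
```

Like the session-level cache described above, this cache lives in memory and disappears when the R session ends.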

### visualization (cross-sectional)

To visualize a package’s position in the distribution of a given day’s downloads, use the following:

plot(packageRank(package = "HistData", date = "2019-05-01"),
graphics_pkg = "base")


This cross-sectional view plots a package’s rank (x-axis) against the logarithm of its downloads (y-axis) and highlights the package’s relative position in the overall distribution. In addition, the plot shows the package’s percentile and number of downloads (in red); the location of the 75th, 50th and 25th percentiles (dotted gray vertical lines); and the package with the most downloads (in this case ‘devtools’) along with the total number of downloads (2,982,767) from the CRAN mirror on that day (both in blue).

plot(packageRank(package = c("cholera", "HistData", "regtools"),
date = "2019-05-01"))


### visualization (longitudinal)

To visualize a package’s position in the distribution of downloads over time, use packageRankTime(). Currently, only two time frames, “last-week” and “last-month”, are available.

plot(packageRankTime(package = "HistData", when = "last-month"),
graphics_pkg = "base")


The longitudinal view plots the date (x-axis) against the logarithm of a package’s downloads (y-axis). In the background, we can see the same data, plotted in gray, for a stratified random sample of packages.[2] This sample is used to approximate the temporal pattern of all package downloads.

As above, you can pass a vector of packages:

plot(packageRankTime(package = c("Rcpp", "HistData", "rlang"),
when = "last-month"))


### graphics: base R and ‘ggplot2’

All plots are available as both base R graphics and ‘ggplot2’ figures via the graphics_pkg argument (“base” or “ggplot2”) in the plot() methods.

### installation

To install the development version of ‘packageRank’ from GitHub:

# For 'devtools' (< 2.0.0)
devtools::install_github("lindbrook/packageRank", build_vignettes = TRUE)

# For 'devtools' (>= 2.0.0)
devtools::install_github("lindbrook/packageRank", build_opts = c("--no-resave-data", "--no-manual"))


### notes

1. Because packages with zero downloads are not recorded in the log, there is a censoring problem.

2. Within each 5% interval of rank percentiles (e.g., 0 to 5, 5 to 10, 95 to 100, etc.), a random sample of 5% of packages is selected and tracked over time.
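
The sampling scheme in note 2 can be sketched as follows. This is an illustration on synthetic data, not the package's own code. The data frame `pkgs`, its column names, and the "at least one package per interval" rule are assumptions made for the example.

```r
set.seed(1)  # reproducible sample

# Synthetic rank percentiles for 1,000 hypothetical packages.
pkgs <- data.frame(
  package = paste0("pkg", 1:1000),
  percentile = runif(1000, min = 0, max = 100)
)

# Assign each package to a 5-point percentile interval: [0,5], (5,10], ...
pkgs$stratum <- cut(pkgs$percentile, breaks = seq(0, 100, by = 5),
                    include.lowest = TRUE)

# Within each non-empty interval, draw 5% of its packages (at least one).
sampled <- do.call(rbind, lapply(
  split(pkgs, pkgs$stratum, drop = TRUE),
  function(s) {
    n <- max(1, round(0.05 * nrow(s)))
    s[sample(nrow(s), n), ]
  }
))

# Number of sampled packages per interval.
table(sampled$stratum)
```

Sampling within intervals, rather than from all packages at once, ensures that every part of the download distribution is represented in the gray background series.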