packageRank v0.1.0

Computation and Visualization of Package Download Counts and Percentiles

Compute and visualize the cross-sectional and longitudinal number and rank percentile of package downloads from RStudio's CRAN mirror.

Readme

packageRank: compute and visualize package download counts and percentiles

features

  • provide S3 plot methods for ‘cranlogs’ output.
  • compute the rank percentile and nominal rank of a package’s downloads from RStudio’s CRAN mirror.
  • visualize a package’s position in the distribution of package download counts for a given day (cross-sectionally) or over time (longitudinally).

NOTE: ‘packageRank’ relies on an active internet connection.

background

The ‘cranlogs’ package computes the number of package downloads from RStudio’s CRAN mirror. For example, we can see that the ‘HistData’ package was downloaded 51 times on the first day of 2019:

cranlogs::cran_downloads(packages = "HistData", from = "2019-01-01",
  to = "2019-01-01")
>         date count  package
> 1 2019-01-01    51 HistData

And 787 times in the first week:

cranlogs::cran_downloads(packages = "HistData", from = "2019-01-01",
  to = "2019-01-07")
>         date count  package
> 1 2019-01-01    51 HistData
> 2 2019-01-02   100 HistData
> 3 2019-01-03   137 HistData
> 4 2019-01-04   113 HistData
> 5 2019-01-05    85 HistData
> 6 2019-01-06    96 HistData
> 7 2019-01-07   205 HistData
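Because the output is an ordinary data frame, figures like the weekly total are easy to check. The sketch below rebuilds the seven printed rows by hand so that it runs offline; with a live connection you would work with the result of cranlogs::cran_downloads() directly.

```r
# The seven rows printed above, rebuilt by hand so the example runs offline.
histdata <- data.frame(
  date = seq(as.Date("2019-01-01"), by = "day", length.out = 7),
  count = c(51, 100, 137, 113, 85, 96, 205),
  package = "HistData"
)

# Total downloads for the week and the busiest day.
weekly_total <- sum(histdata$count)
busiest_day <- histdata$date[which.max(histdata$count)]

weekly_total  # 787
busiest_day   # 2019-01-07
```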

In both cases, the “compared to what?” question lurks in the background. Is 51 downloads large or small? Is the pattern during that week typical or unusual? To help answer these questions, ‘packageRank’ tries to provide some perspective on package download counts.

visualizing ‘cranlogs’

To visualize output from cranlogs::cran_downloads(), ‘packageRank’ provides easy-to-use S3 methods for the generic plot(). All you need to do is use cran_downloads2() in place of cran_downloads():

plot(cran_downloads2(package = c("data.table", "Rcpp", "rlang"),
  from = "2019-01-01", to = "2019-01-01"), graphics_pkg = "base")

plot(cran_downloads2(package = c("data.table", "Rcpp", "rlang"),
  when = "last-month"))

plot(cran_downloads2(package = c("data.table", "Rcpp", "rlang"),
  from = "2019-01-01", to = "2019-01-31"))
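These calls work through S3 dispatch: plot() looks at the class of its argument and calls the matching method (the function index below lists plot.cranlogs() as the plot method for cran_downloads2() output). The sketch shows the mechanism with a hypothetical class, cranlogs_sketch, and a hypothetical constructor, as_cranlogs_sketch(); it is not packageRank’s implementation, and the data are the ‘HistData’ counts printed earlier.

```r
# Hypothetical constructor: wrap cranlogs-style output in an S3 class.
as_cranlogs_sketch <- function(df) {
  structure(list(data = df), class = "cranlogs_sketch")
}

# plot() dispatches here for objects of class "cranlogs_sketch".
plot.cranlogs_sketch <- function(x, ...) {
  df <- x$data
  plot(df$date, df$count, type = "o", xlab = "Date", ylab = "Downloads",
    main = unique(df$package), ...)
  invisible(x)
}

histdata <- data.frame(
  date = seq(as.Date("2019-01-01"), by = "day", length.out = 7),
  count = c(51, 100, 137, 113, 85, 96, 205),
  package = "HistData"
)

obj <- as_cranlogs_sketch(histdata)
plot(obj)  # calls plot.cranlogs_sketch()
```

Defining the method is all it takes: no changes to plot() itself are needed, which is why swapping in a function that returns a classed object is enough to get custom plots.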

compute percentiles and ranks

To compute a package’s rank percentile and nominal rank, use packageRank():

cran_downloads2(package = "HistData", from = "2019-01-01", to = "2019-01-01")
>         date count  package
> 1 2019-01-01    51 HistData

packageRank(package = "HistData", date = "2019-01-01")
>         date  package downloads percentile          rank
> 1 2019-01-01 HistData        51       93.4 920 of 14,020

Doing so, we see two numbers beyond the raw count. First, 51 downloads places ‘HistData’ in the 93rd percentile. This statistic, familiar to anyone who has taken a standardized exam, tells us that 93% of packages had fewer downloads than ‘HistData’.[1] Second, 51 downloads “nominally” puts ‘HistData’ in 920th place out of the 14,020 packages downloaded that day.
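Following the definition in the text, the rank percentile is the percentage of packages with strictly fewer downloads. The sketch below applies that definition to made-up counts with a hypothetical helper, rank_percentile(); it illustrates the idea and is not necessarily how packageRank() computes it internally.

```r
# Made-up download counts for six hypothetical packages on one day.
counts <- c(alpha = 120, beta = 51, Gamma = 51, delta = 30, eps = 8, zeta = 51)

# Rank percentile: percentage of packages with strictly fewer downloads.
rank_percentile <- function(counts, pkg) {
  round(100 * mean(counts < counts[[pkg]]), 1)
}

rank_percentile(counts, "beta")   # 33.3 (2 of 6 packages have fewer)
rank_percentile(counts, "zeta")   # 33.3 (ties share a percentile)
rank_percentile(counts, "alpha")  # 83.3 (5 of 6)
```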

The rank is “nominal” because multiple packages can have identical numbers of downloads. As a result, a package’s nominal rank (but not its rank percentile) can be affected by its name: ties are sorted alphabetically by package name. Thus, ‘HistData’ benefits from appearing second in the list (vector) of packages with 51 downloads:

pkg.rank <- packageRank(package = "HistData", date = "2019-01-01")
downloads <- pkg.rank$crosstab

downloads[downloads == 51]
>
>  dynamicTreeCut        HistData          kimisc  NeuralNetTools
>              51              51              51              51
>   OpenStreetMap       pkgKitten plotlyGeoAssets            spls
>              51              51              51              51
>        webutils            zoom
>              51              51
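Nominal rank, by contrast, depends on how ties are ordered. The sketch below sorts by descending count and breaks ties case-insensitively by name, which matches the order of the tied packages listed above; the helper nominal_rank() and the counts are hypothetical. The last lines check that the printed figures agree with the “strictly fewer” definition of the percentile: if ‘HistData’ is 920th and second among ten packages with 51 downloads, then 918 packages had more downloads and 14,020 - 918 - 10 = 13,092 had fewer.

```r
counts <- c(alpha = 120, beta = 51, Gamma = 51, delta = 30, eps = 8, zeta = 51)

# Nominal rank: sort by descending count, breaking ties alphabetically
# (case-insensitively) by package name.
nominal_rank <- function(counts, pkg) {
  ord <- order(-counts, tolower(names(counts)))
  which(names(counts)[ord] == pkg)
}

nominal_rank(counts, "beta")   # 2
nominal_rank(counts, "Gamma")  # 3
nominal_rank(counts, "zeta")   # 4: same count as "beta", later name

# Consistency check using the HistData figures printed above.
n_total <- 14020
n_more <- 920 - 2                  # HistData is 2nd of the packages tied at 51
n_fewer <- n_total - n_more - 10   # 10 packages had exactly 51 downloads
round(100 * n_fewer / n_total, 1)  # 93.4
```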

To avoid the bottleneck of downloading multiple log files, packageRank() is currently limited to individual days. To avoid re-downloading a log with each function call, ‘packageRank’ also makes use of memoization via the ‘memoise’ package.

memoization

Here’s the relevant code:

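# Read a log file into a data.table.
fetchLog <- function(x) data.table::fread(x)

# Memoized version: repeated calls with the same argument reuse the cached result.
mfetchLog <- memoise::memoise(fetchLog)

# `url` holds the address of the log file for the requested day.
if (RCurl::url.exists(url)) {
  cran_log <- mfetchLog(url)
}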

If you use fetchLog(), the log file, which can be as large as 50 MB, is downloaded on every function call. If you use mfetchLog(), logs are cached: a log that has already been downloaded in the current session will not be downloaded again.
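To see what caching buys without touching the network, here is a hand-rolled memoizer built on an environment. It is a simplified stand-in for what memoise::memoise() does, not that package’s implementation; fake_fetch() is a hypothetical substitute for fetchLog() that counts how often it actually runs, and the URL is a placeholder.

```r
# Minimal memoizer: cache results in an environment keyed by the argument.
memoize <- function(f) {
  cache <- new.env(parent = emptyenv())
  function(x) {
    key <- as.character(x)
    if (!exists(key, envir = cache, inherits = FALSE)) {
      assign(key, f(x), envir = cache)  # first call: compute and store
    }
    get(key, envir = cache, inherits = FALSE)  # later calls: reuse
  }
}

# Hypothetical stand-in for fetchLog(): records each "download".
n_fetches <- 0
fake_fetch <- function(url) {
  n_fetches <<- n_fetches + 1
  paste("contents of", url)
}

mfake_fetch <- memoize(fake_fetch)
url <- "https://example.com/2019-01-01.csv.gz"  # placeholder URL

mfake_fetch(url)
mfake_fetch(url)
n_fetches  # 1: the second call was served from the cache
```

Like the session-level caching described above, this cache lives only as long as the R session (or the closure) does.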

visualization (cross-sectional)

To visualize a package’s position in the distribution of a given day’s downloads, use the following:

plot(packageRank(package = "HistData", date = "2019-05-01"),
  graphics_pkg = "base")

This cross-sectional view plots a package’s rank (x-axis) against the logarithm of its downloads (y-axis) and highlights the package’s position in the overall distribution. It also shows the package’s percentile and number of downloads (in red); the locations of the 75th, 50th, and 25th percentiles (dotted gray vertical lines); and, in blue, the package with the most downloads that day (here, ‘devtools’) and the total number of downloads from the CRAN mirror (2,982,767).
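The layout described above can be approximated in a few lines of base R. This is a sketch on simulated counts, not packageRank’s plotting code; in particular, the hypothetical helper pct_rank() places the p-th percentile line at rank `n * (1 - p / 100)`, which assumes no ties, and the highlighted position is arbitrary.

```r
# Simulated (not real) download counts for 1,000 packages, sorted so that
# rank 1 is the most downloaded package. Adding 1 avoids zeros on a log axis.
set.seed(2019)
counts <- sort(round(rlnorm(1000, meanlog = 2, sdlog = 1.5)) + 1,
  decreasing = TRUE)
n <- length(counts)
ranks <- seq_len(n)

# Rank at which the p-th percentile falls, ignoring ties.
pct_rank <- function(p, n) round(n * (1 - p / 100))
pct_lines <- pct_rank(c(75, 50, 25), n)

plot(ranks, counts, type = "l", log = "y",
  xlab = "Rank", ylab = "Downloads (log scale)")
abline(v = pct_lines, lty = "dotted", col = "gray")

# Highlight one (arbitrary) package in red, labeled with its download count.
pkg <- 120
points(pkg, counts[pkg], col = "red", pch = 16)
text(pkg, counts[pkg], labels = counts[pkg], col = "red", pos = 4)
```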

Just like cranlogs::cran_downloads(), you can also pass a vector of packages:

plot(packageRank(package = c("cholera", "HistData", "regtools"),
  date = "2019-05-01"))

visualization (longitudinal)

To visualize a package’s position in the distribution of downloads over time, use packageRankTime(). Currently, only two time frames are available: “last-week” and “last-month”.

plot(packageRankTime(package = "HistData", when = "last-month"),
  graphics_pkg = "base")

The longitudinal view plots the date (x-axis) against the logarithm of a package’s downloads (y-axis). In the background, we can see the same data, plotted in gray, for a stratified random sample of packages.[2] This sample is used to approximate the temporal pattern of all package downloads.
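A base R sketch of this layout, again on simulated data: gray lines stand in for the sampled background packages and a red line for the package of interest. The numbers of packages and days, and the use of Poisson draws, are arbitrary assumptions for illustration, not packageRank’s code.

```r
# Simulated (not real) data: 30 days of counts for 20 background packages
# and one package of interest. Adding 1 avoids zeros on the log scale.
set.seed(7)
dates <- seq(as.Date("2019-04-01"), by = "day", length.out = 30)
background <- sapply(1:20, function(i) rpois(30, lambda = 10^runif(1, 0, 4)) + 1)
target <- rpois(30, lambda = 80) + 1

# Empty log-scale frame, then gray background lines and the red target line.
plot(range(dates), range(background, target), type = "n", log = "y",
  xlab = "Date", ylab = "Downloads (log scale)")
for (j in seq_len(ncol(background))) {
  lines(dates, background[, j], col = "gray")
}
lines(dates, target, col = "red", lwd = 2)
```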

As above, you can pass a vector of packages:

plot(packageRankTime(package = c("Rcpp", "HistData", "rlang"),
  when = "last-month"))

graphics: base R and ‘ggplot2’

All plots are available both as base R graphics and as ‘ggplot2’ figures, selected via the graphics_pkg argument (“base” or “ggplot2”) of the plot() methods.
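One common way to implement a switch like graphics_pkg is match.arg(), which validates the value against a fixed set of choices and accepts abbreviations. The function below, choose_backend(), is a hypothetical sketch of that pattern, not packageRank’s code; it returns a description of the chosen back end instead of drawing, and the default order of the choices is an assumption.

```r
# Hypothetical sketch: pick a graphics back end from a fixed set of options.
choose_backend <- function(graphics_pkg = c("ggplot2", "base")) {
  graphics_pkg <- match.arg(graphics_pkg)  # errors on any other value
  switch(graphics_pkg,
    base = "draw with base graphics",
    ggplot2 = "build and print a ggplot object"
  )
}

choose_backend("base")
choose_backend()                # no argument: first choice is used
try(choose_backend("lattice"))  # error: value not among the choices
```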

installation

To install the development version of ‘packageRank’ from GitHub:

# For 'devtools' (< 2.0.0)
devtools::install_github("lindbrook/packageRank", build_vignettes = TRUE)

# For 'devtools' (>= 2.0.0)
devtools::install_github("lindbrook/packageRank", build_opts = c("--no-resave-data", "--no-manual"))

Notes

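  1. Because packages with zero downloads are not recorded in the log, there is a censoring problem: packages that were not downloaded on a given day are missing from the distribution used to compute percentiles and ranks.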

  2. Within each 5% interval of rank percentiles (e.g., 0 to 5, 5 to 10, 95 to 100, etc.), a random sample of 5% of packages is selected and tracked over time.
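The sampling scheme in note 2 can be sketched as follows: bin packages into twenty 5% intervals of rank percentile, then draw 5% of the packages within each interval. The data are simulated, the package names are hypothetical, and the code is an illustration rather than packageRank’s implementation.

```r
# Simulated (not real) rank percentiles for 2,000 hypothetical packages.
set.seed(42)
pkgs <- data.frame(
  package = sprintf("pkg%04d", 1:2000),
  percentile = runif(2000, min = 0, max = 100),
  stringsAsFactors = FALSE
)

# Assign each package to a 5% interval: [0, 5), [5, 10), ..., [95, 100].
pkgs$bin <- cut(pkgs$percentile, breaks = seq(0, 100, by = 5),
  right = FALSE, include.lowest = TRUE)

# Within each interval, draw 5% of its packages (at least one per interval).
sampled <- unlist(lapply(split(pkgs$package, pkgs$bin), function(p) {
  if (length(p) == 0) return(character(0))
  sample(p, max(1, round(0.05 * length(p))))
}), use.names = FALSE)

length(sampled)
```

Sampling within intervals, rather than from all packages at once, guarantees that both rarely and heavily downloaded packages appear in the background sample.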

Functions in packageRank

Name Description
plot.package_rank_time Plot method for packageRankTime().
print.cranlogs Print method for cran_downloads2().
fetchLog Fetch package logs.
summary.package_rank_time Summary method for packageRankTime().
packageRank Package download counts and rank percentiles (cross-sectional).
print.package_rank Print method for packageRank().
plot.package_rank Plot method for packageRank().
cran_downloads2 Daily package downloads from the RStudio CRAN mirror.
plot.cranlogs Plot method for cran_downloads2().
packageRankTime Package download counts and rank percentiles (longitudinal).
summary.cranlogs Summary method for cran_downloads2().
print.package_rank_time Print method for packageRankTime().
summary.package_rank Summary method for packageRank().

Vignettes of packageRank

Name
introduction.Rmd

Details

Type Package
Date 2019-05-13
URL https://github.com/lindbrook/packageRank
BugReports https://github.com/lindbrook/packageRank/issues
License GPL (>= 2)
Encoding UTF-8
Language en-US
LazyData true
RoxygenNote 6.1.1
VignetteBuilder knitr
NeedsCompilation no
Packaged 2019-05-13 16:43:53 UTC; peter
Repository CRAN
Date/Publication 2019-05-16 07:40:03 UTC
