# packageRank v0.1.0

## Computation and Visualization of Package Download Counts and Percentiles

Compute and visualize the cross-sectional and longitudinal number
and rank percentile of package downloads from RStudio's CRAN mirror.

## Readme

## packageRank: compute and visualize package download counts and percentiles

### features

- provide S3 plot methods for ‘cranlogs’ output.
- compute the rank percentile and nominal rank of a package’s downloads from RStudio’s CRAN mirror.
- visualize a package’s position in the distribution of package download counts for a given day (cross-sectionally) or over time (longitudinally).

NOTE: ‘packageRank’ relies on an active internet connection.

### background

The ‘cranlogs’ package computes the number of downloads using RStudio’s CRAN mirror. For example, we can see that the ‘HistData’ package was downloaded 51 times on the first day of 2019:

```
cranlogs::cran_downloads(packages = "HistData", from = "2019-01-01",
  to = "2019-01-01")
#>         date count  package
#> 1 2019-01-01    51 HistData
```

And 787 times in the first week:

```
cranlogs::cran_downloads(packages = "HistData", from = "2019-01-01",
  to = "2019-01-07")
#>         date count  package
#> 1 2019-01-01    51 HistData
#> 2 2019-01-02   100 HistData
#> 3 2019-01-03   137 HistData
#> 4 2019-01-04   113 HistData
#> 5 2019-01-05    85 HistData
#> 6 2019-01-06    96 HistData
#> 7 2019-01-07   205 HistData
```
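The weekly total can be checked directly from the daily counts. This is a minimal sketch that re-enters the numbers from the output above by hand instead of querying the mirror; the `week` data frame is only an illustration:

```r
# Daily counts for 'HistData', copied from the output above.
week <- data.frame(
  date = as.Date("2019-01-01") + 0:6,
  count = c(51, 100, 137, 113, 85, 96, 205)
)

# Total downloads over the seven days.
weekly_total <- sum(week$count)
weekly_total  # 787
```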

In both cases, the “compared to what?” question lurks in the background. Is 51 downloads large or small? Is the pattern during that week typical or unusual? To help answer these questions, ‘packageRank’ tries to provide some perspective on package download counts.

### visualizing ‘cranlogs’

To visualize output from `cranlogs::cran_downloads()`, ‘packageRank’
provides easy-to-use S3 plot() methods. All you need to do is
use `cran_downloads2()` in place of `cran_downloads()`:

```
plot(cran_downloads2(package = c("data.table", "Rcpp", "rlang"),
  from = "2019-01-01", to = "2019-01-01"), graphics_pkg = "base")
```

```
plot(cran_downloads2(package = c("data.table", "Rcpp", "rlang"),
  when = "last-month"))
```

```
plot(cran_downloads2(package = c("data.table", "Rcpp", "rlang"),
  from = "2019-01-01", to = "2019-01-31"))
```

### compute percentiles and ranks

To compute a package’s rank percentile and nominal rank, use `packageRank()`:

```
cran_downloads2(package = "HistData", from = "2019-01-01", to = "2019-01-01")
#>         date count  package
#> 1 2019-01-01    51 HistData

packageRank(package = "HistData", date = "2019-01-01")
#>         date  package downloads percentile          rank
#> 1 2019-01-01 HistData        51       93.4 920 of 14,020
```

Doing so reveals two numbers beyond the raw count. First, 51 downloads places ‘HistData’ in the 93rd percentile. This statistic, familiar to anyone who has taken a standardized exam, tells us that 93% of packages had fewer downloads than ‘HistData’.[1] Second, 51 downloads “nominally” puts ‘HistData’ in 920th place among the 14,020 packages downloaded.

The rank is “nominal” because multiple packages can have identical numbers of downloads. As a result, a package’s nominal rank (but not its rank percentile) will sometimes be affected by its name: ties are broken alphabetically by package name. Thus, ‘HistData’ benefits from the fact that it appears second in the list (vector) of packages with 51 downloads:

```
pkg.rank <- packageRank(package = "HistData", date = "2019-01-01")
downloads <- pkg.rank$crosstab
downloads[downloads == 51]
#>
#>  dynamicTreeCut        HistData          kimisc  NeuralNetTools
#>              51              51              51              51
#>   OpenStreetMap       pkgKitten plotlyGeoAssets            spls
#>              51              51              51              51
#>        webutils            zoom
#>              51              51
```
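To make the two statistics concrete, here is a minimal sketch on toy data. It assumes the definitions given in the text (percentile as the share of packages with strictly fewer downloads; nominal rank as descending order of downloads with ties broken alphabetically, ignoring case). The helper names `rank_percentile()` and `nominal_rank()` and the counts are hypothetical and are not part of ‘packageRank’:

```r
# Toy download counts for one day (hypothetical numbers, not real log data).
counts <- c(alpha = 500, kimisc = 51, HistData = 51, dynamicTreeCut = 51,
            beta = 20, gamma = 7, delta = 3, zeta = 1)

# Rank percentile: percentage of packages with strictly fewer downloads.
rank_percentile <- function(counts, pkg) {
  round(100 * mean(counts < counts[[pkg]]), 1)
}

# Nominal rank: sort by downloads (descending) and break ties
# alphabetically, ignoring case. method = "radix" keeps the ordering
# independent of the session's locale.
nominal_rank <- function(counts, pkg) {
  ord <- order(-counts, tolower(names(counts)), method = "radix")
  match(pkg, names(counts)[ord])
}

rank_percentile(counts, "HistData")  # 50: four of eight packages have fewer
rank_percentile(counts, "kimisc")    # 50: same count, same percentile
nominal_rank(counts, "HistData")     # 3: behind alpha and dynamicTreeCut
nominal_rank(counts, "kimisc")       # 4: same count, later in the alphabet
```

The three packages tied at 51 share one percentile but occupy ranks 2 through 4, which is why the rank is only "nominal".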

To avoid the bottleneck of downloading multiple log files, `packageRank()`
is currently limited to individual days. However, to reduce the need to
re-download logs for each function call, ‘packageRank’ makes use of
memoization via the ‘memoise’ package.

### memoization

Here’s the relevant code:

```
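# Read a log file from a URL into a data.table.
fetchLog <- function(x) data.table::fread(x)
# Memoised version: repeated calls with the same URL reuse the cached result.
mfetchLog <- memoise::memoise(fetchLog)
# `url` holds the address of the day's log file and is defined elsewhere.
if (RCurl::url.exists(url)) {
  cran_log <- mfetchLog(url)
}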
```

If you use `fetchLog()`, the log file, which can be as large as 50 MB,
will be downloaded with each function call. If you use `mfetchLog()`,
logs are cached: logs that have already been downloaded in
the current session will not be downloaded again.
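The caching idea can be illustrated without a network connection. The sketch below is a hand-rolled memoizer in base R, not the ‘memoise’ implementation; `memoize()`, `slow_fetch()` and the log keys are hypothetical stand-ins for the real download:

```r
# Wrap a one-argument function so that results are cached by argument value.
memoize <- function(f) {
  cache <- new.env(parent = emptyenv())
  function(x) {
    key <- as.character(x)
    if (!exists(key, envir = cache, inherits = FALSE)) {
      assign(key, f(x), envir = cache)  # first call: compute and store
    }
    get(key, envir = cache, inherits = FALSE)
  }
}

# Stand-in for an expensive download; counts how often it actually runs.
n_calls <- 0
slow_fetch <- function(url) {
  n_calls <<- n_calls + 1
  paste("contents of", url)
}

m_fetch <- memoize(slow_fetch)
first  <- m_fetch("log-2019-01-01")  # runs slow_fetch()
second <- m_fetch("log-2019-01-01")  # served from the cache
third  <- m_fetch("log-2019-01-02")  # new key, runs slow_fetch() again
n_calls  # 2
```

Because the cache lives in memory, it lasts only for the current R session, which matches the behavior described above.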

### visualization (cross-sectional)

To visualize a package’s position in the distribution of a given day’s downloads, use the following:

```
plot(packageRank(package = "HistData", date = "2019-05-01"),
  graphics_pkg = "base")
```

This cross-sectional view plots a package’s rank (x-axis) against the logarithm of its downloads (y-axis) and highlights the package’s relative position in the overall distribution. In addition, it illustrates its percentile and its number of downloads (in red); the location of the 75th, 50th and 25th percentiles (dotted gray vertical lines); the package with the most downloads (in this case ‘devtools’) and the total number of downloads (2,982,767) from the CRAN mirror on that day (both in blue).

Just like cranlogs::cran_downloads(), you can also pass a vector of packages:

```
plot(packageRank(package = c("cholera", "HistData", "regtools"),
  date = "2019-05-01"))
```

### visualization (longitudinal)

To visualize a package’s position in the distribution of downloads over
time, use `packageRankTime()`. Currently, only two time frames,
“last-week” and “last-month”, are available.

```
plot(packageRankTime(package = "HistData", when = "last-month"),
  graphics_pkg = "base")
```

The longitudinal view plots the date (x-axis) against the logarithm of a package’s downloads (y-axis). In the background, we can see the same data, plotted in gray, for a stratified random sample of packages.[2] This sample is used to approximate the temporal pattern of all package downloads.

As above, you can pass a vector of packages:

```
plot(packageRankTime(package = c("Rcpp", "HistData", "rlang"),
  when = "last-month"))
```

### graphics: base R and ‘ggplot2’

All plots are available as both base R graphics and ‘ggplot2’ figures via the `graphics_pkg` argument (“base” or “ggplot2”) in the plot() methods.

### installation

To install the development version of ‘packageRank’ from GitHub:

```
# For 'devtools' (< 2.0.0)
devtools::install_github("lindbrook/packageRank", build_vignettes = TRUE)
# For 'devtools' (>= 2.0.0)
devtools::install_github("lindbrook/packageRank", build_opts = c("--no-resave-data", "--no-manual"))
```

### Notes

[1] Because packages with zero downloads are not recorded in the log, there is a censoring problem.

[2] Within each 5% interval of rank percentiles (e.g., 0 to 5, 5 to 10, 95 to 100, etc.), a random sample of 5% of packages is selected and tracked over time.
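The stratified sampling described in the second note can be sketched as follows. This is an illustration on simulated percentiles, not the package's own code; the `pkgs` data frame, the seed, and the "at least one package per interval" rule are assumptions made for the example:

```r
set.seed(1)  # make the random draws reproducible

# Hypothetical rank percentiles for 2,000 packages.
pkgs <- data.frame(package = paste0("pkg", 1:2000),
                   percentile = runif(2000, min = 0, max = 100),
                   stringsAsFactors = FALSE)

# Assign each package to a 5-point interval: [0,5), [5,10), ..., [95,100].
pkgs$stratum <- cut(pkgs$percentile, breaks = seq(0, 100, by = 5),
                    include.lowest = TRUE, right = FALSE)

# Within each interval, draw 5% of its packages (assumed minimum of one).
sampled <- do.call(rbind, lapply(split(pkgs, pkgs$stratum), function(s) {
  if (nrow(s) == 0) return(s)  # nothing to sample from an empty interval
  n <- max(1, round(0.05 * nrow(s)))
  s[sample(nrow(s), n), ]
}))

# Number of sampled packages per interval.
table(sampled$stratum)
```

Sampling within intervals, not from the whole population at once, ensures that packages from every part of the distribution, including the sparse extremes, appear in the background of the longitudinal plot.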

## Functions in packageRank

| Name | Description |
| --- | --- |
| plot.package_rank_time | Plot method for timeSeriesRank(). |
| print.cranlogs | Print method for packageRank(). |
| fetchLog | Fetch Package Logs. |
| summary.package_rank_time | Summary method for timeSeriesRank(). |
| packageRank | Package download counts and rank percentiles (cross-sectional). |
| print.package_rank | Print method for packageRank(). |
| plot.package_rank | Plot method for packageRank(). |
| cran_downloads2 | Daily package downloads from the RStudio CRAN mirror. |
| plot.cranlogs | Plot method for cran_downloads2(). |
| packageRankTime | Package download counts and rank percentiles (longitudinal). |
| summary.cranlogs | Summary method for packageRank(). |
| print.package_rank_time | Print method for timeSeriesRank(). |
| summary.package_rank | Summary method for packageRank(). |

## Vignettes of packageRank

| Name |
| --- |
| introduction.Rmd |

## Details

| Field | Value |
| --- | --- |
| Type | Package |
| Date | 2019-05-13 |
| URL | https://github.com/lindbrook/packageRank |
| BugReports | https://github.com/lindbrook/packageRank/issues |
| License | GPL (>= 2) |
| Encoding | UTF-8 |
| Language | en-US |
| LazyData | true |
| RoxygenNote | 6.1.1 |
| VignetteBuilder | knitr |
| NeedsCompilation | no |
| Packaged | 2019-05-13 16:43:53 UTC; peter |
| Repository | CRAN |
| Date/Publication | 2019-05-16 07:40:03 UTC |
| Imports | cranlogs, data.table, ggplot2, memoise, parallel, RCurl |
| Suggests | knitr, rmarkdown |
| Depends | R (>= 3.4) |
