compareOverlap: Compare y between newDat and refDat for shared values of x

Description

Compute dy <- (y - yRef) for all cases where x == xRef, where x and y are columns of newDat and xRef and yRef are columns of refDat.

Also compute dyRef <- dy / yRef.

Invisibly return a data.frame with columns x, y, yRef, dy, and dyRef.

Also, if min(yRef)*max(yRef) > 0, i.e., every yRef is nonzero and all have the same sign, plot(dyRef); otherwise plot(dy).
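The comparison described above can be sketched in base R as follows. This is a minimal sketch under stated assumptions, not the Ecfun source. The function name overlapSketch and its doPlot argument are hypothetical, and x, y, xRef, and yRef are assumed to be column names that have already been resolved. Rows are paired with match(), so if refDat[, xRef] contains duplicates only the first match is used. That is an assumption, because this page does not say how duplicates are handled.

```r
# Hypothetical sketch of the comparison; NOT the Ecfun implementation.
# x, y are column names in newDat; xRef, yRef are column names in refDat.
overlapSketch <- function(newDat, refDat, x, y, xRef, yRef,
                          doPlot = TRUE, ...) {
  # Rows of newDat whose x value also occurs in refDat[, xRef]
  keep <- newDat[[x]] %in% refDat[[xRef]]
  xs <- newDat[[x]][keep]
  yNew <- newDat[[y]][keep]
  # Reference value for each retained x (first match if duplicated)
  yR <- refDat[[yRef]][match(xs, refDat[[xRef]])]
  dy <- yNew - yR
  out <- data.frame(x = xs, y = yNew, yRef = yR, dy = dy, dyRef = dy / yR)
  if (doPlot && nrow(out) > 0) {
    # Plot relative differences only when yRef is never zero and never
    # changes sign; otherwise plot the raw differences
    if (min(yR) * max(yR) > 0) plot(out$dyRef, ...) else plot(out$dy, ...)
  }
  invisible(out)
}

# Same data as in the Examples
nDat <- data.frame(yr = 2000:2015, Y = 0:15)
rDat <- data.frame(Yr = 2018:2011, y = c(17:13, 13:11))
res <- overlapSketch(nDat, rDat, "yr", "Y", "Yr", "y", doPlot = FALSE)
res$dy  # 0 0 0 1 1
```

Here doPlot = FALSE only suppresses the plot for illustration. The values in res agree with NRdat in the Examples, although the column names here are plain x, y, yRef rather than the names described under Value.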

Usage

compareOverlap(y=2, yRef=y, x=1, 
      xRef=x, newDat, refDat, 
      ignoreCase=TRUE, ...)

Arguments

y, yRef

columns of newDat and refDat, respectively, to compare, given by number or name. Case is ignored when matching names unless ignoreCase is FALSE.

x, xRef

columns of newDat, refDat, respectively, to match when comparing y with yRef.

As with y and yRef, ignore case in name matching unless ignoreCase is FALSE.

newDat, refDat

data.frames of new and reference data in which to search for overlap, i.e., common values of newDat[, x] and refDat[, xRef]. For those observations, newDat[, y] is compared with refDat[, yRef].

ignoreCase

logical: If TRUE, ignore case when searching for columns of newDat and refDat to match y, yRef, x, and xRef.

...

optional arguments to pass to plot
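The column arguments above can be given as numbers (the defaults y = 2 and x = 1) or as names matched with or without regard to case. The helper below is a hypothetical sketch of that lookup. resolveCol is not an Ecfun function, and its error on a missing or ambiguous name is an assumption of this sketch.

```r
# Hypothetical helper, not part of Ecfun: turn a column number or name into
# the matching column name of dat, optionally ignoring case.
resolveCol <- function(col, dat, ignoreCase = TRUE) {
  nms <- names(dat)
  if (is.numeric(col)) return(nms[col])
  hit <- if (ignoreCase) {
    which(tolower(nms) == tolower(col))
  } else {
    which(nms == col)
  }
  if (length(hit) != 1) stop("no unique column matching '", col, "'")
  nms[hit]
}

nDat <- data.frame(yr = 2000:2015, Y = 0:15)
resolveCol(2, nDat)    # "Y"
resolveCol("y", nDat)  # "Y"
```

With ignoreCase = FALSE, "y" no longer matches "Y" and this sketch stops with an error. What compareOverlap itself does when no column matches is not documented on this page.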

Value

Invisibly return a data.frame of the data compared, with columns x, paste0(y, 'New'), paste0(yRef, 'Ref'), dy, and dyRef.
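To illustrate how these names are formed, the snippet below assumes that the columns have already been resolved to the names used in the Examples (x = "yr" and y = "Y" in newDat, yRef = "y" in refDat). The resulting names match those of NRdat there. Because the value is returned invisibly, assign it or wrap the call in print() to see it.

```r
# Illustration only: resolved column names assumed from the Examples
x <- "yr"; y <- "Y"; yRef <- "y"
resultNames <- c(x, paste0(y, "New"), paste0(yRef, "Ref"), "dy", "dyRef")
resultNames  # "yr" "YNew" "yRef" "dy" "dyRef"
```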

Details

This function is particularly useful for updating datasets obtained from sources like the Bureau of Justice Statistics, which publishes many series, with each update including the most recent 11 years. It can be used to evaluate the extent of equivalence between, e.g., historical data in refDat and the latest data in newDat.

Examples


nDat <- data.frame(yr = 2000:2015, Y = 0:15)
rDat <- data.frame(Yr = 2018:2011,
                   y = c(17:13, 13:11))
nrDat <- compareOverlap(newDat = nDat, refDat = rDat)

# Correct answer
NRdat <- data.frame(yr = 2011:2015,
  YNew = 11:15, yRef = c(11:13, 13:14),
  dy = c(0, 0, 0, 1, 1),
  dyRef = c(0, 0, 0, 1, 1) / c(11:13, 13:14))

all.equal(nrDat, NRdat)