fullmatch()
, reducing memory requirements
for fullmatch()
and heading off certain user errors.makedist(structure.fmla, data, fn = function(trtvar, dat, ...) {
matrix(0, sum(trtvar), sum(!trtvar), dimnames = list(names(trtvar)[trtvar], names(trtvar)[!trtvar]))
}, ...)
data
,
with a treatment variable on the LHS and either structure.fmla
is evaluated
and fn
operates.fn
structure.fmla
. Each of these is a number of treatments by
number of controls matrix of distances, with the distance between
treatments and controls calculated by the user-given function fn
. The list also has some attributes that are not of direct interest to
the user, but are used by fullmatch()
.
fn
should be a function with first two arguments
trtvar
, a treatment variable, and
dat
, a data frame. There may be additional arguments. If the
function uses variables in dat
, these should be referenced
using names from the input trtvar
, particularly if the sample is being
split into strata (ie structure.fmla
has a non-trivial RHS).
When this happens, fn
will be passed a trtvar
input
observations for only a subset of the rows of dat
, so it has to
use trtvar
to decide which rows of dat
to operate on; it
does this by lining up names of the (shorter) vector trtvar
with row names of dat
.fullmatch
, pscore.dist
, mahal.dist
data(nuclearplants)
##-- A distance function used in P. Rosenbaum's (2002) book
rankdiffs <- function(trtvar, dat, vars)
{
dmt <- matrix(0,sum(trtvar), sum(!trtvar))
for (vv in vars) {
vvr <- rank(dat[names(trtvar),vv])
dmt <- dmt + abs(outer(vvr[trtvar], vvr[!trtvar],"-"))
}
round(dmt)
}
##-- Gives a warning because this fn doesn't assign dimnames
(rdd1 <- makedist(pr~1, nuclearplants[nuclearplants$pt==0,], rankdiffs, c("cap","date")))
fullmatch(rdd1)
##-- fullmatch() knows its value should be ordered as the nuclearplants data set is
rdd1$m
##-- now fullmatch() doesn't know the proper order of units and has to guess
fullmatch(rdd1$m)
(rdd2 <- makedist(pr~pt, nuclearplants, rankdiffs, c("cap","date")))
fullmatch(rdd2)
##- Distance on a propensity score
scalardiffs <- function(trtvar,data,scalarname) {
sclr <- data[names(trtvar), scalarname]
names(sclr) <- names(trtvar)
abs(outer(sclr[trtvar],sclr[!trtvar], '-'))
}
nuclearplants$pscore <- glm(pr~.-(pr+cost), family=binomial(),
data=nuclearplants)$linear.predictors
##-- Distance for propensity score matching w/o prior stratification
psd1 <- makedist(pr~1, nuclearplants, scalardiffs, "pscore")
fullmatch(psd1)
##-- Distance for propensity score matching within levels of "pt"
psd2 <- makedist(pr~pt, nuclearplants, scalardiffs, "pscore")
fullmatch(psd2)
Run the code above in your browser using DataLab