ivmte: Instrumental Variables: Extrapolation by Marginal Treatment Effects

Description

This function provides a general framework for using the marginal treatment effect (MTE) to extrapolate. The model is the same binary treatment instrumental variable (IV) model considered by Imbens and Angrist (1994) (10.2307/2951620) and Heckman and Vytlacil (2005) (10.1111/j.1468-0262.2005.00594.x). The framework on which this function is based was developed by Mogstad, Santos and Torgovitsky (2018) (10.3982/ECTA15463). See also the recent survey paper on extrapolation in IV models by Mogstad and Torgovitsky (2018) (10.1146/annurev-economics-101617-041813). A detailed description of the module and its features can be found in Shea and Torgovitsky (2021).

Usage

ivmte(
  data,
  target,
  late.from,
  late.to,
  late.X,
  genlate.lb,
  genlate.ub,
  target.weight0 = NULL,
  target.weight1 = NULL,
  target.knots0 = NULL,
  target.knots1 = NULL,
  m0,
  m1,
  uname = u,
  m1.ub,
  m0.ub,
  m1.lb,
  m0.lb,
  mte.ub,
  mte.lb,
  m0.dec,
  m0.inc,
  m1.dec,
  m1.inc,
  mte.dec,
  mte.inc,
  equal.coef,
  ivlike,
  components,
  subset,
  propensity,
  link = "logit",
  treat,
  outcome,
  solver,
  solver.options,
  solver.presolve,
  solver.options.criterion,
  solver.options.bounds,
  lpsolver,
  lpsolver.options,
  lpsolver.presolve,
  lpsolver.options.criterion,
  lpsolver.options.bounds,
  criterion.tol = 1e-04,
  initgrid.nx = 20,
  initgrid.nu = 20,
  audit.nx = 2500,
  audit.nu = 25,
  audit.add = 100,
  audit.max = 25,
  audit.tol,
  rescale,
  point,
  point.eyeweight = FALSE,
  bootstraps = 0,
  bootstraps.m,
  bootstraps.replace = TRUE,
  levels = c(0.99, 0.95, 0.9),
  ci.type = "backward",
  specification.test = TRUE,
  noisy = FALSE,
  smallreturnlist = FALSE,
  debug = FALSE
)

Arguments

data

data.frame or data.table used to estimate the treatment effects.

target

character, target parameter to be estimated. The function allows for ATE ('ate'), ATT ('att'), ATU ('atu'), LATE ('late'), and generalized LATE ('genlate').

late.from

a named vector or a list declaring the baseline values of Z used to define the LATE. The name associated with each value should be the name of the corresponding variable.

late.to

a named vector or a list declaring the comparison set of values of Z used to define the LATE. The name associated with each value should be the name of the corresponding variable.

late.X

a named vector or a list declaring the values to condition on. The name associated with each value should be the name of the corresponding variable.

genlate.lb

lower bound value of unobservable u for estimating the generalized LATE.

genlate.ub

upper bound value of unobservable u for estimating the generalized LATE.

target.weight0

user-defined weight function for the control group defining the target parameter. A list of functions can be submitted if the weighting function is in fact a spline. The arguments of the function should be variable names in data. If the weight is constant across all observations, then the user can submit the value of the weight instead of a function.

target.weight1

user-defined weight function for the treated group defining the target parameter. See target.weight0 for details.

target.knots0

user-defined set of functions defining the knots associated with spline weights for the control group. The arguments of the function should consist only of variable names in data. If the knots are constant across all observations, then the user can submit the vector of knots instead of a function.

target.knots1

user-defined set of functions defining the knots associated with spline weights for the treated group. See target.knots0 for details.

m0

one-sided formula for the marginal treatment response function for the control group. Splines may also be incorporated using the expression uSpline, e.g. uSpline(degree = 2, knots = c(0.4, 0.8), intercept = TRUE). The intercept argument may be omitted, and is set to TRUE by default.
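As a minimal sketch of what a one-sided MTR formula looks like as an R object, the base R snippet below builds two such formulas and inspects them. The object names m0.simple and m0.spline are hypothetical; uSpline is only referenced inside the formula and is never evaluated here.

```r
# One-sided formulas: nothing appears to the left of the tilde.
m0.simple <- ~ u + I(u^2)

# A formula with a spline term. Creating the formula does not evaluate
# uSpline(), so this line runs in base R without the package attached.
m0.spline <- ~ u + uSpline(degree = 2, knots = c(0.4, 0.8))

# A one-sided formula has length 2 (the tilde and the right-hand side).
length(m0.spline)

# Term labels of the polynomial specification.
simple.terms <- attr(terms(m0.simple), "term.labels")
simple.terms
```

The variable u in these formulas is the name of the unobservable, which is what the uname argument declares.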

m1

one-sided formula for the marginal treatment response function for the treated group. See m0 for details.

uname

variable name for the unobservable used in declaring the MTRs. The name can be provided with or without quotation marks.

m1.ub

numeric value for upper bound on MTR for the treated group. By default, this will be set to the largest value of the observed outcome in the estimation sample.

m0.ub

numeric value for upper bound on MTR for the control group. By default, this will be set to the largest value of the observed outcome in the estimation sample.

m1.lb

numeric value for lower bound on MTR for the treated group. By default, this will be set to the smallest value of the observed outcome in the estimation sample.

m0.lb

numeric value for lower bound on MTR for the control group. By default, this will be set to the smallest value of the observed outcome in the estimation sample.

mte.ub

numeric value for upper bound on treatment effect parameter of interest.

mte.lb

numeric value for lower bound on treatment effect parameter of interest.

m0.dec

logical, set to FALSE by default. Set equal to TRUE if the MTR for the control group should be weakly monotone decreasing.

m0.inc

logical, set to FALSE by default. Set equal to TRUE if the MTR for the control group should be weakly monotone increasing.

m1.dec

logical, set to FALSE by default. Set equal to TRUE if the MTR for the treated group should be weakly monotone decreasing.

m1.inc

logical, set to FALSE by default. Set equal to TRUE if the MTR for the treated group should be weakly monotone increasing.

mte.dec

logical, set to FALSE by default. Set equal to TRUE if the MTE should be weakly monotone decreasing.

mte.inc

logical, set to FALSE by default. Set equal to TRUE if the MTE should be weakly monotone increasing.

equal.coef

one-sided formula to indicate which terms in m0 and m1 should be constrained to have the same coefficients. These terms therefore have no effect on the MTE.

ivlike

formula or vector of formulas specifying the regressions for the IV-like estimands. Which coefficients to use to define the constraints determining the treatment effect bounds (alternatively, the moments determining the treatment effect point estimate) can be selected in the argument components. If no argument is passed, then a linear regression will be performed to estimate the MTR coefficients.

components

a list of vectors of the terms in the regression specifications to include in the set of IV-like estimands. No terms should be in quotes. To select the intercept term, include the name intercept. If the factorized counterpart of a variable is included in the IV-like specifications, e.g. factor(x) where x = 1, 2, 3, the user can select the coefficients for specific factors by declaring the components factor(x)-1, factor(x)-2, factor(x)-3. See l on how to input the argument. If no components for an IV specification are given, then all coefficients from that IV specification will be used to define constraints in the partially identified case, or to define moments in the point identified case.

subset

a single subset condition or list of subset conditions corresponding to each regression specified in ivlike. The input must be logical. See l on how to input the argument. If the user wishes to select specific rows, construct a binary variable in the data set, and set the condition to use only those observations for which the binary variable is 1, e.g. the binary variable is use, and the subset condition is use == 1.
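The row-selection approach described above can be sketched in base R as follows. The data frame toy and its columns are hypothetical; only the indicator name use and the condition use == 1 come from the text.

```r
# Hypothetical toy data with an instrument z and an outcome y.
toy <- data.frame(z = c(1, 2, 3, 4, 2, 4),
                  y = c(5, 3, 4, 6, 2, 7))

# Binary indicator marking the rows that a given IV-like
# specification should use.
toy$use <- as.integer(toy$z %in% c(2, 4))

# Rows that the condition use == 1 would keep.
kept <- toy[toy$use == 1, ]
nrow(kept)
```

With such a column in the data, the subset condition for the corresponding specification would be use == 1.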

propensity

formula or variable name corresponding to propensity to take up treatment. If a formula is declared, then the function estimates the propensity score according to the formula and link specified in link. If a variable name is declared, then the corresponding column in the data is taken as the vector of propensity scores. A variable name can be passed either as a string (e.g. propensity = 'p'), a variable (e.g. propensity = p), or a one-sided formula (e.g. propensity = ~p).
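To illustrate the two ways of supplying the propensity score, the sketch below fits a logit model in base R and stores the fitted values in a column. The simulated data and the names toy, pmodel, and p are assumptions for illustration; this is not the package's internal estimation code.

```r
# Hypothetical toy data: discrete instrument z and binary treatment d.
set.seed(1)
toy <- data.frame(z = sample(1:4, 500, replace = TRUE))
toy$d <- rbinom(500, size = 1, prob = plogis(-1 + 0.5 * toy$z))

# Logit propensity score model, analogous in spirit to declaring
# propensity = d ~ z together with link = "logit".
pmodel <- glm(d ~ z, family = binomial(link = "logit"), data = toy)

# Fitted propensity scores, one per observation.
toy$p <- fitted(pmodel)
range(toy$p)
```

A precomputed column such as p is the kind of variable that could be passed by name in place of a formula.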

link

character, name of link function to estimate propensity score. Can be chosen from 'linear', 'probit', or 'logit'. Default is set to 'logit'. The link should be provided with quotation marks.

treat

variable name for treatment indicator. The name can be provided with or without quotation marks.

outcome

variable name for outcome variable. The name can be provided with or without quotation marks.

solver

character, name of the programming package in R used to obtain the bounds on the treatment effect. The function supports 'gurobi', 'cplexapi', 'rmosek', and 'lpsolveapi'. The name of the solver should be provided with quotation marks.

solver.options

list, each item of the list should correspond to an option specific to the solver selected.

solver.presolve

boolean, default set to TRUE. Set this parameter to FALSE if presolve should be turned off for the LP/QCQP problems.

solver.options.criterion

list, each item of the list should correspond to an option specific to the solver selected. These options are specific for finding the minimum criterion.

solver.options.bounds

list, each item of the list should correspond to an option specific to the solver selected. These options are specific for finding the bounds.

lpsolver

character, deprecated argument for solver.

lpsolver.options

list, deprecated argument for solver.options.

lpsolver.presolve

boolean, deprecated argument for solver.presolve.

lpsolver.options.criterion

list, deprecated argument for solver.options.criterion.

lpsolver.options.bounds

list, deprecated argument for solver.options.bounds.

criterion.tol

tolerance for the criterion function, set to 1e-4 by default. The criterion measures how well the IV-like moments/conditional means are matched using the l1-norm. Statistical noise may prevent the theoretical LP/QCQP problem from being feasible. That is, there may not exist a set of MTR coefficients that is able to match all the specified moments. The function thus first estimates the minimum criterion, which is reported in the output under the name 'minimum criterion', with a criterion of 0 meaning that all moments were matched. The function then relaxes the constraints by tolerating a criterion up to minimum criterion * (1 + criterion.tol). Set criterion.tol to a value greater than 0 to allow for more conservative bounds.
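The relaxation rule is simple arithmetic, sketched below in base R. The value of min.criterion is hypothetical, and criterion.max is an illustrative name rather than an element of the returned list.

```r
# Hypothetical minimum criterion found in the first step.
min.criterion <- 0.02
criterion.tol <- 1e-04

# Largest criterion value tolerated when solving for the bounds.
criterion.max <- min.criterion * (1 + criterion.tol)
criterion.max

# If every moment is matched exactly, the minimum criterion is 0 and
# the tolerated criterion is also 0, whatever the tolerance.
criterion.max.exact <- 0 * (1 + criterion.tol)
```

Because the tolerance is multiplicative, it only loosens the constraints when the minimum criterion is strictly positive.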

initgrid.nx

integer determining the number of points of the covariates used to form the initial constraint grid for imposing shape restrictions on the MTRs.

initgrid.nu

integer determining the number of points in the open interval (0, 1) drawn from a Halton sequence. The end points 0 and 1 are additionally included. These points are always a subset of the points defining the audit grid (see audit.nu). These points are used to form the initial constraint grid for imposing shape restrictions on the u components of the MTRs.

audit.nx

integer determining the number of points on the covariates space to audit in each iteration of the audit procedure.

audit.nu

integer determining the number of points in the open interval (0, 1) drawn from a Halton sequence. The end points 0 and 1 are additionally included. These points are used to audit whether the shape restrictions on the u components of the MTRs are satisfied. The initial grid used to impose the shape constraints in the LP/QCQP problem is constructed from a subset of these points.
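A rough sketch of how such grids over u could be built is given below. The helper halton is hypothetical and written in base R. The choice of base 2, and taking the first initgrid.nu points as the initial subset, are assumptions rather than documented behavior of the package.

```r
# Radical-inverse (van der Corput) sequence; the one-dimensional Halton
# sequence is this sequence for a prime base.
halton <- function(n, base = 2) {
  vapply(seq_len(n), function(i) {
    f <- 1
    r <- 0
    while (i > 0) {
      f <- f / base
      r <- r + f * (i %% base)
      i <- i %/% base
    }
    r
  }, numeric(1))
}

audit.nu <- 25
initgrid.nu <- 20

# Interior audit points plus the end points 0 and 1.
audit.u <- sort(c(0, halton(audit.nu), 1))

# Initial constraint grid built from a subset of the audit points.
init.u <- sort(c(0, halton(initgrid.nu), 1))

head(halton(3))
```

Since both grids come from the same sequence, every point of the initial grid is also an audit point, which matches the subset relationship described for initgrid.nu and audit.nu.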

audit.add

maximum number of points to add to the initial constraint grid for imposing each kind of shape constraint. For example, if there are 5 different kinds of shape constraints, there can be at most audit.add * 5 additional points added to the constraint grid.

audit.max

maximum number of iterations in the audit procedure.

audit.tol

feasibility tolerance when performing the audit. By default, this is set to 1e-06, which is equal to the default feasibility tolerances of Gurobi (solver = "gurobi"), CPLEX (solver = "cplexapi"), and Rmosek (solver = "rmosek"). This parameter should only be changed if the feasibility tolerance of the solver is changed, or if numerical issues result in discrepancies between the solver's feasibility check and the audit.

rescale

boolean, set to TRUE by default. This rescales the MTR components to improve stability in the LP/QCQP optimization.

point

boolean. Set to TRUE if it is believed that the treatment effects are point identified. If set to TRUE and IV-like formulas are passed, then a two-step GMM procedure is implemented to estimate the treatment effects. Shape constraints on the MTRs will be ignored under point identification. If set to TRUE and the regression-based criterion is used instead, then OLS will be used to estimate the MTR coefficients used to estimate the treatment effect. If not declared, then the function will determine whether or not the target parameter is point identified.

point.eyeweight

boolean, default set to FALSE. Set to TRUE if the GMM point estimate should use the identity weighting matrix (i.e. one-step GMM).

bootstraps

integer, default set to 0. This determines the number of bootstraps used to perform statistical inference.

bootstraps.m

integer, default set to size of data set. Determines the size of the subsample drawn from the original data set when performing inference via the bootstrap. This option applies only to the case of constructing confidence intervals for treatment effect bounds, i.e. it does not apply when point = TRUE.

bootstraps.replace

boolean, default set to TRUE. This determines whether the resampling procedure used for inference will sample with replacement.
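The roles of bootstraps.m and bootstraps.replace can be illustrated with base R's sample(). This is a generic resampling sketch on a hypothetical data frame toy, not the package's own bootstrap routine.

```r
set.seed(42)

# Hypothetical data set with n rows.
n <- 200
toy <- data.frame(id = seq_len(n))

# Draw m rows with replacement (an m-out-of-n bootstrap draw).
bootstraps.m <- 150
idx.boot <- sample(n, size = bootstraps.m, replace = TRUE)
boot.sample <- toy[idx.boot, , drop = FALSE]

# Draw m rows without replacement (a subsample), which is what
# setting the replace option to FALSE corresponds to.
idx.sub <- sample(n, size = bootstraps.m, replace = FALSE)
sub.sample <- toy[idx.sub, , drop = FALSE]

c(nrow(boot.sample), nrow(sub.sample))
```

Sampling without replacement requires the subsample size to be no larger than the number of rows in the data.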

levels

vector of real numbers between 0 and 1. Values correspond to the level of the confidence intervals constructed via bootstrap.

ci.type

character, default set to 'backward'. Set to 'forward' to construct the forward confidence interval for the treatment effect bound. Set to 'backward' to construct the backward confidence interval for the treatment effect bound. Set to 'both' to construct both types of confidence intervals.

specification.test

boolean, default set to TRUE. Function performs a specification test for the partially identified case when bootstraps > 0.

noisy

boolean, default set to FALSE. If TRUE, then messages are provided throughout the estimation procedure. Set to FALSE to suppress all messages, e.g. when performing the bootstrap.

smallreturnlist

boolean, default set to FALSE. Set to TRUE to exclude large intermediary components (i.e. propensity score model, LP/QCQP model, bootstrap iterations) from being included in the return list.

debug

boolean, indicates whether or not the function should provide output when obtaining bounds. The option is only applied when solver = 'gurobi' or solver = 'rmosek'. The output provided is the same as what the Gurobi API would send to the console.

Value

Returns a list of results from throughout the estimation procedure. This includes all IV-like estimands; the propensity score model; bounds on the treatment effect; the estimated expectations of each term in the MTRs; the components and results of the LP/QCQP problem.

Details

When the function is used to estimate bounds, and statistical inference is not performed, the function returns the following objects.

audit.count: the number of audits required until there were no more violations; or the number of audits performed before the audit procedure was terminated.
audit.criterion: the minimum criterion.
audit.grid: a list containing the points used to define the audit grid, as well as a table of points where the shape constraints were violated.
bounds: a vector with the estimated lower and upper bounds of the target treatment effect.
call.options: a list containing all the model specifications and call options generating the results.
gstar: a list containing the estimate of the weighted means for each component in the MTRs. The weights are determined by the target parameter declared in target, or the weights defined by target.weight1, target.knots1, target.weight0, target.knots0.
gstar.coef: a list containing the coefficients on the treated and control group MTRs.
gstar.weights: a list containing the target weights used to estimate gstar.
result: a list containing the LP/QCQP model, and the full output from solving the problem.
solver: the solver used in estimation.
moments: the number of elements in the S-set used to achieve (partial) identification.
propensity: the propensity score model. If a variable is fed to the propensity argument when calling ivmte, then the returned object is a list containing the name of variable given by the user, and the values of that variable used in estimation.
s.set: a list of all the coefficient estimates and weights corresponding to each element in the S-set.
splines.dict: a list including the specifications of each spline declared in each MTR.
messages: a vector of character strings logging the output of the estimation procedure.

If bootstraps is greater than 0, then statistical inference will be performed and the output will additionally contain the following objects.

bootstraps: the number of bootstraps.
bootstraps.failed: the number of bootstraps that failed (e.g. due to collinearity) and had to be repeated.
bounds.bootstraps: the estimates of the bounds from every bootstrap draw.
bounds.ci: forward and/or backward confidence intervals for the bound estimates at the levels specified in levels.
bounds.se: bootstrap standard errors on the lower and upper bound estimates.
p.value: p-value for the estimated bounds. p-values are constructed by finding the level at which the confidence interval no longer contains 0.
propensity.ci: confidence interval for coefficient estimates of the propensity score model.
propensity.se: standard errors for the coefficient estimates of the propensity score model.
specification.p.value: p-value from a specification test. The specification test is only performed if the minimum criterion is not 0.

If point = TRUE and bootstraps = 0, then point estimation is performed using two-step GMM. The output will contain the following objects.

j.test: test statistic and results from the asymptotic J-test.
moments: a vector. Each element is the GMM criterion for each moment condition used in estimation.
mtr.coef: coefficient estimates for the MTRs.
point.estimate: point estimate of the treatment effect.
redundant: indexes for the moment conditions (i.e. elements in the S-set) that were linearly dependent and could be dropped.

If point = TRUE and bootstraps is not 0, then point estimation is performed using two-step GMM, and additional statistical inference is performed using the bootstrap samples. The output will contain the following additional objects.

bootstraps: the number of bootstraps.
bootstraps.failed: the number of bootstraps that failed (e.g. due to collinearity) and had to be repeated.
j.test: test statistic and result from the J-test performed using the bootstrap samples.
j.test.bootstraps: J-test statistic from each bootstrap.
mtr.bootstraps: coefficient estimates for the MTRs from each bootstrap sample. These are used to construct the confidence intervals and standard errors for the MTR coefficients.
mtr.ci: confidence intervals for each MTR coefficient.
mtr.se: standard errors for each MTR coefficient estimate.
p.value: p-value for the treatment effect point estimate estimated using the bootstrap.
point.estimate.bootstraps: treatment effect point estimate from each bootstrap sample. These are used to construct the confidence interval, standard error, and p-value for the treatment effect.
point.estimate.ci: confidence interval for the treatment effect.
point.estimate.se: standard error for the treatment effect estimate.
propensity.ci: confidence interval for the coefficients in the propensity score model, constructed using the bootstrap.
propensity.se: standard errors for the coefficient estimates of the propensity score model.

Examples

# NOT RUN {
dtm <- ivmte:::gendistMosquito()

ivlikespecs <- c(ey ~ d | z,
                 ey ~ d | factor(z),
                 ey ~ d,
                 ey ~ d | factor(z))
jvec <- l(d, d, d, d)
svec <- l(, , , z %in% c(2, 4))

ivmte(ivlike = ivlikespecs,
      data = dtm,
      components = jvec,
      propensity = d ~ z,
      subset = svec,
      m0 = ~  u + I(u ^ 2),
      m1 = ~  u + I(u ^ 2),
      uname = u,
      target = "att",
      m0.dec = TRUE,
      m1.dec = TRUE,
      bootstraps = 0,
      solver = "lpSolveAPI")

# }
