ivmteEstimate: Single iteration of estimation procedure from Mogstad, Santos, Torgovitsky (2018)

Description

This function estimates bounds on treatment effect parameters, following the procedure described in Mogstad, Santos, and Torgovitsky (2018). The available target parameters are the ATE, ATT, ATU, LATE, and generalized LATE. The user is required to provide a polynomial expression for the marginal treatment responses (MTR), as well as a set of regressions. The function restricts attention to the set of MTRs whose coefficients are consistent with the regression estimates. The lower and upper bounds on the treatment effect parameter are found by choosing MTR coefficients within this set that, respectively, minimize and maximize the weighted average difference between the MTRs.
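The optimization described above can be written compactly. The following is a sketch in illustrative notation (the symbols are shorthand chosen here, not names used by the function). It assumes each MTR is linear in its coefficients, m_d(u, x) = sum_k theta_dk * b_dk(u, x), that gamma*_dk is the expectation of the term b_dk under the target weights, and that gamma_sdk is its expectation under the weights of the IV-like estimand s with coefficient beta_s:

```latex
\begin{aligned}
\overline{\beta}^{\star} &= \max_{\theta \in \Theta}\;
  \sum_{d \in \{0,1\}} \sum_{k=1}^{K_d} \theta_{dk}\, \gamma^{\star}_{dk} \\
\text{subject to}\quad
  & \sum_{d \in \{0,1\}} \sum_{k=1}^{K_d} \theta_{dk}\, \gamma_{sdk} = \beta_s
  \quad \text{for all } s \in \mathcal{S}.
\end{aligned}
```

The lower bound replaces the max with a min. Here Theta collects any shape restrictions on the MTRs, and S is the set of selected IV-like estimands. Because both the objective and the constraints are linear in theta, each bound is a linear program.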

Usage

ivmteEstimate(data, target, late.Z, late.from, late.to, late.X, eval.X,
  genlate.lb, genlate.ub, target.weight0, target.weight1,
  target.knots0 = NULL, target.knots1 = NULL, m0, m1, uname = u,
  m1.ub, m0.ub, m1.lb, m0.lb, mte.ub, mte.lb, m0.dec, m0.inc, m1.dec,
  m1.inc, mte.dec, mte.inc, ivlike, components, subset, propensity,
  link = "logit", treat, lpsolver, criterion.tol = 0,
  initgrid.nx = 20, initgrid.nu = 20, audit.nx = 2500,
  audit.nu = 25, audit.add = 100, audit.max = 25,
  audit.grid = NULL, save.grid = FALSE, point = FALSE,
  point.eyeweight = FALSE, point.center = NULL,
  point.redundant = NULL, count.moments = TRUE, orig.sset = NULL,
  orig.criterion = NULL, vars_y, vars_mtr, terms_mtr0, terms_mtr1,
  vars_data, splinesobj, noisy = TRUE, smallreturnlist = FALSE,
  seed = 12345, debug = FALSE, environments)

Arguments

data

data.frame or data.table used to estimate the treatment effects.

target

character, target parameter to be estimated. Currently the function allows for ATE ('ate'), ATT ('att'), ATU ('atu'), LATE ('late'), and generalized LATE ('genlate').

late.Z

vector of variable names used to define the LATE.

late.from

baseline set of values of Z used to define the LATE.

late.to

comparison set of values of Z used to define the LATE.

late.X

vector of variable names of covariates which we condition on when defining the LATE.

eval.X

numeric vector of the values at which the variables in late.X are held fixed when estimating the LATE.

genlate.lb

lower bound value of unobservable u for estimating the generalized LATE.

genlate.ub

upper bound value of unobservable u for estimating the generalized LATE.

target.weight0

user-defined weight function for the control group defining the target parameter. A list of functions can be submitted if the weighting function is in fact a spline. The arguments of the function should be variable names in data. If the weight is constant across all observations, then the user can submit the value of the weight instead of a function.

target.weight1

user-defined weight function for the treated group defining the target parameter. See target.weight0 for details.
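As an illustration of what such weight functions might look like, here is a minimal base R sketch. The data frame df, the covariate x, and the functions w0 and w1 are hypothetical. The sketch assumes the target parameter is formed as the sum of the weighted MTRs, so the control-group weight carries the negative sign.

```r
# Hypothetical data with a single covariate x
df <- data.frame(x = c(0.2, 0.7, 0.9))

# Hypothetical weights: average the treatment effect only over
# observations with x above 0.5. The argument name matches a column in df.
w1 <- function(x) as.numeric(x > 0.5)
w0 <- function(x) -as.numeric(x > 0.5)

# Evaluate each weight on the data by matching argument names to columns
w1_values <- do.call(w1, as.list(df[names(formals(w1))]))
w0_values <- do.call(w0, as.list(df[names(formals(w0))]))
```

A weight that is constant across observations could be given as a plain number, e.g. 1 for the treated group and -1 for the control group.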

target.knots0

user-defined set of functions defining the knots associated with spline weights for the control group. The arguments of the function should consist only of variable names in data. If the knots are constant across all observations, then the user can submit the vector of knots instead of a function.

target.knots1

user-defined set of functions defining the knots associated with spline weights for the treated group. See target.knots0 for details.

m0

one-sided formula for the marginal treatment response function for the control group. Splines may also be incorporated using the expression uSpline, e.g. uSpline(degree = 2, knots = c(0.4, 0.8), intercept = TRUE). The intercept argument may be omitted, and is set to TRUE by default.

m1

one-sided formula for the marginal treatment response function for the treated group. Splines may also be incorporated using the expression uSpline, e.g. uSpline(degree = 2, knots = c(0.4, 0.8), intercept = TRUE). The intercept argument may be omitted, and is set to TRUE by default.
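For illustration, one-sided MTR formulas of this kind can be written with ordinary R formula syntax. The specifications below are hypothetical: they assume the unobservable is named u (the default for uname) and that the data contain a covariate x.

```r
# Hypothetical MTR specifications
m0_formula <- ~ u + I(u^2)            # quadratic in u for the control group
m1_formula <- ~ u + I(u^2) + x + x:u  # adds a covariate and its interaction with u

# Variables referenced by each specification
vars_m0 <- all.vars(m0_formula)
vars_m1 <- all.vars(m1_formula)
```

Neither formula has a left-hand side; the outcome enters through the regressions in ivlike rather than through the MTR specifications.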

uname

variable name for the unobservable used in declaring the MTRs. The name can be provided with or without quotation marks.

m1.ub

numeric value for upper bound on MTR for the treated group. By default, this will be set to the largest value of the observed outcome in the estimation sample.

m0.ub

numeric value for upper bound on MTR for the control group. By default, this will be set to the largest value of the observed outcome in the estimation sample.

m1.lb

numeric value for lower bound on MTR for the treated group. By default, this will be set to the smallest value of the observed outcome in the estimation sample.

m0.lb

numeric value for lower bound on MTR for the control group. By default, this will be set to the smallest value of the observed outcome in the estimation sample.

mte.ub

numeric value for upper bound on the marginal treatment effect (MTE).

mte.lb

numeric value for lower bound on the marginal treatment effect (MTE).

m0.dec

logical, set to FALSE by default. Set equal to TRUE if the MTR for the control group should be weakly monotone decreasing.

m0.inc

logical, set to FALSE by default. Set equal to TRUE if the MTR for the control group should be weakly monotone increasing.

m1.dec

logical, set to FALSE by default. Set equal to TRUE if the MTR for the treated group should be weakly monotone decreasing.

m1.inc

logical, set to FALSE by default. Set equal to TRUE if the MTR for the treated group should be weakly monotone increasing.

mte.dec

logical, set to FALSE by default. Set equal to TRUE if the MTE should be weakly monotone decreasing.

mte.inc

logical, set to FALSE by default. Set equal to TRUE if the MTE should be weakly monotone increasing.

ivlike

formula or vector of formulas specifying the regressions for the IV-like estimands. Which coefficients to use to define the constraints determining the treatment effect bounds (alternatively, the moments determining the treatment effect point estimate) can be selected in the argument components.

components

a list of vectors of the terms in the regression specifications to include in the set of IV-like estimands. No terms should be in quotes. To select the intercept term, include the name intercept. If the factorized counterpart of a variable is included in the IV-like specifications, e.g. factor(x) where x = 1, 2, 3, the user can select the coefficients for specific factors by declaring the components factor(x)-1, factor(x)-2, factor(x)-3. See the function l on how to input the argument. If no components for an IV specification are given, then all coefficients from that IV specification will be used to define constraints in the partially identified case, or to define moments in the point identified case.

subset

a single subset condition or list of subset conditions corresponding to each regression specified in ivlike. The input must be logical. See the function l on how to input the argument. To select specific rows, construct a binary variable in the data set and set the condition to use only those observations for which the binary variable is 1. For example, if the binary variable is use, the subset condition is use == 1.
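The row-selection approach described above can be sketched in base R as follows. The data frame df and the selection rule are hypothetical; only the name use and the condition use == 1 come from the description.

```r
# Hypothetical data set
df <- data.frame(y = c(1, 2, 3, 4), z = c(0, 1, 0, 1))

# Flag the rows that a given regression should use (here: rows with z equal to 1)
df$use <- as.integer(df$z == 1)

# The subset condition `use == 1` then picks out exactly these rows
selected <- df[df$use == 1, ]
```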

propensity

formula or variable name corresponding to propensity to take up treatment. If a formula is declared, then the function estimates the propensity score according to the formula and link specified in link. If a variable name is declared, then the corresponding column in the data is taken as the vector of propensity scores. A variable name can be passed as a string (e.g. propensity = 'p'), a variable (e.g. propensity = p), or a one-sided formula (e.g. propensity = ~p).

link

character, name of link function to estimate propensity score. Can be chosen from 'linear', 'probit', or 'logit'. Default is set to 'logit'.
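To make the two ways of supplying the propensity score concrete, the sketch below estimates a logit propensity model by hand and stores the fitted values in a column. The simulated data and the names d, z, and p are hypothetical, and glm is used only to illustrate the idea; the documentation does not state how the function fits the model internally.

```r
set.seed(1)
n <- 200
z <- rbinom(n, 1, 0.5)                      # hypothetical binary instrument
d <- rbinom(n, 1, plogis(-0.5 + 1.2 * z))   # hypothetical treatment indicator
df <- data.frame(d = d, z = z)

# Logit propensity score model, analogous to propensity = d ~ z with link = "logit"
fit <- glm(d ~ z, family = binomial(link = "logit"), data = df)

# Storing the fitted values lets the column name be passed instead of a formula
df$p <- fitted(fit)
```

With a precomputed column like p, the link argument would play no role, since nothing is left to estimate.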

treat

variable name for treatment indicator. The name can be provided with or without quotation marks.

lpsolver

character, name of the linear programming package in R used to obtain the bounds on the treatment effect. The function supports 'gurobi', 'cplexapi', 'lpsolveapi'.

criterion.tol

tolerance for violation of observational equivalence, set to 0 by default. Statistical noise may prohibit the theoretical LP problem from being feasible. That is, there may not exist a set of coefficients on the MTR that are observationally equivalent with regard to the IV-like regression coefficients. The function therefore first estimates the minimum violation of observational equivalence. This is reported in the output under the name 'minimum criterion'. The constraints in the LP problem pertaining to observational equivalence are then relaxed by the amount minimum criterion * (1 + criterion.tol). Set criterion.tol to a value greater than 0 to allow for more conservative bounds.
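As a small worked example of the relaxation formula above, with hypothetical numbers for the minimum criterion and the tolerance:

```r
minimum_criterion <- 0.02  # hypothetical estimated minimum violation
criterion_tol <- 0.1       # hypothetical tolerance supplied by the user

# Amount by which the observational equivalence constraints are relaxed
relaxation <- minimum_criterion * (1 + criterion_tol)

# With the default criterion.tol = 0 the relaxation equals the minimum criterion
relaxation_default <- minimum_criterion * (1 + 0)
```

Here the constraints would be relaxed by 0.022 rather than 0.02, which enlarges the feasible set and so can only widen the bounds.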

initgrid.nx

integer determining the number of points of the covariates used to form the initial constraint grid for imposing shape restrictions on the MTRs.

initgrid.nu

integer determining the number of evenly spread points in the interval [0, 1] of the unobservable u used to form the initial constraint grid for imposing shape restrictions on the MTRs.

audit.nx

integer determining the number of points on the covariates space to audit in each iteration of the audit procedure.

audit.nu

integer determining the number of points in the interval [0, 1], corresponding to space of unobservable u, to audit in each iteration of the audit procedure.

audit.add

maximum number of points to add to the initial constraint grid for imposing each kind of shape constraint. For example, if there are 5 different kinds of shape constraints, there can be at most audit.add * 5 additional points added to the constraint grid.

audit.max

maximum number of iterations in the audit procedure.

audit.grid

list, contains the A matrix used in the audit for the original sample, as well as the RHS vector used in the audit from the original sample.

save.grid

boolean, set to FALSE by default. Set to TRUE if the fine grid from the audit should be saved. This option is used for the inference procedure under partial identification, which uses the fine grid from the original sample in all bootstrap resamples.

point

boolean, default set to FALSE. Set to TRUE if it is believed that the treatment effects are point identified. If set to TRUE, then a two-step GMM procedure is implemented to estimate the treatment effects. Shape constraints on the MTRs will be ignored under point identification.

point.eyeweight

boolean, default set to FALSE. Set to TRUE if the GMM point estimate should use the identity weighting matrix (i.e. one-step GMM).
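The role of the weighting matrix can be illustrated with a generic linear GMM sketch. This is not the function's internal code: the matrix G, the vector beta, and the helper gmm_linear are hypothetical, and the sketch assumes moments that are linear in the MTR coefficients.

```r
# Hypothetical linear moment system: beta = G %*% theta
G <- matrix(c(1, 0,
              0, 1,
              1, 1), nrow = 3, byrow = TRUE)
theta_true <- c(0.5, -0.2)
beta <- as.vector(G %*% theta_true)

# Linear GMM estimator: minimizes (beta - G theta)' W (beta - G theta)
gmm_linear <- function(G, beta, W = diag(nrow(G))) {
  solve(t(G) %*% W %*% G, t(G) %*% W %*% beta)
}

# One-step GMM uses the identity weighting matrix (the default W here)
theta_hat <- gmm_linear(G, beta)
```

A two-step procedure would re-estimate with W set to the inverse of the estimated covariance matrix of the moments evaluated at the first-step solution.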

point.center

numeric, a vector of GMM moment conditions evaluated at a solution. When bootstrapping, the moment conditions from the original sample can be passed through this argument to recenter the bootstrap distribution of the J-statistic.

point.redundant

vector of integers indicating which components in the S-set are redundant.

count.moments

boolean, indicates whether the number of linearly independent moments should be counted.

orig.sset

list, only used for bootstraps. The list contains the gamma moments for each element in the S-set, as well as the IV-like coefficients.

orig.criterion

numeric, only used for bootstraps. The scalar corresponds to the minimum observational equivalence criterion from the original sample.

vars_y

character, variable name of observed outcome variable.

vars_mtr

character, vector of variables entering into m0 and m1.

terms_mtr0

character, vector of terms entering into m0.

terms_mtr1

character, vector of terms entering into m1.

vars_data

character, vector of variables that can be found in the data.

splinesobj

list of spline components in the MTRs for treated and control groups. Spline terms are extracted using removeSplines. This object is supposed to be a dictionary of splines, containing the original calls of each spline in the MTRs, their specifications, and the index used for renaming each component.

noisy

boolean, default set to TRUE. If TRUE, then messages are provided throughout the estimation procedure. Set to FALSE to suppress all messages, e.g. when performing the bootstrap.

smallreturnlist

boolean, default set to FALSE. Set to TRUE to exclude large intermediary components (i.e. propensity score model, LP model, bootstrap iterations) from being included in the return list.

seed

integer, the seed that determines the random grid in the audit procedure.

debug

boolean, indicates whether or not the function should provide output when obtaining bounds. The option is only applied when lpsolver = 'gurobi'. The output provided is the same as what the Gurobi API would send to the console.

environments

a list containing the environments of the MTR formulas, the IV-like formulas, and the propensity score formulas. If a formula is not provided, and thus no environment can be found, then the parent.frame() is assigned by default.

Value

Returns a list of results from throughout the estimation procedure. This includes all IV-like estimands; the propensity score model; bounds on the treatment effect; the estimated expectations of each term in the MTRs; the components and results of the LP problem.

Details

The estimation procedure relies on the propensity to take up treatment. The propensity scores can either be estimated as part of the estimation procedure, or the user can specify a variable in the data set already containing the propensity scores.

The user can also impose constraints on the shape of the MTRs and marginal treatment effects (MTE). Specifically, bounds and monotonicity restrictions are permitted. These constraints are enforced only over a subset of the data. An audit procedure then randomly selects points outside of this subset to check whether the constraints hold there as well. The user can control how stringent the audit is through the initgrid.nx, initgrid.nu, audit.nx, audit.nu, audit.add, and audit.max arguments.
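The idea behind the audit can be sketched in base R for a single upper bound constraint on one MTR. Everything here is hypothetical: the candidate function m, the bound ub, the grid sizes, and the rule of adding the worst violators first are illustrative assumptions, not a description of the function's internal selection rule.

```r
set.seed(12345)

# Hypothetical candidate MTR that equals the bound at u = 0, 0.5, 1
# but exceeds it between 0 and 0.5
m <- function(u) 1 + 2 * u * (u - 0.5) * (u - 1)
ub <- 1      # hypothetical upper bound on the MTR
tol <- 1e-8  # numerical tolerance for declaring a violation

# Initial constraint grid: the bound holds at every coarse point
coarse_u <- seq(0, 1, length.out = 3)
coarse_ok <- all(m(coarse_u) <= ub + tol)

# Audit grid: random points in [0, 1] that were not constrained
audit_u <- runif(25)
excess <- m(audit_u) - ub
is_violation <- excess > tol

# Add at most audit_add of the worst violators to the constraint grid
audit_add <- 5
worst_first <- audit_u[is_violation][order(excess[is_violation], decreasing = TRUE)]
u_add <- head(worst_first, audit_add)
new_grid <- sort(c(coarse_u, u_add))
```

In a full procedure the problem would be re-solved on new_grid and audited again, stopping once no violations remain or the maximum number of audit iterations is reached.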