assess: Assess models with regression

Description

Fit ordinary least squares (OLS) and logistic models, as well as models for causal inference such as differences-in-differences (DID) and interrupted time series (ITS). Use these models to evaluate program performance or to test intervention effects (e.g., of healthcare programs). Options are also available for top coding the outcome variable and for adjusting for propensity scores. Optionally, the function returns new data containing these additional variables along with the constructed variables used in the DID and ITS models.
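As a rough illustration of what top coding and propensity score covariate adjustment involve, the base R sketch below simulates a small data set, caps the outcome with pmin(), estimates propensity scores with a logistic glm(), and adds the score as a covariate in lm(). It is a sketch under stated assumptions, not the internals of assess(): the simulated data, the names dat, cap, and ps_model, and the exact modeling steps are hypothetical. Only the column names top.los and pscore echo the examples on this page.

```r
# Hypothetical base R sketch (not the internals of assess()):
# top code an outcome, estimate propensity scores, adjust for them.
set.seed(42)
n <- 200
dat <- data.frame(
  age    = rnorm(n, mean = 60, sd = 10),
  female = rbinom(n, 1, 0.5)
)
# Simulated program assignment whose probability rises with age
dat$program <- rbinom(n, 1, plogis(-3 + 0.05 * dat$age))
# Simulated right-skewed outcome (e.g., length of stay), higher mean in the program
dat$los <- rexp(n, rate = 1 / (4 + dat$program))

# Top coding: values above the cap are replaced by the cap
cap <- 8.27
dat$top.los <- pmin(dat$los, cap)

# Propensity score: predicted probability of being in the program
ps_model <- glm(program ~ age + female, data = dat, family = binomial)
dat$pscore <- predict(ps_model, type = "response")

# Outcome model on the top-coded outcome, with the propensity score as a covariate
fit <- lm(top.los ~ program + pscore, data = dat)
summary(fit)
```

Including the score as a regressor is only one way to use propensity scores; matching and weighting are common alternatives.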

Usage

assess(
  formula,
  data,
  regression = "none",
  did = "none",
  its = "none",
  intervention = NULL,
  int.time = NULL,
  treatment = NULL,
  interrupt = NULL,
  topcode = NULL,
  propensity = NULL,
  newdata = FALSE
)

Value

a list of results from the selected regression models. If newdata=TRUE, the list also includes the new data. It also contains relevant model information such as variable names, type of analysis, formula, study information, and, if an ITS model was run, a summary of ITS effects.

Arguments

formula: a formula object. In DID and ITS models, use 'Y ~ .' to include only the constructed model variables (e.g., for a DID model, 'Y ~ .' becomes Y ~ Post.All + Int.Var + DID). If regression="ols" or regression="logistic", 'Y ~ .' uses all other variables in the data.frame, as is standard for formulas.
data: a data.frame in which to interpret the variables named in the formula.
regression: select a regression method for standard regression models (i.e., neither DID nor ITS). Options are regression="ols" (ordinary least squares, also known as linear regression) or regression="logistic". Default is regression="none" for no standard regression model.
did: option for Differences-in-Differences (DID) regression. Select did="two" for models with only 2 time points (e.g., pre/post-test). Select did="many" for >= 3 time points (e.g., monthly time points in 12 months of data). Default is did="none" for no DID.
its: option for Interrupted Time Series (ITS) regression. Select its="one" for one group (e.g., intervention only). Select its="two" for two groups (intervention and control). Default is its="none" for no ITS.
intervention: optional name of the intervention variable used in DID, ITS, and propensity score models; it indicates whether each case is in the intervention group.
int.time: optional name of the time variable for DID or ITS models (e.g., int.time="month"). Its values mark the time periods before and after the intervention started.
treatment: optional treatment start period for DID models. Select one value from 'int.time' to indicate the start of the intervention (e.g., treatment = 5).
interrupt: optional interruption (or intervention) period(s) for ITS models. Select one or two values from 'int.time' to indicate the start of the intervention and/or other key intervention periods (e.g., interrupt = 5 or interrupt = c(5, 9)).
topcode: optional value at which to top code Y (the left-hand side of the formula). Analyses are then performed using the new top-coded variable.
propensity: optional character vector of variable names used to fit a propensity score model. This requires the 'intervention' option. All models then include 'pscore' (the propensity score) as a covariate for adjustment.
newdata: optional logical value indicating whether to return the new data. newdata=TRUE returns the data with any new columns created by the DID, ITS, propensity score, or top coding options. The default is newdata=FALSE. No new data is returned if none was created.
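The constructed DID variables named in the formula description (Post.All, Int.Var, DID) can be illustrated with a two-period (pre/post) DID in base R. The sketch below assumes that Post.All flags periods at or after the treatment start, Int.Var flags the intervention group, and DID is their product. These definitions are an assumption about what assess() builds, and the simulated panel and the other names (panel, n_id) are hypothetical.

```r
# Hypothetical sketch: two-period DID with base R lm().
# Assumed definitions of the constructed terms, not the package source.
set.seed(1)
n_id  <- 100
panel <- expand.grid(id = 1:n_id, month = 1:8)
panel$program <- as.integer(panel$id <= n_id / 2)  # first half of ids in the intervention
treatment <- 5                                     # first intervention month

# Constructed variables, named after those in the DID formula
panel$Post.All <- as.integer(panel$month >= treatment)  # post period, all groups
panel$Int.Var  <- panel$program                         # intervention indicator
panel$DID      <- panel$Post.All * panel$Int.Var        # interaction term

# Simulated outcome with a true DID effect of -1.5
panel$los <- 5 + 0.5 * panel$Int.Var + 0.3 * panel$Post.All -
  1.5 * panel$DID + rnorm(nrow(panel), sd = 1)

fit <- lm(los ~ Post.All + Int.Var + DID, data = panel)
coef(fit)["DID"]
```

With this coding, the coefficient on DID estimates how much more (or less) the outcome changed from pre to post in the intervention group than in the comparison group.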

References

Angrist, J. D., & Pischke, J. S. (2009). Mostly Harmless Econometrics: An Empiricist's Companion. Princeton University Press. ISBN: 9780691120355.

Linden, A. (2015). Conducting Interrupted Time-series Analysis for Single- and Multiple-group Comparisons. The Stata Journal, 15(2), 480-500. https://doi.org/10.1177/1536867X1501500208

Examples

# ordinary least squares (the summary includes R^2)
summary(assess(hp ~ mpg+wt, data=mtcars, regression="ols")$model)

# logistic
summary(assess(formula=vs~mpg+wt+hp, data=mtcars, regression="logistic")$model)

# OLS with a propensity score
summary(assess(formula=los ~ month+program, data=hosprog, intervention = "program",
  regression="ols", propensity=c("female","age","risk"))$model)

# OLS with los top coded at 8.27 and a propensity score: summarize los, top.los, and pscore in the returned data
summary(assess(formula=los ~ month+program, data=hosprog, intervention = "program",
  regression="ols", topcode=8.27, propensity=c("female","age","risk"),
  newdata=TRUE)$newdata[, c("los", "top.los", "pscore")])

# differences-in-differences model: using 2 time periods, pre- and post-intervention
summary(assess(formula=los ~ ., data=hosprog, intervention = "program",
  int.time="month", treatment = 5, did="two")$DID)

# DID model: using many time points (did = "many")
summary(assess(formula=los ~ ., data=hosprog, intervention = "program",
  int.time="month", treatment = 5, did="many")$DID)

# interrupted time series model: two groups and 1 interruption (interrupt = 5)
summary(assess(formula=los ~ ., data=hosprog, intervention = "program",
  int.time="month", its="two", interrupt = 5)$ITS)

# interrupted time series model: two groups and 2 interruptions (interrupt = c(5, 9))
summary(assess(formula=los ~ ., data=hosprog, intervention = "program",
  int.time="month", its="two", interrupt = c(5,9))$ITS)
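To see the kind of regression that underlies an ITS model, the sketch below fits a single-group segmented regression with base R lm(). It uses a level-change term and a slope-change term at one interruption, similar in spirit to the single-group model discussed by Linden (2015). The variable names (post, post_time, its_data) and the simulated series are hypothetical and are not necessarily the variables that assess() constructs.

```r
# Hypothetical sketch: single-group ITS as a segmented regression.
# Names and data are illustrative, not the variables built by assess().
set.seed(7)
month     <- 1:24
interrupt <- 13                               # first post-interruption month
post      <- as.integer(month >= interrupt)   # level change indicator
post_time <- post * (month - interrupt)       # months since the interruption (0 before)

# Simulated series: baseline trend, then a level drop and a slope change
y <- 10 + 0.2 * month - 2 * post - 0.3 * post_time + rnorm(24, sd = 0.3)
its_data <- data.frame(y, month, post, post_time)

# month = pre-interruption slope, post = level change, post_time = slope change
fit <- lm(y ~ month + post + post_time, data = its_data)
round(coef(fit), 2)
```

Because post_time is zero up to the interruption, the coefficient on post is the immediate level change at month 13, and the post-interruption slope is the sum of the month and post_time coefficients.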
