assess: Assess models with regression

Description

Fits ordinary least squares (OLS) and logistic regression models, as well as causal inference models such as differences-in-differences (DID) and interrupted time series (ITS). Use these models to evaluate program performance or test intervention effects (e.g., in healthcare programs). Options are available for top coding the outcome variable and for adjusting with propensity scores. The data can optionally be returned with these additional variables and with the constructed variables used in DID and ITS models.

Usage

assess(
  formula,
  data,
  regression = "none",
  did = "none",
  its = "none",
  intervention = NULL,
  int.time = NULL,
  treatment = NULL,
  interrupt = NULL,
  subset = NULL,
  stagger = NULL,
  topcode = NULL,
  propensity = NULL,
  newdata = FALSE
)

Value

A list of results from the selected regression models. If requested, the list also includes the new data. It also contains relevant model information such as variable names, type of analysis, formula, study information, and, for ITS analyses, a summary of ITS effects.

Arguments

formula: a formula object. In DID and ITS models, use 'Y ~ .' to include only the constructed model variables (e.g., the right side of the DID model: Y ~ Post.All + Int.Var + DID). If regression="ols" or regression="logistic", 'Y ~ .' uses all variables in the data.frame, as is standard in formulas.
data: a data.frame in which to interpret the variables named in the formula.
regression: Select a regression method for standard regression models (i.e., neither DID nor ITS). Options are regression="ols" (ordinary least squares AKA linear) or regression="logistic". Default is regression="none" for no standard regression model.
did: option for Differences-in-Differences (DID) regression. Select did="two" for models with only 2 time points (e.g., pre/post-test). Select did="many" for >= 3 time points (e.g., monthly time points in 12 months of data). Default is did="none" for no DID.
its: option for Interrupted Time Series (ITS) regression. Select its="one" for one group (e.g., intervention only). Select its="two" for two groups (intervention and control). Default is its="none" for no ITS.
intervention: optional intervention variable name selected for DID, ITS, and propensity score models that indicates which cases are in the intervention group.
int.time: optional intervention time variable name selected for DID or ITS models. This indicates the duration of time relative to when the intervention started.
treatment: optional treatment start period selected for DID models. Select 1 value from 'int.time' to indicate the start of the intervention (e.g., treatment = 5).
interrupt: optional interruption (or intervention) period(s) selected for ITS models. Select 1 or more values from 'int.time' to indicate the start and/or key intervention periods. Each period needs at least 2 time points, and at least 3 is better. For example, interrupt= c(3, 5, 7) will suffice, especially if you want to isolate certain periods, but interrupt= c(3, 6, 9) may provide more useful information.
subset: an expression defining a subset of the observations to use in the regression model. The default is NULL, thereby using all observations. Specify, for example, data$hospital == "NY" or c(1:100,200:300) to use just those observations. This is helpful when fitting a submodel for DID or ITS after identifying similar groups. DID and ITS models could be improved by limiting the control groups to those that are similar to the intervention group on the intervention indicator and baseline trend variables (e.g., 'ITS.Time' and 'ITS.Int'), as indicated by p-values >= 0.10.
stagger: optional list to indicate staggered entry into the intervention or treatment group. Relevant model variables are re-coded to appropriate values and can be used for a form of 'stacked' DID or ITS. If a group of cases joins X months after the primary sample, model variables are adjusted by X months. This is a three-element named list: 'a' = a character vector for the name of the grouping column; 'b' = the specific categories or levels that indicate which cases have a staggered entry; and 'c' = the time point values at staggered entry. 'b' and 'c' must have identical lengths. For ITS models, the staggered entry time must satisfy: interrupt 1 < stagger time < interrupt 2. For example, a WHO health policy may have started in the 3rd year of the study period in NY and Toronto, but Chicago and LA joined 6 and 12 months later; therefore stagger= list(a= 'city', b= c('Chicago', 'LA'), c= c(30, 36)) while interrupt= 25. Default is NULL.
topcode: optional value selected to top code Y (the left-hand side of the formula). Analyses will be performed using the new top coded variable.
propensity: optional character vector of variable names used to fit a propensity score model. This requires the 'intervention' option to be selected. All models will include 'pscore' (the propensity score) as a covariate adjustment.
newdata: optional logical value that indicates if you want the new data returned. newdata=TRUE will return the data with any new columns created from the DID, ITS, propensity score, or top coding. The default is newdata=FALSE. No new data will be returned if none was created.
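
To make the 'topcode' and 'propensity' options concrete, the base R sketch below shows what top coding and propensity score covariate adjustment conventionally involve. It is an illustration under stated assumptions, not the package's internal code: the simulated data frame `d`, its columns, and the helper `top_code()` are hypothetical, and only the column names 'top.los' and 'pscore' are borrowed from the examples on this page.

```r
set.seed(1)

# Hypothetical data: length of stay (los), a program indicator, two covariates
n <- 200
d <- data.frame(
  age     = round(runif(n, 20, 90)),
  female  = rbinom(n, 1, 0.5),
  program = rbinom(n, 1, 0.4)
)
d$los <- 3 + 0.03 * d$age - 0.5 * d$program + rexp(n, rate = 0.5)

# Top coding: values above the ceiling are replaced by the ceiling
# (top_code() is a hypothetical helper, not a function from the package)
top_code <- function(y, ceiling) pmin(y, ceiling)
d$top.los <- top_code(d$los, 8.27)

# Propensity score: predicted probability of being in the intervention group
ps_fit <- glm(program ~ female + age, data = d, family = binomial())
d$pscore <- fitted(ps_fit)

# Covariate adjustment: include the propensity score as a regressor
adj_fit <- lm(top.los ~ program + pscore, data = d)
summary(adj_fit)$coefficients
```

Whether the package estimates the propensity score with exactly this logistic specification is an assumption here; the sketch only conveys the general idea behind the two options.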

References

Angrist, J. D., & Pischke, J. S. (2009). Mostly Harmless Econometrics: An Empiricist's Companion. Princeton University Press. ISBN: 9780691120355.

Gebski, V., et al. (2012). Modelling Interrupted Time Series to Evaluate Prevention and Control of Infection in Healthcare. Epidemiology & Infection, 140, 2131-2141. https://doi.org/10.1017/S0950268812000179

Linden, A. (2015). Conducting Interrupted Time-series Analysis for Single- and Multiple-group Comparisons. The Stata Journal, 15(2), 480-500. https://doi.org/10.1177/1536867X1501500208

Examples

# ordinary least squares R^2
summary(assess(hp ~ mpg+wt, data=mtcars, regression="ols")$model)

# logistic
summary(assess(formula=vs~mpg+wt+hp, data=mtcars, regression="logistic")$model)

# OLS with a propensity score
summary(assess(formula=los ~ month+program, data=hosprog, intervention = "program",
regression="ols", propensity=c("female","age","risk"))$model)

# OLS: top coding los at 8.27 and propensity score means (top.los and pscore)
summary(assess(formula=los ~ month+program, data=hosprog, intervention = "program",
regression="ols", topcode=8.27, propensity=c("female","age","risk"),
newdata=TRUE)$newdata[, c("los", "top.los", "pscore")])

# differences-in-differences model: using 2 time periods, pre- and post-intervention
summary(assess(formula=los ~ ., data=hosprog, intervention = "program",
int.time="month", treatment = 5, did="two")$DID)

# DID model: using many time points (did="many")
summary(assess(formula=los ~ ., data=hosprog, intervention = "program",
int.time="month", treatment = 5, did="many")$DID)

# interrupted time series model: two groups and 1 interruption (interrupt= 5)
summary(assess(formula=los ~ ., data=hosprog, intervention = "program",
int.time="month", its="two", interrupt = 5)$ITS)

# interrupted time series model: two groups and 2 interruptions (interrupt= c(5,9))
summary(assess(formula=los ~ ., data=hosprog, intervention = "program",
int.time="month", its="two", interrupt = c(5,9))$ITS)
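
For readers who want to see what the constructed DID variables represent, the following base R sketch builds the three terms named in the 'formula' argument (Post.All, Int.Var, DID) by hand on simulated data and fits the model with lm(). This is a minimal illustration under assumptions: the data, the variable `treatment_start`, and the choice to code the post-period as month >= treatment start are hypothetical and may differ from how the package codes these variables.

```r
set.seed(2)

# Hypothetical panel: 50 units observed for 12 months
d <- expand.grid(id = 1:50, month = 1:12)
d$program <- as.integer(d$id <= 25)   # first 25 units are in the intervention
treatment_start <- 5                  # assumed start of the intervention

# Conventional DID indicators (names follow the formula quoted on this page)
d$Post.All <- as.integer(d$month >= treatment_start)  # post-period flag
d$Int.Var  <- d$program                               # intervention-group flag
d$DID      <- d$Post.All * d$Int.Var                  # interaction term

# Simulated outcome with a true DID effect of -1
d$los <- 5 + 0.5 * d$Post.All + 0.8 * d$Int.Var - 1 * d$DID +
  rnorm(nrow(d), sd = 1)

# The coefficient on DID estimates the intervention effect
did_fit <- lm(los ~ Post.All + Int.Var + DID, data = d)
coef(did_fit)["DID"]
```

The interaction coefficient is the quantity of interest: it measures how much more the intervention group changed from pre to post than the control group did.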
