tidysynth (version 0.2.1)

generate_predictor

Description

Create one or more scalar variables summarizing covariate data across a specified time window. These predictor variables are used to fit the synthetic control.

Usage

generate_predictor(data, time_window = NULL, ...)

Value

tbl_df with nested fields containing the following:

  • .id: unit id for the intervention case (this will differ for a placebo unit).

  • .placebo: indicator field taking on the value of 1 if a unit is a placebo unit, 0 if it's the specified treated unit.

  • .type: type of the nested data construct: treated or controls. Keeps track of which data construct is located in the .outcome field.

  • .outcome: nested data construct containing the outcome variable configured for the synthetic control method. Data is configured into a wide format for the optimization task.

  • .predictors: nested data construct containing the covariate matrices for the treated and control (donor) units. Data is configured into a wide format for the optimization task.

  • .original_data: original input data filtered by treated or control units. This allows for easy processing downstream when generating predictors.

  • .meta: stores information regarding the unit and time index, the treated unit and time, and the name of the outcome variable. Used downstream in subsequent functions.

Arguments

data

nested data of type tbl_df generated from synthetic_control(). See synthetic_control() documentation for more information.

time_window

Time window from the pre-intervention period across which the data are aggregated to generate the specific predictor. The default is to use the entire pre-intervention period.

...

Name-value pairs of summary functions. The name will be the name of the variable in the result. The value should be an expression that returns a single value, such as min(x), n(), or sum(is.na(y)). Specify na.rm = TRUE in every summary function, because aggregating across units with missing values is a common occurrence.
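
As a rough illustration of what the time_window and na.rm = TRUE advice amount to, the sketch below uses base R on a small invented panel. The data frame panel, its columns, and all values are hypothetical, and this is not how tidysynth implements the aggregation internally; it only shows the idea of restricting to a window and reducing each unit to one scalar.

```r
# Toy long-format panel (all names and values are invented for illustration)
panel <- data.frame(
  unit = rep(c("A", "B"), each = 4),
  year = rep(1985:1988, times = 2),
  x    = c(1, 2, 3, 4, 10, NA, 30, 40)
)

# Restrict to a time window, conceptually what time_window does
time_window <- 1986:1988
in_window <- panel[panel$year %in% time_window, ]

# Reduce each unit to a single scalar; na.rm = TRUE skips the missing value
x_mean <- tapply(in_window$x, in_window$unit, mean, na.rm = TRUE)

# Without na.rm = TRUE, unit B's summary is NA because one value is missing
x_mean_naive <- tapply(in_window$x, in_window$unit, mean)

print(x_mean)
print(x_mean_naive)
```

A single missing observation inside the window is enough to turn a unit's predictor into NA, which is why the summary functions passed through ... should carry na.rm = TRUE.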

Details

generate_predictor() builds matrices of aggregate-level covariates to be used in the following minimization task.

$$W^*(V) = \min \sum_{m=1}^{M} v_m \left(X_{1m} - \sum_{j=2}^{J+1} w_j X_{jm}\right)^2$$

The importance of the generated predictors is determined by vector \(V\), and the weights that determine unit-level importance are given by vector \(W\). The nested optimization task seeks to find optimal values of \(V\) and \(W\). Note also that \(V\) can be provided by the user. See ?generate_weights.
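
To make the objective concrete, here is a minimal base R sketch that evaluates the weighted squared discrepancy for one fixed choice of \(V\) and \(W\). The names X1, X0, v, and w and all numbers are invented for illustration; the constraint that the unit weights are non-negative and sum to one is assumed from the standard synthetic control setup rather than stated above, and no optimization is performed here.

```r
# Toy inputs (invented): M = 2 predictors, J = 3 donor units
X1 <- c(10, 5)                        # treated unit's predictor values (length M)
X0 <- matrix(c(8, 12, 11,
               4,  6,  5),
             nrow = 2, byrow = TRUE)  # M x J matrix of donor predictor values
v  <- c(0.7, 0.3)                     # predictor importance weights V
w  <- c(0.5, 0.25, 0.25)              # donor weights W (assumed >= 0, summing to 1)

# Weighted combination of donors for each predictor: sum_j w_j * X_jm
synthetic <- as.vector(X0 %*% w)

# Objective evaluated at (V, W): sum_m v_m * (X_1m - synthetic_m)^2
loss <- sum(v * (X1 - synthetic)^2)
print(loss)
```

The fitting step searches over w (and, in the nested problem, v) to make this quantity as small as possible; the sketch only shows how one candidate pair is scored.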

Examples

# Load tidysynth, which provides the functions, the pipe, and the smoking data
library(tidysynth)

# Smoking example data
data(smoking)

smoking_out <-
  smoking %>%

  # Initialize the synthetic control object
  synthetic_control(outcome = cigsale,
                    unit = state,
                    time = year,
                    i_unit = "California",
                    i_time = 1988,
                    generate_placebos = FALSE) %>%

  # Generate the aggregate predictors used to generate the weights
  generate_predictor(time_window = 1980:1988,
                     lnincome = mean(lnincome, na.rm = TRUE),
                     retprice = mean(retprice, na.rm = TRUE),
                     age15to24 = mean(age15to24, na.rm = TRUE))

# Extract the respective predictor matrices
smoking_out %>% grab_predictors(type = "treated")
smoking_out %>% grab_predictors(type = "controls")