csregion: Filter the data based on the common support region

Description

The csregion() function estimates the boundaries of the rectangular common support region, as defined by Lopez and Gutman (2017), and filters the matrix of generalized propensity scores based on these boundaries. The function returns a matrix containing only those observations whose generalized propensity scores lie within the treatment-specific boundaries.
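Under the rectangular definition of Lopez and Gutman (2017), the bounds for each treatment assignment probability are taken across treatment groups: the lower bound is the largest of the per-group minima and the upper bound is the smallest of the per-group maxima. The base-R sketch below illustrates that rule on made-up data. csr_sketch() is a hypothetical helper written for illustration, not the package's implementation, and it uses the non-strict inequality of the default borders = "include".

```r
# Hypothetical helper (NOT the package's code): rectangular CSR bounds
# following the min/max rule of Lopez and Gutman (2017).
csr_sketch <- function(treatment, gps) {
  gps <- as.matrix(gps)
  treatment <- as.factor(treatment)
  # Per-group minima and maxima: rows = treatment groups, columns = GPS columns
  group_min <- apply(gps, 2, function(col) tapply(col, treatment, min))
  group_max <- apply(gps, 2, function(col) tapply(col, treatment, max))
  # Lower bound: largest group minimum; upper bound: smallest group maximum
  lower <- apply(group_min, 2, max)
  upper <- apply(group_max, 2, min)
  # Compare each GPS column with its own bounds (non-strict inequality)
  inside <- sweep(gps, 2, lower, `>=`) & sweep(gps, 2, upper, `<=`)
  # An observation is kept only if every one of its GPS values is inside
  list(lower = lower, upper = upper, keep = rowSums(!inside) == 0)
}

# Made-up probabilities for two treatment groups, "a" and "b"
treatment <- c("a", "a", "a", "b", "b", "b")
toy_gps <- cbind(a = c(0.9, 0.6, 0.4, 0.7, 0.5, 0.2),
                 b = c(0.1, 0.4, 0.6, 0.3, 0.5, 0.8))
res <- csr_sketch(treatment, toy_gps)
res$lower  # a = 0.4, b = 0.3
res$upper  # a = 0.7, b = 0.6
res$keep   # FALSE TRUE TRUE TRUE TRUE FALSE
```

The first and last observations are dropped because their probability of receiving treatment "a" falls outside [0.4, 0.7].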

Usage

csregion(gps_matrix, borders = "include")

Value

A numeric matrix similar to the one returned by estimate_gps(), but with the number of rows reduced to exclude those observations that do not fit within the common support region (CSR) boundaries. The returned object also possesses additional attributes that summarize the calculation process of the CSR boundaries:

filter_matrix - A logical matrix with the same dimensions as the GPS part of gps_matrix, indicating which treatment assignment probabilities fall within the CSR boundaries.
filter_vector - A logical vector indicating whether each observation was kept (TRUE) or removed (FALSE), obtained by combining the rows of filter_matrix: an observation is kept only if all of its treatment assignment probabilities lie within the CSR boundaries.
csr_summary - A summary of the CSR calculation process, including details of the boundaries and the number of observations filtered.
csr_data - The original dataset used to estimate the generalized propensity scores (the original_data attribute of the gps object), filtered by filter_vector.
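Assuming, in line with the rectangular CSR definition, that an observation is kept only when every entry in its row of filter_matrix is TRUE, the relationship between filter_matrix, filter_vector and csr_data can be sketched in base R. The logical matrix and data frame below are illustrative stand-ins, not output of csregion().

```r
# Toy stand-in for attr(gps_csr, "filter_matrix") (illustrative values only):
# rows = observations, columns = treatment assignment probabilities
filter_matrix <- matrix(c(TRUE, TRUE,
                          TRUE, FALSE,
                          TRUE, TRUE),
                        nrow = 3, byrow = TRUE)

# Keep an observation only if all of its probabilities are within bounds
filter_vector <- apply(filter_matrix, 1, all)

# The same result expressed through row sums: every column must be TRUE
stopifnot(all(filter_vector == (rowSums(filter_matrix) == ncol(filter_matrix))))

# Toy stand-in for the original data; csr_data keeps only the flagged rows
original_data <- data.frame(id = 1:3, x = c(2.1, 3.4, 1.8))
csr_data <- original_data[filter_vector, , drop = FALSE]
csr_data$id  # 1 3
```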

Arguments

gps_matrix: An object of classes gps and data.frame (e.g., created by the estimate_gps() function). The first column corresponds to the treatment or grouping variable, while the other columns represent the treatment assignment probabilities calculated separately for each hypothetical treatment group. The number of columns should therefore equal the number of unique levels of the treatment variable plus one (for the treatment variable itself). The number of rows should correspond to the number of subjects for which generalized propensity scores were estimated.
borders: A character string specifying how to handle observations at the edges of the Common Support Region (CSR). Acceptable values are "include" and "exclude". If "include" is selected (default), observations with Generalized Propensity Scores (GPS) exactly equal to the CSR boundaries are retained for further analysis. This corresponds to a non-strict inequality: lower_bound <= GPS <= upper_bound. If "exclude" is selected, observations lying exactly on the CSR boundaries are removed. This corresponds to a strict inequality: lower_bound < GPS < upper_bound. Using "exclude" will typically result in a slightly smaller matched sample size compared to "include", but may be preferred for more conservative matching.
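The two settings of borders differ only in whether the comparison with each boundary is strict. The base-R sketch below contrasts the two inequalities on made-up values; within_bounds() is a hypothetical helper written for illustration, and the bounds 0.4 and 0.7 are arbitrary.

```r
# Hypothetical helper (NOT the package's code): flag values inside
# [lower, upper] for "include" or inside (lower, upper) for "exclude"
within_bounds <- function(gps, lower, upper, borders = c("include", "exclude")) {
  borders <- match.arg(borders)
  if (borders == "include") {
    gps >= lower & gps <= upper  # non-strict inequality
  } else {
    gps > lower & gps < upper    # strict inequality
  }
}

gps_values <- c(0.3, 0.4, 0.55, 0.7, 0.8)
inc <- within_bounds(gps_values, lower = 0.4, upper = 0.7, borders = "include")
exc <- within_bounds(gps_values, lower = 0.4, upper = 0.7, borders = "exclude")
inc  # FALSE TRUE TRUE TRUE FALSE
exc  # FALSE FALSE TRUE FALSE FALSE
```

Only the values lying exactly on a boundary (0.4 and 0.7) change status, which is why "exclude" typically removes just a few additional observations.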

Examples


# Estimate simple generalized propensity scores for the `iris` dataset
gps <- estimate_gps(Species ~ Sepal.Length, data = iris)

# And then define the common support region boundaries using `csregion()`
gps_csr <- csregion(gps)

# Additional information about the CSR calculation is accessible
# through the attributes described in the Value section
attr(gps_csr, "filter_matrix")
attr(gps_csr, "csr_summary")
attr(gps_csr, "csr_data")
