CausalGPS

Platforms: Windows, macOS, Linux (R CMD check via GitHub Actions; code coverage via codecov)

Matching on generalized propensity scores with continuous exposures

Summary

CausalGPS is an R package that implements matching on generalized propensity scores with continuous exposures. The package introduces a novel approach for estimating causal effects using observational data in settings with continuous exposures, and a new framework for GPS caliper matching that jointly matches on both the estimated GPS and exposure levels to fully adjust for confounding bias.

Installation

  • Installing from source
library("devtools")
install_github("NSAPH-Software/CausalGPS")
library("CausalGPS")
  • Installing from CRAN
install.packages("CausalGPS")
  • Setting up a Docker environment

A development Docker image can be downloaded from Docker Hub. See docker_singularity for more details.

Usage

The CausalGPS package encompasses two primary stages: Design and Analysis. The Design stage comprises estimating GPS values, generating weights or counts of matched data, and evaluating the generated population. The Analysis stage is focused on estimating the exposure-response function.

[Figure: CausalGPS process workflow]

Estimating GPS values

GPS values can be estimated using two distinct approaches: kernel and normal.

set.seed(967)
m_d <- generate_syn_data(sample_size = 500)

m_xgboost <- function(nthread = 1,
                      ntrees = 35,
                      shrinkage = 0.3,
                      max_depth = 5,
                      ...) {
  SuperLearner::SL.xgboost(nthread = nthread,
                           ntrees = ntrees,
                           shrinkage = shrinkage,
                           max_depth = max_depth,
                           ...)
}

gps_obj <- estimate_gps(.data = m_d,
                        .formula = w ~ I(cf1^2) + cf2 + I(cf3^2) + cf4 + cf5 + cf6,
                        sl_lib = c("m_xgboost"),
                        gps_density = "normal")

where

  • .data: A data.frame of input data, including the id column.
  • .formula: The formula for modeling the exposure based on the provided confounders.
  • sl_lib: A vector of prediction algorithms.
  • gps_density: The model type used for estimating GPS values, either normal (default) or kernel.
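To make the normal option more concrete, here is a minimal base R sketch. It is not the package's implementation. It assumes the GPS under the normal model is the Gaussian density of the observed exposure around its predicted mean, with a constant residual standard deviation. lm() stands in for the SuperLearner prediction model, and the names toy_data, fit, and gps_normal are hypothetical.

```r
# Sketch only: a Gaussian-density GPS, with lm() standing in for SuperLearner.
set.seed(967)
n <- 500
toy_data <- data.frame(cf1 = rnorm(n), cf2 = rnorm(n))  # hypothetical confounders
toy_data$w <- 1 + 0.8 * toy_data$cf1 - 0.5 * toy_data$cf2 + rnorm(n)  # exposure

# Step 1: model the exposure given the confounders.
fit <- lm(w ~ cf1 + cf2, data = toy_data)
w_hat <- fitted(fit)

# Step 2: estimate a constant residual standard deviation.
resid_sd <- sd(toy_data$w - w_hat)

# Step 3: GPS = normal density of the observed exposure around its prediction.
gps_normal <- dnorm(toy_data$w, mean = w_hat, sd = resid_sd)

summary(gps_normal)
```

Units whose observed exposure is far from what their confounders predict receive a small GPS value in this sketch.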

Computing weight or count of matched data

The second step in processing involves computing the weight or count of matched data. For the former, the weighting approach is used, and for the latter, the matching approach.

cw_object_matching <- compute_counter_weight(gps_obj = gps_obj,
                                             ci_appr = "matching",
                                             bin_seq = NULL,
                                             nthread = 1,
                                             delta_n = 0.1,
                                             dist_measure = "l1",
                                             scale = 0.5)
                                             

where

  • ci_appr: The causal inference approach. Possible values are:
    • "matching": Matching by GPS
    • "weighting": Weighting by GPS
  • bin_seq: Sequence of exposure (w) values used to generate the pseudo population. If NULL is passed, the default seq(min(w) + delta_n/2, max(w), by = delta_n) is used.
  • nthread: An integer giving the number of threads to be used by internal packages in a shared memory system.
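The default bin_seq expression quoted above can be evaluated directly in base R. The sketch below uses a hypothetical exposure vector w. Reading the grid points as centers of bins of width delta_n is an interpretation, not something the text above states.

```r
# Sketch only: evaluate the default bin_seq expression on toy exposure values.
set.seed(1)
w <- runif(200, min = 2, max = 20)  # hypothetical exposure values
delta_n <- 0.1

# Grid of exposure levels spaced delta_n apart, starting half a step above min(w).
bin_seq <- seq(min(w) + delta_n / 2, max(w), by = delta_n)

head(bin_seq)
length(bin_seq)
```

A smaller delta_n yields a finer grid and therefore more exposure levels at which matches are sought.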

If ci_appr = "matching":

  • dist_measure: Distance measuring function. Available options:
    • l1: Manhattan distance matching
  • delta_n: caliper parameter.
  • scale: a specified scale parameter to control the relative weight that is attributed to the distance measures of the exposure versus the GPS.
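The interplay of delta_n, dist_measure, and scale can be sketched for a single target in base R. This is an illustration under stated assumptions, not the package's code. It assumes the caliper keeps units whose exposure lies within delta_n / 2 of the target level, and that scale multiplies the GPS term while 1 - scale multiplies the exposure term. The helper match_one is hypothetical, and the standardization of GPS and exposure values is omitted.

```r
# Sketch only: pick the closest observed unit for one (exposure, GPS) target.
match_one <- function(w_obs, gps_obs, w_target, gps_target,
                      delta_n = 0.1, scale = 0.5) {
  # Caliper: keep units whose exposure lies within delta_n / 2 of the target.
  candidates <- which(abs(w_obs - w_target) <= delta_n / 2)
  if (length(candidates) == 0) return(NA_integer_)

  # Weighted Manhattan (l1) distance. Assumption: `scale` weights the GPS
  # term and `1 - scale` weights the exposure term.
  d <- scale * abs(gps_obs[candidates] - gps_target) +
    (1 - scale) * abs(w_obs[candidates] - w_target)

  candidates[which.min(d)]
}

w_obs   <- c(1.00, 1.02, 1.04, 1.50)
gps_obs <- c(0.30, 0.10, 0.21, 0.20)

# Unit 3 is selected: it is inside the caliper and closest on the combined distance.
match_one(w_obs, gps_obs, w_target = 1.02, gps_target = 0.20)
```

Repeating such a search over all targets and counting how often each observed unit is selected gives one way to think about the count of matched data.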

Generating pseudo population

The pseudo population is created by combining the counter_weight of data samples with the original data, including the outcome variable.

pseudo_pop_matching <- generate_pseudo_pop(.data = m_d,
                                            cw_obj = cw_object_matching,
                                            covariate_col_names = c("cf1", "cf2", "cf3",
                                                                    "cf4", "cf5", "cf6"),
                                            covar_bl_trs = 0.1,
                                            covar_bl_trs_type = "maximal",
                                            covar_bl_method = "absolute")

where

  • covar_bl_method: covariate balance method. Available options:
    • 'absolute'
  • covar_bl_trs: covariate balance threshold
  • covar_bl_trs_type: covariate balance type (mean, median, maximal)
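One way to read the absolute balance check is as a weighted absolute correlation between the exposure and each covariate, summarized and compared with the threshold. The base R sketch below follows that reading. The helper weighted_abs_cor, the toy data, and the strict "<" comparison are assumptions made for illustration.

```r
# Sketch only: weighted absolute correlation between exposure and a covariate.
weighted_abs_cor <- function(x, y, wt) {
  wt <- wt / sum(wt)                       # normalize the weights
  mx <- sum(wt * x)                        # weighted means
  my <- sum(wt * y)
  cov_xy <- sum(wt * (x - mx) * (y - my))  # weighted covariance
  abs(cov_xy / sqrt(sum(wt * (x - mx)^2) * sum(wt * (y - my)^2)))
}

set.seed(42)
n <- 300
toy <- data.frame(cf1 = rnorm(n), cf2 = rnorm(n))
toy$w <- 0.6 * toy$cf1 + rnorm(n)
toy$counter_weight <- rpois(n, lambda = 2)  # hypothetical matched counts

balance <- sapply(c("cf1", "cf2"), function(col) {
  weighted_abs_cor(toy$w, toy[[col]], toy$counter_weight)
})

covar_bl_trs <- 0.1
# "maximal" summarizes with max(); "mean" or "median" would use mean() or median().
passes <- max(balance) < covar_bl_trs

balance
passes
```

With equal weights the helper reduces to the absolute value of the ordinary Pearson correlation.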

Estimating exposure response function

The exposure-response function can be computed using parametric, semiparametric, and nonparametric approaches.
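As a point of comparison for the parametric option, a weighted regression of the outcome on the exposure can be sketched with base R's glm(). This is a stand-in and may differ from the fitting function the package uses internally. The data frame pseudo and its columns are simulated for illustration.

```r
# Sketch only: a parametric exposure-response curve from a weighted glm().
set.seed(7)
n <- 400
pseudo <- data.frame(w = runif(n, 2, 20))      # hypothetical exposure
pseudo$Y <- 3 + 0.5 * pseudo$w + rnorm(n)      # hypothetical outcome
pseudo$counter_weight <- rpois(n, lambda = 2)  # hypothetical matched counts

# Weighted Gaussian regression of the outcome on the exposure.
fit <- glm(Y ~ w, family = gaussian, data = pseudo, weights = counter_weight)

# Evaluate the fitted curve on a grid of exposure levels.
w_vals <- seq(2, 20, 0.5)
erf_hat <- predict(fit, newdata = data.frame(w = w_vals))

head(data.frame(w = w_vals, erf = erf_hat))
```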

erf_obj_nonparametric <- estimate_erf(.data = pseudo_pop_matching$.data,
                                       .formula = Y ~ w,
                                       weights_col_name = "counter_weight",
                                       model_type = "nonparametric",
                                       w_vals = seq(2,20,0.5),
                                       bw_seq = seq(0.2,2,0.2),
                                       kernel_appr = "kernsmooth")
                                       

where

  • w_vals: A numeric vector of values at which you want to calculate the exposure response function.
  • bw_seq: A vector of bandwidth values.
  • kernel_appr: Internal kernel approach. Available options are locpol and kernsmooth.
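The roles of w_vals and a bandwidth can be illustrated with a weighted Nadaraya-Watson (local constant) smoother in base R. This is a swapped-in, simpler technique than the locpol and kernsmooth approaches named above, and it fixes one bandwidth instead of choosing among the candidates in bw_seq. The helper nw_erf and the simulated data are hypothetical.

```r
# Sketch only: weighted Nadaraya-Watson estimate of the exposure-response curve.
nw_erf <- function(w, y, wt, w_vals, bw) {
  sapply(w_vals, function(w0) {
    k <- dnorm((w - w0) / bw) * wt  # Gaussian kernel times observation weight
    sum(k * y) / sum(k)             # locally weighted average of the outcome
  })
}

set.seed(11)
n <- 500
w  <- runif(n, 2, 20)                       # hypothetical exposure
y  <- 10 + sin(w / 3) + rnorm(n, sd = 0.3)  # hypothetical outcome
wt <- rpois(n, lambda = 2)                  # hypothetical matched counts

w_vals  <- seq(2, 20, 0.5)
erf_hat <- nw_erf(w, y, wt, w_vals, bw = 1)

head(data.frame(w = w_vals, erf = erf_hat))
```

A larger bandwidth averages over more neighbors and gives a smoother but potentially more biased curve, which is why a sequence of candidate bandwidths is useful.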

Notes

  • Trimming data for extreme exposure values, or trimming gps_obj for extreme GPS values, can be done using the trim_it function.
trimmed_data <- trim_it(data_obj = m_d,
                        trim_quantiles = c(0.05, 0.95),
                        variable = "w")
  • For the prediction model, we use the SuperLearner package. To customize parameters, users must write a wrapper function around the corresponding SuperLearner algorithm. For example, the following code overrides the default values of nthread, ntrees, shrinkage, and max_depth.
m_xgboost <- function(nthread = 1,
                      ntrees = 35,
                      shrinkage = 0.3,
                      max_depth = 5,
                      ...) {
  SuperLearner::SL.xgboost(nthread = nthread,
                           ntrees = ntrees,
                           shrinkage = shrinkage,
                           max_depth = max_depth,
                           ...)
}
  • To test your code and run examples, you can generate synthetic data.
syn_data <- generate_syn_data(sample_size=1000,
                              outcome_sd = 10,
                              gps_spec = 1,
                              cova_spec = 1)
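The quantile trimming mentioned in the first note can be pictured with base R alone. The sketch below keeps rows whose exposure falls between the 5th and 95th percentiles. Whether the package treats the bounds as inclusive is an assumption here, and toy and trimmed are hypothetical names.

```r
# Sketch only: quantile-based trimming of a data frame on one variable.
set.seed(3)
toy <- data.frame(id = 1:1000, w = rnorm(1000, mean = 10, sd = 3))

trim_quantiles <- c(0.05, 0.95)
bounds <- quantile(toy$w, probs = trim_quantiles)

# Keep rows inside the quantile bounds (inclusive bounds are an assumption).
trimmed <- toy[toy$w >= bounds[1] & toy$w <= bounds[2], ]

nrow(trimmed)
range(trimmed$w)
```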

Contribution

For more information about reporting bugs and contributing, please read the contribution page on the package website.

Code of Conduct

Please note that the CausalGPS project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

References

  • CausalGPS method paper
@article{wu2022matching,
  title={Matching on generalized propensity scores with continuous exposures},
  author={Wu, Xiao and Mealli, Fabrizia and Kioumourtzoglou, Marianthi-Anna and Dominici, Francesca and Braun, Danielle},
  journal={Journal of the American Statistical Association},
  pages={1--29},
  year={2022},
  publisher={Taylor \& Francis}
}
  • CausalGPS software paper
@misc{khoshnevis2023causalgps,
      title={CausalGPS: An R Package for Causal Inference With Continuous Exposures}, 
      author={Naeem Khoshnevis and Xiao Wu and Danielle Braun},
      year={2023},
      eprint={2310.00561},
      archivePrefix={arXiv},
      primaryClass={stat.CO},
      DOI={10.48550/arXiv.2310.00561}
}

Package details

  • Version: 0.5.1
  • License: GPL-3
  • Maintainer: Naeem Khoshnevis
  • Last Published: January 11th, 2026

Functions in CausalGPS (0.5.1)

  • compute_counter_weight: Compute counter or weight of data samples
  • estimate_npmetric_erf: Estimate smoothed exposure-response function (ERF) for pseudo population
  • matching_fn: Match observations
  • log_system_info: Log system information
  • print.cgps_erf: Extend print function for cgps_erf object
  • print.cgps_cw: Extend print function for cgps_cw object
  • estimate_erf: Estimate exposure-response function
  • compute_min_max: Compute minimum and maximum
  • create_weighting: Create pseudo population using weighting causal inference approach
  • set_logger: Set logger settings
  • plot.cgps_gps: Extend generic plot functions for cgps_gps class
  • plot.cgps_pspop: Extend generic plot functions for cgps_pspop class
  • estimate_gps: Estimate generalized propensity score (GPS) values
  • estimate_hat_vals: Estimate hat (fitted) values
  • trim_gps: Trim a gps object based on provided trimming quantiles
  • smooth_erf: Smooth exposure-response function
  • trim_it: Trim a data frame or an S3 object
  • w_fun: Helper function
  • estimate_pmetric_erf: Estimate parametric exposure-response function
  • generate_kernel: Generate kernel function
  • estimate_semipmetric_erf: Estimate semi-exposure-response function (semi-ERF)
  • compile_pseudo_pop: Compile pseudo population
  • plot.cgps_cw: Extend generic plot functions for cgps_cw class
  • plot.cgps_erf: Extend generic plot functions for cgps_erf class
  • gen_wrap_sl_lib: Generate customized wrapper for SuperLearner libraries
  • summary.cgps_erf: Print summary of cgps_erf object
  • summary.cgps_gps: Print summary of cgps_gps object
  • summary.cgps_pspop: Print summary of cgps_pspop object
  • summary.cgps_cw: Print summary of cgps_cw object
  • print.cgps_pspop: Extend print function for cgps_pspop object
  • generate_syn_data: Generate synthetic data for the CausalGPS package
  • print.cgps_gps: Extend print function for cgps_gps object
  • get_logger: Get logger settings
  • smooth_erf_kernsmooth: Compute smoothed ERF with kernsmooth approach
  • generate_pseudo_pop: Generate pseudo population
  • smooth_erf_locpol: Compute smoothed ERF with locpol approach
  • train_it: Generate prediction model
  • synthetic_us_2010: Public data set for air pollution and health studies, case study: 2010 county-level data set for the contiguous United States
  • check_args: Check additional arguments
  • absolute_corr_fun: Check covariate balance using absolute approach
  • autoplot.cgps_pspop: A helper function for cgps_pspop object
  • CausalGPS-package: The 'CausalGPS' package
  • absolute_weighted_corr_fun: Check weighted covariate balance using absolute approach
  • check_args_compile_pseudo_pop: Check compile_pseudo_pop function arguments
  • autoplot.cgps_cw: A helper function for cgps_cw object
  • check_args_estimate_gps: Check estimate_gps function arguments
  • compute_density: Approximate density based on another vector
  • compute_outer: Compute distance on all possible combinations
  • compute_closest_wgps: Find the closest data in subset to the original data
  • compute_resid: Compute residual
  • create_matching: Create pseudo population using matching causal inference approach
  • check_kolmogorov_smirnov: Check Kolmogorov-Smirnov (KS) statistics
  • compute_risk: Compute risk value
  • check_covar_balance: Check covariate balance
  • check_args_use_cov_transformers: Check covariate balance transformers argument
  • autoplot.cgps_erf: A helper function for cgps_erf object
  • autoplot.cgps_gps: A helper function for cgps_gps object