PS_prior: Calculating the Propensity Score-Integrated Informative Priors

Description

The PS_prior function calculates the propensity score-integrated (PS) informative prior from historical data.

Usage

PS_prior(
  formula,
  data,
  outcome,
  study,
  treat,
  method,
  distance,
  ratio,
  ps.method,
  trim
)
PS_prior.default(
  formula,
  data,
  outcome,
  study,
  treat,
  method,
  distance,
  ratio,
  ps.method,
  trim
)
PS_prior.beta(
  formula,
  data,
  outcome,
  study,
  treat,
  method,
  distance,
  ratio,
  ps.method,
  trim
)
PS_prior.norm(
  formula,
  data,
  outcome,
  study,
  treat,
  method,
  distance,
  ratio,
  ps.method,
  trim
)

Value

Displays the informative prior calculated from historical data based on the selected PS method.

Arguments

formula: A two-sided formula object containing the study indicator and the covariates used to create the distance measure for matching. This formula is passed to the functions that estimate the distance measure. For example, the formula should be specified as G ~ X1 + X2 + ... where G is the name of the study indicator and X1 and X2 are covariates.
data: A data frame containing the variables named in formula and possibly in other arguments.
outcome: The variable name of the outcome.
study: The variable name of the study indicator.
treat: The variable name of the treatment indicator.
method: The matching method to be used. The allowed methods are "nearest" for nearest neighbor matching (on the propensity score by default), "optimal" [method_optimal] for optimal pair matching, "full" [method_full] for optimal full matching, "genetic" [method_genetic] for genetic matching, "cem" [method_cem] for coarsened exact matching, "exact" [method_exact] for exact matching, "cardinality" [method_cardinality] for cardinality and template matching, and "subclass" [method_subclass] for subclassification. When set to NULL, no matching will occur, but propensity score estimation and common support restrictions will still occur if requested. See the linked pages for each method for more details on what these methods do, how the arguments below are used by each one, and what additional arguments are allowed.
distance: The distance measure to be used. Can be either the name of a method of estimating propensity scores (e.g., "glm"), the name of a method of computing a distance matrix from the covariates (e.g., "mahalanobis"), a vector of already-computed distance measures, or a matrix of pairwise distances. See [distance] for allowable options. The default is "glm" for propensity scores estimated with logistic regression using glm(). Ignored for some methods; see individual methods pages for information on whether and how the distance measure is used.
ratio: For methods that allow it, how many historical control units should be matched to each current control unit in $k:1$ matching. Should be a single integer value. See the individual methods pages for information on whether and how this argument is used. The default is 1 for 1:1 matching.
ps.method: The PS method used to calculate an informative prior based on historical data. The allowed methods are "Weighting" and "Matching". The default method is "Weighting".
trim: Lower and upper bound of trimming used in "Weighting". The default is [0.1,0.9].

Functions

PS_prior.default(): Calculates the Propensity Score-Integrated informative prior based on historical data for binary and continuous endpoints.
PS_prior.beta(): Calculates the Propensity Score-Integrated informative prior based on historical data for binary endpoints.
PS_prior.norm(): Calculates the Propensity Score-Integrated informative prior based on historical data for continuous endpoints.

Details

This function calculates informative priors from historical data, incorporating covariate information to strengthen borrowing and to address prior-data conflict.

Let $G$ be the study indicator, where $G = 1$ indicates that a patient is from the current control study and $G = 0$ indicates that a patient is from the historical control study. Given the covariate data $X$, the propensity score is defined as $$e(X) = \Pr(G = 1 | X),$$ where the distance argument selects the method used to estimate the propensity scores.
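
As a rough illustration of the definition above, the propensity score can be estimated by logistic regression, which is what the default distance = "glm" is described as doing. The sketch below uses base R only; the data frame toy and the covariates X1 and X2 are simulated and purely hypothetical, and this is not the package's internal code.

```r
set.seed(1)
n_cur  <- 100   # hypothetical number of current controls
n_hist <- 300   # hypothetical number of historical controls

# G = 1: current control study; G = 0: historical control study
G  <- c(rep(1, n_cur), rep(0, n_hist))
# Simulated covariates; the historical patients are shifted on purpose
X1 <- rnorm(n_cur + n_hist, mean = ifelse(G == 1, 0, 0.5))
X2 <- rbinom(n_cur + n_hist, size = 1, prob = ifelse(G == 1, 0.5, 0.4))
toy <- data.frame(G, X1, X2)

# Logistic regression of the study indicator on the covariates
ps_fit <- glm(G ~ X1 + X2, data = toy, family = binomial())
toy$ps <- fitted(ps_fit)   # estimated e(X) = Pr(G = 1 | X)

# Compare the PS distribution of the two studies
tapply(toy$ps, toy$G, summary)
```

Historical patients with a PS close to that of the current controls are the ones that look most exchangeable with the current study.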

Calculating the informative prior through PS matching means identifying a subset of the historical data ($D_h^*$) whose PS values are similar to those of the current control data ($D$). Various algorithms are available for PS matching; see the method argument. The informative prior is then calculated from the matched historical dataset.
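
To make the idea of selecting $D_h^*$ concrete, here is a minimal sketch of greedy 1:1 nearest-neighbor matching on the PS without replacement, written in base R. It is an illustration under simplifying assumptions, not the matching routine that the method argument dispatches to; the vectors ps_cur and ps_hist are hypothetical PS values.

```r
set.seed(2)
ps_cur  <- runif(20, 0.3, 0.7)   # hypothetical PS of current controls (D)
ps_hist <- runif(60, 0.1, 0.9)   # hypothetical PS of historical controls (D_h)

available <- rep(TRUE, length(ps_hist))
match_id  <- integer(length(ps_cur))

for (i in seq_along(ps_cur)) {
  d <- abs(ps_hist - ps_cur[i])   # PS distance to every historical patient
  d[!available] <- Inf            # each historical patient is used at most once
  match_id[i] <- which.min(d)     # closest historical patient still available
  available[match_id[i]] <- FALSE
}

# PS values of the matched historical subset D_h^*
matched_ps <- ps_hist[match_id]
summary(abs(matched_ps - ps_cur))
```

The result of a greedy rule depends on the order in which current patients are processed, which is one reason optimal and full matching exist as alternatives.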

Alternatively, we can use inverse probability of treatment weighting (IPTW) to adjust the distribution of $X$ in the historical data $D_h$, making it similar to that in $D$. Specifically, for the $i$th subject, we assign a weight $\alpha_i$ to the outcome $y_i$ in $D_h$ based on its PS $e(X_i)$, and a fixed weight $\alpha_i = 1$ to each subject in $D$, as follows: $$\alpha_i = G_i + (1 - G_i) \frac{e(X_i)}{1 - e(X_i)}.$$ To avoid extremely large weights that may compromise IPTW, a symmetric trimming rule can be applied to the tails of the PS distribution through the trim argument (default [0.1,0.9]): observations whose estimated PS falls outside this range are trimmed.
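
The weighting and trimming rules can be checked by hand on a few values. In this sketch the vectors G and ps are made up, and the trimming rule is assumed to apply to every observation regardless of study; the package may handle this differently.

```r
# Hypothetical study indicator and estimated propensity scores
G    <- c(1, 1, 1, 0, 0, 0, 0, 0)
ps   <- c(0.60, 0.45, 0.70, 0.05, 0.20, 0.50, 0.80, 0.95)
trim <- c(0.1, 0.9)   # default trimming bounds

# Keep only observations whose PS lies inside the trimming range
keep <- ps >= trim[1] & ps <= trim[2]

# Weight 1 for current controls, odds e(X) / (1 - e(X)) for historical controls
alpha <- G + (1 - G) * ps / (1 - ps)

data.frame(G, ps, alpha, keep)[keep, ]
```

The two trimmed historical patients have PS 0.05 and 0.95; the latter would otherwise receive a weight of 19 and dominate the weighted sample.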

To standardize $\alpha$, we compute the effective sample size (ESS), which approximately reflects the level of precision, or equivalently the sample size, retained after weighting: $n^{*}_h = (\sum \alpha_i)^2 / \sum{\alpha_i^2}$. The standardized weight is given by $$\alpha_i^{*} = G_i + (1 - G_i)\frac{\alpha_i}{\sum{\alpha_i} / n_h^{*}}.$$
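
A small numeric sketch of the ESS and the standardization step, assuming the sums run over the historical subjects only and that the historical weights are rescaled so that they add up to $n_h^*$. The weights in alpha_h are hypothetical.

```r
# Hypothetical IPTW weights for the (trimmed) historical controls
alpha_h <- c(0.25, 1, 4, 0.5, 2)

# Effective sample size: (sum of weights)^2 / (sum of squared weights)
ess <- sum(alpha_h)^2 / sum(alpha_h^2)

# Rescale the weights so that they sum to the ESS
alpha_star <- alpha_h / (sum(alpha_h) / ess)

c(n_hist = length(alpha_h), ess = ess, sum_alpha_star = sum(alpha_star))
```

The ESS never exceeds the number of historical patients and equals it only when all weights are identical, so unequal weights always cost some borrowing.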

For a binary endpoint $Y \sim Ber(\theta)$, the informative prior $\pi_1(\theta)$ can be constructed as $$\pi_1(\theta) \propto L(\theta | D_h, \alpha^{*}) \pi_0(\theta) = Beta(a + \sum \alpha_i^{*}y_i, b + n_h^* - \sum \alpha_i^{*}y_i ),$$ where $\pi_0(\theta)$ is a non-informative prior; a natural choice is $Beta(a, b)$ with $a = b = 1$.
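
The Beta update can be worked through with a few numbers. The standardized weights and binary outcomes below are hypothetical, and the weights are chosen so that they sum to $n_h^* = 5$; this is a sketch of the arithmetic, not the package's implementation.

```r
# Hypothetical standardized weights (sum = n_h^* = 5) and binary outcomes in D_h
alpha_star <- c(0.5, 1.2, 0.8, 1.5, 1.0)
y          <- c(1, 0, 1, 1, 0)

a <- 1; b <- 1                 # non-informative Beta(1, 1) initial prior
ess <- sum(alpha_star)         # n_h^*

shape1 <- a + sum(alpha_star * y)          # a + weighted number of responders
shape2 <- b + ess - sum(alpha_star * y)    # b + weighted number of non-responders

c(shape1 = shape1, shape2 = shape2)
shape1 / (shape1 + shape2)                 # prior mean of theta
qbeta(c(0.025, 0.975), shape1, shape2)     # central 95% prior interval
```

The informative prior here carries roughly $n_h^*$ observations of information rather than the raw historical sample size.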

For a continuous endpoint $Y \sim N(\theta, \sigma^2)$, suppose $\sigma^2$ is unknown, with non-informative prior $p(\theta, \sigma^2) \propto 1/\sigma^2$. Then $\pi_1(\theta)$ follows a Student-$t$ distribution with $n_h^{*} - 1$ degrees of freedom. When $n_h^{*}$ is moderate to large, it can be approximated by a normal distribution $N(\bar{y}^{*}, {s^{*}}^2 / n_h^{*})$ with $$\bar{y}^{*} = \sum \alpha_i^* y_i / \sum \alpha_i^*, ~~ {s^{*}}^2 = \sum \alpha_i^* (y_i - \bar{y}^{*})^2 / (n_h^{*} - 1).$$
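
The weighted mean and variance behind the normal approximation can be computed directly. The weights and outcomes below are hypothetical, and five observations are far too few for the approximation to be trusted in practice; the example only shows the arithmetic.

```r
# Hypothetical standardized weights (sum = n_h^*) and continuous outcomes in D_h
alpha_star <- c(0.5, 1.2, 0.8, 1.5, 1.0)
y          <- c(2.1, 1.4, 3.0, 2.6, 1.9)

ess <- sum(alpha_star)   # n_h^*

# Weighted mean and weighted sample variance
ybar <- sum(alpha_star * y) / sum(alpha_star)
s2   <- sum(alpha_star * (y - ybar)^2) / (ess - 1)

# Approximate informative prior: N(ybar, s2 / n_h^*)
prior_mean <- ybar
prior_sd   <- sqrt(s2 / ess)
c(mean = prior_mean, sd = prior_sd)
```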

References

Zhao Y, Laird G, Chen J, Yuan Y. PS-SAM: doubly robust propensity-score-integrated self-adapting mixture prior to dynamically borrow information from historical data.

Examples

## Load example data
data('PS_SAM_data')
## Subset the data to contain historical data and current control
dat <- PS_SAM_data[PS_SAM_data$A == 0, ]
str(dat)

## Examples for binary endpoints
## Generate the informative prior based on historical data using PS Matching
summary(PS_prior(formula = 'G ~ X_1 + X_2 + X_3',
                 data = dat, ps.method = 'Matching', method = 'nearest',
                 outcome = 'Y_binary', study = 'G', treat = 'A'))

## Generate the informative prior based on historical data using PS Weighting
summary(PS_prior(formula = 'G ~ X_1 + X_2 + X_3',
                 data = dat, ps.method = 'Weighting',
                 outcome = 'Y_binary', study = 'G', treat = 'A'))

## Examples for continuous endpoints
## Generate the informative prior based on historical data using PS Matching
summary(PS_prior(formula = 'G ~ X_1 + X_2 + X_3',
                 data = dat, ps.method = 'Matching', method = 'nearest',
                 outcome = 'Y_continuous', study = 'G', treat = 'A'))

## Generate the informative prior based on historical data using PS Weighting
summary(PS_prior(formula = 'G ~ X_1 + X_2 + X_3',
                 data = dat, ps.method = 'Weighting',
                 outcome = 'Y_continuous', study = 'G', treat = 'A'))