causalOT (version 1.0.2)

barycentric_projection: Barycentric Projection outcome estimation

Description

Barycentric Projection outcome estimation

Usage

barycentric_projection(
  formula,
  data,
  weights,
  separate.samples.on = "z",
  penalty = NULL,
  cost_function = NULL,
  p = 2,
  debias = FALSE,
  cost.online = "auto",
  diameter = NULL,
  niter = 1000L,
  tol = 1e-07,
  ...
)

Value

An object of class "bp" which is a list with slots:

  • potentials The dual potentials from calculating the optimal transport distance.

  • penalty The value of the penalty parameter used in calculating the optimal transport distance.

  • cost_function The cost function used to calculate the distances between units.

  • cost_alg A character vector denoting whether an \(L_1\) distance, a squared Euclidean distance, or another distance metric was used.

  • p The power to which the cost matrix was raised if not using a user supplied cost function.

  • debias Whether barycentric projections should be debiased.

  • tensorized TRUE/FALSE denoting whether to use offline cost matrices.

  • data An object of class dataHolder with the data used to calculate the optimal transport distance.

  • y_a The outcome vector in the first sample.

  • y_b The outcome vector in the second sample.

  • x_a The covariate matrix in the first sample.

  • x_b The covariate matrix in the second sample.

  • a The empirical measure in the first sample.

  • b The empirical measure in the second sample.

  • terms The terms object from the formula.

Arguments

formula

A formula object specifying the outcome and covariates.

data

A data.frame of the data to use in the model.

weights

Either a vector of weights, one for each observation, or an object of class causalWeights.

separate.samples.on

The variable in the data denoting the treatment indicator. It determines how the samples are separated for the optimal transport calculation.

penalty

The penalty parameter to use in the optimal transport calculation. By default it is \(1/\log(n)\).

cost_function

A user supplied cost function. If supplied, must take arguments x1, x2, and p.

p

The power to which the cost is raised. Default is 2.0. For user supplied cost functions, the cost will not be raised to this power unless the user so specifies.

debias

Should debiased barycentric projections be used? See details.

cost.online

Should an online cost algorithm be used? Default is "auto", which selects an online cost algorithm when the sample sizes in the two groups specified by separate.samples.on, \(n_0\) and \(n_1\), satisfy \(n_0 \cdot n_1 \geq 5000^2\). Must be one of "auto", "online", or "tensorized". The last of these is the offline option.
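The "auto" rule described above can be sketched in base R. The helper choose_cost_mode() is hypothetical and written only to illustrate the documented threshold. It is not part of causalOT, and the package's internal check may differ in detail.

```r
# Hypothetical helper (not part of causalOT): mirrors the documented "auto" rule.
choose_cost_mode <- function(n0, n1,
                             cost.online = c("auto", "online", "tensorized")) {
  cost.online <- match.arg(cost.online)
  # An explicit choice is returned unchanged.
  if (cost.online != "auto") return(cost.online)
  # Convert to double so the product cannot overflow R's 32-bit integers.
  if (as.numeric(n0) * as.numeric(n1) >= 5000^2) "online" else "tensorized"
}

choose_cost_mode(32, 32)               # small samples fall below the threshold
choose_cost_mode(5000, 5000)           # exactly at the threshold
choose_cost_mode(100, 100, "online")   # explicit setting overrides the rule
```

Under this reading, the two groups in the Examples section (32 units in total) fall far below the threshold, so "auto" would pick the tensorized (offline) option.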

diameter

The diameter of the covariate space, if known.

niter

The maximum number of iterations to run the optimal transport problems.

tol

The tolerance for convergence of the optimal transport problems.

...

Not used at this time.

Details

The barycentric projection uses the dual potentials from the optimal transport distance between the two samples to calculate projections from one sample into another. For example, in the sample of controls, we may wish to know what their outcomes would have been had they been treated. In general, we seek the projected outcomes $$\hat{\eta} = \text{argmin}_{\eta} \sum_{ij} \text{cost}(\eta_i, y_j) \pi_{ij}, $$ where \(\eta_i\) is the projected outcome for unit \(i\) in one sample, \(y_j\) is the observed outcome for unit \(j\) in the other sample, and \(\pi_{ij}\) is the primal solution from the optimal transport problem.

These values can also be de-biased using the solutions from running an optimal transport problem of one sample against itself. Details are listed in Pooladian et al. (2022) https://arxiv.org/abs/2202.08919.
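As a rough illustration of the minimization in the first paragraph (without debiasing), the sketch below works through a toy case in base R. It assumes a squared Euclidean cost on the outcomes, for which the minimizer has a closed form: each projected outcome is the \(\pi\)-weighted mean of the outcomes in the other sample. The coupling matrix pi_mat and the outcomes y_b are made-up values, not output from causalOT, and the code does not reproduce how the package computes projections from the dual potentials.

```r
# Toy data: 3 control units (rows) and 4 treated units (columns).
# y_b holds made-up observed outcomes for the treated units.
y_b <- c(1.0, 2.0, 4.0, 7.0)

# Hypothetical coupling matrix; entries are non-negative and sum to 1.
pi_mat <- matrix(c(0.20, 0.10, 0.03, 0.00,
                   0.05, 0.15, 0.10, 0.03,
                   0.00, 0.04, 0.10, 0.20),
                 nrow = 3, byrow = TRUE)

# Closed form for squared Euclidean cost: the pi-weighted mean of y_b per row.
eta_closed <- as.numeric(pi_mat %*% y_b) / rowSums(pi_mat)

# Numerical check: minimize sum_j (eta - y_j)^2 * pi_ij row by row.
eta_numeric <- apply(pi_mat, 1, function(w) {
  optimize(function(eta) sum(w * (eta - y_b)^2),
           interval = range(y_b), tol = 1e-8)$minimum
})

# The two solutions should agree up to the optimizer's tolerance.
max(abs(eta_closed - eta_numeric))
```

The objective separates across \(i\), so each row can be minimized on its own. With an \(L_1\) cost the same argument gives a \(\pi\)-weighted median instead of a weighted mean.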

Examples

if (torch::torch_is_installed()) {
  set.seed(23483)
  n <- 2^5
  pp <- 6
  overlap <- "low"
  design <- "A"
  estimate <- "ATT"
  power <- 2

  # simulate data with the Hainmueller class
  data <- causalOT::Hainmueller$new(n = n, p = pp,
                                    design = design, overlap = overlap)
  data$gen_data()

  # estimate weights for the chosen estimand
  weights <- causalOT::calc_weight(x = data,
                                   z = NULL, y = NULL,
                                   estimand = estimate,
                                   method = "NNM")

  # assemble outcome, treatment indicator, and covariates in one data.frame
  df <- data.frame(y = data$get_y(), z = data$get_z(), data$get_x())

  # fit the barycentric projection; niter = 2 keeps the example fast
  fit <- causalOT::barycentric_projection(y ~ ., data = df,
                                          weights = weights,
                                          separate.samples.on = "z",
                                          niter = 2)
  inherits(fit, "bp")
}