
SMMAL (version 0.0.5)

ate.SSL: Estimate Average Treatment Effect (ATE) via Semi-Supervised Learning

Description

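Estimates the average treatment effect (ATE) and its standard error in a semi-supervised setting, combining labeled observations (outcome observed) with unlabeled observations (outcome missing).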

Usage

ate.SSL(
  Y,
  A,
  R,
  mu1,
  mu0,
  pi1,
  pi0,
  imp.A,
  imp.A1Y1,
  imp.A0Y1,
  min.pi = 0.05,
  max.pi = 0.95
)

Value

A list containing:

est

Estimated ATE.

se

Estimated standard error of ATE.
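Assuming the estimator is approximately normal and `se` is on the same scale as `est`, a Wald confidence interval can be formed from the two returned components. `wald_ci()` below is a hypothetical helper for illustration, not a function exported by SMMAL, and the numbers passed to it are placeholders rather than output of `ate.SSL()`.

```r
# Hypothetical helper (not part of SMMAL): normal-approximation (Wald)
# confidence interval from a point estimate and its standard error.
wald_ci <- function(est, se, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)   # about 1.96 for level = 0.95
  c(lower = est - z * se, upper = est + z * se)
}

# Placeholder values, not results from ate.SSL()
ci <- wald_ci(est = 1.0, se = 0.1)
print(ci)
```

With the object returned in the Examples section, the same call would be `wald_ci(result$est, result$se)`.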

Arguments

Y

Numeric vector. Observed outcomes for labeled observations, with missing values (NA) for unlabeled observations.

A

Numeric vector. Treatment indicator (1 for treated, 0 for control).

R

Logical or binary vector. Indicator for labeled data (1 if labeled, 0 if not).

mu1

Numeric vector. Estimated outcome regression \(E[Y \mid A = 1, X]\).

mu0

Numeric vector. Estimated outcome regression \(E[Y \mid A = 0, X]\).

pi1

Numeric vector. Estimated propensity scores \(P(A = 1 \mid X)\).

pi0

Numeric vector. Estimated propensity scores \(P(A = 0 \mid X)\).

imp.A

Numeric vector. Estimated treatment probabilities \(P(A = 1 \mid W)\) based on the surrogate variables W.

imp.A1Y1

Numeric vector. Imputed \(E[Y \mid A = 1, W]\) using surrogate variables.

imp.A0Y1

Numeric vector. Imputed \(E[Y \mid A = 0, W]\) using surrogate variables.

min.pi

Numeric. Lower bound to truncate estimated propensity scores (default = 0.05).

max.pi

Numeric. Upper bound to truncate estimated propensity scores (default = 0.95).
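`min.pi` and `max.pi` keep the estimated propensity scores away from 0 and 1 so that inverse-probability weights stay bounded. A minimal sketch, assuming the truncation is a simple clip to `[min.pi, max.pi]` (the exact internal rule used by `ate.SSL()` is not documented on this page); `truncate_pi()` is a hypothetical helper.

```r
# Hypothetical helper: clip propensity scores to [min.pi, max.pi]
# (assumed truncation rule, for illustration only)
truncate_pi <- function(p, min.pi = 0.05, max.pi = 0.95) {
  pmin(pmax(p, min.pi), max.pi)
}

pi1 <- c(0.01, 0.30, 0.99)
pi1_trunc <- truncate_pi(pi1)
print(pi1_trunc)        # 0.05 0.30 0.95
print(1 / pi1_trunc)    # largest weight is 1 / 0.05 = 20 instead of 1 / 0.01 = 100
```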

Details

This function estimates the ATE in a semi-supervised setting, where outcomes are observed only for a labeled subset of the sample (R = 1). Imputation models built on the surrogate variables W (supplied as imp.A, imp.A1Y1 and imp.A0Y1) are used to leverage information from the unlabeled observations.
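For comparison, `mu1`, `mu0`, `pi1` and `pi0` are the usual ingredients of the augmented inverse-probability-weighted (AIPW, doubly robust) estimator. The sketch below computes that standard supervised AIPW estimate using the labeled rows only. It is a swapped-in baseline for illustration, not the semi-supervised estimator implemented by `ate.SSL()`, which additionally uses the imputation inputs to borrow strength from unlabeled rows. `aipw_labeled()` is hypothetical, and the clipping of propensity scores is an assumed rule.

```r
# Hypothetical baseline: supervised AIPW ATE using labeled rows only.
# NOT the semi-supervised estimator inside ate.SSL().
aipw_labeled <- function(Y, A, R, mu1, mu0, pi1, pi0,
                         min.pi = 0.05, max.pi = 0.95) {
  lab <- R == 1
  # Clip propensity scores (assumed truncation rule)
  p1 <- pmin(pmax(pi1[lab], min.pi), max.pi)
  p0 <- pmin(pmax(pi0[lab], min.pi), max.pi)
  y <- Y[lab]; a <- A[lab]; m1 <- mu1[lab]; m0 <- mu0[lab]
  # Per-observation AIPW scores for E[Y(1)] - E[Y(0)]
  psi <- (m1 - m0) +
    a * (y - m1) / p1 -
    (1 - a) * (y - m0) / p0
  est <- mean(psi)
  se <- sd(psi) / sqrt(length(psi))   # standard error from the score variance
  list(est = est, se = se)
}

# Simulated check with correctly specified nuisances; true ATE = 1
set.seed(1)
N <- 2000
X <- rnorm(N)
A <- rbinom(N, 1, plogis(X))
Y <- X + A + rnorm(N)
R <- rbinom(N, 1, 0.5)
Y[R == 0] <- NA                        # unlabeled rows lose their outcome
fit <- aipw_labeled(Y, A, R, mu1 = X + 1, mu0 = X,
                    pi1 = plogis(X), pi0 = 1 - plogis(X))
print(fit)
```

Because this baseline discards the unlabeled rows entirely, it is a natural reference point when judging how much the semi-supervised estimate gains from them.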

Examples

library(SMMAL)

set.seed(123)
N <- 400
n <- 200  # Number of labeled observations
labeled_indices <- sample(1:N, n)

# Generate covariates and treatment
X <- rnorm(N)
A <- rbinom(N, 1, plogis(X))

# True potential outcomes (true ATE = 1)
Y0_true <- X + rnorm(N)
Y1_true <- X + 1 + rnorm(N)

# Factual outcomes under the assigned treatment (before masking)
Y_full <- ifelse(A == 1, Y1_true, Y0_true)

# Only labeled samples have observed Y
Y <- rep(NA, N)
Y[labeled_indices] <- Y_full[labeled_indices]
R <- rep(0, N); R[labeled_indices] <- 1

# Illustrative nuisance estimates; in practice, replace with fitted model
# predictions. Here X also plays the role of the surrogate W.
mu1 <- X + 0.5
mu0 <- X - 0.5
pi1 <- plogis(X)
pi0 <- 1 - pi1
imp.A <- plogis(X)
imp.A1Y1 <- plogis(X) * (X + 0.5)
imp.A0Y1 <- (1 - plogis(X)) * (X - 0.5)

# Estimate ATE
result <- ate.SSL(
  Y = Y,
  A = A,
  R = R,
  mu1 = mu1,
  mu0 = mu0,
  pi1 = pi1,
  pi0 = pi0,
  imp.A = imp.A,
  imp.A1Y1 = imp.A1Y1,
  imp.A0Y1 = imp.A0Y1
)

print(result$est)
print(result$se)
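The nuisance vectors above are fixed functions of X chosen for illustration. A minimal sketch of estimating them from data with `glm()` and `lm()` follows. The model forms (logistic for treatment, linear for outcome, outcome models fitted on labeled rows only) and the surrogate `W` are assumptions, not requirements of `ate.SSL()`. How SMMAL expects `imp.A1Y1` and `imp.A0Y1` to be constructed is not specified on this page; the sketch simply mirrors the example, which forms them as products of a treatment probability and an outcome prediction.

```r
set.seed(123)
N <- 400
X <- rnorm(N)
W <- X + rnorm(N, sd = 0.5)          # hypothetical surrogate variable
A <- rbinom(N, 1, plogis(X))
Y_full <- X + A + rnorm(N)
R <- rbinom(N, 1, 0.5)               # labeling indicator
Y <- ifelse(R == 1, Y_full, NA)
dat <- data.frame(Y = Y, A = A, X = X, W = W)
lab <- dat[R == 1, ]

# Propensity score P(A = 1 | X): A and X are observed for all rows
ps_fit <- glm(A ~ X, family = binomial, data = dat)
pi1 <- predict(ps_fit, type = "response")
pi0 <- 1 - pi1

# Outcome regressions E[Y | A = a, X]: Y is observed only on labeled rows
out_fit <- lm(Y ~ A + X, data = lab)
mu1 <- predict(out_fit, newdata = transform(dat, A = 1))
mu0 <- predict(out_fit, newdata = transform(dat, A = 0))

# Imputation inputs based on W (assumed forms, mirroring the example)
imp_ps <- glm(A ~ W, family = binomial, data = dat)
imp.A <- predict(imp_ps, type = "response")
imp_out <- lm(Y ~ A + W, data = lab)
imp.A1Y1 <- imp.A * predict(imp_out, newdata = transform(dat, A = 1))
imp.A0Y1 <- (1 - imp.A) * predict(imp_out, newdata = transform(dat, A = 0))
```

These vectors, together with `Y`, `A` and `R`, have the lengths and roles described under Arguments and could be passed to `ate.SSL()` in place of the hand-written ones.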
