ols_bcm: Multiplicative bias-corrected OLS (BCM)

Description

Applies a multiplicative bias correction to OLS regressions that include a binary covariate generated by an AI/ML classifier. The method requires an external estimate of the classifier's false-positive rate. Standard errors are adjusted to account for the uncertainty in that estimate.

Usage

ols_bcm(
  Y,
  Xhat = NULL,
  fpr,
  m,
  data = parent.frame(),
  intercept = TRUE,
  gen_idx = 1,
  ...
)
# S3 method for default
ols_bcm(
  Y,
  Xhat,
  fpr,
  m,
  data = parent.frame(),
  intercept = TRUE,
  gen_idx = 1,
  ...
)
# S3 method for formula
ols_bcm(
  Y,
  Xhat = NULL,
  fpr,
  m,
  data = parent.frame(),
  intercept = TRUE,
  gen_idx = 1,
  ...
)

Value

An object of classes mlbc_fit and mlbc_bcm containing:

coef: bias-corrected coefficient estimates (ML-slope first, other slopes, intercept last)
vcov: adjusted variance-covariance matrix for those coefficients
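
As a rough sketch of how these two components might be used, the snippet below derives standard errors and normal-approximation 95% confidence intervals from coef and vcov. It assumes the fitted object is a list whose elements can be read with $, which this page does not state. The object is a hand-built stand-in with placeholder numbers and hypothetical coefficient names, not output from ols_bcm.

```r
# Hand-built stand-in for a fitted object: placeholder numbers, NOT real estimates.
# Ordering mimics the documented layout: ML slope first, other slopes, intercept last.
fit <- list(
  coef = c(wfh_wham = 0.10, x2 = -0.05, `(Intercept)` = 1.20),
  vcov = diag(c(0.0004, 0.0001, 0.0009))
)
dimnames(fit$vcov) <- list(names(fit$coef), names(fit$coef))

# Standard errors are the square roots of the diagonal of the covariance matrix.
se <- sqrt(diag(fit$vcov))

# Normal-approximation 95% confidence intervals.
z <- qnorm(0.975)
ci <- cbind(
  estimate = fit$coef,
  se       = se,
  lower    = fit$coef - z * se,
  upper    = fit$coef + z * se
)
print(round(ci, 3))
```

With a real fit, the same arithmetic would apply to whatever coef and vcov the object carries, provided they are exposed in this way.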

Arguments

Y: numeric response vector, or a model formula with the response on the left-hand side (for example, log(salary) ~ wfh_wham + soc_2021_2)
Xhat: numeric matrix of regressors (if Y is numeric); the ML-regressor is column gen_idx
fpr: numeric; estimated false-positive rate of the ML regressor
m: integer; size of the external sample used to estimate the classifier's false-positive rate. Can be set to a large number when the false-positive rate is known exactly
data: data frame (if Y is a formula)
intercept: logical; if TRUE, prepends a column of 1's to Xhat
gen_idx: integer; 1-based index of the ML-generated variable to apply bias correction to. If not specified, defaults to the first non-intercept variable
...: unused

Usage Options

Option 1: Formula Interface

Y: A model formula with the response on the left-hand side
data: Data frame containing the variables referenced in the formula

Option 2: Array Interface

Y: Response variable vector
Xhat: Design matrix of covariates
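
The Examples section only demonstrates the formula interface, so here is a minimal sketch of the array interface on simulated data. The variable names, the simulated data, and the values of fpr and m are placeholders chosen for illustration, not estimates from a real validation sample. The call is guarded on the assumption that ols_bcm is exported by a package named MLBC, which this page does not state explicitly.

```r
set.seed(1)
n <- 500

# Simulated inputs (illustrative only).
x_true <- rbinom(n, 1, 0.3)            # latent true binary covariate
flip   <- rbinom(n, 1, 0.02)           # occasional spurious positives
x_hat  <- pmax(x_true, flip)           # ML-generated label with some false positives
w      <- rnorm(n)                     # an ordinary, correctly measured covariate
Y      <- 1 + 0.5 * x_true - 0.2 * w + rnorm(n)

# Design matrix without a column of ones; the ML regressor sits in column 1.
Xhat <- cbind(x_hat = x_hat, w = w)

# Assumption: the function lives in a package called MLBC.
if (requireNamespace("MLBC", quietly = TRUE)) {
  fit <- MLBC::ols_bcm(
    Y, Xhat,
    fpr = 0.02,        # placeholder false-positive rate
    m = 1000,          # placeholder validation sample size
    intercept = TRUE,  # let the function add the intercept column
    gen_idx = 1        # column of Xhat holding the ML-generated variable
  )
  summary(fit)
}
```

Leaving the column of ones out of Xhat and setting intercept = TRUE keeps gen_idx pointing at the first column of the matrix as supplied.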

Examples

# Load the remote work dataset
data(SD_data)

# Formula interface
fit_bcm <- ols_bcm(log(salary) ~ wfh_wham + soc_2021_2 + employment_type_name,
                   data = SD_data,
                   fpr = 0.009,  # estimated false positive rate
                   m = 1000)     # validation sample size
summary(fit_bcm)

# Compare with uncorrected OLS
fit_ols <- ols(log(salary) ~ wfh_wham + soc_2021_2 + employment_type_name,
               data = SD_data)

# Display coefficient comparison
data.frame(
  OLS = coef(fit_ols)[1:2],
  BCM = coef(fit_bcm)[1:2]
)
