
MLBC (version 0.2.1)

ols_bcm: Multiplicative bias-corrected OLS (BCM)

Description

Applies a multiplicative bias correction to regressions that include a binary covariate generated by AI/ML. The method requires an external estimate of the covariate's false-positive rate, and standard errors are adjusted to account for uncertainty in that estimate.
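To show why a correction is needed at all, the following base-R simulation regresses an outcome on a binary covariate that contains false positives. It is only a motivating sketch: it does not use MLBC or reproduce the package's correction, and the sample size, the 10% flip rate, and all variable names are assumptions made for illustration.

```r
set.seed(42)
n <- 20000

x_true    <- rbinom(n, 1, 0.2)                  # true binary covariate (unobserved in practice)
false_pos <- x_true == 0 & runif(n) < 0.10      # assumed: 10% of true zeros are wrongly flagged
x_hat     <- as.numeric(x_true == 1 | false_pos) # generated covariate with false positives only
y         <- 1 + 2 * x_true + rnorm(n)          # true slope is 2

# Slope when the true covariate is available vs. when only the generated one is
b_true <- unname(coef(lm(y ~ x_true))["x_true"])
b_hat  <- unname(coef(lm(y ~ x_hat))["x_hat"])

c(true_covariate = b_true, generated_covariate = b_hat)
```

In this setup the slope on the generated covariate is pulled toward zero relative to the slope on the true covariate, which is the kind of bias the function is meant to address.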

Usage

ols_bcm(
  Y,
  Xhat = NULL,
  fpr,
  m,
  data = parent.frame(),
  intercept = TRUE,
  gen_idx = 1,
  ...
)

# S3 method for default
ols_bcm(
  Y,
  Xhat,
  fpr,
  m,
  data = parent.frame(),
  intercept = TRUE,
  gen_idx = 1,
  ...
)

# S3 method for formula
ols_bcm(
  Y,
  Xhat = NULL,
  fpr,
  m,
  data = parent.frame(),
  intercept = TRUE,
  gen_idx = 1,
  ...
)

Value

An object of classes mlbc_fit and mlbc_bcm containing:

  • coef: bias-corrected coefficient estimates (ML-slope first, other slopes, intercept last)

  • vcov: adjusted variance-covariance matrix for those coefficients

Arguments

Y

numeric response vector, or a formula of the form response ~ regressors

Xhat

numeric matrix of regressors (if Y is numeric); the ML-regressor is column gen_idx

fpr

numeric; estimated false-positive rate of the ML regressor

m

integer; size of the external sample used to estimate the classifier's false-positive rate. Can be set to a large number when the false-positive rate is known exactly

data

data frame (if Y is a formula)

intercept

logical; if TRUE, prepends a column of 1's to Xhat

gen_idx

integer; 1-based index of the ML-generated variable to apply bias correction to. If not specified, defaults to the first non-intercept variable

...

unused

Usage Options

Option 1: Formula Interface

  • Y: A formula, e.g. log(salary) ~ wfh_wham + soc_2021_2 + employment_type_name

  • data: Data frame containing the variables referenced in the formula

Option 2: Array Interface

  • Y: Response variable vector

  • Xhat: Design matrix of covariates
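The array interface can be sketched with simulated inputs as follows. The data, the variable names, and the fpr and m values are placeholders chosen for illustration; in practice fpr and m come from an external validation sample. The sketch assumes that Xhat is passed without an intercept column (since intercept = TRUE prepends one) and that the returned object exposes vcov as a list component, as the Value section suggests. The call is guarded so the block still runs when MLBC is not installed.

```r
set.seed(123)
n <- 1000

x_true <- rbinom(n, 1, 0.3)               # latent true indicator (illustrative)
flip   <- x_true == 0 & runif(n) < 0.05   # inject some false positives
xhat   <- ifelse(flip, 1, x_true)         # ML-generated binary covariate
z      <- rnorm(n)                        # an ordinary control variable
Y      <- 1 + 2 * x_true + 0.5 * z + rnorm(n)

# Design matrix without an intercept column; the ML-generated
# variable is column 1, matching gen_idx = 1.
Xhat <- cbind(xhat = xhat, z = z)

if (requireNamespace("MLBC", quietly = TRUE)) {
  fit <- MLBC::ols_bcm(Y, Xhat,
                       fpr = 0.05,        # placeholder false-positive rate
                       m = 1000,          # placeholder validation sample size
                       intercept = TRUE,
                       gen_idx = 1)
  print(coef(fit))                        # ML slope first, intercept last
  print(sqrt(diag(fit$vcov)))             # standard errors from the adjusted vcov
}
```

If the false-positive rate is known exactly, the same call can be made with a very large m, as described under Arguments.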

Examples

# Load the package and the remote work dataset
library(MLBC)
data(SD_data)

# Formula interface
fit_bcm <- ols_bcm(log(salary) ~ wfh_wham + soc_2021_2 + employment_type_name,
                   data = SD_data,
                   fpr = 0.009,  # estimated false positive rate
                   m = 1000)     # validation sample size
summary(fit_bcm)

# Compare with uncorrected OLS
fit_ols <- ols(log(salary) ~ wfh_wham + soc_2021_2 + employment_type_name,
               data = SD_data)

# Display coefficient comparison
data.frame(
  OLS = coef(fit_ols)[1:2],
  BCM = coef(fit_bcm)[1:2]
)
