MLBC (version 0.2.1)

ols_bca: Additive bias-corrected OLS (BCA)

Description

Applies an additive bias correction to OLS regressions that include a binary covariate generated by an AI/ML classifier. The method requires an external estimate of the classifier's false-positive rate (fpr). Standard errors are adjusted to account for the sampling uncertainty in that estimate.

Usage

ols_bca(
  Y,
  Xhat = NULL,
  fpr,
  m,
  data = parent.frame(),
  intercept = TRUE,
  gen_idx = 1,
  ...
)

# S3 method for default
ols_bca(
  Y,
  Xhat,
  fpr,
  m,
  data = parent.frame(),
  intercept = TRUE,
  gen_idx = 1,
  ...
)

# S3 method for formula
ols_bca(
  Y,
  Xhat = NULL,
  fpr,
  m,
  data = parent.frame(),
  intercept = TRUE,
  gen_idx = 1,
  ...
)

Value

An object of class mlbc_fit and mlbc_bca with:

  • coef: bias-corrected coefficient estimates (ML-slope first, other slopes, intercept last)

  • vcov: adjusted variance-covariance matrix for those coefficients
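
Because the returned object exposes coef and vcov, standard errors and Wald-type intervals can be derived by hand. The sketch below is illustrative only: wald_table is a hypothetical helper (not part of MLBC), and the object it is applied to is a stand-in list built from an ordinary lm() fit on simulated data, so that the code runs without the package installed. It assumes coef is a numeric vector and vcov a conforming square matrix.

```r
# Hypothetical helper: Wald summary from any object with $coef and $vcov.
wald_table <- function(fit, level = 0.95) {
  est <- as.numeric(fit$coef)
  se  <- sqrt(diag(fit$vcov))          # standard errors from the diagonal
  z   <- qnorm(1 - (1 - level) / 2)    # normal critical value
  data.frame(
    estimate  = est,
    se        = se,
    lower     = est - z * se,
    upper     = est + z * se,
    row.names = names(fit$coef)
  )
}

# Stand-in object (NOT an actual mlbc_fit): simulated data, plain lm() fit.
set.seed(1)
n <- 200
x <- rbinom(n, 1, 0.3)
y <- 1 + 0.5 * x + rnorm(n)
ols <- lm(y ~ x)
stand_in <- list(coef = coef(ols), vcov = vcov(ols))

tab <- wald_table(stand_in)
print(tab)
```

With a real fit, the same helper would presumably be called as wald_table(fit_bca), provided the object can be indexed with $coef and $vcov as described above.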

Arguments

Y

numeric response vector, or a formula with the response on the left-hand side (for example, log(salary) ~ wfh_wham)

Xhat

numeric matrix of regressors (if Y is numeric); the ML-regressor is column gen_idx

fpr

numeric; estimated false-positive rate of the ML regressor

m

integer; size of the external sample used to estimate the classifier's false-positive rate. Can be set to a large number when the false-positive rate is known exactly, so that its estimation uncertainty becomes negligible.

data

data frame (if Y is a formula)

intercept

logical; if TRUE, prepends a column of 1's to Xhat

gen_idx

integer; 1-based index of the ML-generated variable to apply bias correction to. If not specified, defaults to the first non-intercept variable

...

unused
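
To see why a large m is appropriate when the false-positive rate is known exactly, it helps to look at how the uncertainty in fpr depends on m. The sketch below assumes that fpr was estimated as a simple binomial proportion from m validation observations, so that its standard error is roughly sqrt(fpr * (1 - fpr) / m). This is an illustrative assumption; the exact adjustment applied inside ols_bca may differ. fpr_se is a hypothetical helper, not a package function.

```r
# Hypothetical helper: standard error of a proportion estimated from m draws.
# Assumption: binomial sampling; not necessarily the formula used inside MLBC.
fpr_se <- function(fpr, m) sqrt(fpr * (1 - fpr) / m)

fpr <- 0.009
m_values <- c(1e3, 1e4, 1e6, 1e9)
se_values <- fpr_se(fpr, m_values)

# The standard error shrinks towards zero as m grows.
print(data.frame(m = m_values, fpr_se = se_values))
```

Under this assumption, choosing a very large m amounts to treating fpr as fixed.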

Usage Options

Option 1: Formula Interface

  • Y: A formula with the response on the left-hand side, e.g. log(salary) ~ wfh_wham + soc_2021_2

  • data: Data frame containing the variables referenced in the formula

Option 2: Array Interface

  • Y: Response variable vector

  • Xhat: Design matrix of covariates
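
With the array interface, the ML-generated regressor is identified by its column position (gen_idx), so it can be safer to look the position up by name than to hard-code it. The following is a minimal sketch on simulated toy data. The column names mirror those in the example dataset, but the values are made up and nothing here calls MLBC itself.

```r
# Toy data; column names mirror the package example, values are simulated.
set.seed(42)
toy <- data.frame(
  soc_2021_2 = factor(sample(c("11", "21", "31"), 50, replace = TRUE),
                      levels = c("11", "21", "31")),
  wfh_wham   = rbinom(50, 1, 0.2)
)

# Build the design matrix and drop the intercept column.
Xhat <- model.matrix(~ soc_2021_2 + wfh_wham, data = toy)[, -1, drop = FALSE]

# Locate the ML-generated column by name rather than by position.
gen_idx <- which(colnames(Xhat) == "wfh_wham")
print(colnames(Xhat))
print(gen_idx)
```

The index found this way would then be passed as gen_idx = gen_idx. Whether the index should count the intercept column added by intercept = TRUE is not stated on this page and is worth checking against the package source.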

Examples

library(MLBC)

# Load the remote work dataset
data(SD_data)

# Formula interface
fit_bca <- ols_bca(log(salary) ~ wfh_wham + soc_2021_2 + employment_type_name,
                   data = SD_data,
                   fpr = 0.009,  # estimated false positive rate
                   m = 1000)     # validation sample size
summary(fit_bca)

# Array interface
Y <- log(SD_data$salary)
Xhat <- model.matrix(~ wfh_wham + soc_2021_2, data = SD_data)[, -1]
fit_bca2 <- ols_bca(Y, Xhat, fpr = 0.009, m = 1000, intercept = TRUE)
summary(fit_bca2)