one_step: One-step maximum likelihood estimation

Description

Estimates the regression model by maximum likelihood in a single step, treating the generated covariate as a noisy proxy for the true latent variable. The method is particularly useful when no estimate of the false positive rate is available. The variance of the estimates is approximated by the inverse Hessian at the optimum.
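The inverse-Hessian variance approximation mentioned above can be illustrated on a much simpler model. The sketch below fits an ordinary normal linear regression by maximum likelihood with base R's optim() and inverts the numerical Hessian of the negative log-likelihood. It is not the likelihood used by one_step; the simulated data and the names negloglik, vcov_hat and se_hat are assumptions made for illustration only.

```r
# Illustrative only: plain normal linear regression fitted by maximum
# likelihood. This is NOT the one_step likelihood.
set.seed(1)
n <- 500
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)

# Negative log-likelihood; theta = (intercept, slope, log sigma)
negloglik <- function(theta) {
  mu <- theta[1] + theta[2] * x
  -sum(dnorm(y, mean = mu, sd = exp(theta[3]), log = TRUE))
}

# hessian = TRUE asks optim() for a numerical Hessian at the optimum
opt <- optim(c(0, 0, 0), negloglik, method = "BFGS", hessian = TRUE)

# The inverse Hessian of the negative log-likelihood approximates the
# variance-covariance matrix of the estimates
vcov_hat <- solve(opt$hessian)
se_hat <- sqrt(diag(vcov_hat))

# Compare with the lm() standard errors for intercept and slope
se_lm <- summary(lm(y ~ x))$coefficients[, "Std. Error"]
print(cbind(mle = se_hat[1:2], lm = se_lm))
```

The two columns should agree closely; small differences come from the numerical Hessian and from the degrees-of-freedom correction that lm() applies to the residual variance.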

Usage

one_step(
  Y,
  Xhat = NULL,
  homoskedastic = FALSE,
  distribution = c("normal", "t", "laplace", "gamma", "beta"),
  nu = 4,
  gshape = 2,
  gscale = 1,
  ba = 2,
  bb = 2,
  intercept = TRUE,
  gen_idx = 1,
  data = parent.frame(),
  ...
)
# S3 method for default
one_step(
  Y,
  Xhat,
  homoskedastic = FALSE,
  distribution = c("normal", "t", "laplace", "gamma", "beta"),
  nu = 4,
  gshape = 2,
  gscale = 1,
  ba = 2,
  bb = 2,
  intercept = TRUE,
  gen_idx = 1,
  ...
)
# S3 method for formula
one_step(
  Y,
  Xhat = NULL,
  homoskedastic = FALSE,
  distribution = c("normal", "t", "laplace", "gamma", "beta"),
  nu = 4,
  gshape = 2,
  gscale = 1,
  ba = 2,
  bb = 2,
  intercept = TRUE,
  gen_idx = 1,
  data = parent.frame(),
  ...
)

Value

An object of classes mlbc_fit and mlbc_onestep with components:

coef: estimated regression coefficients
vcov: variance-covariance matrix
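Standard errors and Wald intervals follow from coef and vcov in the usual way. The sketch below uses a hypothetical stand-in list with made-up numbers rather than a real fit, and it assumes the two components can be reached with `$`, which this page does not state explicitly.

```r
# Hypothetical stand-in for a fitted object: a list with `coef` and `vcov`.
# The numbers are made up for illustration, not produced by one_step().
nm <- c("Intercept", "x_generated")
fit <- list(
  coef = c(Intercept = 1.20, x_generated = 0.35),
  vcov = matrix(c(0.010, 0.002,
                  0.002, 0.040),
                nrow = 2, dimnames = list(nm, nm))
)

se <- sqrt(diag(fit$vcov))   # standard errors from the diagonal of vcov
z  <- fit$coef / se          # Wald z-statistics
ci <- cbind(lower = fit$coef - qnorm(0.975) * se,
            upper = fit$coef + qnorm(0.975) * se)

print(round(cbind(estimate = fit$coef, se = se, z = z, ci), 3))
```

In practice summary() on the fitted object, as used in the Examples, is the more direct route; the manual computation is only meant to show how the two components relate.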

Arguments

Y: numeric response vector, or a formula with the response on the left-hand side (as in the Examples, e.g. log(salary) ~ wfh_wham + soc_2021_2)
Xhat: numeric matrix of regressors (if Y is numeric)
homoskedastic: logical; if TRUE, assumes a common error variance; otherwise, the error variance is allowed to vary with the true latent binary variable
distribution: character; distribution for error terms. One of "normal", "t", "laplace", "gamma", "beta"
nu: numeric; degrees of freedom (for Student-t distribution)
gshape: numeric; shape parameter (for Gamma distribution)
gscale: numeric; scale parameter (for Gamma distribution)
ba: numeric; alpha parameter (for Beta distribution)
bb: numeric; beta parameter (for Beta distribution)
intercept: logical; if TRUE, prepend an intercept column to Xhat
gen_idx: integer; index (1-based) of the binary ML-generated variable. If not specified, defaults to the first non-intercept variable
data: data frame (if Y is a formula)
...: unused
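The vector default for distribution follows a common R idiom in which the first element is used when the argument is omitted. The sketch below shows that idiom with base R's match.arg(); whether one_step resolves the argument this way internally is an assumption, and pick_distribution is a hypothetical helper, not part of the package.

```r
# Hypothetical helper showing how a vector default is usually resolved.
pick_distribution <- function(distribution = c("normal", "t", "laplace",
                                               "gamma", "beta")) {
  # match.arg() returns the first choice when the argument is missing and
  # raises an error for values outside the allowed set
  match.arg(distribution)
}

print(pick_distribution())           # argument omitted
print(pick_distribution("laplace"))  # explicit choice

# An unsupported value raises an error, captured here with try()
bad <- try(pick_distribution("cauchy"), silent = TRUE)
print(inherits(bad, "try-error"))
```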

Usage Options

Option 1: Formula Interface

Y: A formula with the response on the left-hand side, e.g. log(salary) ~ wfh_wham + soc_2021_2
data: Data frame containing the variables referenced in the formula

Option 2: Array Interface

Y: Response variable vector
Xhat: Design matrix of covariates
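The Examples section covers only the formula interface, so here is a minimal sketch of the array interface on simulated data. The data-generating process, the 10% misclassification rate and all variable names are assumptions for illustration. The call is guarded because it assumes one_step is exported by an installed package named MLBC, which this page does not state.

```r
# Simulated inputs; all names and values here are illustrative.
set.seed(42)
n <- 1000
x_true <- rbinom(n, 1, 0.4)                      # latent binary variable
flip   <- rbinom(n, 1, 0.1)                      # 10% misclassification
x_hat  <- ifelse(flip == 1, 1 - x_true, x_true)  # ML-generated proxy
w      <- rnorm(n)                               # an ordinary control
Y      <- 1 + 0.5 * x_true + 0.3 * w + rnorm(n)

# Generated variable goes in column 1 to match gen_idx = 1. No intercept
# column is included because intercept = TRUE prepends one.
Xhat <- cbind(x_hat = x_hat, w = w)

# Assumption: the function lives in an installed package called MLBC.
if (requireNamespace("MLBC", quietly = TRUE)) {
  fit_array <- MLBC::one_step(Y, Xhat, intercept = TRUE, gen_idx = 1)
  print(summary(fit_array))
}
```

Placing the generated variable first keeps the default gen_idx = 1 valid; if it sits in another column, gen_idx should be changed accordingly.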

Examples


# Load the remote work dataset
data(SD_data)

# Basic one-step estimation
fit_onestep <- one_step(log(salary) ~ wfh_wham + soc_2021_2 + employment_type_name,
                        data = SD_data)
summary(fit_onestep)

# With different error distribution
fit_t <- one_step(log(salary) ~ wfh_wham + soc_2021_2,
                  data = SD_data,
                  distribution = "t",
                  nu = 4)
summary(fit_t)

# Homoskedastic errors
fit_homo <- one_step(log(salary) ~ wfh_wham + soc_2021_2,
                     data = SD_data,
                     homoskedastic = TRUE)
summary(fit_homo)
