eiv_mlr: Linear Regression with Errors-in-Variables Using Replicated Measurements

Description

Fits a linear regression model in the presence of measurement error in covariates using replicated measurements and a combination of phase-function estimation and generalized method of moments (GMM).

Usage

eiv_mlr(
  formula,
  data,
  weight_method = c("uniform", "minimax", "quasi-likelihood"),
  B = 100,
  t_grid_length = 1000
)

Value

An object of class "eiv_mlr" containing:

coef: Estimated regression coefficients.
vcov: Estimated variance-covariance matrix.
se: Standard errors of the estimates.
zvalue: Z-statistics for hypothesis testing.
pvalue: Two-sided p-values.
fitted: Fitted values at the unit level.
method: Estimation method used (GMM or quadratic fallback).
n: Number of statistical units.

Standard methods such as summary(), coef(), vcov(), confint(), predict(), and residuals() are available for objects of this class.
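The returned statistics are related in the usual Wald way. The sketch below assumes that standard construction: z = estimate / standard error, a two-sided normal p-value, and normal-quantile 95% intervals. It uses toy numbers rather than output from eiv_mlr(), and we have not verified that confint() uses exactly this formula.

```r
# Toy estimates and covariance matrix (illustrative values, not package output)
coef_hat <- c("(Intercept)" = 1.0, W1 = 2.0, Z1 = -0.5)
vcov_hat <- diag(c(0.04, 0.09, 0.01))

se     <- sqrt(diag(vcov_hat))        # standard errors from the vcov diagonal
zvalue <- coef_hat / se               # Wald z-statistics for H0: coefficient = 0
pvalue <- 2 * pnorm(-abs(zvalue))     # two-sided p-values under N(0, 1)

# Assumed normal-approximation 95% confidence intervals
z_crit <- qnorm(0.975)
ci <- cbind(lower = coef_hat - z_crit * se, upper = coef_hat + z_crit * se)

print(round(cbind(estimate = coef_hat, se, zvalue, pvalue), 4))
print(ci)
```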

Arguments

formula

A symbolic description of the model to be fitted. Error-prone covariates must be wrapped in W() and error-free covariates must be wrapped in Z().

The general form is:


    y ~ W(W1 + W2 + ...) + Z(Z1 + Z2 + ...)

An intercept is included automatically unless removed explicitly.
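To make the W()/Z() convention concrete, here is a minimal base-R sketch that splits such a formula into the response, the error-prone variables and the error-free variables. The helper parse_eiv_formula() is hypothetical: it is not the parser eiv_mlr() uses internally, and it ignores intercept handling.

```r
# Hypothetical helper: split y ~ W(...) + Z(...) into its variable groups
parse_eiv_formula <- function(formula) {
  w_vars <- character(0)
  z_vars <- character(0)
  # Recursively walk the right-hand side, collecting variables inside W() and Z()
  walk <- function(expr) {
    if (!is.call(expr)) return(invisible(NULL))
    fn <- as.character(expr[[1]])
    if (identical(fn, "W")) {
      w_vars <<- c(w_vars, all.vars(expr[[2]]))
    } else if (identical(fn, "Z")) {
      z_vars <<- c(z_vars, all.vars(expr[[2]]))
    } else {
      for (arg in as.list(expr)[-1]) walk(arg)
    }
  }
  walk(formula[[3]])
  list(response = all.vars(formula[[2]]), W = w_vars, Z = z_vars)
}

parts <- parse_eiv_formula(bmi ~ W(energy + protein + fat) + Z(age_in_month))
str(parts)
```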

data

A data frame containing the response variable and all covariates appearing in formula. Each row corresponds to one replicate measurement of a statistical unit.

The data frame must contain:

A column named unit identifying the statistical units.
One row per replicate measurement; a unit with replicated measurements therefore occupies several rows.
One column for each error-prone covariate (appearing in W()).
One column for each error-free covariate (appearing in Z()).

Replicated measurements are represented by multiple rows sharing the same unit identifier. Error-free covariates and the response should be constant within each unit.
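A quick check of a data frame against these requirements, before fitting, could look like the sketch below. check_eiv_data() is a hypothetical helper, not a package function: it counts rows per unit and verifies that the response and the Z() covariates do not vary within a unit.

```r
# Hypothetical helper: summarise replication and within-unit constancy
check_eiv_data <- function(data, response, z_vars) {
  stopifnot("unit" %in% names(data))
  reps <- table(data$unit)  # number of rows (replicates) per unit
  constant <- vapply(c(response, z_vars), function(v) {
    # TRUE when the column takes a single value inside every unit
    all(tapply(data[[v]], data$unit, function(x) length(unique(x)) == 1L))
  }, logical(1))
  list(n_units = length(reps),
       n_replicated = sum(reps >= 2),
       constant_within_unit = constant)
}

# Toy long-format data: unit 1 has 2 replicates, unit 2 has 1, unit 3 has 3
d <- data.frame(
  unit = c(1, 1, 2, 3, 3, 3),
  y    = c(2.1, 2.1, 0.4, 1.7, 1.7, 1.7),
  W1   = c(0.9, 1.2, -0.3, 0.5, 0.8, 0.6),
  Z1   = c(1, 1, 0, 1, 1, 1)
)
res <- check_eiv_data(d, "y", "Z1")
str(res)
```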

weight_method

Character string specifying the observation weighting method used in estimation. One of:

"uniform": Uniform weights across observations.
"minimax": Minimax optimal weights.
"quasi-likelihood": Quasi-likelihood-based weights (recommended).

B

Integer specifying the number of bootstrap replications used to estimate the GMM weighting matrix. Defaults to 100.

t_grid_length

Integer specifying the number of frequency grid points used in phase-function integration. Larger values improve numerical accuracy at the cost of computation time. Defaults to 1000.
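The accuracy/cost trade-off behind t_grid_length is the usual one for numerical integration on a grid. The sketch below is generic: it applies the trapezoidal rule to a smooth test integrand on [-1, 1] with 20 and 1000 grid points and compares both against integrate(). The integrand, the interval and the trapezoidal rule are illustrative assumptions; this page does not document the quadrature that eiv_mlr() actually uses.

```r
# Trapezoidal rule on an equally spaced grid of `grid_length` points
trap_integrate <- function(f, lower, upper, grid_length) {
  t  <- seq(lower, upper, length.out = grid_length)
  ft <- f(t)
  sum(diff(t) * (head(ft, -1) + tail(ft, -1)) / 2)
}

f <- function(t) cos(t) * (1 - t^2)     # smooth test integrand (assumption)
reference <- integrate(f, -1, 1)$value  # adaptive quadrature as the benchmark

grid_sizes <- c(20, 1000)
abs_error  <- sapply(grid_sizes,
                     function(m) abs(trap_integrate(f, -1, 1, m) - reference))
names(abs_error) <- grid_sizes
print(abs_error)
```

The finer grid costs 50 times as many function evaluations but reduces the error by several orders of magnitude, which is the same trade-off the argument controls.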

Details

The function provides an lm-like interface while internally handling replicated error-prone covariates, measurement error correction, and robust variance estimation.

This function implements a measurement-error-corrected linear regression estimator for models with replicated error-prone covariates. When fewer than two units contain replicated measurements, the function automatically falls back to a quadratic (identity-weight) estimator. This ensures the model remains estimable even in the absence of replication.
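The reason replication helps can be seen with a much simpler estimator than the one implemented here. The sketch below uses a classical method-of-moments (reliability) correction, not the phase-function/GMM approach of eiv_mlr(). It assumes classical additive normal measurement error with a common variance. The naive slope of y on the averaged replicates is attenuated toward zero, and the spread between replicates of the same unit estimates the error variance needed to undo that attenuation.

```r
set.seed(1)
n <- 2000
J <- 2
x <- rnorm(n)                                          # true (unobserved) covariate
w <- matrix(rep(x, J) + rnorm(n * J, sd = 0.5), n, J)  # J noisy replicates per unit
y <- 1 + 2 * x + rnorm(n)                              # true slope is 2

w_bar <- rowMeans(w)

# Naive slope: regress y on the replicate mean, ignoring measurement error
b_naive <- cov(w_bar, y) / var(w_bar)

# Var(w1 - w2) = 2 * sigma_u^2, so half the mean squared difference estimates sigma_u^2
sigma_u2 <- mean((w[, 1] - w[, 2])^2) / 2

# The mean of J replicates carries error variance sigma_u^2 / J; remove it from var(w_bar)
b_corrected <- cov(w_bar, y) / (var(w_bar) - sigma_u2 / J)

print(c(naive = b_naive, corrected = b_corrected))
```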

The estimation procedure:

Aggregates replicated measurements into a structured array.
Uses phase-function estimating equations to correct for unknown measurement error distributions.
Combines moment conditions via GMM when sufficient replication information is available.
Automatically switches to a quadratic (identity-weight) estimator when fewer than two statistical units contain replicated measurements.
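The first and last steps above can be pictured with the sketch below. It assumes the "structured array" is a units x replicates x covariates array padded with NA for units with fewer replicates; that layout and the helper names to_replicate_array() and choose_method() are hypothetical. The decision rule follows the documented threshold of two units with replicated measurements.

```r
# Hypothetical: arrange long-format rows into a units x replicates x covariates array
to_replicate_array <- function(data, w_vars) {
  units <- unique(data$unit)
  rows_by_unit <- split(seq_len(nrow(data)), factor(data$unit, levels = units))
  J_max <- max(lengths(rows_by_unit))
  arr <- array(NA_real_, dim = c(length(units), J_max, length(w_vars)),
               dimnames = list(as.character(units), NULL, w_vars))
  for (i in seq_along(rows_by_unit)) {
    r <- rows_by_unit[[i]]
    arr[i, seq_along(r), ] <- as.matrix(data[r, w_vars, drop = FALSE])
  }
  arr
}

# Documented rule: GMM requires at least two units with two or more replicates
choose_method <- function(arr) {
  n_obs_per_unit <- rowSums(!is.na(arr[, , 1, drop = FALSE]))
  if (sum(n_obs_per_unit >= 2) >= 2) "GMM" else "quadratic"
}

d <- data.frame(unit = c(1, 1, 2, 3, 3, 3),
                W1   = c(0.9, 1.2, -0.3, 0.5, 0.8, 0.6))
arr <- to_replicate_array(d, "W1")
choose_method(arr)
choose_method(to_replicate_array(d[!duplicated(d$unit), ], "W1"))
```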

Variance estimation is performed using a sandwich estimator, with the GMM weighting matrix estimated via a cluster bootstrap over statistical units.
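Resampling whole units rather than individual rows keeps all replicates of a unit together, which preserves the dependence between them. Below is a minimal sketch that estimates a GMM weighting matrix as the inverse of the cluster-bootstrap covariance of a vector of sample moments. The moment function (means of W1 and W1^2) is a stand-in chosen for illustration; the package's actual moments are the phase-function estimating equations, and cluster_bootstrap_weight() is a hypothetical helper.

```r
# Hypothetical helper: weighting matrix = inverse of the cluster-bootstrap covariance
cluster_bootstrap_weight <- function(data, moment_fn, B = 100) {
  units <- unique(data$unit)
  rows_by_unit <- split(seq_len(nrow(data)), factor(data$unit, levels = units))
  draws <- do.call(rbind, lapply(seq_len(B), function(b) {
    # Draw units with replacement and keep all of their replicate rows
    picked <- sample(seq_along(units), length(units), replace = TRUE)
    rows <- unlist(rows_by_unit[picked], use.names = FALSE)
    moment_fn(data[rows, , drop = FALSE])
  }))
  solve(cov(draws))  # B x k matrix of moment draws -> k x k weighting matrix
}

set.seed(1)
n <- 50
J <- 2
sim <- data.frame(unit = rep(1:n, each = J),
                  W1   = rep(rnorm(n), each = J) + rnorm(n * J, sd = 0.5))

# Stand-in moment vector (illustrative only)
moments <- function(d) c(mean(d$W1), mean(d$W1^2))

W_gmm <- cluster_bootstrap_weight(sim, moments, B = 100)
print(W_gmm)
```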

Examples


## ------------------------------------------
## Small reproducible example (B and t_grid_length are deliberately far too small, for speed)
## ------------------------------------------

set.seed(1)

n  <- 30
J  <- 2

unit <- rep(1:n, each = J)

W_true <- rnorm(n)
W_obs  <- rep(W_true, each = J) + rnorm(n * J, sd = 0.5)

Z1_unit <- rnorm(n)
Z1      <- rep(Z1_unit, each = J)

# The response is generated once per unit, so it is constant across replicates
y <- rep(1 + 2 * W_true - 0.5 * Z1_unit + rnorm(n), each = J)

sim_data <- data.frame(
  unit = unit,
  y = y,
  W1 = W_obs,
  Z1 = Z1
)

# For speed reasons, we use a very small number of bootstrap samples
fit <- eiv_mlr(
  y ~ W(W1) + Z(Z1),
  data = sim_data,
  B = 10,
  t_grid_length = 20
)

coef(fit)
summary(fit)


## ------------------------------------------
## Additional examples (not run during checks)
## ------------------------------------------

# \donttest{
## ------------------------------------------
## Example using included dataset
## ------------------------------------------

fit <- eiv_mlr(
  bmi ~ W(energy + protein + fat) + Z(age_in_month),
  data = dietary_white_women,
  weight_method = "minimax",
  B = 100,
  t_grid_length = 200
)

summary(fit)
confint(fit)


## ------------------------------------------
## Simulated example with replication
## ------------------------------------------

set.seed(1)

n  <- 200
J  <- 2

unit <- rep(1:n, each = J)

W_true <- rnorm(n)
W_obs  <- rep(W_true, each = J) + rnorm(n * J, sd = 0.5)

Z1_unit <- rnorm(n)
Z1      <- rep(Z1_unit, each = J)

# The response is generated once per unit, so it is constant across replicates
y <- rep(1 + 2 * W_true - 0.5 * Z1_unit + rnorm(n), each = J)

sim_data <- data.frame(
  unit = unit,
  y = y,
  W1 = W_obs,
  Z1 = Z1
)

fit_rep <- eiv_mlr(
  y ~ W(W1) + Z(Z1),
  data = sim_data,
  B = 20
)

summary(fit_rep)


## ------------------------------------------
## Simulated example without replication
## ------------------------------------------

# Keep one row per unit: no unit is replicated, so eiv_mlr() falls back to the quadratic estimator
sim_norep <- sim_data[!duplicated(sim_data$unit), ]

fit_norep <- eiv_mlr(
  y ~ W(W1) + Z(Z1),
  data = sim_norep,
  B = 20
)

summary(fit_norep)
# }
