
PhaseGMM (version 0.1.0)

eiv_mlr: Linear Regression with Errors-in-Variables Using Replicated Measurements

Description

Fits a linear regression model in the presence of measurement error in covariates using replicated measurements and a combination of phase-function estimation and generalized method of moments (GMM).

Usage

eiv_mlr(
  formula,
  data,
  weight_method = c("uniform", "minimax", "quasi-likelihood"),
  B = 100,
  t_grid_length = 1000
)

Value

An object of class "eiv_mlr" containing:

coef

Estimated regression coefficients.

vcov

Estimated variance-covariance matrix.

se

Standard errors of the estimates.

zvalue

Z-statistics for hypothesis testing.

pvalue

Two-sided p-values.

fitted

Fitted values at the unit level.

method

Estimation method used (GMM or quadratic fallback).

n

Number of statistical units.

Standard methods such as summary(), coef(), vcov(), confint(), predict(), and residuals() are available for objects of this class.
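As a rough guide to how the se, zvalue, and pvalue components relate to coef and vcov, the sketch below applies the conventional Wald definitions in base R. The estimates and covariance matrix are hypothetical values made up for illustration, and the package may compute these quantities differently in detail.

```r
# Hypothetical estimates and covariance matrix, for illustration only.
est <- c("(Intercept)" = 1.02, W1 = 1.95, Z1 = -0.48)
V <- diag(c(0.04, 0.09, 0.01))
dimnames(V) <- list(names(est), names(est))

se <- sqrt(diag(V))                 # standard errors
z  <- est / se                      # Wald z-statistics
p  <- 2 * pnorm(-abs(z))            # two-sided p-values
ci <- cbind(lower = est - qnorm(0.975) * se,
            upper = est + qnorm(0.975) * se)  # 95% Wald intervals

round(cbind(estimate = est, se = se, z = z, p = p, ci), 4)
```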

Arguments

formula

A symbolic description of the model to be fitted. Error-prone covariates must be wrapped in W() and error-free covariates must be wrapped in Z().

The general form is:


    y ~ W(W1 + W2 + ...) + Z(Z1 + Z2 + ...)
  

An intercept is included automatically unless removed explicitly.

data

A data frame containing the response variable and all covariates appearing in formula. Each row corresponds to one replicate measurement of a statistical unit.

The data frame must contain:

  • A column named unit identifying statistical units.

  • One row per unit, or several rows per unit when replicated measurements exist.

  • One column for each error-prone covariate (appearing in W()).

  • One column for each error-free covariate (appearing in Z()).

Replicated measurements are represented by multiple rows sharing the same unit identifier. Error-free covariates and the response should be constant within each unit.
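Before fitting, it can help to check that a data frame follows this layout. The following base-R sketch uses a made-up toy data frame and a hypothetical helper, constant_within_unit(), which is not part of the package.

```r
# Toy long-format data: 3 units; unit 3 has a single row (no replicate).
toy <- data.frame(
  unit = c(1, 1, 2, 2, 3),
  y    = c(2.1, 2.1, 0.4, 0.4, 1.3),
  W1   = c(0.9, 1.2, -0.3, 0.1, 0.5),   # error-prone: varies within unit
  Z1   = c(5, 5, 7, 7, 6)               # error-free: constant within unit
)

# Hypothetical helper: TRUE if a column takes a single value within every unit.
constant_within_unit <- function(x, unit) {
  all(tapply(x, unit, function(v) length(unique(v)) == 1))
}

stopifnot("unit" %in% names(toy))
constant_within_unit(toy$y,  toy$unit)   # response is constant within units
constant_within_unit(toy$Z1, toy$unit)   # error-free covariate is constant
constant_within_unit(toy$W1, toy$unit)   # replicates differ, so this is FALSE
table(toy$unit)                          # rows (replicates) per unit
```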

weight_method

Character string specifying the observation weighting method used in estimation. One of:

"uniform"

Uniform weights across observations.

"minimax"

Minimax optimal weights.

"quasi-likelihood"

Quasi-likelihood-based weights (recommended).
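The usage signature lists all three choices as the default value. If the function resolves the argument with match.arg(), which is the usual R idiom but an assumption here, then omitting weight_method selects the first entry, "uniform". The sketch below shows that behaviour with a hypothetical stand-in function, pick_weight(); passing weight_method explicitly avoids any ambiguity.

```r
# Hypothetical stand-in for the argument handling; not the package's code.
pick_weight <- function(weight_method = c("uniform", "minimax", "quasi-likelihood")) {
  match.arg(weight_method)
}

pick_weight()                     # first choice is used when omitted
pick_weight("quasi-likelihood")   # explicit request
pick_weight("mini")               # unique partial matches are accepted
```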

B

Integer specifying the number of bootstrap replications used to estimate the GMM weighting matrix. Defaults to 100.

t_grid_length

Integer specifying the number of frequency grid points used in phase-function integration. Larger values improve numerical accuracy at the cost of computation time. Defaults to 1000.
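To give a sense of what a frequency grid is used for, the sketch below evaluates an empirical phase function (the empirical characteristic function scaled to unit modulus) on a small grid. It is a conceptual illustration only: the helper names ecf() and phase(), the grid range, and the simulated data are assumptions, and the package's internal computation is not documented here. When the measurement error is symmetric with a positive characteristic function, the phase function of the observed covariate matches that of the latent one in the population, so the two empirical versions differ only by sampling noise.

```r
set.seed(42)
n <- 500
x <- rexp(n) - 1                 # asymmetric latent covariate
u <- rnorm(n, sd = 0.5)          # symmetric measurement error
w <- x + u                       # observed, error-prone covariate

# Empirical characteristic function evaluated at each grid point.
ecf <- function(v, t_grid) {
  vapply(t_grid, function(freq) mean(exp(1i * freq * v)), complex(1))
}

# Empirical phase function: the ECF scaled to unit modulus.
phase <- function(v, t_grid) {
  phi <- ecf(v, t_grid)
  phi / Mod(phi)
}

t_grid <- seq(0, 2, length.out = 20)   # 20 frequency grid points
rho_x <- phase(x, t_grid)
rho_w <- phase(w, t_grid)

max(Mod(rho_x - rho_w))   # discrepancy between latent and observed phase
```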

Details

The function provides an lm-like interface while internally handling replicated error-prone covariates, measurement error correction, and robust variance estimation.

It implements a measurement-error-corrected linear regression estimator for models with replicated error-prone covariates. When fewer than two units contain replicated measurements, the function automatically falls back to a quadratic (identity-weight) estimator, so the model remains estimable even in the absence of replication.
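To illustrate why a correction is needed at all, the sketch below shows the attenuation of a naive least-squares slope when a covariate is measured with error, together with the classical method-of-moments correction based on replicates. This is not the phase-function/GMM estimator implemented by eiv_mlr; it is a simpler, well-known technique used here only to make the problem concrete, and all data are simulated.

```r
set.seed(123)
n <- 2000; J <- 2
x <- rnorm(n)                                   # true covariate (unobserved)
W <- matrix(x + rnorm(n * J, sd = 0.5), n, J)   # J noisy replicates per unit
y <- 1 + 2 * x + rnorm(n)                       # true slope is 2

w_bar <- rowMeans(W)
naive <- unname(coef(lm(y ~ w_bar))[2])         # attenuated toward zero

# Error variance of w_bar, estimated from within-unit spread of replicates.
s2_u_bar <- mean(apply(W, 1, var)) / J
reliability <- (var(w_bar) - s2_u_bar) / var(w_bar)
corrected <- naive / reliability                # classical moment correction

c(naive = naive, corrected = corrected)
```

The naive slope falls below the true value of 2, while the corrected slope is closer to it. The moment correction relies on second moments only, whereas the approach documented here is described as handling unknown measurement error distributions.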

The estimation procedure:

  1. Aggregates replicated measurements into a structured array.

  2. Uses phase-function estimating equations to correct for unknown measurement error distributions.

  3. Combines moment conditions via GMM when sufficient replication information is available.

  4. Automatically switches to a quadratic (identity-weight) estimator when fewer than two statistical units contain replicated measurements.
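Steps 1 and 4 can be pictured with a small base-R sketch that reshapes long-format replicates into a unit-by-replicate matrix and counts how many units are replicated. The layout (one matrix per covariate, padded with NA) is an assumption made for illustration; the package's actual internal array structure is not documented here.

```r
# Toy long-format data with unequal numbers of replicates per unit.
long <- data.frame(
  unit = c("a", "a", "b", "c", "c", "c"),
  W1   = c(0.9, 1.2, -0.3, 0.5, 0.7, 0.4)
)

reps  <- split(long$W1, long$unit)          # one numeric vector per unit
J_max <- max(lengths(reps))

# n x J_max matrix; units with fewer replicates are padded with NA.
W_mat <- t(vapply(reps, function(v) c(v, rep(NA_real_, J_max - length(v))),
                  numeric(J_max)))

n_replicated <- sum(rowSums(!is.na(W_mat)) >= 2)
n_replicated          # units with at least two replicates
n_replicated < 2      # TRUE would correspond to the documented fallback case
```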

Variance estimation is performed using a sandwich estimator, with the GMM weighting matrix estimated via a cluster bootstrap over statistical units.
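A cluster bootstrap over units resamples whole units rather than individual rows, so that replicates of the same unit stay together. The sketch below shows one way to do this in base R; cluster_resample() is a hypothetical helper, and how the package turns bootstrap draws into the GMM weighting matrix is not shown because it is not described here.

```r
set.seed(7)
dat <- data.frame(
  unit = rep(1:5, each = 2),
  W1   = rnorm(10)
)

# One cluster-bootstrap draw: sample units with replacement and keep
# every row of each sampled unit together.
cluster_resample <- function(data, unit_col = "unit") {
  ids   <- unique(data[[unit_col]])
  draw  <- ids[sample.int(length(ids), replace = TRUE)]
  parts <- lapply(seq_along(draw), function(k) {
    rows <- data[data[[unit_col]] == draw[k], , drop = FALSE]
    rows$boot_unit <- k        # relabel so repeated draws stay distinct
    rows
  })
  do.call(rbind, parts)
}

B <- 10
boot_means <- vapply(seq_len(B),
                     function(b) mean(cluster_resample(dat)$W1),
                     numeric(1))
var(boot_means)   # bootstrap variance of a simple statistic
```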

Examples

## ------------------------------------------
## Small reproducible example (B is kept deliberately small for speed;
## it is too small for reliable inference)
## ------------------------------------------

set.seed(1)

n  <- 30
J  <- 2

unit <- rep(1:n, each = J)

W_true <- rnorm(n)
W_obs  <- rep(W_true, each = J) + rnorm(n * J, sd = 0.5)

Z1 <- rep(rnorm(n), each = J)
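# The response is generated once per unit and repeated across replicates,
# so that it is constant within each unit as the data argument requires.
y  <- rep(1 + 2 * W_true - 0.5 * Z1[seq(1, n * J, by = J)] + rnorm(n),
          each = J)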

sim_data <- data.frame(
  unit = unit,
  y = y,
  W1 = W_obs,
  Z1 = Z1
)

# For speed reasons, we use a very small number of bootstrap samples
fit <- eiv_mlr(
  y ~ W(W1) + Z(Z1),
  data = sim_data,
  B = 10,
  t_grid_length = 20
)

coef(fit)
summary(fit)


## ------------------------------------------
## Additional examples (not run during checks)
## ------------------------------------------

# \donttest{
## ------------------------------------------
## Example using included dataset
## ------------------------------------------

fit <- eiv_mlr(
  bmi ~ W(energy + protein + fat) + Z(age_in_month),
  data = dietary_white_women,
  weight_method = "minimax",
  B = 100,
  t_grid_length = 200
)

summary(fit)
confint(fit)


## ------------------------------------------
## Simulated example with replication
## ------------------------------------------

set.seed(1)

n  <- 200
J  <- 2

unit <- rep(1:n, each = J)

W_true <- rnorm(n)
W_obs  <- rep(W_true, each = J) + rnorm(n * J, sd = 0.5)

Z1 <- rep(rnorm(n), each = J)
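# The response is generated once per unit and repeated across replicates,
# so that it is constant within each unit as the data argument requires.
y  <- rep(1 + 2 * W_true - 0.5 * Z1[seq(1, n * J, by = J)] + rnorm(n),
          each = J)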

sim_data <- data.frame(
  unit = unit,
  y = y,
  W1 = W_obs,
  Z1 = Z1
)

fit_rep <- eiv_mlr(
  y ~ W(W1) + Z(Z1),
  data = sim_data,
  B = 20
)

summary(fit_rep)


## ------------------------------------------
## Simulated example without replication
## ------------------------------------------

sim_norep <- sim_data[!duplicated(sim_data$unit), ]

fit_norep <- eiv_mlr(
  y ~ W(W1) + Z(Z1),
  data = sim_norep,
  B = 20
)

summary(fit_norep)
# }
