lpmec: lpmec

Description

Implements a bootstrapped analysis for latent variable models with measurement error correction.

Usage

lpmec(
  Y,
  observables,
  observables_groupings = colnames(observables),
  orientation_signs = NULL,
  make_observables_groupings = FALSE,
  n_boot = 32L,
  n_partition = 10L,
  boot_basis = 1:length(Y),
  return_intermediaries = TRUE,
  ordinal = FALSE,
  estimation_method = "em",
  latent_estimation_fn = NULL,
  mcmc_control = list(backend = "pscl", n_samples_warmup = 500L, n_samples_mcmc = 1000L,
    batch_size = 512L, chain_method = "parallel", subsample_method = "full",
    anchor_parameter_id = NULL, n_thin_by = 1L, n_chains = 2L),
  conda_env = "lpmec",
  conda_env_required = FALSE
)

Value

A list containing various estimates and statistics (in snake_case):

ols_coef: Coefficient from naive OLS regression.
ols_se: Standard error of naive OLS coefficient.
ols_tstat: T-statistic of naive OLS coefficient.
iv_coef: Coefficient from instrumental variable (IV) regression.
iv_se: Standard error of IV regression coefficient.
iv_tstat: T-statistic of IV regression coefficient.
corrected_iv_coef: IV regression coefficient corrected for measurement error.
corrected_iv_se: Standard error of the corrected IV coefficient (currently NA).
corrected_iv_tstat: T-statistic of the corrected IV coefficient.
var_est: Estimated variance of the measurement error (split-half variance).
corrected_ols_coef: OLS coefficient corrected for measurement error.
corrected_ols_se: Standard error of the corrected OLS coefficient (currently NA).
corrected_ols_tstat: T-statistic of the corrected OLS coefficient (currently NA).
corrected_ols_coef_alt: Alternative corrected OLS coefficient (if applicable).
corrected_ols_se_alt: Standard error for the alternative corrected OLS coefficient (if applicable).
corrected_ols_tstat_alt: T-statistic for the alternative corrected OLS coefficient (if applicable).
bayesian_ols_coef_outer_normed: Posterior mean of the OLS coefficient under MCMC, after normalizing by the overall sample standard deviation.
bayesian_ols_se_outer_normed: Posterior standard error corresponding to bayesian_ols_coef_outer_normed.
bayesian_ols_tstat_outer_normed: T-statistic for bayesian_ols_coef_outer_normed.
bayesian_ols_coef_inner_normed: Posterior mean of the OLS coefficient under MCMC, after normalizing each posterior draw individually.
bayesian_ols_se_inner_normed: Posterior standard error corresponding to bayesian_ols_coef_inner_normed.
bayesian_ols_tstat_inner_normed: T-statistic for bayesian_ols_coef_inner_normed.
m_stage_1_erv: Extreme robustness value (ERV) for the first-stage regression (x_est2 on x_est1), if computed.
m_reduced_erv: ERV for the reduced model (Y on x_est1), if computed.
x_est1: First set of latent variable estimates.
x_est2: Second set of latent variable estimates.

Arguments

Y

A vector of observed outcome variables

observables

A matrix of observable indicators used to estimate the latent variable

observables_groupings

A vector specifying groupings for the observable indicators. Default is column names of observables.

orientation_signs

(optional) A numeric vector of length equal to the number of columns in `observables`, containing 1 or -1 to indicate the desired orientation of each column. If provided, each column of `observables` will be oriented by this sign before analysis. Default is NULL (no orientation applied).

make_observables_groupings

Logical. If TRUE, creates dummy variables for each level of the observable indicators. Default is FALSE.

n_boot

Integer. Number of bootstrap iterations. Default is 32.

n_partition

Integer. Number of partitions for each bootstrap iteration. Default is 10.

boot_basis

Vector of indices or grouping variable for stratified bootstrap. Default is 1:length(Y).
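
To make the role of a grouping variable concrete, here is a minimal base-R sketch of resampling whole groups with replacement. It assumes that resampling operates on the unique values of boot_basis, which is one plausible reading of the description and not a confirmed account of the package internals. The grouping values and the names units, drawn, and boot_idx are hypothetical.

```r
set.seed(7)

# Hypothetical grouping: 12 observations that belong to 4 groups
boot_basis <- rep(c("a", "b", "c", "d"), each = 3)

# Draw groups (not individual rows) with replacement
units <- unique(boot_basis)
drawn <- sample(units, length(units), replace = TRUE)

# Collect the row indices of every drawn group, repeats included
boot_idx <- unlist(lapply(drawn, function(u) which(boot_basis == u)))
boot_idx
```

With the default boot_basis = 1:length(Y), every observation is its own group, so the same logic reduces to an ordinary row-level bootstrap.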

return_intermediaries

Logical. If TRUE, returns intermediate results. Default is TRUE.

ordinal

Logical indicating whether the observable indicators are ordinal (TRUE) or binary (FALSE). Default is FALSE.

estimation_method

Character specifying the estimation approach. Options include:

"em" (default): Uses expectation-maximization via emIRT package. Supports both binary (via emIRT::binIRT) and ordinal (via emIRT::ordIRT) indicators.
"pca": First principal component of observables.
"averaging": Uses feature averaging.
"mcmc": Markov Chain Monte Carlo estimation using either pscl::ideal (R backend) or numpyro (Python backend).
"mcmc_joint": Joint Bayesian model that simultaneously estimates the latent variables and the outcome relationship using numpyro.
"mcmc_overimputation": Two-stage MCMC approach with measurement error correction via over-imputation.
"custom": Latent estimation is performed using the function supplied in latent_estimation_fn.

latent_estimation_fn

Custom function for estimating latent trait from observables if estimation_method="custom" (optional). The function should accept a matrix of observables (rows are observations) and return a numeric vector of length equal to the number of observations.
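
As an illustration of the contract described above (matrix in, one numeric score per observation out), the following sketch defines a custom estimator based on the first principal component. The function name pc1_latent and the simulated matrix obs are hypothetical, and the sketch only checks the input/output shape; it does not claim to reproduce how lpmec calls the function internally.

```r
# Hypothetical custom estimator: score each observation by the first
# principal component of the standardized observables.
pc1_latent <- function(observables) {
  X <- as.matrix(observables)
  # Drop constant columns, which cannot be scaled to unit variance
  keep <- apply(X, 2, function(col) stats::var(col) > 0)
  X <- X[, keep, drop = FALSE]
  scores <- stats::prcomp(X, center = TRUE, scale. = TRUE)$x[, 1]
  as.numeric(scores)
}

# Check the contract on simulated binary indicators
set.seed(1)
obs <- matrix(sample(c(0, 1), 200 * 6, replace = TRUE), ncol = 6)
x_hat <- pc1_latent(obs)
length(x_hat) == nrow(obs)
```

A function like this would be supplied as latent_estimation_fn = pc1_latent together with estimation_method = "custom".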

mcmc_control

A list of parameter settings used when an MCMC-based estimation method is selected.

backend: Character string indicating the MCMC engine to use. Valid options are "pscl" (default, uses the R-based pscl::ideal function) or "numpyro" (uses the Python numpyro package via reticulate).

n_samples_warmup

Integer specifying the number of warm-up (burn-in) iterations before samples are collected. Default is 500.

n_samples_mcmc

Integer specifying the number of post-warmup MCMC iterations to retain. Default is 1000.

chain_method

Character string passed to numpyro specifying how to run multiple chains. Options: "parallel" (default), "sequential", or "vectorized".

n_thin_by

Integer indicating the thinning factor for MCMC samples. Default is 1.

n_chains

Integer specifying the number of parallel MCMC chains to run. Default is 2.
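
The control list can be built by starting from the defaults shown in Usage and overriding selected entries. The sketch below uses base R's modifyList for this; the object names default_control and my_control are hypothetical, and the override values are arbitrary illustrations. Whether lpmec fills in missing entries from a partial list is not stated here, so the sketch passes a complete list.

```r
# Defaults copied from the Usage section
default_control <- list(
  backend = "pscl", n_samples_warmup = 500L, n_samples_mcmc = 1000L,
  batch_size = 512L, chain_method = "parallel", subsample_method = "full",
  anchor_parameter_id = NULL, n_thin_by = 1L, n_chains = 2L
)

# Override only the entries of interest; all others keep their defaults
my_control <- modifyList(
  default_control,
  list(n_samples_warmup = 1000L, n_samples_mcmc = 2000L, n_chains = 4L)
)
str(my_control)
```

The resulting list would then be passed as mcmc_control = my_control alongside an MCMC-based estimation_method.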

conda_env

A character string specifying the name of the conda environment to use via reticulate. Default is "lpmec".

conda_env_required

A logical indicating whether the specified conda environment must be strictly used. If TRUE, an error is thrown if the environment is not found. Default is FALSE.

Details

This function implements a bootstrapped latent variable analysis with measurement error correction. It performs multiple bootstrap iterations, each with multiple partitions. For each partition, it calls the lpmec_onerun function to estimate latent variables and apply various correction methods. The results are then aggregated across partitions and bootstrap iterations to produce final estimates and bootstrap standard errors.
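
The nesting of bootstrap iterations and partitions can be sketched in base R as follows. This is a simplified stand-in under stated assumptions, not the package's lpmec_onerun: the helper one_partition is hypothetical, latent scores are estimated by simple averaging over a random half of the indicators, and the IV coefficient is computed as the reduced-form covariance divided by the first-stage covariance, following the x_est1 and x_est2 roles described under Value. The data are simulated.

```r
set.seed(42)
n <- 500; k <- 10
x_true <- rnorm(n)                    # simulated latent trait
Y <- 0.5 * x_true + rnorm(n)          # simulated outcome
# Binary indicators driven by the latent trait
observables <- sapply(seq_len(k), function(j) rbinom(n, 1, plogis(x_true)))

# Hypothetical stand-in for a single partition (NOT lpmec_onerun)
one_partition <- function(Y, observables) {
  cols <- sample(ncol(observables))   # random split of the indicators
  half <- seq_len(floor(ncol(observables) / 2))
  x_est1 <- as.numeric(scale(rowMeans(observables[, cols[half], drop = FALSE])))
  x_est2 <- as.numeric(scale(rowMeans(observables[, cols[-half], drop = FALSE])))
  ols <- cov(Y, x_est1) / var(x_est1)          # naive OLS slope
  iv  <- cov(Y, x_est1) / cov(x_est2, x_est1)  # reduced form / first stage
  c(ols_coef = ols, iv_coef = iv)
}

n_boot <- 20; n_partition <- 5
boot_est <- t(replicate(n_boot, {
  idx <- sample(n, replace = TRUE)    # resample observations
  # Average the estimates over partitions within this bootstrap draw
  rowMeans(replicate(n_partition,
                     one_partition(Y[idx], observables[idx, , drop = FALSE])))
}))

est <- colMeans(boot_est)             # aggregate across bootstrap draws
se  <- apply(boot_est, 2, sd)         # bootstrap standard errors
rbind(est, se)
```

Averaging over partitions reduces the dependence of the estimate on any single random split of the indicators, while the spread across bootstrap draws supplies the standard errors.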

References

Jerzak, C. T. and Jessee, S. A. (2025). Attenuation Bias with Latent Predictors. arXiv:2507.22218 [stat.AP]. https://arxiv.org/abs/2507.22218

Examples


# \donttest{
# Generate some example data
set.seed(123)
Y <- rnorm(1000)
observables <- as.data.frame(matrix(sample(c(0,1), 1000*10, replace = TRUE), ncol = 10))

# Run the bootstrapped analysis
results <- lpmec(Y = Y,
                 observables = observables,
                 n_boot = 10,    # small value for illustration only
                 n_partition = 5 # small value for illustration only
                 )

# Inspect the returned estimates (including corrected_iv_coef and corrected_iv_se)
print(results)
# }
