Implements bootstrapped analysis for latent variable models with measurement error correction
lpmec(
  Y,
  observables,
  observables_groupings = colnames(observables),
  orientation_signs = NULL,
  make_observables_groupings = FALSE,
  n_boot = 32L,
  n_partition = 10L,
  boot_basis = 1:length(Y),
  return_intermediaries = TRUE,
  ordinal = FALSE,
  estimation_method = "em",
  latent_estimation_fn = NULL,
  mcmc_control = list(backend = "pscl", n_samples_warmup = 500L, n_samples_mcmc = 1000L,
    batch_size = 512L, chain_method = "parallel", subsample_method = "full",
    anchor_parameter_id = NULL, n_thin_by = 1L, n_chains = 2L),
  conda_env = "lpmec",
  conda_env_required = FALSE
)

A list containing various estimates and statistics (in snake_case):
ols_coef: Coefficient from naive OLS regression.
ols_se: Standard error of naive OLS coefficient.
ols_tstat: T-statistic of naive OLS coefficient.
iv_coef: Coefficient from instrumental variable (IV) regression.
iv_se: Standard error of IV regression coefficient.
iv_tstat: T-statistic of IV regression coefficient.
corrected_iv_coef: IV regression coefficient corrected for measurement error.
corrected_iv_se: Standard error of the corrected IV coefficient (currently NA).
corrected_iv_tstat: T-statistic of the corrected IV coefficient.
var_est: Estimated variance of the measurement error (split-half variance).
corrected_ols_coef: OLS coefficient corrected for measurement error.
corrected_ols_se: Standard error of the corrected OLS coefficient (currently NA).
corrected_ols_tstat: T-statistic of the corrected OLS coefficient (currently NA).
corrected_ols_coef_alt: Alternative corrected OLS coefficient (if applicable).
corrected_ols_se_alt: Standard error for the alternative corrected OLS coefficient (if applicable).
corrected_ols_tstat_alt: T-statistic for the alternative corrected OLS coefficient (if applicable).
bayesian_ols_coef_outer_normed: Posterior mean of the OLS coefficient under MCMC, after normalizing by the overall sample standard deviation.
bayesian_ols_se_outer_normed: Posterior standard error corresponding to bayesian_ols_coef_outer_normed.
bayesian_ols_tstat_outer_normed: T-statistic for bayesian_ols_coef_outer_normed.
bayesian_ols_coef_inner_normed: Posterior mean of the OLS coefficient under MCMC, after normalizing each posterior draw individually.
bayesian_ols_se_inner_normed: Posterior standard error corresponding to bayesian_ols_coef_inner_normed.
bayesian_ols_tstat_inner_normed: T-statistic for bayesian_ols_coef_inner_normed.
m_stage_1_erv: Extreme robustness value (ERV) for the first-stage regression (x_est2 on x_est1), if computed.
m_reduced_erv: ERV for the reduced-form regression (Y on x_est1), if computed.
x_est1: First set of latent variable estimates.
x_est2: Second set of latent variable estimates.
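To make the relationship among ols_coef, iv_coef, var_est, and corrected_ols_coef concrete, the following base-R sketch simulates two noisy estimates of the same latent trait, standing in for x_est1 and x_est2. It assumes classical measurement error (independent errors of equal variance, both estimates on the latent's scale) and is not lpmec's internal code; the package's exact formulas, scaling, and handling of estimated latents may differ.

```r
# Toy illustration of split-half measurement error correction (base R only).
set.seed(1)
n  <- 5000
x  <- rnorm(n)                 # true latent trait (unobserved in practice)
y  <- 0.5 * x + rnorm(n)       # outcome; true slope is 0.5
x1 <- x + rnorm(n, sd = 0.8)   # noisy estimate, standing in for x_est1
x2 <- x + rnorm(n, sd = 0.8)   # independent noisy estimate, standing in for x_est2

# Naive OLS of y on x1 is attenuated toward zero
ols <- unname(coef(lm(y ~ x1))[2])

# IV: x1 instruments x2 (first stage x2 ~ x1, reduced form y ~ x1)
iv <- cov(y, x1) / cov(x2, x1)

# Split-half estimate of the measurement error variance, assuming
# independent errors of equal variance: var(x1 - x2) / 2
var_err <- var(x1 - x2) / 2

# Classical errors-in-variables correction of the OLS slope:
# divide by the estimated variance of the true latent, var(x1) - var_err
ols_corrected <- cov(y, x1) / (var(x1) - var_err)

round(c(ols = ols, iv = iv, ols_corrected = ols_corrected), 3)
```

With this error variance the naive slope is pulled down by roughly the reliability factor 1 / (1 + 0.8^2), while both the IV and the corrected OLS estimates should land near the true value of 0.5.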
Y: A vector of observed outcome values.
observables: A matrix of observable indicators used to estimate the latent variable.
observables_groupings: A vector specifying groupings for the observable indicators. Default is the column names of observables.
orientation_signs: (Optional) A numeric vector of length equal to the number of columns in `observables`, containing 1 or -1 to indicate the desired orientation of each column. If provided, each column of `observables` is oriented by its sign before analysis. Default is NULL (no orientation applied).
make_observables_groupings: Logical. If TRUE, creates dummy variables for each level of the observable indicators. Default is FALSE.
n_boot: Integer. Number of bootstrap iterations. Default is 32.
n_partition: Integer. Number of partitions for each bootstrap iteration. Default is 10.
boot_basis: Vector of indices or grouping variable for stratified bootstrap. Default is 1:length(Y).
return_intermediaries: Logical. If TRUE, returns intermediate results. Default is TRUE.
ordinal: Logical indicating whether the observable indicators are ordinal (TRUE) or binary (FALSE). Default is FALSE.
estimation_method: Character string specifying the estimation approach. Options include:
"em" (default): Expectation-maximization via the emIRT package. Supports both binary (via emIRT::binIRT) and ordinal (via emIRT::ordIRT) indicators.
"pca": First principal component of the observables.
"averaging": Feature averaging over the observable indicators.
"mcmc": Markov chain Monte Carlo estimation using either pscl::ideal (R backend) or numpyro (Python backend).
"mcmc_joint": Joint Bayesian model, fitted with numpyro, that estimates the latent variable and the outcome relationship simultaneously.
"mcmc_overimputation": Two-stage MCMC approach with measurement error correction via over-imputation.
"custom": Latent estimation is performed by the user-supplied latent_estimation_fn.
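For intuition about the two simplest options, the following base-R sketch computes a first-principal-component score and a standardized-average score on simulated binary indicators. The logistic data-generating model and the standardization choices are assumptions for illustration; lpmec's own "pca" and "averaging" implementations may scale or orient the estimates differently.

```r
set.seed(2)
n <- 500
k <- 10
theta <- rnorm(n)  # simulated latent trait
# Binary indicators generated from a simple logistic item response model
obs <- sapply(seq_len(k), function(j) rbinom(n, 1, plogis(1.5 * theta)))

# "pca"-style estimate: first principal component of the standardized indicators
pc1 <- prcomp(obs, center = TRUE, scale. = TRUE)$x[, 1]
# The sign of a principal component is arbitrary; align it with the row means
if (cor(pc1, rowMeans(obs)) < 0) pc1 <- -pc1

# "averaging"-style estimate: mean of the standardized indicators
avg <- rowMeans(scale(obs))

# Both scores should track the simulated latent trait closely
c(cor_pc1 = cor(pc1, theta), cor_avg = cor(avg, theta))
```

The sign alignment step matters because principal components are identified only up to sign, which is one reason an explicit orientation (as with orientation_signs) can be useful.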
latent_estimation_fn: Custom function for estimating the latent trait from observables when estimation_method = "custom" (optional). The function should accept a matrix of observables (rows are observations) and return a numeric vector with one value per observation.
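A minimal custom estimator satisfying the stated contract (matrix in, one numeric value per row out). my_latent_fn is a hypothetical name, and its scoring rule (standardized row mean) is an illustrative assumption rather than a recommended estimator. The lpmec() call is left as a comment because it requires the package.

```r
# Hypothetical custom estimator: standardized proportion of "1" responses
my_latent_fn <- function(obs) {
  obs <- as.matrix(obs)                 # accept data frames as well as matrices
  score <- rowMeans(obs, na.rm = TRUE)  # one score per row (observation)
  as.numeric(scale(score))              # plain numeric vector with mean 0, sd 1
}

# Check the contract on toy data: one value per observation
set.seed(4)
obs_toy <- matrix(rbinom(200 * 5, 1, 0.5), ncol = 5)
est <- my_latent_fn(obs_toy)
stopifnot(is.numeric(est), length(est) == nrow(obs_toy))

# It would then be passed to lpmec() as:
# lpmec(Y, observables, estimation_method = "custom", latent_estimation_fn = my_latent_fn)
```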
mcmc_control: A list of MCMC settings, used when an MCMC estimation method is selected. Elements include:
  backend: Character string indicating the MCMC engine to use. Valid options are "pscl" (default; uses the R-based pscl::ideal function) or "numpyro" (uses the Python numpyro package via reticulate).
  n_samples_warmup: Integer specifying the number of warm-up (burn-in) iterations before samples are collected. Default is 500.
  n_samples_mcmc: Integer specifying the number of post-warmup MCMC iterations to retain. Default is 1000.
  chain_method: Character string passed to numpyro specifying how to run multiple chains. Options: "parallel" (default), "sequential", or "vectorized".
  n_thin_by: Integer indicating the thinning factor for MCMC samples. Default is 1.
  n_chains: Integer specifying the number of MCMC chains to run. Default is 2.
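A sketch of an mcmc_control list selecting the numpyro backend, using only the field names shown in the usage section. The numeric values are arbitrary illustrations, not recommendations, and the full list is supplied because we have not verified whether lpmec merges a partial list with the defaults. The lpmec() call is commented out because it needs the package and a working Python environment.

```r
# Settings for the numpyro backend; field names follow the usage section
mcmc_ctrl <- list(
  backend = "numpyro",
  n_samples_warmup = 1000L,      # illustrative value
  n_samples_mcmc = 2000L,        # illustrative value
  batch_size = 512L,             # usage-section default
  chain_method = "sequential",   # one of "parallel", "sequential", "vectorized"
  subsample_method = "full",     # usage-section default
  anchor_parameter_id = NULL,    # list() keeps NULL entries, so the name is retained
  n_thin_by = 2L,
  n_chains = 4L
)

# results <- lpmec(Y, observables, estimation_method = "mcmc",
#                  mcmc_control = mcmc_ctrl, conda_env = "lpmec")
```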
conda_env: A character string specifying the name of the conda environment to use via reticulate. Default is "lpmec".
conda_env_required: Logical. If TRUE, the specified conda environment must be used, and an error is thrown if it is not found. Default is FALSE.
This function implements a bootstrapped latent variable analysis with measurement error correction.
It performs multiple bootstrap iterations, each with multiple partitions. For each partition,
it calls the lpmec_onerun function to estimate latent variables and apply various correction methods.
The results are then aggregated across partitions and bootstrap iterations to produce final estimates
and bootstrap standard errors.
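The bootstrap-and-partition scheme described above can be sketched in base R as follows. This is a hypothetical, simplified stand-in (toy_boot_partition is our own name, not a package function). It assumes that each partition is a random split of the indicator columns into two halves, uses simple row averages as latent estimates, takes the split-half IV slope as the per-partition estimate, averages over partitions, and then summarizes across bootstrap draws. lpmec's actual estimators and aggregation rules may differ.

```r
# Hypothetical toy version of the bootstrap/partition scheme (not lpmec's code)
toy_boot_partition <- function(Y, obs, n_boot = 20L, n_partition = 5L) {
  obs <- as.matrix(obs)
  n <- length(Y)
  k <- ncol(obs)
  boot_est <- numeric(n_boot)
  for (b in seq_len(n_boot)) {
    idx <- sample.int(n, n, replace = TRUE)       # nonparametric bootstrap draw of rows
    Yb <- Y[idx]
    Ob <- obs[idx, , drop = FALSE]
    part_est <- numeric(n_partition)
    for (p in seq_len(n_partition)) {
      half <- sample(k) <= k %/% 2                # random split of indicator columns
      x1 <- rowMeans(Ob[, half, drop = FALSE])    # latent estimate from half 1
      x2 <- rowMeans(Ob[, !half, drop = FALSE])   # latent estimate from half 2
      part_est[p] <- cov(Yb, x1) / cov(x2, x1)    # split-half IV slope (x1 instruments x2)
    }
    boot_est[b] <- mean(part_est)                 # aggregate over partitions
  }
  list(coef = mean(boot_est), se = sd(boot_est))  # point estimate and bootstrap SE
}

# Usage on simulated data with a real latent effect
set.seed(3)
n <- 400
theta <- rnorm(n)
obs <- sapply(1:10, function(j) rbinom(n, 1, plogis(1.5 * theta)))
Y <- 0.5 * theta + rnorm(n)
res <- toy_boot_partition(Y, obs, n_boot = 10L, n_partition = 3L)
res
```

Note that the slope here is expressed per unit of the averaged indicator score, not per unit of theta, so it is not directly comparable to the 0.5 used in the simulation.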
Jerzak, C. T. and Jessee, S. A. (2025). Attenuation Bias with Latent Predictors. arXiv:2507.22218 [stat.AP]. https://arxiv.org/abs/2507.22218
# \donttest{
# Generate some example data
set.seed(123)
Y <- rnorm(1000)
observables <- as.data.frame(matrix(sample(c(0, 1), 1000 * 10, replace = TRUE), ncol = 10))

# Run the bootstrapped analysis
results <- lpmec(Y = Y,
                 observables = observables,
                 n_boot = 10,     # small values for illustration only
                 n_partition = 5  # small to keep run time short
)

# View all estimates, including the corrected IV coefficient and its standard error
print(results)
# }