Implements analysis for latent variable models with measurement error correction
lpmec_onerun(
Y,
observables,
observables_groupings = colnames(observables),
make_observables_groupings = FALSE,
estimation_method = "em",
latent_estimation_fn = NULL,
mcmc_control = list(backend = "pscl", n_samples_warmup = 500L, n_samples_mcmc = 1000L,
batch_size = 512L, chain_method = "parallel", subsample_method = "full",
n_thin_by = 1L, n_chains = 2L),
ordinal = FALSE,
conda_env = "lpmec",
conda_env_required = FALSE
)
A list containing various estimates and statistics:
ols_coef: Coefficient from naive OLS regression
ols_se: Standard error of naive OLS coefficient
ols_tstat: T-statistic of naive OLS coefficient
iv_coef_a: IV coefficient using first split as instrument
iv_coef_b: IV coefficient using second split as instrument
iv_coef: Averaged IV coefficient from both splits
iv_se: Standard error of IV regression coefficient
iv_tstat: T-statistic of IV regression coefficient
corrected_iv_coef_a: Corrected IV coefficient using first split as instrument
corrected_iv_coef_b: Corrected IV coefficient using second split as instrument
corrected_iv_coef: Averaged corrected IV coefficient from both splits
corrected_iv_se: Standard error of corrected IV coefficient
corrected_iv_tstat: T-statistic of corrected IV coefficient
corrected_ols_coef_a: Corrected OLS coefficient using first split
corrected_ols_coef_b: Corrected OLS coefficient using second split
corrected_ols_coef: Averaged corrected OLS coefficient from both splits
corrected_ols_se: Standard error of corrected OLS coefficient (currently NA)
corrected_ols_tstat: T-statistic of corrected OLS coefficient (currently NA)
corrected_ols_coef_alt: Alternative corrected OLS coefficient (currently NA)
var_est_split: Estimated variance of the measurement error
x_est1: First set of latent variable estimates
x_est2: Second set of latent variable estimates
Y: A vector of observed outcome variables.
observables: A matrix of observable indicators used to estimate the latent variable.
observables_groupings: A vector specifying groupings for the observable indicators. Default is the column names of observables.
make_observables_groupings: Logical. If TRUE, creates dummy variables for each level of the observable indicators. Default is FALSE.
estimation_method: Character specifying the estimation approach. Options include:
"em" (default): Uses expectation-maximization via the emIRT package. Supports both binary (via emIRT::binIRT) and ordinal (via emIRT::ordIRT) indicators.
"pca": Uses the first principal component of observables.
"averaging": Uses feature averaging.
"mcmc": Markov Chain Monte Carlo estimation using either pscl::ideal (R backend) or numpyro (Python backend).
"mcmc_joint": Joint Bayesian model that simultaneously estimates the latent variables and the outcome relationship using numpyro.
"mcmc_overimputation": Two-stage MCMC approach with measurement error correction via over-imputation.
"custom": Latent estimation is performed by latent_estimation_fn.
latent_estimation_fn: Custom function for estimating the latent trait from observables when estimation_method = "custom" (optional). The function should accept a matrix of observables (rows are observations) and return a numeric vector of length equal to the number of observations.
mcmc_control: A list of parameter specifications used when an MCMC method is selected.
backend: Character string indicating the MCMC engine to use.
Valid options are "pscl" (default, uses the R-based pscl::ideal function)
or "numpyro" (uses the Python numpyro package via reticulate).
n_samples_warmup: Integer specifying the number of warm-up (burn-in)
iterations before samples are collected. Default is 500.
n_samples_mcmc: Integer specifying the number of post-warmup MCMC
iterations to retain. Default is 1000.
chain_method: Character string passed to numpyro specifying how to run
multiple chains. Options: "parallel" (default), "sequential",
or "vectorized".
n_thin_by: Integer indicating the thinning factor for MCMC samples.
Default is 1.
n_chains: Integer specifying the number of parallel MCMC chains to run.
Default is 2.
ordinal: Logical indicating whether the observable indicators are ordinal (TRUE) or binary (FALSE). Default is FALSE.
conda_env: A character string specifying the name of the conda environment to use
via reticulate. Default is "lpmec".
conda_env_required: A logical indicating whether the specified conda environment
must be strictly used. If TRUE, an error is thrown if the environment is not found.
Default is FALSE.
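As a rough illustration of the two arguments that take structured values, the sketch below defines a hypothetical custom estimator, row_mean_score (not part of the package), that follows the contract described for latent_estimation_fn, and then builds an mcmc_control list. Whether lpmec_onerun fills in omitted mcmc_control fields is not stated on this page, so the list spells out every field from the usage signature; the changed values are arbitrary choices for illustration.

```r
# Hypothetical custom estimator (an assumption, not a package function):
# standardized row means of the indicators. It accepts a matrix or data
# frame with one row per observation and returns one numeric score per row.
row_mean_score <- function(observables) {
  m <- as.matrix(observables)
  as.numeric(scale(rowMeans(m)))
}

# Check the contract on toy binary indicators
set.seed(1)
toy <- matrix(sample(c(0, 1), 200 * 6, replace = TRUE), ncol = 6)
scores <- row_mean_score(toy)
length(scores) == nrow(toy)

# An mcmc_control list with every field from the usage signature written out.
# Only n_samples_mcmc and n_chains differ from the documented defaults.
mcmc_control <- list(
  backend = "pscl",
  n_samples_warmup = 500L,
  n_samples_mcmc = 2000L,
  batch_size = 512L,
  chain_method = "parallel",
  subsample_method = "full",
  n_thin_by = 1L,
  n_chains = 4L
)
```

The estimator would then be supplied as latent_estimation_fn = row_mean_score together with estimation_method = "custom", and the list as mcmc_control together with one of the MCMC estimation methods.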
The following quantities are currently returned as NA because
their analytical derivation is not yet implemented:
corrected_ols_se: Standard error for the corrected OLS coefficient
corrected_ols_tstat: T-statistic for the corrected OLS coefficient
corrected_ols_coef_alt: Alternative corrected OLS coefficient
For inference on these quantities, use the bootstrap approach via lpmec, which
provides valid confidence intervals and standard errors through resampling.
This function implements a latent variable analysis with measurement error correction. It splits the observable indicators into two sets, estimates latent variables using each set, and then applies various correction methods including OLS correction and instrumental variable approaches.
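The split-sample idea can be sketched in base R. This is a minimal illustration under simplifying assumptions, not the package's implementation: row means stand in for the IRT-based latent estimator, the data are simulated, and the names iv_a and iv_b are hypothetical. Because each half of the indicators gives an independent noisy measurement of the same latent trait, one estimate can serve as an instrument for the other, which counteracts the attenuation seen in the naive OLS slope.

```r
set.seed(42)
n <- 5000
x_true <- rnorm(n)                 # latent trait (unobserved in practice)
Y <- 1.0 * x_true + rnorm(n)       # outcome depends on the latent trait

# Ten binary indicators, each a noisy signal of the latent trait
observables <- sapply(1:10, function(j) rbinom(n, 1, plogis(x_true)))

# Randomly split the indicator columns into two disjoint halves
idx <- sample(ncol(observables))
split_a <- idx[1:5]
split_b <- idx[6:10]

# Estimate the latent variable separately from each half
# (standardized row means as a stand-in estimator)
x_est1 <- as.numeric(scale(rowMeans(observables[, split_a])))
x_est2 <- as.numeric(scale(rowMeans(observables[, split_b])))

# Naive OLS slope of Y on one noisy estimate (attenuated by measurement error)
ols_coef <- cov(Y, x_est1) / var(x_est1)

# Simple IV slopes: regress Y on one estimate, instrumenting with the other,
# then average the two directions
iv_a <- cov(Y, x_est2) / cov(x_est1, x_est2)
iv_b <- cov(Y, x_est1) / cov(x_est2, x_est1)
iv_coef <- (iv_a + iv_b) / 2

c(ols = ols_coef, iv = iv_coef)
```

Both slopes are on the scale of the standardized estimates rather than the latent trait itself, so the IV value need not equal the simulated coefficient of 1; the point of the sketch is only that the IV slope is larger in magnitude than the attenuated OLS slope.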
# Generate some example data
set.seed(123)
Y <- rnorm(1000)
observables <- as.data.frame(matrix(sample(c(0, 1), 1000 * 10, replace = TRUE), ncol = 10))
# Run the analysis
results <- lpmec_onerun(Y = Y,
                        observables = observables)
# View the corrected estimates
print(results)