lpmec_onerun: lpmec_onerun

Description

Implements analysis for latent variable models with measurement error correction

Usage

lpmec_onerun(
  Y,
  observables,
  observables_groupings = colnames(observables),
  make_observables_groupings = FALSE,
  estimation_method = "em",
  latent_estimation_fn = NULL,
  mcmc_control = list(
    backend = "pscl",
    n_samples_warmup = 500L,
    n_samples_mcmc = 1000L,
    batch_size = 512L,
    chain_method = "parallel",
    subsample_method = "full",
    n_thin_by = 1L,
    n_chains = 2L
  ),
  ordinal = FALSE,
  conda_env = "lpmec",
  conda_env_required = FALSE
)

Value

A list containing various estimates and statistics:

ols_coef: Coefficient from naive OLS regression
ols_se: Standard error of naive OLS coefficient
ols_tstat: T-statistic of naive OLS coefficient
iv_coef_a: IV coefficient using first split as instrument
iv_coef_b: IV coefficient using second split as instrument
iv_coef: Averaged IV coefficient from both splits
iv_se: Standard error of IV regression coefficient
iv_tstat: T-statistic of IV regression coefficient
corrected_iv_coef_a: Corrected IV coefficient using first split as instrument
corrected_iv_coef_b: Corrected IV coefficient using second split as instrument
corrected_iv_coef: Averaged corrected IV coefficient from both splits
corrected_iv_se: Standard error of corrected IV coefficient
corrected_iv_tstat: T-statistic of corrected IV coefficient
corrected_ols_coef_a: Corrected OLS coefficient using first split
corrected_ols_coef_b: Corrected OLS coefficient using second split
corrected_ols_coef: Averaged corrected OLS coefficient from both splits
corrected_ols_se: Standard error of corrected OLS coefficient (currently NA)
corrected_ols_tstat: T-statistic of corrected OLS coefficient (currently NA)
corrected_ols_coef_alt: Alternative corrected OLS coefficient (currently NA)
var_est_split: Estimated variance of the measurement error
x_est1: First set of latent variable estimates
x_est2: Second set of latent variable estimates

Arguments

Y

A vector of observed outcome values, one per observation.

observables

A matrix or data frame of observable indicators (rows are observations) used to estimate the latent variable.

observables_groupings

A vector specifying groupings for the observable indicators. Default is column names of observables.

make_observables_groupings

Logical. If TRUE, creates dummy variables for each level of the observable indicators. Default is FALSE.

estimation_method

Character specifying the estimation approach. Options include:

"em" (default): Uses expectation-maximization via the emIRT package. Supports both binary (via emIRT::binIRT) and ordinal (via emIRT::ordIRT) indicators.
"pca": Uses the first principal component of observables.
"averaging": Uses feature averaging.
"mcmc": Markov Chain Monte Carlo estimation using either pscl::ideal (R backend) or numpyro (Python backend).
"mcmc_joint": Joint Bayesian model that simultaneously estimates the latent variables and the outcome relationship using numpyro.
"mcmc_overimputation": Two-stage MCMC approach with measurement error correction via over-imputation.
"custom": Latent estimation is performed using latent_estimation_fn.

latent_estimation_fn

Custom function for estimating latent trait from observables if estimation_method="custom" (optional). The function should accept a matrix of observables (rows are observations) and return a numeric vector of length equal to the number of observations.
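As a minimal sketch of the contract described above, the following defines a hypothetical custom estimator, row_mean_estimator (not part of lpmec), that accepts a matrix of observables and returns one numeric score per row. Standardized row means are used purely for illustration; any function with the same input and output shape should fit.

```r
# Hypothetical custom estimator (illustration only, not part of lpmec):
# standardized row means of the indicators.
row_mean_estimator <- function(observables) {
  obs <- as.matrix(observables)           # accept a matrix or data frame
  scores <- rowMeans(obs, na.rm = TRUE)   # one score per observation
  as.numeric(scale(scores))               # center and scale, return a plain vector
}

# Check the contract on simulated binary indicators
set.seed(123)
observables <- matrix(sample(c(0, 1), 200 * 10, replace = TRUE), ncol = 10)
x_hat <- row_mean_estimator(observables)
stopifnot(is.numeric(x_hat), length(x_hat) == nrow(observables))

# The function would then be supplied as:
# results <- lpmec_onerun(Y = Y, observables = observables,
#                         estimation_method = "custom",
#                         latent_estimation_fn = row_mean_estimator)
```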

mcmc_control

A list of MCMC settings, used only when estimation_method is one of the MCMC options. Elements:

backend: Character string indicating the MCMC engine to use. Valid options are "pscl" (default, uses the R-based pscl::ideal function) or "numpyro" (uses the Python numpyro package via reticulate).
n_samples_warmup: Integer specifying the number of warm-up (burn-in) iterations before samples are collected. Default is 500.
n_samples_mcmc: Integer specifying the number of post-warmup MCMC iterations to retain. Default is 1000.
chain_method: Character string passed to numpyro specifying how to run multiple chains. Options: "parallel" (default), "sequential", or "vectorized".
n_thin_by: Integer indicating the thinning factor for MCMC samples. Default is 1.
n_chains: Integer specifying the number of MCMC chains to run. Default is 2.
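A sketch of an mcmc_control list that switches to the numpyro backend and runs more chains. The field names and defaults are taken from the Usage section; the non-default values chosen here are arbitrary. Every field is spelled out because this page does not say whether a partial list is merged with the defaults.

```r
# Illustrative settings only; values other than the documented defaults are arbitrary.
mcmc_control <- list(
  backend = "numpyro",          # Python backend via reticulate
  n_samples_warmup = 1000L,     # warm-up iterations
  n_samples_mcmc = 2000L,       # retained post-warmup iterations
  batch_size = 512L,            # default from the Usage section
  chain_method = "sequential",  # one of "parallel", "sequential", "vectorized"
  subsample_method = "full",    # default from the Usage section
  n_thin_by = 2L,               # thinning factor
  n_chains = 4L                 # number of chains
)

# Passed as, for example:
# results <- lpmec_onerun(Y = Y, observables = observables,
#                         estimation_method = "mcmc",
#                         mcmc_control = mcmc_control)
```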

ordinal

Logical indicating whether the observable indicators are ordinal (TRUE) or binary (FALSE). Default is FALSE.

conda_env

A character string specifying the name of the conda environment to use via reticulate. Default is "lpmec".

conda_env_required

A logical indicating whether the specified conda environment must be strictly used. If TRUE, an error is thrown if the environment is not found. Default is FALSE.

Standard Errors

The following quantities are currently returned as NA. For the standard error and t-statistic, the analytical derivation is not yet implemented:

corrected_ols_se: Standard error for the corrected OLS coefficient
corrected_ols_tstat: T-statistic for the corrected OLS coefficient
corrected_ols_coef_alt: Alternative corrected OLS coefficient

For inference on these quantities, use the bootstrap approach via lpmec, which provides valid confidence intervals and standard errors through resampling.
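To illustrate the resampling idea in general terms, here is a base R sketch of a nonparametric bootstrap standard error for a split-half corrected slope. It does not reproduce what lpmec does internally: the estimator corrected_slope is hypothetical and uses the classical disattenuation formula (naive slope divided by the split-half correlation), and the data are simulated under classical additive measurement error.

```r
set.seed(42)
n <- 2000
k <- 10
x <- rnorm(n)                       # latent variable (unobserved in practice)
Y <- x + rnorm(n)                   # true slope is 1
# Assumption: each indicator is the latent variable plus independent noise
observables <- matrix(x + rnorm(n * k, sd = 2), nrow = n, ncol = k)

# Hypothetical estimator: naive OLS slope divided by split-half reliability
corrected_slope <- function(Y, observables) {
  half <- seq_len(ncol(observables) / 2)
  w1 <- rowMeans(observables[, half, drop = FALSE])
  w2 <- rowMeans(observables[, -half, drop = FALSE])
  naive <- cov(Y, w1) / var(w1)     # attenuated slope of Y on the first proxy
  naive / cor(w1, w2)               # classical disattenuation
}

point_est <- corrected_slope(Y, observables)

# Resample observations (rows) with replacement and re-estimate each time
n_boot <- 200
boot_est <- replicate(n_boot, {
  idx <- sample(n, replace = TRUE)
  corrected_slope(Y[idx], observables[idx, , drop = FALSE])
})

boot_se <- sd(boot_est)                          # bootstrap standard error
boot_ci <- quantile(boot_est, c(0.025, 0.975))   # percentile interval
```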

Details

This function implements a latent variable analysis with measurement error correction. It splits the observable indicators into two sets, estimates latent variables using each set, and then applies various correction methods including OLS correction and instrumental variable approaches.
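The split-and-instrument idea described above can be sketched in base R. This is an illustration under stated assumptions, not the package's implementation: the indicators are simulated as continuous noisy copies of the latent variable (classical measurement error), each half is summarized by simple averaging, and names such as iv_instr1 are hypothetical.

```r
set.seed(1)
n <- 5000
k <- 10
beta <- 1
x <- rnorm(n)                       # true latent variable (unobserved in practice)
Y <- beta * x + rnorm(n)
# Assumption: each indicator is the latent variable plus independent noise
observables <- matrix(x + rnorm(n * k, sd = 2), nrow = n, ncol = k)

# Randomly split the indicators into two disjoint sets
cols_1 <- sample(k, k / 2)
cols_2 <- setdiff(seq_len(k), cols_1)

# Estimate the latent variable separately from each set (here: averaging)
x_est1 <- rowMeans(observables[, cols_1])
x_est2 <- rowMeans(observables[, cols_2])

# Naive OLS on one noisy estimate is attenuated toward zero
ols_coef <- unname(coef(lm(Y ~ x_est1))[2])

# IV: the errors in the two estimates are independent, so each estimate
# can serve as an instrument for the other
iv_instr1 <- cov(Y, x_est1) / cov(x_est2, x_est1)  # first split as instrument
iv_instr2 <- cov(Y, x_est2) / cov(x_est1, x_est2)  # second split as instrument
iv_coef <- (iv_instr1 + iv_instr2) / 2             # average over both directions

print(c(ols = ols_coef, iv = iv_coef))
```

In this simulation the averaged IV estimate should land much closer to the true slope than the naive OLS estimate, which is the motivation for using one split as an instrument for the other.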

Examples


# Generate some example data
set.seed(123)
Y <- rnorm(1000)
observables <- as.data.frame(matrix(sample(c(0, 1), 1000 * 10, replace = TRUE), ncol = 10))

# Run the analysis
results <- lpmec_onerun(Y = Y,
                        observables = observables)

# View the corrected estimates
print(results)
