The JointAI package performs simultaneous imputation and inference for incomplete data using the Bayesian framework. Distributions of incomplete variables, conditional on other covariates, are specified automatically and modeled jointly with the analysis model. MCMC sampling is performed in 'JAGS' via the R package rjags.
The package has the following main functions that allow analysis in different settings:
lm_imp for linear regression
glm_imp for generalized linear regression
clm_imp for (ordinal) cumulative logit models
lme_imp for linear mixed models
glme_imp for generalized linear mixed models
clmm_imp for (ordinal) cumulative logit mixed models
survreg_imp for parametric (Weibull) survival models
coxph_imp for Cox proportional hazards models
As far as possible, the specification of these functions is analogous to the
specification of their complete data versions
lm, glm,
clm (from the package ordinal),
lme (from the package nlme),
clmm2 (from the package ordinal),
survreg (from the package survival) and
coxph (from the package survival).
Computations can be performed in parallel using the argument parallel = TRUE.
The argument ridge allows the user to impose a ridge penalty on the
regression coefficients of the analysis model, and hyperparameters can be
changed via the argument hyperpars and the function default_hyperpars.
Results can be summarized and printed with summary(),
coef() and confint(),
and visualized using
traceplot() or densplot().
The function predict() allows prediction (including credible intervals)
from JointAI models.
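The workflow described above (fitting, summarizing and visualizing a model) can be sketched as follows. This is a minimal illustration, not a definitive recipe: it assumes that JointAI and JAGS are installed and that the example data set NHANES shipped with the package is available. The wrapper name fit_minimal and the value n_iter = 500 are hypothetical choices made for this sketch.

```r
# Sketch only: running this function requires the JointAI package and JAGS.
# `fit_minimal` is a hypothetical helper name used for illustration.
fit_minimal <- function(n_iter = 500) {
  # Load the example data shipped with JointAI (assumed to be available)
  data("NHANES", package = "JointAI", envir = environment())

  # Linear regression with incomplete covariates; the call mirrors lm()
  mod <- JointAI::lm_imp(SBP ~ gender + age + WC + alc + educ + bili,
                         data = NHANES, n.iter = n_iter)

  print(summary(mod))     # posterior summary of the analysis model
  print(coef(mod))        # posterior means of the regression coefficients
  print(confint(mod))     # credible intervals

  JointAI::traceplot(mod) # visual check of the MCMC chains
  JointAI::densplot(mod)  # posterior densities

  invisible(mod)
}
```

Calling mod <- fit_minimal() returns the fitted object, which can then be passed to predict(), GR_crit() or MC_error().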
Two criteria for evaluation of convergence and precision of the posterior estimate are available:
GR_crit implements the Gelman-Rubin criterion
('potential scale reduction factor') for convergence
MC_error calculates the Monte Carlo error to evaluate
the precision of the MCMC sample
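To make the two criteria more concrete, the following base R sketch computes a textbook version of the potential scale reduction factor and a batch-means estimate of the Monte Carlo error for simulated chains. It illustrates the ideas only; the function names psrf and mc_error_bm are hypothetical, and the calculations are not claimed to match what GR_crit and MC_error do internally.

```r
# Basic potential scale reduction factor for a single parameter.
# `chains` is a matrix with iterations in rows and chains in columns.
psrf <- function(chains) {
  n <- nrow(chains)
  W <- mean(apply(chains, 2, var))  # average within-chain variance
  B <- n * var(colMeans(chains))    # between-chain variance
  sqrt(((n - 1) / n * W + B / n) / W)
}

# Batch-means estimate of the Monte Carlo standard error of the
# posterior mean, computed from a single chain `x`.
mc_error_bm <- function(x, n_batches = 20) {
  k <- floor(length(x) / n_batches)        # batch size
  x <- x[seq_len(k * n_batches)]           # drop leftover iterations
  batch_means <- colMeans(matrix(x, nrow = k))
  sd(batch_means) / sqrt(n_batches)
}

set.seed(1)
good <- matrix(rnorm(3000), ncol = 3)       # three chains, same target
bad  <- good + rep(c(0, 2, 4), each = 1000) # chains stuck at different levels

psrf_good <- psrf(good)             # close to 1: no sign of non-convergence
psrf_bad  <- psrf(bad)              # clearly above 1: chains disagree
mce_good  <- mc_error_bm(good[, 1]) # small relative to the posterior SD of 1
```

Values of the potential scale reduction factor close to 1 are usually read as no evidence against convergence, and a Monte Carlo error that is small relative to the posterior standard deviation indicates that the MCMC sample is precise enough.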
Imputed data can be extracted (and exported to SPSS) using
get_MIdat().
The function plot_imp_distr() allows
visual comparison of the distribution of observed and imputed values.
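Extracting and inspecting imputed values might look like the sketch below. It assumes that JointAI and JAGS are installed, and that the imputed values were monitored when fitting the model (here via monitor_params = c(imps = TRUE)). The helper name extract_imputations, the number of imputed data sets and the seed are hypothetical choices for this illustration.

```r
# Sketch only: running this function requires the JointAI package and JAGS.
# `extract_imputations` is a hypothetical helper name used for illustration.
extract_imputations <- function(n_iter = 500, m = 5) {
  data("NHANES", package = "JointAI", envir = environment())

  # The imputed values must be monitored to be extractable afterwards
  mod <- JointAI::lm_imp(SBP ~ gender + age + WC + alc + educ + bili,
                         data = NHANES, n.iter = n_iter,
                         monitor_params = c(imps = TRUE))

  # Draw m imputed data sets from the MCMC sample
  imp_data <- JointAI::get_MIdat(mod, m = m, seed = 2019)

  # Compare the distributions of observed and imputed values
  JointAI::plot_imp_distr(imp_data)

  imp_data
}
```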
The functions parameters and list_models give
insight into the specified model, and
plot_all and md_pattern visualize the
distribution of the data and the missing data pattern.
The following vignettes are available:
Minimal Example:
A minimal example demonstrating the use of
lm_imp,
summary.JointAI,
traceplot
and densplot.
Visualizing Incomplete Data:
Demonstrations of the options in plot_all (plotting histograms
and barplots for all variables in the data) and md_pattern
(plotting or printing the missing data pattern).
Model Specification:
Explanation and demonstration of all parameters that are required or optional
to specify the model structure in lm_imp, glm_imp
and lme_imp.
Among others, the functions parameters, list_models,
get_models and set_refcat are used.
Parameter Selection:
Examples of how to select the parameters/variables/nodes
to follow using the argument monitor_params,
and how to select the parameters/variables/nodes displayed
in the summary, traceplot,
densplot, or when using
GR_crit or MC_error.
MCMC Settings:
Examples demonstrating how to set the arguments controlling settings of the MCMC sampling,
i.e., n.adapt, n.iter, n.chains, thin, inits.
After Fitting:
Examples of the use of functions to be applied after the model has been fitted,
including traceplot, densplot, summary,
GR_crit, MC_error, predict,
predDF and get_MIdat.
Theoretical Background: Explanation of the statistical method implemented in JointAI.
Nicole S. Erler, Dimitris Rizopoulos and Emmanuel M.E.H. Lesaffre (2019). JointAI: Joint Analysis and Imputation of Incomplete Data in R. arXiv e-prints, arXiv:1907.10867. URL https://arxiv.org/abs/1907.10867.
Erler, N.S., Rizopoulos, D., Rosmalen, J., Jaddoe, V.W.V., Franco, O.H., & Lesaffre, E.M.E.H. (2016). Dealing with missing covariates in epidemiologic studies: A comparison between multiple imputation and a full Bayesian approach. Statistics in Medicine, 35(17), 2955-2974. doi: 10.1002/sim.6944
Erler, N.S., Rizopoulos, D., Jaddoe, V.W.V., Franco, O.H., & Lesaffre, E.M.E.H. (2019). Bayesian imputation of time-varying covariates in linear mixed models. Statistical Methods in Medical Research, 28(2), 555-568. doi: 10.1177/0962280217730851