
SBMTrees (version 1.4)

sequential_imputation: Sequential Imputation for Longitudinal Missing Data

Description

Implements sequential imputation for missing covariates and outcomes in longitudinal data. The function uses a Bayesian non-parametric framework with mixed-effects models to handle both normal and non-normal random effects and errors. It sequentially imputes missing values by constructing univariate models in a fixed order, initializing with LOCF/NOCB, and ensuring consistency with a valid joint distribution.

Usage

sequential_imputation(
  X,
  Y,
  Z = NULL,
  subject_id,
  type,
  binary_outcome = FALSE,
  model = c("BMTrees", "BMTrees_R", "BMTrees_RE", "mixedBART"),
  outcome_model = c("BMTrees", "BMLM"),
  nburn = 0L,
  npost = 3L,
  skip = 1L,
  verbose = TRUE,
  seed = NULL,
  tol = 1e-20,
  k = 2,
  ntrees = 200,
  reordering = TRUE,
  pi_DP = 0.99
)

Value

A list containing:

imputed_data

A three-dimensional array of imputed data with dimensions (npost / skip, N, p + 1), where N is the number of observations and p is the number of covariates. The last column represents the outcome Y.

posterior_sigma

(Only if outcome_model = "BMLM") A vector of posterior samples for the error standard deviation.

posterior_beta

(Only if outcome_model = "BMLM") A matrix of posterior samples for the regression coefficients.
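
The array layout described above can be explored with a small stand-in. This is a minimal sketch that does not call the package: the array `imputed` is filled with random numbers purely for illustration, and the names `n_draws`, `completed_1`, `y_draws`, and `pooled` are hypothetical.

```r
# Hypothetical stand-in for imputed_data: dimensions (npost / skip, N, p + 1)
set.seed(1)
npost <- 6; skip <- 2          # assumed settings, chosen so that skip divides npost
N <- 5; p <- 3                 # 5 observations, 3 covariates
n_draws <- npost / skip        # number of retained draws
imputed <- array(rnorm(n_draws * N * (p + 1)), dim = c(n_draws, N, p + 1))

# One completed data set (first retained draw): an N x (p + 1) matrix
completed_1 <- imputed[1, , ]

# The outcome Y sits in the last slice of the third dimension
y_draws <- imputed[, , p + 1]  # n_draws x N matrix

# Cell-wise average over the retained draws
pooled <- apply(imputed, c(2, 3), mean)
```

Averaging over draws is only one way to summarize the array; analyses that need proper uncertainty estimates would typically fit the model to each completed data set separately and combine the results.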

Arguments

X

A matrix of covariates that may contain missing values.

Y

A vector of outcomes (numeric or logical) that may contain missing values.

Z

A matrix of fully observed predictors for the random effects. Default: NULL.

subject_id

A vector of subject IDs corresponding to the rows of X and Y. Can be integer, factor, or character.

type

A vector with one entry per column of X, indicating whether that covariate is binary (1) or continuous (0).

binary_outcome

A logical value indicating whether the outcome Y is binary. Default: FALSE.

model

A character vector specifying the imputation model for the covariates. Options are "BMTrees" (default), "BMTrees_R" (residual DP), "BMTrees_RE" (random effect DP), and "mixedBART".

outcome_model

A character vector specifying the model used for the outcome. Options are "BMTrees" (default) or "BMLM" (Bayesian Mixed Linear Model). If "BMLM" is selected, posterior estimates for beta and sigma are returned.

nburn

An integer specifying the number of burn-in iterations. Default: 0.

npost

An integer specifying the number of sampling iterations. Default: 3.

skip

An integer specifying the thinning interval used when keeping samples in the sampling phase. Default: 1.

verbose

A logical value indicating whether to display progress and MCMC information. Default: TRUE.

seed

A random seed for reproducibility. Default: NULL.

tol

A small numerical tolerance to prevent numerical overflow or underflow in the model. Default: 1e-20.

k

A numeric value for the BART prior parameter controlling the standard deviation of the terminal node values. Default: 2.0.

ntrees

An integer specifying the number of trees in BART. Default: 200.

reordering

A logical value indicating whether to apply a reordering strategy for sorting covariates based on missingness. Default: TRUE.

pi_DP

A value between 0 and 1 for calculating the empirical prior in the DP prior. Default: 0.99.
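
Two of the arguments above, type and reordering, can be illustrated on a toy matrix. This is a minimal sketch under stated assumptions: the matrix `X` and its column names are invented, the rule used to derive `type` (a column is binary when all observed values are 0 or 1) is one plausible convention, and the ascending order by missingness is an assumption about what a missingness-based ordering might look like, not the package's confirmed internal rule.

```r
# Hypothetical covariate matrix with missing values
X <- cbind(
  age     = c(50, NA, 61, 47, NA, 58),
  treated = c(1, 0, NA, 1, 1, 0),
  bmi     = c(NA, 24.1, NA, 30.2, NA, 27.5)
)

# type: 1 when every observed value in a column is 0 or 1, otherwise 0
type <- as.integer(apply(X, 2, function(col) all(na.omit(col) %in% c(0, 1))))

# Proportion of missing values per column
miss_prop <- colMeans(is.na(X))

# Assumed ordering: least missing covariate first
ord <- order(miss_prop)
colnames(X)[ord]
```

In practice the `type` vector is usually written by hand, as in `type = c(0, 1, 0)`; deriving it from the data is only a convenience and can misclassify a continuous column that happens to contain only 0s and 1s.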

Details

The function builds on the Bayesian Trees Mixed-Effects Model (BMTrees), which extends Mixed-Effects BART by using centralized Dirichlet Process Normal Mixture priors. This framework handles non-normal random effects and errors, addresses model misspecification, and captures complex relationships.

The algorithm initializes missing values using Last Observation Carried Forward (LOCF) and Next Observation Carried Backward (NOCB) before starting the MCMC sequential imputation process.
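
The LOCF/NOCB initialization can be sketched in base R. This is an illustrative version, not the package's internal code: the helper `locf_nocb` is hypothetical, and the sketch assumes that rows are already sorted by time within each subject.

```r
# Hypothetical helper: fill NAs forward (LOCF), then fill any leading NAs backward (NOCB)
locf_nocb <- function(x) {
  n <- length(x)
  # Forward pass: carry the last observed value forward
  for (i in seq_len(n)[-1]) {
    if (is.na(x[i])) x[i] <- x[i - 1]
  }
  # Backward pass: carry the next observed value backward
  for (i in rev(seq_len(n))[-1]) {
    if (is.na(x[i])) x[i] <- x[i + 1]
  }
  x
}

# Toy data: two subjects with four visits each (assumed ordered by time)
subject_id <- c(1, 1, 1, 1, 2, 2, 2, 2)
x <- c(NA, 1, NA, 3, NA, NA, 5, NA)

# Apply the fill within each subject so values never leak across subjects
x_init <- ave(x, subject_id, FUN = locf_nocb)
x_init
```

A subject with no observed value at all stays entirely NA under this sketch, so a real implementation would need a fallback for that case.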

References

For more information about the original BART3 package, see: https://github.com/rsparapa/bnptools/tree/master/BART3

Examples

library(SBMTrees)

# Simulate a small longitudinal data set with 10 subjects
data <- simulation_imputation(NNY = TRUE, NNX = TRUE, n_subject = 10, seed = 123)

# Impute three continuous covariates and the outcome, keeping one posterior draw
BMTrees <- sequential_imputation(X = data$data_M[, 3:5], Y = data$data_M$Y, Z = data$Z,
  subject_id = data$data_M$subject_id, type = c(0, 0, 0),
  outcome_model = "BMLM", binary_outcome = FALSE, model = "BMTrees", nburn = 0,
  npost = 1, skip = 1, verbose = FALSE, seed = 123)

# Access imputed data
dim(BMTrees$imputed_data)
