
SBMTrees (version 1.4)

apply_locf_nocb: Initialize Missing Values using LOCF and NOCB

Description

Imputes missing values in longitudinal data using a hierarchical three-step strategy so that the data are complete for model initialization. The process prioritizes within-subject information, using Last Observation Carried Forward (LOCF) and Next Observation Carried Backward (NOCB), and falls back to cross-sectional summary statistics (mean or mode) only when a subject has no observed values for a given variable.

Usage

apply_locf_nocb(X, subject_id, is_binary)

Value

A data.frame with the same dimensions as X but with all missing values imputed.

Arguments

X

A data.frame or matrix containing the variables to be imputed. Columns correspond to variables.

subject_id

A vector of subject identifiers with length equal to nrow(X).

is_binary

A vector of length ncol(X) indicating the type of each variable. Values can be TRUE/1 (for binary variables) or FALSE/0 (for continuous variables).
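If the variable types are not known in advance, is_binary can be derived from the data. The sketch below uses a hypothetical helper, infer_is_binary() (not part of SBMTrees), which treats a column as binary when all of its observed values are 0 or 1. This is a heuristic assumption: a continuous variable that happens to take only those values would be misclassified, so check the result before passing it on.

```r
# Hypothetical helper (not part of SBMTrees): flag a column as binary when
# it has at least one observed value and every observed value is 0 or 1.
infer_is_binary <- function(X) {
  flags <- vapply(as.data.frame(X), function(col) {
    obs <- col[!is.na(col)]
    length(obs) > 0 && all(obs %in% c(0, 1))
  }, logical(1))
  unname(flags)  # one logical per column, in column order
}

X <- data.frame(
  cont = c(NA, 5, NA, NA, NA, NA),
  bin  = c(0, NA, 1, 1, 1, 0)
)
is_binary <- infer_is_binary(X)
is_binary
```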

Details

Prerequisite: The rows of X (and the matching entries of subject_id) must be ordered by time within each subject before calling this function, because LOCF and NOCB follow the row order.
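A minimal base-R sketch of meeting this requirement, assuming the data arrive in long format with hypothetical columns id (subject) and time (measurement occasion); neither column name is required by SBMTrees:

```r
# Hypothetical long-format data; the `id` and `time` column names are assumptions.
dat <- data.frame(
  id   = c(2, 1, 2, 1, 1, 2),
  time = c(2, 3, 1, 1, 2, 3),
  cont = c(NA, NA, 4, 5, NA, 6)
)

# Sort by subject first, then by time within each subject
dat <- dat[order(dat$id, dat$time), ]
rownames(dat) <- NULL

# X and subject_id are then taken from the same sorted rows, so they stay aligned
X <- dat[, "cont", drop = FALSE]
subject_id <- dat$id
```

Taking X and subject_id from the same sorted data frame is what keeps each row's values paired with the right subject.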

The imputation proceeds in three stages:

  1. Subject-wise LOCF: For each subject, missing values are filled using the immediately preceding observed value (forward fill). This handles gaps in the middle or end of a subject's timeline.

  2. Subject-wise NOCB: For each subject, any remaining missing values (typically at the start of the timeline, before the first observation) are filled using the next available observed value (backward fill).

  3. Global Fallback: If a subject has no observed data for a variable (i.e., every value of that variable is NA for that subject_id), the function imputes those values using global statistics calculated from the rest of the population:

    • Continuous variables: Imputed with the global mean.

    • Binary variables: Imputed with the global mode (ties default to 0).
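To make the order of the three stages concrete, here is a base-R sketch of the documented behavior. It is an illustrative re-implementation written from the description above, not the SBMTrees source. The function names impute_column_sketch() and locf_nocb_sketch() are hypothetical, and computing the fallback statistic from the column's originally observed values is an assumption.

```r
# Illustrative sketch of the documented three-stage imputation; NOT the SBMTrees source.
# Assumption: the fallback statistic uses the column's originally observed values.
impute_column_sketch <- function(x, subject_id, binary) {
  observed <- x[!is.na(x)]

  # Stages 1 and 2, applied to one subject's time-ordered values
  fill_subject <- function(v) {
    # Stage 1 (LOCF): copy the previous value into each NA, moving forward
    for (i in seq_along(v)[-1]) {
      if (is.na(v[i])) v[i] <- v[i - 1]
    }
    # Stage 2 (NOCB): copy the next value into each remaining NA, moving backward
    for (i in rev(seq_along(v))[-1]) {
      if (is.na(v[i])) v[i] <- v[i + 1]
    }
    v
  }
  x <- ave(x, subject_id, FUN = fill_subject)

  # Stage 3: any NA left belongs to a subject with no observed values
  if (anyNA(x)) {
    fallback <- if (binary) {
      # Global mode of the 0/1 values; a tie resolves to 0
      if (sum(observed == 1) > sum(observed == 0)) 1 else 0
    } else {
      mean(observed)  # global mean (NaN if the column has no observed values)
    }
    x[is.na(x)] <- fallback
  }
  x
}

# Apply the column-wise sketch to every column of X
locf_nocb_sketch <- function(X, subject_id, is_binary) {
  X <- as.data.frame(X)
  for (j in seq_along(X)) {
    X[[j]] <- impute_column_sketch(X[[j]], subject_id, as.logical(is_binary[j]))
  }
  X
}

# Usage on the toy data from the Examples section
X <- data.frame(
  cont = c(NA, 5, NA, NA, NA, NA),
  bin  = c(0, NA, 1, 1, 1, 0)
)
out <- locf_nocb_sketch(X, c(1, 1, 1, 2, 2, 2), c(FALSE, TRUE))
out
```

Running LOCF before NOCB means a value missing between two observations takes the earlier one, and NOCB only reaches the leading NAs. The global fallback therefore touches only subjects with no observations at all for that variable.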

Examples

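library(SBMTrees)

# Create a toy dataset with missing values (rows already ordered by time within subject)
X <- data.frame(
  cont = c(NA, 5, NA,   NA, NA, NA),  # Subj 1: leading and trailing NA, Subj 2: all NA
  bin  = c(0, NA, 1,    1, 1, 0)      # Subj 1: interior gap,           Subj 2: complete
)
subject_id <- c(1, 1, 1, 2, 2, 2)
is_binary <- c(FALSE, TRUE)

# Run imputation
X_imputed <- apply_locf_nocb(X, subject_id, is_binary)

# Following the documented rules, cont should be 5 in every row (NOCB/LOCF for
# subject 1, global mean of the observed values for subject 2) and bin should be
# c(0, 0, 1, 1, 1, 0) (LOCF fills subject 1's gap with 0)
X_imputed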
