paneltests (version 1.0.5)

xtmispanel: Missing Data Detection and Imputation for Panel Data

Description

Detects, diagnoses, and imputes missing values in panel (longitudinal) data sets. The function can produce summary tables (Module 1), test the missingness mechanism (Module 2), impute a target variable (Module 3), and run a cross-method sensitivity analysis (Module 4).

Usage

xtmispanel(
  data,
  vars = NULL,
  index,
  detect = TRUE,
  test = FALSE,
  impute = NULL,
  target = NULL,
  new_var = NULL,
  sensitivity = FALSE,
  knn_k = 5L
)

Value

A list (invisibly) with components:

detect

Summary statistics per variable/panel/period.

test

MCAR and MAR test results.

imputed

The data frame augmented with the imputed column (when imputation is requested).

impute_stats

Summary statistics comparing the original and imputed values of the target variable.

sensitivity

Sensitivity analysis results.
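To make the per-variable, per-panel, and per-period breakdown in the detect component concrete, here is a minimal base R sketch that tallies missing values along those three dimensions. It illustrates the idea only and does not reproduce the package's own tables; the object names (n_miss_var, n_miss_panel, n_miss_period) are hypothetical.

```r
# Toy panel: 4 units observed over 8 periods, with three NAs in y
set.seed(1)
df <- data.frame(
  id   = rep(1:4, each = 8),
  time = rep(1:8, times = 4),
  y    = rnorm(32)
)
df$y[c(3, 11, 20)] <- NA

# Missing values per variable (only y is analysed here)
n_miss_var <- colSums(is.na(df["y"]))

# Missing values per panel unit and per time period
n_miss_panel  <- tapply(is.na(df$y), df$id, sum)
n_miss_period <- tapply(is.na(df$y), df$time, sum)

n_miss_var
n_miss_panel
n_miss_period
```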

Arguments

data

A data.frame in long format.

vars

Character vector of variable names to analyse. If NULL (default), all numeric columns except the index are used.

index

Character vector of length 2 giving the names of the panel identifier and time identifier columns, in that order, e.g. c("panel_id", "time_id").

detect

Logical. Run Module 1 (detection tables, default TRUE).

test

Logical. Run Module 2 (MCAR/MAR mechanism tests, default FALSE).

impute

Character or NULL. If a method name is given, run Module 3 (imputation). Supported methods: "mean", "median", "locf", "nocb", "linear", "spline", "pmm", "hotdeck", "knn", "rf", "em".

target

Character. Name of the variable to impute (required when impute is not NULL).

new_var

Character. Name of the output imputed variable (default "{target}_imp").

sensitivity

Logical. Run Module 4 (sensitivity analysis across all imputation methods, default FALSE).

knn_k

Integer. Number of neighbours for KNN imputation (default 5).
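As an illustration of what a panel-aware method such as "locf" involves, the following base R sketch carries the last observed value forward within each panel unit, never across units. It is written under the assumption that LOCF is applied per panel after sorting by time; it is not the package's implementation, and the helper locf_vec and the column y_imp are hypothetical names chosen to mirror the "{target}_imp" default.

```r
# Hypothetical helper: last observation carried forward for one vector.
# Leading NAs (no earlier observation available) remain NA.
locf_vec <- function(x) {
  idx <- cumsum(!is.na(x))          # position of the latest non-NA value
  c(NA, x[!is.na(x)])[idx + 1L]
}

set.seed(1)
df <- data.frame(
  id   = rep(1:4, each = 8),
  time = rep(1:8, times = 4),
  y    = rnorm(32)
)
df$y[c(3, 11, 20)] <- NA

# Sort by panel and time, then fill within each panel unit only
df <- df[order(df$id, df$time), ]
df$y_imp <- ave(df$y, df$id, FUN = locf_vec)

df[c(2, 3, 10, 11, 19, 20), ]
```

Filling within groups matters because a value carried over from the last period of one unit into the first period of the next would mix unrelated series.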

References

Little, R. J. A. (1988). A test of missing completely at random for multivariate data with missing values. Journal of the American Statistical Association, 83(404), 1198-1202. doi:10.1080/01621459.1988.10478714

Examples

set.seed(1)
df <- data.frame(
  id   = rep(1:4, each = 8),
  time = rep(1:8, times = 4),
  y    = rnorm(32)
)
# introduce some NAs
df$y[c(3, 11, 20)] <- NA
res <- xtmispanel(df, vars = "y", index = c("id", "time"), detect = TRUE)
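The example above exercises only Module 1. A call that also requests imputation might look like the sketch below, which uses only the arguments and return components documented on this page. It assumes the paneltests package is installed (the call is skipped otherwise); the method "linear" and the output name "y_lin" are arbitrary choices for illustration, and the exact contents of the returned components are not shown because they are not documented here.

```r
set.seed(1)
df <- data.frame(
  id   = rep(1:4, each = 8),
  time = rep(1:8, times = 4),
  y    = rnorm(32)
)
df$y[c(3, 11, 20)] <- NA

# Run only if the package is available on this machine
if (requireNamespace("paneltests", quietly = TRUE)) {
  res_imp <- paneltests::xtmispanel(
    df,
    vars    = "y",
    index   = c("id", "time"),
    impute  = "linear",   # one of the supported method names
    target  = "y",        # required whenever impute is not NULL
    new_var = "y_lin"     # overrides the default "{target}_imp"
  )
  names(res_imp)          # inspect which components were returned
  head(res_imp$imputed)   # data frame augmented with the imputed column
}
```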
